The problem with traditional RAG
Traditional RAG pipelines follow the same pattern: chunk the document, embed each chunk into a vector, store the vectors in a vector database, retrieve the closest chunks by cosine similarity, and feed them to the model. It's elegant in theory. In practice, it has three fatal flaws.
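Here is a minimal Python sketch of those five steps. It is not any particular framework's code: `chunk`, `embed`, `cosine`, and `retrieve` are illustrative helpers, the "embedding" is a toy bag-of-words vector standing in for a neural encoder, and a plain list stands in for the vector database.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int) -> list[str]:
    """Step 1: split the document into fixed-size chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Step 2: toy 'embedding' as a bag-of-words count vector (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(store: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Step 4: rank stored chunks by cosine similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


document = ("The library opened in 1921 and was designed by a local architect named Hale. "
            "Hale also built the town hall, which burned down in 1950.")
# Step 3: the "vector database" here is just a list of (chunk, vector) pairs.
store = [(c, embed(c)) for c in chunk(document, size=8)]
question = "When did the town hall burn down?"
context = retrieve(store, question, k=1)
# Step 5: the retrieved text is pasted into the model's prompt.
prompt = f"Context: {' '.join(context)}\nQuestion: {question}"
print(context)  # ['built the town hall, which burned down in']
```

The top chunk matches the question's wording but stops one word short of "1950.", which fell into the next chunk: a small-chunk failure in miniature.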
First, chunk size is a dead end. Small chunks retrieve precisely but lack context. Large chunks carry context but retrieve poorly. No fixed size wins on both dimensions. Second, cosine similarity finds text that is semantically similar to the question, not text that answers it. Those aren't the same thing. Third, the retrieval step is pure math: the model has no say in what gets retrieved.
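A toy illustration of the second flaw, again using a bag-of-words vector as a stand-in for an embedding model (an assumption for demonstration; the toy encoder makes the effect blunt, while the claim here is that learned embeddings show a subtler version of it). A chunk that merely restates the question outscores the chunk that actually answers it.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


chunks = [
    "Many visitors ask when the town hall burned down.",  # restates the question, no answer
    "A fire in 1950 destroyed the building.",             # answers it, little word overlap
]
question = "When did the town hall burn down?"
q = embed(question)
scores = {c: cosine(q, embed(c)) for c in chunks}
best = max(chunks, key=scores.__getitem__)
print(best)  # Many visitors ask when the town hall burned down.
```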
A different approach
ClawIndex skips all of that. Instead of embedding chunks, we build a structured section index of the full document — headers, hierarchy, and enough context for a model to reason about what belongs where. Instead of cosine similarity, the model reads the index and decides what's relevant. The retrieval step is the reasoning step — one pass, not two.
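The post doesn't specify ClawIndex's index format or prompt, so the following is only a minimal sketch of the idea under stated assumptions: `Section`, `build_index`, `render_index`, and `reason_retrieve` are hypothetical names, sections come from `#`-style headers, and `ask_model` is any function that sends a prompt to an LLM and returns its text reply. The model sees the whole index in one prompt and names the section ids it needs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Section:
    sid: str    # hierarchical id, e.g. "1.2"
    title: str
    level: int  # header depth: 1 for "#", 2 for "##", ...
    body: str


def build_index(markdown: str) -> list[Section]:
    """Turn '#'-style headers into sections with hierarchical ids (hypothetical format)."""
    sections: list[Section] = []
    counters: list[int] = []
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            # Keep the parent counters, pad with zeros if a level was skipped, then bump this level.
            counters = counters[:level] + [0] * (level - len(counters))
            counters[level - 1] += 1
            sections.append(Section(".".join(map(str, counters)), line[level:].strip(), level, ""))
        elif sections and line.strip():
            sections[-1].body += line.strip() + " "
    return sections


def render_index(sections: list[Section], preview_words: int = 8) -> str:
    """One line per section: indentation for hierarchy, id, title, and a short body preview."""
    lines = []
    for s in sections:
        preview = " ".join(s.body.split()[:preview_words])
        lines.append(f"{'  ' * (s.level - 1)}[{s.sid}] {s.title}: {preview}")
    return "\n".join(lines)


def reason_retrieve(sections: list[Section], question: str,
                    ask_model: Callable[[str], str]) -> list[Section]:
    """Retrieval as a single reasoning call: the model reads the index and picks section ids."""
    prompt = (
        "Here is a section index of a document.\n"
        f"{render_index(sections)}\n\n"
        f"Question: {question}\n"
        "Reply with the ids of the sections needed to answer, comma-separated."
    )
    chosen = {sid.strip() for sid in ask_model(prompt).split(",")}
    return [s for s in sections if s.sid in chosen]


doc = """# Town history
## The library
The library opened in 1921. It was designed by Hale.
## The town hall
Hale also built the town hall. A fire in 1950 destroyed it.
# Transport
The railway arrived in 1880.
"""


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; always picks section 1.2."""
    return "1.2"


sections = build_index(doc)
hits = reason_retrieve(sections, "When was the town hall destroyed?", fake_model)
print([s.sid for s in hits])  # ['1.2']
```

`fake_model` only exercises the plumbing; in practice it would wrap whatever model API you use, and parsing its reply would need to be more defensive than a comma split.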
We're calling this category Reasoning-Native Retrieval (RNR). The retriever isn't a lookup table. It's a reasoner.
Benchmark results
We compared ClawIndex against a FAISS baseline on 20 questions from the HotpotQA distractor validation set, each requiring multi-hop reasoning across multiple Wikipedia articles. FAISS missed one (the David Beckham / Manchester United question). ClawIndex got all 20, with zero hallucinations.
| System | Hit Rate | Hallucinations | Avg Latency |
|---|---|---|---|
| ClawIndex (RNR) | 100% (20/20) | 0 | ~4.8s |
| FAISS | 95% (19/20) | N/A | ~62ms |
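The post doesn't define "hit rate." One plausible reading for HotpotQA's distractor setting, where each question comes with two gold paragraphs plus eight distractors, is that a hit means every gold paragraph was retrieved. The sketch below assumes that definition; `is_hit` and `hit_rate` are illustrative names, and the records are placeholders chosen only to reproduce the 19/20 arithmetic, not the benchmark's actual outputs.

```python
def is_hit(gold: set[str], retrieved: set[str]) -> bool:
    """Assumed definition: a hit means every gold supporting paragraph was retrieved."""
    return gold <= retrieved


def hit_rate(results: list[tuple[set[str], set[str]]]) -> tuple[int, int, float]:
    """Return (hits, total, fraction) over (gold, retrieved) pairs."""
    hits = sum(is_hit(gold, retrieved) for gold, retrieved in results)
    return hits, len(results), hits / len(results)


# Placeholder records (not real benchmark outputs): 19 full hits, 1 question missing a gold paragraph.
results = [({"P1", "P2"}, {"P1", "P2", "P7"})] * 19 + [({"P1", "P2"}, {"P1", "P5"})]
hits, n, rate = hit_rate(results)
print(f"{rate:.0%} ({hits}/{n})")  # 95% (19/20)
```

Requiring both gold paragraphs, rather than either one, is what makes the metric multi-hop: retrieving only one of two bridging articles counts as a miss.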
The tradeoff
ClawIndex takes ~4.8s per query; FAISS takes ~62ms, roughly 77 times faster. That's a real difference, and we're not pretending otherwise. But for async pipelines, batch document processing, compliance workflows, or any use case where a wrong answer costs more than a slow one, it's the right tradeoff.
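Per-query latency matters less when queries don't block a user, because they can be overlapped. A minimal asyncio sketch, assuming the model provider accepts concurrent requests: `answer_query` is hypothetical and just sleeps to mimic model latency, scaled down 100x (0.048 s standing in for ~4.8 s) so the demo runs quickly, and a semaphore caps how many queries are in flight.

```python
import asyncio
import time

SIMULATED_LATENCY_S = 0.048  # stand-in for ~4.8 s, scaled down 100x for the demo


async def answer_query(question: str) -> str:
    """Hypothetical RNR call; here it only sleeps to mimic model latency."""
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return f"answer to {question!r}"


async def run_batch(questions: list[str], max_concurrency: int) -> list[str]:
    """Run all queries concurrently, with at most `max_concurrency` in flight."""
    sem = asyncio.Semaphore(max_concurrency)  # e.g. to respect provider rate limits

    async def bounded(q: str) -> str:
        async with sem:
            return await answer_query(q)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(q) for q in questions))


questions = [f"q{i}" for i in range(40)]
start = time.perf_counter()
answers = asyncio.run(run_batch(questions, max_concurrency=10))
elapsed = time.perf_counter() - start
# 40 queries with 10 in flight: about 4 rounds of ~0.048 s, vs ~1.92 s run one at a time
print(len(answers), f"{elapsed:.2f}s")
```

At the real ~4.8s latency, 1,000 queries run one at a time would take about 4,800 s (80 minutes); with 10 in flight, roughly 480 s (8 minutes), rate limits permitting.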
Real-time chat is the one use case where FAISS clearly wins on latency. For every other use case, the tradeoff is at least worth discussing.
No embeddings. No vector DB. No cosine similarity. No chunking. Just a model and a structured index. We think this changes how retrieval should be built when a correct answer matters more than a fast one.
What's next
A 100-question stress test is underway. After that: temporal awareness (RNR + staleness scoring), confidence propagation, and a multi-agent memory layer where the reasoner queries across multiple indexes simultaneously. Results will be posted here.