RAG vs Keyword: Picking the Right Search for Your Org

Every conversation we have about “AI in document management” eventually arrives at the same question: keyword search or semantic / RAG? The framing is wrong. The two are complements, not competitors. The interesting question is what proportion of your users' questions is best served by each.

What keyword search does well

Exact terms (invoice numbers, KRA PINs, contract references)
Named entities (people, organisations)
Documents with technical vocabulary the user has already memorised
Cases where the user already knows the document exists and is looking for it

What semantic / RAG does well

Conceptual questions (“policies about working from home”)
Cross-document synthesis (“what's our total exposure with this counterparty?”)
Cases where the user doesn't know the right vocabulary
New users navigating an unfamiliar corpus
Multi-language queries

What users actually ask

We instrumented searches across a sample of tenants and tagged each by category. The breakdown:

Category	% of queries
Exact-term lookup (invoice #, contract ref)	31%
Conceptual / “find me about”	28%
Cross-document aggregation	14%
Named entity lookup	13%
Date-range filter	8%
Other	6%

So:

~44% are well-served by keyword (exact-term + named entity)
~42% are well-served by semantic (conceptual + cross-document)
~14% are the remainder: date-range filters (8%), which aren't really “search”, plus “other” (6%)

Keyword and semantic queries are almost evenly split. Any system that's good at only one kind is failing on roughly four in ten queries.
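The grouping is simple arithmetic over the table, and it can be checked in a few lines of Python. The dictionary keys and group names here are our own labels for illustration, not identifiers from any Papyrus system.

```python
# Query-category shares from the table above (percent of queries).
shares = {
    "exact_term": 31,
    "conceptual": 28,
    "cross_document": 14,
    "named_entity": 13,
    "date_range_filter": 8,
    "other": 6,
}

# Grouping used in the text; the group names are illustrative labels.
groups = {
    "keyword": ["exact_term", "named_entity"],
    "semantic": ["conceptual", "cross_document"],
    "filters_and_other": ["date_range_filter", "other"],
}

# The six categories should account for every query.
assert sum(shares.values()) == 100

totals = {name: sum(shares[c] for c in cats) for name, cats in groups.items()}
print(totals)  # {'keyword': 44, 'semantic': 42, 'filters_and_other': 14}
```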

What Papyrus does

We run both pipelines on every query and fuse the results. The ranking function takes into account:

Keyword match score (BM25)
Semantic similarity score (vector dot product)
Filter constraints (mandatory)
Recency
User's access permissions (mandatory; applied as a filter before ranking)

The user sees a single result list. They don't pick “keyword” or “semantic”; they just type.
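To make the fusion step concrete, here is a minimal sketch of how such a ranking function could be put together. It is not the Papyrus implementation: the Candidate fields, the min-max normalisation, the weights and the recency half-life are all assumptions chosen for illustration. The one thing it takes directly from the list above is the order of operations: permissions and filter constraints remove documents first, and only the survivors are scored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    doc_id: str
    bm25: float            # raw keyword score (unbounded)
    semantic: float        # dot product of query and document embeddings
    modified: date
    allowed: bool          # user has permission to see this document
    matches_filters: bool  # satisfies the query's date / type filters


def min_max(values):
    """Rescale scores to [0, 1] so BM25 and dot products are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: the signal cannot rank
    return [(v - lo) / (hi - lo) for v in values]


def fuse(candidates, today, w_keyword=0.45, w_semantic=0.45,
         w_recency=0.10, half_life_days=365):
    # Mandatory constraints are filters, not weights: apply them first.
    pool = [c for c in candidates if c.allowed and c.matches_filters]
    if not pool:
        return []

    kw = min_max([c.bm25 for c in pool])
    sem = min_max([c.semantic for c in pool])

    scored = []
    for c, k, s in zip(pool, kw, sem):
        age_days = max((today - c.modified).days, 0)
        recency = 0.5 ** (age_days / half_life_days)  # halves every half-life
        scored.append((w_keyword * k + w_semantic * s + w_recency * recency,
                       c.doc_id))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


today = date(2024, 6, 1)
docs = [
    Candidate("invoice-123", 12.0, 0.20, date(2024, 5, 1), True, True),
    Candidate("wfh-policy", 2.0, 0.90, date(2023, 6, 1), True, True),
    Candidate("board-minutes", 15.0, 0.95, date(2024, 5, 30), False, True),
]
ranked = fuse(docs, today)
print(ranked)  # ['invoice-123', 'wfh-policy']
```

Note that board-minutes has the best raw scores on every signal and still never appears, because the user has no access to it. With these assumed weights the two remaining documents tie on keyword plus semantic score, and recency breaks the tie.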

The Copilot is different

The Copilot is RAG-only. It uses semantic retrieval to gather context, then an LLM to synthesise an answer. It's not a search interface; it's a question-answering interface.

If you want to find a document, use search. If you want an answer, use the Copilot.
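The retrieve-then-synthesise shape of a RAG pipeline can be sketched as below. This is an illustration of the general pattern only, not the Copilot's code: the toy two-dimensional vectors stand in for real embeddings, the prompt wording is invented, and the llm argument is a placeholder for whatever model call you use (here a stub that echoes the prompt so the example runs offline).

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query.

    chunks is a list of (text, vector) pairs. Assumption: it has already
    been restricted to documents the user is allowed to read.
    """
    ranked = sorted(chunks, key=lambda c: dot(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Number the passages so the model can refer to them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(question, query_vec, chunks, llm, k=2):
    """Semantic retrieval gathers context; the LLM synthesises the answer."""
    passages = retrieve(query_vec, chunks, k)
    return llm(build_prompt(question, passages))


# Toy corpus with hand-made vectors (hypothetical data).
chunks = [
    ("Staff may work from home two days a week.", [0.9, 0.1]),
    ("Invoices are paid within 30 days.", [0.1, 0.9]),
    ("Remote work requires manager approval.", [0.8, 0.3]),
]
query_vec = [1.0, 0.0]

passages = retrieve(query_vec, chunks, k=2)
echo_llm = lambda prompt: prompt  # stub: returns the prompt unchanged
reply = answer("What is our working-from-home policy?", query_vec, chunks, echo_llm)
print(reply)
```

The output of this pipeline is an answer built from passages, not a ranked list of documents, which is why it makes a poor tool for finding a specific file.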

What we're watching

The percentage of queries that result in a click-through. Currently ~71%; we'd like to push it higher.
The percentage of queries followed by another query within 30 seconds (a signal that the first didn't satisfy). Currently ~23%; lower is better.
Distribution of clicks across rank positions. If everything users want is at rank 1, ranking is doing its job; if clicks are spread across ranks 1-5, we have ranking work to do.
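All three signals can be computed from a plain query log. The sketch below assumes a hypothetical log format (one QueryEvent per query, carrying the user, a timestamp in seconds and the rank of the clicked result, if any); real instrumentation will differ, and a query with several clicks would need a richer record.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryEvent:
    user: str
    ts: float                    # seconds since an arbitrary epoch
    clicked_rank: Optional[int]  # 1-based rank of the click, None if no click


def search_metrics(events, requery_window=30.0):
    """Click-through rate, quick-requery rate and clicks per rank."""
    total = len(events)
    if total == 0:
        return {"click_through_rate": 0.0, "requery_rate": 0.0,
                "clicks_by_rank": {}}

    # Order each user's queries in time so consecutive pairs are comparable.
    ordered = sorted(events, key=lambda e: (e.user, e.ts))

    clicks = sum(1 for e in ordered if e.clicked_rank is not None)

    # A query counts as "followed up" if the same user searches again
    # within the window (30 seconds by default).
    requeries = sum(
        1
        for cur, nxt in zip(ordered, ordered[1:])
        if cur.user == nxt.user and nxt.ts - cur.ts <= requery_window
    )

    ranks = Counter(e.clicked_rank for e in ordered
                    if e.clicked_rank is not None)

    return {
        "click_through_rate": clicks / total,
        "requery_rate": requeries / total,
        "clicks_by_rank": dict(ranks),
    }


log = [
    QueryEvent("alice", 0.0, None),   # no click, then retries 12 s later
    QueryEvent("alice", 12.0, 1),
    QueryEvent("alice", 300.0, 3),
    QueryEvent("bob", 5.0, 1),
]
metrics = search_metrics(log)
print(metrics)
```

On this toy log, three of the four queries end in a click, one is followed by a requery within 30 seconds, and the clicks fall on rank 1 twice and rank 3 once.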

Closing

Search is a solved problem at the technology level. It's an unsolved problem at the level of understanding users. The next decade of search progress isn't about better embeddings; it's about a better understanding of what users mean when they type.