RAG vs Keyword: Picking the Right Search for Your Org

Both have their place. The right choice depends on the question your users are actually asking — and most orgs are wrong about what their users are asking.

Every conversation we have about “AI in document management” eventually arrives at the same question: keyword search or semantic / RAG search? The framing is wrong. They're complements, not competitors. The more useful question is what proportion of your users' questions each one serves best.

What keyword search does well

  • Exact terms (invoice numbers, KRA PINs, contract references)
  • Named entities (people, organisations)
  • Documents with technical vocabulary the user has already memorised
  • Cases where the user already knows the document exists and is looking for it

What semantic / RAG does well

  • Conceptual questions (“policies about working from home”)
  • Cross-document synthesis (“what's our total exposure with this counterparty?”)
  • Cases where the user doesn't know the right vocabulary
  • New users navigating an unfamiliar corpus
  • Multi-language queries

What users actually ask

We instrumented searches across a sample of tenants and tagged each by category. The breakdown:

| Category | % of queries |
| --- | --- |
| Exact-term lookup (invoice #, contract ref) | 31% |
| Conceptual / “find me about” | 28% |
| Cross-document aggregation | 14% |
| Named entity lookup | 13% |
| Date-range filter | 8% |
| Other | 6% |

So:

  • ~44% are well-served by keyword (exact-term + named entity)
  • ~42% are well-served by semantic (conceptual + cross-document)
  • ~14% are the remainder (date-range filter + other), largely filters (date / type) that aren't really “search”

Keyword and semantic queries are almost evenly split. Any system that's good at only one of the two is failing more than 40% of your users' queries.
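The arithmetic behind those three buckets can be checked in a few lines. This is only a sketch: the dictionary keys and the grouping are illustrative names, not anything from our instrumentation, and it assumes the 14% bucket combines the date-range and Other rows, since that is the only pairing in the table that sums to 14.

```python
# Shares from the table above, in percent of all queries.
shares = {
    "exact_term": 31,
    "conceptual": 28,
    "cross_document": 14,
    "named_entity": 13,
    "date_range_filter": 8,
    "other": 6,
}

# Hypothetical grouping that mirrors the three bullets above.
groups = {
    "keyword": ["exact_term", "named_entity"],
    "semantic": ["conceptual", "cross_document"],
    "filter_or_other": ["date_range_filter", "other"],
}

# Sum the category shares inside each group.
totals = {name: sum(shares[c] for c in cats) for name, cats in groups.items()}

for name, pct in totals.items():
    print(f"{name}: {pct}%")
```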

What Papyrus does

We run both pipelines on every query and fuse the results. The ranking function takes the inputs below; the ones marked mandatory act as hard filters rather than weights:

  • Keyword match score (BM25)
  • Semantic similarity score (vector dot product)
  • Filter constraints (mandatory)
  • Recency
  • User's access permissions (mandatory; filters before ranking)

The user sees a single result list. They don't pick “keyword” or “semantic”; they just type.
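A minimal sketch of how such a fused ranking could look. It is not the Papyrus implementation: the `Candidate` fields, the weights, the recency half-life, and the max-normalisation of BM25 are all assumptions made for illustration, and the embeddings are assumed to be unit-length so the dot product is comparable to the other terms.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    doc_id: str
    bm25: float                  # keyword score from the lexical pipeline
    embedding: list[float]       # document vector (assumed unit-length)
    modified: datetime
    doc_type: str
    allowed_groups: frozenset[str]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank(candidates, query_vec, user_groups, doc_type=None, now=None,
         w_kw=0.45, w_sem=0.45, w_rec=0.10, half_life_days=180.0):
    """Return doc_ids, best first. All weights are hypothetical."""
    now = now or datetime.now(timezone.utc)

    # Mandatory constraints come first: permissions and filters remove
    # candidates before any scoring happens.
    visible = [
        c for c in candidates
        if c.allowed_groups & user_groups
        and (doc_type is None or c.doc_type == doc_type)
    ]
    if not visible:
        return []

    # Scale BM25 to [0, 1] within the candidate set so it can be mixed
    # with the similarity and recency terms.
    max_bm25 = max(c.bm25 for c in visible) or 1.0

    scored = []
    for c in visible:
        age_days = max((now - c.modified).total_seconds() / 86400, 0.0)
        recency = 0.5 ** (age_days / half_life_days)  # exponential decay
        score = (w_kw * c.bm25 / max_bm25
                 + w_sem * dot(query_vec, c.embedding)
                 + w_rec * recency)
        scored.append((score, c.doc_id))

    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

The point of the sketch is the ordering of operations: documents the user cannot see never receive a score, so a high keyword or semantic match cannot leak them into the list.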

The Copilot is different

The Copilot is RAG-only. It uses semantic retrieval to gather context, then an LLM to synthesise an answer. It's not a search interface; it's a question-answering interface.

If you want to find a document, use search. If you want an answer, use the Copilot.
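The retrieve-then-synthesise flow can be sketched as below. Everything here is illustrative: `retrieve`, `build_prompt`, and `answer` are hypothetical names, the prompt wording is made up, and `llm` is any callable that maps a prompt string to an answer string, standing in for whichever model is actually used.

```python
from typing import Callable, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (passage text, embedding)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec: Sequence[float], chunks: Sequence[Chunk], k: int = 3):
    """Semantic retrieval: the k passages most similar to the query."""
    ranked = sorted(chunks, key=lambda c: dot(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Pack the retrieved passages into a context block for the model."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, query_vec: Sequence[float],
           chunks: Sequence[Chunk], llm: Callable[[str], str], k: int = 3) -> str:
    """Gather context semantically, then let the LLM synthesise an answer."""
    return llm(build_prompt(question, retrieve(query_vec, chunks, k)))
```

Note that the output is a piece of text, not a ranked list of documents, which is why the Copilot is a poor tool for locating a specific file.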

What we're watching

  • The percentage of queries that result in a click-through. Currently ~71%; we'd like to push it higher.
  • The percentage of queries followed by another query within 30 seconds (a signal that the first didn't satisfy). Currently ~23%; lower is better.
  • Distribution of clicks across rank positions. If everything users want is at rank 1, ranking is doing its job; if clicks are spread across ranks 1-5, we have ranking work to do.
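These three signals can be computed from a plain query log. The sketch below assumes a hypothetical `SearchEvent` record with one row per query; the field names and the log shape are illustrative, not our actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchEvent:
    user_id: str
    timestamp: float              # seconds since epoch
    clicked_rank: Optional[int]   # 1-based rank of the clicked result, None if no click


def search_metrics(events, requery_window=30.0):
    """Click-through rate, quick re-query rate, and clicks per rank."""
    events = sorted(events, key=lambda e: (e.user_id, e.timestamp))
    total = len(events)
    if total == 0:
        return {"click_through_rate": 0.0, "requery_rate": 0.0, "clicks_by_rank": {}}

    clicks = [e.clicked_rank for e in events if e.clicked_rank is not None]

    # A query counts as "re-queried" when the same user searches again
    # within the window, a hint that the first results did not satisfy.
    requeries = sum(
        1 for prev, nxt in zip(events, events[1:])
        if prev.user_id == nxt.user_id
        and nxt.timestamp - prev.timestamp <= requery_window
    )

    return {
        "click_through_rate": len(clicks) / total,
        "requery_rate": requeries / total,
        "clicks_by_rank": dict(Counter(clicks)),
    }
```

A `clicks_by_rank` result concentrated on rank 1 suggests ranking is doing its job; a flatter spread across ranks 1-5 points at ranking work still to do.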

Closing

Search is a solved problem at the technology level. It's an unsolved problem at the user understanding level. The next decade of search progress isn't about better embeddings; it's about better understanding of what users mean when they type.
