AI in Document Management: A Practical Guide
What AI can do in document management today, what it can't, where the risks are, and how to deploy it without the science-fiction expectations.
The AI conversation in document management has become noisy. Vendors promise magic; practitioners want answers. This guide is for the practitioner.
The four things AI is actually good at
1. Classification
Given a document, AI predicts what kind of document it is. Modern transformer models achieve 90-95% accuracy on common business document types when trained on representative data. This is mature technology and reliable in production.
2. Extraction
Given a document with a known structure (invoice, contract, employee record), AI can pull specific fields: invoice number, KRA PIN, contract value, start date. Accuracy is high when the document format is consistent, but it degrades sharply on novel formats or poor scans.
3. Semantic search and Q&A (RAG)
Given a corpus and a question, AI can find relevant documents (semantic search) and synthesise an answer with citations (RAG / Retrieval-Augmented Generation). Quality depends heavily on retrieval — if the right chunks aren't retrieved, the answer is wrong.
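To make the dependence on retrieval concrete, here is a minimal sketch of the retrieval step only. It swaps in a toy bag-of-words cosine similarity for the learned embeddings a production system would use, and the chunk IDs, chunk texts, and `retrieve` function are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs, best first, dropping zero scores."""
    q = Counter(tokenize(question))
    scored = [(cid, cosine(q, Counter(tokenize(text)))) for cid, text in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(cid, score) for cid, score in scored[:top_k] if score > 0]


# Hypothetical corpus: chunk ID -> chunk text.
CHUNKS = {
    "contract-17#p3": "The supplier shall deliver goods within 30 days of the purchase order.",
    "invoice-204#p1": "Invoice total is payable within 14 days of the invoice date.",
    "hr-policy#p9": "Employees accrue annual leave at two days per month.",
}

hits = retrieve("When is the invoice payable?", CHUNKS)
for chunk_id, score in hits:
    print(f"{chunk_id}: {score:.2f}")
```

Only the chunks returned here would be passed to the language model, so a chunk that fails to rank is invisible to the answer. The contract chunk still scores above zero because it shares filler words with the question, which is the kind of retrieval noise worth measuring before blaming the model.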
4. Summarisation
Given a long document, AI can produce a usable shorter version. Quality varies; humans should review summaries before acting on them for high-stakes decisions.
The four things AI is not good at (yet)
1. Multi-step reasoning over many documents
Asking “Why did our supplier defaults increase in Q2?” requires the AI to plan, retrieve, hypothesise, and validate across many documents. Current models attempt this but produce inconsistent results. Don't rely on it for strategic decisions.
2. Forecasting and prediction
AI can summarise past contracts; it cannot reliably predict which contracts will be disputed. Treat predictions as suggestions, not decisions.
3. Anything that requires recent knowledge it wasn't trained on
The Copilot answers from the documents in your tenant. It doesn't know about events that happened last week unless those events are documented there.
4. Trust-critical work without human review
AI-generated approval decisions, AI-generated legal opinions, AI-issued payments: none of these should run without human review. Even at 99% accuracy, the remaining 1% can include catastrophic failures, and that is unacceptable.
The cost shape
AI operations cost real money. The three tiers Papyrus uses:
| Tier | Use Case | Cost per op |
|---|---|---|
| Tier 1 — Edge (ONNX, mobile) | Classification on device | ~0 |
| Tier 2 — Server (ML.NET) | Document-type classification on upload | sub-cent |
| Tier 3 — Cloud LLM | Q&A, summarisation, complex extraction | cents per call |
Most operations are Tier 1 or 2. Tier 3 is reserved for high-value interactions where the cost is justified by the user time saved.
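The cost shape is easier to reason about with numbers. The sketch below blends the three tiers into a monthly breakdown. Every unit cost and volume in it is a made-up placeholder chosen only to match the orders of magnitude in the table, not a Papyrus price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierUsage:
    name: str
    cost_per_op_usd: float  # hypothetical unit cost
    ops_per_month: int      # hypothetical volume

    @property
    def monthly_cost(self) -> float:
        return self.cost_per_op_usd * self.ops_per_month


# Placeholder figures: ~0, sub-cent, and cents per call.
USAGE = [
    TierUsage("Tier 1 (edge)", 0.0, 200_000),
    TierUsage("Tier 2 (server)", 0.002, 50_000),
    TierUsage("Tier 3 (cloud LLM)", 0.03, 5_000),
]


def cost_breakdown(usage: list[TierUsage]) -> dict[str, dict[str, float]]:
    """Return each tier's share of operations and share of spend."""
    total_cost = sum(t.monthly_cost for t in usage)
    total_ops = sum(t.ops_per_month for t in usage)
    return {
        t.name: {
            "share_of_ops": t.ops_per_month / total_ops if total_ops else 0.0,
            "share_of_cost": t.monthly_cost / total_cost if total_cost else 0.0,
        }
        for t in usage
    }


breakdown = cost_breakdown(USAGE)
for name, row in breakdown.items():
    print(f"{name}: {row['share_of_ops']:.1%} of ops, {row['share_of_cost']:.1%} of cost")
```

Under these placeholder numbers, Tier 3 handles roughly 2% of operations but accounts for about 60% of spend, which is why it is worth tracking separately.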
The accuracy paradox
Counterintuitively, higher accuracy can be more dangerous than medium accuracy. At 95% accuracy, users start trusting the AI; at 99%, they trust it implicitly. The remaining 5% or 1% of errors then slip through unchecked.
The fix: keep human review in the loop even when accuracy is high. Surface AI confidence scores; route low-confidence cases for explicit review.
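One way to wire this up is a small routing function. The sketch below is an illustration under stated assumptions: the 0.90 threshold, the one-in-twenty spot check of high-confidence results, and the queue names are all hypothetical and would need tuning against your own override data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    doc_id: str
    label: str
    confidence: float  # model-reported score in [0, 1]


def route(pred: Prediction, position: int,
          threshold: float = 0.90, spot_check_every: int = 20) -> str:
    """Decide which queue a prediction goes to.

    position is the 1-based index of the document in the processing stream.
    Low-confidence results always go to a person; every Nth high-confidence
    result is also sampled so that trusted output still gets looked at.
    """
    if pred.confidence < threshold:
        return "human_review"
    if position % spot_check_every == 0:
        return "spot_check"
    return "auto_accept"


# Hypothetical stream of classifier outputs.
stream = [
    Prediction("doc-001", "invoice", 0.97),
    Prediction("doc-002", "contract", 0.72),
    Prediction("doc-003", "invoice", 0.99),
]
for position, pred in enumerate(stream, start=1):
    print(pred.doc_id, pred.label, route(pred, position))
```

The spot check is the part that addresses the paradox: it keeps a reviewer looking at a sample of the confident output, not only at the cases the model already flagged as doubtful.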
Where to start with AI deployment
The most boring use case is the best place to start: classification + extraction on supplier invoices. Why?
- High volume (gives AI enough signal to learn)
- Clear schema (invoice fields are well-defined)
- Immediate value (Finance team feels the relief)
- Recoverable errors (a misclassified invoice doesn't sink the company)
Once that's working, expand to contracts, then HR records, then customer correspondence.
What to measure
For any AI feature you deploy:
- Accuracy (against a labelled test set; refresh quarterly)
- Time-to-result (median latency)
- Override rate (how often humans correct AI output; a high rate suggests the AI is inaccurate or poorly calibrated)
- Adoption (% of eligible documents/queries actually using AI)
- Cost per operation (especially for Tier 3 cloud calls)
These five numbers tell you whether the AI is delivering value.
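The metrics listed above can be computed from a simple interaction log plus a labelled test set. The sketch below assumes a hypothetical `Interaction` record; the field names are illustrative and not taken from any particular product.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    used_ai: bool                      # False if the user skipped the AI feature
    latency_ms: Optional[float] = None
    overridden: bool = False           # True if a human corrected the output
    cost_usd: float = 0.0


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Share of test-set items where the prediction matches the label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def operational_metrics(log: list[Interaction]) -> dict[str, float]:
    """Adoption, override rate, median latency, and cost per AI operation."""
    ai = [i for i in log if i.used_ai]
    if not ai:
        raise ValueError("log contains no AI interactions")
    return {
        "adoption": len(ai) / len(log),
        "override_rate": sum(i.overridden for i in ai) / len(ai),
        "median_latency_ms": median(i.latency_ms for i in ai if i.latency_ms is not None),
        "cost_per_op_usd": sum(i.cost_usd for i in ai) / len(ai),
    }


# Hypothetical log: three AI-assisted operations and one manual one.
LOG = [
    Interaction(True, 100.0, False, 0.0),
    Interaction(True, 200.0, True, 0.002),
    Interaction(True, 900.0, False, 0.03),
    Interaction(False),
]
metrics = operational_metrics(LOG)
print(metrics)
print(accuracy(["invoice", "contract"], ["invoice", "invoice"]))
```

Median latency is used instead of the mean so that one slow cloud call, like the 900 ms entry here, does not dominate the figure.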
What to govern
Per the AI Governance for Document Management guide:
- Model version and training date logged for every AI output
- Audit trail of human corrections for retraining
- Periodic accuracy benchmarks (quarterly is standard)
- Drift detection (alerts when accuracy degrades materially)
- Explainability layer for high-stakes decisions
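As a rough illustration of the first, second, and fourth items, the sketch below logs the model version and training date with each output, leaves room for a human correction, and raises a drift alert when accuracy falls too far below a baseline. The record fields, the example model version, and the 3-point `max_drop` are assumptions for the example, not requirements from the governance guide.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AIOutputRecord:
    doc_id: str
    output: str
    model_version: str                       # hypothetical version string
    model_trained_on: str                    # ISO date of the training run
    human_correction: Optional[str] = None   # filled in when a reviewer overrides


def to_audit_line(record: AIOutputRecord) -> str:
    """Serialise one record as a JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


def drift_alert(baseline_accuracy: float, current_accuracy: float,
                max_drop: float = 0.03) -> bool:
    """True when accuracy has fallen more than max_drop below the baseline."""
    return (baseline_accuracy - current_accuracy) > max_drop


record = AIOutputRecord(
    doc_id="doc-002",
    output="contract",
    model_version="clf-2024.06",
    model_trained_on="2024-06-01",
    human_correction="invoice",
)
print(to_audit_line(record))
print(drift_alert(baseline_accuracy=0.94, current_accuracy=0.89))
```

Records that carry a `human_correction` double as labelled examples for the next retraining round, which is why the correction is stored next to the original output and not in a separate system.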
Conclusion
AI in document management is not magic. It's a set of well-understood tools with well-understood limits. The customers who get the most value treat it as automation — useful, productive, but always under human supervision.
The customers who get burned treat it as an oracle. Don't be that customer.