AI in Document Management: A Practical Guide

The AI conversation in document management has become noisy. Vendors promise magic; practitioners want answers. This guide is for the practitioner.

The four things AI is actually good at

1. Classification

Given a document, AI predicts what kind of document it is. Modern transformer models typically reach 90-95% accuracy on common business document types when trained on representative data. This is mature technology and reliable in production.

2. Extraction

Given a document with known structure (invoice, contract, employee record), AI can pull specific fields: invoice number, KRA PIN, contract value, start date. Accuracy is high when the document format is consistent but degrades sharply on novel formats or poor scans.
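The sketch below shows what extraction output looks like and why a consistent layout matters. It assumes a single, consistent invoice layout. The regular expressions are a rule-based stand-in for a trained extraction model and do not represent how Papyrus or any AI model works. The pattern names, the `extract_invoice_fields` helper and the assumed KRA PIN shape (one letter, nine digits, one letter) are illustrative assumptions. The failure mode is the point: on a layout the patterns were not built for, fields simply come back missing, and missing fields are easy to route to a human.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for ONE consistent invoice layout. They stand in for
# a trained extraction model and are not a general-purpose extractor.
INVOICE_NO = re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
# Assumed KRA PIN shape: one letter, nine digits, one letter (e.g. P051234567X).
KRA_PIN = re.compile(r"\b([A-Z]\d{9}[A-Z])\b")
# \b stops "Subtotal" from matching; the currency prefix is optional.
TOTAL = re.compile(r"\bTotal(?:\s+Due)?\s*:?\s*(?:KES|KSh)?\s*([\d,]+\.\d{2})", re.IGNORECASE)


@dataclass
class InvoiceFields:
    invoice_number: Optional[str]
    kra_pin: Optional[str]
    total: Optional[float]

    def missing(self) -> list:
        """Names of fields the patterns could not find."""
        return [name for name, value in vars(self).items() if value is None]


def extract_invoice_fields(text: str) -> InvoiceFields:
    """Pull three fields from invoice text; None means 'not found'."""
    inv = INVOICE_NO.search(text)
    pin = KRA_PIN.search(text)
    total = TOTAL.search(text)
    return InvoiceFields(
        invoice_number=inv.group(1) if inv else None,
        kra_pin=pin.group(1) if pin else None,
        total=float(total.group(1).replace(",", "")) if total else None,
    )
```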

3. Semantic search and Q&A (RAG)

Given a corpus and a question, AI can find relevant documents (semantic search) and synthesise an answer with citations (RAG, Retrieval-Augmented Generation). Quality depends heavily on retrieval: if the right chunks aren't retrieved, the answer will be wrong or unsupported, however capable the model.
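The retrieval dependency is easier to see in a concrete flow. The sketch below implements retrieve-then-generate and replaces the embedding model that real semantic search would use with bag-of-words cosine similarity. `retrieve` and `build_prompt` are hypothetical helpers, not a specific product's API. When nothing relevant is retrieved, the sketch returns no prompt at all, because an LLM given no grounding will still produce a fluent answer.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[Tuple[int, float]]:
    """Return (chunk index, score) for up to k chunks with non-zero overlap."""
    q = Counter(tokenize(question))
    scored = [(i, cosine(q, Counter(tokenize(c)))) for i, c in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored[:k] if pair[1] > 0]


def build_prompt(question: str, chunks: List[str], hits: List[Tuple[int, float]]) -> Optional[str]:
    """Ground the LLM in retrieved chunks; refuse rather than let it guess."""
    if not hits:
        return None  # nothing relevant retrieved: any generated answer would be unsupported
    context = "\n".join(f"[{i}] {chunks[i]}" for i, _ in hits)
    return (
        "Answer using only the numbered sources and cite them.\n"
        f"{context}\n\nQuestion: {question}"
    )
```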

4. Summarisation

Given a long document, AI can produce a usable shorter version. Quality varies; humans should review summaries before acting on them for high-stakes decisions.

The four things AI is not good at (yet)

1. Multi-step reasoning over many documents

Asking “Why did our supplier defaults increase in Q2?” requires the AI to plan, retrieve, hypothesise, validate. Current models attempt this but produce inconsistent results. Don't rely on it for strategic decisions.

2. Forecasting and prediction

AI can summarise past contracts; it cannot reliably predict which contracts will be disputed. Treat predictions as suggestions, not decisions.

3. Anything that requires recent knowledge it wasn't trained on

The Copilot knows your documents. It doesn't know about events that happened last week unless those events are documented in your tenant.

4. Trust-critical work without human review

AI-generated approval decisions, AI-generated legal opinions and AI-issued payments should never run without human review. Even at 99% accuracy, the remaining 1% of errors includes failures that can be catastrophic, and that risk is unacceptable.

The cost shape

AI operations cost real money. Papyrus groups them into three cost tiers:

Tier	Use Case	Approx. cost per operation
Tier 1 — Edge (ONNX, mobile)	Classification on device	~0
Tier 2 — Server (ML.NET)	Document-type classification on upload	sub-cent
Tier 3 — Cloud LLM	Q&A, summarisation, complex extraction	cents per call

Most operations are Tier 1 or 2. Tier 3 is reserved for high-value interactions where the cost is justified by the user time saved.
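The tier split matters because cost concentrates in Tier 3. The sketch below estimates monthly spend per tier. The operation names, the operation-to-tier mapping and the unit prices are all illustrative assumptions, not Papyrus's real prices. The table above gives only orders of magnitude.

```python
from enum import Enum
from typing import Dict


class Tier(Enum):
    EDGE = 1       # on-device ONNX model
    SERVER = 2     # server-side ML.NET model
    CLOUD_LLM = 3  # hosted large language model


# Hypothetical mapping from operation to the tier that serves it.
TIER_FOR_OPERATION = {
    "classify_on_device": Tier.EDGE,
    "classify_on_upload": Tier.SERVER,
    "qa": Tier.CLOUD_LLM,
    "summarise": Tier.CLOUD_LLM,
    "complex_extraction": Tier.CLOUD_LLM,
}

# Illustrative unit costs in USD, chosen only to match the table's orders of magnitude.
COST_PER_OP = {Tier.EDGE: 0.0, Tier.SERVER: 0.0005, Tier.CLOUD_LLM: 0.02}


def monthly_cost(volumes: Dict[str, int]) -> Dict[Tier, float]:
    """Estimated spend per tier for a month of operation counts."""
    totals = {tier: 0.0 for tier in Tier}
    for operation, count in volumes.items():
        tier = TIER_FOR_OPERATION[operation]  # KeyError for unknown operations
        totals[tier] += count * COST_PER_OP[tier]
    return totals


if __name__ == "__main__":
    spend = monthly_cost({"classify_on_device": 500_000, "classify_on_upload": 100_000, "qa": 2_000})
    for tier, usd in spend.items():
        print(f"{tier.name}: ${usd:,.2f}")
```

With these made-up prices, 2,000 Q&A calls cost $40, close to the $50 spent on 100,000 server-side classifications. That is why Tier 3 is kept for interactions that save real user time.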

The accuracy paradox

Counterintuitively, higher accuracy can be more dangerous than medium accuracy. At 95% accuracy, users start trusting the AI; at 99%, they trust it implicitly. Either way, the remaining errors (5% or 1%) slip through unchecked.

The fix: keep human review in the loop even when accuracy is high. Surface AI confidence scores; route low-confidence cases for explicit review.
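The paradox is easiest to see with numbers. At a hypothetical 10,000 documents a month, 99% accuracy still leaves about 100 wrong outputs, and 95% leaves about 500. The sketch below implements the fix described above and assumes the model returns a confidence score between 0 and 1. The 0.90 threshold, the 5% audit sample and the queue names are assumptions to tune, not recommended values. Low-confidence predictions go to explicit review. A deterministic sample of high-confidence predictions is still audited, so reviewers never stop looking.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Prediction:
    doc_id: str
    label: str
    confidence: float  # model score, assumed to lie in [0, 1]


def route(pred: Prediction, review_threshold: float = 0.90, audit_percent: int = 5) -> str:
    """Pick a human-review path for one AI prediction (queue names are hypothetical)."""
    if not 0.0 <= pred.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {pred.confidence}")
    if pred.confidence < review_threshold:
        return "explicit_review"  # a human must confirm before the label is used
    # Sample a fixed share of confident predictions by hashing the document id,
    # so the same document always lands in the same bucket.
    if zlib.crc32(pred.doc_id.encode("utf-8")) % 100 < audit_percent:
        return "audit_sample"
    return "accept_with_score"  # accepted, with confidence shown to the user


if __name__ == "__main__":
    batch = [
        Prediction("inv-001", "invoice", 0.97),
        Prediction("inv-002", "invoice", 0.62),
        Prediction("ctr-003", "contract", 0.91),
    ]
    for p in batch:
        print(p.doc_id, p.label, f"{p.confidence:.0%}", route(p))
```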

Where to start with AI deployment

The most boring use case is the best place to start: classification + extraction on supplier invoices. Why?

High volume (gives AI enough signal to learn)
Clear schema (invoice fields are well-defined)
Immediate value (Finance team feels the relief)
Recoverable errors (a misclassified invoice doesn't sink the company)

Once that's working, expand to contracts, then HR records, then customer correspondence.

What to measure

For any AI feature you deploy:

Accuracy (against a labelled test set; refresh quarterly)
Time-to-result (median latency)
Override rate (how often humans correct AI output; a high rate signals the AI is not well-calibrated)
Adoption (% of eligible documents/queries actually using AI)
Cost per operation (especially for Tier 3 cloud calls)

These five numbers tell you whether the AI is delivering value.
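The sketch below computes the metrics in the list above from two hypothetical inputs. Accuracy comes from predictions scored against a labelled test set. The other metrics come from a per-feature log of production operations. The `AIOperation` record and its fields are assumptions about what your system logs, and `eligible` is the count of documents or queries that could have used the feature.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class AIOperation:
    latency_ms: float
    overridden: bool  # a human corrected the AI output
    cost_usd: float


def accuracy(predicted: List[str], labelled: List[str]) -> float:
    """Share of test-set items where the prediction matches the label."""
    if not labelled or len(predicted) != len(labelled):
        raise ValueError("need non-empty lists of equal length")
    return sum(p == t for p, t in zip(predicted, labelled)) / len(labelled)


def feature_metrics(ops: List[AIOperation], eligible: int) -> Dict[str, float]:
    """Production metrics for one AI feature over a reporting period."""
    if not ops or eligible < len(ops):
        raise ValueError("need at least one operation and eligible >= len(ops)")
    return {
        "median_latency_ms": median([op.latency_ms for op in ops]),
        "override_rate": sum(op.overridden for op in ops) / len(ops),
        "adoption": len(ops) / eligible,
        "cost_per_op_usd": sum(op.cost_usd for op in ops) / len(ops),
    }
```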

What to govern

Per the AI Governance for Document Management guide:

Model version and training date logged for every AI output
Audit trail of human corrections for retraining
Periodic accuracy benchmarks (quarterly is standard)
Drift detection (alerts when accuracy degrades materially)
Explainability layer for high-stakes decisions
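The sketch below covers the first, second and fourth items. It uses an append-only log in which every AI output carries its model version and training date. Corrections are appended as new entries rather than overwriting the original, so they can feed retraining. A simple drift check compares a recent benchmark against the baseline. The record fields, the helper names and the 3-percentage-point tolerance standing in for "materially" are assumptions, not the governance guide's definitions.

```python
from dataclasses import dataclass, replace
from datetime import date, datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AIOutputRecord:
    doc_id: str
    output: str
    model_version: str
    training_date: date
    created_at: datetime
    corrected_output: Optional[str] = None
    corrected_by: Optional[str] = None


def log_output(log: List[AIOutputRecord], doc_id: str, output: str,
               model_version: str, training_date: date) -> AIOutputRecord:
    """Append an AI output tagged with the model that produced it."""
    record = AIOutputRecord(doc_id, output, model_version, training_date,
                            created_at=datetime.now(timezone.utc))
    log.append(record)
    return record


def log_correction(log: List[AIOutputRecord], record: AIOutputRecord,
                   corrected_output: str, reviewer: str) -> AIOutputRecord:
    """Append a corrected copy; the original entry is never modified."""
    fixed = replace(record, corrected_output=corrected_output, corrected_by=reviewer,
                    created_at=datetime.now(timezone.utc))
    log.append(fixed)
    return fixed


def drift_alert(baseline_accuracy: float, recent_accuracy: float, tolerance: float = 0.03) -> bool:
    """True when recent accuracy is more than `tolerance` below the baseline."""
    return baseline_accuracy - recent_accuracy > tolerance
```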

Conclusion

AI in document management is not magic. It's a set of well-understood tools with well-understood limits. The customers who get the most value treat it as automation — useful, productive, but always under human supervision.

The customers who get burned treat it as an oracle. Don't be that customer.