The Mobile Capture Story: From Phone Camera to Audited Record

The most common mobile use case in our customer base isn't approval-on-the-go. It's capture — a field officer taking a photo of a document and having it become an indexed, classified, audited record.

This post is about how that pipeline actually works.

The naïve approach (don't do this)

Phone camera → Email attachment → Inbox → Manual file

What's wrong:

No metadata captured (when? where? by whom?)
Photos are huge, often poorly framed
Email loses chain of custody
Manual filing introduces inconsistency
Nothing is searchable

We still see customers operating this way in 2026. It can be replaced in under a week.

The Papyrus pipeline

App captures → Auto-deskew + enhance → Upload with metadata →
Server-side OCR → Classification → Extraction → Filed + indexed

Each stage adds value:

1. Capture

The app uses the device's native camera. We don't try to be clever about exposure, because phones are good enough. What the app does add:

Auto-detect document edges in the viewfinder
Show a green outline when the document is in good focus
Capture at the device's full resolution (typically 12 MP+)
Allow combining multiple pages into one document before upload

2. Local enhancement

Before upload:

Deskew (rotate so the page is straight)
Perspective correction (so the document is rectangular even when the phone was tilted)
Contrast normalisation (so OCR has a chance)

This runs on the device. Battery cost: ~3% per 10 documents.
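
To make the last of those steps concrete, here is a minimal sketch of contrast normalisation as a percentile-based stretch over 8-bit grayscale values. The technique choice, the function name `stretch_contrast`, and the 2%/98% cut-offs are assumptions for illustration, not a description of what the app actually ships.

```python
def stretch_contrast(pixels, low_pct=2.0, high_pct=98.0):
    """Percentile-based contrast stretch for 8-bit grayscale values.

    Hypothetical helper: values at or below the low percentile map to 0,
    values at or above the high percentile map to 255.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100)]
    hi = ordered[int((n - 1) * high_pct / 100)]
    if hi <= lo:
        # Flat image: there is no range to stretch, so return it unchanged.
        return list(pixels)
    scale = 255.0 / (hi - lo)
    # Clamp so that outliers beyond the percentiles stay within 0..255.
    return [max(0, min(255, round((p - lo) * scale))) for p in pixels]
```

Clipping at percentiles rather than at the absolute minimum and maximum keeps a single specular highlight or dark thumb shadow from flattening the rest of the page.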

3. Upload with metadata

The upload carries:

The processed PDF (typically 200-500 KB per page, not 8 MB raw)
GPS coordinates (with user consent)
Timestamp (device clock + server-side recorded receipt time)
User identity (the signed-in user)
Device fingerprint (model + OS)

All of this becomes searchable metadata.
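
As a rough illustration of the client side of that payload, the sketch below models the metadata as a dataclass and serialises it to JSON. Every field name here is hypothetical, and the receipt time is deliberately absent because the server records it on arrival.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CaptureMetadata:
    # All field names are illustrative assumptions, not the real schema.
    user_id: str
    device_model: str
    device_os: str
    device_time: str                      # ISO 8601, from the device clock
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    gps_accuracy_m: Optional[float] = None


def build_upload_fields(meta: CaptureMetadata, gps_consent: bool) -> str:
    """Serialise metadata, dropping location unless the user consented."""
    payload = asdict(meta)
    if not gps_consent:
        for key in ("latitude", "longitude", "gps_accuracy_m"):
            payload[key] = None
    return json.dumps(payload, sort_keys=True)
```

Stripping the location fields at serialisation time, rather than trusting every caller to leave them empty, keeps the consent rule in one place.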

4. Server-side OCR

Tesseract 5 runs on the server. We tuned it for the typical phone-captured Kenyan document (mixed English/Swahili, sometimes handwritten margin annotations, sometimes faint print).
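
A per-page confidence figure is useful for judging whether a capture is usable. The sketch below assumes Tesseract was asked for TSV output (for example `tesseract page.png stdout -l eng+swa tsv`) and averages the per-word `conf` column. The function names and the threshold of 60 are assumptions, not our production values.

```python
import csv
import io


def mean_word_confidence(tsv_text: str) -> float:
    """Average the `conf` column over rows that contain recognised text.

    Tesseract's TSV output uses conf = -1 for page, block, paragraph and
    line rows; only word rows carry a real confidence.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    confidences = []
    for row in reader:
        raw = row.get("conf")
        text = (row.get("text") or "").strip()
        if raw is None or not text:
            continue
        conf = float(raw)
        if conf >= 0:
            confidences.append(conf)
    return sum(confidences) / len(confidences) if confidences else 0.0


def needs_retake(tsv_text: str, threshold: float = 60.0) -> bool:
    # Hypothetical policy: a low average confidence means "ask for a retake".
    return mean_word_confidence(tsv_text) < threshold
```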

5. Classification + extraction

Same as web-uploaded documents.

6. Filing

Based on classification + GPS + the user's role, the document files itself into the right folder. Field officers don't choose; the system does.
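
One way to picture that routing is a pure function from (classification, role, location) to a folder path. The bounding-box regions, the names, and the path layout below are all hypothetical; the real rules are richer than this.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Region:
    # Hypothetical bounding box; real catchment areas are unlikely to be rectangles.
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def resolve_folder(doc_class: str, role: str, lat: float, lon: float,
                   regions: Sequence[Region]) -> str:
    """Pick the first region containing the point, else a holding folder."""
    region = next((r.name for r in regions if r.contains(lat, lon)),
                  "unknown-region")
    return f"/{region}/{role}/{doc_class}"
```

Routing unmatched locations to an explicit holding folder, instead of failing the upload, keeps the document in the system where someone can reassign it.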

The provenance story

When an auditor asks "is this beneficiary record genuine?", we can show:

The photo was taken at GPS coordinates matching the village
The timestamp matches the field officer's logged shift
The device fingerprint matches the issued tablet
The OCR confidence and classification match similar genuine records
The audit log shows no post-upload tampering

That combination is hard to fake, and a paper trail alone can't provide it.
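
The first three checks are mechanical enough to sketch. The example below assumes hypothetical dictionary keys, a 2 km tolerance around the village, and the standard haversine formula for distance; the OCR-similarity and audit-log checks are left out because they depend on server-side state.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def provenance_checks(capture, expected, max_distance_m=2000.0):
    """Return a pass/fail flag per check (keys are illustrative)."""
    distance = haversine_m(capture["lat"], capture["lon"],
                           expected["village_lat"], expected["village_lon"])
    return {
        # Allow for the reported GPS accuracy on top of the tolerance.
        "gps_matches_village": distance <= max_distance_m + capture["accuracy_m"],
        "timestamp_in_shift": expected["shift_start"] <= capture["captured_at"] <= expected["shift_end"],
        "device_matches": capture["device"] == expected["device"],
    }
```

Returning one flag per check, rather than a single boolean, lets an auditor see which part of the story failed.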

What we learned

Three things from running this pipeline at scale:

Photo quality varies wildly. Some phones are better than others, and some users hold the phone steadier than others. We have to handle the worst cases gracefully: when OCR confidence falls below a threshold, we fall back to asking the user to retake the photo.
Offline matters. Field officers regularly work where there's no signal, so the app queues captures and syncs them later. We've learned to make the queue visible: officers need to know "I have 7 unsent items" before they walk home.
GPS is sometimes wrong: indoor fixes, urban canyons, plain old drift. We capture the reported accuracy along with the coordinates and let downstream consumers decide what's good enough.
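
The offline lesson can be sketched as a small queue that always knows how many items are unsent. The class and method names are hypothetical, and a real app would persist the queue to disk; this in-memory version only shows the shape.

```python
from collections import deque


class UploadQueue:
    """In-memory sketch of an offline upload queue (names are illustrative)."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, item_id):
        self._pending.append(item_id)

    @property
    def pending_count(self):
        return len(self._pending)

    def status_line(self):
        # The text the user sees, so unsent work is never hidden.
        n = self.pending_count
        if n == 0:
            return "All items sent"
        return f"{n} unsent item" + ("" if n == 1 else "s")

    def sync(self, send, online):
        """Send items in order; stop at the first failure and keep the rest."""
        sent = 0
        while online and self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent
```

An item is removed only after `send` reports success, so a dropped connection mid-sync leaves the failed item at the head of the queue for the next attempt.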

What's next

We're investigating:

On-device OCR for low-connectivity scenarios (right now, OCR is server-side)
Real-time classification preview ("looks like a vaccination card — confirm?")
Audio transcription for verbal field notes
Better handwriting OCR (Swahili HTR is still imperfect)