The Mobile Capture Story: From Phone Camera to Audited Record
Field officers used to fill in paper forms and email photos to head office. The mobile capture pipeline turns a phone into an enterprise document scanner, with provenance.
The most common mobile use case in our customer base isn't approval-on-the-go. It's capture — a field officer taking a photo of a document and having it become an indexed, classified, audited record.
This post is about how that pipeline actually works.
The naïve approach (don't do this)
Phone camera → Email attachment → Inbox → Manual file
What's wrong:
- No metadata captured (when? where? by whom?)
- Photos are huge, often poorly framed
- Email loses chain of custody
- Manual filing introduces inconsistency
- Nothing is searchable
We still see customers operating this way in 2026. The whole workflow can be replaced in under a week.
The Papyrus pipeline
App captures → Auto-deskew + enhance → Upload with metadata →
Server-side OCR → Classification → Extraction → Filed + indexed
Each stage adds value:
1. Capture
The app uses the device's native camera. We don't try to be clever about exposure; phones are good enough. What we do add:
- Auto-detect document edges in the viewfinder
- Show a green outline when the document is in good focus
- Capture at the device's full resolution (typically 12 MP+)
- Allow multiple pages to be combined into one document before upload
2. Local enhancement
Before upload:
- Deskew (rotate so the page is straight)
- Perspective correction (so the document is rectangular even when the phone was tilted)
- Contrast normalisation (so OCR has a chance)
This runs on the device. Battery cost: ~3% per 10 documents.
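Of the three steps, contrast normalisation is the simplest to illustrate. The sketch below is a generic percentile-based contrast stretch on 8-bit grayscale values, not the app's actual algorithm (which this post does not describe); the function name and the percentile defaults are assumptions chosen for illustration.

```python
def stretch_contrast(pixels, low_pct=2.0, high_pct=98.0):
    """Linearly rescale 8-bit grayscale values.

    The value at `low_pct` maps to 0 and the value at `high_pct` maps to 255;
    anything outside that range is clipped. Percentile defaults are illustrative.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    last = len(ordered) - 1
    lo = ordered[int(last * low_pct / 100)]
    hi = ordered[int(last * high_pct / 100)]
    if hi == lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return list(pixels)
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]
```

A washed-out scan whose values sit between, say, 100 and 140 ends up using the full 0 to 255 range, which gives the OCR stage more separation between ink and paper.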
3. Upload with metadata
The upload carries:
- The processed PDF (typically 200-500 KB per page, not 8 MB raw)
- GPS coordinates (with user consent)
- Timestamp (device clock + server-side recorded receipt time)
- User identity (the signed-in user)
- Device fingerprint (model + OS)
All of this becomes searchable metadata.
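As a rough sketch, the metadata listed above could travel as a small JSON payload alongside the PDF. The field names, the `CaptureMetadata` class, and the two helper functions are hypothetical, not the actual Papyrus upload format. Location fields are optional because GPS is only sent with user consent, and the receipt time is stamped by the server rather than trusted from the device.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CaptureMetadata:
    """Hypothetical metadata record sent with each upload."""
    user_id: str
    device_model: str
    os_version: str
    device_time: datetime                   # phone clock at capture
    latitude: Optional[float] = None        # None when the user declined location
    longitude: Optional[float] = None
    gps_accuracy_m: Optional[float] = None  # reported accuracy radius, metres


def to_upload_json(meta: CaptureMetadata) -> str:
    """Serialise the client-side metadata; datetimes become ISO 8601 strings."""
    payload = asdict(meta)
    payload["device_time"] = meta.device_time.isoformat()
    return json.dumps(payload, sort_keys=True)


def stamp_receipt(payload_json: str, received_at: datetime) -> dict:
    """Server side: record the receipt time next to the device-reported time."""
    record = json.loads(payload_json)
    record["server_received_at"] = received_at.isoformat()
    return record
```

Keeping both timestamps means a wrong or manipulated device clock shows up as a gap between `device_time` and `server_received_at` instead of silently becoming the record's only date.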
4. Server-side OCR
Tesseract 5 runs on the server. We tuned it for the typical phone-captured Kenyan document (mixed English/Swahili, sometimes handwritten margin annotations, sometimes faint print).
5. Classification + extraction
These stages are the same as for web-uploaded documents.
6. Filing
Based on classification + GPS + the user's role, the document files itself into the right folder. Field officers don't choose; the system does.
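A minimal sketch of what such a filing rule might look like, assuming regions are simple latitude/longitude bounding boxes and folders are laid out as role, then region, then document class. The `Region` type, the `choose_folder` function, and the folder layout are all hypothetical; the real rules are configurable per deployment and are not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical filing region defined by a bounding box."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def choose_folder(doc_class: str, role: str,
                  lat: Optional[float], lon: Optional[float],
                  regions: Sequence[Region]) -> str:
    """Pick a folder from classification + GPS + role; the user never chooses."""
    region_name = "unlocated"  # used when GPS is missing or outside every region
    if lat is not None and lon is not None:
        for region in regions:
            if region.contains(lat, lon):
                region_name = region.name
                break
    return f"/{role}/{region_name}/{doc_class}"
```

The explicit "unlocated" bucket is a design choice for this sketch: a document without usable GPS still gets filed somewhere predictable instead of being rejected.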
The provenance story
When an auditor asks "is this beneficiary record genuine?", we can show:
- The photo was taken at GPS coordinates matching the village
- The timestamp matches the field officer's logged shift
- The device fingerprint matches the issued tablet
- The OCR confidence and classification match similar genuine records
- The audit log shows no post-upload tampering
That combination of evidence is much harder to fake than a paper trail.
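The first three checks in that list are mechanical enough to sketch. The code below assumes a capture record is a plain dictionary with hypothetical keys (`lat`, `lon`, `gps_accuracy_m`, `captured_at`, `device`), and uses the standard haversine formula for distance. The 500 m default tolerance is a placeholder, not a recommended value.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def provenance_checks(capture, village_lat, village_lon,
                      shift_start, shift_end, issued_device,
                      max_distance_m=500.0):
    """Return one boolean per check so an auditor can see which ones passed."""
    distance = haversine_m(capture["lat"], capture["lon"], village_lat, village_lon)
    # Widen the tolerance by the reported GPS accuracy radius.
    tolerance = max_distance_m + capture["gps_accuracy_m"]
    return {
        "location_matches_village": distance <= tolerance,
        "timestamp_within_shift": shift_start <= capture["captured_at"] <= shift_end,
        "device_matches_issued": capture["device"] == issued_device,
    }
```

Returning each result separately, instead of a single pass/fail, keeps the evidence inspectable: a record that fails only the location check with a large accuracy radius tells a different story from one that fails all three.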
What we learned
Three things from running this pipeline at scale:
- Photo quality varies wildly. Some phones are better than others; some users hold the phone steadier than others. We need to handle the worst cases gracefully: when OCR confidence is below a threshold, we fall back to “ask the user to retake”.
- Offline matters. Field officers regularly work where there's no signal. The app queues uploads and syncs them later. We've learned to make the queue visible: officers need to know “I have 7 unsent items” before they walk home.
- GPS is sometimes wrong. Indoor GPS, urban canyon, plain old GPS drift. We capture accuracy along with coordinates and let downstream consumers decide what's good enough.
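The first two lessons translate into very little code. This is a sketch under assumptions: the `UploadQueue` class, the `send` callable, and the `needs_retake` helper are hypothetical names, the queue here lives in memory (a real app would persist it across restarts), and the 0.6 threshold is a placeholder to be tuned against real captures.

```python
from collections import deque


def needs_retake(ocr_confidence: float, threshold: float = 0.6) -> bool:
    """True when OCR confidence (0.0 to 1.0) is too low to trust the capture."""
    return ocr_confidence < threshold


class UploadQueue:
    """In-memory offline queue that always knows how many items are unsent."""

    def __init__(self, send):
        self._send = send        # callable(item) -> bool, True on successful upload
        self._pending = deque()

    def enqueue(self, item) -> None:
        self._pending.append(item)

    @property
    def unsent_count(self) -> int:
        """The number to show the user, e.g. 'I have 7 unsent items'."""
        return len(self._pending)

    def sync(self) -> int:
        """Send items in order; stop at the first failure and keep the rest."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline or server error: retry on the next sync
            self._pending.popleft()
            sent += 1
        return sent
```

An item is only removed after `send` reports success, so a dropped connection mid-sync leaves the unsent count accurate instead of losing a document.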
What's next
We're investigating:
- On-device OCR for low-connectivity scenarios (right now, OCR is server-side)
- Real-time classification preview ("looks like a vaccination card — confirm?")
- Audio transcription for verbal field notes
- Better handwriting OCR (Swahili HTR is still imperfect)