Extraction
Pulling structured fields (dates, amounts, parties, references) out of an unstructured document, usually via AI.
Extraction is the act of pulling structured fields out of an unstructured document. Given an invoice PDF, extraction produces: invoice number, issue date, due date, vendor name, KRA PIN, line items, totals, tax components.
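As an illustration, the invoice fields listed above could be held in a record like the one below. This is a minimal sketch, not Papyrus's actual schema: the class names, field names, types, and sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class LineItem:
    # Hypothetical shape for one invoice line.
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        return self.quantity * self.unit_price


@dataclass
class InvoiceExtraction:
    # Field names mirror the list in the text; the types are assumptions.
    invoice_number: str
    issue_date: date
    due_date: date
    vendor_name: str
    kra_pin: str
    total: Decimal  # total as printed on the document
    line_items: list[LineItem] = field(default_factory=list)
    tax_components: dict[str, Decimal] = field(default_factory=dict)

    def line_items_sum(self) -> Decimal:
        """Sum of line amounts, before tax."""
        return sum((item.amount for item in self.line_items), Decimal("0"))


# Placeholder values, not real data.
example = InvoiceExtraction(
    invoice_number="INV-0001",
    issue_date=date(2024, 1, 15),
    due_date=date(2024, 2, 14),
    vendor_name="Example Supplies Ltd",
    kra_pin="P000000000X",
    total=Decimal("5800.00"),
    line_items=[LineItem("Printer paper", Decimal("10"), Decimal("500.00"))],
    tax_components={"VAT": Decimal("800.00")},
)
```

Keeping the printed total separate from the sum of the line items makes it possible to cross-check the two after extraction.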
In Papyrus, extraction runs automatically on every document the AI has classified. The extraction profile is type-specific:
- Invoice extraction: invoice number, dates, vendor name, KRA PIN, line items, totals, tax components
- Contract extraction: parties, dates, value, key clauses
- Employee record extraction: name, ID, dates, role
- KRA receipt extraction: CU number, amounts, dates
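One way to picture type-specific profiles is a lookup from the classified document type to the fields to extract. The sketch below is illustrative only: `EXTRACTION_PROFILES`, `fields_for`, and the type and field identifiers are hypothetical names derived from the list above, not Papyrus's real configuration.

```python
# Hypothetical mapping from classified document type to the fields to extract.
EXTRACTION_PROFILES: dict[str, tuple[str, ...]] = {
    "invoice": (
        "invoice_number", "issue_date", "due_date", "vendor_name",
        "kra_pin", "line_items", "totals", "tax_components",
    ),
    "contract": ("parties", "dates", "value", "key_clauses"),
    "employee_record": ("name", "id", "dates", "role"),
    "kra_receipt": ("cu_number", "amounts", "dates"),
}


def fields_for(document_type: str) -> tuple[str, ...]:
    """Return the field names to extract for a classified document type."""
    try:
        return EXTRACTION_PROFILES[document_type]
    except KeyError:
        # Only document types with a defined profile can be extracted.
        raise ValueError(
            f"no extraction profile for document type {document_type!r}"
        ) from None
```

For example, `fields_for("kra_receipt")` returns `("cu_number", "amounts", "dates")`, while an unknown type raises `ValueError`.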
Accuracy depends on document structure and scan quality. Well-formed digital documents extract at 95% accuracy or better; poor scans drop to 70-85%. Users can manually correct any extracted field, and corrections feed into the next training cycle.
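The correction loop can be sketched as follows: a manual fix overwrites the extracted value and is logged so it can later be collected for training. This is an assumption about how such a flow might look; `ExtractedDocument`, `Correction`, and `correct` are hypothetical names, and how Papyrus actually stores corrections is not stated here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Correction:
    # One manual fix: which field, what was extracted, what the user entered.
    field_name: str
    extracted_value: Any
    corrected_value: Any


@dataclass
class ExtractedDocument:
    fields: dict[str, Any]
    corrections: list[Correction] = field(default_factory=list)

    def correct(self, field_name: str, new_value: Any) -> None:
        """Apply a manual correction and record it for a later training cycle."""
        if field_name not in self.fields:
            raise KeyError(field_name)
        old_value = self.fields[field_name]
        if old_value == new_value:
            return  # nothing changed, so nothing to log
        self.corrections.append(Correction(field_name, old_value, new_value))
        self.fields[field_name] = new_value
```

Logging the original extracted value next to the corrected one gives a training cycle both the model's mistake and the user's answer.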
Custom fields (tenant-defined) can also be extracted once the tenant has labelled ~50 examples.
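A readiness check for custom fields might look like the sketch below. The text gives "~50" as an approximate figure; treating it as a hard threshold of 50, and the names `MIN_LABELLED_EXAMPLES` and `extractable_custom_fields`, are assumptions made for the example.

```python
from collections import Counter

# Assumption: the approximate "~50 examples" is treated as a fixed minimum.
MIN_LABELLED_EXAMPLES = 50


def extractable_custom_fields(
    labelled_field_names: list[str],
    minimum: int = MIN_LABELLED_EXAMPLES,
) -> set[str]:
    """Return the custom fields that have at least `minimum` labelled examples.

    `labelled_field_names` holds one entry per labelled example, naming the
    custom field that example was labelled for.
    """
    counts = Counter(labelled_field_names)
    return {name for name, count in counts.items() if count >= minimum}
```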