Extraction

Extraction is the act of pulling structured fields out of an unstructured document. Given an invoice PDF, for example, extraction produces the invoice number, issue date, due date, vendor name, KRA PIN, line items, totals, and tax components.
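As a rough illustration of what "structured fields" means here, the result of extracting an invoice could be modelled as a typed record. This is a minimal sketch, not Papyrus's actual schema: the class names, field names, and sample values are all hypothetical and only mirror the fields listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        # Line total derived from quantity and unit price.
        return self.quantity * self.unit_price


@dataclass
class InvoiceExtraction:
    # Hypothetical field names; every scalar field is optional because
    # the extractor may fail to find it in a given document.
    invoice_number: Optional[str] = None
    issue_date: Optional[date] = None
    due_date: Optional[date] = None
    vendor_name: Optional[str] = None
    kra_pin: Optional[str] = None
    line_items: list[LineItem] = field(default_factory=list)
    subtotal: Optional[Decimal] = None
    tax_components: dict[str, Decimal] = field(default_factory=dict)
    total: Optional[Decimal] = None

    def missing_fields(self) -> list[str]:
        """Names of scalar fields that were left unfilled (None)."""
        return [name for name, value in vars(self).items() if value is None]


# Made-up sample values, for illustration only.
invoice = InvoiceExtraction(
    invoice_number="INV-0001",
    issue_date=date(2024, 1, 15),
    vendor_name="Example Supplies Ltd",
    line_items=[LineItem("Paper, A4", Decimal("10"), Decimal("450.00"))],
)
print(invoice.missing_fields())
```

Keeping unfilled fields as None rather than empty strings makes it easy to tell "not found" apart from "found but blank" when a user reviews the result.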

In Papyrus, extraction runs automatically on every document the AI has classified. The extraction profile is type-specific:

Invoice extraction: invoice number, issue date, due date, vendor name, KRA PIN, line items, totals, tax components
Contract extraction: parties, dates, value, key clauses
Employee record extraction: name, ID, dates, role
KRA receipt extraction: CU number, amounts, dates
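One way to picture type-specific profiles is as a lookup from the classified document type to the fields to extract. The sketch below assumes this shape for illustration; the dictionary, the type keys, and the function name are hypothetical and do not describe Papyrus's internal configuration.

```python
# Hypothetical mapping from classified document type to target fields,
# taken from the profile list above.
EXTRACTION_PROFILES = {
    "invoice": [
        "invoice_number", "issue_date", "due_date", "vendor_name",
        "kra_pin", "line_items", "totals", "tax_components",
    ],
    "contract": ["parties", "dates", "value", "key_clauses"],
    "employee_record": ["name", "id", "dates", "role"],
    "kra_receipt": ["cu_number", "amounts", "dates"],
}


def fields_for(document_type: str) -> list[str]:
    """Return the fields to extract for a classified document type."""
    try:
        return EXTRACTION_PROFILES[document_type]
    except KeyError:
        # An unknown type means classification produced something
        # that has no profile; fail loudly rather than extract nothing.
        raise ValueError(
            f"no extraction profile for document type {document_type!r}"
        ) from None


print(fields_for("kra_receipt"))
```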

Accuracy depends on document structure and quality. Well-formed digital documents extract at 95% accuracy or better; poor scans drop to 70-85%. Users can manually correct any extracted field, and those corrections feed into the next training cycle.
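The correction loop can be sketched as a log that collects user edits until the next training cycle picks them up. This is an assumption about how such a loop might be structured, not a description of Papyrus internals; the Correction and CorrectionLog names and their methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Correction:
    document_id: str
    field_name: str
    extracted_value: str
    corrected_value: str


class CorrectionLog:
    """Holds user corrections until a training cycle collects them."""

    def __init__(self) -> None:
        self._pending: list[Correction] = []

    def record(
        self,
        document_id: str,
        field_name: str,
        extracted_value: str,
        corrected_value: str,
    ) -> Optional[Correction]:
        # A "correction" that leaves the value unchanged carries no
        # training signal, so it is ignored.
        if extracted_value == corrected_value:
            return None
        correction = Correction(
            document_id, field_name, extracted_value, corrected_value
        )
        self._pending.append(correction)
        return correction

    def drain(self) -> list[Correction]:
        """Hand over all pending corrections and reset the log."""
        batch, self._pending = self._pending, []
        return batch


log = CorrectionLog()
log.record("doc-1", "due_date", "2024-01-05", "2024-02-05")
print(len(log.drain()))
```

Storing both the extracted and the corrected value keeps each entry usable as a labelled training pair on its own.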

Custom fields (tenant-defined) can also be extracted once the tenant has labelled ~50 examples.
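The labelling threshold can be pictured as a simple count per custom field. The sketch below treats the cutoff as exactly 50, which is an assumption, since the text only gives an approximate figure; the function name and the input format are likewise hypothetical.

```python
from collections import Counter
from typing import Iterable

# Assumed cutoff: the text says "~50", so the exact value may differ.
MIN_LABELLED_EXAMPLES = 50


def ready_custom_fields(
    labelled_examples: Iterable[tuple[str, str]],
    minimum: int = MIN_LABELLED_EXAMPLES,
) -> list[str]:
    """Return custom fields with enough labelled examples.

    Each example is a (field_name, document_id) pair, one per label
    the tenant has applied.
    """
    counts = Counter(field_name for field_name, _ in labelled_examples)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Made-up labels: 50 for one field, 12 for another.
labels = [("po_number", f"doc-{i}") for i in range(50)]
labels += [("cost_centre", f"doc-{i}") for i in range(12)]
print(ready_custom_fields(labels))
```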