Glossary

Extraction

Pulling structured fields (dates, amounts, parties, references) out of an unstructured document — usually via AI.

Extraction is the act of pulling structured fields out of an unstructured document. Given an invoice PDF, extraction produces: invoice number, issue date, due date, vendor name, KRA PIN, line items, totals, tax components.
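As a rough illustration, the invoice fields listed above could be held in a record like the one below. This is a hypothetical sketch, not Papyrus's actual schema: the class names, field names, types, and the `missing_fields` helper are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    """One invoice line (hypothetical shape)."""
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        # Line amount before tax.
        return self.quantity * self.unit_price


@dataclass
class InvoiceExtraction:
    """Structured result of extracting one invoice (hypothetical shape).

    Every scalar field is optional because extraction may fail to find it.
    """
    invoice_number: Optional[str] = None
    issue_date: Optional[date] = None
    due_date: Optional[date] = None
    vendor_name: Optional[str] = None
    kra_pin: Optional[str] = None
    line_items: list[LineItem] = field(default_factory=list)
    total: Optional[Decimal] = None
    tax_components: dict[str, Decimal] = field(default_factory=dict)

    def missing_fields(self) -> list[str]:
        # Names of scalar fields the extractor did not fill in.
        scalar = ("invoice_number", "issue_date", "due_date",
                  "vendor_name", "kra_pin", "total")
        return [name for name in scalar if getattr(self, name) is None]
```

A record like this makes gaps explicit: a reviewer can call `missing_fields()` to see which values still need manual entry.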

In Papyrus, extraction runs automatically on every document the AI has classified. The extraction profile is type-specific:

  • Invoice extraction: invoice number, issue and due dates, vendor name, KRA PIN, line items, totals, tax components
  • Contract extraction: parties, dates, value, key clauses
  • Employee record extraction: name, ID, dates, role
  • KRA receipt extraction: CU number, amounts, dates
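One way to picture type-specific profiles is a lookup from document type to the fields to extract. The sketch below is illustrative only: the dictionary, the type keys, the field identifiers, and `profile_for` are hypothetical names, not Papyrus's API. The invoice entry reuses the fields named in the definition above.

```python
from __future__ import annotations

# Hypothetical mapping of classified document type -> fields to extract.
EXTRACTION_PROFILES: dict[str, tuple[str, ...]] = {
    "invoice": ("invoice_number", "issue_date", "due_date", "vendor_name",
                "kra_pin", "line_items", "totals", "tax_components"),
    "contract": ("parties", "dates", "value", "key_clauses"),
    "employee_record": ("name", "id", "dates", "role"),
    "kra_receipt": ("cu_number", "amounts", "dates"),
}


def profile_for(document_type: str) -> tuple[str, ...]:
    """Return the field list for a classified document type."""
    try:
        return EXTRACTION_PROFILES[document_type]
    except KeyError:
        # An unclassified or unknown type has no profile to run.
        raise ValueError(
            f"no extraction profile for document type {document_type!r}"
        ) from None
```

Keying the lookup on the classifier's output reflects the order described above: classification happens first, and its result selects the profile.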

Accuracy depends on the quality and structure of the source document. Well-formed digital documents extract at 95% or better; poor scans drop to 70-85%. Users can manually correct any extracted field, and corrections feed into the next training cycle.
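The correction loop can be sketched as follows. This is a minimal illustration under stated assumptions: the `Correction` record, the `apply_correction` function, and the in-memory `training_queue` list are hypothetical, and how Papyrus actually stores corrections or schedules training is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Correction:
    """A user's fix to one extracted field (hypothetical shape)."""
    document_id: str
    field_name: str
    extracted_value: Optional[str]  # what the extractor produced, if anything
    corrected_value: str            # what the user entered


def apply_correction(
    fields: dict[str, str],
    document_id: str,
    field_name: str,
    corrected_value: str,
    training_queue: list[Correction],
) -> Correction:
    """Overwrite one field and remember the change for later training."""
    correction = Correction(
        document_id=document_id,
        field_name=field_name,
        extracted_value=fields.get(field_name),
        corrected_value=corrected_value,
    )
    fields[field_name] = corrected_value   # the user's value wins immediately
    training_queue.append(correction)      # kept as a labelled example
    return correction
```

Keeping both the extracted and the corrected value gives a training cycle a before-and-after pair for each fix.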

Custom fields (tenant-defined) can also be extracted once the tenant has labelled ~50 examples.
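A readiness check for custom fields might look like the sketch below. The threshold constant takes the approximate figure of 50 from the text; treating it as a hard cutoff, along with the function and parameter names, is an assumption for illustration.

```python
from __future__ import annotations

# Approximate figure from the text ("~50"); the exact cutoff is an assumption.
MIN_LABELLED_EXAMPLES = 50


def examples_still_needed(
    label_counts: dict[str, int],
    minimum: int = MIN_LABELLED_EXAMPLES,
) -> dict[str, int]:
    """For each custom field, how many more labelled examples are needed.

    A value of 0 means the field has reached the threshold.
    """
    return {
        name: max(0, minimum - count)
        for name, count in label_counts.items()
    }


def ready_fields(
    label_counts: dict[str, int],
    minimum: int = MIN_LABELLED_EXAMPLES,
) -> list[str]:
    """Custom fields with enough labelled examples, in sorted order."""
    return sorted(
        name for name, count in label_counts.items() if count >= minimum
    )
```

For example, a tenant with 12 labelled examples of a hypothetical `project_code` field would still need 38 more under this cutoff.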

