Schema-driven extraction: tell the AI what you need, not how to find it
Traditional document extraction works by defining rules: "look for the text 'Invoice Number' and grab the value to the right." This works until the layout changes, the label is worded differently, or the document arrives in a new language. Rules are brittle by nature.
The schema approach
Schema-driven extraction flips the model. Instead of telling the system where to look, you describe what you want. You define output columns — each with a name, a description, and optional instructions — and the AI figures out how to fill them from any document.
columns:
  - name: Vendor
    description: The company that issued the invoice
  - name: Total Amount
    description: The final amount due, including tax
    format: number
  - name: Line Items
    type: subtable
    columns:
      - name: Description
      - name: Quantity
      - name: Unit Price
Why this is better
- Works across formats — no templates to maintain
- Handles new vendors or document types without reconfiguration
- Plain-language instructions, no code required
- Supports nested structures (subtables) out of the box
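To make the nested structure concrete, here is a minimal Python sketch of how the invoice schema shown earlier could be held in memory and flattened into column paths. The `Column` class, the `column_paths` helper, and the "value" default for `type` are hypothetical names chosen for illustration; they are not the product's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Column:
    """One output column. Field names mirror the keys in the schema example."""
    name: str
    description: str = ""
    format: Optional[str] = None   # e.g. "number"
    type: str = "value"            # "value" is an assumed default; "subtable" nests columns
    columns: list["Column"] = field(default_factory=list)  # used only by subtables


def column_paths(columns: list[Column], prefix: str = "") -> list[str]:
    """Flatten a possibly nested schema into dotted column paths."""
    paths = []
    for col in columns:
        path = f"{prefix}{col.name}"
        if col.type == "subtable":
            # Recurse into the subtable, prefixing child names with the parent name.
            paths.extend(column_paths(col.columns, prefix=f"{path}."))
        else:
            paths.append(path)
    return paths


# The same schema as the example above, expressed as Python objects.
invoice_schema = [
    Column("Vendor", "The company that issued the invoice"),
    Column("Total Amount", "The final amount due, including tax", format="number"),
    Column("Line Items", type="subtable", columns=[
        Column("Description"),
        Column("Quantity"),
        Column("Unit Price"),
    ]),
]

print(column_paths(invoice_schema))
```

Nothing in this structure says where a value sits on the page. It only names and describes what is wanted, which is the point of the schema approach.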
From schema to structured data
Once your schema is defined for a workspace, every document uploaded to that workspace is automatically extracted against it. The output is a clean table: one row per document, with every value carrying a confidence score and a link to its source.
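As a rough illustration of what one output row might look like to downstream code, the Python sketch below pairs each value with a confidence score and a source pointer, then flags the cells worth a manual check. The `ExtractedValue` class, the 0.0 to 1.0 confidence range, the "page N" source format, and the sample row are all assumptions made for this example, not the actual output format.

```python
from dataclasses import dataclass


@dataclass
class ExtractedValue:
    """One cell of the output table (hypothetical shape)."""
    value: object
    confidence: float  # assumed to range from 0.0 to 1.0
    source: str        # assumed pointer to where the value was found


def low_confidence_columns(row: dict[str, ExtractedValue],
                           threshold: float = 0.8) -> list[str]:
    """Return the names of columns whose confidence is below the threshold."""
    return [name for name, cell in row.items() if cell.confidence < threshold]


# Made-up row for a single document; the values are illustrative only.
row = {
    "Vendor": ExtractedValue("Acme Corp", 0.97, "page 1"),
    "Total Amount": ExtractedValue(1250.00, 0.62, "page 2"),
}

print(low_confidence_columns(row))
```

Keeping the confidence and source next to each value lets a reviewer jump straight to the uncertain cells instead of rechecking the whole document.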
The schema is the single source of truth for what your data should look like. Change it, and the system adapts. No rules to rewrite, no templates to update.