Guide·5 min read

Schema-driven extraction: tell the AI what you need, not how to find it

Traditional document extraction works by defining rules: "look for the text 'Invoice Number' and grab the value to the right." This works until the layout changes, the label is different, or the document is in a new language. Rules are brittle by nature.
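
To make that brittleness concrete, here is a minimal sketch of a rule like the one quoted above. The function name `extract_invoice_number` and the sample strings are assumptions made up for illustration, not taken from any real system.

```python
import re
from typing import Optional

# Hypothetical rule: find the literal label "Invoice Number"
# and take the first token to its right.
_RULE = re.compile(r"Invoice Number[:\s]+(\S+)")


def extract_invoice_number(text: str) -> Optional[str]:
    """Return the value to the right of the label, or None if the label is absent."""
    match = _RULE.search(text)
    return match.group(1) if match else None


# The layout the rule was written for.
print(extract_invoice_number("Invoice Number: INV-1042"))   # INV-1042
# Same information, different label: the rule finds nothing.
print(extract_invoice_number("Invoice No.: INV-1042"))      # None
# Same information, different language: again nothing.
print(extract_invoice_number("Rechnungsnummer: INV-1042"))  # None
```

Each new label or language needs another rule, which is the maintenance burden the schema approach is meant to avoid.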

The schema approach

Schema-driven extraction flips the model. Instead of telling the system where to look, you describe what you want. You define output columns — each with a name, a description, and optional instructions — and the AI figures out how to fill them from any document.

```yaml
columns:
  - name: Vendor
    description: The company that issued the invoice
  - name: Total Amount
    description: The final amount due, including tax
    format: number
  - name: Line Items
    type: subtable
    columns:
      - name: Description
      - name: Quantity
      - name: Unit Price
```
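
The same schema can be pictured as a plain data structure. The sketch below mirrors the YAML above using Python dataclasses; the `Column` class and the `column_names` helper are hypothetical names chosen for illustration and do not represent any product's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """One output column: what to extract, described in plain language."""
    name: str
    description: Optional[str] = None
    format: Optional[str] = None   # e.g. "number"
    type: Optional[str] = None     # e.g. "subtable" for nested rows
    columns: List["Column"] = field(default_factory=list)  # children of a subtable


# Mirrors the YAML schema above.
schema = [
    Column("Vendor", description="The company that issued the invoice"),
    Column("Total Amount", description="The final amount due, including tax", format="number"),
    Column(
        "Line Items",
        type="subtable",
        columns=[Column("Description"), Column("Quantity"), Column("Unit Price")],
    ),
]


def column_names(columns: List[Column], prefix: str = "") -> List[str]:
    """Flatten the schema into dotted column paths, descending into subtables."""
    names = []
    for col in columns:
        path = f"{prefix}{col.name}"
        if col.type == "subtable":
            names.extend(column_names(col.columns, prefix=f"{path}."))
        else:
            names.append(path)
    return names


print(column_names(schema))
# ['Vendor', 'Total Amount', 'Line Items.Description', 'Line Items.Quantity', 'Line Items.Unit Price']
```

Note that nothing in the structure says where a value sits on the page; it only names and describes the desired output.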

Why this is better

  • Works across formats — no templates to maintain
  • Handles new vendors or document types without reconfiguration
  • Plain language instructions, no code required
  • Supports nested structures (subtables) out of the box

From schema to structured data

Once a schema is defined for a workspace, every document uploaded to that workspace is automatically extracted against it. The output is a clean table: one row per document, with every value carrying a confidence score and a link to its source.
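
One way to picture a row of that table is sketched below. The field names (`value`, `confidence`, `source`), the sample values, and the `needs_review` helper are all assumptions for illustration; the actual output format may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    """A single extracted value plus the metadata that makes it traceable."""
    value: object
    confidence: float  # assumed to lie between 0.0 and 1.0
    source: str        # assumed pointer back to where the value was found


# One row per document: column name -> extracted cell. All values are made up.
row: Dict[str, Cell] = {
    "Vendor": Cell("Acme GmbH", confidence=0.98, source="invoice-001.pdf#page=1"),
    "Total Amount": Cell(1190.0, confidence=0.72, source="invoice-001.pdf#page=2"),
}


def needs_review(row: Dict[str, Cell], threshold: float = 0.8) -> List[str]:
    """Return the columns whose confidence falls below the threshold."""
    return [name for name, cell in row.items() if cell.confidence < threshold]


print(needs_review(row))  # ['Total Amount']
```

Keeping the confidence and source next to each value lets a reviewer check only the uncertain cells instead of re-reading whole documents.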

The schema is the single source of truth for what your data should look like. Change it, and the system adapts. No rules to rewrite, no templates to update.
