Extract
Schema-driven AI extraction for precise structured data.
The Extract module pulls precise, nested properties out of unstructured, flat files. You provide a standard JSON Schema, and the AI engine locates the matching values, formats them, and returns a JSON object that conforms to the schema, turning unstructured documents into structured data records.
┌───────────────┐                                 ┌─────────────────────────┐
│ Raw Document  │      ┌───────────────────┐      │ Extracted JSON Output   │
│ (Invoice PDF) │ ───► │ AI Engine         │ ───► │ {                       │
│               │      │ + User Schema     │      │   "totalAmount": 450,   │
│               │      └───────────────────┘      │   "invoiceNumber": ...  │
└───────────────┘                                 │ }                       │
                                                  └─────────────────────────┘
- Guaranteed Schema Adherence: The output always matches your provided JSON Schema.
- Complex Data Extraction: Easily extract nested arrays (like invoice line items) or deep hierarchical objects.
- Contextual Instructions: Append custom instructions to guide the AI's logic for specific fields.
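To illustrate the nested-array case mentioned above, the sketch below builds a JSON Schema for an invoice with line items as a Python dict. The names `lineItems`, `description`, `quantity`, and `unitPrice` are hypothetical and chosen only for illustration; `invoiceNumber` and `totalAmount` match the examples used elsewhere on this page.

```python
import json

# Hypothetical invoice schema: the field names under "lineItems" are
# illustrative assumptions, not names required by the Extract module.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "totalAmount": {"type": "number"},
        "lineItems": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unitPrice": {"type": "number"},
                },
                "required": ["description", "quantity", "unitPrice"],
            },
        },
    },
    "required": ["invoiceNumber", "totalAmount", "lineItems"],
}

# Serialize to the JSON text that would be placed under "jsonSchema".
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

Listing a nested field under `required` is one way to signal that it should always be present in the output; whether to require it depends on how consistently it appears in your documents.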
🏗️ Configuration
The Extract module is defined within the extractionConfig block.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| jsonSchema | object | Yes | A valid JSON Schema (Draft 7+) defining the desired output structure. |
| llmId | UUID | No | Reference to a pre-saved LLM configuration ID to use for extraction. |
| llmConfig | object | No | Inline LLM configuration (overrides llmId or workspace defaults). |
| systemPromptAppend | string | No | Extra instructions for the AI (e.g., "Only extract values in USD"). |
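The parameters in the table can be assembled programmatically. The following is a minimal sketch, assuming optional parameters may simply be left out of the block when unset; the helper name `build_extraction_config` is hypothetical and not part of any SDK.

```python
from typing import Any, Optional


def build_extraction_config(
    json_schema: dict[str, Any],
    llm_id: Optional[str] = None,
    llm_config: Optional[dict[str, Any]] = None,
    system_prompt_append: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an extractionConfig block (hypothetical helper)."""
    # jsonSchema is the only required parameter in the table.
    if not isinstance(json_schema, dict) or not json_schema:
        raise ValueError("jsonSchema is required and must be a non-empty object")
    config: dict[str, Any] = {"jsonSchema": json_schema}
    # Optional parameters are omitted when unset (an assumption about
    # how the API treats missing keys).
    if llm_id is not None:
        config["llmId"] = llm_id
    if llm_config is not None:
        config["llmConfig"] = llm_config
    if system_prompt_append is not None:
        config["systemPromptAppend"] = system_prompt_append
    return config


config = build_extraction_config(
    {"type": "object", "properties": {"totalAmount": {"type": "number"}}},
    system_prompt_append="Only extract values in USD",
)
print(config)
```

Because the table states that llmConfig overrides llmId, passing both is allowed by this sketch; a stricter helper could reject that combination to avoid ambiguity.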
🛠️ Extraction Operations
Extraction is typically performed as part of a stateless process run. For a detailed technical reference of every field and parameter, see the API Process Reference.
Execute an Extraction Run
To extract structured data, submit a process request to your workspace with an extractionConfig.
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/process" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "extractionConfig": {
        "jsonSchema": {
          "type": "object",
          "properties": {
            "invoiceNumber": { "type": "string" },
            "totalAmount": { "type": "number" }
          },
          "required": ["invoiceNumber", "totalAmount"]
        }
      }
    },
    "files": [
      {
        "url": "https://example.com/invoice.pdf"
      }
    ]
  }'
The structured result will be available via the extractionUrl returned in the task documents list once the run reaches a completed state.
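For readers calling the API from Python rather than curl, the same request might be built with the standard library as sketched below. The helper names `build_process_request` and `extraction_urls` are hypothetical, and the response keys `status` and `documents` are assumptions about the task payload; only `extractionUrl` and the completed state are named on this page. Consult the API Process Reference for the actual response shape.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://api.axelered.com/v1"


def build_process_request(
    workspace_id: str, api_key: str, json_schema: dict[str, Any], file_url: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "config": {"extractionConfig": {"jsonSchema": json_schema}},
        "files": [{"url": file_url}],
    }
    return urllib.request.Request(
        f"{API_BASE}/w/{workspace_id}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extraction_urls(task: dict[str, Any]) -> list[str]:
    """Collect extractionUrl values from a completed task.

    ASSUMPTION: the task payload looks like
    {"status": "completed", "documents": [{"extractionUrl": "..."}]}.
    The "status" and "documents" key names are guesses for illustration.
    """
    if task.get("status") != "completed":
        return []
    return [
        doc["extractionUrl"]
        for doc in task.get("documents", [])
        if "extractionUrl" in doc
    ]


schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "totalAmount": {"type": "number"},
    },
    "required": ["invoiceNumber", "totalAmount"],
}
request = build_process_request(
    "YOUR_WORKSPACE_ID", "YOUR_API_KEY", schema, "https://example.com/invoice.pdf"
)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       task = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.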
Read & List Results
To retrieve extracted data across one or multiple documents, use the following endpoints:
- List Extractions: Retrieve all extracted JSON objects for a complete run.
- Read Document Extraction: Directly access the extraction result for a specific document.