Parse
Extract structural layout and text from complex documents into clean Markdown.
Standard PDF parsers often scramble or drop tables, lists, and headings. Our Parse module performs high-fidelity, layout-aware extraction to output clean Markdown that preserves the original document's structural integrity.
```text
┌───────────────────┐       ┌──────────────────────┐       ┌────────────────────┐
│ Raw Document      │       │ Layout Analyzer      │       │ Clean Markdown     │
│ (PDF, Word, etc.) │ ────► │ (Table/Header/Flow)  │ ────► │ (Structure intact) │
└───────────────────┘       └──────────────────────┘       └────────────────────┘
```

- Headers are preserved as logical hierarchy nodes (`#`, `##`, `###`).
- Complex tables are formatted as native Markdown tables.
- Preserved structure lets downstream chunking follow section and table boundaries, so chunks are not diluted with unrelated context.
- Reading order is reconstructed so the extracted text flows in the order a human reader would follow.
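The chunking benefit is easiest to see in code. Below is a minimal, illustrative sketch of header-aware chunking over parsed Markdown; it is not part of the Parse API. The helper name `chunk_by_headings` is hypothetical, and it assumes ATX-style headings (`#` through `######`) at the start of a line. Each chunk carries its full heading path, so a table or paragraph stays tied to the section it came from.

```python
import re
from typing import List, Tuple

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading_path, body) chunks at each heading.

    Hypothetical helper: the heading path (e.g. "Report > Results") keeps
    every chunk tied to its place in the document hierarchy.
    Note: it does not skip '#' lines inside fenced code blocks.
    """
    chunks: List[Tuple[str, str]] = []
    path: List[str] = []  # current stack of heading titles
    body: List[str] = []  # lines collected under the current heading

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((" > ".join(path), text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then push this one.
            del path[level - 1:]
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


doc = """# Report
Intro text.
## Results
| Metric | Value |
|---|---|
| F1 | 0.91 |
## Discussion
Notes.
"""
for heading_path, text in chunk_by_headings(doc):
    print(heading_path, "->", text.splitlines()[0])
# Report -> Intro text.
# Report > Results -> | Metric | Value |
# Report > Discussion -> Notes.
```

Splitting on headings rather than on fixed character counts is what the preserved hierarchy makes possible; a production chunker would also cap chunk size and ignore `#` lines inside fenced code.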
🏗️ Configuration
The Parse module can be configured using an inline `parserConfig` object or by referencing a pre-saved `parserId`.
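A minimal sketch of building the request's `config` object for either option. It assumes, without confirmation from this page, that exactly one of the two is supplied and that a saved parser is referenced as `config["parserId"]`, alongside where `parserConfig` would otherwise go. The helper `build_config` and the ID `my-saved-parser` are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_config(parser_config: Optional[Dict[str, Any]] = None,
                 parser_id: Optional[str] = None) -> Dict[str, Any]:
    """Return a process-request "config" that uses exactly one parser option.

    Assumption: a saved parser is referenced as config["parserId"];
    check the API Process Reference for the exact placement.
    """
    if (parser_config is None) == (parser_id is None):
        raise ValueError("pass exactly one of parser_config or parser_id")
    if parser_config is not None:
        return {"parserConfig": parser_config}
    return {"parserId": parser_id}


inline = build_config(parser_config={"provider": "axelered"})
saved = build_config(parser_id="my-saved-parser")  # hypothetical ID
print(json.dumps(inline))  # {"parserConfig": {"provider": "axelered"}}
print(json.dumps(saved))   # {"parserId": "my-saved-parser"}
```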
Supported Providers
| Provider | Description |
|---|---|
| `axelered` | Our proprietary OCR and layout engine. Best for scanned documents and complex tables. |
| `mistral` | Integration with Mistral's layout-aware OCR. |
| `markitdown` | Fast, rule-based layout extraction for structured digital files (DOCX, XLSX). |
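If you route files to providers programmatically, one simple heuristic drawn from the table above is to send structured Office files to `markitdown` and everything else (PDFs, scans, images) to `axelered`. This is an illustrative sketch only: `choose_provider` is a hypothetical helper and the extension list is an assumption you would tune for your own corpus.

```python
from pathlib import PurePath

# Illustrative mapping based on the provider table: structured digital
# Office files go to markitdown; everything else (PDFs, scans, images)
# goes to the axelered OCR/layout engine. Adjust for your own files.
STRUCTURED_EXTENSIONS = {".docx", ".xlsx"}


def choose_provider(filename: str) -> str:
    """Hypothetical helper: pick a provider name from the file extension."""
    suffix = PurePath(filename).suffix.lower()
    return "markitdown" if suffix in STRUCTURED_EXTENSIONS else "axelered"


for name in ["budget.XLSX", "contract-scan.pdf", "notes.docx"]:
    # Each result can be used as the parserConfig for that file's run.
    print(name, "->", {"provider": choose_provider(name)})
```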
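Parameters (`axelered` provider)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxPixels` | int | 11,289,600 | Maximum resolution used for processing, expressed as a total pixel count (width × height). The default is about 11.3 megapixels (e.g. 3360 × 3360), somewhat above 4K UHD (3840 × 2160 ≈ 8.3 MP). |
| `jpegQuality` | int | 95 | JPEG quality of the intermediate page renders used for analysis. |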
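To get a feel for the `maxPixels` budget, the sketch below computes how a page render of a given size would be scaled to fit it. It assumes, without confirmation from this page, that the engine downscales uniformly (preserving aspect ratio) only when width × height exceeds `maxPixels`; `fit_to_pixel_budget` is a hypothetical name. An A4 page rendered at 300 DPI (2480 × 3508 ≈ 8.7 MP) fits under the default, while the same page at 600 DPI (4960 × 7016 ≈ 34.8 MP) does not.

```python
import math
from typing import Tuple

DEFAULT_MAX_PIXELS = 11_289_600  # documented default for maxPixels


def fit_to_pixel_budget(width: int, height: int,
                        max_pixels: int = DEFAULT_MAX_PIXELS) -> Tuple[int, int]:
    """Scale (width, height) uniformly so that width * height <= max_pixels.

    Assumed behaviour for illustration only; pages already within the
    budget are returned unchanged.
    """
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Floor each side so the result does not exceed the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


print(fit_to_pixel_budget(2480, 3508))  # A4 @ 300 DPI: (2480, 3508)
print(fit_to_pixel_budget(4960, 7016))  # A4 @ 600 DPI: (2825, 3996)
```

Under this assumption, rendering above the budget gains nothing, because the extra pixels are scaled away before analysis.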
🛠️ Parse Operations
Parsing is typically performed as part of a stateless process run. For a detailed technical reference of every field and parameter, see the API Process Reference.
Execute a Parse Run
To trigger parsing, submit a process request to your workspace. In a stateless run, you can upload files directly or provide URLs.
```bash
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/process" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "parserConfig": {
        "provider": "axelered"
      }
    },
    "files": [
      {
        "url": "https://example.com/document.pdf"
      }
    ]
  }'
```

The output will be available as a rendered Markdown file via the `parsingUrl` returned in the task documents list once the run reaches a completed state.
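The same request can be built from Python with only the standard library. This sketch mirrors the curl example: it constructs the request but does not send it. `build_process_request` is a hypothetical helper, the workspace ID and API key are placeholders, and the response format is not covered here, so check the API Process Reference before parsing what comes back.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.axelered.com/v1"


def build_process_request(workspace_id: str, api_key: str,
                          file_urls: List[str],
                          parser_config: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST .../process request from the curl example."""
    payload = {
        "config": {"parserConfig": parser_config},
        "files": [{"url": url} for url in file_urls],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/w/{workspace_id}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_process_request(
    workspace_id="YOUR_WORKSPACE_ID",
    api_key="YOUR_API_KEY",
    file_urls=["https://example.com/document.pdf"],
    parser_config={"provider": "axelered"},
)
print(req.get_method(), req.full_url)
# POST https://api.axelered.com/v1/w/YOUR_WORKSPACE_ID/process

# To actually submit it (requires network access and a real key):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```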
Read & List Runs
To track the progress of your parsing tasks or retrieve historical results, use the following endpoints:
- List Process Runs: Retrieve a paginated list of all process runs within your workspace.
- Read Process Run: Fetch the current status and result links for a specific `run_id`.
- Read Document Parsing: Directly access the parsed Markdown content for a specific document.
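Because results are only ready once a run completes, a client typically polls Read Process Run. The sketch below is a generic polling loop with a timeout and capped exponential backoff. The endpoint call is abstracted as a `fetch_run` callable, and the field names `status` and `documents`, plus the status strings `completed` and `failed`, are assumptions to verify against the API Process Reference (`parsingUrl` comes from this page). `wait_for_run` and `parsing_urls` are hypothetical helpers.

```python
import time
from typing import Any, Callable, Dict, List

Run = Dict[str, Any]

# Assumed terminal status values; verify against the API reference.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_run(fetch_run: Callable[[], Run],
                 timeout_s: float = 300.0,
                 initial_delay_s: float = 1.0,
                 max_delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> Run:
    """Poll fetch_run() until the run reaches a terminal state.

    The delay doubles after each poll, capped at max_delay_s. Raises
    TimeoutError if the run is still in progress after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        run = fetch_run()
        if run.get("status") in TERMINAL_STATES:
            return run
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"run still {run.get('status')!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


def parsing_urls(run: Run) -> List[str]:
    """Collect each document's parsingUrl from a run (assumed response shape)."""
    return [doc["parsingUrl"] for doc in run.get("documents", []) if "parsingUrl" in doc]


# Demo with a fake fetcher (made-up statuses) that completes on the third poll.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed",
     "documents": [{"parsingUrl": "https://example.com/parsed/document.md"}]},
])
run = wait_for_run(lambda: next(responses), sleep=lambda s: None)
print(run["status"], parsing_urls(run))
# completed ['https://example.com/parsed/document.md']
```

Injecting `fetch_run` and `sleep` keeps the loop testable without network access; in practice `fetch_run` would wrap a call to the Read Process Run endpoint for your `run_id`.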