Parse

Standard PDF parsers often scramble or drop tables, lists, and headings. Our Parse module performs high-fidelity, layout-aware extraction to output clean Markdown that preserves the original document's structural integrity.

 ┌───────────────────┐       ┌──────────────────────┐       ┌────────────────────┐
 │  Raw Document     │       │  Layout Analyzer     │       │  Clean Markdown    │
 │  (PDF, Word, etc) │ ────► │  (Table/Header/Flow) │ ────► │  (Structure intact)│
 └───────────────────┘       └──────────────────────┘       └────────────────────┘

Headings are preserved as a logical hierarchy (#, ##, ###).
Complex tables are rendered as native Markdown tables.
Layout is preserved non-destructively, so chunking does not strip content of its surrounding context.
Reading order is reconstructed so that the text flows in the order a human reader would follow.
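
To illustrate why preserved headings help during chunking, here is a minimal sketch of heading-aware splitting over Markdown output. It is not part of the Parse module: the function name `chunk_by_headings` and the sample text are hypothetical, and the sketch only recognizes ATX headings (`#`, `##`, `###`).

```python
import re

# Matches ATX headings such as "# Title" or "### Notes".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into chunks, each tagged with its full heading path."""
    chunks = []
    path = []   # current heading trail, e.g. ["Report", "Results"]
    body = []   # lines collected under the current heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then descend.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks

# Hypothetical sample, not actual Parse output.
sample = """# Report
Intro text.
## Results
| Metric | Value |
| --- | --- |
| Pages | 12 |
### Notes
Scanned at 300 dpi.
"""

chunks = chunk_by_headings(sample)
for chunk in chunks:
    print(" > ".join(chunk["path"]), "|", chunk["text"].splitlines()[0])
```

Because each chunk carries its heading path, a table or paragraph can still be tied back to the section it came from after splitting. A production splitter would also need to skip `#` lines inside fenced code blocks, which this sketch does not do.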

🏗️ Configuration

The Parse module can be configured using a `parserConfig` object or by referencing a pre-saved `parserId`.
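
The two configuration styles might be expressed in a request body roughly as follows. This is a sketch under assumptions: the inline `parserConfig` shape mirrors the curl example in this document, but the placement of `parserId` inside `config` is a guess, and `make_process_body` and `YOUR_PARSER_ID` are hypothetical. Check the API Process Reference for the actual field layout.

```python
import json

def make_process_body(file_urls, parser_config=None, parser_id=None):
    """Build a process request body with exactly one parser reference."""
    if (parser_config is None) == (parser_id is None):
        raise ValueError("pass exactly one of parser_config or parser_id")
    if parser_config is not None:
        config = {"parserConfig": parser_config}
    else:
        # Assumption: a saved parser is referenced as "parserId" inside "config".
        config = {"parserId": parser_id}
    return {"config": config, "files": [{"url": url} for url in file_urls]}

# Inline configuration.
inline_body = make_process_body(
    ["https://example.com/document.pdf"],
    parser_config={"provider": "axelered"},
)

# Reference to a pre-saved configuration (placeholder identifier).
saved_body = make_process_body(
    ["https://example.com/document.pdf"],
    parser_id="YOUR_PARSER_ID",
)

print(json.dumps(inline_body, indent=2))
print(json.dumps(saved_body, indent=2))
```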

Supported Providers

Provider	Description
`axelered`	Our proprietary OCR and layout engine. Best for scanned docs and complex tables.
`mistral`	Integration with Mistral's layout-aware OCR.
`markitdown`	Fast, rule-based layout extraction for structured digital files (DOCX, XLSX).

Parameters (`axelered` provider)

Parameter	Type	Default	Description
`maxPixels`	`int`	`11,289,600`	The maximum pixel count for processing (about 11.3 megapixels, e.g. 3360 × 3360, somewhat more than a 4K UHD frame).
`jpegQuality`	`int`	`95`	Quality of the intermediate page renders used for analysis.
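
As a rough illustration of what a pixel budget such as `maxPixels` implies, the sketch below computes the largest dimensions that fit within the budget. It assumes the limit is applied by downscaling while preserving aspect ratio, which this document does not confirm; `fit_within_budget` is a hypothetical helper, not part of the API.

```python
import math

DEFAULT_MAX_PIXELS = 11_289_600  # default from the table above (3360 * 3360)

def fit_within_budget(width: int, height: int,
                      max_pixels: int = DEFAULT_MAX_PIXELS) -> tuple[int, int]:
    """Return (width, height) scaled down, if needed, to fit max_pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height  # already within budget, leave untouched
    # Scaling both sides by s scales the pixel count by s squared.
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))

# A 2480 x 3508 render (roughly A4 at 300 dpi) fits as-is.
print(fit_within_budget(2480, 3508))

# A 3307 x 4677 render (roughly A4 at 400 dpi) exceeds the budget and is reduced.
print(fit_within_budget(3307, 4677))
```

Under this assumption, raising `maxPixels` only matters for pages whose renders would otherwise exceed the budget.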

🛠️ Parse Operations

Parsing is typically performed as part of a stateless process run. For a detailed technical reference of every field and parameter, see the API Process Reference.

Execute a Parse Run

To trigger parsing, submit a process request to your workspace. In a stateless run, you can upload files directly or provide URLs.

curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/process" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "parserConfig": {
        "provider": "axelered"
      }
    },
    "files": [
      {
        "url": "https://example.com/document.pdf"
      }
    ]
  }'
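
The same request can be prepared from Python using only the standard library. This sketch mirrors the curl call above; the helper name `build_parse_request` is hypothetical, and the workspace ID and API key are placeholders. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.axelered.com/v1"

def build_parse_request(workspace_id, api_key, file_urls, provider="axelered"):
    """Prepare (but do not send) a POST request equivalent to the curl example."""
    body = {
        "config": {"parserConfig": {"provider": provider}},
        "files": [{"url": url} for url in file_urls],
    }
    return urllib.request.Request(
        f"{API_BASE}/w/{workspace_id}/process",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_parse_request(
    "YOUR_WORKSPACE_ID",
    "YOUR_API_KEY",
    ["https://example.com/document.pdf"],
)
print(request.get_method(), request.full_url)

# To actually submit the run:
#   with urllib.request.urlopen(request) as response:
#       run = json.load(response)
```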

Once the run reaches a completed state, the output is available as a rendered Markdown file at the `parsingUrl` returned in the task documents list.
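
Collecting the result links from a run might look like the sketch below. Only `parsingUrl` and the completed state come from this document; the surrounding response shape (a `status` field and a `documents` list) and the helper `parsing_urls` are assumptions made for illustration.

```python
def parsing_urls(run: dict) -> list[str]:
    """Return the parsingUrl of each document, or [] if the run is not completed."""
    # Assumption: the run exposes its state under "status".
    if run.get("status") != "completed":
        return []
    # Assumption: documents are listed under "documents".
    return [
        doc["parsingUrl"]
        for doc in run.get("documents", [])
        if doc.get("parsingUrl")
    ]

# Hypothetical response, not a real API payload.
sample_run = {
    "status": "completed",
    "documents": [
        {"name": "document.pdf", "parsingUrl": "https://example.com/parsed/document.md"},
    ],
}

print(parsing_urls(sample_run))
```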

Read & List Runs

To track the progress of your parsing tasks or retrieve historical results, use the following endpoints:

List Process Runs: Retrieve a paginated list of all process runs within your workspace.
Read Process Run: Fetch the current status and result links for a specific `run_id`.
Read Document Parsing: Directly access the parsed Markdown content for a specific document.
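
Tracking a run usually means polling the Read Process Run endpoint until the run finishes. The sketch below keeps the HTTP call out of scope by taking a `fetch_run` callable, since the endpoint path and response shape are not given here. The `status` field, the `"failed"` state, and the name `wait_for_run` are assumptions; only the completed state is mentioned in this document.

```python
import time

# "completed" comes from the docs; "failed" is an assumed terminal state.
TERMINAL_STATES = {"completed", "failed"}

def wait_for_run(fetch_run, run_id, interval=2.0, timeout=300.0):
    """Poll fetch_run(run_id) until the run reaches a terminal state.

    fetch_run is any callable that returns the run as a dict, for example
    a thin wrapper around the Read Process Run endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout} seconds")
        time.sleep(interval)

# Usage with a stand-in fetcher that simulates two polls.
_states = iter([{"status": "running"}, {"status": "completed"}])
final_run = wait_for_run(lambda run_id: next(_states), "RUN_ID", interval=0.0)
print(final_run["status"])
```

Injecting the fetcher keeps the polling logic easy to test and lets the caller decide how authentication and retries are handled.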