Stateless API Lifecycle
Learn how to perform immediate batch document workflows without persistence.
The system offers a powerful Stateless Processing Engine (POST /v1/w/{workspaceId}/process).
This API is designed specifically for stateless batch document workflows: you stream raw documents in, specify parsing and AI criteria, and fetch structured outputs (HTML/Markdown renderings, JSON properties, summaries, taxonomy classes, or edited files) without permanently storing the files in the system.
⚙️ The Stateless Runtime Engine
When you trigger a process run, files pass through secure, on-the-fly pipelines. Because processing high-volume documents (such as 500-page scanned manuals) is compute-intensive, the platform supports both immediate synchronous delivery for light tasks and long-running asynchronous orchestration, with active polling, for heavier ones.
┌────────────────────────────────────────────────────────┐
│                 Stateless Process Run                  │
│                                                        │
│  ┌────────────────┐  ┌────────────────┐  ┌──────────┐  │
│  │  OCR & Parse   │  │ AI Extraction  │  │ Summaries│  │
│  │ (HTML/Markdown)│  │ (JSON Schema)  │  │ (Short)  │  │
│  └───────┬────────┘  └───────┬────────┘  └────┬─────┘  │
└──────────┼───────────────────┼────────────────┼────────┘
           ▼                   ▼                ▼
Files ──────────► Pipelines ─────► Final Output (Stream)
- OCR, Extraction & Analysis: Documents are decomposed into their physical layout, capturing complex structures such as tables and inline bold titles.
- AI Analysis Runs: Handlers use large language models (LLMs) to categorize, summarize, or extract custom properties in parallel.
- S3 Stream Results: Output structures (Markdown, HTML summaries, extraction JSON objects) are written to your Workspace's secure storage bucket.
- Enforced TTL (Time To Live): To keep processing stateless, all outputs are permanently deleted after 24 hours.
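Because outputs disappear after 24 hours, a client that stores run results should download them before that deadline. The following minimal Python sketch tracks the deadline; it assumes the 24-hour clock starts when the run reaches its terminal state (the text does not say which timestamp starts it), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the 24-hour TTL starts when the run reaches its terminal state.
OUTPUT_TTL = timedelta(hours=24)


def outputs_expire_at(completed_at: datetime) -> datetime:
    """Moment after which the run's outputs are assumed to be deleted."""
    if completed_at.tzinfo is None:
        raise ValueError("completed_at must be timezone-aware")
    return completed_at + OUTPUT_TTL


def outputs_still_available(completed_at: datetime,
                            now: Optional[datetime] = None) -> bool:
    """True while `now` (default: current UTC time) is before the expiry."""
    now = now or datetime.now(timezone.utc)
    return now < outputs_expire_at(completed_at)


if __name__ == "__main__":
    done = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(outputs_expire_at(done))  # 2024-05-02 12:00:00+00:00
```

Requiring timezone-aware timestamps avoids silently comparing local and UTC times, which would shift the computed deadline by the local offset.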
⏱️ Triggering Sync vs. Async Runs
Our stateless processor uses a hybrid status delivery mechanism governed by the wait query parameter:
Endpoint: POST /v1/w/{workspaceId}/process
Query Parameters:
- wait (number, default: 0): Maximum time, in seconds, to wait for the run to reach a terminal state. Set a positive value (e.g. 60) to receive the finished result in the same response when the run completes in time.
- idempotency_key (string, optional): A client-provided unique key that prevents duplicate processing tasks.
- Immediate Sync Mode (wait = 60): If the pipelines complete before your wait timeout (e.g. within 60 seconds), the API returns a standard 200 OK containing the completed status, result links, and outputs directly.
- Deferred Async Mode (wait = 0 or timeout exceeded): The API returns 202 Accepted immediately, providing a unique run_id. Your software can either poll GET /v1/w/{workspaceId}/process/{runId} for status updates or listen for a webhook notification (process.completed) when the run reaches a terminal state.
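The sync/async contract above can be exercised with a small client. This is a minimal Python sketch, not an official SDK: `BASE_URL`, the status values in `TERMINAL_STATUSES`, and the injected `transport` callable are hypothetical, and a real transport would also attach authentication and the multipart file payload. It treats a `200` as a finished run and otherwise long-polls the run endpoint with `?wait=N`.

```python
import time
import urllib.parse
from typing import Callable, Dict, Optional, Tuple

# Hypothetical host; replace with your deployment's API base URL.
BASE_URL = "https://api.example.com/v1"

# Assumption: illustrative terminal status values, not taken from the API schema.
TERMINAL_STATUSES = {"completed", "failed"}

# Hypothetical transport: (method, url) -> (HTTP status, decoded JSON body).
# A real one would also send auth headers and the multipart file payload.
Transport = Callable[[str, str], Tuple[int, Dict]]


def process_url(workspace_id: str, wait: int = 0,
                idempotency_key: Optional[str] = None) -> str:
    """URL for POST /v1/w/{workspaceId}/process with its query parameters."""
    query: Dict[str, str] = {"wait": str(wait)}
    if idempotency_key is not None:
        query["idempotency_key"] = idempotency_key
    ws = urllib.parse.quote(workspace_id, safe="")
    return f"{BASE_URL}/w/{ws}/process?{urllib.parse.urlencode(query)}"


def run_url(workspace_id: str, run_id: str, wait: int = 0) -> str:
    """URL for GET /v1/w/{workspaceId}/process/{runId}, which accepts ?wait=N."""
    ws = urllib.parse.quote(workspace_id, safe="")
    rid = urllib.parse.quote(run_id, safe="")
    return f"{BASE_URL}/w/{ws}/process/{rid}?wait={wait}"


def run_to_terminal(transport: Transport, workspace_id: str, wait: int = 60,
                    poll_wait: int = 30, max_polls: int = 20,
                    interval: float = 1.0,
                    idempotency_key: Optional[str] = None) -> Dict:
    """Submit a run and return the run object once it reaches a terminal state."""
    status, run = transport("POST", process_url(workspace_id, wait, idempotency_key))
    if status == 200:
        return run  # Sync mode: terminal state reached within `wait` seconds.
    if status != 202:
        raise RuntimeError(f"unexpected HTTP status {status}")

    # Async mode: the 202 body is the run object (outcome is still null).
    # The page names the identifier both `id` and `run_id`, so accept either.
    run_id = run.get("id") or run.get("run_id")
    if not run_id:
        raise RuntimeError("202 response did not include a run identifier")

    for _ in range(max_polls):
        # Long-poll: the server may hold the request for up to `poll_wait` seconds.
        status, run = transport("GET", run_url(workspace_id, run_id, poll_wait))
        if status not in (200, 202):
            raise RuntimeError(f"unexpected HTTP status {status} while polling")
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if interval > 0:
            time.sleep(interval)  # Brief pause before the next poll.
    raise TimeoutError(f"run {run_id} not terminal after {max_polls} polls")
```

Because the transport is injected, the control flow can be checked offline with a fake that returns canned responses; swapping in a real HTTP client does not change the polling logic.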
POST /v1/w/{workspaceId}/process?wait=60
                  │
                  ▼
        ┌────────────────────┐
        │  Reached terminal  │
        │ state within wait? │
        └─────────┬──────────┘
                  │
         ┌────────┴────────┐
         ▼                 ▼
  ┌────────────┐   ┌───────────────────────────┐
  │   200 OK   │   │ 202 Accepted              │
  │   (sync)   │   │ Full run object with      │
  │ Completed  │   │ status, id, files,        │
  │  results   │   │ config, outcome: null     │
  │    with    │   │                           │
  │  outcome   │   │ Poll until terminal:      │
  └────────────┘   │ GET .../process/{runId}   │
                   │ (supports ?wait=N)        │
                   └───────────────────────────┘
📦 Multi-File Batching (Staged Runs)
By default, files can be passed as a standard multipart array. If you are uploading large batches of documents over unreliable connections, you can initiate a Staged Run instead:
- Draft Stage: Call POST /v1/w/{workspaceId}/process?staged=true with your pipeline configuration. The endpoint returns a run handle in the idle state.
- Parallel Individual Chunking: Upload each document individually, in parallel, to POST /v1/w/{workspaceId}/process/{runId}/files.
- Final Submission: Once your batch upload completes, trigger execution immediately with POST /v1/w/{workspaceId}/process/{runId}/submit.
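The three staged-run steps can be sketched in Python as follows. The sketch assumes a hypothetical `transport(method, url, payload)` callable (real code would encode uploads as multipart/form-data and add auth) and a hypothetical `BASE_URL` that includes the `/v1` prefix from the endpoint listing. It uses `concurrent.futures.ThreadPoolExecutor` for the parallel uploads and only calls `/submit` after every upload has succeeded.

```python
import urllib.parse
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

# Hypothetical host; replace with your deployment's API base URL.
BASE_URL = "https://api.example.com/v1"

# Hypothetical transport: (method, url, payload) -> (HTTP status, JSON body).
# For uploads the payload is (filename, bytes); real code would send it as
# multipart/form-data together with auth headers.
Transport = Callable[[str, str, object], Tuple[int, Dict]]


def _check(status: int, what: str) -> None:
    """Raise unless the HTTP status is 2xx."""
    if not 200 <= status < 300:
        raise RuntimeError(f"{what} failed with HTTP {status}")


def staged_run(transport: Transport, workspace_id: str, config: Dict,
               files: Dict[str, bytes], max_workers: int = 4) -> Dict:
    """Create a staged run, upload files in parallel, then submit it."""
    base = f"{BASE_URL}/w/{urllib.parse.quote(workspace_id, safe='')}/process"

    # 1. Draft stage: register the pipeline configuration; the run starts idle.
    status, run = transport("POST", f"{base}?staged=true", config)
    _check(status, "draft")
    run_id: Optional[str] = run.get("id") or run.get("run_id")
    if not run_id:
        raise RuntimeError("draft response did not include a run identifier")

    # 2. Upload each document individually, several at a time.
    def upload(item: Tuple[str, bytes]) -> str:
        name, data = item
        up_status, _ = transport("POST", f"{base}/{run_id}/files", (name, data))
        _check(up_status, f"upload of {name}")
        return name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() re-raises any upload error, so a partial batch is never submitted.
        list(pool.map(upload, files.items()))

    # 3. Final submission: start processing the complete batch.
    status, run = transport("POST", f"{base}/{run_id}/submit", None)
    _check(status, "submit")
    return run
```

Keeping submission as a separate, final call means a failed upload can be retried on its own without re-sending the documents that already arrived.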
🛡️ Safe Execution & Task Idempotency
Because processing pipelines use expensive model operations, developers can supply an optional query parameter: idempotency_key.
The runtime hashes this key (SHA-256) and prevents duplicate runs within the same Workspace. If a matching active or finished run already exists, the server either immediately returns that run's current status or responds with 409 Conflict, preventing double billing.
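One way to make client retries safe is to derive the idempotency_key deterministically from the request itself, so a retry after a dropped connection sends the same key. This Python sketch shows that client-side convention under that assumption; it is not a requirement of the API, the helper name is hypothetical, and the server's own SHA-256 hashing of the key is unaffected.

```python
import hashlib
import json
from typing import Dict


def derive_idempotency_key(config: dict, files: Dict[str, bytes]) -> str:
    """Hypothetical helper: same config and file contents -> same key.

    Reusing the key on every retry of one logical job lets the server detect
    the duplicate instead of starting (and billing) a second run.
    """
    h = hashlib.sha256()
    # Canonical JSON, so dict key order in the config does not change the key.
    h.update(json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    for name in sorted(files):
        # Name, a separator, then a fixed-length digest of the file contents.
        h.update(name.encode("utf-8") + b"\x00")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()
```

The trade-off is that two intentionally identical jobs would also collide. If you need to re-run identical inputs, generate a random key (for example with `uuid.uuid4()`) once per logical job and persist it across retries instead.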
🧩 Available Modules
Each run can combine any of these modules: pass the corresponding config blocks in your request body, and the engine executes them in parallel.
Parse
Transform complex documents into high-fidelity Markdown while preserving tables and headers.
Extract
Pull precise, schema-validated JSON objects from unstructured files using AI-driven extraction.
Classify
Automatically categorize documents into custom business taxonomies based on their content.
Document Editing
Automate document modifications and form filling using natural language instructions.
Summarize
Generate structured summaries and key-point highlights at configurable depths.
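The text does not specify the request body shape for each module, so the following Python sketch is purely illustrative. The keys `parse`, `extract`, `classify`, `edit`, and `summarize` and their fields are hypothetical placeholders that show how independent config blocks might be combined into a single run.

```python
import json
from typing import Any, Dict, List, Optional


def build_run_config(parse: bool = True,
                     extract_schema: Optional[Dict[str, Any]] = None,
                     classify_labels: Optional[List[str]] = None,
                     edit_instructions: Optional[str] = None,
                     summary_depth: Optional[str] = None) -> Dict[str, Any]:
    """Combine per-module config blocks into one run config.

    Every key and field name below is a hypothetical placeholder.
    """
    config: Dict[str, Any] = {}
    if parse:
        config["parse"] = {"output": "markdown"}
    if extract_schema is not None:
        config["extract"] = {"schema": extract_schema}  # A JSON Schema object.
    if classify_labels:
        config["classify"] = {"labels": list(classify_labels)}
    if edit_instructions:
        config["edit"] = {"instructions": edit_instructions}
    if summary_depth:
        config["summarize"] = {"depth": summary_depth}
    return config


if __name__ == "__main__":
    invoice_schema = {
        "type": "object",
        "properties": {"invoice_number": {"type": "string"},
                       "total": {"type": "number"}},
        "required": ["invoice_number", "total"],
    }
    body = build_run_config(extract_schema=invoice_schema,
                            classify_labels=["invoice", "receipt", "contract"])
    print(json.dumps(body, indent=2))
```

Omitting a block simply leaves that module out of the run, which matches the idea that each module is opt-in and independent of the others.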