Stateless API Lifecycle

The system exposes a Stateless Processing Engine at POST /v1/w/{workspaceId}/process.

This API is designed for stateless batch document workflows: you stream raw documents in, specify parsing and AI criteria, and fetch structured outputs (parsed HTML/Markdown, JSON properties, summaries, taxonomic classes, or edited files) without permanently adding the files to the system.

⚙️ The Stateless Runtime Engine

When you trigger a process run, files pass through secure on-the-fly pipelines. Because processing large documents (such as 500-page scanned manuals) is intensive, the platform supports both synchronous delivery for light tasks and long-running asynchronous runs that you track by polling.

 ┌────────────────────────────────────────────────────────┐
 │                   Stateless Process Run                │
 │                                                        │
 │   ┌────────────────┐ ┌────────────────┐ ┌──────────┐   │
 │   │ OCR & Parse    │ │ AI Extraction  │ │ Summaries│   │
 │   │ (HTML/Markdown)│ │ (JSON Schema)  │ │ (Short)  │   │
 │   └───────┬────────┘ └───────┬────────┘ └────┬─────┘   │
 └───────────┼──────────────────┼───────────────┼─────────┘
             ▼                  ▼               ▼
           Files ──────────► Pipelines ─────► Final Output (Stream)

OCR, Extraction & Analysis: Documents are parsed at the layout level, including complex structures such as tables and inline bold titles.
AI Analysis Runs: Handlers use LLMs to categorize, summarize, or extract custom properties in parallel.
S3 Stream Results: Outputs (Markdown, HTML summaries, extraction JSON objects) are written to your secure Workspace storage bucket.
Enforced TTL (Time To Live): To keep the processor stateless, all outputs are permanently deleted after 24 hours.
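
Because outputs disappear after 24 hours, a client may want to track when a run's results stop being retrievable. The sketch below is only illustrative: it assumes the 24-hour window is measured from the moment the run completed, which the text above does not state, and the `completed_at` argument is a hypothetical timestamp that your own code would record.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window stated in the documentation above.
OUTPUT_TTL = timedelta(hours=24)


def output_expiry(completed_at: datetime) -> datetime:
    """Return the moment after which outputs should be treated as deleted.

    Assumption: the TTL clock starts when the run completes.
    """
    return completed_at + OUTPUT_TTL


def is_expired(completed_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the retention window has fully elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= output_expiry(completed_at)
```

Downloading results well before the computed expiry leaves a margin in case the server starts the clock earlier than assumed here.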

⏱️ Triggering Sync vs. Async Runs

Our stateless processor uses a hybrid status delivery mechanism governed by the wait query parameter:

Endpoint: `POST /v1/w/{workspaceId}/process`

Query Parameters:

wait (number, default: 0): Maximum time in seconds to wait for the run to reach a terminal state. Set it to a positive value (e.g. 60) to have the request block until the run finishes or the limit is reached.
idempotency_key (string, optional): A client-provided unique key that prevents duplicate processing tasks.

Immediate Sync Mode (wait = 60): If the pipelines complete before your wait timeout (e.g. within 60 seconds), the API returns a standard 200 OK containing the completed status, links, and outputs directly.
Deferred Async Mode (wait = 0 or timeout exceeded): The API returns 202 Accepted (immediately when wait = 0, otherwise once the timeout elapses), providing a unique run ID. Your software can either poll GET /v1/w/{workspaceId}/process/{runId} for status updates or listen for a Webhook notification (process.completed) when the run reaches a terminal state.

              POST /v1/w/{workspaceId}/process?wait=60
                           │
                           ▼
                  ┌────────────────────┐
                  │  Reached terminal  │
                  │  state within wait?│
                  └─────────┬──────────┘
                            │
                   ┌────────┴────────┐
                   ▼                 ▼
            ┌────────────┐    ┌───────────────────────────┐
            │  200 OK    │    │      202 Accepted         │
            │  (sync)    │    │   Full run object with    │
            │  Completed │    │   status, id, files,      │
            │  results   │    │   config, outcome: null   │
            │  with      │    │                           │
            │  outcome   │    │  Poll until terminal:     │
            └────────────┘    │  GET .../process/{runId}  │
                              │  (supports ?wait=N)       │
                              └───────────────────────────┘
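
The flow in the diagram can be sketched as a small client loop. This is a minimal sketch under stated assumptions, not the official client: the `send` transport is a hypothetical callable you would back with your own HTTP library, and the loop treats a non-null `outcome` field as the signal that the run is terminal, which is inferred from the diagram rather than from a documented contract. Request bodies, files, and authentication are omitted.

```python
from typing import Any, Callable, Dict, Tuple
from urllib.parse import urlencode

# Hypothetical transport: takes (method, url), returns (status_code, json_body).
Transport = Callable[[str, str], Tuple[int, Dict[str, Any]]]


def process_url(base: str, workspace_id: str, run_id: str = "", **query: Any) -> str:
    """Build the process endpoint URL, optionally for a specific run."""
    url = f"{base}/v1/w/{workspace_id}/process"
    if run_id:
        url += f"/{run_id}"
    return f"{url}?{urlencode(query)}" if query else url


def run_to_completion(send: Transport, base: str, workspace_id: str,
                      wait: int = 60, max_polls: int = 10) -> Dict[str, Any]:
    """Start a run, then poll until its outcome is populated.

    Assumption: `outcome` stays null until the run reaches a terminal state.
    Each poll passes ?wait=N so the server holds the request open, which is
    why no client-side sleep is used here.
    """
    _, run = send("POST", process_url(base, workspace_id, wait=wait))
    polls = 0
    while run.get("outcome") is None:
        if polls >= max_polls:
            raise TimeoutError(f"run {run['id']} not terminal after {max_polls} polls")
        _, run = send("GET", process_url(base, workspace_id, run["id"], wait=wait))
        polls += 1
    return run
```

Injecting the transport keeps the polling logic independent of any particular HTTP library and makes it easy to exercise with a fake in tests.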

📦 Multi-File Batching (Staged Runs)

By default, files are passed as standard multipart arrays. If you are uploading large batches of documents over connections of variable quality, you can initiate a Staged Run:

Draft Stage: Call POST /v1/w/{workspaceId}/process?staged=true with your pipeline configuration. The endpoint returns a run handle that stays idle until you submit it.
Parallel Individual Chunking: Upload each document individually, in parallel, to POST /v1/w/{workspaceId}/process/{runId}/files.
Final Submission: Once your batch upload completes, trigger execution with POST /v1/w/{workspaceId}/process/{runId}/submit.
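
The three stages above might be orchestrated roughly as follows. This is a sketch under assumptions: `post` is a hypothetical transport callable (path and payload in, parsed JSON out), the run identifier is assumed to be returned in an `id` field, and the payload shapes are placeholders rather than the documented request format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes (path, payload), returns the parsed JSON body.
Post = Callable[[str, Any], Dict[str, Any]]


def staged_run(post: Post, workspace_id: str, config: Dict[str, Any],
               files: List[bytes], max_workers: int = 4) -> Dict[str, Any]:
    """Create a staged run, upload files in parallel, then submit it."""
    base = f"/v1/w/{workspace_id}/process"

    # 1. Draft stage: create the idle run and read its identifier.
    draft = post(f"{base}?staged=true", config)
    run_id = draft["id"]  # assumed field name

    # 2. Upload each document individually, in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every upload and re-raises the first failure, if any.
        list(pool.map(lambda blob: post(f"{base}/{run_id}/files", blob), files))

    # 3. Final submission: only reached once all uploads have succeeded.
    return post(f"{base}/{run_id}/submit", None)
```

Submitting only after every upload has returned means a failed upload aborts the run before any processing is triggered.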

🛡️ Safe Execution & Task Idempotency

Because processing pipelines run expensive model operations, developers can supply the idempotency_key query parameter.

Our runtime hashes this key (SHA-256) and prevents duplicate runs within the same Workspace. If a matching active or resolved run is found, the server either returns that existing run or responds with 409 Conflict, which prevents double-billing.
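
One way to choose a key is to derive it deterministically from the run's inputs, so that a retry of the same job reuses the same key. The sketch below is a client-side convention assumed for illustration, not something the API prescribes; it is separate from the SHA-256 hashing the server applies to whatever key it receives. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Iterable
from urllib.parse import urlencode


def derive_idempotency_key(config: Dict[str, Any], files: Iterable[bytes]) -> str:
    """Derive a stable key from the pipeline config and the file contents."""
    digest = hashlib.sha256()
    # sort_keys makes the encoding independent of dict insertion order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    for blob in files:
        # Hash each file on its own so that file boundaries affect the key.
        digest.update(hashlib.sha256(blob).digest())
    return digest.hexdigest()


def process_url_with_key(workspace_id: str, key: str) -> str:
    """Attach the key to the process endpoint as a query parameter."""
    query = urlencode({"idempotency_key": key})
    return f"/v1/w/{workspace_id}/process?{query}"
```

With a content-derived key, a client that times out and retries sends the same key, so the server can match the retry to the original run instead of starting a second one.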

🧩 Available Modules

Parse

Extract

Classify

Document Editing

Summarize
