Connectors (Crawlers)

Automatically import documents from external sources into your collections using S3, Web, Google Drive, or FTP/SFTP crawlers.

Beyond direct uploads, documents can be automatically fetched from external sources using Connectors (also called Crawlers). Connectors link your collection to remote file systems and websites, polling for new or updated files on a schedule you define.

 ┌──────────────┐    ┌──────────────────┐    ┌──────────────────┐
 │  External    │───►│   Crawler        │───►│   Collection     │
 │  Source      │    │   Worker         │    │   Documents      │
 │              │    │                  │    │                  │
 │  • S3 Bucket │    │  1. List source  │    │  Ingested &      │
 │  • Web Page  │    │  2. Download new │    │  indexed for     │
 │  • Google Dr │    │  3. Upload to    │    │  search/chat     │
 │  • FTP/SFTP  │    │     collection   │    │                  │
 └──────────────┘    └──────────────────┘    └──────────────────┘

Every file a connector discovers goes through the same ingestion pipeline (parse → chunk → embed) as a manually uploaded document. Once imported, it appears in your collection like any other document.
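The "list source, download new" steps in the diagram amount to comparing the current source listing with what has already been imported. The following is only an illustrative sketch of that comparison, not the worker's actual implementation: the function name `new_files` and the two input files are hypothetical, and it detects new names only, not updated files.

```shell
# Print names that appear in the current source listing but not in the
# list of already-imported files. Both inputs hold one name per line.
# ASSUMPTION: this is an illustration; the real worker's logic is not documented here.
new_files() {
  seen_sorted=$(mktemp)
  listing_sorted=$(mktemp)
  sort -u "$1" > "$seen_sorted"        # comm requires sorted input
  sort -u "$2" > "$listing_sorted"
  comm -13 "$seen_sorted" "$listing_sorted"   # keep lines unique to the listing
  rm -f "$seen_sorted" "$listing_sorted"
}
```

Calling `new_files imported.txt listing.txt` prints only the names that would still need to be downloaded and uploaded to the collection.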

📂 Supported Source Types

Type	Source	Authentication	Description
`S3`	S3-compatible bucket (AWS, MinIO)	Access key / secret in URL	Sync files from a bucket prefix.
`WEB`	Website URL	None / Basic / Bearer	Crawl and index web pages recursively.
`GOOGLE_DRIVE`	Google Drive folder	OAuth 2.0	Watch a shared folder for new files.
`FTP`	FTP server	Username / password	Download files from a remote directory.
`SFTP`	SFTP (SSH) server	Username / password	Securely download files over SSH.

🛠️ Connector Management

Connectors are managed at the collection level. For a detailed technical reference of every field and parameter, see the API Connector Reference.

Create a Connector

Connectors link your collection to external data sources. Supported source types are S3, Web, Google Drive, FTP, and SFTP.

# Example: Creating an S3 Connector
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "crawlType": "S3",
    "url": "s3://admin:password@minio.local:9000/my-bucket/data",
    "cronSchedule": "0 */6 * * *"
  }'
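The `url` value in the S3 example packs the access key, secret, endpoint, bucket, and prefix into one string. A minimal sketch of assembling that string and the request body from separate values is shown below. The helper names `s3_connector_url` and `connector_payload` are hypothetical, and values are inserted verbatim: whether credentials containing characters such as `@` or `/` need percent-encoding is not stated here, so check the API Connector Reference.

```shell
# Assemble an S3 source URL in the same shape as the example:
#   s3://ACCESS_KEY:SECRET@HOST:PORT/BUCKET/PREFIX
# Arguments: access_key secret host:port bucket prefix
s3_connector_url() {
  printf 's3://%s:%s@%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Build the JSON body with the three fields used in the example.
# Arguments: crawl_type url cron_schedule
# Values must not contain double quotes or backslashes (no JSON escaping is done).
connector_payload() {
  printf '{"crawlType":"%s","url":"%s","cronSchedule":"%s"}\n' "$1" "$2" "$3"
}

# Usage (not executed here):
#   url=$(s3_connector_url admin password minio.local:9000 my-bucket data)
#   curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls" \
#     -H "Authorization: Bearer YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(connector_payload S3 "$url" "0 */6 * * *")"
```

Keeping the secret in a variable rather than typing it into the command also makes it easier to read from an environment variable or secret store.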

Read, List & Update

To manage your existing connectors and their sync schedules, use the following specialized endpoints:

List Connectors: Retrieve all active connectors within a collection.
Read Connector: Fetch the configuration, authentication status, and last run metadata.
Update Connector: Rotate credentials or modify the sync schedule.
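The exact requests for these three operations are not shown on this page. The sketch below assumes they follow the same path pattern as the create and delete calls: `GET` on the collection-level `/crawls` path to list, `GET` on `/crawls/{crawl_id}` to read, and `PATCH` on the same path to update. The HTTP methods, the helper names, and the `API_KEY` variable are assumptions; confirm them against the API Connector Reference before relying on them.

```shell
API_BASE="https://api.axelered.com/v1"

# Build a connector URL. Omit the third argument for the collection-level path.
# Arguments: workspace_id collection_id [crawl_id]
crawls_url() {
  if [ -n "${3:-}" ]; then
    printf '%s/w/%s/col/%s/crawls/%s\n' "$API_BASE" "$1" "$2" "$3"
  else
    printf '%s/w/%s/col/%s/crawls\n' "$API_BASE" "$1" "$2"
  fi
}

# ASSUMPTION: GET on the collection-level path lists connectors.
list_connectors() {
  curl -sS "$(crawls_url "$1" "$2")" -H "Authorization: Bearer $API_KEY"
}

# ASSUMPTION: GET on the connector path returns its configuration.
read_connector() {
  curl -sS "$(crawls_url "$1" "$2" "$3")" -H "Authorization: Bearer $API_KEY"
}

# ASSUMPTION: PATCH with a partial JSON body updates the connector.
# Arguments: workspace_id collection_id crawl_id json_body
update_connector() {
  curl -sS -X PATCH "$(crawls_url "$1" "$2" "$3")" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$4"
}
```

For example, `update_connector "$WS" "$COL" "$CRAWL" '{"cronSchedule":"0 0 * * *"}'` would change the schedule to daily at midnight, if the update endpoint accepts partial bodies.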

Delete a Connector

Deleting a connector immediately halts its sync schedule. Documents already ingested will remain in the collection.

curl -X DELETE "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}" \
  -H "Authorization: Bearer YOUR_API_KEY"

⚙️ Operations & Scheduling

Manual Trigger

Trigger a crawl run immediately, regardless of the cron schedule:

curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}/start" \
  -H "Authorization: Bearer YOUR_API_KEY"

Pause / Resume

Pausing prevents new scheduled runs from starting; items already being processed will complete. Resuming re-enables the schedule.

curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}/pause" \
  -H "Authorization: Bearer YOUR_API_KEY"
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}/resume" \
  -H "Authorization: Bearer YOUR_API_KEY"
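The start, pause, and resume calls differ only in the last path segment, so they can be wrapped in one small helper. This is a sketch under assumptions: the function names and the `API_KEY` environment variable are hypothetical, while the paths and the `POST` method come from the examples on this page.

```shell
API_BASE="https://api.axelered.com/v1"

# Print the URL for a connector action, or fail for an unknown action.
# Arguments: workspace_id collection_id crawl_id action
crawl_action_url() {
  case "$4" in
    start|pause|resume)
      printf '%s/w/%s/col/%s/crawls/%s/%s\n' "$API_BASE" "$1" "$2" "$3" "$4"
      ;;
    *)
      echo "unknown action: $4 (expected start, pause, or resume)" >&2
      return 1
      ;;
  esac
}

# POST the action. The bearer token is read from the API_KEY variable.
crawl_action() {
  url=$(crawl_action_url "$@") || return 1
  curl -sS -X POST "$url" -H "Authorization: Bearer $API_KEY"
}
```

With this in place, `crawl_action "$WS" "$COL" "$CRAWL" pause` pauses a connector, and a typo such as `puase` fails before any request is sent.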

Scheduling with Cron

Connectors support standard 5-field cron expressions (* * * * *).

Example	Schedule
`0 */6 * * *`	Every 6 hours
`0 0 * * *`	Daily at midnight
`*/30 * * * *`	Every 30 minutes
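In a standard 5-field expression the fields are, in order: minute, hour, day of month, month, and day of week. Because a missing field is an easy mistake to make, a quick local check before sending `cronSchedule` to the API can help. The sketch below only counts and labels fields; it does not validate value ranges, and the function name `describe_cron` is hypothetical.

```shell
# Split a cron expression into its five fields and label them.
# Runs in a subshell so that disabling globbing does not affect the caller.
describe_cron() (
  set -f            # keep * literal instead of expanding it to file names
  set -- $1         # split the expression on whitespace
  if [ "$#" -ne 5 ]; then
    echo "expected 5 fields, got $#" >&2
    exit 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
)
```

For instance, `describe_cron "0 */6 * * *"` labels `0` as the minute and `*/6` as the hour, which reads as "at minute 0 of every sixth hour".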
