Connectors (Crawlers)
Automatically import documents from external sources into your collections using S3, Web, Google Drive, or FTP/SFTP crawlers.
Beyond direct uploads, documents can be automatically fetched from external sources using Connectors (also called Crawlers). Connectors link your collection to remote file systems and websites, polling for new or updated files on a schedule you define.
```text
┌────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│ External       │───►│ Crawler          │───►│ Collection       │
│ Source         │    │ Worker           │    │ Documents        │
│                │    │                  │    │                  │
│ • S3 Bucket    │    │ 1. List source   │    │ Ingested &       │
│ • Web Page     │    │ 2. Download new  │    │ indexed for      │
│ • Google Drive │    │ 3. Upload to     │    │ search/chat      │
│ • FTP/SFTP     │    │    collection    │    │                  │
└────────────────┘    └──────────────────┘    └──────────────────┘
```

Every file discovered by a connector goes through the same ingestion pipeline (parse → chunk → embed) as a manually uploaded document, so once imported it appears in your collection like any other document.
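The "list source" and "download new" steps amount to comparing the current listing of a source with the previous one. The worker's actual change-detection logic is not documented on this page, so the sketch below is only an illustration. It assumes a hypothetical listing format with one `path<TAB>version` line per file, where `version` stands for something like an S3 ETag or a modification time; `changed_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, NOT the crawler worker's real implementation.
# Assumed (hypothetical) listing format: one "path<TAB>version" line per file,
# where version is e.g. an ETag or a modification timestamp.

# changed_files PREVIOUS_LISTING CURRENT_LISTING
# Prints the paths whose (path, version) pair did not appear in the previous
# listing, i.e. files that are new or whose version changed.
changed_files() {
  local prev="$1" curr="$2"
  # comm -13 keeps only lines unique to the second (current) input.
  # comm needs both inputs sorted with the same collation, so force C locale.
  LC_ALL=C comm -13 <(LC_ALL=C sort -u "$prev") <(LC_ALL=C sort -u "$curr") \
    | cut -f1   # keep just the path column (tab-delimited)
}
```

For example, if the previous listing holds `a.pdf v1` and `b.pdf v1` and the current one holds `a.pdf v1`, `b.pdf v2` and `c.pdf v1`, `changed_files` prints `b.pdf` and `c.pdf`. Comparing (path, version) pairs rather than bare paths is what lets an updated file be picked up again; files removed from the source are not reported by this sketch.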
📂 Supported Source Types
| Type | Source | Authentication | Description |
|---|---|---|---|
| `S3` | S3-compatible bucket (e.g. AWS S3, MinIO) | Access key and secret embedded in the URL | Sync files from a bucket prefix. |
| `WEB` | Website URL | None, Basic, or Bearer | Crawl and index web pages recursively. |
| `GOOGLE_DRIVE` | Google Drive folder | OAuth 2.0 | Watch a shared folder for new files. |
| `FTP` | FTP server | Username / password | Download files from a remote directory. |
| `SFTP` | SFTP (SSH) server | Username / password | Securely download files over SSH. |
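Because S3 credentials travel inside the connector URL, characters that are common in secret keys, such as `/`, `+` or `@`, collide with URL syntax. A minimal sketch, assuming the connector parses the URL like a standard URL parser and decodes percent-encoded credentials (this page does not confirm it), is to percent-encode the access key and secret before building the URL. `urlencode` and `build_s3_url` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. ASSUMPTION: the connector decodes percent-encoded
# credentials in the URL's userinfo part, as standard URL parsers do.

# urlencode STRING: percent-encode every byte outside the RFC 3986
# "unreserved" set (A-Z a-z 0-9 - . _ ~).
urlencode() {
  local LC_ALL=C   # iterate over bytes, not multi-byte characters
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric (byte) value.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# build_s3_url ACCESS_KEY SECRET_KEY HOST_AND_PATH
# HOST_AND_PATH is e.g. "minio.local:9000/my-bucket/data".
build_s3_url() {
  printf 's3://%s:%s@%s\n' "$(urlencode "$1")" "$(urlencode "$2")" "$3"
}
```

For example, `build_s3_url admin 'p@ss/w+rd' 'minio.local:9000/my-bucket/data'` prints `s3://admin:p%40ss%2Fw%2Brd@minio.local:9000/my-bucket/data`, which can then be used as the `url` field when creating the connector.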
🛠️ Connector Management
Connectors are managed at the collection level. For a detailed technical reference of every field and parameter, see the API Connector Reference.
Create a Connector
Create a connector by sending a `POST` request to the collection's `crawls` endpoint. The request body specifies the source type (`crawlType`, e.g. `S3`), the source location (`url`), and the sync schedule (`cronSchedule`).
```bash
# Example: creating an S3 connector that syncs every 6 hours.
# S3 credentials are embedded in the URL:
#   s3://ACCESS_KEY:SECRET_KEY@host:port/bucket/prefix
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "crawlType": "S3",
    "url": "s3://admin:password@minio.local:9000/my-bucket/data",
    "cronSchedule": "0 */6 * * *"
  }'
```

Read, List & Update
To manage existing connectors and their sync schedules, use the following endpoints (see the API Connector Reference for request details):
- List Connectors: Retrieve all active connectors within a collection.
- Read Connector: Fetch the configuration, authentication status, and last run metadata.
- Update Connector: Rotate credentials or modify the sync schedule.
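This page does not spell out the HTTP methods or paths for these three operations. The sketch below assumes REST conventions extrapolated from the documented `crawls` and `crawls/{crawl_id}` paths: `GET` on `crawls` to list, `GET` on `crawls/{crawl_id}` to read, and `PATCH` with a partial JSON body to update. Confirm the actual methods and body fields in the API Connector Reference before relying on them. The function names, the `AXELERED_API_KEY` variable and the `CURL` dry-run override are all hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the GET/PATCH methods below follow REST conventions and are
# not confirmed by this page; check the API Connector Reference.
API_BASE="https://api.axelered.com/v1"

# crawls_url WORKSPACE_ID COLLECTION_ID: base URL of a collection's connectors.
crawls_url() {
  printf '%s/w/%s/col/%s/crawls\n' "$API_BASE" "$1" "$2"
}

# Set CURL=echo to print a request instead of sending it.
list_connectors() {  # list_connectors WORKSPACE_ID COLLECTION_ID
  ${CURL:-curl} -X GET "$(crawls_url "$1" "$2")" \
    -H "Authorization: Bearer ${AXELERED_API_KEY:?set AXELERED_API_KEY}"
}

read_connector() {  # read_connector WORKSPACE_ID COLLECTION_ID CRAWL_ID
  ${CURL:-curl} -X GET "$(crawls_url "$1" "$2")/$3" \
    -H "Authorization: Bearer ${AXELERED_API_KEY:?set AXELERED_API_KEY}"
}

# Change only the sync schedule (hypothetical partial-update body).
# CRON is inserted into the JSON verbatim, so it must not contain quotes.
update_schedule() {  # update_schedule WORKSPACE_ID COLLECTION_ID CRAWL_ID CRON
  ${CURL:-curl} -X PATCH "$(crawls_url "$1" "$2")/$3" \
    -H "Authorization: Bearer ${AXELERED_API_KEY:?set AXELERED_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"cronSchedule\": \"$4\"}"
}
```

For example, `CURL=echo update_schedule "$WS" "$COL" "$CRAWL" '0 0 * * *'` prints the `PATCH` request that would switch a connector to a daily schedule, without sending it.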
Delete a Connector
Deleting a connector immediately halts its sync schedule. Documents already ingested will remain in the collection.
```bash
curl -X DELETE "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

⚙️ Operations & Scheduling
Manual Trigger
Trigger a crawl run immediately, regardless of the cron schedule:
```bash
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}/start" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Pause / Resume
Pausing prevents new scheduled runs from starting; items already being processed will still complete. Resume the connector to re-enable scheduled runs.
```bash
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}/pause" \
  -H "Authorization: Bearer YOUR_API_KEY"
curl -X POST "https://api.axelered.com/v1/w/{workspace_id}/col/{collection_id}/crawls/{crawl_id}/resume" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Scheduling with Cron
Connectors support standard 5-field cron expressions, with fields in the order `minute hour day-of-month month day-of-week` (e.g. `* * * * *` runs every minute).
| Expression | Schedule |
|---|---|
| `0 */6 * * *` | Every 6 hours (00:00, 06:00, 12:00, 18:00) |
| `0 0 * * *` | Daily at midnight |
| `*/30 * * * *` | Every 30 minutes |
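Since only 5-field expressions are supported, a 6-field expression with a leading seconds field (the format used by Quartz-style schedulers) is an easy mistake. The sketch below is a minimal client-side sanity check and assumes nothing about the service's own parser: it only verifies the field count and that each field uses cron's numeric syntax (`*`, digits, `,`, `-`, `/`). It therefore rejects month or weekday names like `MON` and macros like `@daily` even if the service happens to accept them, and it does not check value ranges. `is_cron5` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical client-side check, not the service's own validator.
# is_cron5 EXPR succeeds if EXPR has exactly five whitespace-separated fields
# (minute hour day-of-month month day-of-week), each made only of digits,
# '*', ',', '-' and '/'. Value ranges (e.g. minute <= 59) are NOT checked.
is_cron5() {
  local -a fields
  local f
  # read splits on whitespace and, unlike an unquoted $1, never glob-expands '*'
  read -r -a fields <<< "$1"
  (( ${#fields[@]} == 5 )) || return 1
  for f in "${fields[@]}"; do
    [[ $f =~ ^[0-9*,/-]+$ ]] || return 1
  done
  return 0
}
```

For example, `is_cron5 '0 */6 * * *'` succeeds, while `is_cron5 '0 0 */6 * * *'` (six fields, with a leading seconds field) fails, so a check like this can run before the expression is sent as `cronSchedule`.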