Batch Processing

Curacel Doc Extractor supports batch processing for handling multiple documents efficiently. This guide covers how to use batch processing features and best practices.

Overview

Batch processing allows you to:

  • Process up to 10 documents in a single request
  • Monitor processing status for large batches
  • Retrieve results asynchronously
  • Optimize API usage and reduce overhead
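
Because a single request holds at most 10 documents, a longer file list has to be split on the client before submission. The following is a minimal sketch of that split; the helper name `chunk_files` and the sample file list are illustrative and not part of the Doc Extractor API.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 10  # per-request limit stated in the overview


def chunk_files(files: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of `files`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(files), size):
        yield list(files[start:start + size])


# Hypothetical input: 23 URL-type file entries in the request format used by this guide.
files = [
    {
        "type": "url",
        "file": {
            "name": f"document{i}.pdf",
            "content": f"https://example.com/document{i}.pdf",
        },
    }
    for i in range(1, 24)
]

batches = list(chunk_files(files))
print([len(batch) for batch in batches])  # → [10, 10, 3]
```

Each resulting list can then be sent as the "files" array of its own batch request.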

Basic Batch Processing

Making a Batch Request

curl -X POST "https://extract.curacel.co/api/annotate/batch" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "type": "url",
        "file": {
          "name": "document1.pdf",
          "content": "https://example.com/document1.pdf"
        }
      },
      {
        "type": "url",
        "file": {
          "name": "document2.pdf",
          "content": "https://example.com/document2.pdf"
        }
      }
    ],
    "fields": ["first_name", "last_name", "email", "phone_number"]
  }'
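
The same request can be assembled programmatically. The sketch below builds it with Python's standard `urllib.request` module and mirrors the curl call above; the helper name `build_batch_request` is an illustration, and the request is constructed but not sent so the example runs without network access or a real API key.

```python
import json
import urllib.request
from typing import Dict, List

BATCH_URL = "https://extract.curacel.co/api/annotate/batch"


def build_batch_request(api_key: str, documents: Dict[str, str],
                        fields: List[str]) -> urllib.request.Request:
    """Build (but do not send) a batch request for URL-type documents.

    `documents` maps a file name to the URL the document can be fetched from.
    """
    payload = {
        "files": [
            {"type": "url", "file": {"name": name, "content": url}}
            for name, url in documents.items()
        ],
        "fields": fields,
    }
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_request(
    "your_api_key_here",
    {
        "document1.pdf": "https://example.com/document1.pdf",
        "document2.pdf": "https://example.com/document2.pdf",
    },
    ["first_name", "last_name", "email", "phone_number"],
)

# To actually send it:
# with urllib.request.urlopen(request, timeout=60) as resp:
#     body = json.load(resp)
```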

Batch Response

{
  "success": true,
  "data": {
    "document1.pdf": {
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com",
      "phone_number": "+1234567890"
    },
    "document2.pdf": {
      "first_name": "Jane",
      "last_name": "Smith",
      "email": "jane.smith@example.com",
      "phone_number": "+0987654321"
    }
  },
  "message": "Batch extraction completed successfully"
}

Asynchronous Processing

For large batches, processing may be asynchronous. In this case, you'll receive a batch ID to track the job.

Asynchronous Response

{
  "success": true,
  "data": {
    "batch_id": "batch_123456789",
    "status": "processing",
    "total_files": 10,
    "files_processed": 0,
    "estimated_completion": "2024-01-15T10:30:00Z"
  },
  "message": "Batch processing started"
}
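
Since the same endpoint can answer with either finished results or a job handle, client code needs to tell the two shapes apart. One possible check, assuming the asynchronous form is recognizable by a string-valued "batch_id" inside "data" as in the sample above (the helper name `get_batch_id` is hypothetical):

```python
from typing import Optional


def get_batch_id(response: dict) -> Optional[str]:
    """Return the batch ID if the response describes an asynchronous job, else None."""
    data = response.get("data") or {}
    batch_id = data.get("batch_id")
    # In a synchronous response every value under "data" is a dict of extracted
    # fields, so a plain string here is taken to mean "asynchronous job".
    return batch_id if isinstance(batch_id, str) else None


async_response = {
    "success": True,
    "data": {"batch_id": "batch_123456789", "status": "processing",
             "total_files": 10, "files_processed": 0},
    "message": "Batch processing started",
}
sync_response = {
    "success": True,
    "data": {"document1.pdf": {"first_name": "John", "last_name": "Doe"}},
    "message": "Batch extraction completed successfully",
}

print(get_batch_id(async_response))  # → batch_123456789
print(get_batch_id(sync_response))   # → None
```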

Checking Job Status

curl -X GET "https://extract.curacel.co/api/annotate/status/batch_123456789" \
  -H "X-API-Key: your_api_key_here"

Status Response

{
  "success": true,
  "data": {
    "job_id": "batch_123456789",
    "status": "processing",
    "progress": 60,
    "created_at": "2024-01-15T10:00:00Z",
    "updated_at": "2024-01-15T10:15:00Z",
    "files_processed": 6,
    "total_files": 10,
    "estimated_completion": "2024-01-15T10:30:00Z"
  },
  "message": "Job status retrieved successfully"
}
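
In practice the status endpoint is polled until the job leaves the "processing" state. The sketch below shows one way to structure that loop. It takes the status call as an injected function so it can run without network access; the name `wait_for_job`, the polling interval, and the timeout are illustrative choices, and the "completed" status in the demo is an assumed value, since this guide only shows "processing".

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 5.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `fetch_status` until the job is no longer "processing".

    `fetch_status` must return a parsed status response (a dict with a "data"
    object containing "status"). Returns that final "data" object.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()["data"]
        if data["status"] != "processing":
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("job still processing after %.0f seconds" % timeout)
        sleep(interval)


# Demo with canned responses; "completed" is a hypothetical terminal status.
responses = iter([
    {"success": True, "data": {"status": "processing", "files_processed": 6, "total_files": 10}},
    {"success": True, "data": {"status": "completed", "files_processed": 10, "total_files": 10}},
])
final = wait_for_job(lambda: next(responses), interval=0.0)
print(final["status"], final["files_processed"])  # → completed 10
```

Stopping on any status other than "processing" avoids hard-coding terminal status names that this guide does not list; callers should still inspect the returned status before fetching results.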

Retrieving Results

curl -X GET "https://extract.curacel.co/api/annotate/result/batch_123456789" \
  -H "X-API-Key: your_api_key_here"

Results Response

{
  "success": true,
  "data": {
    "document1.pdf": {
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com",
      "phone_number": "+1234567890"
    },
    "document2.pdf": {
      "first_name": "Jane",
      "last_name": "Smith",
      "email": "jane.smith@example.com",
      "phone_number": "+0987654321"
    }
  },
  "message": "Results retrieved successfully"
}
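
Results are keyed by document name, so it is straightforward to check each document against the fields that were requested. The following sketch reports fields that are absent, null, or empty. How the API represents a field it could not extract is not documented here, so treating all three cases as "missing" is an assumption, and the helper name `missing_fields` is hypothetical.

```python
from typing import Dict, List


def missing_fields(response: dict, fields: List[str]) -> Dict[str, List[str]]:
    """Map each document name to the requested fields it lacks.

    Documents with every field present are left out of the report.
    """
    report: Dict[str, List[str]] = {}
    for name, extracted in response.get("data", {}).items():
        # Assumption: a failed extraction shows up as a missing key, null, or "".
        absent = [field for field in fields if extracted.get(field) in (None, "")]
        if absent:
            report[name] = absent
    return report


# Sample shaped like the results response above, with one email left out.
results = {
    "success": True,
    "data": {
        "document1.pdf": {"first_name": "John", "last_name": "Doe",
                          "email": "john.doe@example.com", "phone_number": "+1234567890"},
        "document2.pdf": {"first_name": "Jane", "last_name": "Smith",
                          "email": None, "phone_number": "+0987654321"},
    },
    "message": "Results retrieved successfully",
}

print(missing_fields(results, ["first_name", "last_name", "email", "phone_number"]))
# → {'document2.pdf': ['email']}
```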

Common Issues

  1. Batch Size Exceeded

    • Error: "Batch size exceeds maximum limit"
    • Solution: Split the request into batches of at most 10 documents
  2. File Size Too Large

    • Error: "File size exceeds maximum limit"
    • Solution: Compress or split large files
  3. Processing Timeout

    • Error: "Processing timeout"
    • Solution: Use asynchronous processing for large batches
  4. Rate Limit Exceeded

    • Error: "Rate limit exceeded"
    • Solution: Throttle requests on the client side and retry failed calls after a delay
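
For the rate-limit case, a common retry pattern is exponential backoff with jitter. The sketch below assumes the error arrives as a parsed response whose "message" contains the text "Rate limit exceeded"; the exact error format and HTTP status code are not specified in this guide, so adapt the check to what the API actually returns. The name `call_with_backoff` and the delay values are illustrative.

```python
import random
import time
from typing import Callable

RATE_LIMIT_MESSAGE = "Rate limit exceeded"  # error text listed above


def call_with_backoff(call: Callable[[], dict], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Invoke `call` until it returns a response that is not rate limited.

    Waits base_delay * 2**attempt seconds plus random jitter between attempts.
    """
    for attempt in range(max_attempts):
        response = call()
        # Assumption: rate limiting is signalled through the "message" field.
        if RATE_LIMIT_MESSAGE not in str(response.get("message", "")):
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)


# Demo with canned responses; delays are recorded instead of slept.
canned = iter([
    {"success": False, "message": "Rate limit exceeded"},
    {"success": False, "message": "Rate limit exceeded"},
    {"success": True, "data": {}, "message": "Batch extraction completed successfully"},
])
delays = []
result = call_with_backoff(lambda: next(canned), sleep=delays.append)
print(result["success"], len(delays))  # → True 2
```

The jitter spreads retries from concurrent clients apart so they do not hit the limit again in lockstep.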