Batch Processing
Curacel Doc Extractor supports batch processing for handling multiple documents efficiently. This guide covers how to use batch processing features and best practices.
Overview
Batch processing allows you to:
- Process up to 10 documents in a single request
- Monitor processing status for large batches
- Retrieve results asynchronously
- Optimize API usage and reduce overhead
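Because a single request accepts at most 10 documents, larger workloads have to be split on the client before they are submitted. The following is a minimal sketch of that split, assuming the limit applies to the number of entries in the `files` array; `chunk_files` and `MAX_BATCH_SIZE` are illustrative names and not part of the API.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 10  # per-request document limit stated in this guide


def chunk_files(files: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of `files`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(files), size):
        yield list(files[start:start + size])


# Example: 23 file entries are split into three batches.
entries = [
    {"type": "url", "file": {"name": f"doc{i}.pdf", "content": f"https://example.com/doc{i}.pdf"}}
    for i in range(23)
]
batch_sizes = [len(batch) for batch in chunk_files(entries)]
```

Each yielded batch can then be sent as the `files` value of its own batch request.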
Basic Batch Processing
Making a Batch Request
curl -X POST "https://extract.curacel.co/api/annotate/batch" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "type": "url",
        "file": {
          "name": "document1.pdf",
          "content": "https://example.com/document1.pdf"
        }
      },
      {
        "type": "url",
        "file": {
          "name": "document2.pdf",
          "content": "https://example.com/document2.pdf"
        }
      }
    ],
    "fields": ["first_name", "last_name", "email", "phone_number"]
  }'
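The same request can be assembled programmatically. The sketch below mirrors the curl call using only the Python standard library; `build_batch_payload` and `build_request` are hypothetical helper names, and the request is prepared but not sent.

```python
import json
import urllib.request
from typing import Dict, List

# Endpoint taken from the curl example above.
API_URL = "https://extract.curacel.co/api/annotate/batch"


def build_batch_payload(urls: Dict[str, str], fields: List[str]) -> dict:
    """Map {file name: file URL} pairs onto the `files` structure of the request body."""
    return {
        "files": [
            {"type": "url", "file": {"name": name, "content": url}}
            for name, url in urls.items()
        ],
        "fields": list(fields),
    }


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request with the JSON body and headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


payload = build_batch_payload(
    {
        "document1.pdf": "https://example.com/document1.pdf",
        "document2.pdf": "https://example.com/document2.pdf",
    },
    ["first_name", "last_name", "email", "phone_number"],
)
request = build_request(payload, "your_api_key_here")
```

Passing `request` to `urllib.request.urlopen` and decoding the body with `json.load` would send it and parse the reply.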
Batch Response
{
  "success": true,
  "data": {
    "document1.pdf": {
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com",
      "phone_number": "+1234567890"
    },
    "document2.pdf": {
      "first_name": "Jane",
      "last_name": "Smith",
      "email": "jane.smith@example.com",
      "phone_number": "+0987654321"
    }
  },
  "message": "Batch extraction completed successfully"
}
Asynchronous Processing
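Large batches may be processed asynchronously. In that case, the response contains a batch ID instead of extraction results; use it to check the job's status and retrieve the results once processing finishes.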
Asynchronous Response
{
  "success": true,
  "data": {
    "batch_id": "batch_123456789",
    "status": "processing",
    "total_files": 10,
    "files_processed": 0,
    "estimated_completion": "2024-01-15T10:30:00Z"
  },
  "message": "Batch processing started"
}
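Since the same endpoint can answer with either extraction results or a job descriptor, client code needs to tell the two apart. A minimal sketch, assuming that only asynchronous responses carry a string `batch_id` inside `data`; `extract_batch_id` is an illustrative name.

```python
from typing import Optional


def extract_batch_id(response: dict) -> Optional[str]:
    """Return the batch ID if the response describes an async job, else None."""
    data = response.get("data") or {}
    # Assumption: synchronous responses map file names to dicts of fields,
    # so a string-valued "batch_id" only appears in asynchronous responses.
    batch_id = data.get("batch_id")
    return batch_id if isinstance(batch_id, str) else None


sync_response = {
    "success": True,
    "data": {"document1.pdf": {"first_name": "John", "last_name": "Doe"}},
    "message": "Batch extraction completed successfully",
}
async_response = {
    "success": True,
    "data": {"batch_id": "batch_123456789", "status": "processing", "total_files": 10},
    "message": "Batch processing started",
}

sync_id = extract_batch_id(sync_response)
async_id = extract_batch_id(async_response)
```

A `None` result means the extracted fields are already in `data`; otherwise the returned ID is what the status and result endpoints expect.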
Checking Job Status
curl -X GET "https://extract.curacel.co/api/annotate/status/batch_123456789" \
  -H "X-API-Key: your_api_key_here"
Status Response
{
  "success": true,
  "data": {
    "job_id": "batch_123456789",
    "status": "processing",
    "progress": 60,
    "created_at": "2024-01-15T10:00:00Z",
    "updated_at": "2024-01-15T10:15:00Z",
    "files_processed": 6,
    "total_files": 10,
    "estimated_completion": "2024-01-15T10:30:00Z"
  },
  "message": "Job status retrieved successfully"
}
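Checking status is usually done in a loop until the job finishes. The sketch below polls through a caller-supplied `fetch_status` function (hypothetical; it would wrap the GET request above) so that the loop itself stays independent of any HTTP library. This guide only shows the `"processing"` status, so the loop treats any other value as final; the `"completed"` value in the example is an assumption.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job leaves the "processing" state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()["data"]
        if data["status"] != "processing":
            return data  # any non-"processing" status is treated as final
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not finish before the timeout")
        sleep(interval)


# Example with canned responses instead of real HTTP calls.
responses = iter([
    {"success": True, "data": {"status": "processing", "progress": 60}},
    {"success": True, "data": {"status": "completed", "progress": 100}},  # assumed status value
])
final = wait_for_batch(lambda: next(responses), sleep=lambda _seconds: None)
```

Injecting `sleep` keeps the function easy to test; in production the default `time.sleep` applies the polling interval.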
Retrieving Results
curl -X GET "https://extract.curacel.co/api/annotate/result/batch_123456789" \
  -H "X-API-Key: your_api_key_here"
Results Response
{
  "success": true,
  "data": {
    "document1.pdf": {
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com",
      "phone_number": "+1234567890"
    },
    "document2.pdf": {
      "first_name": "Jane",
      "last_name": "Smith",
      "email": "jane.smith@example.com",
      "phone_number": "+0987654321"
    }
  },
  "message": "Results retrieved successfully"
}
Common Issues
- Batch Size Exceeded
  - Error: "Batch size exceeds maximum limit"
  - Solution: Split the request into batches of at most 10 documents

- File Size Too Large
  - Error: "File size exceeds maximum limit"
  - Solution: Compress or split large files

- Processing Timeout
  - Error: "Processing timeout"
  - Solution: Use asynchronous processing for large batches

- Rate Limit Exceeded
  - Error: "Rate limit exceeded"
  - Solution: Implement client-side rate limiting and retry failed requests with backoff
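Retry logic for rate-limit errors is commonly written as exponential backoff with jitter. The following is one possible sketch, not a prescribed implementation: `RateLimitError` is a hypothetical exception that the caller's HTTP layer would raise when it sees the "Rate limit exceeded" error, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised by the caller's HTTP layer on a rate-limit error."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


# Example: a call that is rate limited twice before succeeding.
calls = {"n": 0}
delays = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("Rate limit exceeded")
    return "ok"


result = with_retries(flaky, sleep=delays.append)
```

Here `delays` records the waits that would have been applied, roughly one second and then two seconds before jitter.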