Getting Started
Welcome to Curacel Doc Extractor! This guide will help you make your first API call and get started with document processing.
Prerequisites
Before you begin, ensure you have:
- An API key for the production environment
- Basic knowledge of HTTP requests and JSON
Quick Start
Step 1: Set Up Your Environment
First, set up your environment with the necessary credentials:
# Set your API key
export DOC_EXTRACTOR_API_KEY="your_production_api_key_here"
# Set the base URL for production
export DOC_EXTRACTOR_BASE_URL="https://extract.curacel.co/api"
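In application code, the same two variables can be read once at startup and checked before any request is sent. The following is a minimal sketch rather than an official client: the helper name `load_config` is hypothetical, and only the two variable names come from the step above.

```python
import os


def load_config(env=None):
    """Read the API key and base URL from environment variables.

    load_config is a hypothetical helper; only the variable names
    DOC_EXTRACTOR_API_KEY and DOC_EXTRACTOR_BASE_URL come from this guide.
    """
    env = os.environ if env is None else env
    api_key = env.get("DOC_EXTRACTOR_API_KEY", "").strip()
    # Drop a trailing slash so paths can be appended as base_url + "/..."
    base_url = env.get("DOC_EXTRACTOR_BASE_URL", "").strip().rstrip("/")
    if not api_key:
        raise RuntimeError("DOC_EXTRACTOR_API_KEY is not set")
    if not base_url:
        raise RuntimeError("DOC_EXTRACTOR_BASE_URL is not set")
    return {"api_key": api_key, "base_url": base_url}
```

Failing fast here gives a clearer message than a 401 response later on.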
Step 2: Make Your First API Call
Let's extract data from a sample document:
curl -X POST "$DOC_EXTRACTOR_BASE_URL/annotate" \
  -H "X-API-Key: $DOC_EXTRACTOR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "type": "url",
        "file": {
          "name": "sample.pdf",
          "content": "https://example.com/sample.pdf"
        }
      }
    ],
    "fields": ["first_name", "last_name", "email", "phone_number"],
    "location": "Kenya",
    "data_type": "finance"
  }'
Step 3: Understand the Response
A successful response will look like this:
{
  "success": true,
  "data": {
    "sample.pdf": {
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com",
      "phone_number": "+1234567890"
    }
  },
  "message": "Data extracted successfully"
}
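Reading the extracted values out of this response might look like the sketch below. It assumes that `data` is an object keyed by file name, as the sample suggests; `fields_by_file` is a hypothetical helper name, and the real response may carry additional keys.

```python
def fields_by_file(response_body):
    """Return {file_name: {field: value}} from a parsed response body.

    Assumes the shape of the sample response: a boolean "success" flag
    and a "data" object keyed by file name.
    """
    if not response_body.get("success"):
        raise ValueError(response_body.get("message", "Extraction failed"))
    data = response_body.get("data") or {}
    return {name: dict(fields) for name, fields in data.items()}


sample = {
    "success": True,
    "data": {"sample.pdf": {"first_name": "John", "last_name": "Doe"}},
    "message": "Data extracted successfully",
}
for name, fields in fields_by_file(sample).items():
    print(name, fields["first_name"], fields["last_name"])
# prints: sample.pdf John Doe
```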
Understanding the Request Structure
File Input Types
This guide covers two types of file input:
1. URL Input
{
  "type": "url",
  "file": {
    "name": "document.pdf",
    "content": "https://example.com/document.pdf"
  }
}
2. Base64 Input
{
  "type": "base64",
  "file": {
    "name": "document.pdf",
    "content": "data:application/pdf;base64,JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9NZWRpYUJveCBbMCAwIDU5NSA4NDJdCi9SZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDIgMCBSCj4+Cj4+Ci9Db250ZW50cyA0IDAgUgo+PgoKZW5kb2JqCg=="
  }
}
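A Base64 entry like the one above can be produced from raw file bytes with the Python standard library. This is a sketch under one assumption: that the service expects a data URI whose MIME type matches the file, as in the example. The helper name `build_base64_input` is hypothetical.

```python
import base64
import mimetypes


def build_base64_input(name, raw_bytes):
    """Build a "base64" file entry shaped like the example above."""
    # Guess the MIME type from the file name; fall back to a generic type
    mime_type, _ = mimetypes.guess_type(name)
    mime_type = mime_type or "application/octet-stream"
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {
        "type": "base64",
        "file": {
            "name": name,
            "content": f"data:{mime_type};base64,{encoded}",
        },
    }
```

For a file on disk, read it in binary mode first, for example `build_base64_input("document.pdf", open("document.pdf", "rb").read())`.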
Field Configuration
Specify which fields you want to extract:
{
  "fields": [
    "first_name",
    "last_name",
    "email",
    "phone_number",
    "address",
    "date_of_birth"
  ]
}
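When the field list comes from user input or configuration, it can help to clean it before building the request body. The sketch below uses a hypothetical helper, `build_payload`. It assumes that at least one file and one field are required, which the 400 error message in this guide suggests but does not state outright.

```python
def build_payload(files, fields):
    """Assemble a request body with "files" and "fields" keys.

    Field names are treated as free-form strings; which names the
    service recognizes is not covered by this guide.
    """
    cleaned = []
    for field in fields:
        field = field.strip()
        # Skip blanks and duplicates while preserving the original order
        if field and field not in cleaned:
            cleaned.append(field)
    if not files:
        raise ValueError("at least one file entry is required")
    if not cleaned:
        raise ValueError("at least one field name is required")
    return {"files": list(files), "fields": cleaned}
```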
Code Examples
JavaScript/Node.js
const axios = require("axios");

async function extractDocumentData() {
  try {
    const response = await axios.post(
      "https://extract.curacel.co/api/annotate",
      {
        files: [
          {
            type: "url",
            file: {
              name: "document.pdf",
              content: "https://example.com/document.pdf",
            },
          },
        ],
        fields: ["first_name", "last_name", "email", "phone_number"],
      },
      {
        headers: {
          "X-API-Key": process.env.DOC_EXTRACTOR_API_KEY,
          "Content-Type": "application/json",
        },
      },
    );
    console.log("Extracted data:", response.data);
    return response.data;
  } catch (error) {
    // error.response is set only when the server replied with a non-2xx status
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

// Usage: the error is already logged above, so only set a failing exit code
extractDocumentData().catch(() => {
  process.exitCode = 1;
});
Python
import os

import requests


def extract_document_data():
    url = 'https://extract.curacel.co/api/annotate'
    headers = {
        # os.environ[...] raises KeyError early if the key is not set
        'X-API-Key': os.environ['DOC_EXTRACTOR_API_KEY'],
        'Content-Type': 'application/json'
    }
    data = {
        'files': [
            {
                'type': 'url',
                'file': {
                    'name': 'document.pdf',
                    'content': 'https://example.com/document.pdf'
                }
            }
        ],
        'fields': ['first_name', 'last_name', 'email', 'phone_number']
    }
    try:
        # A timeout keeps the call from hanging indefinitely
        response = requests.post(url, json=data, headers=headers, timeout=30)
        response.raise_for_status()
        result = response.json()
        print('Extracted data:', result)
        return result
    except requests.exceptions.RequestException as e:
        print('Error:', e)
        raise


# Usage
extract_document_data()
PHP
<?php
function extractDocumentData() {
    $url = 'https://extract.curacel.co/api/annotate';
    $headers = [
        // getenv() works even when $_ENV is not populated (see variables_order)
        'X-API-Key: ' . getenv('DOC_EXTRACTOR_API_KEY'),
        'Content-Type: application/json'
    ];
    $data = [
        'files' => [
            [
                'type' => 'url',
                'file' => [
                    'name' => 'document.pdf',
                    'content' => 'https://example.com/document.pdf'
                ]
            ]
        ],
        'fields' => ['first_name', 'last_name', 'email', 'phone_number']
    ];

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($ch);
    $curlError = curl_error($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // curl_exec() returns false on transport errors (DNS, TLS, timeout)
    if ($response === false) {
        echo 'Error: ' . $curlError . PHP_EOL;
        return false;
    }

    if ($httpCode === 200) {
        $result = json_decode($response, true);
        echo 'Extracted data: ' . json_encode($result) . PHP_EOL;
        return $result;
    } else {
        echo 'Error: HTTP ' . $httpCode . ' - ' . $response . PHP_EOL;
        return false;
    }
}

// Usage
extractDocumentData();
?>
Batch Processing
For processing multiple documents at once:
curl -X POST "$DOC_EXTRACTOR_BASE_URL/annotate" \
  -H "X-API-Key: $DOC_EXTRACTOR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "type": "url",
        "file": {
          "name": "document1.pdf",
          "content": "https://example.com/document1.pdf"
        }
      },
      {
        "type": "url",
        "file": {
          "name": "document2.pdf",
          "content": "https://example.com/document2.pdf"
        }
      }
    ],
    "fields": ["first_name", "last_name", "email", "phone_number"]
  }'
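For longer batches, the `files` array can be generated from a list of URLs instead of being written by hand. In the sketch below, `url_inputs` is a hypothetical helper, and taking each `name` from the last path segment of the URL is an assumption made for illustration. This guide does not state a maximum batch size, so none is enforced here.

```python
import posixpath
from urllib.parse import urlparse


def url_inputs(urls):
    """Turn a list of document URLs into "url" file entries."""
    entries = []
    for url in urls:
        # Use the last path segment as the file name, ignoring any query string
        name = posixpath.basename(urlparse(url).path) or "document"
        entries.append({"type": "url", "file": {"name": name, "content": url}})
    return entries


batch = {
    "files": url_inputs([
        "https://example.com/document1.pdf",
        "https://example.com/document2.pdf",
    ]),
    "fields": ["first_name", "last_name", "email", "phone_number"],
}
```

The resulting `batch` dictionary has the same shape as the JSON body in the curl command above.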
Error Handling
Common Error Responses
400 Bad Request
{
  "status": false,
  "message": "Invalid file format or missing required fields"
}
401 Unauthorized
{
  "status": false,
  "message": "Invalid API key"
}
422 Unprocessable Entity
{
  "status": false,
  "message": "Document processing failed"
}
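One way to handle these responses in client code is to convert them into a single exception type that carries the status code and the server's message. This is a sketch, not part of any official SDK: `ExtractionError`, `ERROR_HINTS`, and `raise_for_error` are hypothetical names, and only the three status codes and the `message` key come from the examples above.

```python
class ExtractionError(Exception):
    """Raised when the API answers with a non-2xx status code."""

    def __init__(self, status_code, message):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


# Fallback hints for the status codes listed in this guide
ERROR_HINTS = {
    400: "Check the file format and the required fields",
    401: "Check the X-API-Key header",
    422: "The document could not be processed",
}


def raise_for_error(status_code, body):
    """Raise ExtractionError unless status_code is in the 2xx range."""
    if 200 <= status_code < 300:
        return
    # Prefer the server's own message when the body is a JSON object
    message = body.get("message") if isinstance(body, dict) else None
    raise ExtractionError(
        status_code, message or ERROR_HINTS.get(status_code, "Unexpected error")
    )
```

With the `requests` library, this could be called as `raise_for_error(response.status_code, response.json())` in place of `response.raise_for_status()`, provided the error body is valid JSON.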
Next Steps
Now that you've made your first API call:
- Explore the API Reference: Check out all available endpoints
- Test Different Document Types: Try various file formats
- Implement Error Handling: Add robust error handling to your code
- Set Up Production: Configure your production environment
- Monitor Usage: Track your API usage and limits
Support
If you need help:
- Documentation: Check our comprehensive guides
- API Reference: Explore all available endpoints
- Support: Contact us at support@curacel.ai
- Community: Join our developer community