
Getting Started

Welcome to Curacel Doc Extractor! This guide will help you make your first API call and get started with document processing.

Prerequisites

Before you begin, ensure you have:

  • An API key for the production environment
  • Basic knowledge of HTTP requests and JSON

Quick Start

Step 1: Set Up Your Environment

First, set up your environment with the necessary credentials:

# Set your API key
export DOC_EXTRACTOR_API_KEY="your_production_api_key_here"

# Set the base URL for production
export DOC_EXTRACTOR_BASE_URL="https://extract.curacel.co/api"
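Before calling the API, it can help to fail fast if either variable is missing, rather than sending a request with an empty key. A minimal Python sketch, assuming the two variable names above; `load_config` is a hypothetical helper, not part of any Curacel SDK:

```python
import os


def load_config(env=os.environ):
    """Read the credentials exported in Step 1 (hypothetical helper).

    Raises RuntimeError naming every missing variable, so a typo in the
    export is caught before any request is sent.
    """
    api_key = env.get("DOC_EXTRACTOR_API_KEY", "").strip()
    base_url = env.get("DOC_EXTRACTOR_BASE_URL", "").strip()
    missing = [
        name
        for name, value in (
            ("DOC_EXTRACTOR_API_KEY", api_key),
            ("DOC_EXTRACTOR_BASE_URL", base_url),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Drop a trailing slash so a path such as "/annotate" can be appended safely.
    return api_key, base_url.rstrip("/")
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests.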

Step 2: Make Your First API Call

Let's extract data from a sample document:

curl -X POST "$DOC_EXTRACTOR_BASE_URL/annotate" \
-H "X-API-Key: $DOC_EXTRACTOR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "files": [
    {
      "type": "url",
      "file": {
        "name": "sample.pdf",
        "content": "https://example.com/sample.pdf"
      }
    }
  ],
  "fields": ["first_name", "last_name", "email", "phone_number"],
  "location": "Kenya",
  "data_type": "finance"
}'
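The same request body can be assembled in code. A Python sketch; `build_extract_payload` is a hypothetical helper, and it assumes `location` and `data_type` are optional, since they appear in this example but not in the language examples further down:

```python
import json


def build_extract_payload(file_entries, fields, location=None, data_type=None):
    """Assemble the JSON body used in Step 2 (hypothetical helper).

    location and data_type are only added when given; treating them as
    optional is an assumption, not documented API behaviour.
    """
    if not file_entries:
        raise ValueError("at least one file entry is required")
    if not fields:
        raise ValueError("at least one field is required")
    payload = {"files": list(file_entries), "fields": list(fields)}
    if location is not None:
        payload["location"] = location
    if data_type is not None:
        payload["data_type"] = data_type
    return payload


sample = {
    "type": "url",
    "file": {"name": "sample.pdf", "content": "https://example.com/sample.pdf"},
}
body = build_extract_payload(
    [sample],
    ["first_name", "last_name", "email", "phone_number"],
    location="Kenya",
    data_type="finance",
)
print(json.dumps(body, indent=2))
```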

Step 3: Understand the Response

A successful response will look like this:

{
  "success": true,
  "data": {
    "sample.pdf": {
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com",
      "phone_number": "+1234567890"
    }
  },
  "message": "Data extracted successfully"
}
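To work with the extracted values, you can flatten the response into a per-file mapping. A Python sketch with a hypothetical `extracted_fields` helper; because this guide shows both a `success` flag (here) and a `status` flag (in the error responses), the sketch accepts either, and it also tolerates `data` arriving as a list of per-file objects. Both choices are assumptions:

```python
def extracted_fields(response_json):
    """Return {file_name: {field: value}} from a parsed response body.

    Raises ValueError with the server's message when the flag is false.
    """
    ok = response_json.get("success", response_json.get("status"))
    if not ok:
        raise ValueError(response_json.get("message") or "extraction failed")
    data = response_json.get("data") or {}
    # Accept a single object keyed by file name, or a list of such objects.
    items = data if isinstance(data, list) else [data]
    result = {}
    for item in items:
        for file_name, values in item.items():
            result[file_name] = dict(values)
    return result


sample_response = {
    "success": True,
    "data": {
        "sample.pdf": {
            "first_name": "John",
            "last_name": "Doe",
            "email": "john.doe@example.com",
            "phone_number": "+1234567890",
        }
    },
    "message": "Data extracted successfully",
}
for file_name, values in extracted_fields(sample_response).items():
    print(file_name, values.get("email"))
```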

Understanding the Request Structure

File Input Types

The API supports the following file input types:

1. URL Input

{
  "type": "url",
  "file": {
    "name": "document.pdf",
    "content": "https://example.com/document.pdf"
  }
}

2. Base64 Input

{
  "type": "base64",
  "file": {
    "name": "document.pdf",
    "content": "data:application/pdf;base64,JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PAovVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9NZWRpYUJveCBbMCAwIDU5NSA4NDJdCi9SZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDIgMCBSCj4+Cj4+Ci9Db250ZW50cyA0IDAgUgo+PgoKZW5kb2JqCg=="
  }
}
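Both entry types can be built from a URL or a local file. A Python sketch with two hypothetical helpers, `url_file_entry` and `base64_file_entry`; deriving the name from the URL path and falling back to `application/octet-stream` for unknown extensions are assumptions, not documented API behaviour. The base64 entry uses a standard `data:` URI, matching the example above:

```python
import base64
import mimetypes
import posixpath
from pathlib import Path
from urllib.parse import urlparse


def url_file_entry(url, name=None):
    """Build a {"type": "url"} entry; name defaults to the URL's last path segment."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    name = name or posixpath.basename(parsed.path) or "document"
    return {"type": "url", "file": {"name": name, "content": url}}


def base64_file_entry(path):
    """Build a {"type": "base64"} entry holding a data URI of a local file."""
    path = Path(path)
    # Guess the MIME type from the extension (.pdf -> application/pdf).
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "type": "base64",
        "file": {"name": path.name, "content": f"data:{mime};base64,{encoded}"},
    }
```

URL inputs keep requests small; base64 inputs are useful when the document is not reachable from the public internet.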

Field Configuration

Specify which fields you want to extract:

{
  "fields": [
    "first_name",
    "last_name",
    "email",
    "phone_number",
    "address",
    "date_of_birth"
  ]
}
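If field names come from user-facing labels, a small helper can convert them to the snake_case style used above and drop duplicates. A Python sketch; `normalize_fields` is hypothetical, and it assumes the API expects snake_case names like those shown, which these examples suggest but do not state as a rule:

```python
import re


def normalize_fields(labels):
    """Convert labels such as "Date of Birth" to snake_case and drop duplicates.

    Order of first appearance is preserved.
    """
    seen = set()
    fields = []
    for label in labels:
        # Collapse every run of non-alphanumeric characters into one underscore.
        name = re.sub(r"[^0-9a-zA-Z]+", "_", label).strip("_").lower()
        if name and name not in seen:
            seen.add(name)
            fields.append(name)
    return fields


print(normalize_fields(["First Name", "last_name", "Date of Birth", "first name"]))
```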

Code Examples

JavaScript/Node.js

const axios = require("axios");

// Send one document URL to the extraction endpoint and return the parsed JSON.
async function extractDocumentData() {
  try {
    const response = await axios.post(
      "https://extract.curacel.co/api/annotate",
      {
        files: [
          {
            type: "url",
            file: {
              name: "document.pdf",
              content: "https://example.com/document.pdf",
            },
          },
        ],
        fields: ["first_name", "last_name", "email", "phone_number"],
      },
      {
        headers: {
          // API key exported in Step 1
          "X-API-Key": process.env.DOC_EXTRACTOR_API_KEY,
          "Content-Type": "application/json",
        },
      },
    );

    console.log("Extracted data:", response.data);
    return response.data;
  } catch (error) {
    // error.response is set when the server answered with a non-2xx status
    console.error("Error:", error.response?.data || error.message);
    throw error;
  }
}

// Usage: mark the process as failed instead of leaving the rejection unhandled
extractDocumentData().catch(() => {
  process.exitCode = 1;
});

Python

import os

import requests


def extract_document_data():
    url = 'https://extract.curacel.co/api/annotate'

    headers = {
        # API key exported in Step 1
        'X-API-Key': os.getenv('DOC_EXTRACTOR_API_KEY'),
        'Content-Type': 'application/json'
    }

    data = {
        'files': [
            {
                'type': 'url',
                'file': {
                    'name': 'document.pdf',
                    'content': 'https://example.com/document.pdf'
                }
            }
        ],
        'fields': ['first_name', 'last_name', 'email', 'phone_number']
    }

    try:
        # json= serializes the body; timeout (seconds) stops the call from hanging forever
        response = requests.post(url, json=data, headers=headers, timeout=60)
        response.raise_for_status()  # raise an HTTPError on 4xx/5xx responses

        result = response.json()
        print('Extracted data:', result)
        return result

    except requests.exceptions.RequestException as e:
        print('Error:', e)
        raise


# Usage
if __name__ == '__main__':
    extract_document_data()

PHP

<?php
// Send one document URL to the extraction endpoint and return the decoded JSON.
function extractDocumentData() {
    $url = 'https://extract.curacel.co/api/annotate';

    $headers = [
        // getenv() reads the variable exported in Step 1; $_ENV is often
        // empty unless the variables_order ini setting includes "E".
        'X-API-Key: ' . getenv('DOC_EXTRACTOR_API_KEY'),
        'Content-Type: application/json'
    ];

    $data = [
        'files' => [
            [
                'type' => 'url',
                'file' => [
                    'name' => 'document.pdf',
                    'content' => 'https://example.com/document.pdf'
                ]
            ]
        ],
        'fields' => ['first_name', 'last_name', 'email', 'phone_number']
    ];

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it

    $response = curl_exec($ch);
    if ($response === false) {
        // Transport-level failure (DNS, TLS, timeout): there is no HTTP status to report
        echo 'Error: ' . curl_error($ch) . PHP_EOL;
        curl_close($ch);
        return false;
    }
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($httpCode === 200) {
        $result = json_decode($response, true);
        echo 'Extracted data: ' . json_encode($result) . PHP_EOL;
        return $result;
    } else {
        echo 'Error: HTTP ' . $httpCode . ' - ' . $response . PHP_EOL;
        return false;
    }
}

// Usage
extractDocumentData();
?>

Batch Processing

To process multiple documents in a single request, add one entry per document to the files array:

curl -X POST "https://extract.curacel.co/api/annotate" \
-H "X-API-Key: your_production_api_key_here" \
-H "Content-Type: application/json" \
-d '{
  "files": [
    {
      "type": "url",
      "file": {
        "name": "document1.pdf",
        "content": "https://example.com/document1.pdf"
      }
    },
    {
      "type": "url",
      "file": {
        "name": "document2.pdf",
        "content": "https://example.com/document2.pdf"
      }
    }
  ],
  "fields": ["first_name", "last_name", "email", "phone_number"]
}'
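For a long list of documents, you may prefer to spread them across several requests. A Python sketch, assuming the same payload shape as above; `chunked`, `batch_payloads` and the batch size of 10 are hypothetical choices, since this guide does not document a maximum number of files per request:

```python
def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_payloads(urls, fields, batch_size=10):
    """Build one request body per group of document URLs.

    batch_size=10 is a placeholder, not a documented API limit.
    """
    entries = [
        # Use the last URL path segment as the file name, e.g. "document1.pdf".
        {"type": "url", "file": {"name": url.rstrip("/").rsplit("/", 1)[-1], "content": url}}
        for url in urls
    ]
    return [{"files": group, "fields": list(fields)} for group in chunked(entries, batch_size)]


urls = [f"https://example.com/document{i}.pdf" for i in range(1, 26)]
payloads = batch_payloads(urls, ["first_name", "last_name", "email", "phone_number"])
print([len(p["files"]) for p in payloads])
```

Each payload can then be sent exactly like the single-request examples above.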

Error Handling

Common Error Responses

400 Bad Request

{
  "status": false,
  "message": "Invalid file format or missing required fields"
}

401 Unauthorized

{
  "status": false,
  "message": "Invalid API key"
}

422 Unprocessable Entity

{
  "status": false,
  "message": "Document processing failed"
}
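The three errors above all describe problems with the request, the key, or the document itself, so repeating the identical call is unlikely to help. The Python sketch below separates those from failures that may be transient. It assumes, beyond what this guide documents, that rate limiting (429) and server errors (5xx) are worth retrying; `DocExtractorError`, `is_retryable` and `call_with_retries` are hypothetical names, and `send` is any caller-supplied function returning `(status_code, body_dict)`, which keeps the logic independent of an HTTP library:

```python
import time


class DocExtractorError(Exception):
    """Raised when the API returns a non-2xx status that will not be retried."""

    def __init__(self, status_code, message):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


def is_retryable(status_code):
    # 400, 401 and 422 point at the request itself; 429 and 5xx may be transient.
    return status_code == 429 or 500 <= status_code < 600


def call_with_retries(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying only transient failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status_code, body = send()
        if 200 <= status_code < 300:
            return body
        message = body.get("message", "") if isinstance(body, dict) else str(body)
        if not is_retryable(status_code) or attempt == max_attempts:
            raise DocExtractorError(status_code, message)
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

With `requests`, `send` could be a small function that posts the payload and returns `(response.status_code, response.json())`.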

Next Steps

Now that you've made your first API call:

  1. Explore the API Reference: Check out all available endpoints
  2. Test Different Document Types: Try various file formats
  3. Implement Error Handling: Add robust error handling to your code
  4. Set Up Production: Configure your production environment
  5. Monitor Usage: Track your API usage and limits

Support

If you need help:

  • Documentation: Check our comprehensive guides
  • API Reference: Explore all available endpoints
  • Support: Contact us at support@curacel.ai
  • Community: Join our developer community