Extract Key Clauses from Contract Images

Have your agent automatically extract parties, amounts, dates, penalty clauses, and jurisdiction from scanned contracts and output structured JSON for approval workflows or risk systems.

5/13/2026 · vlm.md · Recommended models: Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro

Scenario

Your agent processes contract scans uploaded by legal or procurement teams and automatically extracts:

  • Party A and Party B full names
  • Contract amount and currency
  • Signing date, effective date, expiry date
  • Main performance obligations (brief summary)
  • Penalty clause (if present)
  • Governing court or arbitration body

Extracted data flows into approval systems or contract registers — humans only review flagged anomalies.

| Model | When to use |
| --- | --- |
| Claude 3.5 Sonnet | Best long-document understanding; cleanest clause boundary detection; first choice |
| GPT-4o | Strong on mixed-language contracts; fast; good for batch processing |
| Gemini 1.5 Pro | Best value for very long contracts (20+ pages) with 1M-token context |

Contracts are dense and format-variable. Claude’s long-context comprehension is the most consistent of the three.

Prompt Template

You are a contract information extraction expert. Extract the following fields from the image and return ONLY valid JSON — no explanation, no markdown.

Fields:
- party_a: Full legal name of Party A (the side labeled "Party A" or "Client" in the contract)
- party_b: Full legal name of Party B
- contract_amount: Total contract value (number, no currency symbol; use the total price if multiple amounts appear)
- currency: Currency code (USD / EUR / CNY etc.)
- signing_date: Date signed (YYYY-MM-DD, or null)
- effective_date: Date the contract takes effect (YYYY-MM-DD; if not stated, same as signing_date)
- expiry_date: Contract end or termination date (YYYY-MM-DD, or null)
- obligations_summary: Party B's main obligations, max 60 words
- penalty_clause: Verbatim excerpt of the penalty/liquidated damages clause, max 60 words; null if absent
- jurisdiction: Name of governing court or arbitration body; null if absent

Return null for any field not found. Do not guess or infer.

Code

import anthropic
import base64
import json
import re
from pathlib import Path

client = anthropic.Anthropic()

PROMPT = """You are a contract information extraction expert. Extract the following fields from the image and return ONLY valid JSON — no explanation, no markdown.

Fields:
- party_a: Full legal name of Party A (the side labeled "Party A" or "Client" in the contract)
- party_b: Full legal name of Party B
- contract_amount: Total contract value (number, no currency symbol; use the total price if multiple amounts appear)
- currency: Currency code (USD / EUR / CNY etc.)
- signing_date: Date signed (YYYY-MM-DD, or null)
- effective_date: Date the contract takes effect (YYYY-MM-DD; if not stated, same as signing_date)
- expiry_date: Contract end or termination date (YYYY-MM-DD, or null)
- obligations_summary: Party B's main obligations, max 60 words
- penalty_clause: Verbatim excerpt of the penalty/liquidated damages clause, max 60 words; null if absent
- jurisdiction: Name of governing court or arbitration body; null if absent

Return null for any field not found. Do not guess or infer."""


def extract_contract(image_path: str) -> dict:
    data = base64.standard_b64encode(Path(image_path).read_bytes()).decode()
    suffix = Path(image_path).suffix.lower().lstrip(".")
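    # Map the file extension to an image media type the Messages API accepts;
    # unknown extensions fall back to JPEG.
    media_type = {
        "jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png",
        "gif": "image/gif", "webp": "image/webp",
    }.get(suffix, "image/jpeg")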

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {"type": "base64", "media_type": media_type, "data": data},
                    },
                    {"type": "text", "text": PROMPT},
                ],
            }
        ],
    )

    raw = message.content[0].text.strip()
    raw = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw, flags=re.MULTILINE).strip()
    return json.loads(raw)


if __name__ == "__main__":
    result = extract_contract("contract.jpg")
    print(json.dumps(result, indent=2))

Expected output:

{
  "party_a": "Acme Technologies Inc.",
  "party_b": "CloudServ Solutions LLC",
  "contract_amount": 240000,
  "currency": "USD",
  "signing_date": "2024-03-01",
  "effective_date": "2024-03-01",
  "expiry_date": "2025-02-28",
  "obligations_summary": "Party B shall complete system deployment within 30 days of contract execution and provide 12 months of maintenance support.",
  "penalty_clause": "For each day of delay beyond the agreed delivery date, Party B shall pay liquidated damages equal to 0.1% of the total contract value, not to exceed 10%.",
  "jurisdiction": "Superior Court of California, County of San Francisco"
}
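
The Scenario says humans only review flagged anomalies, so something has to do the flagging before a record enters the approval system. The sketch below is one minimal way to do that. The names `flag_anomalies` and `REQUIRED_FIELDS`, and the specific rules (which fields are mandatory, expiry must follow the effective date), are assumptions for illustration and not part of the recipe; adjust them to your own review policy.

```python
from datetime import date

# Hypothetical policy: fields that must be present for automatic approval.
REQUIRED_FIELDS = ("party_a", "party_b", "contract_amount", "currency", "signing_date")
DATE_FIELDS = ("signing_date", "effective_date", "expiry_date")


def _parse_date(value):
    """Return a date for an ISO date string, or None if missing or malformed."""
    if not isinstance(value, str):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def flag_anomalies(record: dict) -> list[str]:
    """Collect human-readable reasons why a record needs manual review."""
    flags = []
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            flags.append(f"missing {field}")

    amount = record.get("contract_amount")
    if amount is not None and (
        isinstance(amount, bool)
        or not isinstance(amount, (int, float))
        or amount <= 0
    ):
        flags.append("contract_amount is not a positive number")

    for field in DATE_FIELDS:
        if record.get(field) is not None and _parse_date(record[field]) is None:
            flags.append(f"{field} is not a valid ISO date")

    effective = _parse_date(record.get("effective_date"))
    expiry = _parse_date(record.get("expiry_date"))
    if effective and expiry and expiry <= effective:
        flags.append("expiry_date is not after effective_date")
    return flags
```

An empty list means the record can pass straight through; any other result routes it to a reviewer along with the reasons.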

Gotchas

Gotcha 1: Party A/B role confusion

In some contracts “Party A” is the buyer, in others it’s the service provider. The model sometimes swaps them based on assumed roles. Fix: add “Identify Party A strictly by the label ‘Party A’ in the contract text — do not infer from context.”

Gotcha 2: Multiple amounts — wrong one extracted

Contracts often list deposit, milestone payments, and total value. Without explicit guidance the model may return any of these. Add: “Use the total contract value. If not explicitly labeled as total, sum all payment amounts.”
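
The fixes in Gotchas 1 and 2 are both extra prompt lines, so it can help to keep them as separate strings and append them to the base prompt only when needed. The following is a minimal sketch; `with_extra_rules`, `PARTY_RULE`, and `AMOUNT_RULE` are hypothetical names, and the rule wording is adapted from the two gotchas above.

```python
# Rule text adapted from Gotcha 1 and Gotcha 2.
PARTY_RULE = (
    "Identify Party A strictly by the label 'Party A' in the contract text; "
    "do not infer from context."
)
AMOUNT_RULE = (
    "Use the total contract value. If not explicitly labeled as total, "
    "sum all payment amounts."
)


def with_extra_rules(base_prompt: str, rules: list[str]) -> str:
    """Append numbered rules to a prompt under an 'Additional rules:' heading."""
    if not rules:
        return base_prompt
    numbered = [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    return base_prompt.rstrip() + "\n\nAdditional rules:\n" + "\n".join(numbered)
```

Passing `with_extra_rules(PROMPT, [PARTY_RULE, AMOUNT_RULE])` as the text block in place of `PROMPT` leaves the rest of the extraction code unchanged.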

Gotcha 3: Low-resolution scans miss fine print

Penalty clauses and jurisdiction sections are often in small print (8–10pt). Scans below 150 DPI cause the model to miss or misread these. Check resolution before sending:

from PIL import Image

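def check_dpi(path: str) -> int:
    """Return the horizontal DPI recorded in the image metadata, or 72 if none is stored."""
    with Image.open(path) as img:
        # Not every file carries DPI metadata; a missing value falls back to 72,
        # which the check below treats as low resolution.
        dpi = img.info.get("dpi", (72, 72))
        return int(dpi[0])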

if check_dpi("contract.jpg") < 150:
    print("Warning: low scan resolution — extraction may be inaccurate")
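
DPI metadata can be absent or wrong (for example after a file is re-saved), so pixel dimensions give a useful second check. The sketch below estimates DPI by assuming the image is a full-page scan of a known paper size. `estimate_dpi`, `PAGE_SIZES_IN`, and the A4 default are hypothetical choices, not part of the recipe.

```python
# Physical page sizes in inches (width, height). Which one applies to a
# given scan is an assumption the caller has to make.
PAGE_SIZES_IN = {"a4": (8.27, 11.69), "letter": (8.5, 11.0)}


def estimate_dpi(width_px: int, height_px: int, page: str = "a4") -> int:
    """Estimate scan DPI from pixel dimensions, assuming a full-page scan."""
    page_w, page_h = PAGE_SIZES_IN[page]
    # Treat the shorter pixel side as the page's shorter side so that
    # landscape-oriented scans are handled as well.
    short_px, long_px = sorted((width_px, height_px))
    # Use the lower of the two per-axis estimates to stay conservative.
    return round(min(short_px / page_w, long_px / page_h))
```

With Pillow, `Image.open(path).size` supplies the width and height in pixels. A cropped scan or a photo of a page breaks the full-page assumption, so treat the result as a rough guide.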

Gotcha 4: Key clauses are on the last pages

Penalty and jurisdiction clauses almost always appear in the final pages of a contract. If you only send the first page, these fields return null. For multi-page contracts, merge all pages into one request or extract page-by-page and merge results (prefer non-null values).
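
Both options can be sketched with two small helpers. `build_multipage_content` assembles one user-message content list with an image block per page followed by the prompt, in the same block format as the extraction code above. `merge_page_results` combines per-page JSON results and keeps the first non-null value for each field. Both function names are hypothetical, and "first non-null wins" is only one possible reading of "prefer non-null values".

```python
import base64


def build_multipage_content(pages: list[tuple[str, bytes]], prompt: str) -> list[dict]:
    """Build one content list: an image block per page, then the prompt.

    `pages` holds (media_type, raw_bytes) pairs in page order.
    """
    content = []
    for media_type, raw in pages:
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": base64.standard_b64encode(raw).decode(),
            },
        })
    content.append({"type": "text", "text": prompt})
    return content


def merge_page_results(results: list[dict]) -> dict:
    """Merge per-page extractions, keeping the first non-null value per field."""
    merged = {}
    for page_result in results:
        for field, value in page_result.items():
            # Only fill a field that is still missing or null.
            if merged.get(field) is None:
                merged[field] = value
    return merged
```

The single-request route is subject to the API's limits on request size and number of images, so very long contracts may need the page-by-page route. When merging, two pages can return different non-null values for the same field (a deposit on one page, the total on another); such conflicts are better flagged for review than silently resolved.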