Payment API documentation is comprehensive by necessity. Endpoint references, authentication flows, webhook verification, error codes, currency handling, provider-specific quirks -- a developer integrating 0fee.dev for the first time faces hundreds of pages of documentation. Most will not read it. They will search, skim, copy a code snippet, and hope it works.
What if the documentation could talk back?
The AI Developer Assistant is an in-dashboard chatbot powered by Claude Haiku that understands the entire 0fee.dev API surface. It answers integration questions, generates working code snippets, looks up transaction statuses, explains error codes, and recommends providers -- all in the context of the developer's specific application. This article covers the architecture, the model selection rationale, the custom tool design, and the implementation plan.
Status note: The AI Developer Assistant is a planned feature, fully designed and specified but not yet shipped. This article documents the architecture and decisions that will guide its implementation.
Why an AI Assistant in a Payment Dashboard
Developer support in fintech follows a predictable pattern. A developer signs up, reads the quickstart guide, makes a test payment, and then hits their first real integration challenge. Maybe their webhook is not firing. Maybe they are confused about idempotency keys. Maybe they need code in Go but the documentation only shows JavaScript examples.
At this point, the developer has three options:
- Search the docs. Works if the docs are well-organized and the developer knows what to search for. Fails when the developer does not know the right terminology.
- Open a support ticket. Works, but introduces latency. Hours or days to get an answer to a question that has a thirty-second answer.
- Ask the AI assistant. Instant, contextual, available 24/7, and capable of generating code in the developer's preferred language.
Option 3 does not replace options 1 and 2. It reduces the frequency with which developers need them.
Architecture
The system has three layers: a frontend chat component, a backend API endpoint, and the Anthropic API with Claude Haiku.
```
Frontend (Dashboard)
  AIAssistant.tsx
  - Chat message list
  - Code syntax highlighting
  - Markdown rendering
  - Collapsible context sidebar
  - Language selector
          |
          | POST /v1/ai/chat
          v
Backend (FastAPI)
  routes/ai.py
  - Authentication (API key required)
  - Rate limiting (50 req/hour per app)
  - Request validation
  - Tool execution
          |
          | Anthropic API
          v
Claude Haiku
  - System prompt with OpenAPI spec
  - SDK documentation
  - 5 custom tools
  - Streaming response
```
Why This Architecture
We considered three alternatives:
- Client-side AI (API key in browser). Rejected because it exposes the Anthropic API key and prevents rate limiting.
- Separate microservice. Rejected as over-engineering for a single endpoint. The AI chat is one more route in the existing FastAPI backend.
- Backend proxy to Anthropic. Chosen. The backend authenticates the developer, enforces rate limits, executes tool calls against the database, and streams the response to the frontend.
Why Claude Haiku
Model selection for a developer assistant involves three constraints: response quality, latency, and cost at scale.
Cost Analysis
| Model | Input Cost | Output Cost | Est. per Request | Monthly (1,000 devs, 10 queries/day) |
|---|---|---|---|---|
| Claude Haiku | $0.25/1M tokens | $1.25/1M tokens | ~$0.003 | $100--300 |
| Claude Sonnet | $3.00/1M tokens | $15.00/1M tokens | ~$0.04 | $1,200--3,600 |
At scale, the difference is stark. One thousand developers making ten queries per day generates 300,000 requests per month. With Haiku, that costs $100--300. With Sonnet, $1,200--3,600. For a platform charging 0.99% per transaction, the AI assistant needs to be a feature that drives adoption, not a cost center that eats margins.
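To make the table concrete, here is a back-of-envelope sketch of the Haiku math. The per-request token counts (roughly 1,000 effective input tokens after prompt caching, plus about 500 output tokens) are illustrative assumptions, not measured values:

```python
# Back-of-envelope cost model for the Haiku tier.
# Token counts per request are illustrative assumptions; prompt caching is
# assumed to discount the repeated system prompt on the input side.
HAIKU_INPUT_PER_M = 0.25    # $ per 1M input tokens
HAIKU_OUTPUT_PER_M = 1.25   # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at Haiku pricing."""
    return (input_tokens * HAIKU_INPUT_PER_M +
            output_tokens * HAIKU_OUTPUT_PER_M) / 1_000_000

# Assumed: ~1,000 effective input tokens + ~500 output tokens per query
per_request = request_cost(1_000, 500)
monthly = per_request * 1_000 * 10 * 30  # 1,000 devs x 10 queries/day x 30 days

print(f"${per_request:.6f} per request, ~${monthly:.2f}/month")
```

Under these assumptions a request costs well under a cent, and 300,000 monthly requests land inside the $100--300 range; heavier prompts or uncached context push toward the upper bound.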
Quality Assessment
Haiku handles the core use cases well:
- API documentation questions. The system prompt contains the complete OpenAPI spec, so Haiku retrieves and formats the relevant section. This is retrieval, not reasoning.
- Code snippet generation. Given examples in the system prompt, Haiku generates syntactically correct code in all supported languages. The patterns are templatable.
- Error code explanation. Mapping error codes to human-readable explanations is a lookup task.
- Transaction status checks. The tool provides the data; Haiku formats the response.
Where Haiku falls short -- complex multi-step debugging, architectural recommendations, nuanced security analysis -- we can escalate to Sonnet in a future version. The architecture supports model routing per query complexity.
Latency
Haiku's time-to-first-token is significantly faster than Sonnet. For a chat interface, perceived responsiveness matters more than total generation time. Streaming (covered below) further improves the experience.
The System Prompt
The system prompt is the foundation of the assistant's knowledge. It embeds the complete API documentation directly, rather than relying on retrieval-augmented generation (RAG):
```python
def _build_system_prompt(self) -> str:
    # Load API documentation
    with open("../docs/api-reference.md", "r") as f:
        api_docs = f.read()
    with open("../docs/sdk-reference.md", "r") as f:
        sdk_docs = f.read()

    return f"""You are 0fee.dev's AI Developer Assistant. You help developers
integrate our payment gateway into their applications.

## Your Role
- Answer integration questions clearly and concisely
- Generate working code examples in the developer's preferred language
- Explain error codes with specific solutions
- Guide webhook implementation step by step
- Recommend best practices for payment integration

## API Documentation
{api_docs}

## SDK Documentation
{sdk_docs}

## Integration Examples
[Key patterns for JavaScript, Python, PHP, Go]

## Response Guidelines
1. Always provide code examples when the question involves implementation
2. Use the developer's preferred language (default: JavaScript)
3. Include error handling in all code examples
4. Explain WHY, not just HOW
5. Keep responses focused and actionable
6. When referencing endpoints, include the HTTP method and path
7. For webhook-related questions, always mention signature verification
"""
```
Why Embedded Context Instead of RAG
For a corpus the size of 0fee.dev's documentation (roughly 15,000 tokens), embedding the full text in the system prompt is simpler and more reliable than building a vector database, chunking documents, and running similarity searches. The entire API reference fits within Haiku's context window with room to spare.
RAG introduces failure modes that embedded context avoids: irrelevant chunks returned by similarity search, incomplete context when the answer spans multiple chunks, and the operational overhead of keeping the vector database in sync with documentation changes.
If the documentation grows by 10x, we will reconsider. At current size, embedded context is the correct choice.
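The "room to spare" claim is easy to sanity-check. A sketch using the common chars/4 approximation for English text (not an exact tokenizer; the 200k context window and the 50k reserve for chat history and output are assumptions for illustration):

```python
# Rough sanity check that the documentation corpus fits in the context window.
# chars/4 is a common approximation for English text, not an exact tokenizer.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

def fits_in_context(doc_texts: list[str], context_window: int = 200_000,
                    reserve: int = 50_000) -> bool:
    """True if the combined docs leave `reserve` tokens for history and output."""
    total = sum(estimate_tokens(t) for t in doc_texts)
    return total + reserve <= context_window

# ~15,000 estimated tokens of docs against a 200k window: ample headroom
print(fits_in_context(["x" * 60_000]))
```

If this check starts failing as the docs grow, that is the trigger for the RAG reconsideration mentioned above.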
The Five Custom Tools
Custom tools give the AI assistant the ability to perform actions -- not just answer questions from static documentation.
1. get_api_docs()
```python
{
    "name": "get_api_docs",
    "description": "Fetch a specific section of the API documentation",
    "input_schema": {
        "type": "object",
        "properties": {
            "section": {
                "type": "string",
                "enum": ["authentication", "payments", "webhooks", "checkout",
                         "customers", "invoices", "countries", "currencies",
                         "errors", "rate-limits"],
                "description": "The documentation section to retrieve"
            }
        },
        "required": ["section"]
    }
}
```
This tool exists for cases where the full documentation is too long to include in every response. The AI can fetch a specific section to provide a focused answer.
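A hypothetical executor for this tool can be a plain dictionary lookup keyed by the schema's enum values; the section bodies below are placeholders:

```python
# Hypothetical executor for get_api_docs: a dictionary lookup keyed by the
# enum values in the tool's input schema. Section bodies are placeholders.
DOC_SECTIONS = {
    "authentication": "# Authentication\n...",
    "payments": "# Payments\n...",
    "webhooks": "# Webhooks\n...",
    "errors": "# Errors\n...",
    # ... remaining sections from the enum
}

def execute_get_api_docs(tool_input: dict) -> str:
    section = tool_input["section"]
    doc = DOC_SECTIONS.get(section)
    if doc is None:
        # Returned to the model as a tool result string, not surfaced as a crash
        return f"Unknown documentation section: {section}"
    return doc
```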
2. get_transaction_status()
```python
{
    "name": "get_transaction_status",
    "description": "Look up a transaction by ID to check its current status, amount, provider, and timeline",
    "input_schema": {
        "type": "object",
        "properties": {
            "transaction_id": {
                "type": "string",
                "description": "The transaction ID (txn_xxx format)"
            }
        },
        "required": ["transaction_id"]
    }
}
```
When a developer asks "Why did transaction txn_abc123 fail?", the AI can look up the transaction in the database (scoped to the developer's app, enforcing data isolation) and provide a specific answer: "Transaction txn_abc123 failed at 14:32 UTC with error code provider_declined. The provider (Stripe) returned reason: insufficient_funds. The customer's card was declined."
3. generate_code_snippet()
```python
{
    "name": "generate_code_snippet",
    "description": "Generate a working code snippet for a specific integration use case",
    "input_schema": {
        "type": "object",
        "properties": {
            "language": {
                "type": "string",
                "enum": ["javascript", "python", "php", "go", "rust", "java", "dart"],
                "description": "Programming language for the snippet"
            },
            "use_case": {
                "type": "string",
                "description": "What the code should accomplish (e.g., 'create a payment', 'verify webhook signature', 'list transactions with pagination')"
            }
        },
        "required": ["language", "use_case"]
    }
}
```
This tool generates code using the appropriate SDK rather than raw HTTP calls. A request for "create a payment in Go" returns code using the zerofee-go SDK, not net/http boilerplate.
4. get_provider_info()
```python
{
    "name": "get_provider_info",
    "description": "Get available payment providers for a specific country and optional payment method",
    "input_schema": {
        "type": "object",
        "properties": {
            "country_code": {
                "type": "string",
                "description": "ISO 3166-1 alpha-2 country code (e.g., CI, NG, US, FR)"
            },
            "payment_method": {
                "type": "string",
                "description": "Optional payment method filter (e.g., mobile_money, card, bank_transfer)"
            }
        },
        "required": ["country_code"]
    }
}
```
A developer building for Senegal can ask "What payment methods are available in Senegal?" and get a real-time answer pulled from the provider routing table: "In Senegal (SN), 0fee.dev supports: Orange Money via PawaPay (priority 1) and Hub2 (priority 2), Free Money via PawaPay, Wave via PawaPay, and Visa/Mastercard via Stripe."
5. explain_error()
```python
{
    "name": "explain_error",
    "description": "Explain an error code and provide troubleshooting steps",
    "input_schema": {
        "type": "object",
        "properties": {
            "error_code": {
                "type": "string",
                "description": "The error code to explain (e.g., 'invalid_api_key', 'provider_timeout', 'insufficient_funds')"
            }
        },
        "required": ["error_code"]
    }
}
```
Error messages in payment systems are often cryptic. "provider_declined" tells the developer nothing actionable. The explain_error tool maps error codes to detailed explanations with specific remediation steps.
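A sketch of the mapping behind the tool; the two catalog entries are illustrative, and a real catalog would cover every documented error code:

```python
# Sketch of the error-code catalog behind explain_error. Entries are
# illustrative examples, not the platform's actual catalog.
ERROR_CATALOG = {
    "provider_declined": {
        "explanation": "The upstream payment provider rejected the charge.",
        "steps": [
            "Check the provider's decline reason on the transaction record.",
            "Ask the customer to retry with another payment method.",
        ],
    },
    "invalid_api_key": {
        "explanation": "The API key is missing, malformed, or revoked.",
        "steps": [
            "Verify the key is sent in the Authorization: Bearer header.",
            "Confirm the key matches the environment (test vs live).",
        ],
    },
}

def explain_error(error_code: str) -> dict:
    entry = ERROR_CATALOG.get(error_code)
    if entry is None:
        return {"explanation": f"Unknown error code: {error_code}", "steps": []}
    return entry
```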
Rate Limiting
The rate limiter enforces 50 requests per hour per application. This is generous enough for active development (a developer asking a question every 72 seconds for an hour straight) but prevents abuse.
```python
# backend/routes/ai.py
from fastapi import HTTPException
from datetime import datetime, timedelta
from collections import defaultdict

# In-memory rate limiting (upgrade to Redis for production)
request_counts: dict[str, list[datetime]] = defaultdict(list)

def check_rate_limit(app_id: str, limit: int = 50, window: int = 3600):
    now = datetime.utcnow()
    cutoff = now - timedelta(seconds=window)

    # Remove expired entries
    request_counts[app_id] = [
        t for t in request_counts[app_id] if t > cutoff
    ]

    if len(request_counts[app_id]) >= limit:
        raise HTTPException(
            status_code=429,
            detail={
                "error": "rate_limit_exceeded",
                "message": f"AI assistant is limited to {limit} requests per hour",
                "retry_after": int(
                    (request_counts[app_id][0] + timedelta(seconds=window) - now).total_seconds()
                )
            }
        )

    request_counts[app_id].append(now)
```
Response Streaming
Streaming is essential for chat interfaces. A two-second delay before any text appears feels broken. Streaming the first tokens within 200 milliseconds and then filling in the rest over the next few seconds feels responsive.
```python
# backend/services/ai.py
import json

async def chat_stream(self, message, context, history, app_id):
    messages = self._build_messages(history, message)
    # Assumes self.client is an anthropic.AsyncAnthropic instance: the async
    # stream keeps the event loop free while tokens arrive.
    async with self.client.messages.stream(
        model=self.model,
        max_tokens=4096,
        system=self.system_prompt,
        messages=messages,
        tools=self.tools
    ) as stream:
        async for text in stream.text_stream:
            yield f"data: {json.dumps({'type': 'text', 'content': text})}\n\n"
    yield f"data: {json.dumps({'type': 'done'})}\n\n"
```
The frontend consumes this as a Server-Sent Events (SSE) stream:
```typescript
async function sendMessage(message: string) {
  const response = await fetch('/v1/ai/chat/stream', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ message, context, history })
  });

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader!.read();
    if (done) break;
    // Stream-decode and buffer: an SSE event (or a multibyte character) can
    // straddle two reads, so only parse lines that are complete.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = JSON.parse(line.slice(6));
      if (data.type === 'text') {
        appendToCurrentMessage(data.content);
      }
    }
  }
}
```
Frontend: The AI Assistant Page
The AI Assistant page (AIAssistant.tsx) is a full-width chat interface with a collapsible context sidebar.
Layout
```
+------------------+------------------------------+------------------+
|                  |                              |                  |
| Dashboard        | Chat Messages                | Context          |
| Sidebar          |                              | Sidebar          |
|                  | [User message]               |                  |
| [AI Asst.] >     | [AI response with code]      | Language: [JS]   |
|                  | [User message]               |                  |
|                  | [AI response]                | Quick Actions:   |
|                  |                              | - Generate code  |
|                  |                              | - Check txn      |
|                  |                              | - Explain error  |
|                  |                              |                  |
|                  | +------------------------+   | Recent Code:     |
|                  | | Ask about 0fee.dev...  |   | - payment.py     |
|                  | +------------------------+   | - webhook.js     |
|                  | [Send]                       |                  |
+------------------+------------------------------+------------------+
```
Features
- Markdown rendering. AI responses render with proper headings, lists, bold, and inline code.
- Code block highlighting. Fenced code blocks render with syntax highlighting and a one-click copy button.
- Message history. The chat maintains conversation context within the session.
- Context sidebar. Displays conversation topics, quick action buttons, a language selector, and recently generated code snippets.
- Loading state. A typing indicator appears while waiting for the AI response.
Security
Input Sanitization
User messages are sanitized before being sent to the Anthropic API. This mitigates naive prompt injection attempts, where a user crafts a message designed to override the system prompt; string filtering is a first line of defense, not a complete one:
```python
def sanitize_input(message: str) -> str:
    # Remove potential system prompt overrides
    message = message.replace("System:", "")
    message = message.replace("Human:", "")
    message = message.replace("Assistant:", "")
    # Limit message length
    return message[:4000]
```
Output Filtering
AI responses are scanned before being sent to the frontend to ensure they do not contain sensitive data that might have leaked from the system prompt or tool responses:
- API keys are redacted.
- Internal infrastructure details are stripped.
- Database connection strings are blocked.
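A minimal sketch of such a filter. The `zf_live_`/`zf_test_` key prefix is a hypothetical format chosen for illustration; substitute the platform's real key patterns:

```python
# Sketch of output filtering for AI responses. The zf_(live|test)_ key prefix
# is a HYPOTHETICAL format for illustration, not the platform's actual one.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\bzf_(?:live|test)_[A-Za-z0-9]{16,}\b"),  # hypothetical API keys
    re.compile(r"\b(?:postgres|mysql)://\S+"),             # DB connection strings
]

def filter_output(text: str) -> str:
    """Redact anything matching a sensitive pattern before it reaches the frontend."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```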
Context Isolation
The get_transaction_status tool is scoped to the developer's application. It queries WHERE app_id = :app_id -- a developer cannot look up another developer's transactions, even if they know the transaction ID.
Audit Logging
Every AI interaction is logged with the app ID, the message, the response, token usage, and timestamp. This enables:
- Cost tracking per application.
- Abuse detection (repetitive prompt injection attempts).
- Quality improvement (identifying common questions that the documentation should address directly).
Implementation Phases
The implementation is divided into four phases:
Phase 1: Backend (Priority: Critical)
- Add `anthropic` to `requirements.txt`.
- Create `services/ai.py` with Claude Haiku integration.
- Create `routes/ai.py` with `/v1/ai/chat` endpoint.
- Build system prompt with embedded API documentation.
- Implement rate limiting.
Phase 2: Sidebar Update (Priority: High)
- Add collapsible sidebar with smooth CSS transitions.
- Add "AI Assistant" menu item with sparkles icon.
- Persist collapse state in localStorage.
Phase 3: Frontend (Priority: High)
- Create `AIAssistant.tsx` with chat interface.
- Implement markdown rendering and code highlighting.
- Add streaming response consumption.
- Build context sidebar with quick actions.
- Add route in `index.tsx`.
Phase 4: Testing and Refinement (Priority: High)
- Test basic chat flow end-to-end.
- Test all five custom tools.
- Test streaming response handling.
- Test rate limiting behavior.
- Refine system prompt based on test conversations.
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Activation rate | 40%+ of developers use the assistant | Dashboard analytics |
| Integration time reduction | 50% decrease | Time from signup to first live payment |
| Support ticket reduction | 30% decrease | Ticket volume comparison |
| Code snippet usage | 60%+ of generated snippets are copied | Copy button click tracking |
| User satisfaction | 80%+ positive | Thumbs up/down on responses |
The Meta-Narrative
There is something recursive about building an AI developer assistant for a platform that was itself built by an AI. Claude (Opus/Sonnet) built 0fee.dev as the AI CTO. Now Claude Haiku will help other developers integrate with what Claude built. The documentation that Haiku references was written during the build sessions. The code snippets it generates use SDKs that were designed in Sessions 079 and 080.
This is not AI replacing developers. This is AI at every layer of the stack: building the platform, writing the documentation, and then helping developers use both. The human in the loop -- Thales, the CEO -- provides the vision, the priorities, and the product decisions. The AI handles everything that can be automated, at every level of the pyramid.
The AI Developer Assistant is the logical conclusion of the 0fee.dev thesis: if an AI CTO can build a payment orchestrator, an AI assistant can certainly help developers use one.
This article is part of the "How We Built 0fee.dev" series. 0fee.dev is a payment orchestrator covering 53+ providers across 200+ countries, built by Juste A. GNIMAVO and Claude from Abidjan with zero human engineers. Follow the series for the complete build story.