
Security Audit: What We Found and How We Fixed It

A transparent look at the security weaknesses we found in 0fee.dev and the systematic fixes we applied. By Juste A. Gnimavo and Claude.

Thales & Claude | March 25, 2026 | 10 min read | 0fee

Tags: security-audit, encryption, rate-limiting, sql-injection

Every payment platform has security weaknesses. The question is whether you find them before your attackers do. After 60+ sessions of building 0fee.dev at startup speed, we conducted a thorough security audit. The findings were sobering. Some issues were critical. Some were architectural limitations of our initial choices. All of them needed fixing.

This article is a transparent account of what we found and how we addressed it. We publish this not to embarrass ourselves but because the fintech community benefits from honest post-mortems. If you are building a payment platform, these are the pitfalls to avoid.

Critical Issues

1. Environment Mismatch in Routing

The most dangerous bug we found was an environment mismatch in the payment routing logic. The routing engine could, under specific conditions, send a live payment to a test provider or vice versa.

```python
# BEFORE: The bug
async def route_payment(app_id: str, payment_data: dict):
    app = await get_app(app_id)
    providers = await get_available_providers(
        country=payment_data["country"],
        method=payment_data["method"],
    )
    # BUG: Not filtering by app.mode (test/live)
    best_provider = select_best_provider(providers)
    return best_provider
```

The get_available_providers function returned all providers matching the country and method, regardless of whether the app was in test or live mode. A test-mode app could theoretically route to a live Stripe instance.

```python
# AFTER: The fix
async def route_payment(app_id: str, payment_data: dict):
    app = await get_app(app_id)
    providers = await get_available_providers(
        country=payment_data["country"],
        method=payment_data["method"],
        mode=app.mode,  # Explicit mode filtering
    )
    if not providers:
        raise HTTPException(400, f"No {app.mode} providers available for this payment")
    best_provider = select_best_provider(providers)
    return best_provider
```

This was a one-line fix with enormous implications. Without it, a test API key could trigger real charges.
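The mode filter itself reduces to a plain predicate. The `Provider` record and the in-memory registry below are assumptions for the sake of a runnable sketch, not 0fee.dev's actual schema:

```python
# Illustrative sketch of mode-aware provider filtering. Provider and
# PROVIDERS are hypothetical stand-ins for the real provider registry.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    country: str
    method: str
    mode: str  # "test" or "live"

PROVIDERS = [
    Provider("stripe-live", "CI", "mobile_money", "live"),
    Provider("stripe-test", "CI", "mobile_money", "test"),
]

async def get_available_providers(country: str, method: str, mode: str) -> list[Provider]:
    # Mode is a hard filter: a test-mode app can never see live providers
    return [
        p for p in PROVIDERS
        if p.country == country and p.method == method and p.mode == mode
    ]
```

With `mode` as a required parameter, forgetting to pass it becomes a TypeError at the call site rather than a silent cross-environment route.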

2. Missing Seed Data Validation

The seed data script created test users with known credentials but did not validate that those credentials were removed before production deployment. We added a startup check:

```python
# main.py startup check
@app.on_event("startup")
async def validate_no_seed_data():
    if ENVIRONMENT == "production":
        seed_users = await db.scalars(
            select(User).where(User.email.in_([
                "[email protected]",
                "[email protected]",
                "[email protected]",
            ]))
        )
        if seed_users.all():
            logger.critical(
                "SEED DATA DETECTED IN PRODUCTION. "
                "Remove test users before deployment."
            )
            raise RuntimeError("Seed data present in production database")
```

3. No Credential Validation on Provider Setup

When developers configured payment providers, the system accepted any credentials without validation. A developer could enter invalid Stripe keys and only discover the problem when the first payment failed.

```python
# AFTER: Credential validation
async def configure_provider(
    app_id: str,
    provider_name: str,
    credentials: dict,
) -> AppProvider:
    provider = get_provider_instance(provider_name)

    # Validate credentials by making a test API call
    validation = await provider.validate_credentials(credentials)
    if not validation.valid:
        raise HTTPException(400, f"Invalid credentials: {validation.error}")

    # Store only after validation passes
    encrypted_creds = encrypt_credentials(credentials)
    ...
```

Each provider adapter now implements a validate_credentials method that makes a lightweight API call (e.g., fetching account details) to verify the credentials are functional.
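As a hypothetical sketch of what such an adapter method can look like: `ValidationResult`, `StripeProvider`, the key-prefix check, and the stubbed-out remote call below are all illustrative assumptions, not 0fee.dev's actual code.

```python
# Hypothetical adapter sketch; names and checks are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    valid: bool
    error: Optional[str] = None

class StripeProvider:
    async def validate_credentials(self, credentials: dict) -> ValidationResult:
        secret = credentials.get("secret_key", "")
        # Cheap local format check before spending a network round-trip
        if not secret.startswith(("sk_test_", "sk_live_")):
            return ValidationResult(False, "secret_key has an unexpected prefix")
        # A real adapter would now make a lightweight authenticated call
        # (for Stripe, fetching the account) and map an auth failure to
        # ValidationResult(False, ...). Stubbed out in this sketch.
        return ValidationResult(True)
```

`configure_provider` can then reject bad keys up front, before anything is encrypted and stored.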

Security Fixes

4. Fixed Encryption Salt

The encryption service for storing provider credentials derived its key with a static salt. Every credential in the system was therefore protected by the same derived key: compromising that one key would expose all stored credentials, and the salt offered no protection against precomputed attacks on the master key.

```python
# BEFORE: Static salt
class EncryptionService:
    SALT = b"0fee_static_salt_v1"  # Same for all encryptions

    def encrypt(self, plaintext: str) -> str:
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA256(),
            length=32,
            salt=self.SALT,
            iterations=100_000,
        )
        ...
```
```python
# AFTER: Random salt per encryption
class EncryptionService:
    def encrypt(self, plaintext: str) -> str:
        salt = os.urandom(16)  # Unique salt per encryption
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA256(),
            length=32,
            salt=salt,
            iterations=100_000,
        )
        key = base64.urlsafe_b64encode(kdf.derive(self.master_key))
        fernet = Fernet(key)
        encrypted = fernet.encrypt(plaintext.encode())
        # Prepend salt to ciphertext for decryption
        return base64.b64encode(salt + encrypted).decode()

    def decrypt(self, ciphertext: str) -> str:
        raw = base64.b64decode(ciphertext)
        salt = raw[:16]  # Extract salt
        encrypted = raw[16:]
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA256(),
            length=32,
            salt=salt,
            iterations=100_000,
        )
        key = base64.urlsafe_b64encode(kdf.derive(self.master_key))
        fernet = Fernet(key)
        return fernet.decrypt(encrypted).decode()
```

With a random salt per encryption, each stored credential is protected by its own derived key, and identical plaintext produces different ciphertext every time.
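The effect is easy to demonstrate with the standard library's PBKDF2 using the same parameters as the service above; this is a standalone illustration, not the service's code:

```python
import hashlib
import os

def derive_key(master_key: bytes, salt: bytes) -> bytes:
    # Same KDF parameters as above: SHA-256, 100,000 iterations, 32-byte key
    return hashlib.pbkdf2_hmac("sha256", master_key, salt, 100_000, dklen=32)

master = b"example-master-key"  # illustrative only

key_a = derive_key(master, os.urandom(16))
key_b = derive_key(master, os.urandom(16))
# Fresh salts yield distinct keys, hence distinct ciphertexts,
# even for the same master key and the same plaintext.
assert key_a != key_b
```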

5. API Key Masking

API keys were being logged and returned in full in some API responses. We implemented consistent masking:

```python
# utils/masking.py
def mask_api_key(key: str) -> str:
    """Show only prefix and last 4 characters."""
    if not key or len(key) < 12:
        return "****"
    prefix = key[:8]   # e.g., "sk_live_"
    suffix = key[-4:]
    return f"{prefix}...{suffix}"

def mask_for_logging(data: dict) -> dict:
    """Recursively mask sensitive fields in log output."""
    sensitive_keys = {
        "api_key", "secret_key", "password", "token",
        "client_secret", "webhook_secret", "private_key",
    }
    masked = {}
    for k, v in data.items():
        if k.lower() in sensitive_keys:
            masked[k] = "*MASKED*"
        elif isinstance(v, dict):
            masked[k] = mask_for_logging(v)
        else:
            masked[k] = v
    return masked
```

Every API response that includes key information now returns the masked version. Logs use mask_for_logging to prevent credentials from appearing in log files.

6. Authentication Rate Limiting

The login and OAuth endpoints had no rate limiting, making them vulnerable to brute-force attacks:

```python
# middleware/rate_limit.py
from datetime import datetime, timedelta
from collections import defaultdict

class AuthRateLimiter:
    def __init__(self, max_attempts: int = 5, window: int = 300):
        self.max_attempts = max_attempts
        self.window = window  # seconds
        self.attempts: dict[str, list[datetime]] = defaultdict(list)

    def check(self, identifier: str) -> bool:
        """Returns True if the request should be allowed."""
        now = datetime.utcnow()
        cutoff = now - timedelta(seconds=self.window)

        # Clean old attempts
        self.attempts[identifier] = [
            t for t in self.attempts[identifier] if t > cutoff
        ]

        if len(self.attempts[identifier]) >= self.max_attempts:
            return False

        self.attempts[identifier].append(now)
        return True

auth_limiter = AuthRateLimiter(max_attempts=5, window=300)

@router.post("/auth/login")
async def login(request: Request, data: LoginRequest):
    client_ip = request.client.host

    if not auth_limiter.check(client_ip):
        raise HTTPException(
            status_code=429,
            detail="Too many login attempts. Try again in 5 minutes.",
            headers={"Retry-After": "300"},
        )
    ...
```

The limiter allows five attempts per IP address within a rolling 5-minute window. After that, the client receives a 429 with a Retry-After header.

7. Webhook Verification Gaps

Incoming webhooks from payment providers were not consistently verified. Some providers (Stripe) have robust signature verification. Others were accepted without validation:

```python
# BEFORE: Some providers had no webhook verification
@router.post("/webhooks/{provider}")
async def handle_webhook(provider: str, request: Request):
    body = await request.body()
    # Process directly without verification for some providers
    await process_webhook(provider, body)
```

```python
# AFTER: All providers verify webhooks
@router.post("/webhooks/{provider}")
async def handle_webhook(provider: str, request: Request):
    body = await request.body()
    headers = dict(request.headers)

    provider_instance = get_provider_instance(provider)

    # Every provider must implement verify_webhook
    if not await provider_instance.verify_webhook(body, headers):
        logger.warning(f"Webhook verification failed for {provider}")
        raise HTTPException(400, "Webhook verification failed")

    await process_webhook(provider, body)
```

The base provider class now defines verify_webhook as an abstract method. Every adapter must implement it. Providers that do not offer signature verification use alternative validation (e.g., checking that the source IP matches known provider IPs).
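For providers that sign their webhooks, verification typically reduces to an HMAC over the raw request body. A standard-library sketch of that common scheme follows; the header format and secret value are assumptions, and real adapters follow each provider's documented signing scheme:

```python
# Sketch of HMAC-SHA256 webhook verification: the provider signs the raw
# body with a shared secret and sends the hex digest in a header.
import hashlib
import hmac

def verify_hmac_webhook(body: bytes, signature_header: str, secret: bytes) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so an attacker cannot recover
    # the signature byte-by-byte through timing differences
    return hmac.compare_digest(expected, signature_header)
```

Verifying over the *raw* bytes matters: re-serializing parsed JSON can change whitespace or key order and invalidate an otherwise correct signature.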

8. SQL Injection Risks

Several early endpoints used string formatting in SQL queries:

```python
# BEFORE: SQL injection vulnerability
@router.get("/transactions")
async def list_transactions(status: str = None):
    query = f"SELECT * FROM transactions WHERE status = '{status}'"
    result = await db.execute(text(query))
```

```python
# AFTER: Parameterized queries (via ORM)
@router.get("/transactions")
async def list_transactions(status: str | None = None):
    query = select(Transaction)
    if status:
        query = query.where(Transaction.status == status)
    result = await db.scalars(query)
```

The migration to SQLAlchemy ORM models (covered in the SQLAdmin article) eliminated all raw SQL queries, which resolved the injection risk systemically rather than one query at a time.
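To see the risk concretely, here is what the formatted query becomes when a caller supplies a crafted status value:

```python
# A crafted value escapes the intended string literal and rewrites the query.
status = "completed' OR '1'='1"
query = f"SELECT * FROM transactions WHERE status = '{status}'"
print(query)
# -> SELECT * FROM transactions WHERE status = 'completed' OR '1'='1'
# The WHERE clause is now a tautology, so every transaction row matches.
```

A parameterized query treats the entire value as data, so the same input would simply match no rows.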

Architectural Issues

9. SQLite Single-Writer Limitation

SQLite allows only one writer at a time. With concurrent API requests, write operations would serialize, creating a bottleneck. More critically, the WAL (Write-Ahead Logging) mode could serve stale reads to new connections.

This was not a bug to fix but an architecture to replace. The SQLite-to-PostgreSQL migration (covered in article 055) was the solution. PostgreSQL handles concurrent writes with MVCC (Multi-Version Concurrency Control), eliminating both the write bottleneck and the stale-read problem.

10. Missing Idempotency

Payment operations must be idempotent. If a network timeout causes a retry, the second request should not create a second payment. Several endpoints lacked idempotency enforcement:

```python
# AFTER: Idempotency key support
@router.post("/payments")
async def create_payment(
    data: PaymentCreate,
    idempotency_key: str = Header(None, alias="Idempotency-Key"),
):
    if idempotency_key:
        existing = await get_by_idempotency_key(idempotency_key)
        if existing:
            return existing  # Return cached response

    payment = await process_payment(data)

    if idempotency_key:
        await store_idempotency_key(idempotency_key, payment, ttl=86400)

    return payment
```

Idempotency keys are stored for 24 hours. The same key always returns the same response, preventing duplicate charges.
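The helpers referenced above (`get_by_idempotency_key`, `store_idempotency_key`) can be sketched with an in-memory store with expiry; this is an assumption-laden sketch, and a production version would use Redis or a database table so keys survive restarts:

```python
# In-memory idempotency store sketch: key -> (expiry deadline, cached response).
import time

_store: dict[str, tuple[float, dict]] = {}

async def store_idempotency_key(key: str, response: dict, ttl: int = 86400) -> None:
    _store[key] = (time.monotonic() + ttl, response)

async def get_by_idempotency_key(key: str):
    entry = _store.get(key)
    if entry is None:
        return None
    expires_at, response = entry
    if time.monotonic() >= expires_at:
        # Expired keys are pruned lazily on lookup
        del _store[key]
        return None
    return response
```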

11. Missing Event History

The system lacked a comprehensive event log for payment state changes. When a payment transitioned from pending to completed, only the final state was stored. There was no record of when each transition occurred or what triggered it.

```python
# AFTER: Payment event history
class PaymentEvent(Base):
    __tablename__ = "payment_events"

    id = Column(String, primary_key=True)
    transaction_id = Column(String, ForeignKey("transactions.id"), nullable=False)
    event_type = Column(String, nullable=False)  # created, processing, completed, failed
    previous_status = Column(String, nullable=True)
    new_status = Column(String, nullable=False)
    provider_data = Column(JSON, nullable=True)  # Raw provider response
    created_at = Column(DateTime, server_default=func.now())

async def transition_payment_status(
    transaction_id: str,
    new_status: str,
    provider_data: dict = None,
):
    tx = await get_transaction(transaction_id)
    previous = tx.status

    event = PaymentEvent(
        id=generate_id(),
        transaction_id=transaction_id,
        event_type=f"status_{new_status}",
        previous_status=previous,
        new_status=new_status,
        provider_data=provider_data,
    )
    db.add(event)

    tx.status = new_status
    await db.commit()
```

12. Missing Health Monitoring

There was no health check endpoint and no provider health monitoring. If Stripe went down, the system would keep routing payments to Stripe and returning errors.

```python
# AFTER: Health check endpoint
@router.get("/health")
async def health_check():
    checks = {
        "database": await check_database_health(),
        "redis": await check_redis_health(),
        "providers": await check_provider_health(),
    }

    all_healthy = all(c["status"] == "healthy" for c in checks.values())

    return {
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks,
        "timestamp": datetime.utcnow().isoformat(),
    }

async def check_provider_health() -> dict:
    """Check the health of all active providers."""
    results = {}
    for name, provider in provider_registry.items():
        try:
            status = await asyncio.wait_for(
                provider.health_check(), timeout=5.0
            )
            results[name] = {"status": "healthy", "latency_ms": status.latency}
        except Exception as e:  # includes asyncio.TimeoutError
            results[name] = {"status": "unhealthy", "error": str(e)}

    healthy_count = sum(1 for r in results.values() if r["status"] == "healthy")
    return {
        "status": "healthy" if healthy_count == len(results) else "degraded",
        "providers": results,
    }
```

The Four-Phase Remediation Roadmap

We prioritized fixes into four phases:

| Phase | Focus | Timeline | Items |
| --- | --- | --- | --- |
| Phase 1 | Critical security | Immediate | Environment mismatch, seed data check, credential validation |
| Phase 2 | Security hardening | Week 1 | Encryption salt, API key masking, rate limiting, webhook verification |
| Phase 3 | Architecture | Weeks 2-4 | PostgreSQL migration, idempotency, event history |
| Phase 4 | Monitoring | Weeks 4-6 | Health endpoints, provider monitoring, alerting |

Phase 1 was completed within hours of the audit. These were issues that could cause financial harm. Phase 2 was completed within the first week. Phase 3 aligned with the planned PostgreSQL migration. Phase 4 was rolled into the production readiness work.

What We Learned

Speed and security are in tension, and speed usually wins early on. The first 60 sessions prioritized feature development. Security was not ignored, but it was not the primary focus. The audit showed the cost of that trade-off. For a payment platform, we should have conducted the audit after session 30, not session 60.

Raw SQL is a liability. Every raw SQL query is a potential injection vector. The ORM migration eliminated an entire class of vulnerabilities. If you are building a new project, start with an ORM from day one.

Static encryption salts defeat the purpose of salting. A static salt is only marginally better than no salt: every record ends up protected by the same derived key. Always use random salts and prepend them to the ciphertext.

Webhook verification is not optional for payment platforms. An unverified webhook can be spoofed, causing the system to mark a payment as completed when it was not. Every provider webhook must be verified.

Publish your security findings. The instinct is to hide weaknesses. But the fintech community benefits from transparency. If this article prevents another team from making the same mistakes, it was worth publishing.


This article is part of the "How We Built 0fee.dev" series. 0fee.dev is a payment orchestrator covering 53+ providers across 200+ countries, built by Juste A. GNIMAVO and Claude from Abidjan with zero human engineers. Follow the series for the complete build story.
