Part 6: Production, Ch 26: Rate Limiting

External API access needs guardrails. Without rate limiting, a single runaway script could exhaust database connections, rack up LLM costs, or overwhelm downstream providers. Astrelo uses two layers of protection: per-key rate limiting and monthly usage quotas.

Who Gets Rate Limited?

An important distinction: JWT users (browser sessions) are exempt from both rate limiting and quotas. These protections apply only to API key requests — external integrations hitting Astrelo’s public API.

Why? Browser users are inherently rate-limited by human interaction speed. A user clicking buttons in the UI will never generate 100 requests per second. API keys, however, can be used by automated scripts, so they need mechanical guardrails.

Layer 1: Per-Key Rate Limiting

The rate limiter lives in src/infrastructure/auth/dbRateLimiter.ts and uses a fixed-window algorithm backed by PostgreSQL:

```typescript
// src/infrastructure/auth/dbRateLimiter.ts
const RATE_LIMIT = 100;     // Max requests per window
const WINDOW_SECONDS = 60;  // Window size: 1 minute

export async function checkRateLimit(apiKeyId: string): Promise<RateLimitResult> {
  const key = `ratelimit:${apiKeyId}`;
  const windowStart = new Date(Date.now() - WINDOW_SECONDS * 1000);

  const result = await pool.query(
    `INSERT INTO rate_limit_buckets (key, count, window_start)
     VALUES ($1, 1, NOW())
     ON CONFLICT (key) DO UPDATE SET
       count = CASE
         WHEN rate_limit_buckets.window_start < $2 THEN 1
         ELSE rate_limit_buckets.count + 1
       END,
       window_start = CASE
         WHEN rate_limit_buckets.window_start < $2 THEN NOW()
         ELSE rate_limit_buckets.window_start
       END
     RETURNING count, window_start`,
    [key, windowStart]
  );

  const count = result.rows[0].count;
  return {
    allowed: count <= RATE_LIMIT,
    remaining: Math.max(0, RATE_LIMIT - count),
    resetAt: new Date(result.rows[0].window_start.getTime() + WINDOW_SECONDS * 1000),
  };
}
```

How the Fixed-Window Works

The rate_limit_buckets table has three columns: key (PK), count, and window_start.

The single INSERT ... ON CONFLICT DO UPDATE query does everything atomically:

  1. First request in a window: Inserts a new row with count: 1 and window_start: NOW()
  2. Subsequent requests in the same window: window_start >= windowStart (the window hasn’t expired), so count increments by 1
  3. First request in a new window: window_start < windowStart (the old window has expired), so count resets to 1 and window_start resets to NOW()

No cleanup job needed. No separate expiry mechanism. The window resets itself on the next request.
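The same three cases can be seen in a small in-memory sketch of the fixed-window logic (illustration only; the real implementation relies on the atomic Postgres upsert above, which an in-process `Map` cannot replicate across servers):

```typescript
// In-memory sketch of the fixed-window algorithm (not the production code).
const RATE_LIMIT = 100;
const WINDOW_SECONDS = 60;

interface Bucket {
  count: number;
  windowStart: number; // epoch ms
}

const buckets = new Map<string, Bucket>();

function checkRateLimit(
  apiKeyId: string,
  now: number
): { allowed: boolean; remaining: number } {
  const bucket = buckets.get(apiKeyId);
  // Case 1 and 3: no bucket yet, or the old window has expired → start fresh.
  // Case 2: window still live → increment.
  const expired = !bucket || bucket.windowStart < now - WINDOW_SECONDS * 1000;
  const next: Bucket = expired
    ? { count: 1, windowStart: now }
    : { count: bucket!.count + 1, windowStart: bucket!.windowStart };
  buckets.set(apiKeyId, next);
  return {
    allowed: next.count <= RATE_LIMIT,
    remaining: Math.max(0, RATE_LIMIT - next.count),
  };
}
```

Note that, as in the SQL version, the window resets lazily on the next request rather than via any timer.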

Fail-Open Design

```typescript
try {
  const result = await checkRateLimit(apiKeyId);
  if (!result.allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
} catch (error) {
  // Database error — fail open (allow the request)
  console.warn('[RateLimit] Check failed, allowing request:', error);
}
```

If the rate limit check itself fails (database connection dropped, query timeout), the request is allowed through. This is a deliberate choice: a rate limiter should protect the system, not become a single point of failure. Better to occasionally allow an extra request than to block legitimate traffic because the rate limit table is temporarily unreachable.

Standard Headers

Every API response includes rate limit headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1711234620
```

These follow the de facto X-RateLimit-* convention that API consumers expect. The Reset value is a Unix timestamp (in seconds) telling the client when the current window expires, so well-behaved clients can self-throttle.
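On the consumer side, a hypothetical helper (not part of Astrelo) could use these headers to decide how long to wait before retrying:

```typescript
// Hypothetical client-side helper: given lowercased response headers and the
// current time, return how long to sleep before the next request.
function msUntilReset(headers: Record<string, string>, nowMs: number): number {
  const remaining = Number(headers['x-ratelimit-remaining'] ?? '1');
  if (remaining > 0) return 0; // budget left in the current window
  const resetUnixSeconds = Number(headers['x-ratelimit-reset'] ?? '0');
  return Math.max(0, resetUnixSeconds * 1000 - nowMs); // wait until window expiry
}
```

A client that gets `X-RateLimit-Remaining: 0` would sleep for `msUntilReset(...)` milliseconds before sending its next request instead of burning attempts on 429s.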

Layer 2: Monthly Usage Quotas

Beyond per-minute rate limits, API keys have monthly usage budgets tracked in the usage_quotas table:

```typescript
// src/infrastructure/auth/usageTracker.ts
export async function checkQuota(
  userId: string,
  endpoint: string
): Promise<QuotaCheckResult | null> {
  // Determine which quota category this endpoint falls under
  const category = getQuotaCategory(endpoint);

  const quota = await pool.query(
    `SELECT * FROM usage_quotas WHERE user_id = $1`,
    [userId]
  );
  if (!quota.rows[0]) return null; // No quota row = unlimited

  const row = quota.rows[0];

  // Safety-net auto-reset: if reset_at has already passed, reset inline
  if (new Date(row.reset_at) < new Date()) {
    await pool.query(
      `UPDATE usage_quotas
          SET current_api_count = 0,
              current_enrichment_count = 0,
              current_discovery_count = 0,
              reset_at = NOW() + INTERVAL '1 month'
        WHERE user_id = $1`,
      [userId]
    );
    return null; // After reset, all quotas are clear
  }

  // Check the specific category
  const current = row[`current_${category}_count`];
  const limit = row[`monthly_${category}_limit`];
  if (current >= limit) {
    return { exceeded: true, category, current, limit };
  }
  return null; // Within quota
}
```

Three Quota Categories

```typescript
function getQuotaCategory(endpoint: string): 'api' | 'enrichment' | 'discovery' {
  if (endpoint.includes('bulk-enrich') || endpoint.includes('discovery/enrich')) {
    return 'enrichment';
  }
  if (
    endpoint.includes('goldilocks') ||
    endpoint.includes('discovery/prospects') ||
    endpoint.includes('recommendations') ||
    endpoint.includes('ranking/calculate')
  ) {
    return 'discovery';
  }
  return 'api'; // Default: general API usage
}
```
| Category | Default Monthly Limit | What Counts |
| --- | --- | --- |
| api | 1,000 | All API requests not in the other two categories |
| enrichment | 50 | Bulk enrichment and discovery enrichment |
| discovery | 25 | Goldilocks recommendations, prospect discovery, scoring |

Enrichment and discovery have lower limits because they’re expensive operations — each enrichment call may trigger multiple web searches and LLM calls.
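To make the routing concrete, here is the category function exercised on a few sample paths (the endpoint strings below are illustrative, not necessarily Astrelo's real routes):

```typescript
// getQuotaCategory reproduced from above, applied to sample endpoints.
function getQuotaCategory(endpoint: string): 'api' | 'enrichment' | 'discovery' {
  if (endpoint.includes('bulk-enrich') || endpoint.includes('discovery/enrich')) {
    return 'enrichment';
  }
  if (
    endpoint.includes('goldilocks') ||
    endpoint.includes('discovery/prospects') ||
    endpoint.includes('recommendations') ||
    endpoint.includes('ranking/calculate')
  ) {
    return 'discovery';
  }
  return 'api';
}

// Hypothetical sample paths and the budget each one draws from:
getQuotaCategory('/api/v1/companies/bulk-enrich'); // 'enrichment'
getQuotaCategory('/api/v1/discovery/prospects');   // 'discovery'
getQuotaCategory('/api/v1/companies/123');         // 'api'
```

Note the ordering matters: `discovery/enrich` is matched by the enrichment branch before the discovery branch can see it.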

Usage Tracking

After the handler completes, usage is incremented:

```typescript
export async function incrementUsage(
  userId: string,
  category: 'api' | 'enrichment' | 'discovery'
): Promise<void> {
  const column = `current_${category}_count`;
  await pool.query(
    `UPDATE usage_quotas
        SET ${column} = ${column} + 1,
            updated_at = NOW()
      WHERE user_id = $1`,
    [userId]
  );
}
```

This runs in the finally block of the auth middleware — fire-and-forget, never blocks the response:

```typescript
finally {
  incrementUsage(userId, category).catch(() => {});
  logUsage(apiKeyId, userId, endpoint, method, statusCode).catch(() => {});
}
```

Both incrementUsage and logUsage swallow errors silently. If the increment fails, the user gets a free request. If logging fails, we lose one audit entry. Neither failure should affect the user’s experience.
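The pattern generalizes to any non-critical side effect. A minimal sketch (not from the Astrelo codebase) of a reusable wrapper:

```typescript
// Fire-and-forget: start an async side effect, never await it, and swallow
// any failure so the caller's control flow is unaffected.
function fireAndForget(task: () => Promise<void>): void {
  task().catch(() => {
    // Intentionally swallowed: tracking failures must not surface to the user.
  });
}
```

With a wrapper like this, the `finally` block becomes `fireAndForget(() => incrementUsage(userId, category))`, and the no-throw guarantee lives in one place instead of being repeated at every call site.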

The Full API Key Flow

Here’s the complete flow when an API key request arrives:

```
Request arrives with X-API-Key header
  ↓
1. verifyApiKey() — hash the key, look up in api_keys table
   → 401 if not found or inactive
   → 401 if expired
  ↓
2. checkRateLimit(apiKeyId)
   → 429 if rate limit exceeded
   → Set X-RateLimit-* headers
  ↓
3. checkQuota(userId, endpoint)
   → 429 if monthly quota exceeded
  ↓
4. handler() — process the actual request
  ↓
5. finally:
   → incrementUsage(userId, category)  [fire-and-forget]
   → logUsage(apiKeyId, userId, ...)   [fire-and-forget]
```

The rate limit is checked first (cheap — a single DB query); the quota is checked second (slightly more expensive — it reads the full quota row). The handler only runs if both pass.

The Usage Logs Table

Every API key request is logged for analytics:

```
api_usage_logs (8 cols):
  id                UUID PK
  api_key_id        UUID FK → api_keys
  user_id           UUID FK → users
  endpoint          VARCHAR(255)
  method            VARCHAR(10)
  status_code       INT
  response_time_ms  INT
  created_at        TIMESTAMPTZ
```

This table answers questions like:

  • “Which API key is generating the most traffic?” (GROUP BY api_key_id)
  • “Which endpoints are slowest?” (AVG(response_time_ms) GROUP BY endpoint)
  • “How many 500 errors are we returning?” (WHERE status_code = 500)

Alert Pipeline Rate Limiting

Beyond API-level protection, the alert pipeline has its own rate limit:

```typescript
// Max 20 alerts per hour per user
const recentCount = await getRecentAlertCount(userId, 60);
if (recentCount >= 20) {
  return { alertsCreated: 0 };
}
const allowedCount = Math.max(0, 20 - recentCount);
const finalMatches = dedupedMatches.slice(0, allowedCount);
```

This prevents a CRM bulk update from flooding a user with hundreds of alerts. The cap is 20 per hour — high enough that real events get through, low enough that a mass import doesn’t destroy the signal-to-noise ratio.
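The arithmetic above, factored into a pure function for a worked example (the function name is illustrative):

```typescript
const HOURLY_ALERT_CAP = 20;

// Keep only as many matches as the user's remaining hourly alert budget allows.
function capAlerts<T>(matches: T[], recentCount: number): T[] {
  const allowed = Math.max(0, HOURLY_ALERT_CAP - recentCount);
  return matches.slice(0, allowed);
}

// A user who already received 15 alerts this hour gets only 5 of 12 new matches;
// a user at or over the cap gets none.
capAlerts(['a', 'b', 'c'], 15);
```

A bulk import producing 300 matches therefore yields at most 20 alerts in any given hour, and the overflow is simply dropped rather than queued.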

Key Takeaways

  1. JWT users are exempt — rate limiting and quotas protect against automated API abuse, not human UI usage.

  2. Fixed-window rate limiting uses a single atomic upsert — no cleanup jobs, no expiry mechanisms, self-resetting.

  3. Fail-open design means the rate limiter never becomes a single point of failure.

  4. Three quota categories (API, enrichment, discovery) with different limits reflect the different costs of each operation.

  5. Fire-and-forget tracking ensures usage logging never blocks or degrades the user’s request.

  6. Alert pipeline rate limiting (20/hour) protects the notification feed from CRM bulk operations.

Next chapter: the final piece — how all of this gets deployed to AWS Amplify.
