Part 1: Foundations
Ch 3: The Database

The database is the foundation of everything. Every score, every deal, every alert, every chat message — it all lives in PostgreSQL. Understanding the database means understanding the application.

Why PostgreSQL?

Astrelo uses PostgreSQL (specifically AWS Aurora PostgreSQL, which is Amazon’s managed version). PostgreSQL was chosen over alternatives like MySQL or MongoDB for several reasons:

  • JSONB columns — PostgreSQL can store and query JSON data natively. This is used extensively for flexible data like context_data on alerts, criteria on ICP profiles, and config_data on settings. You get the flexibility of a document database with the reliability of a relational one.
  • Array columns — TEXT[] and FLOAT8[] columns store lists directly (tech stacks, embeddings, event subscriptions) without needing a separate table for each (see the sketch after this list).
  • Partial indexes — indexes that only cover rows matching a condition. For example, WHERE signal_hash IS NOT NULL indexes only alerts that have a hash, making dedup checks lightning fast without indexing every row.
  • gen_random_uuid() — built-in UUID generation for primary keys, no extension needed.
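
As a quick illustration of the array-column bullet, here is a hedged sketch of querying a TEXT[] column with node-postgres. The tech_stack column name and the helper function are assumptions for illustration, not lifted from the codebase:

import { Pool } from 'pg';

// Hypothetical helper: find companies whose tech_stack array contains a
// given technology. = ANY(column) tests array membership in PostgreSQL.
async function findCompaniesUsingTech(pool: Pool, userId: string, tech: string) {
  const { rows } = await pool.query(
    `SELECT id, domain, tech_stack
       FROM companies
      WHERE user_id = $1
        AND $2 = ANY(tech_stack)`,
    [userId, tech]
  );
  return rows; // e.g. every company with 'PostgreSQL' in its stack
}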

Connection Pooling: Why We Don’t Open a New Connection Each Time

Every SQL query requires a database connection. Creating one takes 50-100ms (TCP handshake, SSL negotiation, authentication). If every API request opened a new connection, you’d waste 50-100ms before any actual work happens.

Connection pooling solves this. At startup, the application creates a “pool” of 30 connections:

// src/infrastructure/database/connection.ts, lines 55-68
const poolConfig = {
  host: _readEnv('DB_HOST'),
  port: _readEnvNum('DB_PORT', 5432),
  database: _readEnv('DB_NAME'),
  user: _readEnv('DB_USER'),
  password: _readEnv('DB_PASSWORD'),
  max: 30,                        // Maximum 30 simultaneous connections
  min: 2,                         // Keep at least 2 alive even when idle
  idleTimeoutMillis: 30000,       // Close idle connections after 30 seconds
  connectionTimeoutMillis: 30000, // Give up connecting after 30 seconds
  keepAlive: true,                // Send TCP keepalive packets
};

When a query needs a connection, it borrows one from the pool. When the query finishes, the connection goes back to the pool for the next request. This is like a library lending books — you don’t buy a new book every time you want to read.

Request 1: borrows connection #7  → query → returns #7 to pool
Request 2: borrows connection #7  → query → returns #7 to pool  (same connection!)
Request 3: borrows connection #12 → query → returns #12 to pool

If all 30 connections are in use and request #31 arrives, it waits in a queue until one frees up. If it waits longer than connectionTimeoutMillis (30 seconds), it fails with a timeout error.
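
For a single statement, pool.query handles the borrow and return internally. When several statements must share one connection (a transaction, say), the code checks a client out explicitly. A minimal sketch of the borrow/return cycle with node-postgres (the withClient helper is hypothetical):

import { Pool, PoolClient } from 'pg';

// Borrow a client explicitly when several statements must share one
// connection. pool.connect() waits in the same queue described above and
// rejects once connectionTimeoutMillis elapses.
async function withClient<T>(
  pool: Pool,
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect(); // borrow (may queue if all 30 are busy)
  try {
    return await fn(client);
  } finally {
    client.release(); // always return it: a leaked client shrinks the pool
  }
}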

The Hot Reload Problem

// src/infrastructure/database/connection.ts, lines 115-118
declare global {
  var _pgPool: Pool | undefined;
  var _pgPoolInitialized: boolean | undefined;
}

In development, Next.js hot-reloads your code every time you save a file. Without special handling, each reload would create a new pool of 30 connections. After 10 saves, you’d have 300 connections — your database would refuse new connections.

The solution: store the pool on globalThis (a JavaScript global that survives hot reloads). Before creating a new pool, check if one already exists:

if (globalThis._pgPool) {
  return globalThis._pgPool; // Reuse existing pool
}
// ... create new pool only if none exists
globalThis._pgPool = pool;

This is a common pattern in Next.js applications — you’ll see it in any project that uses database connections in development.
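
Putting the two snippets together, the accessor looks roughly like this. This is a sketch of the pattern, not the verbatim contents of connection.ts (environment handling is simplified to process.env; the _readEnv helpers are dropped):

import { Pool } from 'pg';

declare global {
  var _pgPool: Pool | undefined;
}

// Sketch: reuse the pool stored on globalThis so Next.js hot reloads
// don't stack up new pools of 30 connections each.
export function getPool(): Pool {
  if (globalThis._pgPool) {
    return globalThis._pgPool; // survived the last hot reload, reuse it
  }
  const pool = new Pool({
    host: process.env.DB_HOST,
    port: Number(process.env.DB_PORT ?? 5432),
    database: process.env.DB_NAME,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    max: 30,
    min: 2,
    idleTimeoutMillis: 30_000,
    connectionTimeoutMillis: 30_000,
    keepAlive: true,
  });
  globalThis._pgPool = pool;
  return pool;
}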

Parameterized Queries: The #1 Security Rule

Every database query in Astrelo uses parameterized queries. This is non-negotiable.

// DANGEROUS — SQL Injection vulnerability:
const email = req.body.email; // User input!
pool.query(`SELECT * FROM users WHERE email = '${email}'`);

// What if email = "'; DROP TABLE users; --" ?
// The query becomes:
//   SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
// Your entire users table is deleted.

// SAFE — Parameterized query:
pool.query('SELECT * FROM users WHERE email = $1', [email]);

// $1 is ALWAYS treated as a data value, never as SQL code.
// Even if email = "'; DROP TABLE users; --", it just searches
// for a user with that literal string as their email (finds nothing).

The $1, $2, $3 placeholders are PostgreSQL’s parameterization syntax. The values are passed as a separate array. The database engine handles escaping — you never need to sanitize input yourself.

Every query in the codebase follows this pattern. You’ll see it hundreds of times:

pool.query('SELECT * FROM companies WHERE user_id = $1 AND domain = $2', [userId, domain]);

pool.query('UPDATE deals SET outcome = $1 WHERE id = $2 AND user_id = $3', ['won', dealId, userId]);

pool.query('DELETE FROM tasks WHERE id = $1 AND user_id = $2', [taskId, userId]);

Multi-Tenancy: How One Database Serves Many Users

Astrelo is multi-tenant — many users share the same database tables. User A’s deals and User B’s deals are in the same deals table. They never see each other’s data because every query includes WHERE user_id = $1.

-- User A sees only their companies:
SELECT * FROM companies WHERE user_id = 'user-a-uuid';

-- User B sees only their companies:
SELECT * FROM companies WHERE user_id = 'user-b-uuid';

-- Same table, completely isolated views.

This is called row-level multi-tenancy. The alternative is schema-level multi-tenancy (separate database schemas per user) or database-level (separate databases per user). Row-level is simplest but requires discipline — if you forget WHERE user_id = $1 on even one query, you’ve created a data leak.

To prevent this, the userId is always the first parameter, passed in from requireAuth:

async function handler(req: AuthenticatedRequest, res: NextApiResponse) {
  const userId = req.userId!; // From middleware — guaranteed correct
  const companies = await pool.query(
    'SELECT * FROM companies WHERE user_id = $1',
    [userId] // Always filtered by authenticated user
  );
}

The Schema: How 60 Tables Relate

The tables organize into logical groups. Here are the core relationships:

users (the root of everything)
├── icp_profiles (Ideal Customer Profile configurations)
│   └── company_dismissals (companies excluded from this ICP)
├── companies (prospect companies from CRM)
│   ├── contacts (people at those companies)
│   │   └── contact_scores (persona importance/accessibility scores)
│   ├── scores (ML fit/intent/composite scores)
│   ├── deals (sales opportunities)
│   │   └── deal_contacts (many-to-many: which contacts on which deals)
│   ├── intent_signals (buying signals from web/CRM)
│   ├── tasks (AI-generated or manual action items)
│   └── touch_events (emails, calls, meetings — all interactions)
├── realtime_alerts (proactive intelligence notifications)
│   └── pending_autonomous_actions (AI-drafted actions awaiting approval)
├── deal_predictions (ML predictions: slip risk, loss risk, champion cold)
├── chat_conversations → chat_messages (Cosmo chat history)
├── crm_connections (HubSpot/Salesforce OAuth tokens)
└── slack_connections (Slack workspace integration)

Primary Keys and UUIDs

Every table uses UUID primary keys instead of auto-incrementing integers:

id UUID PRIMARY KEY DEFAULT gen_random_uuid()

Why UUIDs?

  • No sequence collisions: Multiple servers can create records simultaneously without coordinating
  • No information leakage: Sequential IDs reveal how many records exist (/api/users/12345 tells you there are at least 12,345 users)
  • Safe for client exposure: Showing a UUID in a URL doesn’t let someone guess other IDs

The tradeoff: UUIDs are larger (16 bytes vs 4 bytes for an integer) and slightly slower for lookups. For Astrelo’s scale, this is negligible.

Foreign Keys and Referential Integrity

Foreign keys enforce relationships between tables:

-- A deal must belong to a valid user and company:
CREATE TABLE deals (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id), -- Must exist in users table
  company_id UUID REFERENCES companies(id),   -- Must exist in companies table
  ...
);

If you try to insert a deal with a company_id that doesn’t exist in the companies table, PostgreSQL refuses with a foreign key violation error. This prevents “orphaned” data — deals pointing to companies that were deleted.
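
In application code, that violation surfaces as an error whose code property carries PostgreSQL's SQLSTATE; '23503' is foreign_key_violation. A hedged sketch (the createDeal helper and its trimmed column list are hypothetical):

import { Pool } from 'pg';

// Hypothetical helper, trimmed to the two foreign key columns. node-postgres
// exposes PostgreSQL's SQLSTATE on err.code; '23503' is foreign_key_violation.
async function createDeal(pool: Pool, userId: string, companyId: string): Promise<void> {
  try {
    await pool.query(
      'INSERT INTO deals (user_id, company_id) VALUES ($1, $2)',
      [userId, companyId]
    );
  } catch (err) {
    if ((err as { code?: string }).code === '23503') {
      throw new Error(`company ${companyId} does not exist`);
    }
    throw err;
  }
}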

Some foreign keys have cascade behaviors:

-- From migration 296:
trigger_alert_id UUID REFERENCES realtime_alerts(id) ON DELETE SET NULL

ON DELETE SET NULL means: if the referenced alert is deleted, don’t delete this action — just set trigger_alert_id to NULL. This is how we can safely clean up demo alerts without breaking the pending actions table.
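
A hedged sketch of what that cleanup looks like in practice. How demo alerts are identified (a source = 'demo' filter here) is an assumption for illustration:

import { Pool } from 'pg';

// Hypothetical cleanup: deleting alerts does NOT delete the pending actions
// that referenced them. The source = 'demo' filter is assumed for this sketch.
async function deleteDemoAlerts(pool: Pool, userId: string): Promise<void> {
  await pool.query(
    `DELETE FROM realtime_alerts WHERE user_id = $1 AND source = 'demo'`,
    [userId]
  );
  // Any pending_autonomous_actions row that pointed at a deleted alert
  // survives, with trigger_alert_id now set to NULL by PostgreSQL.
}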

The Migration System: How the Schema Evolves

When Astrelo was first built, there might have been 10 tables. Now there are 60. How did it grow from 10 to 60 without losing data? Migrations.

A migration is a SQL file that makes one change to the schema. Each migration has a numeric ID and runs exactly once:

src/infrastructure/database/migrations/
├── 200_consolidate_jobs_table.sql        ← Destructive (V2 consolidation)
├── 201_consolidate_intent_signals.sql    ← Destructive
├── ...
├── 206_companies_external_id_unique.sql  ← Safe (additive only)
├── 207_add_contact_enrichment.sql        ← Safe
├── ...
└── 296_create_pending_autonomous_actions.sql  ← Latest

Why They Start at 200

Astrelo underwent a major schema consolidation. Before it, there were migrations 103-153 (the “V1” schema). These got messy — multiple overlapping tables for jobs, signals, and scoring. The 200-series was a clean restart:

  • Migrations 200-205 are destructive — they DROP TABLE old fragmented tables and CREATE TABLE unified replacements
  • Migrations 206+ are safe — they only add things, never drop

The Safety Pattern: IF NOT EXISTS

Every safe migration is idempotent — running it twice produces the same result as running it once:

-- Safe: creates the table only if it doesn't already exist
CREATE TABLE IF NOT EXISTS deal_predictions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  ...
);

-- Safe: adds the column only if it doesn't already exist
ALTER TABLE realtime_alerts ADD COLUMN IF NOT EXISTS ai_content JSONB;

-- Safe: creates the index only if it doesn't already exist
CREATE INDEX IF NOT EXISTS idx_alerts_user_unviewed
  ON realtime_alerts (user_id)
  WHERE viewed_at IS NULL AND dismissed_at IS NULL;

This is critical in production. If a deployment fails halfway through migrations, you can restart without errors — already-applied changes are silently skipped.

How the Runner Works

The migration runner (scripts/migrate.js) does this:

  1. Creates a tracking table if it doesn’t exist:

    CREATE TABLE IF NOT EXISTS schema_migrations (
      filename VARCHAR(255) PRIMARY KEY,
      applied_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
  2. Queries which migrations have already been applied:

    SELECT filename FROM schema_migrations;
  3. For each migration in the SAFE_MIGRATIONS array that hasn’t been applied:

    • Opens a transaction (BEGIN)
    • Runs the SQL file
    • Records the migration: INSERT INTO schema_migrations (filename) VALUES ($1)
    • Commits the transaction (COMMIT)
    • If anything fails: ROLLBACK — the migration and the tracking record are both undone

The recording and the migration happen in the same transaction. This is a crucial correctness guarantee: you can never have a migration recorded as applied when its SQL was rolled back, or vice versa.
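
A condensed TypeScript sketch of that loop. The real runner is scripts/migrate.js; the SAFE_MIGRATIONS list and file layout come from the description above, the tracking-table creation step is omitted, and everything else is illustrative:

import { readFileSync } from 'node:fs';
import { Pool } from 'pg';

// Sketch of the runner loop: skip applied migrations, run the rest each
// inside a transaction that also records the filename.
async function runMigrations(pool: Pool, safeMigrations: string[]): Promise<void> {
  const { rows } = await pool.query('SELECT filename FROM schema_migrations');
  const applied = new Set(rows.map((r) => r.filename));

  for (const filename of safeMigrations) {
    if (applied.has(filename)) continue; // already recorded, skip

    const sql = readFileSync(
      `src/infrastructure/database/migrations/${filename}`,
      'utf8'
    );

    const client = await pool.connect(); // a transaction needs one connection
    try {
      await client.query('BEGIN');
      await client.query(sql); // the migration itself
      await client.query(
        'INSERT INTO schema_migrations (filename) VALUES ($1)',
        [filename] // recorded in the SAME transaction
      );
      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK'); // migration and record undone together
      throw err;
    } finally {
      client.release();
    }
  }
}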

Common SQL Patterns in the Codebase

The Upsert Pattern (INSERT … ON CONFLICT)

Scores are recalculated periodically. Instead of checking “does a score exist for this company?” and then deciding between INSERT and UPDATE, we use an upsert:

INSERT INTO scores (user_id, company_id, fit_score, intent_score, composite_score, scored_at)
VALUES ($1, $2, $3, $4, $5, NOW())
ON CONFLICT (user_id, company_id) DO UPDATE SET
  fit_score = EXCLUDED.fit_score,
  intent_score = EXCLUDED.intent_score,
  composite_score = EXCLUDED.composite_score,
  scored_at = NOW();

ON CONFLICT (user_id, company_id) — if a row with this user+company pair already exists:

  • DO UPDATE SET — update the existing row
  • EXCLUDED.fit_score — refers to the value we tried to insert

This is one SQL statement that handles both “create new” and “update existing.” The scores table has a UNIQUE(user_id, company_id) constraint that makes this possible.
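
From application code, the upsert is one pool.query call either way. A hypothetical call site wrapping the statement above:

import { Pool } from 'pg';

// Hypothetical wrapper around the upsert above: one round-trip whether the
// row is new or being refreshed.
async function upsertScore(
  pool: Pool,
  userId: string,
  companyId: string,
  fit: number,
  intent: number,
  composite: number
): Promise<void> {
  await pool.query(
    `INSERT INTO scores (user_id, company_id, fit_score, intent_score,
                         composite_score, scored_at)
     VALUES ($1, $2, $3, $4, $5, NOW())
     ON CONFLICT (user_id, company_id) DO UPDATE SET
       fit_score = EXCLUDED.fit_score,
       intent_score = EXCLUDED.intent_score,
       composite_score = EXCLUDED.composite_score,
       scored_at = NOW()`,
    [userId, companyId, fit, intent, composite]
  );
}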

Batch Inserts with UNNEST

When inserting many rows at once (like webhook events or alert matches), building individual INSERT statements would be slow. Instead, we use PostgreSQL’s UNNEST:

INSERT INTO webhook_events (user_id, provider, event_id, event_type, entity_type, entity_id, payload)
SELECT $1,
       unnest($2::text[]),
       unnest($3::text[]),
       unnest($4::text[]),
       unnest($5::text[]),
       unnest($6::text[]),
       unnest($7::jsonb[])
ON CONFLICT (provider, event_id) WHERE event_id IS NOT NULL DO NOTHING

UNNEST takes an array and expands it into rows. If you pass arrays of 10 elements each, you get 10 rows inserted in one statement. This is dramatically faster than 10 separate INSERT statements because it’s one network round-trip to the database instead of ten.
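
On the TypeScript side you build one array per column and pass them as parameters. A sketch, simplified to pass provider as a scalar (the codebase version above unnests it like the other columns), with a hypothetical WebhookEvent shape:

import { Pool } from 'pg';

// Hypothetical event shape for this sketch.
interface WebhookEvent {
  eventId: string;
  eventType: string;
  entityType: string;
  entityId: string;
  payload: unknown;
}

// One JavaScript array per column; UNNEST expands them into rows server-side.
async function insertWebhookEvents(
  pool: Pool,
  userId: string,
  provider: string,
  events: WebhookEvent[]
): Promise<void> {
  await pool.query(
    `INSERT INTO webhook_events
       (user_id, provider, event_id, event_type, entity_type, entity_id, payload)
     SELECT $1, $2,
            unnest($3::text[]), unnest($4::text[]),
            unnest($5::text[]), unnest($6::text[]), unnest($7::jsonb[])
     ON CONFLICT (provider, event_id) WHERE event_id IS NOT NULL DO NOTHING`,
    [
      userId,
      provider,
      events.map((e) => e.eventId),
      events.map((e) => e.eventType),
      events.map((e) => e.entityType),
      events.map((e) => e.entityId),
      events.map((e) => JSON.stringify(e.payload)), // jsonb[] as JSON strings
    ]
  );
}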

Partial Indexes

Regular indexes cover every row. Partial indexes only cover rows matching a condition:

CREATE UNIQUE INDEX idx_realtime_alerts_signal_hash
  ON realtime_alerts (signal_hash)
  WHERE signal_hash IS NOT NULL;

This index:

  • Only includes rows where signal_hash is not null (ignoring nulls)
  • Enforces uniqueness only on those rows
  • Is much smaller than a full index (fewer rows to store)
  • Makes dedup lookups fast while allowing multiple rows with NULL signal_hash (see the dedup sketch below)
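
That same partial index lets dedup be a single insert: for ON CONFLICT to use a partial unique index, the statement has to repeat the index's WHERE predicate. A hedged sketch (the column list is trimmed and the helper is hypothetical):

import { Pool } from 'pg';

// Hypothetical dedup helper. The ON CONFLICT arbiter names the indexed
// column AND repeats the partial index's predicate so PostgreSQL can match it.
async function insertAlertOnce(
  pool: Pool,
  userId: string,
  signalHash: string,
  contextData: unknown
): Promise<boolean> {
  const result = await pool.query(
    `INSERT INTO realtime_alerts (user_id, signal_hash, context_data)
     VALUES ($1, $2, $3)
     ON CONFLICT (signal_hash) WHERE signal_hash IS NOT NULL DO NOTHING`,
    [userId, signalHash, JSON.stringify(contextData)]
  );
  return result.rowCount === 1; // false: a row with this hash already existed
}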

The JSONB Pattern

Several tables use JSONB columns for flexible, schema-less data:

-- ICP criteria can have any structure:
criteria JSONB DEFAULT '{}'

-- Alert context is different per trigger type:
context_data JSONB

-- Action parameters vary by action type:
action_params JSONB NOT NULL DEFAULT '{}'

JSONB is binary JSON — PostgreSQL stores it in an optimized format that supports indexing and querying:

-- Query inside a JSONB column:
SELECT * FROM realtime_alerts
WHERE context_data->>'companyName' = 'Acme Corp';

-- Check if a JSONB field exists:
SELECT * FROM configs WHERE config_data ? 'autoExecuteEnabled';

The flexibility of JSONB means we don’t need a new table or migration every time we want to store a new piece of context data. The tradeoff: there’s no schema enforcement on JSONB columns — the application code is responsible for writing the correct structure.
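
A common way to impose that discipline is to type the JSONB shape at the application boundary. A hypothetical sketch (the AlertContext interface is invented; companyName matches the query above):

import { Pool } from 'pg';

// Hypothetical: the database enforces nothing about JSONB shape, so the
// application types the structure at the write site to keep rows consistent.
interface AlertContext {
  companyName: string;
  companyDomain?: string;
  signalType: string;
}

async function writeAlertContext(
  pool: Pool,
  alertId: string,
  context: AlertContext
): Promise<void> {
  await pool.query(
    'UPDATE realtime_alerts SET context_data = $1 WHERE id = $2',
    [JSON.stringify(context), alertId] // serialized; PostgreSQL coerces to jsonb
  );
}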

Key Takeaways

  1. Connection pooling reuses database connections instead of creating new ones per request. The pool has 30 slots and connections are borrowed/returned.
  2. Parameterized queries ($1, $2) prevent SQL injection by separating code from data. This is the single most important security pattern.
  3. Multi-tenancy is enforced by WHERE user_id = $1 on every query. The authenticated user’s ID comes from JWT middleware.
  4. Migrations evolve the schema incrementally. Each migration runs once, in a transaction, with IF NOT EXISTS safety.
  5. Upserts (ON CONFLICT DO UPDATE) handle create-or-update in one statement.
  6. JSONB columns store flexible data without schema changes, used for alert context, ICP criteria, and action parameters.

Next chapter: we’ll look at how the frontend fetches and displays data using React Query — the library that manages all communication between the browser and the API.
