Astrelo runs 18 scheduled jobs that power everything from alert evaluation to pipeline snapshots. These jobs are invisible to the user but essential: without them, alerts wouldn't fire, predictions wouldn't update, and the daily digest wouldn't arrive.
The Infrastructure: AWS EventBridge → API Routes
Cron jobs aren't traditional Unix cron. They're AWS EventBridge rules that send HTTP POST requests to Next.js API routes:
```
AWS EventBridge Rule (schedule: rate(1 minute))
        ↓
HTTP POST /api/cron/process-alert-evaluation
  Header: Authorization: Bearer ${CronSecret}
        ↓
Next.js API route handler
        ↓
Database operations, LLM calls, external API calls
```

The `deploy/cron-schedules.yml` CloudFormation template provisions the full infrastructure:
```yaml
# EventBridge connection with auth header
CronConnection:
  Type: AWS::Events::Connection
  Properties:
    AuthorizationType: API_KEY
    AuthParameters:
      ApiKeyAuthParameters:
        ApiKeyName: Authorization
        ApiKeyValue: !Sub 'Bearer ${CronSecret}'

# API destination with rate limit
CronApiDestination:
  Type: AWS::Events::ApiDestination
  Properties:
    ConnectionArn: !GetAtt CronConnection.Arn
    HttpMethod: POST
    InvocationEndpoint: !Sub '${AppUrl}/api/cron/*'
    InvocationRateLimitPerSecond: 5

# Individual rules (one per cron job)
ProcessAlertEvaluation:
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: 'rate(1 minute)'
    Targets:
      - Arn: !GetAtt CronApiDestination.Arn
        HttpParameters:
          PathParameterValues:
            - 'process-alert-evaluation'
```

Why HTTP routes instead of Lambda functions? Because the codebase is a Next.js monolith deployed on AWS Amplify. API routes share the same database pool, same service layer, same utility functions. A separate Lambda would need to duplicate all of that. HTTP-based cron keeps everything in one deployable unit.
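On the receiving side, one way to wire this up (a sketch, not Astrelo's actual code; `JOB_HANDLERS` and `resolveJob` are hypothetical names) is a single catch-all route that maps the EventBridge path parameter to a job handler:

```typescript
// Hypothetical dispatch table: job name (taken from the EventBridge
// path parameter) to its async handler.
type JobHandler = () => Promise<void>;

const JOB_HANDLERS: Record<string, JobHandler> = {
  'process-alert-evaluation': async () => { /* claim + process queue */ },
  'pipeline-snapshot': async () => { /* capture daily pipeline state */ },
};

// Resolve the job name from the request path; returns undefined for
// unknown jobs so the route can answer 404 instead of throwing.
function resolveJob(path: string): JobHandler | undefined {
  const name = path.replace(/^\/api\/cron\//, '');
  return JOB_HANDLERS[name];
}
```

Whether dispatch is centralized like this or each job gets its own route file, the EventBridge side stays the same: one rule per job, one path per rule.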
The 18 Scheduled Jobs
| Job | Frequency | What It Does |
|---|---|---|
| `process-alert-evaluation` | Every 1 min | Claims and processes `alert_evaluation` jobs from the queue |
| `process-alert-ai-content` | Every 1 min | Generates AI analysis for alerts via a 70B LLM |
| `process-webhooks` | Every 1 min | Processes queued webhook events |
| `process-playbooks` | Every 5 min | Executes pending playbook steps |
| `proactive-triggers` | Every 15 min | Evaluates proactive trigger conditions |
| `execute-pending-actions` | Every 5 min | Auto-executes approved autonomous actions |
| `behavioral-triggers` | Every 4 hours | Detects activity slumps and behavioral patterns |
| `rebuild-rep-profiles` | 2:00 AM daily | Recalculates rep performance profiles from deal history |
| `cleanup-alert-data` | 3:00 AM daily | Expires old alerts, cleans stale predictions |
| `discovery-daily` | 3:00 AM daily | Discovers new prospect companies via ML |
| `rebuild-persona-profiles` | 3:30 AM daily | Recalculates persona winning profiles |
| `evaluate-segments` | 4:00 AM daily | Evaluates exploratory segments for absorption/abandonment |
| `recalculate-learned-weights` | 4:30 AM daily | Re-learns per-user intent weights from deal outcomes |
| `pipeline-snapshot` | 5:00 AM daily | Captures daily pipeline state for trending |
| `process-deal-predictions` | 5:30 AM daily | Generates slip/loss/champion predictions |
| `revenue-at-risk` | 6:00 AM daily | Calculates revenue at risk |
| `cfo-briefing` | 7:30 AM daily | Generates the executive briefing |
| `tasks-daily-digest` | 8:00 AM daily | Sends the daily task digest to Slack |
The daily jobs run in a specific order: rebuild profiles → clean up → discover → snapshot → predict → alert. Each depends on the previous step's output: you rebuild profiles before predicting, predict before alerting, alert before digesting.
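That ordering constraint can be made explicit and checked. The sketch below is illustrative only: the dependency edges are inferred from the prose, and the helper names are hypothetical.

```typescript
// Schedule times as minutes past midnight, matching the table above.
const schedule: Record<string, number> = {
  'rebuild-rep-profiles': 2 * 60,
  'pipeline-snapshot': 5 * 60,
  'process-deal-predictions': 5 * 60 + 30,
  'revenue-at-risk': 6 * 60,
  'cfo-briefing': 7 * 60 + 30,
};

// Inferred dependency edges: a job must run after everything it reads from.
const dependsOn: Record<string, string[]> = {
  'process-deal-predictions': ['rebuild-rep-profiles', 'pipeline-snapshot'],
  'revenue-at-risk': ['process-deal-predictions'],
  'cfo-briefing': ['revenue-at-risk'],
};

// Returns the names of jobs scheduled at or before one of their dependencies.
function scheduleViolations(): string[] {
  return Object.entries(dependsOn)
    .filter(([job, deps]) => deps.some((d) => schedule[d] >= schedule[job]))
    .map(([job]) => job);
}
```

A check like this could run in CI against the CloudFormation template, so reshuffling one schedule time can't silently break a downstream job.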
Authentication Pattern
Every cron endpoint verifies the secret:
```typescript
const CRON_SECRET = process.env.CRON_SECRET;

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }

  const authHeader = req.headers.authorization;
  if (CRON_SECRET && authHeader !== `Bearer ${CRON_SECRET}`) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // ... process jobs
}
```

The `CRON_SECRET &&` check means: if the secret isn't configured (local dev), skip auth. In production, the secret is always set.
Job Claiming: SKIP LOCKED
The critical pattern for concurrent-safe job processing:
```sql
UPDATE jobs SET status = 'processing', started_at = NOW()
WHERE id IN (
  SELECT id FROM jobs
  WHERE job_type = 'alert_evaluation'
    AND status = 'pending'
    AND created_at > NOW() - INTERVAL '1 hour'
  ORDER BY created_at ASC
  FOR UPDATE SKIP LOCKED
  LIMIT 10
)
RETURNING *
```

Let's break this apart:

- `FOR UPDATE`: locks the selected rows so no other transaction can claim them
- `SKIP LOCKED`: if a row is already locked by another transaction, skip it instead of waiting
- `LIMIT 10`: claim a batch of 10 jobs at a time
This is an atomic claim-and-update in a single query. If two cron invocations overlap (EventBridge fires before the previous run completes), they safely claim non-overlapping batches. No race conditions, no double-processing.
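The non-overlapping-batch behavior can be illustrated with a toy in-memory queue. This is a simulation of the claim semantics, not the database code: in the real system the "lock" is Postgres row locking, here it is just the status flip.

```typescript
interface Job { id: number; status: 'pending' | 'processing'; }

// Simulates FOR UPDATE SKIP LOCKED + LIMIT: atomically flips up to `limit`
// pending jobs to 'processing' and returns them. Because the flip happens
// inside the claim itself, a second caller can never receive the same rows.
function claimBatch(queue: Job[], limit: number): Job[] {
  const claimed: Job[] = [];
  for (const job of queue) {
    if (claimed.length === limit) break;
    if (job.status === 'pending') {
      job.status = 'processing'; // the "lock": later claims skip this row
      claimed.push(job);
    }
  }
  return claimed;
}
```

Two overlapping invocations against the same queue claim disjoint sets: the first takes the oldest ten, the second takes whatever pending rows remain.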
The 1-Hour Staleness Guard
```sql
AND created_at > NOW() - INTERVAL '1 hour'
```

Jobs older than 1 hour are ignored. This prevents the system from processing stale events that no longer reflect the current state of the CRM. If a webhook event is an hour old, the deal may have changed again, and processing the old event could produce a misleading alert.
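The same guard expressed in application code, as a hypothetical helper (Astrelo enforces it in SQL, but the logic is worth seeing in isolation):

```typescript
const STALENESS_MS = 60 * 60 * 1000; // 1 hour, matching the SQL interval

// True when the job was created within the last hour and is still worth
// processing; anything older is skipped rather than acted on.
function isFresh(createdAt: Date, now: Date = new Date()): boolean {
  return now.getTime() - createdAt.getTime() <= STALENESS_MS;
}
```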
Retry Logic
When a job fails, it's retried up to 3 times:

```typescript
const attempts = (job.attempts || 0) + 1;
const maxAttempts = job.max_attempts || 3;

await pool.query(
  `UPDATE jobs SET
     status = CASE WHEN $2 >= $3 THEN 'failed' ELSE 'pending' END,
     attempts = $2,
     error = $4,
     completed_at = CASE WHEN $2 >= $3 THEN NOW() ELSE NULL END
   WHERE id = $1`,
  [job.id, attempts, maxAttempts, errorMessage]
);
```

On failure: if `attempts < max`, status goes back to `'pending'` and the job is retried on the next cron cycle. If `attempts >= max`, status becomes `'failed'` permanently. The `error` column stores the last error message for debugging.
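The transition can be captured as a pure function. This is a sketch that mirrors the CASE expression above; `nextRetryState` is a hypothetical name, not a function in the codebase.

```typescript
interface RetryDecision {
  status: 'pending' | 'failed';
  attempts: number;
  done: boolean; // true once the retry budget is exhausted
}

// Mirrors the SQL CASE: bump attempts, fail permanently once the budget
// is exhausted, otherwise requeue for the next cron cycle.
function nextRetryState(attempts: number | null, maxAttempts: number | null): RetryDecision {
  const next = (attempts || 0) + 1;
  const max = maxAttempts || 3;
  const done = next >= max;
  return { status: done ? 'failed' : 'pending', attempts: next, done };
}
```

Keeping the decision pure makes the retry policy easy to unit-test without a database.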
Timeout Budgets: The Three-Tier Pattern
AWS Amplify serverless functions have a 30-second timeout. Cron handlers use a three-tier budget to stay within this limit:
```typescript
const REQUEST_DEADLINE_MS = 25_000; // Total cron budget: 25 seconds
const USER_TIMEOUT_MS = 20_000;     // Per-user timeout: 20 seconds
const PER_JOB_TIMEOUT_MS = 10_000;  // Per-job timeout: 10 seconds
```

Tier 1 (request deadline): check before each loop iteration:
```typescript
for (const user of users) {
  if (Date.now() - startTime > REQUEST_DEADLINE_MS) {
    timedOut = true;
    break; // Stop gracefully: remaining users will be processed next invocation
  }
  // ... process user
}
```

Tier 2 (per-user timeout): `Promise.race` ensures no single user can hang the entire run:
```typescript
await Promise.race([
  processUserPredictions(userId),
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`User ${userId} timeout`)), USER_TIMEOUT_MS)
  ),
]);
```

`Promise.race` returns whichever promise settles first. If `processUserPredictions` takes 25 seconds, the timeout promise rejects after 20 seconds, and we move on to the next user.
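The race can be factored into a reusable helper. A sketch, assuming the codebase wants one (the timer is cleared on settle so it doesn't keep the event loop alive):

```typescript
// Resolves with `work`'s value, or rejects with "<label> timeout" if
// `work` hasn't settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'task'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout`)), ms);
  });
  // Whichever settles first wins; clean up the timer either way.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

Note that `Promise.race` does not cancel the losing promise: the slow user's work keeps running in the background, the handler just stops waiting for it.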
Tier 3 (per-job timeout): inside the per-user processing, individual jobs also have timeouts (10 seconds for alert evaluation, 8 seconds for AI content generation).
Why Three Tiers?
Consider a scenario with 50 users. Without the request deadline, the cron might process 10 users and then get killed at 30 seconds by AWS, losing any partially processed work. With the deadline, it processes as many users as it can in 25 seconds, responds with `{ timedOut: true }`, and the remaining users are processed in the next invocation (one minute later).
Without the per-user timeout, a single user with 500 stalled deals could consume the entire 25-second budget. The per-user cap ensures fair distribution across users.
Multi-User Processing Loop
Jobs that process all users follow a consistent pattern:
```typescript
// Fetch all active users
const usersRes = await pool.query(
  `SELECT DISTINCT user_id FROM [relevant_table]
   WHERE [relevant_conditions]`
);

let processed = 0, failed = 0, timedOut = false, remainingUsers = 0;

for (const [i, { user_id: userId }] of usersRes.rows.entries()) {
  if (Date.now() - startTime > REQUEST_DEADLINE_MS) {
    timedOut = true;
    remainingUsers = usersRes.rows.length - i; // picked up by the next invocation
    break;
  }
  try {
    const count = await Promise.race([
      processUser(userId),
      timeoutPromise(USER_TIMEOUT_MS), // rejects after the per-user budget
    ]);
    processed += count;
  } catch (error) {
    failed++;
    logError({ source: 'cron-name', userId, error });
    // Continue: don't let one user's failure block others
  }
}

return res.status(200).json({
  processed,
  failed,
  timedOut,
  usersProcessed: usersRes.rows.length - (timedOut ? remainingUsers : 0),
  durationMs: Date.now() - startTime,
});
```

The response always includes operational metrics. A monitoring system can alert when `failed > 0` or when `timedOut: true` becomes chronic.
The Daily Job Sequence
Daily jobs are staggered across the early morning hours to avoid resource contention:
```
2:00 AM → rebuild-rep-profiles        (recalculate from deal history)
3:00 AM → cleanup-alert-data          (expire old alerts, clean stale data)
3:00 AM → discovery-daily             (ML prospect discovery)
3:30 AM → rebuild-persona-profiles    (persona winning profiles)
4:00 AM → evaluate-segments           (exploratory segment lifecycle)
4:30 AM → recalculate-learned-weights (intent weight learning)
5:00 AM → pipeline-snapshot           (capture daily state)
5:30 AM → process-deal-predictions    (generate predictions)
6:00 AM → revenue-at-risk             (risk calculations)
7:30 AM → cfo-briefing                (executive summary)
8:00 AM → tasks-daily-digest          (Slack digest)
9:00 AM → scheduled-reports           (generated reports)
```

The 30-minute gaps give each job time to complete before the next one starts. Rebuild → clean → discover → learn → snapshot → predict → brief → digest. Data flows downstream through the day.
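In EventBridge, each daily slot becomes a six-field cron expression (minutes, hours, day-of-month, month, day-of-week, year, evaluated in UTC). A small helper keeps them consistent; this is a hypothetical convenience, the CloudFormation template may well hand-write the expressions:

```typescript
// Builds an EventBridge ScheduleExpression for a daily job.
// EventBridge cron has six fields and requires '?' in exactly one of the
// two day fields; times are interpreted in UTC.
function dailySchedule(hour: number, minute = 0): string {
  if (hour < 0 || hour > 23 || minute < 0 || minute > 59) {
    throw new Error(`invalid time ${hour}:${minute}`);
  }
  return `cron(${minute} ${hour} * * ? *)`;
}
```

So the 2:00 AM profile rebuild would use `cron(0 2 * * ? *)`, and the 5:30 AM predictions job `cron(30 5 * * ? *)`.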
Key Takeaways
- 18 scheduled jobs range from every-minute processing (alert evaluation) to daily batches (pipeline snapshots).
- EventBridge → HTTP POST keeps cron handlers in the same codebase as the rest of the application: shared code, shared database pool.
- SKIP LOCKED enables concurrent-safe job claiming without race conditions.
- Three-tier timeout budgets (request → user → job) ensure graceful degradation under infrastructure limits.
- Retry with max attempts gives transient failures a chance to recover while permanently marking persistent failures.
- Daily jobs are staggered with 30-minute gaps to avoid resource contention and establish data flow dependencies.
Next chapter: rate limiting and quotas, and how the system protects itself from abuse.