CRM integration is where Astrelo gets its data. Without HubSpot or Salesforce, there are no companies, no contacts, no deals — and no scores. This chapter traces the complete integration from “Connect HubSpot” button click to “500 companies synced.”
OAuth: The Three-Legged Dance
Astrelo doesn’t store your HubSpot password. Instead, it uses OAuth 2.0 — a protocol where HubSpot itself verifies your identity and grants Astrelo limited access. This is the same flow you use when you click “Sign in with Google” on a website.
The flow has three “legs”:
1. User clicks "Connect HubSpot" → Astrelo redirects to HubSpot
2. User logs into HubSpot and clicks "Grant Access" → HubSpot redirects back with a code
3. Astrelo exchanges the code for access tokens → Tokens stored in databaseLeg 1: Authorization Request
// src/pages/api/integrations/hubspot/authorize.ts, lines 30-80
// 1. Check if user already has a valid connection
const existingConnection = await pool.query(
`SELECT id, account_name, expires_at, is_active
FROM crm_connections
WHERE user_id = $1 AND provider = 'hubspot' AND is_active = true`,
[req.userId]
);
// Skip OAuth if token is still valid and no force reconnect
if (req.query.force !== 'true' && isTokenValid) {
res.redirect('/app?hubspot_already_connected=true');
return;
}
// 2. Build anti-CSRF state parameter
const state = Buffer.from(JSON.stringify({
userId: req.userId,
timestamp: Date.now()
})).toString('base64');
// 3. Redirect to HubSpot's authorization page
const authUrl = new URL('https://app.hubspot.com/oauth/authorize');
authUrl.searchParams.set('client_id', clientId);
authUrl.searchParams.set('redirect_uri', redirectUri);
authUrl.searchParams.set('scope', scopes);
authUrl.searchParams.set('state', state);
authUrl.searchParams.set('prompt', 'consent');
res.redirect(302, authUrl.toString());The state parameter is critical security. It encodes the user’s ID and a timestamp, Base64-encoded. When HubSpot redirects back, we decode and verify this state — if someone tries to redirect a different user’s session to this callback, the state won’t match and we reject it. This prevents CSRF attacks (Cross-Site Request Forgery).
Leg 2: Token Exchange
After the user grants access, HubSpot redirects to /api/integrations/hubspot/callback with an authorization code:
// src/pages/api/integrations/hubspot/callback.ts, lines 39-99
// 1. Validate state (10-minute max age)
const stateData = JSON.parse(Buffer.from(state, 'base64').toString());
const STATE_MAX_AGE = 10 * 60 * 1000;
if (Date.now() - stateData.timestamp > STATE_MAX_AGE) {
return res.redirect('/app?error=oauth_state_expired');
}
// 2. Exchange code for tokens
const formData = new URLSearchParams();
formData.append('grant_type', 'authorization_code');
formData.append('client_id', process.env.HUBSPOT_CLIENT_ID!);
formData.append('client_secret', process.env.HUBSPOT_CLIENT_SECRET!);
formData.append('redirect_uri', redirectUri);
formData.append('code', code);
const tokenResponse = await axios.post(
'https://api.hubapi.com/oauth/v1/token',
formData.toString(),
{ headers: { 'Content-Type': 'application/x-www-form-urlencoded' } }
);
const { access_token, refresh_token, expires_in } = tokenResponse.data;
// 3. Get account info
const accountInfoResponse = await axios.get(
'https://api.hubapi.com/oauth/v1/access-tokens/' + access_token
);
const { hub_id, user, hub_domain } = accountInfoResponse.data;
// 4. Store connection
await crmConnectionService.createOrUpdate({
user_id: userId,
provider: 'hubspot',
access_token,
refresh_token,
expires_at: new Date(Date.now() + expires_in * 1000),
metadata: { hub_id, user, hub_domain, scopes: tokenResponse.data.scope }
});The callback also creates a default ICP profile if the user doesn’t have one, so scoring can begin immediately after the first sync.
Leg 3: Automatic Token Refresh
HubSpot access tokens expire after 6 hours. The token manager transparently refreshes them:
// src/infrastructure/providers/hubspot/tokenManager.ts, lines 18-52
async getValidAccessToken(userId: string): Promise<string> {
const connection = await crmConnectionService.getByUserIdAndProvider(userId, 'hubspot');
// Check if token will expire within 5 minutes
const expiresAt = new Date(connection.expires_at);
const willExpireSoon = new Date(Date.now() + 5 * 60 * 1000);
if (expiresAt <= willExpireSoon) {
return await this.refreshAccessToken(userId, connection.refresh_token!);
}
return connection.access_token;
}The 5-minute buffer is deliberate. If a token has 4 minutes left and a sync takes 5 minutes, the token would expire mid-sync. By refreshing proactively, we avoid mid-operation failures.
The Sync Pipeline
Once connected, the sync pipeline pulls all companies, contacts, and deals from HubSpot into Astrelo’s tables.
Step 1: Fetch All Companies (Paginated)
HubSpot’s API returns at most 100 companies per request. We paginate through all of them:
// src/infrastructure/providers/hubspot/client.ts, lines 129-170
async listAllCompanies(userId: string, limit: number = 100, after?: string) {
const client = await this.getAxiosInstance(userId);
const params = {
limit,
properties: [
'name', 'domain', 'numberofemployees', 'annualrevenue', 'country',
'state', 'city', 'industry', 'description', 'phone',
'linkedin_company_page', 'hubspot_owner_id', 'lifecyclestage'
].join(',')
};
if (after) params.after = after;
const response = await client.get(`/crm/v3/objects/companies`, { params });
return {
companies: response.data.results.map(r => this.mapHubSpotToCompany(r.properties, r.id)),
paging: response.data.paging
};
}Step 2: Map HubSpot Fields to Astrelo Schema
HubSpot property names don’t match Astrelo’s column names. The mapper translates:
HubSpot Property → Astrelo Column
─────────────────────────────────────────────
companies.name → companies.company_name
companies.domain → companies.domain (normalized)
companies.numberofemployees → companies.employee_count (parsed)
companies.annualrevenue → companies.annual_revenue + revenue_range
companies.industry → companies.industry + NAICS/SIC codes
companies.country + state → companies.hq_region ("US-CA")
companies.hubspot_owner_id → companies.crm_rep_id (via owner mapping)The industry field gets special treatment — it’s mapped to NAICS and SIC codes for the embedding system (Chapter 6).
Step 3: Batch Insert with UNNEST
Instead of 500 individual INSERT statements, we use PostgreSQL’s UNNEST to insert all companies in one query:
// src/infrastructure/providers/hubspot/syncService.ts (simplified)
const result = await pool.query(`
INSERT INTO companies (
domain, company_name, industry, naics, employee_count,
revenue_range, hq_country, hq_region, external_source,
external_provider_id, user_id, synced_at
)
SELECT
unnest($1::text[]), -- domains array
unnest($2::text[]), -- names array
unnest($3::text[]), -- industries array
unnest($4::text[]), -- naics codes array
unnest($5::int[]), -- employee counts array
unnest($6::text[]), -- revenue ranges array
unnest($7::text[]), -- countries array
unnest($8::text[]), -- regions array
'hubspot',
unnest($9::text[]), -- hubspot IDs array
$10::uuid, -- user_id (same for all)
NOW()
ON CONFLICT (user_id, domain) WHERE domain IS NOT NULL
DO UPDATE SET
company_name = EXCLUDED.company_name,
industry = EXCLUDED.industry,
employee_count = EXCLUDED.employee_count,
synced_at = NOW()
RETURNING domain, (xmax = 0) AS was_insert
`, [domains, names, industries, naics, employeeCounts, ...]);
stats.newCompanies = result.rows.filter(r => r.was_insert).length;
stats.updatedCompanies = result.rows.filter(r => !r.was_insert).length;The RETURNING domain, (xmax = 0) AS was_insert trick deserves explanation. In PostgreSQL, xmax is a system column that tracks the transaction ID of the last update. For newly inserted rows, xmax = 0. For updated rows (the ON CONFLICT DO UPDATE path), xmax > 0. This lets us count inserts vs. updates in a single query.
Step 4: Entity Resolution for Contacts
Contacts need to be linked to companies. This is harder than it sounds — a contact might reference a HubSpot company ID that doesn’t match any Astrelo company UUID.
The sync uses a two-pass resolution strategy:
// src/infrastructure/providers/hubspot/syncService.ts, lines 446-600
// Pass 1: Match by HubSpot company ID
const companyByHubspotId = new Map<string, string>();
const companyResult = await pool.query(`
SELECT id, external_provider_id FROM companies
WHERE external_source = 'hubspot'
AND external_provider_id = ANY($1::text[])
`, [hubspotCompanyIds]);
for (const row of companyResult.rows) {
companyByHubspotId.set(row.external_provider_id, row.id);
}
// Pass 2: Match by email domain (fallback)
const companyByDomain = new Map<string, string>();
const domainResult = await pool.query(`
SELECT id, REGEXP_REPLACE(LOWER(domain), '^www\\.', '') as normalized_domain
FROM companies
WHERE REGEXP_REPLACE(LOWER(domain), '^www\\.', '') = ANY($1::text[])
`, [emailDomains]);For each contact:
- Try matching by HubSpot company ID (most reliable)
- If no match, extract the email domain (
sarah@cloudreach.io→cloudreach.io) and match against company domains
After the bulk insert, a final pass fixes “orphaned” contacts — those that didn’t match during sync but whose email domains now match a company:
WITH domain_matches AS (
SELECT ct.id as contact_id, c.id as company_id
FROM contacts ct
INNER JOIN companies c ON
REGEXP_REPLACE(LOWER(c.domain), '^www\\.', '') = LOWER(SPLIT_PART(ct.email, '@', 2))
WHERE ct.company_id IS NULL AND ct.email LIKE '%@%'
)
UPDATE contacts SET company_id = dm.company_id
FROM domain_matches dm WHERE contacts.id = dm.contact_idThis is entity resolution — the problem of determining that “HubSpot Company #12345” and “Astrelo Company abc-def-ghi” represent the same real-world entity. It’s solved through a combination of provider IDs and domain matching.
Key Takeaways
-
OAuth 2.0 keeps credentials out of Astrelo. The user authenticates directly with HubSpot, and Astrelo only stores access/refresh tokens.
-
Token refresh happens proactively (5-minute buffer) to prevent mid-operation failures.
-
Batch UNNEST inserts with
ON CONFLICThandle the entire sync in minimal database round-trips. -
Entity resolution uses two strategies: provider ID matching (reliable) and email domain matching (fallback).
-
The
xmax = 0trick distinguishes inserts from updates in a single RETURNING clause.
Next chapter: how Salesforce integration differs from HubSpot — same patterns, different API surface.