Skip to Content
Part 3: IntegrationsCh 11: HubSpot

CRM integration is where Astrelo gets its data. Without HubSpot or Salesforce, there are no companies, no contacts, no deals — and no scores. This chapter traces the complete integration from “Connect HubSpot” button click to “500 companies synced.”

OAuth: The Three-Legged Dance

Astrelo doesn’t store your HubSpot password. Instead, it uses OAuth 2.0 — a protocol where HubSpot itself verifies your identity and grants Astrelo limited access. This is the same flow you use when you click “Sign in with Google” on a website.

The flow has three “legs”:

1. User clicks "Connect HubSpot" → Astrelo redirects to HubSpot 2. User logs into HubSpot and clicks "Grant Access" → HubSpot redirects back with a code 3. Astrelo exchanges the code for access tokens → Tokens stored in database

Leg 1: Authorization Request

// src/pages/api/integrations/hubspot/authorize.ts, lines 30-80 // 1. Check if user already has a valid connection const existingConnection = await pool.query( `SELECT id, account_name, expires_at, is_active FROM crm_connections WHERE user_id = $1 AND provider = 'hubspot' AND is_active = true`, [req.userId] ); // Skip OAuth if token is still valid and no force reconnect if (req.query.force !== 'true' && isTokenValid) { res.redirect('/app?hubspot_already_connected=true'); return; } // 2. Build anti-CSRF state parameter const state = Buffer.from(JSON.stringify({ userId: req.userId, timestamp: Date.now() })).toString('base64'); // 3. Redirect to HubSpot's authorization page const authUrl = new URL('https://app.hubspot.com/oauth/authorize'); authUrl.searchParams.set('client_id', clientId); authUrl.searchParams.set('redirect_uri', redirectUri); authUrl.searchParams.set('scope', scopes); authUrl.searchParams.set('state', state); authUrl.searchParams.set('prompt', 'consent'); res.redirect(302, authUrl.toString());

The state parameter is critical security. It encodes the user’s ID and a timestamp, Base64-encoded. When HubSpot redirects back, we decode and verify this state — if someone tries to redirect a different user’s session to this callback, the state won’t match and we reject it. This prevents CSRF attacks (Cross-Site Request Forgery).

Leg 2: Token Exchange

After the user grants access, HubSpot redirects to /api/integrations/hubspot/callback with an authorization code:

// src/pages/api/integrations/hubspot/callback.ts, lines 39-99 // 1. Validate state (10-minute max age) const stateData = JSON.parse(Buffer.from(state, 'base64').toString()); const STATE_MAX_AGE = 10 * 60 * 1000; if (Date.now() - stateData.timestamp > STATE_MAX_AGE) { return res.redirect('/app?error=oauth_state_expired'); } // 2. Exchange code for tokens const formData = new URLSearchParams(); formData.append('grant_type', 'authorization_code'); formData.append('client_id', process.env.HUBSPOT_CLIENT_ID!); formData.append('client_secret', process.env.HUBSPOT_CLIENT_SECRET!); formData.append('redirect_uri', redirectUri); formData.append('code', code); const tokenResponse = await axios.post( 'https://api.hubapi.com/oauth/v1/token', formData.toString(), { headers: { 'Content-Type': 'application/x-www-form-urlencoded' } } ); const { access_token, refresh_token, expires_in } = tokenResponse.data; // 3. Get account info const accountInfoResponse = await axios.get( 'https://api.hubapi.com/oauth/v1/access-tokens/' + access_token ); const { hub_id, user, hub_domain } = accountInfoResponse.data; // 4. Store connection await crmConnectionService.createOrUpdate({ user_id: userId, provider: 'hubspot', access_token, refresh_token, expires_at: new Date(Date.now() + expires_in * 1000), metadata: { hub_id, user, hub_domain, scopes: tokenResponse.data.scope } });

The callback also creates a default ICP profile if the user doesn’t have one, so scoring can begin immediately after the first sync.

Leg 3: Automatic Token Refresh

HubSpot access tokens expire after 6 hours. The token manager transparently refreshes them:

// src/infrastructure/providers/hubspot/tokenManager.ts, lines 18-52 async getValidAccessToken(userId: string): Promise<string> { const connection = await crmConnectionService.getByUserIdAndProvider(userId, 'hubspot'); // Check if token will expire within 5 minutes const expiresAt = new Date(connection.expires_at); const willExpireSoon = new Date(Date.now() + 5 * 60 * 1000); if (expiresAt <= willExpireSoon) { return await this.refreshAccessToken(userId, connection.refresh_token!); } return connection.access_token; }

The 5-minute buffer is deliberate. If a token has 4 minutes left and a sync takes 5 minutes, the token would expire mid-sync. By refreshing proactively, we avoid mid-operation failures.

The Sync Pipeline

Once connected, the sync pipeline pulls all companies, contacts, and deals from HubSpot into Astrelo’s tables.

Step 1: Fetch All Companies (Paginated)

HubSpot’s API returns at most 100 companies per request. We paginate through all of them:

// src/infrastructure/providers/hubspot/client.ts, lines 129-170 async listAllCompanies(userId: string, limit: number = 100, after?: string) { const client = await this.getAxiosInstance(userId); const params = { limit, properties: [ 'name', 'domain', 'numberofemployees', 'annualrevenue', 'country', 'state', 'city', 'industry', 'description', 'phone', 'linkedin_company_page', 'hubspot_owner_id', 'lifecyclestage' ].join(',') }; if (after) params.after = after; const response = await client.get(`/crm/v3/objects/companies`, { params }); return { companies: response.data.results.map(r => this.mapHubSpotToCompany(r.properties, r.id)), paging: response.data.paging }; }

Step 2: Map HubSpot Fields to Astrelo Schema

HubSpot property names don’t match Astrelo’s column names. The mapper translates:

HubSpot Property → Astrelo Column ───────────────────────────────────────────── companies.name → companies.company_name companies.domain → companies.domain (normalized) companies.numberofemployees → companies.employee_count (parsed) companies.annualrevenue → companies.annual_revenue + revenue_range companies.industry → companies.industry + NAICS/SIC codes companies.country + state → companies.hq_region ("US-CA") companies.hubspot_owner_id → companies.crm_rep_id (via owner mapping)

The industry field gets special treatment — it’s mapped to NAICS and SIC codes for the embedding system (Chapter 6).

Step 3: Batch Insert with UNNEST

Instead of 500 individual INSERT statements, we use PostgreSQL’s UNNEST to insert all companies in one query:

// src/infrastructure/providers/hubspot/syncService.ts (simplified) const result = await pool.query(` INSERT INTO companies ( domain, company_name, industry, naics, employee_count, revenue_range, hq_country, hq_region, external_source, external_provider_id, user_id, synced_at ) SELECT unnest($1::text[]), -- domains array unnest($2::text[]), -- names array unnest($3::text[]), -- industries array unnest($4::text[]), -- naics codes array unnest($5::int[]), -- employee counts array unnest($6::text[]), -- revenue ranges array unnest($7::text[]), -- countries array unnest($8::text[]), -- regions array 'hubspot', unnest($9::text[]), -- hubspot IDs array $10::uuid, -- user_id (same for all) NOW() ON CONFLICT (user_id, domain) WHERE domain IS NOT NULL DO UPDATE SET company_name = EXCLUDED.company_name, industry = EXCLUDED.industry, employee_count = EXCLUDED.employee_count, synced_at = NOW() RETURNING domain, (xmax = 0) AS was_insert `, [domains, names, industries, naics, employeeCounts, ...]); stats.newCompanies = result.rows.filter(r => r.was_insert).length; stats.updatedCompanies = result.rows.filter(r => !r.was_insert).length;

The RETURNING domain, (xmax = 0) AS was_insert trick deserves explanation. In PostgreSQL, xmax is a system column that tracks the transaction ID of the last update. For newly inserted rows, xmax = 0. For updated rows (the ON CONFLICT DO UPDATE path), xmax > 0. This lets us count inserts vs. updates in a single query.

Step 4: Entity Resolution for Contacts

Contacts need to be linked to companies. This is harder than it sounds — a contact might reference a HubSpot company ID that doesn’t match any Astrelo company UUID.

The sync uses a two-pass resolution strategy:

// src/infrastructure/providers/hubspot/syncService.ts, lines 446-600 // Pass 1: Match by HubSpot company ID const companyByHubspotId = new Map<string, string>(); const companyResult = await pool.query(` SELECT id, external_provider_id FROM companies WHERE external_source = 'hubspot' AND external_provider_id = ANY($1::text[]) `, [hubspotCompanyIds]); for (const row of companyResult.rows) { companyByHubspotId.set(row.external_provider_id, row.id); } // Pass 2: Match by email domain (fallback) const companyByDomain = new Map<string, string>(); const domainResult = await pool.query(` SELECT id, REGEXP_REPLACE(LOWER(domain), '^www\\.', '') as normalized_domain FROM companies WHERE REGEXP_REPLACE(LOWER(domain), '^www\\.', '') = ANY($1::text[]) `, [emailDomains]);

For each contact:

  1. Try matching by HubSpot company ID (most reliable)
  2. If no match, extract the email domain (sarah@cloudreach.io → cloudreach.io) and match against company domains

After the bulk insert, a final pass fixes “orphaned” contacts — those that didn’t match during sync but whose email domains now match a company:

WITH domain_matches AS ( SELECT ct.id as contact_id, c.id as company_id FROM contacts ct INNER JOIN companies c ON REGEXP_REPLACE(LOWER(c.domain), '^www\\.', '') = LOWER(SPLIT_PART(ct.email, '@', 2)) WHERE ct.company_id IS NULL AND ct.email LIKE '%@%' ) UPDATE contacts SET company_id = dm.company_id FROM domain_matches dm WHERE contacts.id = dm.contact_id

This is entity resolution — the problem of determining that “HubSpot Company #12345” and “Astrelo Company abc-def-ghi” represent the same real-world entity. It’s solved through a combination of provider IDs and domain matching.

Key Takeaways

  1. OAuth 2.0 keeps credentials out of Astrelo. The user authenticates directly with HubSpot, and Astrelo only stores access/refresh tokens.

  2. Token refresh happens proactively (5-minute buffer) to prevent mid-operation failures.

  3. Batch UNNEST inserts with ON CONFLICT handle the entire sync in minimal database round-trips.

  4. Entity resolution uses two strategies: provider ID matching (reliable) and email domain matching (fallback).

  5. The xmax = 0 trick distinguishes inserts from updates in a single RETURNING clause.

Next chapter: how Salesforce integration differs from HubSpot — same patterns, different API surface.

Last updated on