The Fit Score answers one question: “How similar is this prospect to the companies you’ve already won?”
A company that matches your winning profile in industry and size should score high. A company in a completely different industry at a wildly different size should score low. This chapter explains the math that makes that happen.
| Component | Weight | Description |
|---|---|---|
| NAICS Similarity | 50% | Cosine similarity between company NAICS embedding and winning profile centroid |
| Size Similarity | 35% | Z-score distance between company employee count/revenue and the winning-profile mean |
| Tech Stack Signal | 15% | Overlap between the company’s technology stack and your winners’ |
The Fit Score Formula
Fit is a weighted combination of three components:

Fit Score = (NAICS Similarity × 50%) + (Size Similarity × 35%) + (Tech Stack Signal × 15%)

```typescript
// src/domain/scoring/constants/index.ts, lines 97-110
export const DEFAULT_WEIGHTS = {
  FIT_NAICS: 0.50,      // Industry embedding similarity
  FIT_SIZE: 0.35,       // Employee count + revenue range
  FIT_TECH_STACK: 0.15, // Technology stack overlap
};
```

Each component produces a score from 0 to 100, so the weighted sum is also 0 to 100. Let’s examine each one.
Component 1: NAICS Similarity (50% of Fit)
This uses the embedding system from Chapter 6. The process:
- Build the winning profile embedding — Average the NAICS embeddings of all your closed-won deals
- Get the prospect’s embedding — Look up or generate the NAICS embedding for the prospect’s industry
- Compute cosine similarity — How close are they in the 384-dimensional space?
- Scale to 0-100 — Multiply the raw similarity (0.0 to 1.0) by 100
If your winning profile is heavy in “Computer Systems Design” (NAICS 541512) and a prospect is in “Custom Computer Programming” (NAICS 541511), the cosine similarity might be 0.92 → 92 points. If the prospect is in “Cattle Ranching” (NAICS 112111), the similarity might be 0.15 → 15 points.
This is why embeddings are powerful — they capture that “Computer Systems Design” and “Custom Computer Programming” are close relatives, while “Cattle Ranching” is a different universe. Exact NAICS code matching would miss the relationship between 541512 and 541511.
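The centroid-and-cosine steps above can be sketched as follows. This is illustrative only: `buildWinningCentroid`, `cosineSimilarity`, and `naicsComponentScore` are stand-alone helpers written for this chapter, and the tiny vectors stand in for the real 384-dimensional NAICS embeddings.

```typescript
// Average a set of embeddings into a single "winning profile" centroid.
function buildWinningCentroid(embeddings: number[][]): number[] {
  const dim = embeddings[0].length;
  const centroid = new Array(dim).fill(0);
  for (const e of embeddings) {
    for (let i = 0; i < dim; i++) centroid[i] += e[i] / embeddings.length;
  }
  return centroid;
}

// Cosine similarity: dot product divided by the product of vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scale raw similarity (0.0 to 1.0) to the 0-100 component score.
function naicsComponentScore(prospect: number[], centroid: number[]): number {
  return Math.max(0, cosineSimilarity(prospect, centroid)) * 100;
}

// A prospect pointing the same way as the centroid scores near 100.
naicsComponentScore([1, 1], buildWinningCentroid([[1, 0], [0, 1]])); // ≈ 100
```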
Component 2: Size Similarity (35% of Fit)
Size similarity measures how close a prospect’s employee count and revenue are to your winning sweet spot. The math here is more nuanced than a simple range check.
The Z-Score Approach
Instead of asking “is this company in the 50-200 employee range?”, we ask “how many standard deviations away from our winning average is this company?” This is a z-score.
```typescript
// src/domain/scoring/services/fit/fitScoreService.ts, lines 326-345
if (hasEmployee && winningProfile.employeeStats) {
  const stats = winningProfile.employeeStats;
  if (stats.weightedStdDev > 0) {
    // z-score = (value - mean) / standardDeviation
    employeeZScore = (company.employeeCount! - stats.weightedMean) / stats.weightedStdDev;
    // Convert z-score to 0-100 score
    employeeScore = this.zScoreToFitScore(employeeZScore);
  }
}
```

The z-score tells you how “normal” a value is:
- z = 0: Exactly at the mean (your ideal size)
- z = ±1: Within one standard deviation (pretty close)
- z = ±2: Two standard deviations away (getting unusual)
- z = ±3: Three standard deviations (very unusual)
Converting Z-Score to a Score
The conversion uses the Gaussian bell curve — the same function that defines the normal distribution:
```typescript
// src/domain/scoring/services/fit/fitScoreService.ts, lines 512-522
private zScoreToFitScore(zScore: number): number {
  const absZ = Math.abs(zScore);
  // score = 100 × e^(-0.5 × z²)
  const score = 100 * Math.exp(-0.5 * absZ * absZ);
  return Math.max(0, Math.min(100, score));
}
```

The formula: score = 100 × e^(-0.5 × z²)
This is elegant. Let’s trace the values, assuming a weighted mean of 150 employees and σ = 75:

| Employee Count | Z-Score | Score | Meaning |
|---|---|---|---|
| 150 (at mean) | 0.0 | 100 | Perfect match |
| 225 (1σ away) | 1.0 | 60.7 | Good match |
| 263 (1.5σ away) | 1.5 | 32.5 | Mediocre match |
| 300 (2σ away) | 2.0 | 13.5 | Poor match |
| 375 (3σ away) | 3.0 | 1.1 | Terrible match |
The bell curve is symmetric — a company that’s too small gets the same penalty as one that’s too large, as long as they’re the same number of standard deviations away.
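That symmetry is easy to verify with a stand-alone restatement of the conversion formula (not the production method, just the same math as a free function):

```typescript
// score = 100 × e^(-0.5 × z²); taking |z| makes the curve symmetric around the mean.
function zScoreToFitScore(zScore: number): number {
  const absZ = Math.abs(zScore);
  return Math.max(0, Math.min(100, 100 * Math.exp(-0.5 * absZ * absZ)));
}

// A company 1σ too small and one 1σ too large get identical scores.
zScoreToFitScore(-1.0); // ≈ 60.65
zScoreToFitScore(+1.0); // ≈ 60.65
zScoreToFitScore(0.0);  // 100, exactly at the mean
```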
Hard Penalties for Extreme Outliers
The Gaussian decay is smooth but slow — even at z = 4, the score isn’t quite zero. For companies that are absurdly off-target, we apply hard floors:
```typescript
// src/domain/scoring/services/fit/fitSimilarityCalculator.ts, lines 53-80
if (employeeCount < icpMin * 0.25) {
  employeeSimilarity = 0.1; // Way too small (e.g., 5 employees when min is 50)
} else if (employeeCount > icpMax * 4) {
  employeeSimilarity = 0.05; // Way too large (catches enterprises with 1.7B "employees")
} else if (employeeCount > icpMax * 2) {
  employeeSimilarity = 0.2; // Too large but not absurd
} else {
  // Within reasonable range — use Gaussian
  const zScore = Math.abs((employeeCount - mean) / stdDev);
  employeeSimilarity = Math.exp(-0.5 * zScore * zScore);
}
```

For a company with 500,000 employees when your sweet spot is 100-300, the Gaussian tail underflows to a value that is technically non-zero but meaningless; the hard tier replaces it with an explicit, predictable 0.05. The tiers also shield the math from garbage enrichment data, like the records claiming 1.7 billion “employees” that the code comment mentions.
Why Weighted by Deal Value?
The mean and standard deviation aren’t simple averages — they’re weighted by deal value:

```typescript
// src/domain/scoring/services/learning/patternAnalyzer.ts, lines 30-50
const totalWeight = validPairs.reduce((sum, p) => sum + p.weight, 0);

// Weighted mean: sum(value × weight) / sum(weight)
const weightedMean = validPairs.reduce(
  (sum, p) => sum + p.value * p.weight, 0
) / totalWeight;

// Weighted variance: sum(weight × (value - mean)²) / sum(weight)
const weightedVariance = validPairs.reduce(
  (sum, p) => sum + p.weight * Math.pow(p.value - weightedMean, 2), 0
) / totalWeight;
```

If you closed a $500K deal with a 200-person company and a $10K deal with a 50-person company, the 200-person company has 50× more influence on the mean. Your winning profile reflects where the real revenue comes from, not just deal count.
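Plugging those two deals into the weighted formulas shows the effect. This is a stand-alone sketch for the chapter; `weightedStats` and the `ValueWeightPair` shape are illustrative, with deal value as the weight:

```typescript
interface ValueWeightPair { value: number; weight: number; }

// Deal-value-weighted mean and standard deviation, matching the formulas above.
function weightedStats(pairs: ValueWeightPair[]): { mean: number; stdDev: number } {
  const totalWeight = pairs.reduce((sum, p) => sum + p.weight, 0);
  const mean = pairs.reduce((sum, p) => sum + p.value * p.weight, 0) / totalWeight;
  const variance = pairs.reduce(
    (sum, p) => sum + p.weight * Math.pow(p.value - mean, 2), 0
  ) / totalWeight;
  return { mean, stdDev: Math.sqrt(variance) };
}

// 200 employees at $500K vs. 50 employees at $10K:
const { mean } = weightedStats([
  { value: 200, weight: 500_000 },
  { value: 50, weight: 10_000 },
]);
// mean ≈ 197.1, pulled almost entirely toward the big deal
// (an unweighted mean of the two would be 125).
```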
Size Embedding Generation
For companies where we need a vector representation of their size (used in some scoring paths), we generate a size embedding:
```typescript
// src/domain/scoring/services/fit/fitSimilarityCalculator.ts, lines 128-163
export function generateSizeEmbedding(employeeCount?: number, revenueRange?: string): number[] {
  const embedding = new Array(EMBEDDING_DIMENSION).fill(0);

  if (employeeCount && employeeCount > 0) {
    const logEmployees = Math.log10(employeeCount);
    const normalizedEmployees = (logEmployees - 2) / 3; // Center around 100 employees
    for (let i = 0; i < EMBEDDING_DIMENSION / 2; i++) {
      embedding[i] = normalizedEmployees * Math.cos(i * 0.1);
    }
  }

  // revenueRange (a string like "$1M-$10M") is parsed into the numeric
  // revenueValue earlier in the function; that parsing is elided from this excerpt.
  if (revenueValue > 0) {
    const logRevenue = Math.log10(revenueValue);
    const normalizedRevenue = (logRevenue - 6) / 4; // Center around $1M
    for (let i = EMBEDDING_DIMENSION / 2; i < EMBEDDING_DIMENSION; i++) {
      embedding[i] = normalizedRevenue * Math.sin(i * 0.1);
    }
  }

  return normalizeVector(embedding);
}
```

Why logarithmic? Company sizes span enormous ranges — from 5 employees to 500,000. On a linear scale, the difference between 5 and 50 (45) is invisible compared to 50,000 and 500,000 (450,000). Logarithms compress this:
- log₁₀(5) = 0.70
- log₁₀(50) = 1.70
- log₁₀(500) = 2.70
- log₁₀(50,000) = 4.70
- log₁₀(500,000) = 5.70
Now the jumps are uniform: each 10× increase is exactly 1.0 apart. This makes the embedding space sensible — going from 50 to 500 employees is as significant as going from 5,000 to 50,000.
The Loss Penalty: Learning from Failures
Fit scoring isn’t just about matching winners — it also learns from losses. If you’ve repeatedly lost deals at a certain type of company, that type should score lower.
Graduated Penalty Tiers
```typescript
// src/domain/scoring/constants/index.ts, lines 43-49
export const LOSS_PENALTY_SCALE = {
  TIER_1: { minLosses: 1, maxLosses: 5, penaltyPercent: 5 },
  TIER_2: { minLosses: 6, maxLosses: 10, penaltyPercent: 10 },
  TIER_3: { minLosses: 11, maxLosses: 15, penaltyPercent: 15 },
  TIER_4: { minLosses: 16, maxLosses: 25, penaltyPercent: 20 },
  TIER_5: { minLosses: 26, maxLosses: Infinity, penaltyPercent: 25 },
};
```

The penalty increases with the number of similar losses — but never exceeds 25%. This reflects a real sales insight: losing one deal in a segment is noise. Losing 26 deals in a segment is a pattern.
Similarity-Scaled Penalty
The penalty is further scaled by how similar the prospect is to your losses:
```typescript
// src/domain/scoring/services/fit/fitScoreService.ts, lines 379-420
if (avgLossSimilarity > 50 && lossCount > 0) {
  const penaltyPercent = this.getGraduatedPenalty(lossCount);
  // Scale by similarity: 50% similar = 0% penalty, 100% similar = 100% penalty
  const similarityFactor = (avgLossSimilarity - 50) / 50;
  const effectivePenalty = penaltyPercent * similarityFactor;
  fitScore = rawFitScore * (1 - effectivePenalty / 100);
}
```

If a company is 75% similar to your loss profile and you’ve lost 8 deals in that segment:
}If a company is 75% similar to your loss profile and you’ve lost 8 deals in that segment:
- Base penalty: 10% (Tier 2: 6-10 losses)
- Similarity factor: (75 - 50) / 50 = 0.50
- Effective penalty: 10% × 0.50 = 5%
- Final score: rawScore × 0.95
If the company is 95% similar to losses:
- Similarity factor: (95 - 50) / 50 = 0.90
- Effective penalty: 10% × 0.90 = 9%
- Final score: rawScore × 0.91
This prevents overcorrection. A company that’s 51% similar to your losses (barely matching) gets almost no penalty. A company that’s 99% similar gets the full penalty.
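Both worked examples can be reproduced with a small sketch. Here `getGraduatedPenalty` and `applyLossPenalty` are stand-alone re-implementations of the tier lookup and scaling logic, written for this chapter rather than copied from the service:

```typescript
// Tier table from LOSS_PENALTY_SCALE: penalty grows with loss count, capped at 25%.
function getGraduatedPenalty(lossCount: number): number {
  if (lossCount >= 26) return 25;
  if (lossCount >= 16) return 20;
  if (lossCount >= 11) return 15;
  if (lossCount >= 6) return 10;
  if (lossCount >= 1) return 5;
  return 0;
}

// Scale the tier penalty by how closely the prospect resembles the losses.
function applyLossPenalty(
  rawFitScore: number,
  lossCount: number,
  avgLossSimilarity: number
): number {
  if (avgLossSimilarity <= 50 || lossCount === 0) return rawFitScore;
  const similarityFactor = (avgLossSimilarity - 50) / 50; // 50% → 0.0, 100% → 1.0
  const effectivePenalty = getGraduatedPenalty(lossCount) * similarityFactor;
  return rawFitScore * (1 - effectivePenalty / 100);
}

// 8 losses, 75% similar: 10% × 0.50 = 5% penalty
applyLossPenalty(100, 8, 75); // ≈ 95
// 8 losses, 95% similar: 10% × 0.90 = 9% penalty
applyLossPenalty(100, 8, 95); // ≈ 91
```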
Putting It All Together
Let’s score a hypothetical prospect:
Your winning profile:
- Primary NAICS: 541512 (Computer Systems Design)
- Mean employee count: 150 (σ = 75, weighted by deal value)
- Mean revenue: $20M (σ = $15M)
Prospect: “CloudTech Solutions”
- NAICS: 541511 (Custom Computer Programming)
- Employees: 200
- Revenue: $25M
- Loss similarity: 30% (below 50% threshold → no penalty)
Calculation:
- NAICS Similarity: cosine(“541512” embedding, “541511” embedding) = 0.92 → 92 points
- Employee z-score: (200 - 150) / 75 = 0.67 → 100 × e^(-0.5 × 0.67²) ≈ 80.1 points
- Revenue z-score: ($25M - $20M) / $15M = 0.33 → 100 × e^(-0.5 × 0.33²) ≈ 94.6 points
- Size Similarity: (80.1 + 94.6) / 2 ≈ 87.3 points
- Tech Stack: assume 60 points (some overlap)

Final Fit Score:

(92 × 0.50) + (87.3 × 0.35) + (60 × 0.15)
= 46.0 + 30.6 + 9.0
= 85.6

CloudTech Solutions scores 85.6/100 on fit — a strong match. The NAICS similarity carries the most weight, and the size is within one standard deviation.
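The whole walk-through condenses into two small functions. These are illustrative sketches using the 50/35/15 weights and the worked numbers above, not the production services; one-decimal rounding of the intermediate values explains the 85.6:

```typescript
// Gaussian z-score → 0-100 conversion, as in zScoreToFitScore.
function gaussianScore(value: number, mean: number, stdDev: number): number {
  const z = (value - mean) / stdDev;
  return 100 * Math.exp(-0.5 * z * z);
}

// Weighted blend of the three Fit components (NAICS 50%, Size 35%, Tech 15%).
function fitScore(naics: number, size: number, techStack: number): number {
  return naics * 0.5 + size * 0.35 + techStack * 0.15;
}

// CloudTech Solutions:
const employeeScore = gaussianScore(200, 150, 75);    // ≈ 80.1
const revenueScore = gaussianScore(25, 20, 15);       // ≈ 94.6 (revenue in $M)
const sizeScore = (employeeScore + revenueScore) / 2; // ≈ 87.3
fitScore(92, sizeScore, 60);                          // ≈ 85.6
```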
Key Takeaways
- Fit = NAICS (50%) + Size (35%) + Tech (15%). Industry match is the strongest signal.
- Z-scores measure deviation from your winning average. The Gaussian bell curve converts “how many standard deviations away” to a 0-100 score.
- Deal value weights the statistics. Bigger deals have more influence on what your “winning profile” looks like.
- Loss penalties are graduated and similarity-scaled. More losses = bigger penalty, but only if the prospect actually resembles your losses.
- Logarithmic scaling handles the size range. From 5-person startups to 500K-person enterprises, log scale keeps the math sensible.
Next chapter: we add the second dimension — Intent Scoring, which measures how actively a prospect is engaging with you right now.