
How Our Multi-Level Search Works

Advanced AI-powered search that understands context and delivers highly relevant results

Overview

Our search system uses multiple complementary techniques to find the most relevant alumni for your query. Unlike traditional keyword search, it understands the meaning and intent behind your search, matching concepts rather than just exact words.

Key Advantage: The system achieves 90%+ relevance by combining semantic understanding with traditional search methods, ensuring you find the right people even when they use different terminology.

Search Architecture Flow

User Query: "AI experts in fintech"
        ↓
Query Intent Detection & Embedding Generation
        ↓ (the four methods below run in parallel)
  • Semantic Search: bi-encoder vectors, 768-dim embeddings, cosine similarity
  • BM25 Search: term frequency, IDF weighting, exact matches
  • Tag Search: industry tags, role categories, skills matching
  • Field-Specific Search: role, expertise, and education embeddings
        ↓
Reciprocal Rank Fusion (RRF): combines all rankings, RRF(d) = Σ 1/(k + rank_i(d))
        ↓
Ranked Results (90%+ relevance)

The Four Search Layers

1. Semantic Search (AI Understanding)

Uses advanced AI models to understand the meaning of your query and match it against:

  • Role Embeddings: Current positions and career paths
  • Expertise Embeddings: Skills, technologies, and domain knowledge
  • Education Embeddings: Academic background and institutions
  • Contextualized Profile: Complete professional narrative

Example: Searching for "AI" will also find profiles mentioning "machine learning", "neural networks", or "deep learning"

2. BM25 Lexical Search (Keyword Matching)

An advanced keyword-matching algorithm that:

  • Finds exact and partial word matches
  • Weights rare terms more heavily (e.g., specific company names)
  • Considers term frequency and document length
  • Excels at finding specific names, companies, or technical terms

Example: Searching for "Goldman Sachs" will precisely match that company name

3. Tag-Based Search (Structured Data)

Searches through structured tags and categories:

  • Industry classifications (e.g., "Healthcare", "FinTech")
  • Role categories (e.g., "Founder", "Investor", "Advisor")
  • Expertise areas (e.g., "Data Science", "Product Management")
  • Location and availability status

Example: Filters for "available advisors in healthcare"
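
Below is a minimal TypeScript sketch of what this structured filter can look like; the field names and the profiles array are illustrative, not the actual schema.

interface AlumniTags {
  industries: string[]; // e.g. "Healthcare", "FinTech"
  roles: string[];      // e.g. "Founder", "Investor", "Advisor"
  available: boolean;   // availability status
}

declare const profiles: AlumniTags[]; // assumed to be loaded from the database

// "available advisors in healthcare" expressed as a structured filter
const advisors = profiles.filter(
  (p) =>
    p.available &&
    p.roles.includes('Advisor') &&
    p.industries.includes('Healthcare'),
);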

4. Reciprocal Rank Fusion (Smart Combination)

Intelligently combines results from all search methods:

  • Merges rankings from different search types
  • Boosts profiles that appear in multiple searches
  • Balances semantic understanding with exact matches
  • Ensures diverse, relevant results

Result: The best matches rise to the top, regardless of search method

Query Intent Recognition

The system automatically detects what you're looking for and adjusts its search strategy:

Role-Focused Searches

"Founders", "CEOs", "Product Managers"

→ Prioritizes current position and career history

Expertise Searches

"Machine learning", "Blockchain", "Marketing"

→ Focuses on skills and technical expertise

Company/Industry Searches

"Google alumni", "FinTech", "Healthcare startups"

→ Emphasizes work history and industry experience

Education Searches

"Stanford MBA", "PhD in Physics", "Class of 2015"

→ Prioritizes academic background

Technical Architecture & Algorithms

Bi-Encoder Architecture (Semantic Search)

A neural network architecture that independently encodes queries and documents into dense vectors:

Query → Encoder → Query Vector (768 dimensions)
Profile → Encoder → Profile Vector (768 dimensions)
Similarity = cosine_similarity(Query Vector, Profile Vector)
  • Model: BAAI/bge-base-en-v1.5 (BERT-based)
  • Advantage: Pre-computed embeddings enable fast similarity search
  • Trade-off: Less precise than cross-encoders but much faster
  • Use case: Finding conceptually similar profiles at scale
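
To make the flow concrete, here is a short sketch of the encoding step, assuming the @huggingface/inference client and a hypothetical HF_TOKEN environment variable; queries and profiles are encoded independently, then compared with cosine similarity (see below).

import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_TOKEN); // HF_TOKEN is an assumed env var

// Encode any text (query or profile) into a 768-dim vector
async function embed(text: string): Promise<number[]> {
  return (await hf.featureExtraction({
    model: 'BAAI/bge-base-en-v1.5',
    inputs: text,
  })) as number[];
}

const [queryVec, profileVec] = await Promise.all([
  embed('AI experts in fintech'),
  embed('Senior Product Manager at Google'),
]);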

BM25 Algorithm (Best Matching 25)

A probabilistic ranking function that evolved from TF-IDF, considering term frequency and document length:

Score = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D| / avgdl))

IDF(qi): Inverse document frequency of query term
f(qi, D): Frequency of term in document
k1, b: Tuning parameters (typically k1=1.2, b=0.75)
|D|/avgdl: Document length normalization
  • Strength: Excellent for exact matches and rare terms
  • Example: Searching "Goldman Sachs" ranks profiles with exact company match higher
  • Limitation: No semantic understanding (won't match synonyms)
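
The formula translates almost line-for-line into code. A self-contained TypeScript sketch over a toy corpus, using the typical parameters k1=1.2 and b=0.75 and the standard smoothed IDF:

type Doc = string[]; // a document as a list of lowercased terms

function bm25Score(query: string[], doc: Doc, corpus: Doc[], k1 = 1.2, b = 0.75): number {
  const N = corpus.length;
  const avgdl = corpus.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of query) {
    const n = corpus.filter((d) => d.includes(term)).length; // docs containing the term
    const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1);     // rare terms score higher
    const f = doc.filter((t) => t === term).length;          // term frequency in this doc
    score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * doc.length) / avgdl));
  }
  return score;
}

// "Goldman Sachs" scores highest for the profile that actually contains it
const corpus: Doc[] = [
  ['analyst', 'at', 'goldman', 'sachs'],
  ['software', 'engineer', 'at', 'google'],
];
console.log(bm25Score(['goldman', 'sachs'], corpus[0], corpus));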

Reciprocal Rank Fusion (RRF)

A simple yet effective fusion algorithm that combines rankings from multiple search methods:

RRF_score(d) = Σ 1 / (k + rank_i(d))

d: Document (alumni profile)
rank_i(d): Rank of document in search method i
k: Constant (typically 60) to prevent over-weighting top results
  • Example: A profile ranking #2 in semantic and #5 in BM25 scores RRF = 1/(60+2) + 1/(60+5) = 0.0161 + 0.0154 = 0.0315
  • Benefit: Profiles appearing in multiple searches get boosted
  • Result: More robust ranking that combines different relevance signals
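
Because the formula only needs ranks, the fusion step is a few lines. A minimal TypeScript sketch that reproduces the worked example above (k = 60):

function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // sort descending so the best fused matches come first
  return new Map([...scores.entries()].sort((a, b) => b[1] - a[1]));
}

const fused = rrfFuse([
  ['p1', 'alice', 'p3'],             // semantic ranking: alice is #2
  ['p4', 'p5', 'p6', 'p7', 'alice'], // BM25 ranking: alice is #5
]);
console.log(fused.get('alice')); // 1/62 + 1/65 ≈ 0.0315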

Vector Similarity (Cosine Similarity)

Measures the angular similarity between two vectors, regardless of magnitude:

similarity = (A · B) / (||A|| × ||B||) = Σ(ai × bi) / (√Σai² × √Σbi²)

  • Range: [-1, 1], where 1 = identical, 0 = orthogonal, -1 = opposite
  • Normalized to [0, 1] for scoring: (similarity + 1) / 2
  • Why cosine? Direction matters more than magnitude for text
  • PostgreSQL: Uses pgvector extension with HNSW index for fast search
  • Performance: ~10ms for searching 100K vectors
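
The scoring function is a direct transcription of the formula, including the rescaling to [0, 1]; queryVec and profileVec are reused from the encoding sketch earlier.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rescale from [-1, 1] to [0, 1] for scoring, as described above
const score = (cosineSimilarity(queryVec, profileVec) + 1) / 2;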

Field-Specific Embeddings Strategy

Different text fields are embedded separately to preserve their semantic context:

role_embedding[768]      ← "Senior Product Manager at Google"
expertise_embedding[768] ← "Machine Learning, Python, TensorFlow"
education_embedding[768] ← "Stanford MBA, MIT Computer Science"
context_embedding[768]   ← Full profile text
  • Benefit: Preserves field-specific semantics
  • Query routing: "Product Manager" → prioritize role_embedding
  • Storage: 4 × 768 × 4 bytes = ~12KB per profile
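
A sketch of how a query can be scored against the per-field vectors; the weights come from intent detection (next section), and cosineSimilarity is the helper sketched above.

interface ProfileEmbeddings {
  role: number[];      // ← "Senior Product Manager at Google"
  expertise: number[]; // ← "Machine Learning, Python, TensorFlow"
  education: number[]; // ← "Stanford MBA, MIT Computer Science"
  context: number[];   // ← full profile text
}

type FieldWeights = Record<keyof ProfileEmbeddings, number>;

// Weighted sum of per-field similarities; heavier weights on the fields
// the query is actually about preserve field-specific semantics
function fieldScore(queryVec: number[], p: ProfileEmbeddings, w: FieldWeights): number {
  return (Object.keys(w) as (keyof ProfileEmbeddings)[]).reduce(
    (sum, field) => sum + w[field] * cosineSimilarity(queryVec, p[field]),
    0,
  );
}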

Query Intent Detection & Weighting

Analyzes query patterns to determine search focus and adjust weights dynamically:

detectIntent(query) → {type, weights}
if contains(["CEO", "Founder", "Manager"]) → role-focused
if contains(["Python", "AI", "blockchain"]) → expertise-focused
if contains(["Stanford", "MBA", "PhD"]) → education-focused
if contains(company_names) → company-focused
else → general (balanced weights)
  • Dynamic weights: Role search might use role:0.6, expertise:0.2, others:0.1
  • Improves precision: 30-40% better relevance than uniform weights
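
The pseudocode above might look like this as runnable TypeScript; the keyword lists are illustrative, the role-focused weights follow the role:0.6, expertise:0.2, others:0.1 split mentioned above, and the remaining splits are assumptions by analogy.

function detectIntent(query: string): { type: string; weights: Record<string, number> } {
  const q = query.toLowerCase();
  const has = (terms: string[]) => terms.some((t) => q.includes(t));
  if (has(['ceo', 'founder', 'manager']))
    return { type: 'role', weights: { role: 0.6, expertise: 0.2, education: 0.1, context: 0.1 } };
  if (has(['python', 'ai', 'blockchain']))
    return { type: 'expertise', weights: { role: 0.2, expertise: 0.6, education: 0.1, context: 0.1 } };
  if (has(['stanford', 'mba', 'phd']))
    return { type: 'education', weights: { role: 0.1, expertise: 0.1, education: 0.6, context: 0.2 } };
  // a company-name lookup would slot in here the same way
  return { type: 'general', weights: { role: 0.25, expertise: 0.25, education: 0.25, context: 0.25 } };
}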

Implementation Details

Database Architecture

  • PostgreSQL + pgvector: Vector similarity search with HNSW indexing
  • Supabase: Managed PostgreSQL with built-in vector support
  • Indexes: GIN for full-text, HNSW for vectors, B-tree for exact matches
  • Caching: Query embeddings cached to reduce API calls
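
A hedged sketch of how a vector query might go through the Supabase client; match_profiles is a hypothetical server-side SQL function wrapping the pgvector HNSW search, not a built-in.

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

declare const queryVec: number[]; // 768-dim vector from the encoder

// Invoke the (hypothetical) SQL function that runs the HNSW similarity search
const { data, error } = await supabase.rpc('match_profiles', {
  query_embedding: queryVec,
  match_count: 20,
});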

Performance Optimization

  • Parallel execution: All search methods run concurrently
  • Early termination: Stop if confidence threshold reached
  • Batch processing: Embeddings generated in batches during enrichment
  • Connection pooling: Reuse database connections
  • CDN caching: Static assets served from edge locations
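
A sketch of the parallel fan-out described above; the three search functions are placeholders standing in for the methods in this document.

declare function semanticSearch(q: string): Promise<string[]>;
declare function bm25Search(q: string): Promise<string[]>;
declare function tagSearch(q: string): Promise<string[]>;

const query = 'AI experts in fintech';
// All search methods run concurrently rather than sequentially
const [semantic, bm25, tags] = await Promise.all([
  semanticSearch(query),
  bm25Search(query),
  tagSearch(query),
]);
// The three rankings then feed Reciprocal Rank Fusion (see rrfFuse above)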

Accuracy Metrics

  • Relevance Rate: 90%+
  • Avg Response Time: ~500ms
  • Vector Space: 768 dimensions

Model Selection Rationale

  • BAAI/bge-base-en-v1.5: Best balance of quality and speed
  • 768 dimensions: Sufficient for professional profiles
  • English-optimized: Better performance for business terminology
  • HuggingFace API: Serverless, no infrastructure management

Original vs Multi-Level Search

Feature               Original Search           Multi-Level Search
Search Methods        Semantic + Keywords       Semantic + BM25 + Tags + RRF
Field Specificity     Single embedding          Role, Expertise, Education specific
Query Understanding   Basic                     Intent-aware with weighted focus
Result Fusion         Simple weighted average   Reciprocal Rank Fusion (RRF)
Performance           Sequential processing     Parallel processing
Accuracy              ~70% relevance            90%+ relevance

Search Tips

  • Use natural language: "Find me alumni working in AI startups who could be advisors"
  • Be specific when needed: Include company names, specific technologies, or role titles
  • Combine criteria: "Product managers with blockchain experience in New York"
  • Use domain terms: The system understands industry jargon and technical terms