Epic: From Scrape to Suggestion — Building an AI-Powered Recruiting Pipeline

This series documents building an AI-augmented recruiting pipeline from scratch. The system scrapes healthcare job listings, extracts structured data with LLMs, evaluates parser accuracy with test datasets, and generates contextual SMS suggestions for recruiters.

The Problem

Healthcare recruiting runs on unstructured data: job descriptions written in natural language, candidate preferences captured in free-form text, and conversations that need context to be effective.

Traditional approaches either require armies of data-entry specialists or produce low-quality matches. LLMs offer a middle path: structured extraction with human oversight.

The Architecture

┌─────────────┐    ┌──────────────┐    ┌───────────────┐
│   Scraper   │ →  │  LLM Parser  │ →  │  Evaluator    │
│  (Faraday)  │    │  (OpenAI)    │    │  (YAML tests) │
└─────────────┘    └──────────────┘    └───────────────┘
       ↓                  ↓                    ↓
┌─────────────┐    ┌──────────────┐    ┌───────────────┐
│ Job Listings│ →  │  Job Matcher │ →  │  Conversation │
│  (Postgres) │    │  (SQL + Ruby)│    │    Agent      │
└─────────────┘    └──────────────┘    └───────────────┘

Each component is independently testable. LLM calls are isolated behind provider abstractions. The evaluation harness catches regressions before production.
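The provider abstraction might look something like the following sketch. The class and method names (`LlmProvider`, `StubProvider`, `complete`) are illustrative assumptions, not the actual codebase: callers depend on a tiny interface, so swapping vendors is a wiring change, and tests can inject a stub instead of making network calls.

```ruby
# Minimal interface every provider implements.
class LlmProvider
  def complete(prompt:)
    raise NotImplementedError
  end
end

# A stub provider for tests -- returns a canned response so the
# pipeline can be exercised without any network calls.
class StubProvider < LlmProvider
  def initialize(response)
    @response = response
  end

  def complete(prompt:)
    @response
  end
end

# The parser only knows about the interface, never a concrete vendor.
class JobParser
  def initialize(provider)
    @provider = provider
  end

  def parse(listing_text)
    @provider.complete(prompt: "Extract job fields from: #{listing_text}")
  end
end

parser = JobParser.new(StubProvider.new({ "title" => "RN, ICU" }))
parser.parse("Registered Nurse, ICU, nights")
# => {"title"=>"RN, ICU"}
```

Because `JobParser` holds only an `LlmProvider`, an OpenAI-backed implementation and the test stub are interchangeable at construction time.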

What You'll Learn

  1. Building a Resumable Scraper — Concurrent HTTP fetching with checkpoint recovery
  2. Structured Extraction with JSON Schemas — Using OpenAI's structured outputs for reliable parsing
  3. Evaluation-Driven ML Development — Building a test harness before shipping to production
  4. Contextual Message Generation — LLM conversation agents that maintain state and context
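To give a flavor of item 3, here is a hedged sketch of a YAML-driven evaluation loop: each case pairs an input listing with the fields the parser should extract, and the score is the fraction of expected fields matched. The field names, case format, and scoring rule are assumptions for illustration; a regex lambda stands in for the LLM parser.

```ruby
require "yaml"

# Illustrative test cases -- in practice these would live in .yml files.
CASES_YAML = <<~YAML
  - input: "Travel RN - ICU, Dallas TX, $2,400/wk"
    expected:
      specialty: "ICU"
      city: "Dallas"
  - input: "CNA needed, Miami FL, day shift"
    expected:
      specialty: "CNA"
      city: "Miami"
YAML

# Score a parser (anything responding to #call) against the cases,
# returning the fraction of expected fields it extracted correctly.
def evaluate(parser, cases)
  checks = cases.flat_map do |c|
    actual = parser.call(c["input"])
    c["expected"].map { |field, want| actual[field] == want }
  end
  checks.count(true).fdiv(checks.size)
end

# A trivial regex "parser" stands in for the real LLM call here.
toy_parser = lambda do |text|
  {
    "specialty" => text[/ICU|CNA/],
    "city"      => text[/Dallas|Miami/]
  }
end

evaluate(toy_parser, YAML.safe_load(CASES_YAML))
# => 1.0
```

Running this score on every commit is what lets a prompt or model change fail loudly in CI instead of silently in production.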

Why This Matters

Small teams can't afford dedicated ML engineers or data science departments. But they can use LLMs effectively if they build the right infrastructure:

  • Provider abstractions let you switch models without code changes
  • Evaluation harnesses catch regressions before users do
  • Human-in-the-loop systems maintain quality without manual data entry
  • Structured outputs turn LLM responses into database rows
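The last bullet is worth making concrete: a structured-output response is JSON that already matches the table's columns, so turning it into a row is validation plus a hash slice. A minimal sketch, with illustrative column names (not the real schema):

```ruby
require "json"

# Columns the insert expects -- assumed for illustration.
COLUMNS = %w[title specialty city pay_rate].freeze

# Validate an LLM JSON payload and shape it into a row ready for an
# INSERT: unknown keys are dropped, missing required keys raise.
def to_row(json_payload)
  data = JSON.parse(json_payload)
  missing = COLUMNS - data.keys
  raise ArgumentError, "missing fields: #{missing.join(', ')}" unless missing.empty?
  data.slice(*COLUMNS)
end

response = '{"title":"Travel RN","specialty":"ICU","city":"Dallas","pay_rate":2400,"extra":"ignored"}'
to_row(response)
# => {"title"=>"Travel RN", "specialty"=>"ICU", "city"=>"Dallas", "pay_rate"=>2400}
```

Failing fast on missing fields keeps malformed LLM output out of the database rather than papering over it with NULLs.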

The four articles in this series show exactly how to build each piece.
