Epic: From Scrape to Suggestion — Building an AI-Powered Recruiting Pipeline

This series documents building an AI-augmented recruiting pipeline from scratch. The system scrapes healthcare job listings, extracts structured data with LLMs, evaluates parser accuracy with test datasets, and generates contextual SMS suggestions for recruiters.

The Problem

Healthcare recruiting runs on unstructured data: job descriptions written in natural language, candidate preferences captured in free-form text, and conversations that need context to be effective.

Traditional approaches either require armies of data-entry specialists or produce low-quality matches. LLMs offer a middle path: structured extraction with human oversight.

The Architecture

┌─────────────┐    ┌──────────────┐    ┌───────────────┐
│   Scraper   │ →  │  LLM Parser  │ →  │  Evaluator    │
│  (Faraday)  │    │  (OpenAI)    │    │  (YAML tests) │
└─────────────┘    └──────────────┘    └───────────────┘
       ↓                  ↓                    ↓
┌─────────────┐    ┌──────────────┐    ┌───────────────┐
│ Job Listings│ →  │  Job Matcher │ →  │  Conversation │
│  (Postgres) │    │  (SQL + Ruby)│    │    Agent      │
└─────────────┘    └──────────────┘    └───────────────┘

Each component is independently testable. LLM calls are isolated behind provider abstractions. The evaluation harness catches regressions before production.
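The provider abstraction might look something like the following sketch. The class and method names (`LlmProvider`, `StubProvider`, `complete`) are illustrative assumptions, not the actual codebase: callers depend on a tiny interface, so swapping vendors is a wiring change, and tests can inject a stub instead of making network calls.

```ruby
# Minimal interface every provider implements.
class LlmProvider
  def complete(prompt:)
    raise NotImplementedError
  end
end

# A stub provider for tests -- returns a canned response so the
# pipeline can be exercised without any network calls.
class StubProvider < LlmProvider
  def initialize(response)
    @response = response
  end

  def complete(prompt:)
    @response
  end
end

# The parser only knows about the interface, never a concrete vendor.
class JobParser
  def initialize(provider)
    @provider = provider
  end

  def parse(listing_text)
    @provider.complete(prompt: "Extract job fields from: #{listing_text}")
  end
end

parser = JobParser.new(StubProvider.new({ "title" => "RN, ICU" }))
parser.parse("Registered Nurse, ICU, nights")
# => {"title"=>"RN, ICU"}
```

Because `JobParser` holds only an `LlmProvider`, an OpenAI-backed implementation and the test stub are interchangeable at construction time.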

What You'll Learn

  1. Building a Resumable Scraper — Concurrent HTTP fetching with checkpoint recovery
  2. Structured Extraction with JSON Schemas — Using OpenAI's structured outputs for reliable parsing
  3. Evaluation-Driven ML Development — Building a test harness before shipping to production
  4. Contextual Message Generation — LLM conversation agents that maintain state and context
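To give a flavor of item 3, here is a hedged sketch of a YAML-driven evaluation loop: each case pairs an input listing with the fields the parser should extract, and the score is the fraction of expected fields matched. The field names, case format, and scoring rule are assumptions for illustration; a regex lambda stands in for the LLM parser.

```ruby
require "yaml"

# Illustrative test cases -- in practice these would live in .yml files.
CASES_YAML = <<~YAML
  - input: "Travel RN - ICU, Dallas TX, $2,400/wk"
    expected:
      specialty: "ICU"
      city: "Dallas"
  - input: "CNA needed, Miami FL, day shift"
    expected:
      specialty: "CNA"
      city: "Miami"
YAML

# Score a parser (anything responding to #call) against the cases,
# returning the fraction of expected fields it extracted correctly.
def evaluate(parser, cases)
  checks = cases.flat_map do |c|
    actual = parser.call(c["input"])
    c["expected"].map { |field, want| actual[field] == want }
  end
  checks.count(true).fdiv(checks.size)
end

# A trivial regex "parser" stands in for the real LLM call here.
toy_parser = lambda do |text|
  {
    "specialty" => text[/ICU|CNA/],
    "city"      => text[/Dallas|Miami/]
  }
end

evaluate(toy_parser, YAML.safe_load(CASES_YAML))
# => 1.0
```

Running this score on every commit is what lets a prompt or model change fail loudly in CI instead of silently in production.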

Why This Matters

Small teams can't afford dedicated ML engineers or data science departments. But they can use LLMs effectively if they build the right infrastructure:

  • Provider abstractions let you switch models without code changes
  • Evaluation harnesses catch regressions before users do
  • Human-in-the-loop systems maintain quality without manual data entry
  • Structured outputs turn LLM responses into database rows
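The last bullet is worth making concrete: a structured-output response is JSON that already matches the table's columns, so turning it into a row is validation plus a hash slice. A minimal sketch, with illustrative column names (not the real schema):

```ruby
require "json"

# Columns the insert expects -- assumed for illustration.
COLUMNS = %w[title specialty city pay_rate].freeze

# Validate an LLM JSON payload and shape it into a row ready for an
# INSERT: unknown keys are dropped, missing required keys raise.
def to_row(json_payload)
  data = JSON.parse(json_payload)
  missing = COLUMNS - data.keys
  raise ArgumentError, "missing fields: #{missing.join(', ')}" unless missing.empty?
  data.slice(*COLUMNS)
end

response = '{"title":"Travel RN","specialty":"ICU","city":"Dallas","pay_rate":2400,"extra":"ignored"}'
to_row(response)
# => {"title"=>"Travel RN", "specialty"=>"ICU", "city"=>"Dallas", "pay_rate"=>2400}
```

Failing fast on missing fields keeps malformed LLM output out of the database rather than papering over it with NULLs.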

The four articles in this series show exactly how to build each piece.
