This is part 1 of the AI Recruiting Pipeline Epic.
When you first integrate an LLM, the temptation is to call the API directly:
```ruby
client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
response = client.chat(parameters: { model: "gpt-4o", messages: [...] })
```
This works until:
- You need to switch from GPT-4o to Claude for better reasoning
- OpenAI rate-limits you and you need automatic retry logic
- You want to test without making real API calls
- A new model launches and you want to A/B test it
The fix is a provider abstraction that separates what you want from how it happens.
## The Base Class
```ruby
# app/services/llm/provider/base.rb
module LLM
  module Provider
    class Base
      attr_reader :messages, :response_format, :model, :max_tokens, :temperature
      attr_reader :response, :error

      def initialize(messages:, response_format: nil, model: nil, max_tokens: nil, temperature: nil)
        @messages = messages
        @response_format = response_format
        @model = model
        @max_tokens = max_tokens
        @temperature = temperature
      end

      def call
        raise NotImplementedError
      end

      def success?
        response.present? && error.nil?
      end

      def content
        raise NotImplementedError
      end

      def rate_limited?
        false
      end

      def retry_after
        nil
      end
    end
  end
end
```
The base class defines the interface every provider must implement:
- `call` — Make the API request
- `success?` — Did it work?
- `content` — Extract the response text
- `rate_limited?` — Should we wait and retry?
- `retry_after` — How long to wait?
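Any caller can drive this interface generically. Here's a minimal sketch of a retry loop built on `rate_limited?` and `retry_after` — `LLMRetry.call_with_retry` and `FlakyProvider` are illustrative names, not part of the pipeline's codebase:

```ruby
# Hypothetical retry helper driving the provider interface; names are illustrative.
module LLMRetry
  module_function

  # Re-invoke the provider while it reports a rate limit, honoring retry_after.
  def call_with_retry(provider, max_attempts: 3)
    attempts = 0
    loop do
      attempts += 1
      provider.call
      return provider if provider.success?
      return provider unless provider.rate_limited? && attempts < max_attempts

      sleep(provider.retry_after || 1.0)
    end
  end
end

# A stand-in provider that is rate-limited once, then succeeds.
class FlakyProvider
  attr_reader :response, :error, :calls

  def initialize
    @calls = 0
  end

  def call
    @calls += 1
    if @calls == 1
      @response = nil
      @error = RuntimeError.new("rate_limit_exceeded")
    else
      @response = { "ok" => true }
      @error = nil
    end
    self
  end

  def success?
    !response.nil? && error.nil?
  end

  def rate_limited?
    !error.nil?
  end

  def retry_after
    0.01
  end
end

LLMRetry.call_with_retry(FlakyProvider.new).success? # => true
```

Because the loop only touches the five interface methods, it works unchanged against any provider subclass.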
## The OpenAI Provider
```ruby
# app/services/llm/provider/openai.rb
module LLM
  module Provider
    class OpenAI < Base
      RATE_LIMIT_PATTERN = /Please try again in ([0-9.]+)(ms|s)/i

      def call
        @response = client.chat(parameters: request_parameters)
        self
      rescue StandardError => e
        @error = e
        self
      end

      def content
        return nil unless response
        response.dig("choices", 0, "message", "content")&.strip
      end

      def rate_limited?
        return false unless error
        error_code == "rate_limit_exceeded" || error.is_a?(Faraday::TooManyRequestsError)
      end

      def retry_after
        return nil unless error
        retry_after_from_headers || retry_after_from_message
      end

      private

      def client
        @client ||= ::OpenAI::Client.new(
          access_token: Rails.application.credentials.dig(:open_ai, :api_key),
          request_timeout: 120
        )
      end

      def request_parameters
        params = { model:, messages: }
        params[:response_format] = response_format if response_format
        params[:max_tokens] = max_tokens if max_tokens
        params[:temperature] = temperature if temperature
        params
      end

      # Pull OpenAI's error code out of the error's response body, when present.
      def error_code
        body = error.respond_to?(:response) ? error.response&.dig(:body) : nil
        body.is_a?(Hash) ? body.dig("error", "code") : nil
      end

      def retry_after_from_headers
        return nil unless error.respond_to?(:response)
        error.response&.dig(:headers, "retry-after")&.to_f
      end

      def retry_after_from_message
        match = error.message.to_s.match(RATE_LIMIT_PATTERN)
        return nil unless match
        value, unit = match[1].to_f, match[2].downcase
        unit == "ms" ? value / 1000.0 : value
      end
    end
  end
end
```
This encapsulates all OpenAI-specific behavior: response parsing, error handling, and rate-limit detection.
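The message-parsing fallback is worth a closer look: OpenAI's rate-limit errors embed the wait time in the message text itself ("Please try again in 250ms"). Extracted into a standalone sketch (mirroring `retry_after_from_message` above; `retry_seconds` is an illustrative name), the parsing logic works like this:

```ruby
# Standalone version of the message-parsing fallback above.
RATE_LIMIT_PATTERN = /Please try again in ([0-9.]+)(ms|s)/i

def retry_seconds(message)
  match = message.match(RATE_LIMIT_PATTERN)
  return nil unless match

  value = match[1].to_f
  # Normalize everything to seconds so the caller can pass it straight to sleep.
  match[2].downcase == "ms" ? value / 1000.0 : value
end

retry_seconds("Rate limit reached. Please try again in 250ms.") # => 0.25
retry_seconds("Please try again in 6s.")                        # => 6.0
retry_seconds("Some unrelated error")                           # => nil
```

Returning `nil` for non-matching messages lets the caller fall back to a default backoff.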
## Using the Abstraction
Services that need LLM calls accept the provider class as a dependency:
```ruby
class ParseJobDescription
  def initialize(text, provider_class: LLM::Provider::OpenAI)
    @text = text
    @provider_class = provider_class
  end

  def call
    provider = @provider_class.new(
      messages: build_messages,
      response_format: json_schema,
      model: "gpt-4o",
      temperature: 0
    )
    provider.call
    return fallback_result unless provider.success?

    JSON.parse(provider.content)
  end
end
```
Testing becomes trivial:
```ruby
class FakeProvider < LLM::Provider::Base
  def initialize(response_content:, **options)
    super(**options)
    @response_content = response_content
  end

  def call
    @response = { "choices" => [{ "message" => { "content" => @response_content } }] }
    self
  end

  def content
    @response_content
  end
end

RSpec.describe ParseJobDescription do
  it "parses the response" do
    fake_provider = Class.new(FakeProvider) do
      def initialize(**options)
        super(response_content: '{"shift": "1st"}', **options)
      end
    end

    result = described_class.new("Day shift position", provider_class: fake_provider).call

    expect(result["shift"]).to eq "1st"
  end
end
```
## Adding a Second Provider
When Claude's reasoning is better for your use case:
```ruby
module LLM
  module Provider
    class Claude < Base
      def call
        @response = client.messages(
          model:,
          max_tokens:,
          messages:
        )
        self
      rescue Anthropic::Error => e
        @error = e
        self
      end

      def content
        response&.content&.first&.text
      end

      private

      def client
        @client ||= Anthropic::Client.new
      end
    end
  end
end
```
Switching providers is a one-line change:
```ruby
ParseJobDescription.new(text, provider_class: LLM::Provider::Claude)
```
Or configure per-environment:
```ruby
# config/initializers/llm.rb
Rails.configuration.default_llm_provider =
  ENV["LLM_PROVIDER"] == "claude" ? LLM::Provider::Claude : LLM::Provider::OpenAI
```
## Why This Pattern Matters
- Testability — Fake providers run in milliseconds with no API costs
- Reliability — Rate limit handling is implemented once, correctly
- Flexibility — Switch models for A/B testing or cost optimization
- Observability — Add logging, metrics, or tracing in one place
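That last point deserves a sketch. Because every request funnels through `call`, cross-cutting concerns can be layered on in one place — for example with a prepended module. This is a hypothetical illustration (`Instrumentation`, `LLM_LOGGER`, and `StubProvider` are made-up names, shown outside Rails for self-containment):

```ruby
require "logger"

LLM_LOGGER = Logger.new($stdout)

# Prepended module that times every provider call. Because it sits ahead of
# the concrete #call in the ancestor chain, `super` invokes the real request.
module Instrumentation
  def call
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = super
    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(1)
    LLM_LOGGER.info("#{self.class.name} duration=#{elapsed_ms}ms success=#{success?}")
    result
  end
end

# A minimal provider to demonstrate the wrapper without any API dependency.
class StubProvider
  prepend Instrumentation

  attr_reader :response, :error

  def call
    @response = { "ok" => true }
    self
  end

  def success?
    !response.nil? && error.nil?
  end
end

StubProvider.new.call # logs the timing line, returns the provider
```

In the real hierarchy you would prepend the module into `LLM::Provider::Base` once, and every subclass — OpenAI, Claude, or a future provider — is instrumented automatically.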
The provider abstraction is table stakes for production AI features. Build it before your first LLM call, not after you're rate-limited in production.