Manning Early Access Program (MEAP)
Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it’s in bookstores.
It’s a fundamental truth that all software, even AI systems, is broken. AI engineers who can diagnose faults and refine systems to align with business needs are in high demand. Evaluation and Alignment: The Seminal Papers turns the foundational research on judging and adapting AI systems into a collection of practical techniques you can use on the job. As you trace the progression from surface-level text matching, to semantic similarity, to judgment-based evaluation, you’ll build the mental models necessary to choose the right metrics, detect failure modes, and close the loop from evaluation to alignment.
Evaluation and Alignment: The Seminal Papers teaches you to think of evaluation as a design constraint. You’ll employ a "working backwards" methodology that begins with what your system must get right, which directs you to the appropriate evaluation approach. As you internalize the define > evaluate > analyze > align cycle, you’ll start making more informed tradeoffs and expertly balancing helpfulness, safety, and brand voice in your models.
what's inside
BLEU, ROUGE, BERTScore, COMET, and LLM-as-a-judge methods
Detecting and quantifying hallucinations
Aligning AI with RLHF, constitutional AI, and red teaming
Timeless best practices that will apply as models evolve
about the reader
For AI engineers and LLM practitioners. No prior knowledge of NLP metrics, reinforcement learning, or alignment research is required.
about the author
Han Lee has spent more than a decade turning cutting-edge research on large-scale AI and machine-learning systems into production-grade products. As Senior Director of Data and AI at Moody’s, he leads teams that ship generative-AI applications, giving him daily, hands-on exposure to safety-critical evaluation pipelines.
Introductory offer Save 50% for a limited time!
eBook
pdf, ePub, online
$47.99
$23.99
you save $24.00 (50%)
print
includes eBook
$59.99
$29.99
you save $30.00 (50%)
with subscription
free or 50% off
$24.99
pro $24.99 per month
access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!