Content
Prof. Yakov Bart (Northeastern University, Boston, USA)
- Date: Thursday, April 9th, 2026
- Time: 2:15 - 3:45 pm
- Location: Ludwigstr. 28, VG, Room 211b
- Title: (Mis)Measuring the Drivers of Ad Performance
- Abstract: Researchers across the social sciences increasingly use large language model (LLM) annotations as scalable substitutes for human coding, yet whether these measurements are reliable enough for downstream inference remains largely untested. We examine this question in the domain of video advertising, where measuring creative features at scale has long been a bottleneck to empirical analysis. Using data on over 10,000 nationally aired television ads with more than 14,000 human annotations and survey-based ad-quality outcomes, we show that off-the-shelf LLMs generate systematic measurement errors that bias downstream regression estimates, including sign reversals on key coefficients: a researcher relying on naive LLM annotations would draw the opposite substantive conclusion from one using human-coded data. We also show that standard first-order agreement metrics, the current convention for testing coding reliability, completely fail to diagnose these errors, and we calculate that the second-order inference tests we recommend have large sample-size requirements (validation sets of at least 1,000 observations in our setting).
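The failure mode the abstract describes can be illustrated with a small simulation. This is a minimal sketch, not the paper's actual design: the binary feature, the outcome model, and the outcome-correlated error mechanism are all invented for illustration. It shows how an annotator with apparently reasonable raw agreement can still reverse the sign of a downstream regression coefficient when its mistakes correlate with the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical human-coded binary creative feature and an ad outcome
x = rng.integers(0, 2, n)          # e.g., humor present (1) or absent (0)
y = 1.0 * x + rng.normal(0, 1, n)  # true effect of the feature is +1.0

# Hypothetical annotator whose mistakes depend on the outcome: it tends
# to miss the feature in high-performing ads and to hallucinate it in
# low-performing ads (non-classical, outcome-correlated error).
x_llm = x.copy()
hi = (x == 1) & (y > np.median(y[x == 1]))
lo = (x == 0) & (y < np.median(y[x == 0]))
flip = rng.random(n) < 0.5
x_llm[(hi | lo) & flip] = 1 - x_llm[(hi | lo) & flip]

# First-order check: raw agreement looks passable
agreement = (x == x_llm).mean()

# Second-order check: the downstream regression slope
def ols_slope(z, y):
    z = z - z.mean()
    return (z @ (y - y.mean())) / (z @ z)

b_human = ols_slope(x.astype(float), y)
b_llm = ols_slope(x_llm.astype(float), y)

print(f"agreement: {agreement:.2f}")           # ~0.75
print(f"slope, human labels: {b_human:+.2f}")  # ~+1.00
print(f"slope, LLM labels:   {b_llm:+.2f}")    # negative: sign reversed
```

Because the mislabeled observations are selected on the outcome, the bias is not simple attenuation toward zero, which is why a first-order agreement score cannot reveal it.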
Service