The Missing Sort Command

How a Single Line of Code Undid a Top Paper, and What AI Means for the Future of Replication

Carlos Chavez
Mar 20, 2026

In early 2026, Michael Wiebe’s comment on Moretti (2021) was accepted at the American Economic Review. The comment showed that the causal results in one of the most cited recent papers on agglomeration and innovation were driven by coding errors. The paper had studied whether bigger technology clusters cause inventors to patent more, a question with direct implications for housing policy, urban planning, and billions of dollars in public subsidies to attract tech firms. Moretti found large positive effects. The event study showed a clear jump in patenting when inventors moved to bigger clusters. The instrumental variables strategy confirmed the causal interpretation.

Both results were wrong. The errors were plain bugs in the code, not subtle econometric judgments or debatable identification assumptions.

The event study specification was wrong. Among other problems, the treatment variable was not properly interacted with all event-year indicators, including year zero. The coefficient for the move year was therefore estimated using data from all years, not just the move year, inflating the apparent effect. The instrumental variable, based on the number of inventors in the same field working for firms in other cities, was constructed from data that had not been sorted by city. The code computed first-differences across different cities rather than within them. A single missing sort command meant the instrument was mixing up city A’s value this year with city B’s value last year. Correcting the sort order and rerunning the IV regressions produces first-stage F-statistics around 4.5 to 7, far below conventional thresholds for instrument strength, and null second-stage estimates.
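The sort bug is easy to reproduce in miniature. Below is a minimal sketch of the same failure mode in Python/pandas (the original code was Stata, and the cities and numbers here are invented): if a panel arrives sorted by year rather than by city, a naive first-difference subtracts one city's value from a different city's.

```python
import pandas as pd

# Hypothetical panel: two cities observed over the same years.
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "inventors": [10, 12, 15, 100, 90, 80],
})

# Bug analog: the data are sorted by year only, and the first-difference
# is taken over adjacent rows regardless of which city they belong to.
unsorted = df.sort_values("year").reset_index(drop=True)
unsorted["d_bad"] = unsorted["inventors"].diff()  # differences ACROSS cities

# Fix analog: sort by city and year, then difference within each city.
df = df.sort_values(["city", "year"])
df["d_good"] = df.groupby("city")["inventors"].diff()
```

In Stata the analogous fix is a `sort city year` (or declaring the panel with `xtset`) before generating lags; in pandas, `groupby(...).diff()` makes the within-city intent explicit rather than relying on row order.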

Wiebe spent three years on this replication. He emailed Moretti in July 2023 about the event study problem. He never received a response. The comment went through a full round of review and revision at the AER before acceptance. Along the way, Wiebe documented ten distinct issues: the event study bug, the IV sorting error, a problematic log transformation using log(y + 0.00001) that reversed the citation quality results, an identifier based on inventor names that conflated different people in different cities, and cleaning code that used many-to-many merges with nonunique sort orders, producing different samples every time the code was run.
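The log-transformation issue deserves a sketch of its own. With count data that contain zeros, log(y + ε) maps every zero to log(ε), and for ε = 0.00001 that is roughly −11.5, an enormous outlier. A toy example in Python (invented numbers, not Moretti's data) shows how the choice of ε alone can flip the sign of a group comparison:

```python
import math

# Hypothetical citation counts; the zeros are the whole story.
treated = [0, 3, 3]
control = [1, 1, 1]

def mean_log(ys, eps):
    """Mean of log(y + eps) over a list of counts."""
    return sum(math.log(y + eps) for y in ys) / len(ys)

# With eps = 1e-5, the single zero contributes log(1e-5) ~ -11.5,
# swamping the other observations and flipping the comparison.
gap_tiny_eps = mean_log(treated, 1e-5) - mean_log(control, 1e-5)  # negative
gap_log1p = mean_log(treated, 1.0) - mean_log(control, 1.0)       # positive
```

This is why an ad hoc constant inside a log warrants suspicion: at minimum the result should be shown to be robust to the choice of ε, or the zeros handled directly, for instance with a Poisson-style estimator.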

This story is worth telling not because Moretti made mistakes, since everyone makes mistakes, but because of what it reveals about the infrastructure of empirical economics. We have spent thirty years refining our identification strategies. We have developed sophisticated tools for causal inference: difference-in-differences with heterogeneous treatment effects, synthetic controls, regression discontinuity designs, shift-share instruments with proper inference. The econometric methodology has never been better. But the code that implements these methods often runs on the digital equivalent of duct tape and hope. Fragile Stata scripts with undocumented variable names, unsorted datasets, arbitrary choices buried deep in data-cleaning routines that no one else will ever read.
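The merge problem above generalizes. When the key is not unique on either side, there is no well-defined pairing of rows. pandas resolves this deterministically by forming every matching pair; Stata's `merge m:m` instead pairs rows by their current sort order, which is why, as Wiebe found, a nonunique sort can yield a different sample on every run. A small sketch with an invented name-based identifier:

```python
import pandas as pd

# Hypothetical records keyed on an inventor-name identifier that is
# not unique: two different "smith, j"s living in two different cities.
left = pd.DataFrame({"inventor": ["smith, j", "smith, j"],
                     "city": ["A", "B"]})
right = pd.DataFrame({"inventor": ["smith, j", "smith, j"],
                      "patent": [101, 102]})

# With duplicate keys on both sides, pandas returns the full cross
# product: 2 x 2 = 4 rows, attributing both patents to both cities.
merged = left.merge(right, on="inventor")
```

Either way the analyst gets a wrong sample; the difference is that pandas gets it wrong reproducibly, while an m:m merge on an unstable sort order gets it wrong differently each time.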

The credibility revolution taught economists to take identification seriously. We are now learning, more slowly and more painfully, that we also need to take implementation seriously.
