Andrew Gelman was born in Philadelphia in 1965 into a family with intellectual range. His sister Susan Gelman became a prominent developmental psychologist. His uncle Woody Gelman was a cartoonist. He attended MIT as a National Merit Scholar, earning degrees in mathematics and physics in the mid-1980s, then moved to Harvard for graduate work in statistics, completing his doctorate in 1990 under Donald Rubin, the Bayesian statistician whose work on missing data and causal inference had already reshaped how empirical researchers thought about what they could and could not learn from observational studies. Gelman’s dissertation addressed image reconstruction for emission tomography, a technical problem in medical imaging that required hierarchical models and simulation-based inference. The dissertation was not famous, but the tools it required became the foundation of everything that followed.
He joined Berkeley’s statistics faculty in 1990, moved to Columbia in 1996, and has remained there since, accumulating appointments in both statistics and political science, directing the Applied Statistics Center, and in 2017 receiving the Higgins Professorship. The institutional biography is less interesting than what happened intellectually during those decades, which is that Gelman gradually transformed from a technical statistician into something harder to categorize: a public epistemologist, an institutional auditor, a methodological conscience for the social sciences, and a blogger whose daily output shaped the research culture of an entire generation.
The technical work came first and remained foundational throughout. Gelman’s contributions to Bayesian statistics are substantial. Bayesian Data Analysis, co-authored with John Carlin, Hal Stern, and Donald Rubin, and joined in its third edition by David Dunson and Aki Vehtari, functions as the standard reference for applied Bayesian reasoning across statistics, epidemiology, social science, and machine learning. The book is notable not for mathematical novelty alone but for its insistence on workflow: the idea that Bayesian analysis is not a one-shot calculation but an iterative process of model building, checking, revision, and expansion. Posterior predictive checks, the technique of asking whether data simulated from your fitted model resemble the data you observed, become in Gelman’s treatment not a decorative verification step but the core of honest inference. If your model cannot generate data that look like what you saw, your model is wrong in some important way and you should find out how.
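The logic of a posterior predictive check can be made concrete with a small sketch. What follows is an illustration under stated assumptions, not a reproduction of any analysis from the book: the data are invented, the model is a plain normal model with the standard noninformative prior, and the test statistic (the sample minimum) is one choice among many. The function name and its parameters are hypothetical.

```python
import random
import statistics

# Invented measurements: mostly well behaved, with two low outliers.
Y_OBS = [28, 26, 33, 24, 34, 27, 16, 40, 29, 22, 24,
         21, 25, 30, 23, 29, 31, 19, 24, 20, -44, -2]


def posterior_predictive_pvalue(y, n_sims=4000, seed=1):
    """Share of replicated datasets whose minimum is at or below the observed minimum.

    Assumes y ~ Normal(mu, sigma) with the standard noninformative prior,
    so the posterior can be sampled directly without MCMC.
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)
    t_obs = min(y)
    hits = 0
    for _ in range(n_sims):
        # Posterior draw of sigma^2: (n - 1) * s^2 / chi-square(n - 1).
        # gammavariate(shape, scale) with scale 2 gives a chi-square draw.
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma = ((n - 1) * s2 / chi2) ** 0.5
        # Posterior draw of mu given sigma.
        mu = rng.gauss(ybar, sigma / n ** 0.5)
        # Simulate a replicated dataset of the same size from the fitted model.
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]
        if min(y_rep) <= t_obs:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    p = posterior_predictive_pvalue(Y_OBS)
    print(f"share of replications with a minimum as low as observed: {p:.3f}")
```

A share near zero says the fitted normal model almost never produces a value as extreme as the observed minimum, which is the signal to go back and revise the model rather than report its conclusions.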
This emphasis on model checking connects to his broader contribution to multilevel modeling, formalized in Data Analysis Using Regression and Multilevel/Hierarchical Models with Jennifer Hill and its successor Regression and Other Stories with Hill and Vehtari. Multilevel models, which allow parameters to vary across groups while borrowing statistical strength across them through partial pooling, had existed in various forms before Gelman. His contribution was to make them practically accessible and conceptually central. The basic idea is that neither complete pooling, treating all groups as identical, nor no pooling, treating each group as entirely separate, is usually right. Partial pooling acknowledges that groups share something while differing in ways the data can inform. This is both technically superior to the alternatives and epistemically humble: it encodes the belief that your data know something about the world but not everything, and that prior information and between-group patterns both deserve weight.
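A minimal sketch shows how partial pooling sits between the two extremes. It assumes the within-group standard deviation `sigma` and the between-group standard deviation `tau` are known, which a real multilevel model would estimate from the data; the function name and the example numbers are hypothetical.

```python
def partial_pool(group_means, group_sizes, sigma, tau):
    """Precision-weighted compromise between each group's mean and the grand mean.

    Assumes known within-group sd (sigma) and between-group sd (tau);
    a fitted multilevel model would estimate both.
    """
    # Grand mean: weight each group by the precision of its observed mean.
    weights = [1.0 / (sigma ** 2 / n + tau ** 2) for n in group_sizes]
    mu = sum(w * m for w, m in zip(weights, group_means)) / sum(weights)

    estimates = []
    for m, n in zip(group_means, group_sizes):
        data_precision = n / sigma ** 2   # grows with the group's sample size
        prior_precision = 1.0 / tau ** 2  # strength of the between-group pattern
        estimates.append(
            (data_precision * m + prior_precision * mu)
            / (data_precision + prior_precision)
        )
    return mu, estimates


if __name__ == "__main__":
    means = [10.0, 20.0, 30.0]   # no-pooling estimates
    sizes = [2, 50, 50]          # the first group has very little data
    mu, pooled = partial_pool(means, sizes, sigma=10.0, tau=5.0)
    print("complete pooling:", round(mu, 2))
    print("partial pooling: ", [round(e, 2) for e in pooled])
```

The group with two observations is pulled most of the way toward the grand mean, while the groups with fifty observations barely move: the amount of pooling is set by how much each group's own data can say.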
The political science work emerged alongside the statistical work. Red State, Blue State, Rich State, Poor State, with David Park, Boris Shor, and Jeronimo Cortina, addressed a puzzle that political commentary had gotten systematically wrong. The claim that red states were poor and blue states were rich was true at the state level but inverted at the individual level: within any given state, higher-income voters were more likely to vote Republican. The apparent paradox dissolved once you used multilevel models to examine individual-level voting patterns within states. The book demonstrates what happens when you analyze data at the right level of aggregation rather than the level that generates the most vivid narrative.
The election forecasting work extended this approach. Gelman’s collaboration with The Economist on presidential forecast models in 2020 and 2024 was notable for what it refused to do as much as for what it did. The models produced wide probability intervals. They combined economic fundamentals with polling data adjusted for non-sampling error. They treated the forecast as a posterior distribution over possible outcomes. The refusal of false precision was a methodological stance with public implications: in an environment where forecasters competed to display misleading confidence, Gelman’s models reported uncertainty at its actual width, which made them less satisfying as a product and more honest as a description of what could be known.
The work on redistricting, conducted largely with Gary King, introduced the concept of partisan symmetry as a mathematical standard for electoral fairness. The principle is simple: a map is fair if both parties would receive the same seat share when they receive the same vote share. The concept provided a quantitative foundation for legal arguments about gerrymandering, translating a moral intuition about fairness into a measurable property of electoral systems. The research on redistricting also contained a more uncomfortable finding: that gerrymandering often fails to achieve its intended partisan effects, because the political environment is sufficiently uncertain that mapmakers cannot reliably engineer the outcomes they intend. This was a result that partisan advocates on both sides preferred to ignore.
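The symmetry principle can be checked on a toy map. The sketch below is not the statistical model used in the redistricting research; it assumes uniform partisan swing (every district shifts by the same amount), equal turnout across districts, and invented district vote shares. The function names are hypothetical.

```python
def seat_share(district_shares, statewide_target):
    """Party A's seat share after a uniform swing moves the mean district share to the target."""
    mean_share = sum(district_shares) / len(district_shares)
    swing = statewide_target - mean_share
    wins = sum(1 for d in district_shares if d + swing > 0.5)
    return wins / len(district_shares)


def symmetry_gap(district_shares, v):
    """Seats A wins with vote share v minus seats B would win with the same vote share v.

    Zero means the map treats the parties symmetrically at v.
    """
    seats_a = seat_share(district_shares, v)
    # B holds vote share v exactly when A holds 1 - v; B's seats are the complement of A's.
    seats_b = 1.0 - seat_share(district_shares, 1.0 - v)
    return seats_a - seats_b


if __name__ == "__main__":
    balanced = [0.40, 0.45, 0.50, 0.55, 0.60]  # party A vote share by district
    packed = [0.80, 0.45, 0.45, 0.45, 0.45]    # A's voters packed into one district
    print("balanced map gap at 53%:", round(symmetry_gap(balanced, 0.53), 2))
    print("packed map gap at 52%:  ", round(symmetry_gap(packed, 0.52), 2))
```

On the balanced map either party turns 53 percent of the vote into three of five seats, so the gap is zero. On the packed map party A turns 52 percent into one seat while party B would turn 52 percent into four, which is the asymmetry the standard is designed to expose.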
The paper on the rationality of voting, with Aaron Edlin and Noah Kaplan, addressed a standard puzzle in rational choice theory. If the probability that your individual vote is decisive is vanishingly small, why is it rational to vote at all? The standard answer treats voting as expressive. Gelman’s answer worked through the expected utility calculation more carefully. If voters have social preferences, and if the social benefit of a preferred candidate winning is multiplied by the entire population affected, the expected utility of voting remains significant even in large electorates. The paper demonstrated that the standard argument against voting’s rationality rests on a contestable assumption about the scope of preferences.
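The arithmetic behind that argument is short enough to sketch. The version below makes simplifying assumptions that are not taken from the paper: the probability of casting the decisive vote is taken to be inversely proportional to the number of voters, the affected population is taken to equal the electorate, and every dollar figure and weight is invented for illustration.

```python
def expected_benefit(n_voters, closeness=10.0, b_self=10_000.0,
                     b_per_person=50.0, alpha=0.1):
    """Expected benefit of voting, split into a selfish and a social term.

    Hypothetical inputs:
      closeness     - constant K in P(decisive) = K / n_voters
      b_self        - personal benefit if the preferred candidate wins
      b_per_person  - benefit to each affected person if that candidate wins
      alpha         - weight the voter puts on other people's benefit
    """
    p_decisive = closeness / n_voters
    selfish = p_decisive * b_self
    # The social payoff scales with the population, cancelling the 1/n in p_decisive.
    social = p_decisive * alpha * b_per_person * n_voters
    return selfish, social


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 100_000_000):
        selfish, social = expected_benefit(n)
        print(f"n={n:>11,}  selfish term={selfish:10.4f}  social term={social:6.2f}")
```

The selfish term shrinks toward zero as the electorate grows, which is the standard argument. The social term does not shrink, because the stakes grow with the population at the same rate that the chance of being decisive falls.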
All of this technical and applied work was serious and would have established a substantial career on its own. What made Gelman distinctive was that it coincided with a deepening preoccupation with the sociology of knowledge production itself, with what researchers were doing when they claimed to be doing science, and with the gap between the methods researchers described in their papers and the methods they used in their labs.
The concept of the garden of forking paths is the clearest expression of this preoccupation. The name comes from a Jorge Luis Borges story, but the phenomenon it describes is neither literary nor exotic. It is the ordinary condition of modern empirical research. A researcher collects data, looks at it, and makes a series of reasonable decisions about how to analyze it: which outliers to exclude, which covariates to include, which subgroups to examine, which outcome measures to focus on. Each decision is individually defensible, but collectively they all but guarantee that a statistically significant result will emerge. The problem is not fraud. The problem is that the space of possible analyses is large, the researcher navigates it in real time in response to what the data show, and the p-value that results from this process does not mean what p-values are supposed to mean. A p-value tells you the probability of seeing data at least as extreme as yours if the null hypothesis were true and you had committed to your analysis before seeing the data. It tells you nothing useful when the analysis was shaped by the data.
This is a sociological point as much as a statistical one. The garden of forking paths is not an abuse of the system. It is the system. Researchers are trained to explore their data, to look for patterns, to understand their measurements before committing to a final analysis. That exploratory process is useful for understanding. It is fatal for the inference that the final published p-value is supposed to support. Gelman’s contribution was to name this clearly and to resist the instinct to treat it as a rare pathology.
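A small simulation makes the mechanism visible. It is a sketch under invented conditions: there is no true effect, each study measures five outcomes on two groups of twenty, and the researcher runs exactly one test per study. The only difference between the two researchers is whether the outcome was chosen before or after looking at the data. All names and settings are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_per_group=20, n_outcomes=5, n_studies=4000, seed=0):
    """Return (false positive rate when pre-committed, rate when the data pick the outcome)."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)  # sd of a difference in means, unit-variance data
    committed_hits = 0
    forking_hits = 0
    for _ in range(n_studies):
        zs = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: the null is true.
            treated = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
            control = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
            zs.append((treated - control) / se)
        # Researcher A named outcome 0 before collecting data.
        if two_sided_p(zs[0]) < 0.05:
            committed_hits += 1
        # Researcher B looks first, then tests the outcome that seems most promising.
        most_promising = max(zs, key=abs)
        if two_sided_p(most_promising) < 0.05:
            forking_hits += 1
    return committed_hits / n_studies, forking_hits / n_studies


if __name__ == "__main__":
    committed, forking = simulate()
    print(f"pre-committed analysis: {committed:.3f}")
    print(f"data-dependent choice:  {forking:.3f}")
```

The pre-committed researcher sees a false positive rate near the nominal five percent. The second researcher also reports a single test, yet the rate is several times higher, because the choice of which test to run was itself a function of the data.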
The Type S and Type M error framework follows from this. When a study is underpowered and a result nonetheless reaches statistical significance, the result is almost certainly an exaggeration. The true effect, if it exists at all, is smaller than what was measured. And there is a meaningful probability that the sign of the effect is wrong. Type S errors, where the estimate points in the wrong direction, and Type M errors, where the magnitude is grossly inflated, are not random. They are predictable consequences of the publication process in low-powered research environments. Gelman and his collaborators worked through the mathematics of this in detail, showing that in domains where effects are typically small and studies are typically underpowered, the published literature is almost certain to be dominated by exaggerated and unreliable estimates, even in the complete absence of fraud.
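The framework can be illustrated by simulation. The sketch below assumes a small true effect of 0.2 measured with a standard error of 0.5, numbers chosen only to represent a badly underpowered design, and assumes the estimate is normally distributed around the truth. The function name is hypothetical.

```python
import random


def type_s_and_m(true_effect=0.2, se=0.5, n_sims=100_000, seed=0):
    """Simulate many replications of one design and summarise the significant ones.

    Returns (power, type_s_rate, exaggeration_ratio), where the last two are
    computed only over replications that reached |estimate| > 1.96 * se.
    """
    rng = random.Random(seed)
    threshold = 1.96 * se
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > threshold:
            significant.append(estimate)
    power = len(significant) / n_sims
    # Type S: a significant estimate whose sign is opposite to the true effect.
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    # Type M: how much significant estimates overstate the true magnitude on average.
    exaggeration = (sum(abs(e) for e in significant) / len(significant)) / abs(true_effect)
    return power, type_s, exaggeration


if __name__ == "__main__":
    power, type_s, exaggeration = type_s_and_m()
    print(f"power:               {power:.3f}")
    print(f"Type S rate:         {type_s:.3f}")
    print(f"exaggeration ratio:  {exaggeration:.1f}")
```

Under these assumptions only a small share of replications reach significance, a noticeable fraction of those point the wrong way, and every one of them overstates the effect several times over, since an estimate has to exceed 0.98 in absolute value to be significant when the truth is 0.2.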
The replication crisis that became publicly visible around 2011 to 2015 validated this analysis. Studies that had been celebrated across psychology, social science, nutrition research, and medicine failed to replicate at rates that should not have been surprising given what was known about statistical power and the garden of forking paths, but that were nonetheless shocking to many people who had not thought carefully about what the publication process was producing. The power posing study by Amy Cuddy and Dana Carney, which claimed that expansive body postures increase testosterone, decrease cortisol, and improve risk tolerance in ways detectable even without direct measurement, was among the most prominent casualties. The study had a sample of forty-two people. It had made data-dependent decisions about which of the many hormonal and behavioral outcomes to report. It had generated a TED talk viewed tens of millions of times and a career built on the finding’s emotional resonance. Gelman’s critique focused on the methodology: the sample was too small, the decision to report only the significant outcomes among many measured was a textbook instance of the garden of forking paths, and the effect sizes claimed were implausibly large relative to what the noise level of the study could support.
The reaction from Amy Cuddy’s defenders, including a prominent piece in the New York Times Magazine framed as an account of scientific bullying, accused Gelman of cruelty and a failure of collegiality. The argument was that a proper critic would have reached out privately, helped the researcher fix her errors, and avoided public humiliation. Gelman’s response was consistent and clear. Science is a public activity. A claim published in a peer-reviewed journal and disseminated to millions of people through a TED talk is not a private matter between two researchers. The obligation to defend published claims publicly is not cruelty. The practice of helping famous researchers fix their errors privately, shielded from scrutiny, is a form of corruption. It protects reputations at the expense of the accuracy of the public record.
The Brian Wansink case made the same point more dramatically. Wansink, the director of Cornell’s Food and Brand Lab, had built a prolific career producing research on food behavior, portion sizes, and the environmental determinants of eating. A blog post in which he praised a student for squeezing four papers out of a dataset that had initially failed to produce significant results triggered a systematic examination of his published work. Gelman and others, including Jordan Anaya and Nick Brown, found impossible numbers, inconsistencies across papers reporting data from the same studies, and patterns of reporting consistent with pervasive p-hacking. More than fifteen of Wansink’s papers were eventually retracted. Cornell conducted an internal investigation, found evidence of academic misconduct, and Wansink resigned. The case was a demonstration of what post-publication scrutiny could accomplish when it was pursued.
These confrontations made Gelman a polarizing figure. Susan Fiske, a prominent social psychologist, circulated a draft essay describing critics like Gelman as methodological terrorists. The term crystallized a real tension. The old system of scientific quality control operated through pre-publication peer review conducted privately by a small number of experts who shared the professional norms and social networks of the authors they reviewed. Gelman’s approach, conducting critique publicly through a blog accessible to anyone with an internet connection, violated those norms. It made methodological arguments that would previously have been exchanged in private letters between specialists available to journalists, policy advocates, and the general public. It removed the protection that professional insularity had traditionally provided to high-status researchers whose work was prominent but weak.
The defense of that insularity as a necessary protection against unqualified criticism misses what was at stake. The old system had failed. The private, collegial correction mechanism had allowed the garden of forking paths to flourish for decades. It had produced a published literature in social psychology, nutrition science, and related fields that was substantially unreliable. The question was not whether criticism should be private or public. The question was whether the correction mechanism was working. It was not, and public criticism was the corrective.
The blog, Statistical Modeling, Causal Inference, and Social Science, which Gelman has maintained since 2004, is where most of this played out. The blog is neither a personal diary nor a conventional academic outlet. It is something harder to categorize: a daily practice of reasoning in public, a running commentary on the methods and findings of empirical research across many fields, a space where technical arguments get tested by an unusually informed and critical readership. The posts range from detailed statistical critiques of specific papers to reflections on research culture, to discussions of teaching, to occasional forays into politics, literature, and philosophy. The comments section at its best functions as extended peer review of Gelman’s own arguments, with methodologically sophisticated readers from many fields pushing back, extending, and correcting in real time.
The blog’s influence is difficult to measure but clearly substantial. It has shaped how a generation of researchers thinks about uncertainty, model checking, the replication crisis, and the ethics of publishing. It has given a public vocabulary to problems that previously had no name. The garden of forking paths, researcher degrees of freedom, Type S and Type M errors, and the time-reversal heuristic are all concepts that circulate in methodological discussions across disciplines partly because they were named and elaborated in a form accessible to non-specialists. The blog also changed what was socially possible. By demonstrating that serious technical criticism could be conducted in public without the protection of anonymous peer review, it helped normalize a form of accountability that the old system had made nearly impossible.
Gelman’s political science work and his statistical work are unified by a single underlying commitment that can be stated simply: the obligation to say what the data can and cannot support, and to resist the pressure to say more. This sounds unremarkable until you observe how consistently it is violated in practice. The pressure to produce clean, publishable, policy-relevant, morally satisfying results is pervasive in academic science. It operates through publication incentives, grant funding priorities, media demand for compelling findings, and the social dynamics of fields where being known for important results is the primary currency of reputation. Gelman’s entire career has been a sustained argument against yielding to that pressure.
This connects to the Alliance Theory analysis that has accumulated around his work, which is worth engaging seriously even if its framing is not one Gelman would use himself. The analysis observes that Gelman functions as an internal auditor of a coalition, the liberal academic knowledge class, that derives its authority from the claim to produce reliable scientific knowledge. When members of that coalition produce unreliable knowledge and wrap it in scientific legitimacy, they create a liability that rivals can exploit and that erodes trust in the coalition’s outputs over time. Gelman’s role is to identify and correct those liabilities before they compound into a credibility crisis.
This framing is not wrong, but it is incomplete. It explains why the coalition tolerates Gelman despite the social costs he imposes on high-status members. It does not fully explain his motivation, which seems to be better captured by a simpler claim: he thinks accurate description of the world matters, that the gap between what published science claims and what it can support is morally significant, and that the professional norms that prevent honest criticism from being conducted in public are themselves a form of corruption. He is not a coalition manager. He is a statistician who believes that statistics is supposed to describe reality accurately and is angry that it often does not.
The distinction matters because coalition management implies a certain strategic flexibility about what to criticize and when. Gelman’s criticism does not look strategic in that way. He criticizes work that is politically congenial to him on the same terms he criticizes work that is not. He criticizes prominent women and prominent men. He criticizes work published in high-prestige journals and work published in lower-tier outlets. He criticizes his own past work. The consistency is not the consistency of a tactician. It is the consistency of someone who has internalized a standard and applies it regardless of the social cost.
That consistency is what makes him credible. The time-reversal heuristic he uses to evaluate published results, asking what we would think of a study if the failed replication had been published first, is also a useful heuristic for evaluating critics. What would we think of Gelman’s criticism if we did not know the social positions of the people being criticized? The answer is usually: the same thing. The criticisms are technical and they hold up.
The contribution to Stan, the probabilistic programming language developed with Bob Carpenter and others, deserves attention as a distinct kind of contribution. Stan makes Bayesian computation accessible to researchers who do not have the mathematical background to derive sampling algorithms from scratch. It handles the difficult technical work of exploring posterior distributions automatically, allowing applied researchers to focus on model specification and interpretation. The contribution is infrastructure: it lowered the barrier to entry for Bayesian analysis across an enormous range of fields, and the research that has flowed through it has been substantial.
The multilevel regression and poststratification work, which Gelman developed with Yair Ghitza and others, has become the standard method for small-area estimation in political science and survey research. The core idea is that you can combine a large national survey with census data on demographic composition to produce estimates of public opinion at the state or congressional district level even when the survey contains only a handful of respondents from those areas. The method requires a multilevel model that borrows strength across demographic groups and geographies, and a poststratification step that reweights the model’s predictions to match the demographic composition of each area. The result is that accurate subnational estimates of opinion become achievable at a fraction of the cost of the large state-level surveys that previously would have been required.
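The two steps can be sketched on a toy example. This is not a fitted multilevel regression: the shrinkage of each cell toward the overall mean is a crude stand-in for what such a model would do, the `prior_strength` constant is arbitrary, and the survey and census numbers are invented. The function names are hypothetical.

```python
def shrink_cell_means(sample, prior_strength=20.0):
    """Stand-in for the multilevel model: pull each cell's support rate toward the overall rate.

    sample maps a demographic cell to (respondents, supporters).
    Sparse cells are pulled harder than well-sampled ones.
    """
    total_n = sum(n for n, _ in sample.values())
    total_yes = sum(yes for _, yes in sample.values())
    overall = total_yes / total_n
    return {
        cell: (yes + prior_strength * overall) / (n + prior_strength)
        for cell, (n, yes) in sample.items()
    }


def poststratify(cell_estimates, census_counts):
    """Weight each cell's estimate by that cell's share of the target population."""
    total = sum(census_counts.values())
    return sum(cell_estimates[c] * census_counts[c] for c in census_counts) / total


if __name__ == "__main__":
    # Invented survey that oversamples young men.
    sample = {
        ("young", "men"): (400, 160),
        ("young", "women"): (50, 30),
        ("old", "men"): (40, 18),
        ("old", "women"): (10, 7),
    }
    # Invented census counts (thousands) for the area of interest.
    census = {
        ("young", "men"): 200,
        ("young", "women"): 200,
        ("old", "men"): 300,
        ("old", "women"): 300,
    }
    raw = sum(yes for _, yes in sample.values()) / sum(n for n, _ in sample.values())
    adjusted = poststratify(shrink_cell_means(sample), census)
    print(f"raw sample mean:        {raw:.3f}")
    print(f"poststratified estimate: {adjusted:.3f}")
```

The raw mean is dominated by the oversampled cell. The poststratified estimate moves toward what the under-sampled cells report, in proportion to how many people in the target population those cells actually represent.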
The Xbox study, which Gelman and colleagues conducted using data from the Microsoft gaming platform during the 2012 election, was a demonstration of how far this approach could be pushed. The Xbox data were massively non-representative: they skewed heavily toward young men. After applying multilevel regression and poststratification, the adjusted estimates closely tracked those from traditional probability-based polls. The demonstration was provocative because it suggested that representativeness at the sampling stage mattered less than had been assumed, and that proper statistical adjustment could extract a reliable signal from biased data. This had practical implications for the design of surveys, which had traditionally treated probability sampling as a necessary condition for valid inference.
The work on polling variance in presidential elections, with Gary King, addressed a question that generated enormous confusion in political journalism. Presidential polls fluctuate substantially from week to week during campaigns, yet the final outcome is often predictable months in advance from economic and political fundamentals. The puzzle is why polls vary so much if the outcome is largely determined by structural factors. Gelman and King’s answer was the enlightened preferences hypothesis: voters have underlying preferences determined by their demographics, ideology, and assessments of economic conditions, and the campaign functions as an information environment that helps voters converge on those preferences over time. Early polls reflect noise around the stable underlying signal. As Election Day approaches and voters receive more and better information, the noise diminishes and polls converge toward the fundamental forecast.
This was a finding that political journalists consistently ignored because it implied that most of what they covered as consequential, the daily fluctuations in polls driven by speeches, gaffes, debates, and events, was statistical noise around a signal determined before the campaign began. It did not make campaigns irrelevant, but it substantially diminished their expected impact relative to what horse-race journalism assumed.
Gelman has been consistently willing to apply the same skepticism to politically congenial results that he applies to politically uncongenial ones. His critiques of social psychology research on implicit bias, stereotype threat, and other phenomena aligned with progressive political commitments have been conducted on the same methodological terms as his critiques of nutritional science and management research. This consistency has not made him popular with everyone on his side of the political spectrum, but it is the source of his credibility. A critic who only attacks findings he dislikes politically is not a methodologist. He is an advocate. Gelman has been willing to be inconvenient.
The teaching work, embodied in Teaching Statistics: A Bag of Tricks with Deborah Nolan and Active Statistics with Vehtari, reflects the same commitments translated into pedagogy. The books emphasize story-driven instruction, real data, and the connection between statistical methods and the questions they are supposed to answer. The approach resists the tendency to teach statistics as a collection of procedures to be executed correctly, with grades assigned accordingly. It insists that statistics is a way of reasoning about uncertainty in the world, and that students learn it by doing it.
The reform agenda that Gelman has advocated throughout his career is now sufficiently mainstream that it is easy to forget how contested it was when he began advocating it. Pre-registration, open data, post-publication review, the abandonment of the p less than 0.05 threshold as a criterion for publishability, the treatment of replication as a fundamental rather than a supplementary activity: these are now acknowledged as important by most methodologists in the social sciences and increasingly embedded in journal policies and grant requirements. Gelman did not bring about these changes alone. But the public argument he conducted over fifteen years through his blog, his papers, his books, and his public confrontations with specific cases of methodological failure contributed substantially to the conditions in which these changes became possible.
The reply he sent in March 2026 to a question about whether research culture has reformed reveals the complexity of his current assessment. His answer separated two questions that are often conflated. Inside academic science, there has been reform: more skepticism toward underpowered studies, more pre-registration, more open data, less tolerance for the kind of obvious p-hacking that characterized the worst of the pre-replication-crisis era. But the public intellectual ecosystem that science feeds into has not reformed in the same way. The disintermediation of the pipeline from academic research to public influence means that figures like Andrew Huberman and other social media health influencers do not need academic credentialing for their claims to reach millions of people. They can produce and disseminate junk science directly, without passing through the journals, the NPR appearances, the TED talks, and the Gladwell-style popularizations that used to constitute the bridge between academic research and public culture. The reform of academic science has coincided with the collapse of the institutional infrastructure that made academic science consequential for public belief.
This is a sober assessment from someone who spent decades working to improve that infrastructure from within. It suggests that the replication crisis was a symptom of a deeper problem: the relationship between scientific research and public knowledge has been structurally disrupted in ways that better methodology inside academic institutions cannot fully repair.
His observation about Columbia, also delivered in that reply, applies the same structural analysis to university governance. His diagnosis, that universities have executive functions but minimal legislative or judicial functions, explains a pattern visible across many institutions: decisions get made on consequentialist grounds, and administrations repeatedly choose to cover up misdeeds.
Gelman at sixty-one remains extraordinarily active. He publishes, teaches, blogs, advises graduate students, contributes to Stan, and continues the daily practice of public reasoning that has made his blog one of the most consistently valuable intellectual resources in empirical social science. The range of what he takes seriously is unusual: a single week’s blog output might include a detailed critique of a clinical trial’s statistical analysis, a reflection on Bayesian workflow, a discussion of polling methodology, a comment on a political science paper, and a response to a reader’s question about multilevel models in education research. The range is not dilettantism. It reflects a commitment to the idea that the problems of statistical reasoning are not discipline-specific. They are problems of how human beings learn from data, and they appear wherever data gets used to support claims about the world.
What distinguishes Gelman from most technical statisticians who have become publicly prominent is that he has never mistaken technical authority for moral authority. He does not tell people what to think about politics, policy, or values. He tells them what the data can and cannot support, and he insists on a rigorous accounting of the difference. In an intellectual environment that consistently rewards confident, morally inflected, narratively satisfying claims, that insistence is subversive. It makes him useful to anyone who wants to know what is known and inconvenient to almost everyone who has built an audience or a career on claiming to know more than the data support.
His legacy is a generation of researchers who treat uncertainty as something to be quantified and communicated, who understand that model checking is not optional, who know that a statistically significant result from an underpowered study is more likely to be wrong than right, and who have been trained to ask what the researcher would have done if the data had come out differently. That training is not complete and the institutional forces working against it are substantial. But it is more deeply embedded in research practice than it was twenty years ago, and Gelman’s sustained, often uncomfortable, always technically serious public argument is a significant part of why.
Andrew Gelman built a heroic life out of refusal. Most men earn their standing by claiming, the bold result, the clean finding, the story that lands. Gelman earns his by declining to claim more than the numbers will bear. His whole authority rests on a discipline that sounds like the opposite of ambition, the insistence on saying only what the data support and stopping there, on the wide interval where others draw the confident line, on the model that might be wrong and the result that might be noise. He made himself the conscience of the empirical sciences by becoming the man who will not oversell, and in a culture that pays for confidence, the refusal to oversell is the rarest thing on the table.
What he built is the garden of forking paths. Jorge Luis Borges (1899-1986) supplied the image. A scientist gathers his data, looks at it, makes a chain of reasonable choices about what to keep and what to drop and which pattern to chase, and each choice is defensible and the chain together delivers a finding that means nothing, a certainty manufactured in good faith. No fraud. Just a man walking the branching paths in real time, led by the data toward the result the data happened to suggest, calling the arrival knowledge. That is Gelman’s terror, the honest self-deception, the false certainty wearing the face of science. His hero is the man who does not fool himself, and the harder feat folded inside it, the man who builds the tools that would catch him fooling himself and then runs them on his own work.
Here is where he parts from most of his peers. The others reach their authority by subtraction, the claim to have stripped the bias and the faith and the construction away to leave the clean residue, reality with the error removed. Gelman denies there is a clean residue. The whole of his method holds that you never reach the bare truth, you reach a range, a posterior, a model that knows something and not everything. Partial pooling, his signature move, refuses both the lie that all cases are the same and the lie that each stands alone, and settles in the honest middle where the data inform you without delivering you certainty. The wide interval is not timidity. It is the true width of what can be known, drawn to scale. Where the deflators say here is the world with the illusions gone, Gelman says here is the world with the uncertainty kept in the picture, because leaving it out is the deepest illusion of all.
Ernest Becker (1924-1974) named the work every man’s creed performs, the holding-off of death through service to something that outlasts him. Gelman’s something is the self-correcting record, the slow public machine by which inquiry catches its own errors and grinds toward truth across the generations. His methods, his students, the norms he pressed on a generation, these go on after him, and the going-on is his answer to the grave. The story his life tells is that science is fragile and precious, that the replication crisis threatened to rot it, and that his criticism defended a thing larger and more lasting than any career. A reductive reader will say the story is a cover, that under the talk of integrity sits the ordinary fear of slipping down the ladder. The reductive reader has not earned the claim. Becker’s point was never that the immortality project masks a baser motive. The project is how the motive lives in an animal that knows it will die. The hunger for significance and the love of the enterprise are not two things, one real and one decorative. They are the same hunger, and to call the nobler name a disguise is to claim a knowledge of another man’s heart that no evidence supplies, which is the one move Gelman spent his life teaching us to distrust.
Sit with that. To deflate Gelman, to announce that his integrity is status anxiety dressed for church, is to do the exact thing his whole career condemns. It is a finding with no power behind it, a confident story reverse-engineered from a man’s success, the garden of forking paths run on a biography instead of a dataset. You can always find the path that makes the honest man look like a careerist, the way you can always find the subgroup that turns the null result significant. Gelman taught the field to ask what we would believe if the study had come out the other way. Ask it here. Had the disintermediation never come, had his kind of science kept its grip on public belief, no one would read his integrity as a cover for status fear, because there would be no falling status to explain it by. The deflation depends on the outcome it pretends to diagnose. By his own time-reversal test it fails. The honest reading grants him the uncertainty he granted the world and takes the man at his word until the evidence says otherwise, and the evidence does not.
The cost is real, and he sees it more clearly than any critic could. He won the war he fought. Inside the academy the reforms took, the pre-registration and the open data and the death of the lonely underpowered study waved through on a lucky p-value. And the victory arrived as the ground gave way beneath it. The bridge from rigorous research to public belief, the science journalism and the popularizers and the lectures that once carried findings from the lab to the living room, has given way, and into the gap pour the direct-to-audience health influencers who need no credential and answer to no review, whose authority is reach and warmth and the parasocial trust of millions. Gelman perfected the instrument and the concert hall emptied. He is right inside a house whose writ no longer runs where most people form their beliefs. His March reply names this without flinching, the reform of the science and the ruin of the channel that made the science count, and a lesser man would have told himself a happier story.
A quieter cost sits beneath that one. The discipline that forbids overclaiming forbids the verdict too, the meaning, the thing a frightened public wants. A man deciding how to live, whether to fear the diagnosis or take the supplement or trust the shot, comes to Gelman and receives a probability interval and a warning that the study was underpowered, which is the truth and is not the bread he came for. The influencer hands him certainty and a plan. Gelman hands him the honest width of the unknown. The honest width is worth more and feels like less, and in a market for feeling, the man who sells the truth about uncertainty is selling the one thing the frightened animal is built to refuse.
The others in this gallery have a blind spot they cannot find. Gelman is the strange case who sees nearly the whole board, the square his own king stands on included. He runs the skepticism on himself, corrects his own old work, names the obsolescence creeping toward his method without dressing it as another man’s fault. The cut is not that he fails to see. The cut is that seeing does not save him. Rigor cannot manufacture the public trust that rigor once earned, and the virtue that built the bridge holds no tool to rebuild it after the culture stops prizing the virtue. He can describe the washing-out of the road with perfect accuracy. He cannot pave it with description.
So the figure stands, the honest accountant of what can be known, the man who made restraint heroic in a field that rewards the confident lie, and who turned his skepticism on himself when the others turned theirs only outward. His hero is the un-self-deceived inquirer. His immortality is the self-correcting record. He is doing the most honest work in the building. The building empties. He keeps the books straight anyway, which is either the last virtue or the first one, and is in any case the only one he was ever willing to claim.
Notes
* David Pinsof’s Alliance Theory cuts in several directions. Gelman occupies one of the most structurally unusual positions in contemporary academic life: he is a member of a coalition, the liberal knowledge-producing class, who spends most of his professional energy attacking that coalition’s outputs. Alliance Theory’s first and most important move is to ask what function this serves, because the naive reading, that Gelman is simply a disinterested truth-seeker, is the kind of stated motive that Alliance Theory treats with immediate suspicion.
The functional reading is this. Gelman’s coalition derives its authority from the claim to produce reliable scientific knowledge. The pipeline he documented in his reply to the question about research culture, from noisy studies through academic middlemen to NPR and TED and Gladwell, was a system for converting thin evidence into public credibility. When that pipeline produces fraudulent outputs, the credibility of the entire coalition is at risk. Rivals can point to the Amy Cuddy power posing study and the Brian Wansink retractions, and argue that academic social science is not reliable knowledge but prestige-laundered ideology. Gelman’s role, from an Alliance Theory perspective, is to perform the coalition’s self-correcting capacity before rivals can make that argument more damaging. He is doing internal policing to protect the coalition’s epistemic brand. His attacks on methodology serve the alliance more than they threaten it.
This is the standard reading and it is not wrong. But Alliance Theory generates more interesting predictions when you push further.
The first complication is the asymmetry Gelman described in his reply: he attacks work by people who share his political priors just as readily as work by people who do not. Alliance Theory predicts that internal critics will pull their punches when the target is a close ally and sharpen them when the target is a peripheral member whose failure would not damage the coalition’s core. Gelman does not obviously follow this pattern. He went after Amy Cuddy, whose work aligned with progressive commitments about female leadership and embodied cognition. He went after Brian Wansink, whose food behavior research was politically neutral but institutionally prestigious. He went after the implicit association test literature, which is foundational to a significant portion of the diversity and inclusion apparatus that the professional-managerial class (PMC) coalition depends on. These are not peripheral targets. If he were purely doing coalition maintenance, you would expect him to be more selective.
Alliance Theory has an answer to this. The coalition that benefits most from Gelman’s criticism is not the progressive PMC broadly but the specific subcoalition of methodologically rigorous quantitative social scientists who want to differentiate themselves from the junk science end of the market. That subcoalition has its own status interests that are served by demonstrating that the broader market for social science claims is inflated. The more prominent the junk, the more valuable the correction. Gelman is not attacking his coalition. He is attacking adjacent market participants whose inflated claims devalue the currency his coalition trades in. From this angle, his criticism of Cuddy and Wansink is not costly at all. It raises the relative status of the careful quantitative work by demonstrating that the careless quantitative work is junk.
The second complication is the disintermediation point he made in his reply. He described how the pipeline that used to run from researchers through journals and NPR and Gladwell to public influence has been bypassed. Huberman and the supplement industry can go directly to audiences without academic credentialing. This means the public credibility that Gelman was protecting through his internal policing has become less valuable over the same period in which he was doing the policing. The coalition whose authority he was shoring up matters less to public discourse than it did when he started doing the work. Alliance Theory would predict that his motivation to police the coalition’s methodological standards should decline as the coalition’s power to confer and withhold credibility declines. That he continues doing it at the same rate suggests either that the motivation is not purely coalition maintenance, or that the relevant audience for his policing has shifted from the general public to the internal market of methodologically serious researchers where the currency still holds value.
The third and most interesting application is to Gelman’s blog. Alliance Theory’s treatment of sacred values applies here with unusual precision. The sacred value Gelman defends is something like: science is the best available method for learning what is true about the world, and it works only when its practitioners maintain rigorous standards and are willing to correct each other publicly. This is a sacred value in Pinsof’s sense because it is apparently disconnected from self-interest, Gelman absorbs significant social costs to defend it, and it functions to stabilize a status game by disguising it as the pursuit of a non-status-related end.
But Gelman’s position within the status game is unusually secure because of the sacred value he defends. His blog has nine million views. His textbooks are standard references. His status within the methodologically serious quantitative community is very high. The sacred value of rigorous self-correction has made him one of the most influential statisticians of his generation. If he were simply maintaining epistemic standards because he believed they mattered, Alliance Theory would say: that belief is itself the outcome of a selection process that shaped him to defend the standards that his alliance depends on. He believes it sincerely. The sincerity is not evidence against the alliance function. It is evidence that the mechanism is working correctly.
The fourth application is to the institutional critique. Gelman’s diagnosis of Columbia as having an executive function without legislative or judicial functions is itself an alliance move. He is not attacking Columbia’s legitimacy as an institution. He is proposing a governance reform that would make the institution more consistent with the proceduralist values of the academic community. The reform he implies, more internal accountability, more transparent decision-making processes, more resistance to consequentialist cover-ups, would strengthen the coalition’s credibility. This is internal criticism in service of institutional improvement, which is exactly what Alliance Theory predicts an internal auditor will do: criticize craft rather than legitimacy.
Where Alliance Theory runs into difficulty with Gelman is on the question of whether his criticism is strategically calibrated or whether it has a more principled consistency that the framework cannot fully account for. The predictions Alliance Theory would make about a pure coalition maintenance actor, be more critical of rivals than allies, protect the core claims even when they are methodologically weak, calibrate the intensity of criticism to the political valence of the target, do not obviously describe Gelman’s behavior. His criticism of IAT research, his skepticism about implicit bias training, his challenges to nutrition science that was politically neutral, his critiques of election forecasting methodologies used by people who share his political commitments: these are harder to explain as pure coalition maintenance than the framework would prefer.
* David Pinsof’s “big misunderstanding” essay cuts directly at the tension the prior analysis left unresolved.
The prior analysis concluded that Gelman’s internal policing of social science is best understood as a coalition maintenance strategy that has become, through habit and intellectual formation, something that functions like a principled commitment. The misunderstanding essay challenges the premise that distinguishes those two things.
Pinsof’s central claim is that intellectuals systematically mislocate the cause of human problems because the misunderstanding narrative makes intellectuals indispensable. If the problem is that people have noisy data and forking paths and underpowered studies, then the statistician who names these things is saving science. If the problem is that researchers want publications, grants, prestige, and the social rewards that come from producing emotionally satisfying results, then naming the methodological failures does not address the cause. The cause is the incentive structure, and the incentive structure does not change because someone described it clearly.
Applied to Gelman, this is precise and uncomfortable. His entire project assumes that the replication crisis is substantially a problem of misunderstanding: researchers do not fully appreciate the garden of forking paths, they do not understand Type S and Type M errors, they have not internalized the implications of low statistical power, they mistake p-values for evidence. If they understood these things clearly, they would do better science. That is the misunderstanding model. His blog, his textbooks, his public criticism of specific papers: all of these are interventions designed to produce better understanding.
Pinsof’s essay says: what if the researchers understand perfectly well what they are doing? What if the scientist who runs twenty analyses and reports the one that reaches significance knows, at some level, that this is what she is doing? What if the lab director who encourages students to squeeze four papers out of a null result dataset understands that this is how careers are made? The garden of forking paths is not hidden from the people walking through it. It is the path they are incentivized to take.
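The twenty-analyses scenario can be put in numbers. The sketch below is an illustration only, and it rests on a simplifying assumption that is not in the essay: the twenty analyses are treated as independent tests of a true null effect, so each p-value is uniform on the unit interval. Real forking paths share data and are correlated, which lowers the rate somewhat without removing the inflation. The function names are hypothetical.

```python
import random


def analytic_false_positive_rate(n_analyses, alpha=0.05):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_analyses


def simulated_false_positive_rate(n_analyses, alpha=0.05,
                                  n_studies=20_000, seed=0):
    """Monte Carlo version of the same quantity.

    Assumption: under a true null each p-value is Uniform(0, 1) and the
    analyses are independent of one another.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_analyses)]
        # The researcher reports only the most favorable analysis.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(round(analytic_false_positive_rate(20), 3))  # 0.642
    print(simulated_false_positive_rate(20))
```

Under these assumptions, a researcher who tries twenty analyses on pure noise and reports the best one finds something "significant" roughly two times in three. Whether she knows this or not, the arithmetic is the same, which is the question Pinsof's reading turns on.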
This reframes Gelman’s contribution. If the problem is motivated behavior rather than cognitive error, then Gelman has not been diagnosing the disease. He has been describing its symptoms. The symptoms are methodological: p-hacking, underpowered studies, researcher degrees of freedom, failure to replicate. The disease is the incentive structure that makes these behaviors adaptive for individual researchers regardless of their collective cost to the credibility of science. Gelman’s corrections are accurate descriptions of the symptoms. They have not changed the incentive structure because nothing in his interventions changes the incentive structure.
The evidence for this reading is in his own reply. He described reform inside academic science over the past decade, more skepticism toward noisy studies, more pre-registration, more open data. But he also described the disintermediation problem: Huberman and the supplement industry have bypassed the credentialing system entirely and can go directly to audiences. The reform happened and the problem got worse. This is exactly what Pinsof’s essay predicts. You can improve the methodological standards of credentialed researchers and it changes nothing about the broader ecosystem of motivated belief production, because the broader ecosystem was never running on misunderstanding. It was running on incentives that credentialing reform does not reach.
There is a deeper application. Pinsof’s essay argues that cognitive biases are not mistakes but savvy strategies. Confirmation bias helps you win arguments. Overconfidence helps you project credibility. The self-serving bias helps you maintain your own motivation in the face of failure. Gelman has spent his career framing these as errors that better statistical education would correct. Pinsof says they are adaptive responses to the selection pressures researchers face. The researcher who properly accounts for uncertainty, who reports wide confidence intervals and acknowledges the limitations of underpowered studies, is doing epistemically better work and is less likely to get published, get grants, get invited to give talks, or build the kind of public profile that converts academic work into career advancement. The misunderstanding model says these researchers need to learn better statistics. The rational actor model says they already know better statistics and are choosing not to apply them because the incentives do not reward it.
This creates a tension in how you evaluate Gelman’s project. His pre-registration advocacy, his open data requirements, his Bayesian workflow, his posterior predictive checks: all of these are institutional design changes rather than educational interventions. They change what behavior is rewarded. To that extent he is not operating within the misunderstanding model at all. He is operating within a model where the incentives need to change, and his advocacy for pre-registration is an attempt to change them. That is the part of his project that Pinsof’s essay would endorse as correctly locating the cause.
But the blog, the textbooks, the public criticism of specific papers: these are educational. They assume that if enough researchers understand Type S and Type M errors, the field will improve. Pinsof’s essay predicts this will have limited effect because the researchers who produce the most egregiously underpowered work are not doing so out of ignorance. They are doing so because it is what the system rewards, and reading Gelman’s blog does not change what the system rewards.
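For readers who have not met the terms, a Type S error is a statistically significant estimate with the wrong sign, and a Type M error is a significant estimate that exaggerates the magnitude of the true effect. The following is a minimal simulation sketch, not Gelman's own code. The true effect of 1.0, the standard error of 2.0, and the function name are hypothetical values chosen only to represent a noisy, underpowered study, and the estimate is assumed to be normally distributed around the true effect.

```python
import random


def simulate_type_s_m(true_effect, std_error, n_sims=100_000,
                      z_crit=1.96, seed=0):
    """Estimate power, Type S rate, and Type M ratio by simulation.

    Assumption: each study's estimate is Normal(true_effect, std_error),
    and a result counts as significant when |estimate| > z_crit * std_error.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(estimate)
    if not significant:
        raise ValueError("no significant results; increase n_sims")

    power = len(significant) / n_sims
    # Type S: share of significant estimates with the wrong sign.
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s = wrong_sign / len(significant)
    # Type M: average overstatement of magnitude among significant results.
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    type_m = mean_abs / abs(true_effect)
    return power, type_s, type_m


if __name__ == "__main__":
    power, type_s, type_m = simulate_type_s_m(true_effect=1.0, std_error=2.0)
    print(f"power={power:.3f}  type_s={type_s:.3f}  type_m={type_m:.2f}")
```

With these hypothetical inputs the significance threshold sits at 3.92, nearly four times the true effect, so every estimate that clears it overstates the effect by at least that factor, and some of them point the wrong way. That is the lesson the educational work tries to teach. It is a lesson about what a significant result means, not about what the system rewards.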
The hardest application is to Gelman himself. Pinsof’s essay asks: what if intellectuals are not saving the world but competing for status through the reliable signal of naming others’ errors? Gelman’s blog has nine million views. His critiques of Cuddy and Wansink made him substantially more prominent than he would have been without them. The Amy Cuddy conflict was covered in the New York Times Magazine. His profile in the replication crisis era rose because the crisis gave his methodological work a dramatic public stage. None of this means his critiques were wrong. They were right. But the misunderstanding essay’s point is that being right and being strategically positioned to benefit from being right are not incompatible, and the intellectual who derives status from exposing others’ errors has a motivated interest in finding errors worth exposing that is independent of whether finding those errors improves science.
What the essay adds to the Gelman analysis is therefore this: a distinction between the part of his project that correctly identifies the cause of the problem, the institutional design work around pre-registration and open data that changes incentives, and the part that operates within the misunderstanding model, the educational and critical work that assumes understanding the errors will reduce them. Pinsof predicts the first will have durable effects and the second will have limited ones. That Gelman himself pointed to methodological reform inside academic science while noting that the broader public credibility problem had gotten worse through disintermediation is consistent with exactly this prediction. The incentive-changing interventions worked within the credentialed system. The understanding-changing interventions did not reach the system that had escaped credentialing entirely, because that system was never running on misunderstanding to begin with.
* David Pinsof’s essay on signaling adds one thing the prior analyses missed, and it is the most personally relevant addition for understanding Gelman specifically.
The defensive versus offensive signaling distinction is the contribution. The prior Alliance Theory analysis treated Gelman’s internal policing as offensive signaling: he attacks methodological failures to raise the relative status of rigorous work and to demonstrate that his coalition has self-correcting capacity. That framing makes him sound more strategic and more status-driven than he probably is. The defensive signaling essay complicates this.
Pinsof argues that most signaling is defensive. People are not trying to climb the hierarchy. They are trying to avoid falling off it. The fear of being seen as the person who let junk science slide, who stayed quiet while Brian Wansink ran his food lab, who said nothing when Amy Cuddy’s power posing claims circulated to millions: that fear is more plausible as Gelman’s primary motivation than the ambition to be recognized as the most rigorous statistician of his generation.
This matches the texture of what he does. He is not building a brand. He is unable to stay quiet when he sees something wrong. The blog’s tone is not triumphalist. It is almost compulsive. He posts corrections, qualifications, responses to responses, follow-ups on follow-ups. The sheer volume and consistency of the output looks more like someone who cannot stop noticing errors than like someone carefully calibrating which errors to attack for maximum status return. A pure offensive signaler would be more selective. He would pick targets that maximize visibility and minimize coalition cost. Gelman repeatedly picks targets where the coalition cost is not trivial, where the work is politically congenial, where the researchers are sympathetic figures. That is not the behavior of someone optimizing for status gain.
The defensive framing also explains the Amy Cuddy conflict better than the Alliance Theory analysis alone. The New York Times Magazine framed Gelman’s criticism as bullying, as a failure of collegiality, as the behavior of someone enjoying the destruction of a colleague’s career. Gelman’s response was that the accuracy of published claims is not a private matter. That is a defensive signal in Pinsof’s sense: it is an attempt to avoid being the person who knew a paper was methodologically weak and said nothing, who participated in the private correction system that keeps errors circulating while protecting reputations. The shame he is defending against is the shame of complicity, not the vanity of superiority.
The essay’s point that defensive signals often hide in darkness is also relevant. Gelman does not present his work as defensive. He frames it as about standards, transparency, and the integrity of science. He does not say I am doing this because I cannot bear to be the person who stayed quiet. He says he is doing it because science is a public activity and published claims require public defense. Both framings are accurate. The defensive motivation is real and mostly invisible, including probably to Gelman himself, while the principled framing is what becomes common knowledge.
The most useful addition is to the disintermediation observation he made in his reply. He noted that academic science reformed at roughly the same moment that the pipeline connecting academic science to public influence collapsed. Pinsof’s defensive signaling frame suggests this is not a coincidence. When the pipeline was intact, the reputational costs of staying quiet about junk science were distributed across many institutions: journals, NPR, TED, book publishers, university press offices. The defensive pressure to maintain standards was spread thin because the consequences of any individual failure were partially absorbed by the system. When the pipeline collapsed and Huberman and the supplement industry went direct, the credentialed system could no longer pretend that its standards were doing the work of protecting public knowledge. The defensive pressure concentrated. Researchers who had looked the other way at noisy studies while the system maintained its authority could no longer afford to do so once that authority was visibly eroding.
Gelman had been making the defensive signal before the crisis concentrated it. That is why his position shifted from annoying internal critic to indispensable auditor. The crisis did not change what he was doing. It changed what the coalition needed from him. The defensive signal he had been sending at personal social cost became, in the context of the replication crisis, exactly what the coalition required to maintain any credibility at all.
What the essay adds in sum is this: Gelman is better understood as someone who is constitutionally unable to perform the complicity that academic life normally requires than as someone who has strategically positioned himself as an internal critic for status gain. The distinction matters because it changes the prediction about his durability. A strategic offensive signaler will adjust his behavior when the status rewards change. Someone running on defensive motivation will keep doing what he does regardless of the reward structure, because what he is defending against is a form of shame that does not go away when the status game shifts. That is why the prior analysis was right that the behavior looks more like a principled commitment than a calculated position. Pinsof’s defensive signaling essay gives the mechanism: the commitment is real because it is not about gaining status but about not being the person who knew and said nothing.
* The David Pinsof charisma essay’s central claim is that charismatic people are the gold medalists in social paradoxes. They gain status by not caring about status. They are authentic because that is what society wants them to be. They manipulate without being manipulative. The signal is buried so completely that neither the signaler nor the recipient perceives it. When you are with someone charismatic, you have no sense they are trying to impress you. They are just a pure bright ball of shimmering authenticity.
Gelman is conspicuously not this.
His blog is awkward in the best sense. He admits uncertainty, reverses positions publicly, posts corrections to his own corrections, engages with hostile commenters at length, and often says things that make him look pedantic or difficult. He is the opposite of the smooth social operator who conceals his strategies. He is the person whose strategies are almost always visible. He tells you exactly what he is doing and why. There is no performance of effortless mastery. There is just a man with chalk who keeps noticing things wrong.
This is where the charisma essay adds something non-obvious. Pinsof distinguishes between people who are good at social paradoxes and people who are bad at them. People who are bad at social paradoxes come across as cringe, pretentious, thirsty, awkward. Their signals are too obvious. They try too hard. They interpret the values of their culture too literally and pursue them too monomaniacally. Pinsof gives the example of the effective altruist who raises money for shrimp welfare instead of running a cancer marathon like a cool person. Someone who has missed the point of what the social game requires even while sincerely trying to play by its stated rules.
Gelman is a version of this but with a crucial difference. He is bad at social paradoxes in a way that has become its own form of status, which is itself a higher-order social paradox. He is the person who cares visibly and intensely about methodological rigor, who does not bother to conceal the caring, who will spend a week arguing in blog comments about a minor statistical point with someone nobody has heard of. Normally this behavior signals low status: only someone who is insecure about their position argues that hard about small things. But Gelman has been doing it so consistently and so publicly, and has been right often enough about things that mattered, that the visible caring has become a valid cue of something valuable: that he means what he says about standards.
This connects to Pinsof’s point about symbiotic deception. The essay argues that deception can be mutually beneficial: the deceiver gains status and the deceived gain a reliable partner whose apparent social competence is a valid cue of underlying quality. Gelman’s case is the inverse. His apparent social incompetence, his refusal to play the concealment game, is a valid cue of epistemic reliability. You trust him because he does not seem to be managing your impression of him. The absence of the usual concealment signals that the usual concealment is absent. Whether this is itself a sophisticated form of higher-order social performance, the person who is so good at social paradoxes that he achieves authenticity through its complete abandonment, is unanswerable. Pinsof’s essay does not give you the tools to resolve it.
The charisma essay’s section on status game collapse is the most directly applicable piece. Pinsof argues that when players of a status game gain common knowledge that they are playing a status game, the game collapses and inverts. The winners look conniving and entitled. The losers look humble and modest. In the aftermath, you gain status by doing the opposite of what was done before. Baroque complexity gives way to modernist simplicity. Conspicuous consumption gives way to inconspicuous consumption. Ornate displays of erudition give way to plain speaking.
The replication crisis was exactly this kind of status game collapse for academic social science. The players who had won by producing emotionally resonant, policy-relevant, TED-talk-ready findings suddenly looked conniving and entitled. Their methodology was exposed as a status game. The people who had been arguing for boring, rigorous, uncertain science, who had been losing the status competition because their work was not compelling enough for NPR, suddenly looked humble and modest and honest.
Gelman was positioned to benefit from exactly this inversion because he had spent years doing the opposite of what was winning. He had been insisting on uncertainty when the market rewarded confidence. He had been demanding replication when the market rewarded novelty. He had been reporting wide intervals when the market rewarded clean significant results. When the status game collapsed and inverted, his accumulated record of doing the opposite of what the game rewarded became a large asset. He had not accumulated it strategically in anticipation of the collapse. He had accumulated it because he could not do otherwise. But the collapse rewarded it as if he had planned it, which is the kind of outcome that social paradox theory predicts: the person who succeeds most in the aftermath of a status game collapse is often the one who was least invested in the original game.
The charisma essay adds one final piece that connects back to the misunderstanding essay. Pinsof says charismatic people have tools for becoming cult leaders. They can manipulate without being manipulative, defend their reputations without getting defensive. Gelman has no such tools. His defenses are completely visible. When the New York Times Magazine piece framed his criticism of Amy Cuddy as bullying, he responded by explaining at length why he thought the framing was wrong and why public criticism of published science was appropriate. He did not perform equanimity. He defended himself while visibly caring about defending himself. This is the anti-charismatic move. It is also, Pinsof’s essay implies, probably the honest one, because the person who can defend their reputation without appearing defensive is performing a concealment that Gelman is constitutionally unable to perform.
What the charisma essay adds in total is a way to see that Gelman’s apparent social incompetence within academic culture, his refusal to play the concealment games that the culture normally requires, is neither a strategic choice nor a failure. It is a character trait that became an epistemic credential at the moment the culture’s status game collapsed and inverted. He did not engineer this outcome. He benefited from it because the thing he had been doing all along happened to be exactly what the collapsed status game needed someone to have been doing. That is the social paradox Pinsof would identify in Gelman’s career: the person who refused to play the game won it because he refused, and now the refusal itself is the most valuable signal he can send, which means he is, despite everything, playing the game, at one more level of recursion than he would recognize or acknowledge.
* The core argument of David Pinsof’s social paradoxes paper is that social paradoxes emerge from the interaction of two cognitive systems, recursive mindreading and cue-based inference.
Gelman’s sacred value is methodological rigor and the integrity of the scientific record. It is well-designed in Pinsof’s sense: it is conceptually distant from status, it appears disconnected from self-interest since Gelman incurs real social costs to defend it, and it provides cover for behaviors that would otherwise be recognizable as status competition. Calling out a prominent researcher’s methodological failures is a dominance move. Under the sacred value of scientific integrity, it becomes a defense of the commons.
The paper’s point that sacred values should awkwardly track real status acquisition is where the analysis gets most precise. Pinsof predicts that wherever the sacred value appears, competition for superiority should follow closely behind, and that pursuit of the sacred ideal should, beneath appearances, be indistinguishable from the pursuit of social rewards. The evidence for this in Gelman’s case is not that he is cynical or dishonest. It is structural. The cases where he is most prominently and publicly critical are the cases where the target is high-profile enough that the criticism generates substantial attention. He criticizes many things on his blog, but the cases that become events, that generate press coverage and symposia and responses, are the ones where the target is prominent. This is not obviously strategic selection on his part. But the outcome tracks the prediction: the sacred value criticism generates the most status return when it is aimed at the most prominent targets.
The paper’s discussion of status game collapse and inversion is also relevant here in a way the charisma essay’s treatment only sketched. The academic paper provides the mechanism: when common knowledge sets in that a status game is a status game, the hierarchy inverts. Players who accumulated rank through the old signals now look conniving. People who did the opposite now look humble. Gelman’s position in the post-replication-crisis landscape is exactly this. He accumulated a record of doing what was low-status before the collapse, and the collapse made that record into a credential. The paper explains why this happens mechanically: the negative cues attached to the old status signals transformed into positive signals for their opposites, and Gelman had been producing the opposites consistently enough that he had a large stock of them available when the inversion occurred.
The most precise application is to Gelman’s disintermediation observation. He described how academic science has reformed its internal standards while losing its relevance to public knowledge because Huberman and the supplement industry bypassed the credentialing system. The social paradoxes paper explains this as a status game that has partially collapsed inside academic science while remaining intact outside it. Inside the credentialed system, common knowledge of the replication crisis has transformed the old signals: claiming surprising significant results from small samples is now a negative cue. Outside the credentialed system, the old signals are still working fine because the audience has not gained common knowledge that they are status signals. Huberman’s audience does not know they are watching a status game. The academic audience does know, which is why the game has partially collapsed there.
Gelman’s reforms addressed the collapse inside the credentialed system. They cannot address the intact game outside it because the mechanism that would collapse that game, the audience gaining common knowledge that the signals are status signals, has not occurred and Gelman has no means to produce it. The people watching Huberman are not reading Gelman’s blog. The sacred value of scientific integrity does not stabilize the external status game because it is not the sacred value the external game is running on.
What the social paradoxes paper adds to the Gelman analysis in total is a mechanistic account of what the other frameworks described only structurally. Alliance Theory explained why Gelman’s coalition needs internal policing. The defensive signaling essay explained why his motivation is more about avoiding shame than gaining status. The misunderstanding essay explained why his educational interventions have limited reach. The charisma essay explained why his anti-charismatic style became an asset after the status game inverted. The social paradoxes paper explains the sequence of transformations that produced all of these outcomes: the original valid cue becoming a signal, the honest signal generating negative inferences, the attempted transformation of his criticism into a cue of bad character, the sacred value stabilizing the internal status game while leaving the external one untouched, and the inversion that made his accumulated record of counter-signaling into the most valuable credential available in the post-crisis landscape. The mechanism is the cue-signal instability produced by recursive mindreading operating on a population of researchers who are all trying to anticipate how their behavior will be read, which is exactly what Gelman’s garden of forking paths concept describes at the individual level. His diagnosis of the crisis and the crisis of his diagnosis are generated by the same underlying process.
* The central claim of David Pinsof’s essay “Why Things Go to Shit” is that everything goes to shit unless there is an incentive for it not to. Applied to Gelman, this reframes the entire replication crisis and his response to it more cleanly than any of the prior frameworks managed.
The prior analysis established that Gelman’s internal policing serves a coalition maintenance function, that his motivation is defensive, that his educational interventions have limited reach because the problem is incentives not misunderstanding, and that his institutional design work around pre-registration and open data is the part of his project that changes behavior. The Why Things Go to Shit essay provides the single underlying principle that explains why all of these observations are true.
Academic social science went to shit because there was no strong incentive for it not to. The incentive structure rewarded publication, not replication. It rewarded significant results, not accurate ones. It rewarded narrative clarity, not honest uncertainty. It rewarded novelty, not rigor. Under these conditions, the garden of forking paths was not a bug or a failure of understanding. It was the rational response to the incentives. Gelman has been saying this for twenty years. Pinsof’s essay gives him the most parsimonious possible statement of the argument: the literature went to shit because there was no incentive for it not to.
This is where the essay adds something the prior analyses missed. The misunderstanding essay established that Gelman’s educational interventions have limited effect because people understand what they are doing. The Why Things Go to Shit essay explains why this is a general law. There is no incentive for beliefs to be accurate beyond the domain of direct sensory experience and practical decisions, unless an incentive structure guides them toward truth. The prestige economy surrounding scientific research is supposed to be that incentive structure. Gelman’s entire career has been an argument that the prestige economy is not aligned with truth-tracking in the way it claims to be, and that the misalignment is not accidental but structural. The essay’s law predicts this outcome directly: the prestige economy will go to shit to the extent that it lacks incentives for accuracy, and it went to shit in exactly the ways and at exactly the rate that the incentive misalignment predicted.
The essay also clarifies why Gelman’s reform proposals are the most important part of his project. Pre-registration, open data, registered reports, post-publication review: these are not educational interventions. They are incentive structure changes. Pre-registration removes the incentive to explore data until significance appears because the exploration is now on record. Open data removes the incentive to hide inconsistencies because the data is now public. Registered reports remove the incentive to run studies and report only the significant ones because the journal commits to publish based on the design before seeing results. Each of these changes the incentive structure. The essay predicts these will work where educational interventions will not, because the law operates through incentives, not through knowledge.
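The difference between a journal that publishes only significant results and one that commits to publish regardless of outcome can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not a reconstruction of any analysis Gelman published: the true effect, the standard error, and the number of studies are hypothetical values chosen for illustration, and each study's estimate is drawn directly from a normal sampling distribution rather than from simulated raw data. The exaggeration ratio and the wrong-sign rate correspond to what Gelman calls Type M and Type S errors.

```python
import random
import statistics


def simulate_estimates(true_effect, std_error, n_studies, seed=0):
    """Draw one effect estimate per study from a normal sampling distribution."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, std_error) for _ in range(n_studies)]


def summarize(estimates, true_effect, std_error, z_crit=1.96):
    """Compare publishing every study with publishing only significant ones."""
    # A study counts as "significant" if its estimate is more than
    # z_crit standard errors away from zero (two-sided test).
    significant = [e for e in estimates if abs(e) > z_crit * std_error]
    if not significant:
        raise ValueError("no significant estimates; increase n_studies")
    # Significant estimates whose sign is opposite to the true effect.
    wrong_sign = [e for e in significant if e * true_effect < 0]
    magnitudes = [abs(e) for e in significant]
    return {
        # Fraction of studies that clear the significance filter.
        "share_significant": len(significant) / len(estimates),
        # What a publish-everything literature would report on average.
        "mean_all": statistics.fmean(estimates),
        # Type M: how much the filtered literature overstates the effect.
        "exaggeration": statistics.fmean(magnitudes) / abs(true_effect),
        # Type S: share of filtered results pointing the wrong way.
        "type_s_rate": len(wrong_sign) / len(significant),
    }


if __name__ == "__main__":
    # Hypothetical values: a small true effect measured with a noisy design.
    TRUE_EFFECT = 0.1
    STD_ERROR = 0.2236  # roughly 1 / sqrt(20)
    estimates = simulate_estimates(TRUE_EFFECT, STD_ERROR, n_studies=5000, seed=1)
    for name, value in summarize(estimates, TRUE_EFFECT, STD_ERROR).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the average across all studies stays close to the true effect, while the significance-filtered subset overstates it several times over, because an estimate has to be large to clear the threshold at all. The filter, not any individual researcher's misunderstanding, produces the inflation, which is why changing what gets published changes what the literature says.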
The disintermediation observation Gelman made in his reply is the most directly illuminating application. He described how academic science reformed its internal standards while losing relevance because Huberman and the supplement industry bypassed the credentialing system. The Why Things Go to Shit law explains this. Inside the credentialed system, the replication crisis created a new incentive: the reputational cost of being associated with non-replicating work became high enough to change behavior. Outside the credentialed system, no equivalent incentive exists. Huberman has every incentive to produce confident health claims and no incentive to produce accurate ones, because his audience cannot evaluate accuracy and rewards confidence. The academic reform worked where the incentive changed. The broader problem got worse where it did not.
The essay also adds something to understanding why Gelman keeps doing what he does despite the limited effect of his educational interventions. He is not trying to change researchers through understanding. He is trying to change the incentive structure. The blog functions as a reputation tax on bad methodology: knowing that Gelman might write about your paper creates a small but real incentive to be more careful. The public criticism of Wansink and Cuddy was not aimed at those individuals. It was aimed at anyone watching who understood that public methodological failure now had costs. The blog is an incentive structure intervention disguised as educational content.
The deepest application is to Gelman’s sacred value of scientific integrity. The essay predicts that sacred values, like every other human institution, will go to shit unless there is an incentive for them not to. The history of science is a history of repeated episodes where the prestige economy captured the truth-tracking function and redirected it toward social reward. The mid-century physics prestige economy was more aligned with truth-tracking than the mid-century social psychology prestige economy because physics had clearer feedback mechanisms: predictions either matched experimental results or they did not. Social psychology had no equivalent feedback because the phenomena it studied were too noisy and too distant from direct practical application for failures to be visible quickly. The sacred value of scientific integrity was better maintained where the incentive structure supported it and went to shit where it did not.
Gelman understands this, which is why his reform proposals are all incentive changes. He is not asking researchers to care more about truth. He is asking journals to create structures that make caring about truth adaptive. The essay’s law predicts this is the only intervention that will have durable effects. Everything else (the blog posts, the textbooks, the public criticism, the conference talks about the replication crisis) has only second-order effects: it shifts the reputational incentives at the margin without changing the underlying structure. These efforts matter because reputational incentives are real incentives, and shifting them at the margin is better than not shifting them. But they will not solve the problem because the structural incentives of publication, grant funding, and career advancement have not changed enough.
What the essay adds in total is this: Gelman has been fighting entropy. The academic literature was moving toward disorder because that is what systems without aligned incentives do. His career has been a sustained attempt to introduce enough friction into that process to slow it down, partly through reputational incentives that the blog and public criticism create, and partly through institutional design changes that alter the underlying structure. The essay’s law predicts that the reputational interventions will be partially effective while they remain novel and costly, and will become less effective as they become routine and expected. The institutional design changes will be more durable because they alter the incentive structure directly. The disintermediation problem will not be solved by either because it operates in an ecosystem where neither reputational costs nor institutional design changes have reached. That ecosystem will continue producing junk science because there is no incentive for it not to, and Gelman has no mechanism to introduce one.
The final addition is the most uncomfortable. The essay predicts that Gelman’s project itself, the effort to maintain methodological standards through a combination of public criticism and institutional reform, will tend to go to shit unless there is a strong incentive for it not to. The incentive that has sustained it so far is the replication crisis, which made methodological rigor reputationally valuable in a way it had not been before. If that incentive fades, if the crisis recedes from memory and the prestige economy realigns around new forms of impressive-sounding research, the reforms will erode and the literature will drift back toward the conditions that produced the crisis. Gelman has been building incentive structures to prevent this. The essay’s law says the incentive structures will themselves go to shit unless there are incentives for them not to, which is the problem of institutional maintenance that every reform movement eventually faces and that no amount of methodological sophistication resolves.
* David Pinsof’s vague bullshit essay argues that vagueness functions as a coalition filter. The people who get your vague statement are demonstrating similarity, closeness, attention, and respect. They are the ideal alliance partners. The vagueness selects for them by excluding everyone else.
Applied to Gelman, this illuminates his project from an unexpected angle. His entire career has been a systematic attack on vagueness in scientific claims. The garden of forking paths, Type S and Type M errors, posterior predictive checks, the demand that claims be specific enough to be falsified: all of these are tools for reducing the vagueness of scientific assertions. He wants claims that have determinate enough content that you can tell whether the data supports them or not. He is, in Pinsof’s technical sense, an anti-vagueness crusader operating inside a field that has systematically benefited from strategic vagueness.
The power posing literature is the clearest example. The original claim was vague in Pinsof’s precise sense: when Amy Cuddy said power poses increase testosterone, decrease cortisol, and improve risk tolerance, the claim had enough interpretive flexibility that defenders could always retreat to a more defensible version when the original was challenged. When the hormonal effects failed to replicate, the claim shifted to felt power. When felt power was challenged, it shifted to something more subjective still. Each retreat maintained plausible deniability by exploiting the vagueness of what the original claim was asserting. This is exactly the coalition technology Pinsof describes. The vague claim unites a community around a sacred value, in this case the claim that embodied psychology produces measurable behavioral effects, while remaining slippery enough that no single piece of contrary evidence can definitively refute it. Gelman’s insistence on specificity was an attack on the vagueness that made the coalition technology function. If you specify the claim precisely enough that it is falsifiable, you destroy its ability to serve as a rallying point for a community that needs flexibility to maintain cohesion in the face of inconvenient evidence.
This reframes the Cuddy conflict. It was not a dispute about statistical power or sample size or researcher degrees of freedom, though it was also those things. It was a dispute about whether scientific claims in social psychology should be specific enough to be falsifiable or vague enough to survive negative evidence. Gelman was insisting on the former. The community organized around power posing was operating on the latter. The New York Times Magazine piece that framed his criticism as bullying was doing the work Pinsof’s essay predicts: it was protecting a sacred value by making the attack on vagueness legible as a character flaw rather than as a methodological standard. The community’s defense was not to produce more precise claims. It was to reframe the demand for precision as an act of aggression.
Pinsof argues that pondering vague bullshit allows people to show off their interpretive acumen, to demonstrate that they can extract meaning from chaos. Implicit association test research, power posing, ego depletion, stereotype threat: all of these are vague enough that understanding them requires enough interpretive work to feel like intellectual accomplishment. The practitioner who deploys these frameworks in applied contexts, the diversity trainer who uses implicit bias research to structure an intervention, the executive coach who incorporates power posing into her program, the therapist who applies ego depletion findings to her clients, is demonstrating interpretive sophistication by applying findings whose vagueness would otherwise make them unusable. The vagueness is the feature, not the bug. It requires expertise to deploy, which makes deployment a status signal. Gelman’s insistence that the findings are too vague to support any determinate applied claims does not just challenge the research. It challenges the status structure built on the interpretive labor of applying that research.
Pinsof argues that gobbling up vague bullshit can be a display of fealty to allies or submission to leaders. The more credulous you are of someone’s vague claims, the more you must trust them. This maps directly onto Gelman’s observation that the pipeline from research through academic middlemen to public influence was a trust network as much as an evidence network. When NPR covered Brian Wansink’s food behavior research, or when TED promoted power posing, the credulous reception of those findings was partly a display of fealty to the scientific establishment whose authority the coverage was endorsing. Audiences trusted the findings because they trusted the institutions, not because they had evaluated the evidence. Gelman’s public criticism attacked not just the findings but the fealty structure: it said that the institutions whose authority was being invoked to lend credibility to the findings were themselves failing to maintain the standards that would make that authority legitimate. This is why the criticism was received as threatening even by people who had no personal stake in power posing or food behavior research. It challenged the fealty structure that sustained the entire pipeline.
Pinsof describes how flimflam artists use vague information to create the illusion that they know you better than your best friend, producing an addictive feeling of being understood in an atomized world. The junk social science pipeline created a version of this for educated audiences. Gladwell’s books, TED talks, NPR’s social science coverage: all of these produced the feeling that science was explaining you to yourself, revealing hidden mechanisms underlying your behavior that you had never quite been able to articulate. The feeling of finally understanding why you make the decisions you make, why you are susceptible to certain influences, why your body responds to posture the way it apparently does, was intoxicating because it was vague enough to seem like it applied to you personally while being general enough to apply to everyone. Gelman’s destruction of these findings did not just remove the knowledge claims. It removed the feeling of being understood, which is why the cultural resistance to his criticism has been so persistent even among people who are technically sophisticated enough to evaluate his statistical arguments.
The final addition is to the disintermediation observation Gelman made in his reply. He described how Huberman and the supplement industry have bypassed the credentialing system and gone directly to audiences. The vague bullshit essay explains why this bypass is so effective. Huberman’s claims are vague in exactly the right ways: specific enough to sound scientific, vague enough to survive negative evidence, wrapped in the language of mechanism and optimization that signals insider knowledge, and delivered with enough personal authority that listeners experience the fealty-display function of believing him. His audience is not evaluating his claims. They are demonstrating alliance membership and interpretive sophistication by engaging with his framework at all. Gelman’s reforms inside the credentialed system, pre-registration, open data, registered reports, addressed the vagueness problem within the community that had agreed to play the specificity game. Huberman’s audience never agreed to play that game. They are playing a different game entirely, one where vagueness is a feature, and where Gelman’s demand for precision is an uninvited disruption of a coalition technology that is working exactly as intended.
Gelman’s project is a sustained assault on the coalition function of vagueness in scientific claims. His insistence on specificity, falsifiability, and the precise quantification of uncertainty is not just a methodological preference. It is an attack on the mechanism by which scientific communities maintain cohesion, signal alliance membership, and stabilize their status games in the face of negative evidence. The resistance to his project is not epistemic. It is social. The people defending vague social science claims are defending the coalition technology those claims serve, and they experience the attack on vagueness as an attack on the community itself, which is exactly what it is in Pinsof’s functional sense. The disintermediation problem is the ultimate expression of this: when the community that had agreed to play the specificity game lost its monopoly on public credibility, the vague bullshit function migrated to platforms where no one had agreed to play by those rules, and Gelman has no mechanism to follow it there.
* Does Gelman’s story about the replication crisis make evolutionary sense?
The story Gelman tells is roughly this. Researchers produce inflated, non-replicating findings because the incentive structure rewards publication over accuracy, novelty over rigor, and significant results over honest uncertainty. This is a structural account. The implication is that if you change the incentive structure through pre-registration, open data, and registered reports, you change the behavior. The story makes partial evolutionary sense: yes, organisms respond to incentives, and yes, changing incentives changes behavior at the margin.
But the story misses something the formula surfaces. The researchers who produce junk social science are not responding to a misaligned incentive structure that was imposed on them from outside. They are social primates who evolved to seek status, to maintain coalition membership, to tell stories about their work that make it seem more important than it is, to self-deceive in ways that make their self-promotion more convincing, and to resist challenges to their status claims with social responses. These are not artifacts of the academic incentive structure. They are what Gelman is working against at the species level. The incentive structure is the contemporary scaffolding that gives these tendencies their current form. Pre-registration changes the scaffolding.
* David Pinsof’s bullshit advice essay reframes the Gelman blog as an advisory project. Pinsof argues that very few people have a meaningful stake in your success. Gelman’s stake in the success of the researchers he criticizes on his blog is essentially zero. He is not their advisor. He is not their colleague in any functional sense. He is not someone whose career depends on theirs going well. The blog addresses thousands of researchers across dozens of fields whose specific situations he does not know and whose success he has no incentive to promote.
This means the blog’s primary social function, by Pinsof’s analysis, is grooming. It establishes Gelman as someone who sees more clearly than the researchers he criticizes, which is a status claim. It signals coalition membership with readers who already believe the replication crisis reflects methodological failure, which is the audience the blog has assembled. It provides rationalization for positions those readers already hold about the unreliability of social psychology, nutrition science, and management research. The readers who find the blog most valuable are the ones who already agree with its diagnoses, which is the grooming audience rather than the audience that most needs the advice.
Gelman’s criticism of work that is politically congenial to his coalition, implicit association test research, stereotype threat, ego depletion, is the most status-costly advice he gives. It is also the advice that most clearly signals to his core audience that the coalition membership he is offering is based on methodological standards. That signal is more valuable to his audience than any specific methodological correction, because it demonstrates that the coalition he represents prioritizes accuracy over comfort. But in Pinsof’s framework, demonstrating that your coalition prioritizes accuracy over comfort is itself a coalition signal, which means even Gelman’s most politically costly criticism is functioning as grooming for the audience that values watching him incur that cost.
Pinsof argues that advice often legitimizes whatever the recipient wanted to do anyway, which is why vague advice is more popular than specific advice. Gelman’s advice is unusually specific: pre-register your studies, share your data, report effect sizes with confidence intervals, use posterior predictive checks. This specificity distinguishes it from most bullshit advice. But the blog’s broader message, that the replication crisis reflects structural failure, functions as rationalization for a wide range of readers who want confirmation that their skepticism about prominent social science findings is justified. A reader who already distrusted power posing reads Gelman’s analysis as confirming that distrust. A reader who already believed nutrition science was unreliable reads it as confirming that belief. A reader who already thought priming effects were too good to be true reads it as confirming that intuition. The specific methodological content is real and accurate. But the function is rationalization of prior commitments, which is the grooming function.
The most uncomfortable application is to the Amy Cuddy conflict specifically. Pinsof’s essay argues that giving advice when you are equal or lower status than the recipient feels like status theft to them, regardless of whether the advice is objectively helpful. Cuddy was a prominent Harvard professor with a TED talk viewed by millions. Gelman was a statistician at Columbia whose work was respected inside a specialized methodological community but not widely known outside it. His criticism of her work was received as status theft not because it was wrong but because the social logic of advice giving, which says the advisor must be higher status than the recipient to have the right to advise, was violated. The New York Times Magazine piece that framed his criticism as bullying was articulating exactly this: he had stolen status by advising down rather than up, and the social penalty for status theft is the same regardless of whether the advice is correct.
* Pinsof’s arguing essay adds the precise account of why Gelman’s critics respond the way they do.
The Amy Cuddy conflict is the clearest case. Gelman’s criticism of her work was methodologically precise: specific claims about sample size, researcher degrees of freedom, the garden of forking paths, Type M and Type S errors. It was, in Pinsof’s terms, as close to a real argument as academic criticism gets. It cited specific evidence. It made falsifiable claims. It addressed positions Cuddy held rather than straw-manned versions. It was not tribal chanting.
The response was a pseudoargument. The New York Times Magazine piece did not engage the methodological content. It argued that Gelman was cruel, that he had not contacted Cuddy privately, that his criticism was bullying. These are the diagnostic markers Pinsof lists: arguing against positions the person does not hold, interpreting behavior in the worst possible light, focusing on the relative status of people rather than the truth of propositions, deflecting from the dispute. Susan Fiske’s “methodological terrorist” framing was the purest expression: it reframed a real methodological argument as a tribal attack requiring tribal defense.
The dispute about power posing was not an intellectual disagreement about statistical methodology. It was a dispute that touched the tribal identity and institutional status of the social psychology coalition. That coalition’s response to an attack on one of its high-status members was not to engage the methodological argument. It was to punish the attacker for deviating from the coalition norm that members defend each other against outside criticism. The punishment took the form of the methodological terrorist framing, which is exactly what Pinsof’s essay predicts: the coalition creates common knowledge that attacking our members is unacceptable, that doing so marks you as aggressive and dangerous rather than epistemically rigorous, and that the social penalty for the attack is high enough to deter future attacks.
What Gelman was doing, in Pinsof’s framework, was bringing concrete practical rationality into a domain where the intergroup dominance game was being played as the persuasion game. Pinsof explicitly identifies this as the behavior of autistic-adjacent people who are too socially unintelligent to recognize that the game being played is not the one it appears to be. This is not an insult. It is a precise description of the mismatch. Gelman was playing the real argument game in a context where everyone else was playing the pseudoargument game. His frustration that the methodological content was not engaged, his insistence that science is a public activity and published claims require public defense, his refusal to understand why private correction would have been more appropriate: all of these are the responses of someone who did not recognize that he had entered a pseudoargument rather than a real argument, or who recognized it and refused to adapt.
Gelman described how Huberman and the supplement industry have bypassed the credentialing system. Pinsof’s arguing essay explains why this bypass is so effective at the level of argument structure. Huberman’s audience is not engaged in real arguments about health claims. They are engaged in tribal chanting and status maintenance around a specific identity: the person who takes their health seriously, who does the research, who is not fooled by mainstream medicine. Gelman’s methodological criticism of that ecosystem is a real argument entering a pseudoargument structure. The response is not to engage the evidence. The response is to treat the criticism as a tribal attack, to dismiss the critic as captured by pharmaceutical interests or academic elites, and to reinforce the coalition’s common knowledge that outsiders who challenge the tribe’s claims are enemies rather than evidence-bearers.
* David Pinsof’s incentives are everything essay lists three conditions for changing the world with words. Gelman has spent his career trying to change the world with words, and the three conditions explain with unusual clarity why his project has partially succeeded inside the credentialed system and completely failed outside it.
Gelman had something new and important to say that no one else was saying with his combination of technical precision and public reach. The garden of forking paths, Type S and Type M errors, the time-reversal heuristic: these were new analytical tools that gave the community language for problems it had been experiencing without being able to name. The first condition is not in question.
Inside the credentialed system, the replication crisis created exactly the incentive to listen that the second condition requires. When enough high-profile studies failed to replicate, the institutional cost of being associated with non-replicating work rose high enough that listening to Gelman became adaptive. The second condition was created by the crisis itself, not by Gelman. He was positioned to benefit from it because he had been producing the right words before the incentive to listen existed. Outside the credentialed system, the second condition was never created. Huberman’s audience faces no institutional cost associated with non-replicating health claims. The incentive to listen to Gelman’s methodological criticism of supplement industry science does not exist in that ecosystem and nothing Gelman does creates it.
The social psychology community twisted his criticism of Cuddy and Wansink into evidence of bullying and methodological terrorism. This served their incentive to protect coalition members from outside status attacks. The methodological reform community twisted his work into a cudgel for dismissing entire fields of research. This served their incentive to signal epistemological superiority over the unreformed mainstream. The open science advocates twisted his Bayesian workflow into a brand marker for the epistemically serious coalition rather than a practical tool for improving inference. This served their incentive to differentiate themselves from frequentist researchers regardless of whether the specific application warranted the distinction. In each case the twisting was not malicious. It was incentive-driven, exactly as the essay predicts. The words meant something specific to Gelman. To each receiving community, they meant whatever that community had an incentive for them to mean.
* David Pinsof’s essay on opinions argues that opinions are preferences combined with positive judgments about people who share them and negative judgments about people who do not, deployed in a secret war over social norms. The opinion game conceals that the player is trying to make their preferences look superior while doing exactly that.
Gelman’s methodological commitments are opinions in Pinsof’s precise technical sense, and his blog is an opinion game conducted under the cover of epistemic standards.
The prior analysis established that his methodological commitments are genuine, that the technical arguments for Bayesian workflow and rigorous uncertainty quantification are sound, that his criticism of junk social science is accurate. None of that is in question. But the opinions essay adds the precise account of what else is happening.
His preference for Bayesian methods over frequentist ones, for multilevel modeling over simpler approaches, for posterior predictive checks over p-values, comes bundled with positive judgments about the people who share those preferences. They are epistemically serious. They are honest about uncertainty. They do not chase significance. They understand what inference requires. And it comes bundled with negative judgments about people who do not share those preferences. They are sloppy. They are motivated by status. They mistake noise for signal. They have not thought carefully enough about what their methods can and cannot establish. These judgments are embedded in every blog post, every paper review, every public criticism. The covert insult structure Pinsof identifies is present in every assessment Gelman makes of underpowered research: the researcher whose study fails his methodological standards is not merely wrong. She is the wrong kind of scientist.
The opinion game framing adds something the prior analysis did not quite reach. Gelman’s blog has been enormously successful at shifting norms inside the methodologically serious quantitative community. Pre-registration, open data, effect size reporting, honest uncertainty quantification: these have moved from Gelman’s preferences to widespread norms in large parts of social science. That is a won opinion game in Pinsof’s precise sense. The preferences have been successfully externalized as objective requirements of good science. The positive judgments about researchers who pre-register and share data have become naturalized as the obvious marks of serious science. The negative judgments about researchers who do neither have become naturalized as the obvious marks of sloppiness or worse. The norm has been established. The opinion game has been won in the domain where Gelman had enough status and the crisis provided enough leverage to win it.
* David Pinsof wrote that our fear of mortality is bullshit. Gelman’s career narrative has a mortality structure.
The narrative Gelman’s project implicitly tells is this. Science is a fragile and precious enterprise. The replication crisis threatened to destroy public trust in the capacity of empirical research to produce reliable knowledge about the world. Gelman’s methodological criticism, his pre-registration advocacy, his public exposure of junk science: these were defenses of something larger and more permanent than any individual career, namely the integrity of the scientific record, the capacity of human inquiry to correct itself, and the long-run project of understanding how the world works.
* David Pinsof’s status is weird essay traces the collapse and re-emergence of status games in antithetical forms. When common knowledge sets in that a game is a game, the game collapses. Counter-elites invent an anti-status game taking the opposite form. The anti-status game is just another status game, now played in the dark again. And the person most positioned to benefit from a collapsing status game is the one who had been doing the opposite of what the collapsing game rewarded, not through strategic calculation but through conviction that the opposite was right.
The pre-replication-crisis social psychology status game rewarded impressive significant results from small samples, emotionally resonant findings, policy-relevant claims, TED-ready narratives. Gelman had been doing the opposite: reporting uncertainty honestly, questioning significant results, demanding replication, insisting on methodological rigor at the cost of narrative clarity. He was losing the status game that the pre-crisis field was playing. The crisis collapsed that game. The players who had accumulated status through the old signals suddenly looked conniving and entitled. Gelman’s accumulated record of doing the opposite became the most valuable credential available in the post-crisis landscape. He had been playing the anti-status game before the collapse made it the winning game.
The status is weird essay adds the precise account of what came next that the prior analysis did not quite reach. The anti-status game Gelman won is now itself a status game, and it is now being played in the dark by a community of methodologically rigorous researchers who have internalized pre-registration, open data, and honest uncertainty reporting as sacred values rather than as strategic positions. That community is now defending its status game with the same sincere appeals to sacred values that the power posing community used to defend theirs. The sacred value is methodological rigor. The conviction is genuine. The game is fragile: it requires the players to lack awareness that they are playing a status game organized around methodological rigor rather than seeking truth through rigorous methods.
Pinsof writes that the quest to improve the world through thinking hard and seeing through bullshit is itself a sacred value, a covert status game that he and his readers are playing because they think they stand a good chance of winning it. And maybe that is not such a bad thing.
Gelman’s project is the most explicit available version of the quest to improve the world through thinking hard and seeing through bullshit. His blog is called Statistical Modeling, Causal Inference, and Social Science. Its operational content is the sustained exposure of bullshit in social science through rigorous statistical thinking. He is the most prominent available player of the anti-bullshit status game applied to quantitative research. And the status is weird essay predicts that this game, like every status game, is fragile, requires players to lack full awareness that they are playing it, and will eventually collapse and re-emerge in antithetical form.
The signals of the current methodological rigor status game are already showing the instability Pinsof’s essay predicts. Pre-registration has become widespread enough that strategic pre-registration has emerged as its own form of gaming. Open data has become widespread enough that performative open data, sharing data in formats that technically comply while practically preventing replication, has developed. The methodological rigor vocabulary has become widespread enough that fluency in it functions as a coalition signal rather than purely as evidence of methodological commitment. The game is sliding from honest signal toward the status game instability the essay describes. The anti-status game that will replace it, perhaps organized around post-methodological pluralism or radical uncertainty or some other formulation that differentiates from the current rigor orthodoxy, is not yet visible but is structurally predicted.
Gelman cannot see this from inside the game he is winning. That is the precise prediction the status is weird essay generates. He can see the junk science status game clearly because he is outside it and losing it. He cannot see the methodological rigor status game clearly because he is inside it and winning it. The Darwin essay established that his idealism about scientific integrity makes the self-serving functions of his project invisible to him. The status is weird essay adds the specific mechanism: the game requires the player to lack awareness that it is a game, and the player who is winning a game is the last one to see it as a game rather than as the pursuit of something worth defending.
* David Pinsof argues that morality is not cooperative but is a coordination device for dominating rivals, that the mean part lives underground and the nice part lives on the surface, and that moral vocabulary presents exclusions as serving the greater good while functioning to get more stuff for the coalition at rivals’ expense.
Gelman runs a moral coordination system that identifies rivals, coordinates negative judgments against them, and presents those judgments as disinterested methodological assessment in service of scientific integrity. The rivals are identified through the morality essay’s precise mechanism. Wansink, Cuddy, the power posing literature, the ego depletion literature, the implicit association test research: these are not targets of methodological criticism in the evolutionary functional sense. They are the focal points around which Gelman’s coalition coordinates its shared exclusions. The morality essay’s mathematical model applies directly: any two players can form an alliance to impose a cost on the third so long as the benefit to the alliance members exceeds zero and the members can coordinate on whom to target. Gelman’s blog provides exactly this coordination function. Each post identifying a methodological failure is a coordination signal: this is not our kind of science, and the people who share our standards will recognize this judgment as legitimate. The signal does not need to be wrong about the methodology to function as moral coordination. It needs to identify a target around which the coalition can organize its shared exclusions.
The essay’s claim that tarring rivals as evil is rewarding because it reassures the coalition that other moralists will have their backs applies to Gelman’s project with uncomfortable precision. The blog’s comment section, the social media responses to his criticisms, the conference discussions that reference his methodological points: all of these are coalition members reassuring each other that the exclusion is legitimate, that the target deserves the negative judgment, that the people who share the standards of rigor will maintain the alliance when it matters. The reassurance function is more important than the persuasion function, which is why the blog reaches people who already agree with its diagnoses.
Every coalition believes its own moral vocabulary serves the greater good while its rivals’ moral vocabulary serves narrow self-interest. Gelman believes his criticism of bad methodology serves scientific integrity rather than coalition maintenance. The morality essay establishes that this belief is the nice surface of a mean underground operation.
* David Pinsof’s imagination essay argues failures of imagination are red flags for self-delusion. Wherever there is a gap in your imagination your mind fills it with bullshit. The most important failures are not failures to imagine concrete things but failures to imagine abstract ones: incentive structures, the possibility that your ideology is ad hoc rationalization, the possibility that your moral convictions are driving immoral behavior, the possibility that you have wasted significant effort on a framework that is not what it claims to be.
Gelman’s project is sustained by a failure to imagine his own most fundamental commitments as the ad hoc coalition technology the analysis says they are. The prior analysis can therefore be precise and accurate while Gelman remains unchanged, because the imagination required to fully internalize the analysis as self-description is structurally unavailable to anyone operating inside the sacred value the analysis examines.
Gelman’s first convenient belief is that the replication crisis is primarily a methodological problem. Gelman’s career-defining contribution is the diagnosis: the garden of forking paths, Type S and Type M errors, the failure of null hypothesis significance testing, the inadequacy of underpowered studies. In each case, the framing is that researchers are making analytical mistakes that better methods can correct. They do not pre-register. They explore their data and then report the results as though the analysis had been planned in advance. They use statistical tools that cannot do what they claim to do. The solution is better workflow: Bayesian reasoning, posterior predictive checks, multilevel models, honest uncertainty intervals, pre-registration, open data.
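The Type S and Type M errors in that diagnosis can be made concrete with a small simulation. The sketch below is not taken from Gelman’s own code. It assumes a single study design whose estimate is normally distributed around a true effect with a known standard error, and the function name `type_s_m`, the effect size, and the standard errors are all illustrative choices. Among the simulated estimates that cross the conventional significance threshold, it reports how often the sign is wrong (Type S) and by how much the magnitude is exaggerated on average (Type M).

```python
import random
import statistics


def type_s_m(true_effect, std_error, n_sims=100_000, z_crit=1.96, seed=0):
    """Simulate many replications of one design; summarise the significant ones.

    Returns (power, type_s_rate, exaggeration_ratio).
    All inputs are hypothetical values chosen for illustration.
    """
    if true_effect == 0:
        raise ValueError("true_effect must be nonzero to define sign and ratio")
    rng = random.Random(seed)

    # Keep only the estimates that would be declared statistically significant.
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(estimate)
    if not significant:
        raise ValueError("no significant estimates; increase n_sims")

    power = len(significant) / n_sims
    # Type S: share of significant estimates whose sign is opposite to the truth.
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s_rate = wrong_sign / len(significant)
    # Type M: average size of a significant estimate relative to the true effect.
    exaggeration = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s_rate, exaggeration


if __name__ == "__main__":
    # An underpowered design versus a well-powered one (illustrative numbers).
    print("noisy:  ", type_s_m(true_effect=0.1, std_error=0.5))
    print("precise:", type_s_m(true_effect=0.1, std_error=0.02))
```

Under these assumptions the noisy design reaches significance only rarely, and when it does the estimate is far larger than the true effect and sometimes points in the wrong direction. The precise design shows neither problem.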
Turner would recognize this as the most convenient possible belief for a methodologist. If the crisis is methodological, then the person with the best methods is the most important person in the room. If the crisis is structural, if the publication system, the incentive structure of academic careers, the funding model, and the prestige economy systematically reward unreliable findings regardless of anyone’s statistical sophistication, then better methods are necessary but not sufficient. They change what individual researchers do. They do not change what the system selects for.
Gelman knows this. He has written about incentive structures, about publication bias, about the sociology of science. But his primary output, the textbooks, the blog posts, the lectures, the collaborations, is organized around the premise that teaching people better statistics will make science better. Turner predicts this emphasis because it is the emphasis that preserves Gelman’s function. A statistician who says “the problem is methods, and I have the methods” has a mission. A statistician who says “the problem is incentives, and better methods cannot fix incentive structures” has an observation that undermines his own centrality.
The second convenient belief is that the researchers who produce unreliable work are making honest mistakes rather than rational responses to institutional incentives. Gelman’s characteristic tone when discussing bad research is pedagogical rather than accusatory. He treats p-hacking, the garden of forking paths, and publication bias as cognitive errors that education can correct. The researcher did not understand what his p-value meant. He did not realize his study was underpowered. He did not appreciate how much flexibility in analysis inflates false positive rates.
Pinsof’s misunderstanding essay applies here with unusual directness. The misunderstanding diagnosis flatters the diagnostician. If researchers produce bad work because they do not understand statistics, then the person who teaches better statistics is performing an essential service. If researchers produce bad work because the system rewards it, because a significant finding in a prestigious journal advances a career regardless of whether the finding replicates, then statistical education is treating a symptom while the disease operates at a level the educator cannot reach.
Turner sharpens this. The convenient belief is that cognition is the bottleneck. Researchers lack understanding. They need to be taught. The inconvenient belief is that researchers understand perfectly well what they are doing and do it because it works. The garden of forking paths is not a cognitive error. It is a rational strategy for maximizing publication in a system that rewards significance. Researchers who explore their data until they find something publishable are not confused about statistics. They are responding to the incentive structure of their profession. Better statistical education does not change that structure. It just makes the exploration more sophisticated.
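The inflation at issue can be sketched with a short simulation. This is a deliberate simplification. It assumes a researcher who has several independent outcome measures, no true effect on any of them, and who counts the study as a success if any single comparison is significant. Gelman’s own account of the garden of forking paths is broader, since it also covers analysis choices that are contingent on the data even when only one test is run. The function name `false_positive_rate` and all parameter values are illustrative assumptions.

```python
import math
import random


def false_positive_rate(n_outcomes, n_per_group=20, n_studies=2000,
                        z_crit=1.96, seed=0):
    """Share of null studies that yield at least one 'significant' comparison.

    Every outcome is pure noise (unit-variance normal, no group difference),
    so any significant result is a false positive by construction.
    """
    rng = random.Random(seed)
    # Standard error of a difference in two group means with unit variance.
    se = math.sqrt(2.0 / n_per_group)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            treated = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
            control = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
            if abs(treated - control) / se > z_crit:
                hits += 1
                break  # stop at the first publishable result
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 3, 10):
        print(k, "outcomes:", false_positive_rate(k))
```

With one outcome the rate stays near the nominal five percent. With ten it rises to roughly 1 − 0.95^10, about forty percent. Nothing in the simulation requires the researcher to misunderstand statistics, which is consistent with the argument that the exploration is a strategy rather than a confusion.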
Gelman has come close to saying this explicitly. His blog posts sometimes acknowledge the incentive problem. But his professional output remains organized around the pedagogical model: teach better methods and the science improves. Turner predicts that the pedagogical model will remain dominant in his work because it is the model that keeps him central. The incentive-structure model would make him one voice among many in a conversation about institutional design, a conversation where statisticians have no special authority.
The third convenient belief is that the Bayesian framework represents a genuine epistemological improvement rather than a coalition marker. Gelman’s advocacy for Bayesian methods is sincere and substantively grounded. Bayesian reasoning handles uncertainty more honestly than frequentist methods. It encourages explicit prior specification. It produces posterior distributions rather than binary significance decisions. These are real advantages.
But the advocacy also functions as a coalition signal. In the statistics world, Bayesian versus frequentist is not just a methodological debate. It is a tribal affiliation. Gelman’s Bayesian identity marks him as a member of a specific intellectual community with specific journals, specific conferences, specific hiring networks, and specific assumptions about what good work looks like. Turner would note that the sincerity of his Bayesian commitment and its coalition function are not in tension. They reinforce each other. He believes in Bayesian methods because they are better. He also benefits from believing in Bayesian methods because that belief positions him within a coalition that rewards his particular skills. The convenient belief is not that Bayesian methods work. It is that the choice between Bayesian and frequentist frameworks is primarily an epistemological decision rather than a coalition-membership decision. Turner predicts that Gelman will experience the choice as purely epistemological because experiencing it as coalitional would complicate his self-understanding as a scholar motivated by truth rather than affiliation.
The fourth convenient belief is that the blog is a democratizing force rather than a status-consolidation mechanism. Gelman’s blog has been enormously influential. It functions as an alternative peer review system, a teaching platform, a public forum for methodological debate, and a mechanism for holding researchers accountable. He writes about it as though it is a form of open intellectual exchange, which it is.
Turner would add that it is also a mechanism for accumulating and maintaining status. The blog gives Gelman a platform that no peer-reviewed journal can match for speed, reach, and influence. It allows him to set the terms of debate, to decide which papers deserve scrutiny and which do not, to determine which errors are worthy of public attention and which can be quietly corrected. That curatorial power is enormous. It makes him the de facto editor of a shadow journal with no peer review, no editorial board, and no accountability beyond his own judgment. The fact that his judgment is usually good does not change the structural point. The blog centralizes authority in a way that contradicts the democratizing narrative Gelman tells about it. Turner predicts he will hold the democratizing narrative because it is the narrative that makes the power feel legitimate.
The fifth convenient belief, and the one most directly parallel to the Orthodox figures in this series, is that honest methodology can coexist with the institutional structure of the modern research university without requiring fundamental changes to that structure. Gelman works within Columbia. He trains PhD students who need to publish to get jobs. He collaborates with researchers who operate within the existing prestige economy. He does not advocate for abolishing the journal system, restructuring tenure incentives, or dismantling the funding model that produces the replication crisis. His reforms are additive: pre-registration, open data, better workflow, Stan, Bayesian reasoning. They improve practice within the existing structure. They do not challenge the structure.
Turner would recognize this as the same convenient belief that Adlerstein holds about Modern Orthodoxy and that Shapiro holds about Orthodox historical scholarship. The system can be improved from within. Better methods, better history, better translation, all working within existing institutional constraints, will produce better outcomes. The inconvenient belief, which none of these figures holds, is that the system produces its outcomes because of its structure, and that working within the structure reproduces the structure regardless of how good the methods, the history, or the translation are.
The beliefs that would be inconvenient for Gelman to hold are identifiable.
That the replication crisis is not fixable by better methods because it is produced by an incentive structure that his methods cannot change. That conclusion would demote him from essential reformer to diagnostic commentator.
That his own research selections, which papers to critique and which to leave alone, which errors to publicize and which to overlook, are shaped by coalition dynamics in the same way that the selections of the researchers he critiques are shaped by theirs. He tends to target social psychology and behavioral science more heavily than economics or biostatistics. Turner would predict that the selection tracks coalition boundaries rather than error rates.
That the Bayesian-frequentist debate functions as tribal affiliation as much as epistemological disagreement. That conclusion would subject his own methodological commitments to the same sociological analysis he applies to the commitments of researchers who use methods he considers inferior.
That the blog’s influence represents a concentration of informal power that is structurally similar to the editorial gatekeeping he criticizes in the journal system. That conclusion would require him to see his own platform as part of the problem rather than the solution.
That his students, trained in his methods and steeped in his standards, go on to operate within the same incentive structure that produced the crisis, and that their superior training may make them better at navigating the garden of forking paths rather than eliminating it. That conclusion would suggest that his educational project reproduces a more sophisticated version of the disease rather than curing it.
The comparison with the other figures in the series reveals the structural parallel.
Gelman is to academic social science what Shapiro is to Orthodox Judaism. Both diagnose ignorance as the primary problem. Both position themselves as the essential corrective. Both produce work that is genuinely valuable and genuinely illuminating. Both stop short of the structural explanation that would reveal the problem as self-reproducing regardless of how much anyone knows. Shapiro documents the archive. Gelman teaches the methods. Neither can say that the system’s behavior is driven by something his expertise cannot fix, because saying it would undermine the premise of his own career.
Gelman is to the reform wing of statistics what Adlerstein is to centrist Orthodoxy. Both occupy a position that requires them to believe the system can be improved through better practice from within. Both hold that belief sincerely. Both benefit from holding it. Both would lose their function if they concluded that the system produces its outcomes structurally rather than through correctable error.
Gelman differs from Etshalom in one critical respect. Etshalom refuses to resolve. He presents the evidence and lets the student carry the tension. Gelman resolves. He presents the methodological failure and prescribes the Bayesian workflow that fixes it. That resolution is what makes him more institutionally successful and less pedagogically destabilizing. He gives his audience what it wants: a diagnosis plus a cure. Etshalom gives his audience the diagnosis without the cure. The system prefers Gelman’s model because it produces dependent practitioners who need the cure. Etshalom’s model produces independent readers who carry their own uncertainty.
Turner’s convenient beliefs framework reveals that Gelman’s extraordinary contribution to the integrity of empirical science is also, and simultaneously, a career organized around a diagnosis that makes the diagnostician indispensable. The garden of forking paths is real. The Type S and Type M errors are real. The replication crisis is real. The methods he teaches are genuinely better than the methods they replace. None of that changes the structural fact that his beliefs about what causes the crisis and what can fix it are also the beliefs that sustain his position within the system the crisis inhabits.
He is the most honest figure in this comparison group. His self-criticism is more visible than any of the others. His willingness to acknowledge uncertainty, to update his positions, and to admit error is rare and genuine. Turner’s framework does not deny any of that. It simply notes that even the most honest intellectual holds the beliefs his position makes convenient, and that the honesty is itself the most effective form of the concealment. The person whose convenient beliefs look least convenient, because they involve criticizing powerful researchers, demanding higher standards, and subjecting his own field to relentless scrutiny, is the person whose convenient beliefs are most invisible. Gelman’s integrity is real. His convenient beliefs are also real. Turner’s insight is that the two do not contradict each other. They are the same phenomenon seen from different angles. The integrity is what makes the convenience work.
On April 6, 2026, I emailed Andrew Gelman:
Andrew:
Do you think this is fair? “Columbia is the most volatile campus, and its no-go zones shift faster than anywhere else because two powerful coalitions are in open conflict. High-intensity activist networks and an administration under significant federal pressure collide constantly, and the boundaries move with each news cycle. Pro-Israel speech in activist spaces requires heavy hedging. Pro-Palestinian speech that crosses new procedural lines installed under federal scrutiny carries its own risks. The deeper prohibition is being legible to neither coalition: floating above the conflict reads as moral evasion, and the system punishes that more reliably than it punishes taking either side. Columbia students face both peer friction and administrative friction simultaneously, which produces the lowest free speech scores in the country. The tacit rule is that you must pick a side or perform neutrality with extreme care.”
“Harvard has the most aesthetically rigorous no-go zone of any campus. The deepest taboo there is not ideological. It is visible striving paired with dissent. The effortless perfection norm means that you can, in principle, critique DEI frameworks, question affirmative action, or express skepticism about progressive orthodoxy. What you cannot do is sound like the argument matters to you more than your standing does. A Harvard student can say something mildly heterodox if it sounds like a casual aside over dinner in the dining hall. The same content delivered with intensity, citations, or moral urgency gets socially downgraded. You must never reveal that your beliefs cost you anything. The enforcement runs through comp culture, house tutors, and peer networks that operate with surgical precision. Anonymous apps like Fizz and Sidechat have accelerated this: an uncalibrated comment in a seminar can circulate before the student leaves the building. Roughly a third of seniors historically report being unable to express their genuine views on campus, and self-censorship among moderate and apolitical students has risen faster than among conservatives since 2021. The tacit rule at Harvard is that you may question outcomes but never the legitimacy of the social grammar that produced you.”
Gelman responded: “Hi, I have no idea what is meant by a no-go zone. I have not seen any prohibitions myself, but I have read about some things such as Barnard not letting students put signs on their dorm room walls.”
I replied: “By ‘tacit no-go zones’ I mean the unwritten rules about what you can and cannot say publicly without social penalty. It’s about self-censorship driven by fear of consequence, not formal policy violations like the Barnard sign rule.”
Gelman replied: “In that case, I know of no such unwritten rules.”
Gelman’s response is itself a Turner datum. He is not being evasive. He is reporting accurately from inside his formation. The tacit rules of Columbia’s academic culture are invisible to him for the same reason a native speaker cannot hear his own accent. The rules constitute his sense of normal, and normal does not feel like a rule.
But notice what his response reveals structurally. Gelman is a senior tenured professor at an elite research university, politically aligned with the dominant coalition of his institution, methodologically credentialed in ways that place him above the fray of most departmental conflicts, and sufficiently prominent that his professional standing insulates him from the low-level social penalties that enforce tacit norms on graduate students, junior faculty, and undergraduates. The tacit rules Stephen Turner describes are not experienced uniformly across an institution. They are experienced most acutely by people whose position is precarious, whose coalition membership is uncertain, or whose views place them near the boundary of what the dominant formation tolerates. Gelman sits nowhere near that boundary. Of course he has not felt the rules. He is the kind of person the rules were designed to protect, not constrain.
This is a general feature of tacit enforcement that Turner’s framework predicts and that Gelman’s response confirms. The people least likely to perceive tacit speech norms are the people most fully formed within the dominant coalition of their institution. Their intuitions about what is sayable were calibrated by the same formation that produces the norms, so nothing they naturally want to say triggers the enforcement apparatus. They move through the institution without friction not because there is no friction but because they are precisely shaped to avoid it. A fish optimally adapted to its water does not experience resistance.
The more interesting question is what Gelman’s methodological commitments do with this. He has spent years arguing that researchers systematically fail to notice the degrees of freedom they exercise because those choices feel like obvious good practice rather than contingent decisions. That is a near-perfect description of tacit knowledge operating in a statistical context. He has the conceptual apparatus to understand what Turner is describing. He applies it to p-values and model specification. He does not apply it to the social grammar of his own institution, which suggests the tacit formation runs deeper than the methodological critique reaches. The critique is internal to the research program. The formation that produced the research program remains unexamined.
His answer about Barnard’s sign policy is telling in a different way. He reaches for a formal, visible, written rule because that is what “prohibition” means to someone whose framework for thinking about constraint is procedural and explicit. Turner’s point is precisely that the most consequential constraints never reach that level of formalization. They operate through the trained sense of what is appropriate, what sounds serious, what kind of claim invites ridicule rather than engagement. Those constraints leave no documentary trail, which is why asking a senior Columbia professor whether he has seen prohibitions produces a sincere negative answer that tells you almost nothing about whether prohibitions exist.
What Gelman’s response does not and cannot tell you is what a junior scholar at Columbia with heterodox views on, say, the epidemiology of ideology, the political valence of replication failures, or the coalition structure of academic hiring would experience if he pursued those questions publicly. The answer to that question is not available from Gelman’s vantage point. It is available from Turner’s framework, which predicts that the enforcement would be real, largely invisible to those doing the enforcing, and experienced by the target as a diffuse series of professional disappointments rather than a single identifiable act of suppression.
Gelman is in this respect the ideal Turner subject: sophisticated enough to have partially articulated the tacit at one level of his practice, and formed deeply enough within his institutional coalition that the tacit at the level above remains entirely invisible to him. His sincerity is not in question. His formation is.
Stephen Turner’s point about tacit knowledge is that its defining feature is invisibility to the practitioner. It is not hidden in the sense of being concealed. It is hidden in the sense of being constitutive. The fish does not notice the water. Someone fully formed within a tacit framework does not experience it as a framework at all. They experience it as simply how things are done, what good work looks like, which questions are interesting, which results are publishable, which objections need answering and which can be safely ignored. The tacit is precisely what you cannot see from inside it.
Andrew Gelman is one of the most methodologically self-conscious scholars working in quantitative social science. His blog, his papers on the replication crisis, his sustained critique of underpowered studies, his work on the garden of forking paths: all of this represents a genuine and serious effort to make explicit what practicing researchers usually leave implicit. He has done more than almost anyone in his field to surface the tacit assumptions baked into standard statistical practice. That makes him an interesting case for Turner rather than an easy one. The question is not whether Gelman is naive about methodology. He is not. The question is whether his very sophistication about one layer of the tacit blinds him to other layers operating beneath it.
The speech codes question is a good entry point. Columbia’s political science and statistics departments do not have written speech codes. They have something more powerful: a shared sense of what counts as a serious question, what kind of answer demonstrates competence, what topics a scholar of Gelman’s standing engages with publicly and what topics he leaves alone without quite deciding to leave them alone. These are not rules. They are trained perceptions about professional seriousness, and they operate through the same tacit formation that Turner describes everywhere else. A scholar who asked whether elite university admissions systematically disadvantage certain groups on grounds other than the officially stated ones, or who pursued the epidemiology of ideology with the same rigor Gelman applies to election forecasting, would not be told he had violated a speech code. He would simply find that the work was not taken seriously, that the best journals showed no interest, that colleagues changed the subject, that the grants did not materialize. The enforcement is tacit all the way down, which is why asking about explicit speech codes produces a sincere denial.
Gelman’s methodological work itself carries tacit commitments he does not fully examine. His critique of social psychology’s replication failures is rigorous and largely correct, but it operates within an assumption that the interesting questions social psychology asks are basically the right questions, that the dependent variables are well chosen, that the theoretical frameworks are at least approximately tracking real phenomena. The critique is internal. It asks whether the methods are adequate to the questions. It does not ask whether the questions are shaped by the same coalition pressures and convenient belief structures that Turner identifies elsewhere. When Gelman criticizes a study on power posing or ego depletion, he is doing methodological hygiene within a shared research program. He is not asking why that research program rather than another, who benefits from its conclusions, what it would cost the field to pursue different questions.
His political commitments provide another angle. Gelman is open about his left-liberal politics and has written thoughtfully about how those commitments relate to his scholarly work. But openness about explicit commitments is not the same as visibility into tacit ones. Turner would note that the tacit formation of a Columbia statistics professor includes a set of prior judgments that are not reducible to explicit political positions: judgments about which empirical findings are plausible, which theoretical accounts of human behavior are respectable, and which scholars are worth engaging and which are outside the conversation. These priors shape which anomalies get treated as interesting puzzles and which get treated as methodological artifacts. They shape which critiques of mainstream social science get amplified on his blog and which get a polite dismissal or no mention at all. None of that is dishonest. It is exactly what Turner predicts: the tacit framework doing its work below the threshold of explicit reasoning.
There is a particular irony in Gelman’s garden of forking paths concept applied back to Gelman. He argues that researchers make dozens of small decisions, about which covariates to include, which subgroups to examine, which models to report, that collectively produce results far more favorable to the researcher’s hypothesis than the nominal statistics suggest. Each individual decision seems reasonable. The cumulative effect is systematic bias. Turner would say the same structure applies to the prior decisions Gelman makes about which research questions matter, which scholarly traditions deserve serious engagement, which findings count as anomalies requiring explanation. Each decision seems like an exercise of professional judgment. The cumulative effect is a research career that stays remarkably well within the tacit boundaries of its formation, even as it critiques those boundaries at the methodological level.
His engagement with political science and election forecasting illustrates this. Gelman has genuine cross-disciplinary range. But the disciplines he ranges across share a common formation: quantitative, empiricist, committed to identifying causal effects through clever research design, skeptical of grand theory, embedded in the same set of elite research universities. The tacit speech codes of that formation are not Columbia-specific. They are field-wide, and they are largely invisible to someone formed entirely within them because they look like the natural shape of serious inquiry rather than one possible shape among others.
Turner’s deepest point is that tacit knowledge cannot be made fully explicit without destroying it. The attempt to articulate all the rules governing a practice either fails, because some rules resist articulation, or produces a codified substitute that loses the flexibility of the original. Gelman’s methodological transparency project is genuinely valuable and genuinely limited in exactly this way. He has made explicit a great deal that was previously implicit in statistical practice. But the framework within which he conducts that project, the sense of what a good question looks like, what a satisfying explanation does, what scholarly seriousness requires, remains tacit and therefore largely invisible to him. When asked about speech codes, he reports accurately that he is not aware of explicit rules. Turner would say that is precisely the point. The codes that matter most are the ones nobody needs to write down.
‘The Blogosphere and Its Enemies: The Case of Oophorectomy’
Gelman accepts most of Turner’s theoretical apparatus. He endorsed Turner’s convenient beliefs framework by extending it. He has used his blog for two decades to do exactly what Turner praises the hysterectomy bloggers for doing: aggregating testimony that contradicts expert consensus, exposing the cognitive biases of credentialed professionals, analyzing the interests behind the findings.
Turner’s essay identifies the specific function the blogosphere performs at its best. It aggregates personal testimony the specialist channels filter out. It performs folk sociology of knowledge on expert claims, noticing when claims track professional interest rather than evidence. It provides specific factual material that qualifies expert assertions. It challenges experts to justify themselves in Habermasian terms. It operates as a corrective to expert error.
Gelman’s blog, Statistical Modeling, Causal Inference, and Social Science, has performed this function across quantitative social science since 2004. Cornell’s food psychologist Brian Wansink had institutional position, peer-reviewed publications, a pipeline to NPR and TED talks, lucrative speaking engagements, and the backing of the American food industry. He had everything the hysterectomy establishment had. Gelman and his commenters aggregated the evidence that Wansink’s findings did not survive scrutiny. They documented the statistical impossibilities in his papers. They compiled the contradictions across his published work. They produced the corrective the specialist channels of academic peer review had failed to produce. Wansink eventually lost his Cornell position and had papers retracted in numbers that exceeded what his institutional defenders could absorb.
The Wansink correction happened through exactly the mechanisms Turner’s essay describes. Blog aggregation. Commenter testimony. Folk sociology of interests. The challenge to justify. The refusal of specialist channels to perform their stated function. The eventual forcing of correction through accumulated evidence that could no longer be absorbed by the existing framework. Gelman occupies the HERS Foundation’s structural position in this and many parallel cases. Amy Cuddy. Marc Hauser. The entire power-posing literature. The Himmicanes paper. Dozens of smaller cases that never reached the news but got corrected inside the field through blog-driven accumulation of evidence that peer review had missed.
Gelman’s explicit engagement with Turner extended the convenient beliefs framework. His exchange confirming that going beyond convenient beliefs is hard and mostly unprofitable shows him operating the framework consciously rather than being caught by it unawares. This matters because most figures to whom Turner’s framework applies cannot see the framework operating on them. Gelman can.
His examined convenient beliefs include the standard ones the replication crisis exposed. Researchers in social psychology found it convenient to believe that p-values below .05 indicated real effects. They found it convenient to believe that small sample sizes were adequate if the effects they detected were real. They found it convenient to believe that the choices they made about which covariates to include and which subgroups to examine did not materially affect their results. Gelman’s garden of forking paths paper documented how these convenient beliefs produced systematic overconfidence across entire literatures. His work on Type S and Type M errors specified how convenient beliefs about statistical significance produced inflated effect sizes and wrong signs.
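The Type S and Type M claim can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not a reproduction of Gelman’s own analysis: the true effect (0.1), the standard error (0.5), and the function name `simulate_type_s_m` are all hypothetical choices picked to represent a small effect studied with a noisy design. The sketch draws many noisy estimates, keeps only those that clear the conventional significance threshold, and then asks how often the surviving estimates have the wrong sign and by how much they exaggerate the true effect.

```python
import random
import statistics


def simulate_type_s_m(true_effect=0.1, std_error=0.5, n_sims=20000, seed=0):
    """Simulate noisy estimates of a small true effect and keep only the
    'statistically significant' ones (|estimate| > 1.96 * std_error).

    All default values are illustrative assumptions, not figures from any paper.
    """
    rng = random.Random(seed)
    threshold = 1.96 * std_error
    significant = []
    for _ in range(n_sims):
        # One hypothetical study: the estimate is the truth plus sampling noise.
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > threshold:
            significant.append(estimate)

    # Share of simulated studies that reach significance.
    power = len(significant) / n_sims
    # Type S: share of significant estimates whose sign is wrong.
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    # Type M (exaggeration ratio): mean |significant estimate| / |true effect|.
    type_m = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, type_m


power, type_s, type_m = simulate_type_s_m()
print(f"power={power:.3f}  type_s={type_s:.3f}  type_m={type_m:.1f}")
```

Under these assumed numbers, any estimate that reaches significance must be at least 0.98 in absolute value, nearly ten times the true effect, so the significance filter alone guarantees exaggeration. A substantial minority of the surviving estimates also point in the wrong direction. The convenient belief that a significant result from a small study indicates a real effect of roughly the reported size is exactly what this setup contradicts.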
The examined convenient beliefs extend to his own field. Bayesian statisticians found it convenient to believe that their methods were immune to the problems plaguing frequentist inference. Gelman has repeatedly challenged this. Pre-registration enthusiasts found it convenient to believe that pre-registration would solve the replication crisis. Gelman has repeatedly noted that pre-registration addresses only a subset of the problems. The open science movement found it convenient to believe that transparency would produce reform. Gelman has documented the ways transparency without other changes leaves the core incentive structures intact.
The beliefs Gelman treats as candidates for methodological critique are clustered in specific areas. Social psychology. Behavioral economics. Certain kinds of biology. Nutrition science. The beliefs he does not treat as candidates for methodological critique are clustered in other specific areas. Economics in its more mainstream registers. Political science in its liberal registers. The methodological choices that produced findings he endorses politically. The statistical practices of the reform coalition he leads.
The email correspondence on Columbia’s speech codes provides the clean case. Gelman answered the question about tacit speech codes at Columbia by saying there were none he knew. The answer was sincere. He could not see what the question was asking because he was inside what the question was asking about.
Turner’s blogosphere essay contains the specific framework that makes this visible. The hysterectomy patients reporting loss of libido could see something the gynecologists could not see. The gynecologists were not lying. They were operating inside a framework that filtered the testimony they were receiving. They heard the reports. They categorized the reports as confounded with other factors. They produced sincere responses that explained why the reports did not indicate what the reporting women thought they indicated. The sincerity was load-bearing. The framework could not have operated if the gynecologists had been conscious of the filtering.
The same structure operates in Gelman’s response on Columbia. The speech codes he was asked about are not written rules. They are the tacit formation that tells a scholar of his standing which questions are serious, which answers demonstrate competence, which topics he engages and which he leaves alone. They are what Turner’s teacher Polanyi called tacit knowledge. They are what Turner’s broader work treats as the conditions of professional judgment that cannot be made fully explicit because making them explicit would change what they are.
Gelman’s methodological sophistication extends to tacit knowledge in statistical practice. He has written about how statistical judgment cannot be reduced to explicit rules. He has endorsed Turner’s work on tacit knowledge in epistemology. He knows the framework exists. He cannot apply it to himself in the specific location where it matters most for him, which is the location of his own professional formation.
This is the precise finding Turner’s blogosphere essay enables. The experts who are blind to their own convenient beliefs are not stupid. They are sophisticated. They know about convenient beliefs in the abstract. They cannot see their own because their own are constitutive of the framework they are seeing with. The fish cannot notice the water. The gynecologist cannot notice that his standards filter out the patient testimony. The Columbia professor cannot notice that his sense of what a serious question looks like is shaped by the tacit formation of elite quantitative social science at a specific moment in its history.
Turner’s essay emphasizes that the hysterectomy case involved evidence that accumulated for decades before the experts acknowledged it. Women had been reporting the same symptoms since the 1970s. The HERS Foundation had been collecting data since 1991. The meta-analyses that confirmed the reports came in 2008 and 2010. The gap between the evidence becoming available and the expert framework absorbing it was approximately twenty years.
The question Gelman’s case puts to this framework is what evidence currently accumulating outside the reform coalition’s framework might force analogous revision in the future. The coalition’s framework cannot generate this question from inside. That is Turner’s point about frameworks. They cannot identify what they are filtering out because the filtering is what makes them frameworks.
Some candidates are visible from outside the coalition. The coalition’s assumption that academic knowledge production is broadly reliable when properly reformed may be filtering out evidence that the institutional conditions of academic knowledge production have become inimical to reliable knowledge regardless of methodological reform. The coalition’s focus on researcher degrees of freedom and publication incentives may be missing structural features of the contemporary university that no methodological reform can address. The coalition’s faith in peer review, even as it documents peer review’s failures, may be convenient in ways the coalition cannot examine.
Gelman has approached these questions occasionally. His remark about academic science reforming at the moment when it stopped being the pipeline to public influence is a version of them. But he approaches them as asides rather than as the central focus. The central focus remains methodological. The reform coalition organizes around the assumption that methodology is the primary problem. The assumption may be correct. It may also be the convenient belief that holds the coalition together.
Watergate as Democratic Ritual & Cultural Trauma
Gelman operates as a credentialed expert watching carrier groups from inside the academic apparatus. The inside position changes what the framework can see about him. Gelman is the methodologist whose specific disciplinary tools both equip him to see trauma construction at the technical level and prevent him from seeing it at the civic-religious level where the construction operates.
“Watergate as Democratic Ritual” by Jeffrey Alexander argues that Watergate became a constitutional crisis through specific symbolic work rather than through the objective properties of the break-in. The transformation required consensus building, generalization from political goals to sacred values, invocation of social control institutions, mobilization of elite countercenters, and ritual processes that produced purification. The Senate hearings created liminal space where ordinary political rules were suspended and the nation entered sacred time. The facts did not change. The symbolic context in which the facts were situated changed. The change was the work.
“Toward a Theory of Cultural Trauma” by Jeffrey Alexander argues that cultural traumas are constructed representations produced by carrier groups making claims about fundamental injury to collective identity. The construction requires specific representational work: specifying the nature of the pain, identifying the victims, establishing the relation of victims to the wider audience, and attributing responsibility. The work happens through institutional arenas including religious, aesthetic, legal, scientific, and mass media sectors. The naturalistic fallacy consists in treating constructed traumas as natural responses to events.
The Scientific Arena as Trauma Construction Site
Alexander identifies the scientific arena as one of the institutional sites where trauma construction occurs. The specification is easy to miss because the scientific arena presents itself as the site where traumas are diagnosed rather than constructed. Alexander’s point is that this presentation is itself part of the construction work. Scientific classification of certain events as traumatic (or certain claims as traumatic for the scientific community itself) performs the same representational work the other arenas perform. The work looks different because it uses different tools (peer review, citation, statistical analysis, methodological critique) but produces the same kind of outputs (sacred values defended, polluted claims expelled, collective identity reorganized).
Gelman operates inside this arena and has spent his career producing methodological classifications that function as trauma-construction work. The replication crisis is the specific example. A set of events (failed replications, questionable research practices, specific high-profile frauds) got converted into a sacred civic crisis for academic social science. The conversion happened through specific carrier group work. The carrier group included Gelman, Brian Nosek, Uri Simonsohn, Simine Vazire, and a specific network of methodologists who made the replication crisis visible as a crisis. Before their work, the failed replications were isolated anomalies within specific subfields. After their work, the failed replications became the symptoms of a general pathology that threatened the legitimacy of social science as a whole.
Alexander’s framework makes the specific moves visible. The carrier group specified the nature of the pain (research claims the literature had accepted turned out to be unreliable). It identified the victims (the broader scientific community, whose collective knowledge had been corrupted; the public, whose trust in science had been betrayed; the careful researchers, whose work had been crowded out by less careful but more publication-friendly research). It established the relation of victims to the wider audience (everyone who relies on scientific knowledge is affected, which is everyone). It attributed responsibility (specific researchers who cut corners, specific incentive structures that rewarded corner-cutting, specific journals that published without adequate scrutiny, specific fields that tolerated the practices).
The representational work succeeded. Alexander’s framework identifies successful trauma construction by specific markers. New sacred values get established. New social control mechanisms get invoked. New institutional arrangements emerge to manage the construction’s aftermath. In the replication crisis case, pre-registration became a sacred value, open data became a sacred value, honest uncertainty quantification became a sacred value. Social control mechanisms included registered reports at specific journals, open science badges, and the reputational costs of being associated with non-replicable work. Institutional arrangements included new funding agency requirements, new journal policies, and new methods curricula in graduate programs.
Alexander’s framework does not treat this construction as wrong or manipulative. It treats it as what carrier groups do when they work effectively. The replication crisis construction produced real changes in how social science operates. Whether the changes are adequate to the underlying problems is a separate question. The construction itself did what successful trauma construction does: it reorganized the symbolic classification system of a specific field around new sacred values and polluted counter-values.
Gelman is a carrier group member who has participated in producing a specific, successful trauma construction inside his field. The participation is not cynical. Alexander’s framework emphasizes that carrier group work operates through sincere commitment. Gelman believes the replication crisis is real, the reforms are necessary, the new sacred values are correct. The belief is part of what makes him effective as a carrier group member. A carrier group whose members held their positions cynically could not produce the sincere collective representations that trauma construction requires.
Gelman’s material interests align with the construction’s success. His reputation as the leading methodological voice in empirical social science depends on the continued relevance of the methodological critique. If the replication crisis were declared substantially resolved, his continued influence would diminish. He does not consciously calculate this. He experiences the ongoing methodological problems as genuine and his continued attention to them as warranted. Alexander’s framework predicts this alignment between perception and interest as the standard feature of carrier group positions.
His institutional position at Columbia gives him the platform the carrier group work requires. His statistics credentials give the work its authority within the scientific arena. His blog extends the carrier group’s reach beyond formal academic channels. The combination of institutional position, credentials, and extended reach produces the specific hybrid authority that allows him to operate simultaneously as credentialed expert and as public voice. Neither position alone would produce the carrier group effectiveness the combination produces.
His discursive talents fit the work the carrier group does. His technical ability to diagnose specific methodological problems, his rhetorical ability to explain the diagnoses to readers outside his specific subfield, his persistence across years of sustained attention to specific targets, his willingness to name names, all contribute to the specific carrier group work of producing and maintaining the replication crisis construction.
The replication crisis construction has generated its own sacred values. Pre-registration as a marker of scientific virtue. Open data as a marker of the honest researcher. Power analysis as a requirement rather than an afterthought. The distinction between exploratory and confirmatory research. The preference for large samples over clever designs. The specific statistical moves (hierarchical modeling, Bayesian workflow, multiverse analysis) that Gelman has advocated. Each of these has become sacred in the specific sense Alexander’s framework identifies: it functions as a marker of membership in the reform coalition, violation produces pollution, defense produces sacralization.
The sacred values have produced the same mixed effects Alexander identifies in Watergate’s aftermath. The values have constrained some practices that needed constraining. They have also produced new ritual compliance that substitutes for substantive change. Pre-registration has prevented some garden-of-forking-paths manipulation. It has also produced pre-registered reports that hit their specified analyses without the research being meaningfully better. Open data has allowed some re-analyses that exposed errors. It has also produced the appearance of transparency without the substance of it. Power analysis has prevented some underpowered studies. It has also produced pro forma calculations that satisfy reviewer demands without changing how researchers actually work.
Gelman has noticed these developments. He has written about them. He has pushed back against the reduction of the reform to compliance rituals. But Alexander’s framework identifies a limit to what he can do about the pattern. The carrier group produces the sacred values. The sacred values, once established, take on lives of their own. They get implemented by people who did not participate in the carrier group’s original work and who relate to the values through institutional pressure. The drift from substantive reform to ritual compliance is what happens to successful trauma constructions over time. Gelman’s continued critique of the drift is one of the things successful carrier group members do in the mature phase of a construction, but the critique cannot prevent the drift because the drift is constitutive of how constructions become institutionalized.
Alexander’s description of the Senate Watergate hearings as liminal space applies specifically to Gelman’s blog. The hearings created a phenomenological world separate from ordinary political life. The framing devices (hushed voices, pomp and ceremony, television’s specific conventions) produced sacred time and sacred space where statements carried weight they would not carry in mundane politics.
Gelman’s blog creates analogous liminal space within academic statistics. The specific framing devices include the blog’s consistent voice, the regular rhythm of posts, the established commenter community that has developed its own conventions, the specific vocabulary (the garden of forking paths, Type S and Type M errors, the Wansink case as reference point) that marks insiders and outsiders, the cross-references across years of posts that establish continuity with sacred past instances of methodological reform. The blog is not academic statistics in the ordinary register. It is academic statistics in the register where methodological crisis is ongoing and sacred values require continuous defense.
Inside the register, statements carry weight they would not carry in ordinary academic discourse. A specific paper gets classified as an instance of known pathology, and the classification sticks in ways that ordinary methodological critique would not produce. A specific researcher becomes associated with the polluted practices, and the association persists across subsequent interactions. A specific field becomes marked as particularly troubled, and the marking shapes how new work from that field gets received. The blog’s liminal character gives it the authority to produce these markings.
The markings are sometimes accurate and sometimes not. Alexander’s framework does not settle the question of accuracy. What it settles is the specific character of the authority the blog wields. The authority is not primarily the authority of individual methodological arguments. It is the authority of the liminal space in which the arguments are made, which gives them the weight of sacred pronouncements rather than of ordinary criticism. This matters because the effects of the markings often exceed what the underlying arguments alone would justify. A blog post that might, considered purely on its argumentative merits, warrant modest revision of opinion about a specific paper instead produces strong classification effects that reorganize how the paper gets read in its field.
Gelman is aware of this effect. He has written occasionally about his own influence and has tried to calibrate his pronouncements accordingly. The awareness is partial. Alexander’s framework predicts this. The liminal space cannot be fully analyzed from inside. The participant in liminal ritual experiences the ritual’s power as the power of truth rather than as the power of the ritual. The blog’s author experiences his posts as accurate descriptions of the methodological situation rather than as sacred pronouncements in a liminal space that gives them more weight than their arguments alone would warrant. Both descriptions can be true simultaneously. The liminal space amplifies the arguments without necessarily distorting them. But the amplification is itself an effect the framework identifies as structural rather than as a product of individual argumentative quality.
The replication crisis construction operates at the level of methodology. It assumes that the scientific enterprise is broadly sound and that specific methodological reforms can correct specific problems. The construction is compatible with continued academic authority, continued institutional funding, continued public deference to scientific findings when the findings meet the new methodological standards. The construction preserves the professional structure of academic science even as it reforms specific practices within that structure.
A more thoroughgoing construction would attack the professional structure. It would argue that the incentive structures producing the specific pathologies cannot be reformed without dismantling the academic system that generates them. It would argue that peer review cannot be fixed, only replaced. It would argue that the universities have become so captured by non-epistemic interests that methodological reform inside them is theater. It would argue that the public should withdraw deference to credentialed expertise in specific domains where the credentials have ceased to track the underlying reliability of the claims.
Gelman’s position at Columbia, his credentials in academic statistics, his income from the academic structure, his authority derived from his standing within the profession, all depend on the professional structure continuing to exist. A carrier group for the more thoroughgoing construction would have to come from outside the profession or from figures willing to sacrifice their professional positions to produce the construction.
The Columbia situation of 2023-2026 provides a specific test of where Gelman’s carrier group capacity operates and where it does not. Columbia experienced a civic crisis in Alexander’s sense. The crisis involved specific events (October 7, the subsequent protests, the university’s responses, the federal government’s interventions, the administration’s successive failures). The events could have been processed in various ways. They were processed as specific kinds of crisis through specific representational work performed by specific carrier groups.
Several carrier groups operated simultaneously. A pro-Palestinian carrier group constructed the events as instances of apartheid, genocide, and colonial violence requiring specific institutional responses. A pro-Israel carrier group constructed the events as instances of antisemitism, terrorism support, and civilizational threat requiring different institutional responses. A free-speech carrier group constructed the events as instances of institutional failure to protect discourse. A conservative political carrier group constructed the events as instances of left-wing capture of elite institutions requiring federal intervention. Each carrier group produced its own version of what the events meant, who the victims were, and who bore responsibility.
Gelman did not operate as a carrier group member for any of these constructions. He wrote analytically about the administration’s failures, the institutional problems that produced them, the broader patterns of university governance that made the specific responses predictable. His framework (executives operating without legislative or judicial constraints default to consequentialist cover-up) was analytical rather than constructive. It did not produce a sacred civic claim about what Columbia’s situation meant for collective identity. It produced a structural diagnosis that several carrier groups could use but that none of them could fully absorb.
Gelman did what the academic arena often does in civic crises: provide analytical frameworks that the carrier groups can deploy but that do not themselves constitute carrier group work. The frameworks have specific value. They can improve the quality of the carrier group constructions by providing better descriptions of the underlying mechanisms. They can expose the limits of specific constructions by identifying what the constructions miss. They can limit the excesses of constructions by providing disciplinary checks on their claims.
The frameworks cannot substitute for carrier group work. They cannot produce the civic-religious effects that carrier groups produce. They cannot reorganize symbolic classification systems. They cannot establish new sacred values or mobilize social control mechanisms. Gelman’s Columbia analysis is valuable as analysis. It does not operate as construction. The distinction matters because his readers sometimes treat the analytical work as if it were constructive work, which it is not. The analytical work informs the constructive work without performing it.
Gelman was born into a cross that his career has extended in specific directions. His secular Jewish formation, his MIT undergraduate training in the Carey-Chomsky cognitive science environment, his Harvard statistics PhD under Donald Rubin, his Columbia joint appointment in statistics and political science, and his family life all represent crossings that the heterosis frame reads with some precision.
The core intellectual crossing is between statistics and political science. These are populations with distinct co-adapted complexes. Statistics developed inside mathematics and the quantitative sciences, with a particular set of problems about inference from samples, a technical vocabulary, and conventions for what counts as a defensible claim. Political science developed inside the humanities-adjacent social sciences, with attention to history, institutions, and the interpretive problems that formal models cannot fully capture. Most scholars belong to one parent population or the other. Gelman sustains a working cross that produces hybrid vigor in specific domains. The Red State, Blue State work could not have emerged from either parent alone. Pure statisticians would not have seen the puzzle about how Republican voting at the individual level related to Democratic voting at the state level. Pure political scientists would not have had the technical apparatus to resolve the puzzle with the care Gelman brought. The hybrid produced traits neither parent could generate alone.
Bayesian Data Analysis shows hybrid vigor of a different kind. The textbook crosses the technical statistical tradition with pedagogical clarity aimed at applied researchers in fields that do not have Gelman’s training. The crossing works because the co-adapted complexes of each parent population complement rather than disrupt each other. Technical rigor survives the pedagogical translation. The translation gives the rigor a larger host population than purely technical statistics could reach. The book has become canonical partly because the hybrid produced something the parent populations separately could not.
The blog shows the crossings getting harder to sustain. Statistical Modeling, Causal Inference, and Social Science crosses academic statistics with public commentary, methodological policing with political argument, long-form analysis with humor and ridicule, and professional expertise with amateur participation in comment threads. Some of the crossings produce hybrid vigor. The blog caught methodological problems that the journals would not catch. The comment-thread format created a scholarly community with traits no pure journal or pure social media platform can reproduce. Other crossings produce outbreeding depression. The political commentary does not have the co-adapted complexes that the statistical work requires. The ridicule mode does not combine cleanly with the methodological seriousness mode. The Vermeule fascism episode showed the outbreeding depression pattern. Gelman was borrowing rhetorical moves from coalition political combat and trying to fit them onto methodological critique. The hybrid did not work. The political combat mode disrupted the statistical rigor mode, and the statistical rigor mode could not discipline the political combat mode into a stable form.
Gelman belongs to the secular Jewish academic liberal coalition that has dominated elite American intellectual life since mid-century. That coalition is itself a hybrid. It crosses Jewish intellectual traditions of argument and textual engagement, Enlightenment rationalism, American pragmatism, progressive political commitments, and the technical-scientific cultures of the research universities. The hybrid produced extraordinary vigor for about seventy years. The Lipset-Bell-Glazer-Trilling generation and their successors made the coalition productive in ways its component traditions separately could not have managed. The coalition has shown signs of outbreeding depression in recent decades. Progressive political commitments have begun to disrupt the co-adapted complexes of empirical rigor and open inquiry that the earlier generation maintained. The technical-scientific cultures have developed internal requirements around ideological conformity that the Jewish intellectual traditions of argumentative combat would have rejected. The coalition is thinner than it was. Gelman represents a relatively stable version of the earlier hybrid. The broader coalition cannot reliably produce his type anymore.
The statistical reform movement shows hybrid vigor of a particular kind. It crossed the replication crisis concerns of experimental psychology with the Bayesian statistics tradition, Uri Simonsohn’s forensic methods, open science advocacy, and the blogging ecosystem. The hybrid worked. The movement produced a discipline-wide correction to methodologically weak work that neither the journals nor the professional associations could produce on their own. Gelman’s role in the reform required exactly the crossings his career represents. A pure statistician could not have sustained the discipline-wide conversation. A pure political scientist would not have had the technical apparatus. A pure blogger could not have had the professional credibility. The hybrid was fit for the problem the environment presented.
The reform movement has now shown its outbreeding depression edge. The crossing of technical methodological critique with coalition political combat produced an uneven target distribution. Targets that threatened progressive preferences (power pose research, social priming, certain welfare-policy applications of behavioral economics) got sustained scrutiny. Targets that served progressive preferences received less. The co-adapted complex that would have disciplined consistent targeting requires a neutral methodological stance that the coalition commitments disrupt. The movement is still producing useful work. It is not producing the consistent discipline-wide correction the hybrid promised in its early phase.
The secular Jewish American academic coalition developed in specific civic conditions: dense urban Jewish communities, thick family structures, high investment in children’s education across generations, sustained engagement with Talmudic argumentative traditions that selected for specific cognitive traits. Those conditions produced a slow life history strategy calibrated to environments of relative safety and high returns to long investment. The household that produced Gelman, and that his Columbia career represents, is a successor to that environment. The conditions have thinned. The slow life history strategy depends on communal and institutional substrates that Putnam’s data show to be eroding. Gelman’s career shows what the substrate still permits. His graduate students and younger coauthors face weaker substrates. The frame predicts that the hybrid Gelman represents will be harder to reproduce in the next generation.
Right-coded critiques of Gelman treat his selective targeting as pure coalition maintenance. Left-coded defenses treat his methodological rigor as pure neutral science. Both miss the crossing. Gelman is a genuine hybrid. The methodological rigor is real and does discipline parts of his work even when the coalition commitments pull the other way. The coalition commitments are real and do filter which targets he selects for sustained attention even when the rigor is available to scrutinize him. The hybrid produces something neither parent population would produce alone. Better than pure coalition combat. Worse than fully neutral methodology. The instability is characteristic of the hybrid itself. It is not a failure of either parent tradition.
Gelman’s candor operates within coalition limits. He did not apply the same structural analysis to his fascism rhetoric against Vermeule. The hybrid vigor that produced the concession about Columbia’s leadership did not extend to concession about his own coalition’s rhetorical practices. The outbreeding depression edge of his coalition position prevented the crossing from reaching its full extension. The hybrid is real. Its limits are real. Both operate together, and the frame lets us see how the same man produces both.
The elite academic statistical establishment functions as a superorganism with specific homeostatic mechanisms. Peer review, citation patterns, grant approval, and hiring committees all maintain the set point of what counts as legitimate work. Gelman operates inside this superorganism while occasionally criticizing its dysfunctions. His hybrid position between pure statistics and public methodological commentary gives him leverage the superorganism’s homeostatic mechanisms cannot fully absorb. He pays costs for this leverage. He is not the president of the American Statistical Association. He is not the chair of Columbia’s statistics department. The superorganism tolerates him because his technical work is too strong to dismiss and absorbs his criticism by treating it as Gelman being Gelman rather than as identification of problems requiring structural response. The hybrid position buys him influence and costs him institutional power.
Gelman’s graduate students and his younger coauthors face a harder crossing than he did. The parent populations have diverged further. Statistics has become more technical and less connected to substantive social science. Political science has developed its own methodological subcultures that do not always engage statistical reform. The coalition that produced Gelman’s formation has thinned. The civic substrate Putnam measures as supporting the slow life history strategies his career depends on has thinned further. The hybrid he represents required specific conditions that are no longer reliably available. His successors will have to make crossings in a harder environment, or they will revert toward one of the parent populations without the other’s complement. The frame predicts the latter is more likely than the former. The Gelman type will become rarer. The coalition and the discipline will be worse for the loss without being able to name what they have lost, because the thing they will have lost is a hybrid whose traits neither parent population can produce alone.
Gelman’s professional operation runs almost entirely within buffered assumptions. Reality is accessible through measurement. Measurement is subject to known sources of error that can be characterized. Uncertainty is quantifiable. Knowledge advances through systematic accumulation of evidence properly analyzed. The enchanted cosmos that animated pre-modern understanding does not register in Gelman’s professional operation. Bayesian reasoning is the paradigm of buffered cognition. Prior beliefs update in response to observed data according to well-defined rules. The self doing the updating is insulated from external spiritual or metaphysical forces. The self updates because the self chose to adopt the Bayesian framework and continues to apply it across problems.
Gelman comes from a secular Jewish family background. His sister is a developmental psychologist. His uncle was a cartoonist. The family intellectual culture was secular rather than religious. This is the buffered American Jewish formation that Myers’s work engages at scholarly distance. Gelman grew up within it.
Gelman’s buffered orientation makes him valuable as a methodological critic. He can see what researchers who share his buffered orientation are doing wrong because he shares the orientation and has developed sharper tools for detecting buffered cognitive failures. The forking paths diagnosis is buffered analysis of buffered cognitive operations. The replication crisis is a crisis within buffered social science. Gelman is qualified to diagnose it because he operates natively within the framework where the crisis occurs. A porous observer would not be able to see the crisis as a crisis because porous cognition does not value the buffered standards the crisis is failing to meet.
His buffered orientation produces a limited reach. He can speak to other buffered researchers about buffered research practices. He cannot speak to porous populations about porous experiences because the framework within which he operates brackets exactly what porous populations experience as central to their lives. His political science research on Red State, Blue State voting patterns is excellent empirical work about buffered topics (income, geography, education, demographics) that predict voting. It cannot address what religious voters feel when they vote their religious commitments because those feelings are porous experiences that empirical correlates reduce and flatten.
The Gelman-Myers contrast is illuminating. David N. Myers operates on porous Jewish materials as a buffered scholar. The buffering is achieved through sustained professional formation within buffered academic Jewish studies. The porous phenomenology of Jewish liturgical and ritual life is what Myers’s buffered scholarship does not access in its full porous register. Myers knows this about his own condition and attempts to recover porous dimensions through buffered means (study, liturgy, ritual participation within modified buffered forms).
Gelman operates on buffered materials as a buffered scholar. His Jewish background is already buffered by the time he encounters it. His statistics training is buffered from its origin. His political science engagement is buffered through its empirical method. The productive aspects of his work follow from this. The limited aspects also follow from this. Gelman’s work is excellent within its scope. The scope is buffered. The scope does not extend to porous questions because porous questions do not arise within the framework his work operates within.
Most of the populations studied by political science are not buffered. The Red State voters Gelman has studied are substantially porous in their political orientation. The religious right votes from porous commitments that empirical analysis can measure but cannot understand from within. The working-class Democratic voters who defected to Trump voted from porous experiences of dignity, community, nation, and meaning that buffered empirical analysis misses when it reduces them to demographic correlates. Gelman’s excellent empirical work captures the statistical structure of the phenomena it studies. Gelman’s buffered orientation limits what his work can say about the phenomenological content of what it studies.
The social science replication crisis is a crisis of buffered cognitive operations within buffered institutional spaces. Researchers run analyses until they find patterns that meet buffered publication standards. The patterns do not replicate because the patterns were artifacts of the buffered analytical process. Gelman’s proposed reforms (pre-registration, better uncertainty quantification, Bayesian workflow) respond to buffered failures. The reforms improve buffered research quality. The reforms do not address whether buffered research is the right mode for studying phenomena that substantially involve porous content. If much of social science is attempting to measure porous phenomena through buffered methods, the buffered methods will continue to miss what matters about the phenomena even after the methodological reforms Gelman advocates.
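The mechanism in that second sentence, running analyses until one clears the bar, can be put in numbers. What follows is a minimal sketch in Python, not Gelman’s own code. It assumes the crudest version of the problem: a study with no true effect at all, a researcher who checks several independent outcome measures, and a report whenever any one of them reaches p < 0.05. The function name `false_positive_rate` and all of its parameters are hypothetical, chosen only for illustration. Gelman’s forking-paths argument is subtler than this, since it holds even when only one analysis is actually run, but the arithmetic of the simple case shows why the patterns fail to replicate.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(n_outcomes, n_per_group=20, n_studies=2000,
                        alpha=0.05, seed=0):
    """Share of pure-noise studies that yield at least one 'significant' result.

    Assumptions (illustrative only): two groups per study, no true effect,
    unit-variance normal outcomes, independent outcome measures, and a
    two-sided z-test with known variance on each outcome.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = (2 / n_per_group) ** 0.5  # std. error of a difference in means
    hits = 0
    for _study in range(n_studies):
        for _outcome in range(n_outcomes):
            treated = fmean([rng.gauss(0, 1) for _ in range(n_per_group)])
            control = fmean([rng.gauss(0, 1) for _ in range(n_per_group)])
            if abs(treated - control) / se_diff > z_crit:
                hits += 1
                break  # the researcher stops at the first publishable pattern
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, "outcomes checked:", false_positive_rate(k))
```

With one outcome the rate sits near the nominal 5 percent. With ten independent outcomes it should land near 1 - 0.95^10, about 40 percent, even though nothing real is there to find.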
The Set
Andrew Gelman (b. 1965) holds court at Columbia and at his blog Statistical Modeling, Causal Inference, and Social Science. The blog runs daily and the comment section serves as the salon. The regulars include Phil Price, Kaiser Fung, Bob Carpenter, Daniel Lakeland, Martha Smith, and a long roster of pseudonymous methodologists. Gelman’s circle extends to his collaborators (Jennifer Hill, Aleks Jakulin, Aki Vehtari, Michael Betancourt) and to his teacher Donald Rubin (b. 1943), whose causal inference framework supplies much of the technical grammar.
The replication-crisis crowd overlaps almost entirely with Gelman’s. John Ioannidis (b. 1965), Brian Nosek, Uri Simonsohn, Joseph Simmons, Leif Nelson (the three together being Data Colada), Simine Vazire, Daniel Lakens, E.J. Wagenmakers, Anna Dreber, Felix Schönbrodt, James Heathers, Nick Brown, Tim van der Zee, Elisabeth Bik (b. 1966), and Sander Greenland all share the same air. So do philosophers of statistics like Deborah Mayo and statistician-bloggers like Cosma Shalizi, Larry Wasserman, Frank Harrell, Stephen Senn, Christian Robert, Judea Pearl (b. 1936) at the edges, and Kosuke Imai. Nate Silver (b. 1978) sits at the journalistic perimeter. The political scientists Andrew Eggers, Macartan Humphreys, and Gary King overlap on the causal-inference side.
The set values calibration, transparency, replicability, technical skill at probability and inference, willingness to admit error, slowness of claim, smallness of effect. They want the published record to track the world. They want the standard error to mean what it says. They want preregistration, open data, open code, and they want famous findings checked rather than repeated.
Their hero system rewards the careful auditor. The man who catches the error in the famous paper is the saint. The man who runs the failed replication is the saint. The whistleblower who finds the fraud or the fatal error (Nick Brown on Barbara Fredrickson’s positivity ratio, Brown and Heathers on Brian Wansink, Bik on image duplication, Simonsohn on Lawrence Sanna) is the saint. The patient Bayesian modeler who builds the small honest model that beats the big flashy one is the saint. Gelman canonized this in his “garden of forking paths” essays and his repeated insistence that single studies establish almost nothing.
The anti-saints are easy to name. Brian Wansink, Amy Cuddy, Diederik Stapel, Marc Hauser, Daryl Bem, Satoshi Kanazawa, and the more cautious but still-suspect Roy Baumeister, John Bargh, and Susan Fiske. Famous studies on power posing, ego depletion, social priming, embodied cognition, and most of behavioral economics from the 2000s sit in the dock. Malcolm Gladwell sits in the dock. TED talks sit in the dock. The New York Times Tuesday science section sits in the dock, though Gelman often appears in it.
Status games run on errors found and frauds caught. The currency is the takedown post, the failed replication, the citation of your blog comment by a journalist, the moment Many Labs posts another null result, the moment Data Colada finds another data anomaly. Secondary currencies include the Stan model people fit, the technical paper on multilevel modeling, the textbook that teaches the next generation (Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models, Gelman and others’ Bayesian Data Analysis, Gelman, Hill, and Vehtari’s Regression and Other Stories), the methodological appendix that solves the puzzle nobody else solved. A man’s reputation rises with each prominent finding he kills.
A subtler status game runs on symmetry. The set prides itself on policing its own side. Gelman criticizes Democrats. Vazire criticizes psychology she likes. Heathers and Brown go after any author. The claim to symmetry is part of the brand. Whether the symmetry holds in practice is a separate question, and outside critics have pressed it.
The normative claims, stated and assumed, are these. Researchers owe the public honest reporting. P-hacking is a sin. Preregistration is a duty. Failed replications belong in the literature. Effect sizes shrink under scrutiny and that should be expected. Power calculations belong before the study, not after. The burden of proof rises with the surprise of the claim. The press should slow down. Tenure committees should reward rigor over splash. Critics deserve responses, not silence. Reviewers should ask for data.
The essentialist claims, stated and assumed, are also clear. There is a real distinction between good and bad statistical inference, and trained eyes can tell. There is a real world the data point at, and probability gives us partial access to it. Some findings are real and others are not, and the difference is discoverable. The replication crisis describes a real pattern in psychology, medicine, and parts of economics, not a moral panic. Bayesian reasoning, done with care, gets closer to truth than the null-hypothesis ritual it replaces, though frequentists in the set (Wasserman, Senn, Mayo at times) push back hard. Method is substance, not posture. Numeracy is a virtue and an aptitude, and it can be ranked.
A few features sit underneath all of this. The set is overwhelmingly male, heavily Jewish at the top, heavily academic, light on humanities, suspicious of qualitative work, fond of programming, fond of New York and Cambridge and Stanford and Amsterdam and Boston and the Bay Area. The men in it write fast, post often, joke dryly, treat blog comments as a serious form, and treat ad hominem as bad manners while practicing it freely against the named anti-saints. They share a low opinion of TED, Davos, Aspen, the Edge Foundation, and the celebrity-academic circuit, even as some of them brush against that circuit. They like Tukey, mid-period Fisher, Box, Cox, Rubin. They tolerate but do not love Pearl. They distrust most economists. They distrust most psychologists. They trust each other to find each other’s mistakes and to say so.
The binding glue of the set is a shared confidence that they will be told when they are wrong, by men they respect, and that being told is honor rather than insult. That is the air they breathe, and that air is rarer than they think.
The Voice
Andrew Gelman writes and talks the way a man thinks out loud at a whiteboard. The voice runs flat, plain, and fast. He distrusts grand phrasing and reaches for the small concrete example instead of the sweeping claim.
Start with his diction. He keeps the words short and the syntax loose. He says “this is wrong” and “I don’t buy it” and “I could be making a mistake here.” He avoids the inflated register of the academy. When a technical term shows up, he usually pauses to deflate it, to say what it means in kitchen English. He prefers the everyday noun to the Latinate one. He would rather say a study “doesn’t replicate” than dress the same point in the language of “robustness failure.” This gives the prose a deceptively casual feel. The casualness hides a lot of control.
His sentences favor the active voice and the present tense, which is part of why he reads as direct even when the argument runs long. He builds by accretion. He states a claim, qualifies it, doubles back, adds an aside in parentheses, then quotes someone at length and answers them line by line. The blog, Statistical Modeling, Causal Inference, and Social Science, runs on this rhythm. A post often opens mid-thought, as if you walked in on a conversation already going. He uses numbered lists, postscripts, updates stapled to the bottom, and a running cast of recurring examples. The form is digressive on purpose. He trusts the reader to follow a tangent and come back.
The rhetoric is deflationary. His signature move is to take a flashy published finding and shrink it. The beauty-and-sex-ratio paper, the ovulation-and-voting paper, the “himmicanes” paper, the Wansink food-lab work. He names the study, names the author, lays out the numbers, and shows why the effect cannot be what the headline says. He coins terms to carry these arguments and the terms stick. The garden of forking paths. The statistical significance filter. Type M and Type S errors, the errors of magnitude and sign. The piranha problem, his argument that a world cannot hold dozens of large independent effects all pushing on the same outcome. The kangaroo, his line about trying to weigh a feather on a bathroom scale while the feather rests in the pouch of a kangaroo that is jumping up and down, his way of saying a noisy instrument cannot catch a tiny effect. These phrases do real work. They let him make the same structural point about many different papers without sounding like he repeats himself.
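Two of those terms, the significance filter and the Type M and Type S errors, reduce to a short calculation. The sketch below is in Python and is loosely modeled on the design-analysis idea Gelman and Carlin describe. It is not their code. The function name `retrodesign` is borrowed for illustration, and the closed-form shortcut, the parameters, and the example numbers are assumptions of this sketch. It supposes an estimate that is normally distributed around a small positive true effect with a known standard error, and it asks what the estimates look like once only the statistically significant ones are kept.

```python
from statistics import NormalDist


def retrodesign(true_effect, std_error, alpha=0.05):
    """Return (power, type_s, type_m) for a normally distributed estimate.

    Assumptions (illustrative only): true_effect > 0, known std_error,
    and a two-sided test at level alpha acting as the publication filter.
    type_s: chance a significant estimate has the wrong sign.
    type_m: expected |significant estimate| divided by the true effect.
    """
    if true_effect <= 0 or std_error <= 0:
        raise ValueError("true_effect and std_error must be positive")
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # Standardized distances from the true effect to the two cutoffs.
    upper = z_crit - true_effect / std_error
    lower = -z_crit - true_effect / std_error
    p_right_sign = 1 - std.cdf(upper)   # significant, correct sign
    p_wrong_sign = std.cdf(lower)       # significant, wrong sign
    power = p_right_sign + p_wrong_sign
    type_s = p_wrong_sign / power
    # Mean absolute estimate over the significant region (truncated normal).
    abs_sum = (true_effect * (p_right_sign - p_wrong_sign)
               + std_error * (std.pdf(upper) + std.pdf(lower)))
    type_m = abs_sum / power / true_effect
    return power, type_s, type_m


if __name__ == "__main__":
    # A tiny effect measured with a noisy instrument: the feather and the kangaroo.
    power, type_s, type_m = retrodesign(true_effect=0.1, std_error=1.0)
    print(f"power={power:.3f}  type_s={type_s:.3f}  type_m={type_m:.1f}")
```

For a true effect one tenth the size of the standard error, the power is barely above the 5 percent false-alarm rate, more than a third of the significant results point the wrong way, and those that survive the filter overstate the effect many times over. That is the structural point the coined terms carry from paper to paper.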
He fights, but the affect stays cool. He calls bad work bad. He uses the word “fraud” when he means fraud and “incompetence” when he means that, and he draws the line between them. The tone never gets hot. He reports the flaw the way a man reports the weather. This flatness is part of the rhetoric. Heat would invite a fight about manners. The flat delivery keeps the fight on the numbers. He pairs this with steady self-correction. He admits his own past errors, he posts retractions of his own claims, he says “commenters caught me on this.” The humility is real and it also functions. It earns him the standing to be hard on others.
Now the spoken man. In talks and on podcasts he sounds like the blog read aloud, only more so. The speech runs fast and associative. He starts an example, interrupts himself to start a second one, circles, and lands the point a minute later than a tidier speaker would. He digresses into baseball, into a 1970s study he half-remembers, into something a student said yesterday. He is self-deprecating in a low-key way, quick to say “I don’t know” and “I might be wrong about this.” He does not perform authority. He underplays it. His slides, when he uses them, tend toward rough graphs and screenshots rather than polished decks, which fits a man who argues that the picture should carry the argument and the decoration should get out of the way.
What he is not, in speech or print, is a stylist of the polished sentence. He is no aphorist working a line until it gleams. The power comes from accumulation and from nerve, not from cadence. A reader who wants a clean essay with a single arc will find him shaggy. The shagginess is the cost of the method. He thinks in public, shows the false starts, and lets you watch him change his mind.
The through-line, written and spoken, is a war on false certainty. He hates the move from a noisy result to a confident story. Almost every tic serves that war. The deflating diction, the small examples, the coined terms, the flat tone, the self-correction, the willingness to name names. The manner is the argument. He sounds uncertain about himself and certain about the math, and he wants his audience to learn the same reflex.
