I’ll begin by pointing out that the statement made in the title of this post isn’t merely the crazed opinion of a lunatic blogger. It is in fact the subject of at least one peer-reviewed paper in the scientific literature: Why Most Published Research Findings Are False. Of course, if that published research finding turns out to be false, well … all bets are off!
I’m going to share some personal anecdotes of my experience in the area of peer review. But first, I need to give some background. I cringe whenever I see peer review and scientific consensus being used to support arguments in the media and on blogs. I cringe when I see “consensus” being used this way because scientific truth isn’t based on a vote. I cringe when I see “peer review” being used this way because I don’t think most people understand what “peer review” means. It doesn’t mean, for example, that something has been found to be correct or has been in any way proven true. It merely means that other scientists in the field, usually three, have looked at the work and found no problems with the methodology.
Several years back, I stumbled across an article titled The Truth Wears Off, by a guy named Jonah Lehrer at The New Yorker. That article addresses principally what is known as the “decline effect,” or the tendency for seemingly significant scientific results to decline or “wear off” over time. An excerpt from that article follows:
In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
At least in part, the decline effect is due to simple regression to the mean: the tendency for data outliers to be diluted when one looks at a larger population. For example, flipping a coin ten times might yield ten heads, an average of 100% heads, which is not what one expects. Flip the same coin a million times, however, and the result should be very close to 50:50 heads:tails. But regression to the mean alone isn’t enough to fully account for the decline effect. John Ioannidis, the author of Why Most Published Research Findings Are False, believes that “significance chasing” is the main problem: the practice of trying to force-fit marginal data into the 95% significance category by tweaking it, interpreting it differently, and so on. Numerous other factors, such as poor signal-to-noise ratios, publication bias (selectively publishing positive results), and simply unknown variables, all play a role.
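The coin-flip intuition is easy to demonstrate with a quick simulation. Here is a minimal Python sketch (the function name and seed are my own choices, purely for illustration) showing how small samples routinely produce extreme “findings” that melt away as the sample grows:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def heads_fraction(n_flips):
    """Fraction of heads observed in n_flips tosses of a fair coin."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# Small samples swing wildly; large samples regress toward the true 0.5.
for n in (10, 100, 1_000_000):
    print(f"{n:>9} flips: {heads_fraction(n):.3f} heads")
```

A ten-flip “experiment” can easily report 70% or 80% heads; the million-flip replication lands within a fraction of a percent of 50%. That gap is the decline effect in miniature.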
In reading The Truth Wears Off, one is reminded of Richard Feynman’s famous Cargo Cult speech. In a commencement address at Caltech in 1974, Feynman addressed the problem in one of his colorful anecdotes:
We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It’s a little bit off because he had the incorrect value for the viscosity of air. It’s interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan’s, and the next one’s a little bit bigger than that, and the next one’s a little bit bigger than that, until finally they settle down to a number which is higher.
Why didn’t they discover the new number was higher right away? It’s a thing that scientists are ashamed of–this history–because it’s apparent that people did things like this: When they got a number that was too high above Millikan’s, they thought something must be wrong–and they would look for and find a reason why something might be wrong. When they got a number close to Millikan’s value they didn’t look so hard. And so they eliminated the numbers that were too far off, and did other things like that. We’ve learned those tricks nowadays, and now we don’t have that kind of a disease.
I’ve linked to these various sources on the subject to demonstrate that precedent exists for what I am talking about here. This can be a touchy subject these days, as peer review and scientific consensus are most often invoked in arguments on global warming. A very vocal portion of the population is prepared to shut down any conversation that dares to question the current consensus view. Some overwhelming percentage of the peer-reviewed literature, we are told, supports that consensus. The “debate,” they say, is over. So, let’s talk a little about the gold standard of peer-reviewed scientific literature that is often touted in the media and on the internet, and what it really means in practice.
The peer-reviewed scientific literature is a little like the proverbial “needle in a haystack,” where the actual “good” papers are the needles. I would like to think I’ve contributed to the needles a little over the years. I know, however, that I’ve contributed to the hay.
On Microwaves and Clay …
My very first project as a graduate student in chemistry was something that didn’t pan out. It involved developing a “one-pot” tandem reaction sequence by combining one reaction known as a Claisen rearrangement and another reaction known as a radical cyclization. To pull this off, I constructed molecules that contained, on one end of each molecule, something known as an orthoester group, and on the other end, a radical site, usually a bromine atom or a phenylselenide group. The idea was, I would take the orthoester, combine it with another molecule for the rearrangement, and then add a few things to do the radical part. As it turned out, the Claisen rearrangements I was trying to do required a great deal of heat — enough heat to make the radical sites on my orthoesters unstable. So, when trying to do the Claisen rearrangement part of the tandem reaction sequence, my orthoesters just cyclized on themselves, producing useless products like butyrolactone. It was a little like turning gold into lead, or diamonds into black soot … not something one can really publish.
That initial project of mine was seen as a quick way to get a publication or two, in part because of that word “tandem.” Tandem reactions were, at the time, fairly trendy, and so if you announced in the title of your paper that it was a new tandem reaction, you had a good shot of getting it published. This was in the early 1990s, and there were other trendy buzzwords at the time that would virtually guarantee publication. One was “microwave,” and another was “clay catalysis.” So what do you suppose I did with my failed project? 😛
Remember how I said that the Claisen rearrangements required a great deal of heat? I started sticking my reactions in a microwave oven. And instead of using propionic acid to catalyze the Claisen rearrangements, I started using clay (zeolites and related). They still didn’t work, so I just switched to simpler orthoesters that didn’t have a radical group attached and did very simple Claisen rearrangements. These reactions were in no way novel, except that I was using a microwave to heat them rather than an oil bath or mantle, and I was using dirt to catalyze them rather than propionic acid. I produced a series of products in this manner by reacting the simple orthoesters with a variety of allylic alcohols. Then, to demonstrate utility, I took the isolated final products from these reactions and ran them through a series of very simple reactions to produce a small library of unsaturated bicyclic lactones. “Library,” by the way, was another trendy buzzword at the time. Score!
So I milked a couple of papers out of that otherwise failed project, with titles laden with trendy buzzwords that were sure to get them published … “Clay-Catalyzed Microwave Thermolysis … [blah, blah, blah].” But were they valuable papers? No. They were trash, and overall the chemistry sucked. To do reactions in a microwave oven, you generally place them in sealed tubes rather than flasks with reflux condensers, and so you can only react a tiny bit at a time. Even with those small amounts, though, the sealed tubes built up substantial pressure and blew from time to time. I destroyed three microwave ovens by the time I was done. And while clay might work as a suitable catalyst, the damn reactions were so horrifically hot that I doubt a catalyst was needed at all. Did I do a control run without a catalyst? Probably, but I’m not sure it would have mattered if I didn’t.
Something else that needs pointing out here is that, while there is such a thing as photochemistry, microwave reactions aren’t that. Photochemistry is done using ultraviolet light, which is on the other end of the electromagnetic spectrum. Ultraviolet light is capable of breaking chemical bonds, giving you a suntan, and causing cancer. Microwaves, which are on the low-energy end of the spectrum, do nothing more than jiggle atoms around a little. They produce simple heat, and that’s it. They don’t break bonds, give you a suntan, or cause cancer. Amusingly, these reactions sometimes aren’t even done using solvents that absorb microwave radiation. In such cases, the chemist needs to pack the sealed tube in vermiculite or some similar substance that absorbs microwaves, and the heat is just transferred through the glass walls of the vessel as would be the case using conventional heating methods. So the difference between using microwaves and using an oil bath or heating mantle for my reactions is a little like opting to use a pear-shaped flask rather than a round-bottomed flask. It’s really of little consequence.
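The energy gap between ultraviolet and microwave photons is easy to quantify with the Planck relation, E = hc/λ. The sketch below uses 300 nm for UV and the 12.2 cm wavelength of a 2.45 GHz kitchen microwave as typical illustrative values (they are my choices, not figures from the original reactions):

```python
import math

h = 6.626e-34    # Planck constant, J*s
c = 2.998e8      # speed of light, m/s
N_A = 6.022e23   # Avogadro's number, 1/mol

def photon_energy_kj_per_mol(wavelength_m):
    """Energy of one mole of photons at the given wavelength, in kJ/mol."""
    return h * c / wavelength_m * N_A / 1000

uv = photon_energy_kj_per_mol(300e-9)  # near-UV photon
mw = photon_energy_kj_per_mol(0.122)   # 2.45 GHz microwave photon
print(f"UV photon:        {uv:.0f} kJ/mol")     # comparable to a C-C bond (~350 kJ/mol)
print(f"Microwave photon: {mw:.1e} kJ/mol")     # roughly 400,000 times weaker
```

A mole of 300 nm photons carries about as much energy as it takes to break a mole of carbon–carbon bonds; a mole of microwave photons carries five to six orders of magnitude less. Hence microwaves heat, but cannot do photochemistry.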
How does shit like that get published? Easy. The reviewers were, by and large, guys who were doing the same shitty chemistry in microwave ovens. They knew damn well their papers weren’t any better than mine, and you know … what goes around comes around! Of course, reviewers are anonymous unless they opt to tell you who they are, but you can often guess, based on their comments and any concerns they may raise.
It’s like that. Trendy shit gets published, regardless of whether it’s good or not. Supercritical solvents (e.g. water, carbon dioxide) came and went as a trend. “Green chemistry” has been a trend for well over a decade now — so long, in fact, that it has spun off a number of sub-trends, and some of those are quite wacky, such as ionic liquids. Doing organic chemistry in ionic liquids is considered “green,” but I have no idea why. They are often toxic as Hell, and switching from a conventional solvent to an ionic liquid usually seems to make a reaction about a thousand times less green! On top of that, they aren’t even liquids. When you use an “ionic liquid” as a solvent, what you are really doing is boiling your compound in a fucking God-awful molten solid. These aren’t the kinds of things that you’re gonna be running through pipes in a plant. But what the Hell … papers with “Ionic Liquid” in the titles get published, and that’s what counts, right? Ka-Ching!
Some of my papers after that initial failed project were probably reasonable. There are a few in particular that I think were very good and contributed a little to the “needles” of that haystack. But quite a lot of others, I think, were “me too” papers — papers that don’t truly contribute anything useful, but are often reviewed by other people doing the same lame chemistry. Maybe you take the basic idea of an existing enantioselective catalyst, and you modify it a little, adding some novel but irrelevant component. It’s like taking the iPhone and making a “me too” clone: The original was innovative, the “me too” phone with the ripped off swipe screen and such … not so much.
A Magical Drop of Water …
At least the reactions I published mostly worked. One of the toughest parts of being a graduate student in organic chemistry was all the time wasted trying to reproduce the chemistry in the peer-reviewed literature. There are a lot of really great reactions in the literature, of course — Suzuki reactions, Sharpless epoxidations, Swern and Dess-Martin oxidations, Claisen rearrangements, the Diels-Alder cycloaddition, etc. But these are the “needles” of the haystack. An alarming percentage of published reactions either don’t work at all or simply do not perform as advertised. I remember once I wanted to cleave a t-butyldiphenylsilyl group from a molecule in the presence of a t-butyldimethylsilyl group. Normally, the dimethyl version would be the first to cleave, the diphenyl one being the more stable. There was exactly one published method to do this reaction in the scientific literature at the time. The paper called for sodium hydride in strictly anhydrous HMPA (hexamethylphosphoramide — pretty nasty stuff, actually). I struggled with that reaction for weeks attempting to get something better than a 5% yield. Eventually, I figured out that if you add a drop of water, the reaction works like a charm. In other words, the HMPA wasn’t anhydrous at all, contrary to the description in the published procedure. As near as I could figure, the active reactant in the reaction wasn’t a hydride ion (from NaH) at all, but rather a “naked” hydroxide ion, generated in situ by the reaction between NaH and water. [This is an almost legendary anecdote in organic chemistry — published reactions that don’t work until someone discovers that adding a drop of water fixes them. I’ve heard the same story about other reactions in addition to this one of my own experience.]
The Peer Review Process …
When a paper in chemistry is submitted for publication, the editor of the journal typically passes it on to three others for review. Those three will have published in the same area, but they might have different specialties. For example, if you submit a paper on doing Claisen rearrangements in microwave ovens, one reviewer may be well published in microwave chemistry, and another may be well published in Claisen rearrangements. The review process generally does not involve replication of any of the work. The reviewer can, if he wishes, try to replicate the work, but who has time for that? Most professors feel put upon as it is when asked to do anything that takes them away from their own research and publishing. So what happens is this: The professor first makes sure that the paper doesn’t cast any doubt on his or her own work. If it does, he will scrutinize it ruthlessly, and it will be a cold day in Hell when it finally gets published. After that, he will consider whether it’s a good paper and has anything he can use as a scoop. He is among the first to see the work, after all, and can get a lead time of several months on the competition if there is anything valuable in the paper. If the paper is from a well-respected chemist in the field, the reviewer will likely be gentle so as not to offend the author, should the identity of the reviewers become known. If, on the other hand, the paper is a “me too” paper, or the author is a relative unknown, the professor will probably just pass it on to one of his graduate students to review. The first time a graduate student is given a paper to review, he’ll often feel proud to be given such a responsibility and will do a thorough job. After that first time, though … “Fuck it! I’ve got a lab to teach and a reaction to do!”
Monks and Climate Scientists …
That’s my experience, anyway. Getting back to climate change and the often-touted peer review process and golden consensus of climate science: Am I really expected to believe that they are any different from the rest of us? Profs in climate science never punt bullshit papers down to their graduate students to review? They never seek to block from publication papers that contradict their own work? They never try to scoop one another? They never load crap papers down with trendy jargon, because climate science journals are somehow above the trendiness seen in other disciplines? Their data are so free of noise and error that everything works and nothing is ever wrong? In short, am I really expected to believe their shit don’t stink? Yeah, right.
Religious scholars have their own peer review process as well, you know. My guess is, somewhere in the world right now, a monk is probably reviewing a paper. What do you suppose the odds are of that paper passing peer review if the paper’s conclusion is that Jesus never existed? I’d say probably zero, because the consensus view among Christian scholars is likely that Jesus existed, and you can’t go against the consensus, can you?
The Consensus on Dark Matter …
I don’t want to leave the impression here that I don’t feel scientific consensus is valuable or that I think the peer-reviewed scientific literature is useless. On the contrary, I consider both valuable. But I also think they have their place and that they shouldn’t be slung about as they are by pundits and media outlets bent on pushing their causes. Take the case of dark matter as an example. I find the whole subject of dark matter to be very “witchy.” Were it something that was pushed only by a tiny handful of physicists, I would likely dismiss them as cranks and move on. Given the general consensus on the existence of dark matter, however, it’s rather tough to do that. One really can’t brush it off so readily.
At the same time, I’m not going to believe in dark matter simply because I’m told that a consensus of scientists believe in it. That is the way we are treated with the subject of climate change though, isn’t it? “It’s the consensus! Shut up! Move on!”
No. I believe in dark matter because it makes sense. I understand the basics of how, for example, the rotation of stars about a galaxy’s core is determined. I might lack the expertise (and equipment) to go out and measure galactic rotation myself, but I understand the basic principles. And I understand the basics of gravity and how that works, and I can see that the measured rotations of galaxies don’t seem to match the amount of matter that is observed to comprise galaxies. I can see that there is a problem there and that something is off and that unseen mass is a likely culprit. It seems that this unseen mass of galaxies has to be composed of something we don’t understand: If it were simple gases or other common matter that interacts with light, we should be able to detect it, but we don’t. So yeah … a consensus of scientists believe that dark matter exists and there are rational reasons to believe that dark matter or something equally spooky has to exist to make sense of things. I believe in dark matter, in other words, because I am a human with a brain and capable of reason.
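The mismatch described above can be made concrete with a back-of-the-envelope calculation. If essentially all of a galaxy’s visible mass sits inside an orbit of radius r, Newtonian gravity predicts a circular speed v = sqrt(GM/r) that falls off with distance; measured rotation curves instead stay roughly flat out to large radii, which implies the enclosed mass keeps growing with r. The sketch below uses a rough, illustrative visible-mass figure, not a measurement:

```python
import math

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
M_VISIBLE = 2e41  # rough visible mass of a large spiral galaxy, kg (illustrative)
kpc = 3.086e19    # one kiloparsec in metres

def keplerian_speed(r_m, mass_kg=M_VISIBLE):
    """Circular orbital speed if all mass lies inside radius r: v = sqrt(GM/r)."""
    return math.sqrt(G * mass_kg / r_m)

# Beyond the visible disc, v should fall as 1/sqrt(r) ...
for r_kpc in (10, 20, 40):
    v = keplerian_speed(r_kpc * kpc)
    print(f"{r_kpc} kpc: {v / 1000:.0f} km/s")
# ... but observed rotation curves stay roughly flat (around 200 km/s for a
# galaxy like ours), so the enclosed mass must grow with r: unseen matter.
```

Quadrupling the radius should halve the orbital speed; observations show no such drop, and that discrepancy is the rotation-curve argument for dark matter in a nutshell.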
But don’t push it! Dark matter is still kind of dodgy to me, and I’ll be more than happy to shift my acceptance of it if given half a chance. If you want me to fully embrace the idea, show it to me. Give me a chunk of it. Tell me something about its measurable properties other than gravity, and the techniques you use or plan to use to measure those properties. Explain to me why we don’t have chunks of dark matter zipping through our solar system and our bodies all the time. Do those things, and then we can talk about that time-share in Florida you want to sell me!
Some additional resources
- In this post, I referenced Richard Feynman’s Cargo Cult speech at Caltech. He also told the anecdote of Millikan’s oil drop experiment in Surely You’re Joking, Mr. Feynman! I don’t think an audio recording of the commencement address exists, but I’m sure I’ve heard Feynman tell that story in a lecture. I’ve been looking for it for days so that I can provide a link, because it’s so much cooler to hear the story told in Feynman’s own voice. Sadly, I haven’t been able to locate it.
- There is a follow-up article to The Truth Wears Off called More Thoughts On The Decline Effect. The follow-up is a good read as well, but it’s not nearly as awesome as the original article.
- Not discussed in this post but worth mentioning: A lot of trash science seems to pass peer review merely because it is attached, however loosely, to prestigious institutions such as NASA. Two notable examples of trash science tied to NASA are the alleged fossils in a Martian meteorite, and a purportedly alien microbe that was falsely claimed to incorporate arsenic rather than phosphorus in its DNA.