In October 2018, a prestigious scientific journal published a series of articles that jolted the medical world. Scientists from the United States reported, on the basis of an extensive study, that people over 70 who take aspirin regularly as a preventive measure are not at lower risk of developing heart and vascular disease, but are in fact at higher risk of dying from cancer. Until then, physicians had urged healthy older people to take aspirin to prevent myocardial infarction (a heart attack), on the basis of studies that also claimed that, in addition to its beneficial effect on the heart, aspirin could reduce the risk of dying from cancer. Now all that was being tossed out the window.
Here in Israel, the study deeply troubled Benjamin Mozes, an internist who regularly prescribed aspirin for his patients. How, he wondered, could a medical fact that just a year earlier had been a rock-solid truth be suddenly upended? And what was he supposed to do now: alter the treatment? And how could he explain the flip-flop to his patients without undermining their trust in him and in medicine as such?
Dr. Mozes, 69, was a senior physician in the internal-medicine ward at Sheba Medical Center in Ramat Gan, the head of the quality of health services unit at the Gertner Institute for Health Policy, the scientific director of the Israel National Institute for Health Policy and Health Services Research, and an associate professor of clinical epidemiology at Columbia University in New York. Currently he is devoting his time to community medicine.
Having been active in the medical system for 40 years, and having also studied it academically in depth, Mozes often finds cause for fierce criticism of the profession, some of which he has published in the Haaretz Hebrew edition. He asked himself whether the aspirin affair was merely a symptom of a larger problem in clinical research overall, and decided to examine the question in depth. His diagnosis now appears in the form of a book, “Requiem for Aspirin” (published, in Hebrew, by the Magnes Press). The patient, it turns out, is not in great condition, though the condition is not incurable.
“I broke down the methodology of clinical trials into its constituent principles, and I examined how they are applied in practice,” he explains. “I wanted to understand whether the clinical trial is truly the ultimate tool that brings us closer to scientific truth. I started with aspirin and went on from there.”
We’ll start with aspirin, too. What caused the dramatic difference between the 2018 study and earlier studies?
Mozes: “That study could be described as the ‘perfect trial’: The researchers worked without limits of time or budget and strictly applied all the principles of controlled clinical research. For five years they monitored a large number of participants, 19,000, who were divided randomly into two groups. One group received aspirin and the other served as a control. In all the other parameters (age, sex, socioeconomic class, health, smoking habits, etc.) the resemblance between the groups was exemplary, and the results were decisive. They weren’t borderline and they weren’t dependent on statistical tricks.
“The problem is that this successful trial is the exception that proves the rule. In practice, trials of this sort are hardly ever conducted. All the previous studies on aspirin included only small numbers of participants. The differences between the two groups at the outset were too large, the monitoring periods were too short, and in some of the studies the results were not statistically significant. In some studies, the advantage that aspirin offered the subjects was offset by a tendency to bleed, in some cases to the point of threatening their lives. Yet despite all this, the medical community arrived at a consensus that aspirin is a good thing, both for the heart and against cancer.”
How could that happen?
“It turns out that the controlled clinical trial is not the ultimate path for bridging basic science and medical practice. The protocol of the clinical trial as we know it has gradually become more sophisticated, and today, 73 years after the first randomized clinical trial, which examined the effectiveness of the antibiotic streptomycin for tuberculosis patients, its principles are agreed upon. For example: that two groups need to be compared, one that receives the treatment being tested and another that acts as a control; that the participants in the control group need to receive a placebo, a dummy treatment that looks like the new treatment but actually isn’t; that participants must be allocated to the groups randomly; and so forth.
“On the face of it, these are principles carved in stone. A defensive shield against empty allegations, amateurish determinations and dubious esoterica. In practice, I found that clinical trials are replete with flaws and that their execution entails compromises that endanger the reliability of the conclusions that are derived from them.”
Give me an example of a flaw.
“Take, for example, the double-blind principle, which states that both the participants in the trial and the physicians who are conducting it are supposed to be ‘blind’ – that is, they don’t know which participants are in the treatment group and which are in the control group. This is accomplished by giving a placebo to the control group, in order to simulate a situation in which they are receiving the same treatment. The point is to differentiate between the effect of the treatment itself and the psychological effect of the anticipation of success, on the part of both physicians and patients.
“Yet, despite the declared attempts to conduct the trials under double-blind conditions, in many cases a considerable number of the patients and the physicians are able to discern whether they received an experimental medication or a placebo. That happens, for example, when the medication has an overt effect, such as in testing a medication that is supposed to affect the heart rate [a participant whose heart rate has changed can surmise that he is in the treatment group]. Sometimes the side effects of the treatment give it away, for example, dryness in the mouth. Failures of that sort to create blindness can have a critical effect on the results of the trial.”
But what can be done? If a medication has side effects, you can’t hide them.
“Every trial needs to examine, by means of questionnaires for both the subjects and the researchers, how ‘blind’ they actually remained, that is, whether they managed to guess who was in which group. The rate of correct guesses needs to be factored into the results. That is only rarely done.”
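One simple way to run the kind of check Mozes describes, sketched here under our own assumptions rather than as any standard trial protocol: ask each participant which group they think they were in, count the correct guesses, and ask how likely that many correct guesses would be if everyone were simply guessing 50/50. The helper names and the questionnaire numbers below are hypothetical; more refined blinding indices, which also handle “don’t know” answers, exist in the literature.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k correct
    guesses out of n if each guess is right with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def blinding_check(correct: int, total: int) -> tuple[float, float]:
    """Hypothetical helper: return the share of correct guesses and the
    probability of doing at least that well by pure 50/50 guessing."""
    return correct / total, binomial_upper_tail(correct, total)


# Hypothetical questionnaire: 200 participants asked "Were you on the drug or the placebo?"
rate, p_luck = blinding_check(correct=130, total=200)
print(f"Correct guesses: {rate:.0%}; probability of this by luck alone: {p_luck:.6f}")
```

A correct-guess rate well above 50 percent, as in this made-up example, would suggest that blinding had partly failed and that the trial’s results should be read with that in mind.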
The 5 percent chase
The list of shortcomings Mozes finds in clinical trials goes on. A major one has to do with determining when a result is significant, in other words, when it attests to a real phenomenon, and when it is merely a matter of chance. The idea that it is necessary to determine whether the difference found in a trial between the treatment and control groups is significant was conceived by the British statistician Ronald Fisher (1890-1962). He suggested using a statistical test whose result would serve as the gauge of significance: what he called the P (probability) value. Roughly speaking, the P value is the probability of obtaining a difference at least as large as the one observed if the treatment actually had no effect; the lower the value, the harder it is to attribute the result to chance alone. Fisher set a threshold of 5 percent: results below it are considered significant, while results above it are treated as possibly accidental. On the face of it, a gatekeeper of this sort is an excellent idea, one that safeguards science. It disqualifies chance findings and admits only those that are well-founded. But this gatekeeper has become a tyrant.
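To make the idea behind the P value concrete, here is a minimal sketch of one way to compute it, a permutation test; it is not necessarily the test Fisher or any particular trial used. If the treatment does nothing, the group labels are arbitrary, so the sketch reshuffles them many times and counts how often chance alone produces a difference in averages at least as large as the one actually observed. The function name and the symptom scores are invented for illustration.

```python
import random
from statistics import fmean


def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Estimates the probability of a difference at least as large as the
    observed one if the group labels were meaningless."""
    rng = random.Random(seed)
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign participants to the two groups at random
        diff = abs(fmean(pooled[:n_t]) - fmean(pooled[n_t:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts includes the observed allocation itself
    return (extreme + 1) / (n_perm + 1)


# Hypothetical symptom scores (lower is better)
drug = [3, 4, 2, 5, 3, 4, 2, 3, 4, 3]
placebo = [5, 6, 4, 5, 7, 5, 4, 6, 5, 6]
p = permutation_p_value(drug, placebo)
print(f"P = {p:.4f}; below Fisher's 5 percent threshold: {p < 0.05}")
```

Note that the P value only speaks to chance; it says nothing about whether the difference is large enough to matter, which is the point Mozes returns to below.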
“Over time,” Mozes says, “this value, which was set completely arbitrarily, acquired almost holy status. It became an object of craving, a goal to be reached instead of a means to protect the reliability of the research. The P value became the arbiter of fate in clinical research: a favorable verdict, meaning publication in a scientific journal, if the study crossed the threshold; an unfavorable one, the wastebasket, if it failed. That leads researchers to juggle the data in order to reach the coveted 5 percent target. It’s done by what’s known in the jargon as ‘cherry-picking’: deliberately choosing the desirable data and ignoring data that slightly interfere with the statistics.
“These manipulations can affect research dramatically. For example, when a company that wanted to market a medication for fibromyalgia obtained results that did not cross the 5 percent threshold, the researchers decided to drop the data from one particular group, depressed individuals. After they also revised one of the criteria for evaluating the medication’s success, they managed to cross the threshold. They won the game of ‘crossing the line,’ but in real life they caused a great injustice. The medication did not relieve the patients’ suffering, but it did cause serious side effects, in some cases life-threatening ones.”
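Why does this kind of data juggling work? A minimal simulation shows the mechanism, under simplifying assumptions of our own: normally distributed outcomes with a known spread, and several independent outcome measures (real trial outcomes are usually correlated, and dropping subgroups is a different trick with the same effect). The simulated drug does nothing at all, yet a researcher who checks several outcomes and reports whichever one crosses the 5 percent line will “succeed” far more often than 5 percent of the time. With five independent looks, the expected rate is 1 - 0.95**5, roughly 23 percent. The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def two_sample_p(a, b, sd=1.0):
    """Two-sided z-test p-value for a difference in means, assuming a known SD."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=30, n_sims=2000, seed=1):
    """Share of simulated trials of a useless drug in which at least one of
    n_outcomes measures crosses p < 0.05 and could be reported as a success."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        for _ in range(n_outcomes):
            # Drug and placebo data come from the same distribution: no real effect
            drug = [rng.gauss(0, 1) for _ in range(n_per_group)]
            placebo = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sample_p(drug, placebo) < 0.05:
                hits += 1
                break  # one "significant" outcome is enough to publish
    return hits / n_sims


for k in (1, 5):
    print(f"{k} outcome(s) checked: {false_positive_rate(k):.1%} of null trials 'succeed'")
```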
Manipulations can reflect the economic interests of a pharmaceuticals company, but in most cases they stem from the researchers’ eagerness to publish. “Some physicians grasped that in order to earn a great deal of money it’s necessary to acquire the title of professor, to which end they need to publish articles. So they do these tricks. They go to a statistician and say, ‘Get rid of some data so that we’ll get a value of 5 percent.’ Because journals publish almost only articles [about studies] that meet the threshold criterion – and because publication of articles and career are closely intertwined – a conflict is created between the researcher’s incentive to achieve results that cross Fisher’s threshold and the need to analyze the results reliably.”
It’s grim, but maybe it’s an innate flaw, not in the method but in human nature. When researchers face a limitation, they try to bypass it. Drivers who slow down to look at an accident in the opposite lane create a traffic jam. It’s infuriating, because it could seemingly be prevented, but in practice it can’t.
“The fact is, it wasn’t always like this. Fifty years ago, the threshold value was treated as being far less central. It’s only in recent years that it’s become a superstar.”
How did it arrive at that status?
“People want clear-cut answers: yes or no, it works or it doesn’t work. Paradoxically, today, when information is abundant and so accessible, uncertainty looms large and people are looking for strong filters. The American Statistical Association issued a public statement that was strongly critical of Fisher’s statistical decision mechanism; it said that reducing data analysis to mechanical rules in order to justify a scientific claim is highly problematic. The search for truth requires far more than a binary label, and a lone index is no substitute for reasoning and judgment.
“The results of a clinical trial need to be weighed from multiple angles, of which statistical significance is only one, and not the most important. It’s important, say, to understand the biological basis and also to take into account the size of the effect. For example, scientists tested a new medication on patients with pancreatic cancer and found that it prolonged their lives compared with the accepted treatment. The result was statistically significant, and they published it in a respected journal as a great achievement. But the patients who received the new medication lived 10 days longer than those who did not receive it. That’s all. What does that say? Nothing of medical or scientific importance.”
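The gap between significance and importance can be shown with back-of-the-envelope arithmetic. In this sketch the numbers are hypothetical, and a simple z-test on mean survival stands in for the dedicated survival methods (such as the log-rank test) that real cancer trials use. The benefit is always 10 days and the spread in survival is assumed to be 180 days; only the number of patients changes, yet that alone decides whether the result crosses the 5 percent line.

```python
from statistics import NormalDist


def z_test_p(mean_difference: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a known, common standard deviation."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: a fixed 10-day survival gain, standard deviation of 180 days
for n in (300, 3000):
    p = z_test_p(mean_difference=10, sd=180, n_per_group=n)
    print(f"{n} patients per arm: P = {p:.3f}, significant: {p < 0.05}")
```

The gain is the same 10 days in both rows; crossing the threshold says nothing about whether 10 days matters to a patient.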
There’s a saying that when a trial works, you don’t need statistics.
“There are trials whose result is a ‘bullet between the eyes.’ The trial that examined Pfizer’s coronavirus vaccine was like that. Its effectiveness was proven in an exemplary trial that meticulously followed the principles of controlled clinical trials, and the results were so unequivocal in the short term that no statistical examination seemed to be necessary. In all the breakthroughs that transformed the face of medicine, the effect of the new treatment was never in doubt: for example, when penicillin was discovered, when insulin was first administered and when defibrillators, which deliver a direct-current electric shock, came into use for life-threatening disturbances of the heart rhythm.
“It can be said that when the results are so clear and dramatic, it’s almost immoral to conduct a controlled clinical trial. Why deprive the control group of the treatment? But the number of breakthroughs is declining, whereas the urge to publish is growing. The result is that more and more small, pitiable research studies are being conducted, with marginal results.”
Herein lies another sensitive point. The tool that is supposed to create order in research and draw us closer to the truth is the meta-analysis, a study that surveys all the research done on a subject over the years and combines the results. This appears to be a powerful tool, one that makes it possible to pool all the findings and extract from them more general and more valid conclusions than any single study can provide. In fact, that is an illusion.
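The basic arithmetic of pooling can be sketched with a fixed-effect, inverse-variance meta-analysis, one common method and not necessarily the one used in any study discussed here. Each study is weighted by the inverse of its squared standard error, so the pooled estimate looks more precise the more studies are added. The hypothetical numbers below also illustrate the catch Mozes describes: if every small study shares the same bias, pooling shrinks the uncertainty but leaves the bias intact.

```python
from math import sqrt


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted average of study estimates and its standard error."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical: the true effect is 0, but ten small, flawed studies each
# overstate it by about 0.3 (the same bias in every study)
estimates = [0.25, 0.35, 0.30, 0.28, 0.32, 0.31, 0.27, 0.33, 0.29, 0.30]
ses = [0.20] * 10
effect, se = fixed_effect_pool(estimates, ses)
low, high = effect - 1.96 * se, effect + 1.96 * se
print(f"Pooled effect {effect:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

Each study on its own has an interval that includes zero; pooled, they appear to establish an effect that, in this made-up scenario, does not exist.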
“The meta-analysis has acquired superstar status, as a tool that can settle questions and clarify the truth,” Mozes says. “There is nothing more deceptive. The raw material of the meta-analysis is a patchwork of clinical studies that have similarities but also a great many differences, and it’s doubtful that they can be grouped together in order to draw a conclusion from them. In addition, when there is an inflation of articles, some of which present scientifically flawed studies, their results seep into the meta-analysis and distort it.
“For example, many articles report small-scale studies, but combining the results of small studies is not a substitute for one large-scale study, because small-scale studies are often a dubious proposition. Let’s say that a trial found a difference between the group that was given the treatment being tested and the control group. The question always asked is whether that difference stems from the treatment or is connected to some extraneous factor (perhaps the treatment group happened to include more vegetarians, in a manner that affects the result). If the trial is a large-scale one, that is, if the groups include a large number of participants, the irrelevant factor, the ‘noise,’ tends to be divided roughly equally between them, so that it does not affect the difference between them.
“In a small trial, the ‘noise’ can have a far greater effect on only one of the groups. In other words, a meta-analysis cannot transform something inferior into something superb, and that is why Alvan Feinstein, who is considered the father of clinical epidemiology, called meta-analysis ‘the statistical alchemy of the 21st century’ – in other words, a foolish attempt to transform a heterogeneous mixture of cheap elements into gold.”
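The point about “noise” in small trials can be illustrated with a quick simulation, with all numbers hypothetical: randomly split participants into two arms, where 30 percent of the population has some extraneous trait such as being vegetarian, and measure how far apart the two arms end up on that trait. The typical imbalance shrinks roughly in proportion to the square root of the trial size. The function name is our own.

```python
import random


def mean_imbalance(n_participants, trait_share=0.3, n_sims=500, seed=2):
    """Average gap, in percentage points, between two randomly allocated arms
    in the share of participants who have an extraneous trait."""
    rng = random.Random(seed)
    half = n_participants // 2
    total_gap = 0.0
    for _ in range(n_sims):
        # Each participant has the trait with probability trait_share
        people = [rng.random() < trait_share for _ in range(n_participants)]
        rng.shuffle(people)  # random allocation to the two arms
        arm_a, arm_b = people[:half], people[half:]
        total_gap += abs(sum(arm_a) / len(arm_a) - sum(arm_b) / len(arm_b))
    return 100 * total_gap / n_sims


for n in (20, 200, 2000):
    print(f"{n} participants: average imbalance {mean_imbalance(n):.1f} points")
```

In a 20-person trial, a gap of well over ten percentage points in the trait is typical, enough to masquerade as a treatment effect if the trait influences the outcome.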
Is it possible that our expectations of medicine are exaggerated? After all, in addition to the human factor that sometimes spoils things, clinical research has a deep innate problem: the tremendous differences among trial participants. So fundamental is the difference between clinical research and biological research that perhaps the question to ask is whether medicine is actually a science in the same sense that biology is one. Whereas a biologist examines the effect a certain substance has on groups of completely identical mice, identical both genetically (hence they are known as a “pure breed”) and in their growth environment (the laboratory), the research physician examines people between whom vast differences exist. Each has a different genetic makeup, a different past, a distinctive way of life and their own mental baggage; besides which there are differences between men and women, and so on.
All these variables give rise to differences in the response to treatment. In this sense, medical research more closely resembles research in the social sciences, because of the immense complexity of the subjects. With so many factors influencing the results, it’s very difficult to arrive at “scientific” conclusions.
“I agree that there is a great difference between people,” Mozes says. “Accordingly, clinical research is a more complex field. In spite of that, I think it is possible to conduct good clinical research if one sees to it that the number of participants is very large, and that they are divided randomly into groups. The large numbers can even out the variation between individuals and make clinical research more closely resemble biological research.”
You certainly realize that an article like this one will be put to good use by the “alternativists,” those who claim that all medical research is tainted by vested interests and that conventional physicians don’t know much – one time they say this, the next time they say something else.
The more trust in medicine is eroded, the more influence is gained by those who claim to offer an alternative. But even with all the problems in clinical research, it’s still the best there is.
“Exactly. I have a great deal of criticism of the system, but science still has an irreplaceable power, and I haven’t yet given up hope of changing and improving things. A person should voice criticism only if he has an alternative course of action, if he can offer a realistic solution, even if it is not immediate.”
What is the solution that you are offering?
“As I said, what led to the multiplicity of small, constrained and inferior studies in the clinical realm is the eagerness to publish articles. That is the source of the evil. And what drives that eagerness is competition over promotions, money and prizes. The interdependence between researchers’ academic and professional advancement and the number of articles they publish is unprecedented today. Fortunately, there is a large community that is talking about the need to break that link, so I am optimistic that change will occur.
“The subject of prizes is also in need of change, and it’s quite ridiculous if you think about it. After all, most significant scientific achievements are the result of the contributions of many researchers, not of a lone scientist. In the case of AIDS, for example, a large number of people contributed to the identification of the virus that causes the disease, to the development of the test that detects it in the blood and to the development of the medications that treat it. All of them share the credit for saving the lives of millions of people today. Yet only two scientists received the Nobel Prize for it. Prizes can constitute a certain incentive, but they can also generate ruthless competition, in which the end justifies the means. In any event, they are not what encourages creativity, and there is evidence that they are often harmful to it.”
What is the alternative?
“Most research grants today are given to scientists who present a specific project that sounds well-grounded and promising. The grants are renewed according to the progress made in the project, and this impels researchers to prove success even at the expense of quality. But there is another model, in which grants are awarded not to a particular project but to a specific scientist who displays talent and creativity. Under this model, which is applied, for example, by the Howard Hughes Medical Institute in the United States, the success of a scientist who received a grant is measured not by whether they manage to confirm their initial hypothesis, but by the degree of boldness and originality they display. These scientists are not under pressure to publish a lot, and fast, so they can gamble on groundbreaking hypotheses and conduct large-scale studies without worrying that prolonged trials or negative results will diminish their research output.
“In any event, even if they don’t succeed in bringing their idea to fruition, they have enough time, money and legitimacy to try a different direction. This model appears to have justified the hopes pinned on it. When the achievements of the scientists funded under it were compared with those of colleagues who received regular grants, it was found that the former published in better journals and were cited by more scientists (in other words, their contribution was greater). Their research also frequently changed direction into new and promising channels, which is rare among scientists who receive regular grants.”
Mozes adds that it is important “to understand that clinical research is only the point of departure for what happens afterward in medical practice. For example, when I prescribe a medication whose efficacy was demonstrated in clinical studies, I still have to monitor my patients to see how it affects them personally. After all, as individuals, they are not obligated to obey the statistical rules. It follows that a dialogue is needed between physician and patient, one that bridges the findings and the patient’s condition, the clinical studies and life itself.”
Dialogue is also sometimes needed to interpret the clinical studies for the patient. Mozes: “A woman of 55, around the age of menopause, arrives dripping with perspiration and tells me she showers 12 times a day. I suggest hormonal treatment, but she adamantly refuses. She has read that it causes breast cancer and heart attacks. I try to explain the clinical findings on the subject to her.
“First, I say that the risk of breast cancer is very low, and that compared to the daily suffering she experiences, it is apparently negligible. Second, I point to the problematic aspects of the latest clinical study, which warned of a rise in heart ailments and made a lot of noise in the media. I tell her that it contradicts earlier studies that actually showed a beneficial effect on the heart. The woman hesitates, and I explain. The average age of the women who took part in the latest study was much higher than in the previous studies, 63 as against 51, and a great many of them started taking hormones many years after menopause. That’s the key to the difference.
“In older women, whose arteries were already affected by atherosclerosis, the hormones raised the risk of blood clots that endanger the heart. In the ‘clean’ arteries of younger women, on the other hand, the hormones prevent the onset of atherosclerosis, so in their case they actually protect the heart. I suggest that the woman have the condition of her arteries examined, to determine whether the treatment is suitable for her. She agrees.”
The dialogue you are describing requires a lot of time, and physicians in the health maintenance organizations generally see their patients for only 10 minutes. What you describe sounds like a 19th-century physician who spoke at length with his patients. Maybe he spent so much time doing that because, besides smelling salts, he didn’t have much to offer.
“Maybe the opposite. The fact that today there are clinical studies, medications and technologies leads people to think that dialogue is superfluous, but there is no bigger mistake. Medical studies provide us with a basket of tools, but only dialogue enables us to use them in a way that suits the patient. At the same time, it is clear that even the most fruitful dialogue is no substitute for medications and procedures, the products of outstanding biological and clinical studies conducted by determined scientists who possess integrity and have no ulterior motives.”