Understanding evidence-based medicine in 4 days. Lesson 2: 5% is the magic number for confidence and certainty.

Ami Banerjee
Last edited 17th March 2010

Some things in life are certain. We are 100% sure that every human being will die at some point in their lifetime. Other things are close to 100% certain, but there is a small chance of an alternative outcome. When a person has a coronary angiogram to look at the arteries in their heart, we are 99.9% certain that they will have an uneventful procedure, but 0.1% of the time there will be a major complication such as bleeding, stroke, or a heart attack. If the person has had previous heart attacks or other illnesses, this chance increases and might reach as much as 2%. In still other situations, the chance of an outcome is much less certain. For example, the chance of surviving for 5 years after a diagnosis of bowel cancer may vary from less than 10% to near 100%, depending on the severity of the cancer.

There is a lot of uncertainty in medicine and in medical research, and yet media reports of health and science often give the impression of “black-and-white”, exact figures. The yoghurt in my fridge says “Best before 15/11/09”. Does that mean that all the yoghurts go off at the same time on the 15th of November? Of course not. The reality is that this date is an estimate, and the date when my yoghurt actually goes off lies within a range. Some yoghurts will go off earlier than the best before date, and others will go off after it. This kind of range is the idea behind a “confidence interval”. Confidence intervals can be set so that most of the possible results will fall within the range. It might be that 99.9% of yoghurts are still fine on 15/11/09, which means that 0.1% of yoghurts will have gone off before that date.

In yesterday’s example, immobilisation for 15 minutes immediately after artificial insemination increased the relative risk of a successful pregnancy by 50%. However, 50% is only the best estimate: the 95% confidence interval for the increase runs from 10% to 120%. Put another way, if the study were repeated many times, intervals calculated in this way would contain the true value about 95% of the time and miss it about 5% of the time. A recent study of the UK’s GP database looked at the effect of statin therapy on future risk of gallstones and gallbladder surgery. The researchers showed that the odds of getting gallstones if you were on long-term statin therapy were 66% of the odds of gallstones if you were not taking statin therapy, and the 95% confidence interval for the odds ratio was 59% to 70%. Because this interval is narrow and lies well below 100% (the value that would mean no difference), long-term statin therapy seems to convincingly reduce the odds of gallstones and gallbladder surgery by about a third. By convention, we accept 95% confidence intervals, and the narrower the range, the more certain we are of the finding.
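To make the idea of a narrow versus a wide interval concrete, here is a minimal sketch of one common way to calculate an odds ratio and an approximate 95% confidence interval from a 2x2 table (the log, or Woolf, method). The function name `odds_ratio_ci` and all of the counts are hypothetical and chosen for illustration only; they are not taken from the statin study, which may well have used a different method.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (log / Woolf method).

    a: treated, with the outcome      b: treated, without the outcome
    c: untreated, with the outcome    d: untreated, without the outcome
    z=1.96 corresponds to a 95% interval under a normal approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts (NOT from the statin study): same proportions,
# but the second table has 100 times as many people.
small_study = odds_ratio_ci(20, 80, 30, 70)
large_study = odds_ratio_ci(2000, 8000, 3000, 7000)

for name, (est, lo, hi) in [("small", small_study), ("large", large_study)]:
    print(f"{name}: odds ratio {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Both hypothetical tables give the same odds ratio, but the interval for the small study is wide and crosses 1 (no difference), while the interval for the large study is narrow and stays below 1. That is the sense in which a narrower range makes us more certain of a finding.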

In the New England Journal this week, American researchers were interested in whether or not the use of the heart-lung machine (“cardiopulmonary bypass”) during coronary-artery bypass graft (CABG) surgery affected death rates. Scientists test their results using hypotheses. The “null hypothesis” in this case was that use of the heart-lung machine during CABG surgery would make no difference to the death rate after 1 year. The “alternative hypothesis” was that use of the heart-lung machine during CABG surgery would make a difference to the death rate after 1 year. The results supported the alternative hypothesis: use of the heart-lung machine led to (a) a lower death rate and (b) less blockage in the grafted arteries at 1 year. By testing these results against the null hypothesis, we get a “p-value”. Put simply, the p-value is the chance of seeing a result like this through chance alone if the null hypothesis were true. Again, the cut-off is a p-value of 0.05 or 5%. For the death rate, the p-value was 0.04 or 4%, whereas the p-value for the difference in graft blockage at 1 year was 0.01 or 1%. Both are below the cut-off, so both findings count as statistically significant. So the combination of confidence intervals and p-values tells us how reliable a result is, and with the rule of 5%, anybody can spot a chance finding and assess statistical significance.
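As a rough illustration of where a p-value for a difference in death rates can come from, here is a minimal sketch of a two-sided two-proportion z-test. The function name `two_proportion_p_value` and the patient counts are hypothetical; they are not the trial's data, and the trial's authors may have used a different test.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for the difference between two proportions.

    Null hypothesis: both groups share the same underlying event rate.
    Uses the pooled normal approximation, so it suits large groups only.
    """
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    # P(|Z| >= |z|) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts (NOT the trial's data): deaths at 1 year in two groups
p = two_proportion_p_value(45, 1000, 28, 1000)
print(f"p-value: {p:.3f}", "(below 0.05)" if p < 0.05 else "(not below 0.05)")
```

With these made-up counts the p-value falls just under the conventional 0.05 cut-off, so the null hypothesis of "no difference" would be rejected by the rule of 5%.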

Lesson 3

Historical Statistics -- Why?

I am surprised to learn that the medical research profession is still using test statistics, arbitrary alpha levels, and P-values, leading to arbitrary decisions about "statistical significance." These historical methods were developed in the early and middle parts of the last century and are of little use today.

I read a recent paper where the author called these methods "paleo statistics." Why is medicine so out of touch with modern methodologies?
I can understand the reluctance to go Bayesian, even with "flat" priors on parameters and models. I cannot understand why there has been so little use of the information-theoretic approaches. P-values are not evidential (see Royall's 1997 book on this -- prior to retirement he was the head of the Department of Biostatistics at The Johns Hopkins University). Why call it "Evidence Based Medicine" when using P-values? I would think evidence based medicine would be based on formal evidence.

Help me.

David R. Anderson

Looks like I am preaching to the choir.....

Thanks for all the comments.

Michael, you are nearly right. A p value is the chance that a repeat experiment would have a result that is the same or more extreme, "assuming the null hypothesis is true". The lower the p value, the less likely it is that the result is due to chance, and therefore the more statistically significant the finding. If we are being exact, you can never really accept the experimental hypothesis; you can only reject the null hypothesis.
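The phrase "the same or more extreme, assuming the null hypothesis is true" can be shown with a small exact calculation. This is a minimal sketch using a made-up experiment (10 patients, each either improving or not, with a null hypothesis that improvement is a 50:50 chance); the function name `one_sided_binomial_p` is hypothetical.

```python
from math import comb

def one_sided_binomial_p(successes, trials, null_prob=0.5):
    """Chance of seeing `successes` or more out of `trials`,
    assuming the null hypothesis (success probability = null_prob) is true."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical results: 8 of 10 improve, then 9 of 10 improve
p_eight = one_sided_binomial_p(8, 10)
p_nine = one_sided_binomial_p(9, 10)
print(f"8 or more of 10: p = {p_eight:.4f}")
print(f"9 or more of 10: p = {p_nine:.4f}")
```

Under this null hypothesis, 8 or more improvements out of 10 happens by chance a little more than 5% of the time (56 of the 1024 equally likely outcomes), so it just misses the cut-off, whereas 9 or more out of 10 falls well below it.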

Phil and Carl, I agree. If we are studying the results of one study, a p value of >0.05 tells us that findings like these could arise by chance more than 5% of the time, and so they are not statistically significant. In a meta-analysis, such data must be included because by definition we want to include all available published and unpublished data (not just the data that are statistically significant). Unfortunately, there is a publication bias in journals towards publishing only positive findings...


sometimes you want more reassurance

Dear Ami

Another neat article on EBM. It is interesting, though, that we build our bridges with 99% certainty, yet in health we only require 95% confidence. A couple of interesting caveats: when there are multiple outcomes, you may want to consider a Bonferroni correction, which is used to address the problem of multiple comparisons.

Or when there is only a single trial and the risk of bias is high, you may want more certainty, i.e. p<0.01.

Cheers Carl
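The Bonferroni correction mentioned in this comment can be sketched in a few lines: the 0.05 cut-off is divided by the number of outcomes tested. The function name `bonferroni` is hypothetical, and applying it to the two p-values quoted in the article (0.04 and 0.01) is purely an illustration, not a reanalysis of the trial, which may have planned its outcomes differently.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (p_value, still_significant) pairs after a Bonferroni correction.

    Each p-value is compared with alpha divided by the number of comparisons.
    """
    threshold = alpha / len(p_values)
    return [(p, p <= threshold) for p in p_values]

# Illustration only: the two p-values quoted in the article,
# treated as if they were two comparisons from the same study.
results = bonferroni([0.04, 0.01])
for p, significant in results:
    print(f"p = {p}: {'significant' if significant else 'not significant'} at 0.05 / 2")
```

With two outcomes the corrected cut-off is 0.025, so in this illustration a p-value of 0.01 would still pass while 0.04 would not.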

But why 5%?

I have heard a very good argument that this arbitrary number is not the best practice. It's a reasonable rule of thumb, but if you're looking in detail you shouldn't throw out all data with p>0.05. If you do, you might well throw the baby out with the bathwater.

One place where you should keep all those high p-value results is in a meta-analysis, particularly if there have been many, relatively statistically insignificant studies. By combining the results you may end up with a p value less than the magic 0.05, even though each individual trial had a p-value > 0.05.

It's a reasonable rule of thumb, but isn't it better to accept that there is a scale of reliability of results, rather than a discrete reliable/unreliable distinction?
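The point about meta-analysis in this comment can be illustrated with a minimal sketch of fixed-effect, inverse-variance pooling, one common way of combining trial results. The function names and the three trial estimates (log odds ratios with standard errors) are hypothetical; a real meta-analysis would also need to consider differences between trials and the publication bias mentioned above.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for an estimate, using a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of estimates and its standard error."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors for three small trials
estimates = [-0.30, -0.25, -0.35]
standard_errors = [0.20, 0.18, 0.22]

individual_ps = [two_sided_p(e, se) for e, se in zip(estimates, standard_errors)]
pooled, pooled_se = pool_fixed_effect(estimates, standard_errors)
pooled_p = two_sided_p(pooled, pooled_se)

print("individual p-values:", [round(p, 3) for p in individual_ps])
print("pooled p-value:", round(pooled_p, 3))
```

In this made-up example every individual trial has a p-value above 0.05, yet the pooled result falls below it, because combining the trials shrinks the standard error.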


definition of p value

This is a nice explanation of p values. But, to be very picky, a p value is the chance that a repeat experiment would have a result that is the same or more extreme.

