How we know the numbers are right

Every figure on BedsideDx comes from the published medical literature, and we check each one against the original study before it goes on the site. Here is how we find the evidence, judge it, and report it — and how you can verify any number yourself.

Every number is real, and verified

We do not generate diagnostic statistics. We report them from peer-reviewed research, and we confirm each one against the source before publishing it: the authors, year, journal, and reported values in our citation all have to match the original paper. Every figure on the site links to its citation so you can check it yourself, and if a study’s claim doesn’t hold up on inspection, it doesn’t go on the site. We never estimate a value to fill a gap.

We grade the evidence, not just report it

A striking number from a single small study is not the same as one pooled from every study ever done on the question. So alongside each finding we show how strong the evidence behind it actually is:

Strong — drawn from the most rigorous evidence available: a high-quality systematic review or meta-analysis that pooled results across multiple studies of the same question.
Moderate — built on one substantial study, used in the uncommon case where reviews exist but none analyzes that specific sign for that specific diagnosis.
Limited — based on a single well-conducted study, or one study drawn from a larger review. Real evidence, but a narrower foundation.
Insufficient Evidence — we searched and found nothing that met our bar. Instead of inventing a number, we label the finding honestly and note what is missing. You still see the finding and its clinical context — just not a statistic we can’t stand behind.

The grade reflects the quality of the evidence, not the size of the number. A large likelihood ratio from one small study still rests on limited evidence, and we label it that way.

How we find the evidence, and what clears the bar

For every finding we work down a hierarchy:

Rigorous syntheses first. Systematic reviews and meta-analyses conducted to recognized standards for searching the literature and appraising risk of bias. When one has already answered the question, we use its pooled results as published.
Then well-designed primary studies. If no such synthesis exists, a single study can qualify — but only if it was done well: an appropriate patient population, a blinded comparison against a true diagnostic standard, an adequate sample, and complete reporting of its accuracy.
Otherwise, we say so. If nothing meets that bar, the finding is marked Insufficient Evidence.

We are deliberately strict about two things. The study has to be about the same diagnosis we are applying it to — we don’t borrow a number from a related condition or stretch one study’s result across several diagnoses. And we don’t anchor a rating on preprints, conference abstracts, or exam textbooks; those can inform context, but a rating has to rest on peer-reviewed evidence.
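For readers who think in code, the hierarchy above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of our internal tooling: the `Source` fields, the kind labels, and the function name are all hypothetical, and the judgment calls behind `same_diagnosis` and `meets_quality_bar` are made by people, not by a program.

```python
from dataclasses import dataclass
from typing import List, Optional

# Source types that may anchor a rating, in order of preference.
# Preprints, conference abstracts, and textbooks are deliberately absent.
# These labels are hypothetical, chosen only for this sketch.
RATING_ELIGIBLE_KINDS = ("synthesis", "primary_study")


@dataclass
class Source:
    label: str               # short human-readable description
    kind: str                # e.g. "synthesis", "primary_study", "preprint"
    same_diagnosis: bool     # studies the exact diagnosis being rated
    meets_quality_bar: bool  # outcome of the manual quality appraisal


def pick_anchor_source(candidates: List[Source]) -> Optional[Source]:
    """Return the first acceptable source, trying syntheses before primary studies.

    Returns None when nothing qualifies, which corresponds to
    labeling the finding Insufficient Evidence.
    """
    for kind in RATING_ELIGIBLE_KINDS:
        for source in candidates:
            if (source.kind == kind
                    and source.same_diagnosis
                    and source.meets_quality_bar):
                return source
    return None


candidates = [
    Source("conference abstract", "abstract", True, True),
    Source("single cohort study", "primary_study", True, True),
    Source("meta-analysis of a related condition", "synthesis", False, True),
]
chosen = pick_anchor_source(candidates)
print(chosen.label if chosen else "Insufficient Evidence")
```

In this made-up example the meta-analysis is skipped because it covers a related condition, and the abstract is skipped because of its source type, so the cohort study is chosen.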

What the ratings mean for you

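We lead with the likelihood ratio because it tells you how much a result should actually move you for the patient in front of you: a ratio well above 1 makes a diagnosis much more likely when the finding is present, and a ratio close to 0 makes it much less likely when the finding is absent. Sensitivity and specificity describe how a finding behaves in people already known to have or not have the disease; a likelihood ratio converts directly into a change in your patient’s probability, which is the question you face at the bedside.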
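If you are more used to sensitivity and specificity, the sketch below shows how the two likelihood ratios follow from them using the standard definitions. The function name and the example values are illustrative assumptions, not figures from the site.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    # LR+ : how much more often the finding is present in disease than without it.
    lr_positive = sensitivity / (1.0 - specificity)
    # LR- : how much more often the finding is absent in disease than without it.
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical sign that is 80% sensitive and 90% specific.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 8.0, LR- = 0.22
```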

So you don’t have to do the math in your head, we translate every ratio into a plain rating using the long-established Sackett thresholds:

Very helpful — a large, often decisive shift
Helpful — a moderate, frequently useful shift
Somewhat helpful — a small shift that occasionally matters
Minimally helpful — a shift too small to act on by itself
Not helpful — no meaningful shift

One rule we hold firmly: if the confidence interval around a finding’s likelihood ratio includes 1, the value at which a result does not change the probability at all, we rate it Not helpful regardless of how good the headline number looks. A result that might be statistically meaningless shouldn’t change a clinical decision.
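The translation from ratio to rating can be sketched as below. Treat it as an assumption-laden illustration: the cut-offs of 10, 5, and 2 (and their reciprocals for ratios below 1) are the commonly cited values, and the page does not state the exact cut-offs the site uses. The function name is hypothetical.

```python
def rate_likelihood_ratio(lr: float, ci_low: float, ci_high: float) -> str:
    """Map a likelihood ratio and its confidence interval to a plain rating."""
    if lr <= 0 or ci_low <= 0 or ci_low > ci_high:
        raise ValueError("expected positive values with ci_low <= ci_high")

    # Firm rule from the text: an interval that includes 1 is Not helpful,
    # however impressive the point estimate looks.
    if ci_low <= 1.0 <= ci_high:
        return "Not helpful"

    # Put ratios below 1 on the same scale, so 0.05 is treated like 20.
    magnitude = lr if lr >= 1.0 else 1.0 / lr

    # Assumed cut-offs (commonly cited values, not confirmed by the page).
    if magnitude > 10:
        return "Very helpful"
    if magnitude >= 5:
        return "Helpful"
    if magnitude >= 2:
        return "Somewhat helpful"
    if magnitude > 1:
        return "Minimally helpful"
    return "Not helpful"


# A ratio of 3.0 looks useful, but its interval crosses 1.
print(rate_likelihood_ratio(3.0, 0.8, 11.0))   # Not helpful
print(rate_likelihood_ratio(12.0, 6.0, 24.0))  # Very helpful
```

Checking the interval first is the design choice that matters here: it guarantees that no point estimate, however large, can outrank the uncertainty around it.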

Some findings are the reverse of what you’d expect — a positive result argues against the diagnosis. The classic example is reproducible chest-wall tenderness in suspected acute coronary syndrome, where finding it lowers the probability of a cardiac cause. We label these by their true clinical meaning so there’s no ambiguity.

The probability calculator

For subscribers, the calculator does the arithmetic. Enter your starting (pre-test) probability and the result you observed, and it returns the updated (post-test) probability using Bayes’ rule — the standard way to combine a prior probability with a test result. If a finding’s evidence is statistically shaky, the calculator declines to compute, because multiplying by a number that might be meaningless would only give you false confidence.
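The arithmetic behind that update is short enough to show. What follows is a minimal sketch of the odds form of Bayes’ rule, not the calculator’s actual code. The function name is hypothetical, and we assume here that “statistically shaky” means a confidence interval that includes 1; the sketch signals a declined calculation by returning `None`.

```python
from typing import Optional


def post_test_probability(pre_test: float, lr: float,
                          ci_low: float, ci_high: float) -> Optional[float]:
    """Update a pre-test probability with a likelihood ratio.

    Returns None when the ratio's confidence interval includes 1
    (assumed meaning of a statistically shaky finding).
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must be strictly between 0 and 1")
    if lr <= 0:
        raise ValueError("likelihood ratio must be positive")

    # Decline rather than multiply by a ratio that may mean nothing.
    if ci_low <= 1.0 <= ci_high:
        return None

    pre_odds = pre_test / (1.0 - pre_test)  # probability -> odds
    post_odds = pre_odds * lr               # Bayes' rule in odds form
    return post_odds / (1.0 + post_odds)    # odds -> probability


# Hypothetical case: 20% pre-test probability, likelihood ratio of 6.
result = post_test_probability(0.20, 6.0, 3.0, 12.0)
print(f"{result:.0%}")  # 60%
```

In the example, a 20% probability is odds of 0.25; multiplying by 6 gives odds of 1.5, which converts back to 60%.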

What we are upfront about

Diagnostic evidence is never perfect, and we would rather you know the caveats than discover them later:

The studied patients may not be yours. Accuracy studies are often done in particular settings — a referral clinic, an emergency department — and may not translate to a different population. Where the mismatch is large, we flag it.
Some findings are hard to reproduce. A number of physical signs are elicited or interpreted differently from one examiner to the next, which limits even a strong result in everyday practice.
Yardsticks change. What counted as the diagnostic gold standard decades ago has sometimes been surpassed — ultrasound replacing the chest X-ray for some questions, for instance — so older numbers can understate how an exam compares today.
Flattering results get published more. Studies showing a test works are likelier to appear in print than those showing it doesn’t, which can make a sign look better than it really is. We favor the most comprehensive syntheses available, but no source removes this risk entirely.
