BASIC · EP 05 · BIOSTATISTICS
Before You Listen
Episode Setup
- Topic in one line: the back half of the ABPMR Part I biostatistics blueprint, picking up where Part 1 left off after the study-design hierarchy: the catalog of systematic biases that distort even well-powered studies (selection, Berkson, recall, observer or ascertainment, lead-time, length-time, Hawthorne, attrition, publication) and confounding with its mitigations, Type I error (alpha) and Type II error (beta) with power equal to 1 minus beta, the p-value and its frequent misinterpretation, the 95 percent confidence interval (CI) rule for differences and ratios, absolute risk reduction (ARR) and the number needed to treat (NNT) equal to 1 divided by ARR, the matching of statistical tests to data type and group structure with the Functional Independence Measure (FIM) as the prototypical ordinal scale, reliability quantified by the intraclass correlation coefficient (ICC) and Cohen kappa with internal consistency by Cronbach alpha, validity in its four classic flavors with the minimal clinically important difference (MCID) and minimal detectable change (MDC), the four pillars of bioethics with the four elements of decision-making capacity, and the four phases of clinical trials from Phase I safety dosing to Phase IV post-marketing surveillance.
- Prerequisites: Part 1 of BASIC-05 (measurement scales, central tendency and skew, the 68-95-99.7 empirical rule, the 2x2 contingency table with sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), likelihood ratios, parallel and serial testing, the receiver operating characteristic (ROC) curve, the evidence pyramid from meta-analysis down to case reports, and the mapping of relative risk (RR), odds ratio (OR), and hazard ratio (HR) onto cohort, case-control, and survival designs).
- Runtime: approximately 35 minutes for Part 2.
Vignette. A multicenter randomized controlled trial (RCT) of a new anticoagulant for stroke prevention in atrial fibrillation enrolls 20,000 patients. After 3 years, the stroke rate is 2.0 percent in the control arm and 1.0 percent in the treatment arm, with a hazard ratio of 0.50 and a 95 percent confidence interval (CI) of 0.42 to 0.59 and a p-value of less than 0.001. The drug manufacturer’s marketing material reads, “A 50 percent reduction in stroke risk.” A separate inter-rater reliability analysis shows a kappa of 0.62 between two stroke adjudicators reviewing the imaging.
Calculate the absolute risk reduction (ARR), relative risk reduction (RRR), and number needed to treat (NNT). State whether the result is statistically significant by the CI rule. Interpret the kappa value. Then explain how a resident should counsel a patient about this drug, given the gap between the headline number and the clinical reality.
(Answer at the end of this chapter)
Section 1: Bias Catalog and Confounding
Bottom line: bias is any systematic error in study design, conduct, or analysis that drags results away from the truth. The board catalogs nine named biases, and the testable skill is matching a vignette to its bias label. Selection bias (with Berkson bias as the hospital-based subtype) recruits a non-representative sample. Recall bias is fatal to case-control studies because cases search their memory for exposures more vividly than controls. Observer or ascertainment bias is fixed by blinding. Lead-time bias and length-time bias distort screening trials by moving the diagnosis clock backward or selectively capturing slow-growing tumors. The Hawthorne effect is behavior change from being watched. Attrition bias is differential dropout. Publication bias is the systemic preference for positive results, detected by an asymmetric funnel plot. Confounding is a third variable linked to both exposure and outcome; it is controlled by randomization (best, including unknown confounders), matching, stratification, or multivariable analysis.
Bias is a systematic error in design, conduct, or analysis that produces results different from the truth. The board catalogs roughly nine named biases, and the high-yield skill is matching a vignette to the right label rather than memorizing every variant.
Selection bias appears when the study population does not represent the target population because of non-random sampling. Berkson bias is the canonical hospital-based subtype: inpatients carry different comorbidity profiles than the community, so any apparent association between two diseases studied only in admitted patients may exist purely because both raise the probability of admission. A study of back pain restricted to hospitalized patients overstates how severe and complex the average case is in the community.
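The Berkson mechanism can be checked with a little arithmetic. The sketch below is illustrative only: the prevalences, the admission probabilities, and every function name are hypothetical assumptions, not figures from the episode. It builds two diseases that are independent in the community and then recomputes the odds ratio using admitted patients alone.

```python
# Berkson bias sketch. All numbers below are hypothetical assumptions.

def joint_probabilities(prev_a, prev_b):
    """Joint distribution of two independent diseases in the community."""
    return {
        (True, True): prev_a * prev_b,
        (True, False): prev_a * (1 - prev_b),
        (False, True): (1 - prev_a) * prev_b,
        (False, False): (1 - prev_a) * (1 - prev_b),
    }

def admission_probability(has_a, has_b, p_a=0.30, p_b=0.30, baseline=0.01):
    """Each disease, plus a baseline cause, can independently trigger admission."""
    p_not_admitted = 1 - baseline
    if has_a:
        p_not_admitted *= 1 - p_a
    if has_b:
        p_not_admitted *= 1 - p_b
    return 1 - p_not_admitted

def table_odds_ratio(cells):
    """Odds ratio for disease A versus disease B from a 2x2 table of weights."""
    return (cells[(True, True)] * cells[(False, False)]) / (
        cells[(True, False)] * cells[(False, True)]
    )

community = joint_probabilities(prev_a=0.10, prev_b=0.10)

# Weight each cell by its chance of admission. The weights are not rescaled
# to sum to 1 because the odds ratio does not depend on the overall scale.
hospital = {
    status: p * admission_probability(*status) for status, p in community.items()
}

or_community = table_odds_ratio(community)
or_hospital = table_odds_ratio(hospital)

print(f"Community odds ratio:     {or_community:.2f}")
print(f"Hospital-only odds ratio: {or_hospital:.2f}")
```

Under these assumed inputs the community odds ratio is 1 (no association), while the hospital-only odds ratio falls well below 1. The direction and size of the distortion depend on the assumed admission probabilities; the point is only that restricting to admitted patients creates an association that does not exist in the source population.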
Information bias (sometimes called measurement bias) is inaccurate data collection or classification. Misclassification is either differential (errors differ between groups) or non-differential (errors are equal across groups); non-differential misclassification biases the result toward the null.
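The pull toward the null can be seen with a small worked example. This is a sketch under stated assumptions: the case-control counts and the sensitivity and specificity of exposure classification are invented for illustration, and the same error rates are applied to cases and controls, which is what makes the misclassification non-differential.

```python
# Non-differential misclassification sketch. Counts and error rates are
# hypothetical assumptions chosen for easy arithmetic.

def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio for a case-control 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed

# True exposure status: (exposed, unexposed)
true_cases = (60, 40)
true_controls = (30, 70)
true_or = odds_ratio(*true_cases, *true_controls)

# Same sensitivity and specificity in both groups = non-differential error.
obs_cases = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
obs_controls = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

With these inputs the observed odds ratio lands between 1 and the true value: the association survives but is diluted. Applying different error rates to cases and controls would make the error differential, and the bias could then run in either direction.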
Recall bias is the dominant threat to case-control studies. Patients with disease comb their memories for plausible exposures more thoroughly than healthy controls. A mother of a child with a birth defect remembers every medication taken during pregnancy in painful detail; a mother of a healthy child remembers almost nothing. The reported exposure rate in cases is inflated relative to controls by the search itself, which overstates the association. Mitigation is to lean on objective records, such as pharmacy databases or registries, rather than interview recall.
Observer bias, also called ascertainment bias, occurs when the investigator’s knowledge of group assignment influences how outcomes are measured. A researcher invested in a new pain medication may unconsciously rate the treatment arm’s pain reports as lower. Blinding the outcome assessor closes the loophole; double-blind and triple-blind designs extend the fix to participants and analysts.
Lead-time bias is the screening illusion that survival has improved when only the diagnosis clock has moved earlier. The bottom line stays the same: the patient dies on the same calendar day. The correct measure of screening benefit is mortality reduction, not survival time. Length-time bias is a parallel screening trap: any annual screen preferentially catches slow-growing indolent disease because it spends years in a detectable window, while aggressive disease arises and kills between intervals. Screened populations therefore look better than unscreened populations even when screening confers no real benefit against aggressive disease.
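Length-time bias can be sketched as a weighting problem. The example below is a simplified model, not a description of any real screening program: it assumes two tumor types with fixed detectable preclinical windows, an annual screen, and tumor onset that falls at a random point relative to the screening schedule. The tumor mix, the window lengths, and the function names are all hypothetical.

```python
# Length-time bias sketch. Tumor mix and window lengths are hypothetical.

def detection_probability(window_years, screen_interval_years=1.0):
    """Chance that a periodic screen lands inside the detectable window,
    assuming onset is uniformly distributed relative to the screen dates."""
    return min(1.0, window_years / screen_interval_years)

# (label, share of all tumors, detectable preclinical window in years)
tumor_types = [
    ("aggressive", 0.5, 0.5),
    ("indolent", 0.5, 4.0),
]

def screen_detected_mix(types, screen_interval_years=1.0):
    """Share of each tumor type among screen-detected cases."""
    weights = {
        label: share * detection_probability(window, screen_interval_years)
        for label, share, window in types
    }
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}

mix = screen_detected_mix(tumor_types)
for label, share in mix.items():
    print(f"{label}: {share:.0%} of screen-detected tumors")
```

Although the two types are equally common in this assumed population, indolent tumors make up about two thirds of the screen-detected cases, so the screened group looks healthier before any treatment effect is considered.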
The Hawthorne effect is the change in subject behavior caused by knowing they are being observed. Hospital hand-washing rates spike whenever an auditor is visible at the sink. Attrition bias is differential dropout; if sicker patients leave one arm at higher rates, the comparison is corrupted. The intention-to-treat (ITT) analysis introduced in Part 1 is the analytic fix: every participant is analyzed in the arm to which they were randomized. Publication bias is the systemic preference of journals for positive, statistically significant studies. Funnel plots are the screening tool: each study’s effect size is plotted against its precision, so small studies scatter widely at the base and large studies cluster near the pooled estimate at the top. A symmetric inverted funnel suggests no publication bias, while an asymmetric funnel with a missing chunk on the negative side suggests that small failed studies are sitting in file drawers.
Confounding is a third variable independently associated with both the exposure and the outcome, generating a spurious association between them. The textbook example is that people who carry lighters have higher rates of lung cancer; smoking is the hidden third variable. Confounding is controlled by randomization (the most effective method, the only one that balances unknown confounders), matching controls on key characteristics, stratification to analyze subgroups separately, or multivariable analysis to statistically adjust. When a board question specifies controlling for unknown confounders, only randomization qualifies.
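Stratification on the lighter example can be worked through in a few lines. The counts below are invented so that the arithmetic is clean; they are assumptions for illustration, not data. Lighter carrying is made common among smokers and rare among non-smokers, while lung cancer risk depends only on smoking.

```python
# Confounding sketch: lighters, smoking, and lung cancer.
# All counts are hypothetical assumptions.

def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# (lung cancer cases, people) by lighter carrying within each smoking stratum
strata = {
    "smokers": {"lighter": (90, 900), "no_lighter": (10, 100)},
    "non_smokers": {"lighter": (1, 100), "no_lighter": (9, 900)},
}

def crude_risk_ratio(strata):
    """Pool the strata, ignoring smoking status entirely."""
    cases_l = sum(s["lighter"][0] for s in strata.values())
    n_l = sum(s["lighter"][1] for s in strata.values())
    cases_n = sum(s["no_lighter"][0] for s in strata.values())
    n_n = sum(s["no_lighter"][1] for s in strata.values())
    return risk_ratio(cases_l, n_l, cases_n, n_n)

crude = crude_risk_ratio(strata)
stratified = {
    name: risk_ratio(*s["lighter"], *s["no_lighter"]) for name, s in strata.items()
}

print(f"Crude risk ratio (lighter vs no lighter): {crude:.2f}")
for name, rr in stratified.items():
    print(f"Risk ratio within {name}: {rr:.2f}")
```

In the pooled table, lighter carriers appear to have several times the lung cancer risk. Within each smoking stratum the risk ratio is 1, which is the signature of confounding: the association disappears once the third variable is held fixed. Stratification only works for confounders that were measured, which is why randomization remains the answer for unknown ones.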
::: {.callout-important}
## High Yield — Bias and confounding
- Selection bias (Berkson in hospital-based studies): random sampling.
- Recall bias: case-control studies; objective records.
- Observer / ascertainment bias: blinding.
- Lead-time bias: screening detects earlier; compare mortality rates, not survival time.
- Length-time bias: screening catches indolent disease; compare mortality.
- Hawthorne effect: behavior change from observation.
- Attrition bias: differential dropout; ITT analysis.
- Publication bias: positive studies published more; funnel plot asymmetry.
- Confounding: controlled by randomization (best, including unknown confounders), matching, stratification, multivariable analysis.
:::
Board Trap — Lead-time bias inflates survival without changing outcome
A vignette describes a new screening test that detects breast cancer 2 years earlier than usual clinical presentation. Five-year survival is 90 percent in the screened group versus 60 percent in the unscreened group. The trap is to conclude the screening saved lives. Lead-time bias explains the apparent survival difference even when mortality is unchanged. By detecting disease 2 years sooner, the diagnosis-to-death window grows by 2 years while the calendar day of death does not move. The correct measure of screening benefit is mortality reduction, not survival time.
The survival time from diagnosis to death is five years. The researchers publish a paper claiming their new screening test increased survival time from two years to five years. It looks like a massive medical breakthrough, but the reality is the patient did not live a single day longer.
— BASIC-05-b podcast, ~38:00
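The arithmetic behind the podcast example can be laid out explicitly. In this minimal sketch the ages are hypothetical assumptions, and the lead time is set to 3 years so that it reproduces the quoted jump from two to five years of survival; setting it to 2 would reproduce the shift described in the Board Trap instead.

```python
# Lead-time bias sketch. Ages and lead time are hypothetical assumptions.

def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Survival time as usually reported: years from diagnosis to death."""
    return age_at_death - age_at_diagnosis

AGE_AT_DEATH = 70                # identical with or without screening
AGE_AT_CLINICAL_DIAGNOSIS = 68   # diagnosis when symptoms appear
LEAD_TIME_YEARS = 3              # how much earlier screening finds the disease

age_at_screen_diagnosis = AGE_AT_CLINICAL_DIAGNOSIS - LEAD_TIME_YEARS

unscreened_survival = survival_from_diagnosis(AGE_AT_CLINICAL_DIAGNOSIS, AGE_AT_DEATH)
screened_survival = survival_from_diagnosis(age_at_screen_diagnosis, AGE_AT_DEATH)

print(f"Survival without screening: {unscreened_survival} years")
print(f"Survival with screening:    {screened_survival} years")
print(f"Age at death in both cases: {AGE_AT_DEATH}")
```

The gain in reported survival equals the lead time exactly, and the age at death never changes. That is why mortality, which counts deaths rather than time since diagnosis, is the measure that is immune to this bias.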