5. Statistics Which Analyze Relationships
1. CORRELATION
What it is
Correlation tells you how strongly two numerical variables move together.
Examples:
- Height ↑ → Weight ↑
- Socioeconomic class ↑ → Mortality ↓
When you use it
Use correlation when you want to know if two variables have a linear relationship.
(Not to find cause—only association.)
Key term: Correlation coefficient (r)
The coefficient is written as r.
- r = +1 → perfect positive linear relationship
- r = –1 → perfect negative linear relationship
- r = 0 → no linear relationship at all
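To make the three reference values above concrete, here is a minimal Python sketch that computes r from its standard definition (sum of cross-products divided by the square root of the product of the two sums of squares). The function name pearson_r and the small data sets are made up for illustration; they are not from the book.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # points on a rising straight line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line
r_none = pearson_r(x, [1, 2, 3, 2, 1])   # rises then falls: no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_none, 3))
```

The third data set has an obvious pattern (up, then down), yet r is essentially 0. That is why r = 0 means "no linear relationship", not "no relationship".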
How to interpret r (book rule-of-thumb)
- 0 to 0.2 → very low, probably meaningless
- 0.2 to 0.4 → low correlation
- 0.4 to 0.6 → reasonable correlation
- 0.6 to 0.8 → high correlation
- 0.8 to 1.0 → very high (check for errors or duplication!)
The same bands apply to negative values of r; the sign only tells you the direction of the relationship.
Example from the book
A nurse compared fasting glucose vs HbA1c in 12 diabetics.
Scatter plot showed a straight-line trend → r = 0.88, meaning:
- Very high positive correlation
- As glucose increases, HbA1c increases
Another example:
Activity level vs BMI → r = –0.34
- Low negative correlation
- Higher activity → slightly lower BMI
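The rule of thumb and the two examples above can be expressed as a small lookup. This is only a sketch: the function name describe_r is hypothetical, and because the book's bands share their end points (0.2, 0.4, ...), it assumes a boundary value falls into the higher band.

```python
def describe_r(r):
    """Label a correlation coefficient using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the bands depend only on the magnitude of r
    if size < 0.2:
        strength = "very low"
    elif size < 0.4:
        strength = "low"
    elif size < 0.6:
        strength = "reasonable"
    elif size < 0.8:
        strength = "high"
    else:
        strength = "very high"
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

glucose_hba1c = describe_r(0.88)   # fasting glucose vs HbA1c
activity_bmi = describe_r(-0.34)   # activity level vs BMI
print(glucose_hba1c)  # very high positive
print(activity_bmi)   # low negative
```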
Spearman vs Pearson
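- Pearson’s r: use when both variables are roughly normally distributed
- Spearman’s rank correlation (rs): use when the data are skewed; it is calculated from the ranks of the values rather than the raw values
(The study cited in the book used Spearman because its prescribing data were skewed.)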
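One way to see the difference: Spearman's coefficient is Pearson's r calculated on ranks (tied values share the average rank). The sketch below assumes that definition; the function names and the skewed data set are invented for illustration and are not the prescribing data mentioned above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the standard formula."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical skewed data: y always rises with x, but one value is extreme
x = [1, 2, 3, 4, 5]
y = [1, 2, 4, 8, 100]
r = pearson_r(x, y)     # pulled well below 1 by the extreme value
rs = spearman_rs(x, y)  # ranks agree perfectly
print(round(r, 2), round(rs, 2))
```

With these made-up numbers the ranking of y matches the ranking of x exactly, so rs is 1, while Pearson's r is noticeably lower because the single extreme value distorts the straight-line fit.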
Important warning
Correlation ≠ causation.
R² (coefficient of determination)
If r = 0.88 → R² = 0.88² ≈ 0.77
Meaning about 77% of the variation in HbA1c is explained by variation in glucose.
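The arithmetic is just squaring r, so the sign of r drops out. A quick check in Python, using the two correlation values quoted earlier (0.88 for glucose vs HbA1c, –0.34 for activity vs BMI); the function name is hypothetical:

```python
def r_squared_percent(r):
    """Percentage of variation explained, given a correlation coefficient r."""
    return 100 * r ** 2

glucose = r_squared_percent(0.88)    # 0.88 squared is 0.7744
activity = r_squared_percent(-0.34)  # squaring removes the minus sign
print(round(glucose), round(activity))  # 77 12
```

So a "low" correlation of –0.34 corresponds to only about 12% of the variation being explained.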
2. REGRESSION
Regression goes one step beyond correlation: it fits a line that lets you predict one variable from the other.

What regression does
It draws a best-fit line through the scatter plot to quantify how much one variable changes when the other changes.
Regression line
Written as:
y = a + b x
Where:
- a = regression constant (where line hits vertical axis)
- b = regression coefficient (slope)
b tells you how much y changes when x increases by 1 unit.
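For readers who want to see where a and b come from, here is a minimal sketch of the usual least-squares calculation (slope = sum of cross-products divided by the sum of squares of x; the line passes through the two means). The function name fit_line and the data are hypothetical; the book's raw data are not reproduced in these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when all x values are equal")
    b = sxy / sxx            # regression coefficient (slope)
    a = mean_y - b * mean_x  # regression constant (intercept)
    return a, b

# Hypothetical points lying exactly on y = 2 + 3x
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 3.0
```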
Book example
Using the same glucose–HbA1c data:
Regression equation:
HbA1c = 3.2 + (0.45 × glucose)
Meaning:
- Every 1 mmol/L increase in glucose → predicted HbA1c rises by 0.45 percentage points (HbA1c is measured in %)
- If glucose = 15 mmol/L → predicted HbA1c = 9.95%
This is exactly the graph shown in the book (Fig. 12).
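The book's equation can be checked with a few lines of Python. The constants 3.2 and 0.45 are taken from the regression equation above; the function name is just a label chosen for this sketch.

```python
def predict_hba1c(glucose_mmol_per_l):
    """Predicted HbA1c (%) from fasting glucose, using HbA1c = 3.2 + 0.45 * glucose."""
    a = 3.2   # regression constant
    b = 0.45  # regression coefficient (slope)
    return a + b * glucose_mmol_per_l

at_15 = predict_hba1c(15)
# The slope is the change in prediction for a 1 mmol/L increase in glucose
step = predict_hba1c(11) - predict_hba1c(10)
print(round(at_15, 2), round(step, 2))  # 9.95 0.45
```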
What R² means in regression
R² = 0.77 → 77% of variation in HbA1c is explained by glucose.
Types of regression
- Linear regression → straight line (book example)
- Logistic regression → outcome has 2 categories (e.g., diseased vs not)
- Poisson regression → outcome is a count of events, typically rare ones (also applied to waiting times)
- Cox regression → time-to-event (survival) analysis
Warnings
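- Predictions are only trustworthy within the range of x values in your data; do not extrapolate beyond it.
- Regression needs a logical direction: x is the predictor and y the outcome, so x should plausibly come before (or influence) y, not the other way round.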
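The no-extrapolation warning can be built into a prediction helper. This is a sketch under stated assumptions: the function name is hypothetical, and the observed glucose range of 5 to 20 mmol/L is invented for illustration because the notes do not give the actual range of the book's data.

```python
def predict_within_range(x, a, b, x_min, x_max):
    """Predict y = a + b*x, refusing x values outside the observed range."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range {x_min} to {x_max}; "
            "the fitted line should not be extrapolated"
        )
    return a + b * x

# Equation from the notes; the range 5 to 20 mmol/L is an assumed example
inside = predict_within_range(15.0, 3.2, 0.45, 5.0, 20.0)

try:
    predict_within_range(30.0, 3.2, 0.45, 5.0, 20.0)
    refused = False
except ValueError:
    refused = True  # 30 mmol/L lies beyond the assumed data range

print(round(inside, 2), refused)  # 9.95 True
```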
Summary Table (Based on the Book)
| Concept | Meaning | When to Use | Key Output |
| Correlation | Measures strength of linear association | When seeing whether two numerical variables move together | r (from –1 to +1) |
| Regression | Quantifies and predicts the relationship | When you want a formula to predict y from x | Regression line: y = a + bx; b, a, R² |