5. Statistics Which Analyze Relationships

1. CORRELATION

What it is

Correlation tells you how strongly two numerical variables move together.

Examples:

Height ↑ → Weight ↑
Socioeconomic class ↑ → Mortality ↓

When you use it

Use correlation when you want to know if two variables have a linear relationship.

(Correlation shows association only, not cause.)

Key term: Correlation coefficient (r)

The coefficient is written as r.

r = +1 → perfect positive linear relationship
r = –1 → perfect negative linear relationship
r = 0 → no linear relationship at all
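
To make the three anchor values concrete, here is a minimal sketch of how Pearson's r can be computed by hand. The function name pearson_r and the sample numbers are illustrative assumptions, not taken from the book.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means (numerator of r)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only (not data from the book)
perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # points on a rising line
perfect_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # points on a falling line
```

Points that lie exactly on a rising line give r = +1, and points on a falling line give r = -1; real data fall somewhere in between.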

How to interpret r (book rule-of-thumb)

0 to 0.2 → very low, probably meaningless
0.2 to 0.4 → low correlation
0.4 to 0.6 → reasonable correlation
0.6 to 0.8 → high correlation
0.8 to 1.0 → very high (check for errors or duplication!)

Applies equally to negative values.
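
The rule-of-thumb can be sketched as a small lookup. The function name interpret_r is hypothetical, and because the bands above share their endpoints (0.2, 0.4, ...), this sketch assumes each boundary value belongs to the higher band.

```python
def interpret_r(r):
    """Return (strength, direction) for r using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    strength = abs(r)  # the bands apply equally to negative values
    # Assumption: a boundary value such as 0.4 is placed in the higher band
    if strength < 0.2:
        label = "very low"
    elif strength < 0.4:
        label = "low"
    elif strength < 0.6:
        label = "reasonable"
    elif strength < 0.8:
        label = "high"
    else:
        label = "very high"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return label, direction

glucose_example = interpret_r(0.88)    # ("very high", "positive")
activity_example = interpret_r(-0.34)  # ("low", "negative")
```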

Example from the book

A nurse compared fasting glucose vs HbA1c in 12 diabetics.

Scatter plot showed a straight-line trend → r = 0.88, meaning:

Very high positive correlation
As glucose increases, HbA1c increases

Another example:

Activity level vs BMI → r = –0.34

Low negative correlation
Higher activity → slightly lower BMI

Spearman vs Pearson

Pearson’s r: use when the data are normally distributed
Spearman’s rank correlation (rs): use when the data are skewed

(Study used Spearman because prescribing data were skewed.)
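
One way to see the difference: Spearman's rs is Pearson's r calculated on the ranks of the values rather than the values themselves, so a single extreme value has far less pull. The sketch below assumes that definition, with tied values sharing the average of their ranks; the helper names and the data are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's r from deviations about the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rs: Pearson's r applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up skewed data: y always rises with x, but one value is extreme
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 200]
r_pearson = pearson(x, y)
r_spearman = spearman_rs(x, y)
```

Here the ranks of x and y agree perfectly, so rs is 1, while Pearson's r comes out lower because the relationship is not a straight line.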

Important warning

Correlation ≠ causation.

R² (coefficient of determination)

If r = 0.88 → R² = 0.88² ≈ 0.77

Meaning 77% of variation in HbA1c is explained by glucose variation.
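
The arithmetic is just squaring r, as this short sketch shows (variable names are illustrative):

```python
r = 0.88                                     # glucose vs HbA1c correlation from the example
r_squared = r ** 2                           # about 0.7744
percent_explained = round(r_squared * 100)   # rounds to 77

# Squaring removes the sign, so a negative r also gives a positive R²
r_activity = -0.34                           # activity level vs BMI from the other example
percent_explained_activity = round(r_activity ** 2 * 100)  # about 12
```

By the same arithmetic, the activity–BMI correlation of –0.34 corresponds to only about 12% of variation explained.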

2. REGRESSION

Regression is correlation plus prediction.

What regression does

It draws a best-fit line through the scatter plot to quantify how much one variable changes when the other changes.

Regression line

Written as:

y = a + b x

Where:

a = regression constant (where line hits vertical axis)
b = regression coefficient (slope)

b tells you how much y changes when x increases by 1 unit.
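
The notes do not say how the best-fit line is found. A minimal sketch, assuming ordinary least squares (the usual method for simple linear regression), with a hypothetical function name and made-up points:

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a slope when every x is the same")
    b = sxy / sxx            # slope: change in y per 1-unit increase in x
    a = mean_y - b * mean_x  # constant: value of y where the line meets x = 0
    return a, b

# Made-up points lying exactly on y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

For these points the fit recovers a = 1 and b = 2, so y rises by 2 for every 1-unit increase in x.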

Book example

Using the same glucose–HbA1c data:

Regression equation:

HbA1c = 3.2 + (0.45 × glucose)

Meaning:

Every 1 mmol/L increase in glucose → predicted HbA1c rises by 0.45 percentage points (HbA1c is itself measured in %)
If glucose = 15 mmol/L → predicted HbA1c = 9.95%

This is exactly the graph shown in the book (Fig. 12).
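
The prediction step can be sketched directly from the equation above. The constant and coefficient come from the notes; the function and variable names are illustrative.

```python
INTERCEPT = 3.2  # a: regression constant from the equation above
SLOPE = 0.45     # b: regression coefficient from the equation above

def predict_hba1c(glucose_mmol_l):
    """Predicted HbA1c (%) for a fasting glucose value in mmol/L."""
    return INTERCEPT + SLOPE * glucose_mmol_l

predicted = predict_hba1c(15)                        # 3.2 + 0.45 * 15, about 9.95
rise_per_unit = predict_hba1c(11) - predict_hba1c(10)  # equals the slope, about 0.45
```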

What R² means in regression

R² = 0.77 → 77% of variation in HbA1c is explained by glucose.

Types of regression

Linear regression → straight line (book example)
Logistic regression → outcome has 2 categories (e.g., diseased vs not)
Poisson regression → outcome is a count (e.g., number of rare events in a given period)
Cox regression → time-to-event (survival) analysis

Warnings

You can only predict within the range of your data (no extrapolation).
Regression requires a logical direction: x (the predictor) should plausibly come before or influence y (the outcome), not the reverse.
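
The no-extrapolation warning can be enforced in code by refusing to predict outside the observed x values. This is a sketch only: the notes do not give the glucose range of the 12 patients, so the 5 to 20 mmol/L range below is a hypothetical placeholder, and the function name is illustrative.

```python
def predict_within_range(x, a, b, x_min, x_max):
    """Predict y = a + b*x, but only for x inside the observed data range."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range {x_min} to {x_max}; "
            "predicting here would be extrapolation"
        )
    return a + b * x

# Hypothetical observed glucose range (NOT from the book): 5 to 20 mmol/L
OBSERVED_MIN, OBSERVED_MAX = 5, 20

inside = predict_within_range(15, 3.2, 0.45, OBSERVED_MIN, OBSERVED_MAX)

try:
    predict_within_range(30, 3.2, 0.45, OBSERVED_MIN, OBSERVED_MAX)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True  # 30 mmol/L lies outside the assumed range
```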

Summary Table (Based on the Book)

Concept	Meaning	When to Use	Key Output
Correlation	Measures strength of linear association	When seeing whether two numerical variables move together	r (from –1 to +1)
Regression	Quantifies and predicts the relationship	When you want a formula to predict y from x	Regression line: y = a + bx; b, a, R²

Correlation and Regression

5. Statistics Which Analyze Relationships

1. CORRELATION

What it is

When you use it

Key term: Correlation coefficient (r)

How to interpret r (book rule-of-thumb)

Example from the book

Spearman vs Pearson

Important warning

R² (coefficient of determination)

2. REGRESSION

What regression does

Regression line

Book example

What R² means in regression

Types of regression

Warnings

Summary Table (Based on the Book)
