Part 1 obgyn notes Sri Lanka

    Correlation and Regression

    5. Statistics Which Analyze Relationships

    1. CORRELATION

    What it is

    Correlation tells you how strongly two numerical variables move together.

    Examples:

    • Height ↑ → Weight ↑
    • Socioeconomic class ↑ → Mortality ↓

    When you use it

    Use correlation when you want to know if two variables have a linear relationship.

    (Not to find cause—only association.)

    Key term: Correlation coefficient (r)

    The coefficient is written as r.

    • r = +1 → perfect positive linear relationship
    • r = –1 → perfect negative linear relationship
    • r = 0 → no linear relationship at all
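
As a rough illustration of the three reference values above, here is a minimal sketch that computes Pearson's r from its textbook definition. The function name `pearson_r` and the toy numbers are made up for this example; they are not from the book.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Made-up toy values, chosen only to show the extreme cases
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # points on a rising straight line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line
r_none = pearson_r(x, [1, 2, 3, 2, 1])   # symmetric hump, no linear trend
print(r_up, r_down, r_none)
```

The third case shows why r = 0 only means "no linear relationship": the y values clearly depend on x, just not along a straight line.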

    How to interpret r (book rule-of-thumb)

    • 0 to 0.2 → very low, probably meaningless
    • 0.2 to 0.4 → low correlation
    • 0.4 to 0.6 → reasonable correlation
    • 0.6 to 0.8 → high correlation
    • 0.8 to 1.0 → very high (check for errors or duplication!)

    Applies equally to negative values.
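
The rule of thumb above can be sketched as a small lookup. The function name `describe_r` is hypothetical, and because the bands in the notes share their boundary values (for example 0.2 appears in two bands), this sketch assumes a boundary value belongs to the higher band.

```python
def describe_r(r):
    """Label a correlation coefficient using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the bands apply equally to negative values
    if size < 0.2:
        label = "very low"
    elif size < 0.4:
        label = "low"
    elif size < 0.6:
        label = "reasonable"
    elif size < 0.8:
        label = "high"
    else:
        label = "very high"
    # Assumption: boundary values (0.2, 0.4, ...) fall in the higher band
    if r > 0:
        return label + " positive"
    if r < 0:
        return label + " negative"
    return label

print(describe_r(0.88))   # glucose vs HbA1c value quoted in these notes
print(describe_r(-0.34))  # activity vs BMI value quoted in these notes
```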

    Example from the book

    A nurse compared fasting glucose vs HbA1c in 12 diabetics.

    Scatter plot showed a straight-line trend → r = 0.88, meaning:

    • Very high positive correlation
    • As glucose increases, HbA1c increases

    Another example:

    Activity level vs BMI → r = –0.34

    • Low negative correlation
    • Higher activity → slightly lower BMI

    Spearman vs Pearson

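    • Pearson’s r: use when the data are normally distributed
    • Spearman’s rank correlation (rs): use when the data are skewed; it is calculated on the ranks of the values rather than the values themselves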
    • (Study used Spearman because prescribing data were skewed.)
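
To show why the choice matters, here is a minimal sketch in which Spearman's coefficient is computed as Pearson's r applied to ranks (tied values get the average of their ranks). The helper names and the data are invented for illustration; the single extreme value stands in for a skewed variable.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (definition-based, no libraries)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rank correlation: Pearson's r on the ranked data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y always rises with x, but one value is extreme (skewed)
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]
r_pearson = pearson_r(x, y)
r_spearman = spearman_rs(x, y)
print(round(r_pearson, 2), round(r_spearman, 2))
```

In this toy case the extreme value pulls Pearson's r down, while Spearman's rs still reports a perfect monotonic relationship because it only looks at the ordering.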

    Important warning

    Correlation ≠ causation.

    R² (coefficient of determination)

    If r = 0.88 → R² = 0.88² ≈ 0.77

    Meaning 77% of variation in HbA1c is explained by glucose variation.
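
The arithmetic can be checked in a couple of lines. This is only a sketch of the squaring step, using the r value quoted in these notes; the variable names are arbitrary.

```python
r = 0.88                   # glucose vs HbA1c correlation quoted in the notes
r_squared = r ** 2         # coefficient of determination
percent_explained = round(r_squared * 100)

# Squaring removes the sign, so a negative r of the same size gives the same R²
r_squared_negative = (-0.88) ** 2

print(round(r_squared, 2), percent_explained)
```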

    2. REGRESSION

    Regression is correlation plus prediction.

    What regression does

    It draws a best-fit line through the scatter plot to quantify how much one variable changes when the other changes.

    Regression line

    Written as:

    y = a + b x

    Where:

    • a = regression constant (where line hits vertical axis)
    • b = regression coefficient (slope)

    b tells you how much y changes when x increases by 1 unit.
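
One common way to obtain a and b is ordinary least squares; the notes do not say which method the book uses, so treat that as an assumption. A minimal sketch with a hypothetical `fit_line` helper and made-up points that lie exactly on y = 1 + 2x:

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (constant) and b (slope) in y = a + b x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # change in y per 1-unit increase in x
    a = mean_y - b * mean_x  # value of y where the line crosses x = 0
    return a, b

# Made-up points lying exactly on the line y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)
```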

    Book example

    Using the same glucose–HbA1c data:

    Regression equation:

    HbA1c = 3.2 + (0.45 × glucose)

    Meaning:

    • Every 1 mmol/L increase in glucose → HbA1c rises by 0.45%
    • If glucose = 15 mmol/L → predicted HbA1c = 9.95%

    This is the regression line plotted in the book (Fig. 12).
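
The prediction above is just the regression equation evaluated at a chosen glucose value. A small sketch, using the constant and coefficient quoted in these notes; the function name `predict_hba1c` is made up.

```python
def predict_hba1c(glucose_mmol_l, a=3.2, b=0.45):
    """Predicted HbA1c (%) from fasting glucose (mmol/L): HbA1c = a + b * glucose."""
    return a + b * glucose_mmol_l

predicted = predict_hba1c(15)
# Difference between two predictions 1 mmol/L apart equals the slope b
step = predict_hba1c(11) - predict_hba1c(10)

print(round(predicted, 2), round(step, 2))
```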

    What R² means in regression

    R² = 0.77 → 77% of variation in HbA1c is explained by glucose.
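
For a straight-line regression with one predictor, R² can also be computed directly as the share of the variation in y that the fitted line accounts for, and it comes out equal to r squared. The sketch below checks this on invented numbers; none of the data are from the book.

```python
import math

# Made-up toy data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line y = a + b x
b = sxy / sxx
a = mean_y - b * mean_x

# Variation left over after fitting the line (residual sum of squares)
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = syy  # total variation of y around its mean

r_squared_from_fit = 1 - ss_res / ss_tot
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared_from_fit, 2), round(r ** 2, 2))
```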

    Types of regression

    • Linear regression → straight line (book example)
    • Logistic regression → outcome has 2 categories (e.g., diseased vs not)
    • Poisson regression → rare events / waiting times
    • Cox regression → time-to-event (survival) analysis

    Warnings

    • You can only predict within the range of your data (no extrapolation).
    • Use regression only when the relationship has a logical direction: x (the predictor) should plausibly come before or influence y (the outcome).
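
The first warning can be enforced in code by refusing to predict outside the observed range of x. This is a sketch only: the function name is hypothetical, and the range of 5 to 20 mmol/L is an invented placeholder because the notes do not give the actual range of the glucose data.

```python
def predict_within_range(x, a, b, x_min, x_max):
    """Return a + b*x, but only if x lies inside the observed data range."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range {x_min} to {x_max}; "
            "predicting here would be extrapolation"
        )
    return a + b * x

# Coefficients from the notes; the 5 to 20 range is a hypothetical placeholder
A, B = 3.2, 0.45
X_MIN, X_MAX = 5, 20

inside = predict_within_range(15, A, B, X_MIN, X_MAX)
print(round(inside, 2))

try:
    predict_within_range(30, A, B, X_MIN, X_MAX)
except ValueError as err:
    print("refused:", err)
```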

    Summary Table (Based on the Book)

    Concept     | Meaning                                  | When to Use                                               | Key Output
    Correlation | Measures strength of linear association  | When seeing whether two numerical variables move together | r (from –1 to +1)
    Regression  | Quantifies and predicts the relationship | When you want a formula to predict y from x               | Regression line: y = a + bx; b, a, R²