Module Aims:
This core module aims to provide students with the knowledge and biostatistical skills needed to interpret and conduct basic statistical analyses of population health data. Students may take either this module or Statistics for HDS.
Module Learning Outcomes:
By the end of the module, students should be able to:
- Understand sampling variation in the context of population health studies
- Use R to manipulate data, apply commonly used statistical procedures and interpret their output
- Select, perform and interpret appropriate descriptive analyses of population health data
- Select, apply and interpret common regression models for the statistical analysis of population health data
- Perform standard sample size and power calculations
Pre-requisites:
Fluent numeracy and a good understanding of elementary algebra (e.g. rearranging equations, graphical interpretation of a linear equation in two variables, simultaneous linear equations with a unique solution, quadratic equations), logarithms (including operations on the logarithmic scale), summation notation (∑) and probability (including simple probability operations). Familiarity with scientific notation and with performing automated calculations (e.g. in Excel, R or equivalent).
Teaching Strategy:
The module will be delivered through a combination of lectures, class discussion, small-group exercises and computer practicals. Preparatory reading may be required before some sessions.
Assessment:
Written assessment at the end of the module (50% of module grade).
A take-home assignment involving a dataset and a structured analysis plan, with students providing code, tables/figures/results, and written interpretation (50% of module grade).
Session List:
- Introduction to biostatistics (overview of biostatistics, statistics for research in the biomedical sciences, sampling variation, structures of datasets, types and units of variables)
- Introduction to R (basic operations, reading in data from Excel, using R libraries)
- Descriptive statistics (measures of location and spread, standard deviations and standard errors, histograms, bar charts and box plots)
- Normal distribution and confidence intervals (small and large samples, normal distributions, assessing normality, 95% confidence intervals, log transformations)
- Hypothesis tests and p-values (one-tailed and two-tailed tests, performing and interpreting hypothesis tests, random error and chance, type I and II errors)
- Comparison of continuous variables between two groups (comparison of paired and unpaired groups, z-statistics, t-statistics and t-tests)
- Comparison of categorical variables between two groups (contingency tables, χ² test, Yates' continuity correction, Fisher's exact test, χ² test for trend)
- Comparison of variables between two groups using distribution-free methods (non-parametric vs parametric tests, rank-based methods)
- Correlation and simple linear regression (scatter plots, correlation coefficients, linear regression)
- Multiple linear regression with several continuous exposures (linear predictor, non-linear exposures, line of best fit, regression dilution)
- Multiple linear regression with binary and categorical exposures (ANOVA, linear regression with unordered and ordered categorical exposures, interactions)
- Using multiple linear regression in practice (model building, proportion of variance explained, assessing model fit, residuals)
- Survival analysis (time-to-event data and censoring, Kaplan-Meier, Cox regression)
- Logistic regression (odds ratios, interpretation)
- Strategies for analysis (analysis plans, sample size and power calculations, published statistical guidelines for analysis)
- Common pitfalls in medical statistics
Module Length: 8 days over 4 weeks