Applied Machine Learning in Bioinformatics
Module Aims: This module aims to link the fundamental concepts presented in “Introduction to Machine Learning” to practical examples frequently encountered in Health Data Science and, in parallel, introduce some advanced elements of previously discussed canonical methods.
Module Learning Outcomes: By the end of the module, students should be able to:
- Describe and evaluate several canonical machine learning methods and feature selection processes (including assumptions, algorithms and examples)
- Identify and apply appropriate machine learning methods (and critically compare the stability of results obtained using standard approaches) to solve a range of inference and prediction problems
- Recognise contexts in which versions of inference and prediction algorithms derived from flexible high-dimensional models can offer advantages over classical implementations or statistical methods
- Have a wider understanding of the information that can be derived from preliminary data mining, reinforcement learning or heuristics, and apply flexible modelling to real-world examples
- Identify useful objective function penalisations corresponding to a variety of structural assumptions
- Interpret the output of machine learning algorithms in the context of the underlying modelling assumptions
Prerequisites: Statistics for HDS, Introduction to Machine Learning, Advanced Biostatistics for HDS
Teaching Strategy: Lectures and computer practicals. Some preliminary reading may be required.
Assessment: Practical analysis, assessed through a two-stage report answering parts 1 and 2 of the assignment. For part 1, worth 50% of the module grade, several real-life datasets will be presented (e.g. sequencing, genomics and imaging data). Students will apply a range of data mining approaches to characterise each dataset (e.g. fully labelled but noisy data, data with missing values, partially labelled data, sequencing data, imaging data) and submit their comments.
Students will then choose one dataset for the second part of the assignment. Once a dataset has been selected, students will receive a predefined set of questions; part 2 consists of an assessment of 2-3 ML approaches (25%) and an open-ended analysis (25%).
Module Length and Dates: 4 days