This module aims to provide the understanding, specialist knowledge and practical skills required to analyse large genomic datasets to answer questions about the genetic contribution to human health and disease. The module will particularly focus on processing, annotating and interpreting short-read data, and applying a number of different analytical approaches to both rare and common genetic variation in the context of different diseases and traits.
Module Learning Outcomes:
By the end of this module, students should be able to:
- Understand how high-throughput sequencing (HTS) data are generated, and the related quality control processes
- Analyse and interpret HTS data using command-line programming and bioinformatic approaches
- Use appropriate publicly available resources to annotate genetic variation and interpret it in the context of the patient’s disease and clinical presentation
- Understand how HTS data are used to provide genetic diagnoses for patients with rare disease in a clinical diagnostic setting
- Understand how genetic associations with rare and common diseases are discovered via cohort studies and different analytical approaches
- Understand and appraise the purpose, current benefits, and future opportunities of the large genomic datasets generated by sequencing the genomes and deep phenotyping of both patient and population-based cohorts
- Appropriately apply different analytical approaches to rare genetic variation
- Develop basic practical skills in the analysis of genomic data in a Cloud computing environment
Understanding of: DNA structure, transcription and translation; different types of genetic variation (e.g. single nucleotide substitutions, whole gene deletions, chromosomal translocations) and their functional effects; common vs rare variants; inheritance patterns (e.g. dominant, recessive, X-linked); linkage disequilibrium and haplotypes. Ideally, students will have completed the Genetic Epidemiology Module.
The practical aspect of this module will involve Linux-based command-line software, and students are expected to have completed Linux training and to be comfortable writing commands. Students are encouraged to refresh their Linux skills, e.g. by working through some of the many online tutorials (e.g. https://tutorials.ubuntu.com/tutorial/command-line-for-beginners#0; http://www.ee.surrey.ac.uk/Teaching/Unix/index.html). Note that all necessary commands will be provided in the practical sessions.
Students are required to do 2-3 hours of preparatory work for each session (self-directed learning). They will be told in advance about any software or applications that they should download in preparation for each session.
The sessions will be a mixture of interactive lectures, workshops and group discussions, used to consolidate and expand the knowledge gained during the preparatory work, and hands-on genomic data analysis practicals. Each session will end with a 5-10 minute primer for the following session and the associated preparatory material. The module will use only publicly available resources, databases and software.
Group presentation, 40%: Groups of 3-4 students work on a practical sequence data analysis and interpretation project and give a 20-minute presentation during the last session. Group marks are given for the content of the overall presentation, for answering the research question, and for responding to questions during a Q&A session following the presentation.
Individual technical report, 60%: Each student submits a technical report of up to 1,000 words on a set task involving genomic data processing, troubleshooting and analysis. Detailed instructions will be provided in a worksheet.
Module Length: 4 days