This module aims to provide the understanding, specialist knowledge and practical skills required to analyse large genomic datasets to answer questions about the genetic contribution to human health and disease. The module will particularly focus on processing, annotating and interpreting exome and whole genome data, and applying a number of different analytical approaches to both rare and common genetic variation in the context of different diseases and traits.
Module Learning Outcomes:
By the end of this module, students should be able to:
- Understand how high-throughput sequencing (HTS) data are generated, and the related quality control processes
- Analyse and interpret HTS data using command-line programming and bioinformatic approaches
- Use appropriate publicly available resources to annotate and interpret genetic variation
- Understand commonalities and differences in how HTS data are used to provide genetic diagnoses for patients in a clinical diagnostic setting, versus to answer research questions and generate new knowledge
- Understand and appraise the purpose, current benefits, and future opportunities of large genomic datasets such as the Genomics England 100K Genomes Project (rare disease and cancer patients) and the UK Biobank 500K resource (population-based, deeply phenotyped cohort)
- Critically appraise published analyses of large genomic datasets
- Appropriately apply different statistical approaches to common and rare genetic variation
- Develop basic practical skills in the analysis of genomic data, either on a high-performance computing cluster or in a cloud environment
Understanding of: DNA structure, transcription and translation; different types of genetic variation and their functional effects; common vs rare variants; inheritance patterns; linkage disequilibrium and haplotypes. Ideally, students will have completed the Genetic Epidemiology Module.
The practical aspect of this module will involve Linux-based command-line software, and students are expected to have completed Linux training and to be comfortable with writing commands. Students are encouraged to refresh their Linux skills by working through some of the many online tutorials (e.g. https://tutorials.ubuntu.com/tutorial/command-line-for-beginners#0; http://www.ee.surrey.ac.uk/Teaching/Unix/index.html). Note that all necessary commands will be provided in the practical sessions.
Students are required to do 2-3 hours of preparatory work for each session (self-directed learning). They will be told in advance about any software or applications that they should download in preparation for each session.
The sessions will be a mixture of interactive lectures, workshops and group discussions, used to consolidate and expand the knowledge gained during the preparatory work, and hands-on genomic data analysis practicals. Each session will end with a 5-10 minute primer for the following session and the associated preparatory material. The module will use only publicly available resources, databases and software.
Group presentation, 40%: Groups of 3 or 4 students work on a project and make a presentation on a selected research topic during the last session. The presentation consists of each student introducing their own contribution, and the group drawing these together to explain how their work has addressed the overall task. Group marks are given for the content of the overall presentation, answering the research question and responding to questions during a 10-minute Q&A session following the presentation. For the overall presentation mark, 50% is based on the average of the marks given by other students on the module, and 50% is awarded by the module leader or a similarly qualified assessor appointed by them.
Individual technical report, 60%: Each student submits a 1000-word technical report on their group project a week after the end of the module. The report will include a brief introduction to the context and methods used, as well as detailed results and their interpretation.
Module Length: 4 days