Abstract
Data-driven biology has the potential to revolutionise our understanding of health, disease, and the environment; as the scale and scope of the data we collect grow, so does the need for tools that improve the annotation of this data. Research into the microbiome is increasingly conducted on a global scale to identify trends that affect our health and the environment. For the findings of such research to be reproducible, geographical metadata should be annotated consistently across studies, which is demonstrably not the case in many past environmental microbiome studies. The OMEinfo tool addresses this issue by enabling consistent annotation of geographical features, including rurality and population density.
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, continues to pose a threat to health, and a structural and functional understanding of the virus's evolution can help to monitor emerging variants. Although much research into the structure and function of the virus exists, few tools transfer this knowledge to sequenced genomes. The tool presented here, SPEAR, addresses this gap by providing a point-of-sequencing annotation pipeline.
Enzymes are biological catalysts that act upon ligands in most metabolic processes, but they are often not structurally characterised with the ligands they act upon in vivo (the cognate ligands), with analogues or inhibitors used instead to aid crystallisation. A database mapping the ligands bound in structures to the appropriate cognate ligands, and to the enzyme domains involved, is therefore needed. The ProCogGraph database provides this information for enzyme structures, integrating ligands and reactions from a diverse range of resources. Its applications are explored through annotating the cognate ligand binding potential of metatranscriptomes and of predicted enzyme structures.
In summary, this thesis identifies gaps in the annotation of biological data at three scales (global, viral and protein) and describes the development of three tools to address them: SPEAR, ProCogGraph and OMEinfo.
| Date of Award | 27 Feb 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Matthew Bashton |
Keywords
- COVID-19
- Protein Domains
- Ligand Interactions
- Geospatial Metadata
- Analysis Pipelines