Abstract
Systems genetics is the interdisciplinary field that deals with the consequences of genetic variation on all biomolecular levels of a biological system. The aim of systems genetics is to understand biological systems by partitioning variation into three major categories (genetic, environmental and error variation) and to explain how complex phenotypes arise from a combination of these three factors across biomolecular levels.
Currently, naturally occurring genetic variation (or genetic perturbation) can be used to interrogate the genetic basis of phenotype variation on many biomolecular levels such as: genetics, transcriptomics, proteomics and metabolomics. Combined with environmental perturbation, we can investigate the influence of different environments and the interaction between genetics and environment. Experimental design and advanced statistics are used to minimize and estimate error variance. To investigate all these factors influencing biological systems it is necessary to collect huge data sets on many individuals, many tissues, at all known biomolecular levels.
Modern high-throughput technologies generate large amounts of genomic, transcriptomic, proteomic and metabolomic data, creating a major challenge in bioinformatics because of the size of data collected and the multitude of technologies used. This thesis will highlight our solutions to the ‘Big Data’ challenges in systems genetics. We propose to develop smarter, more optimized algorithms such as Pheno2Geno or Multiple QTL mapping, and to use a collaborative approach such as xQTL workbench to store and analyse high-throughput systems genetics data.
Chapter 1 contains an introduction to systems genetics and highlights the challenges that inspired this thesis. These challenges, such as the massive increase in data production, add complexity to tasks such as diagnosing patients or selecting crops for optimal yield.
Chapter 2 presents Pheno2Geno, an R package that deals with the creation of genetic maps from large-scale omics data. The theory behind genetic map construction is around 100 years old. Most software was written in the 1980s, and software available for genetic map construction has not yet been adapted to make use of new technologies such as multithreading or cluster computing. Pheno2Geno aims to provide analysis of data from tiling arrays and/or RNA-Seq to generate gene-based expression markers (GEMs) and create high-density genetic maps.
Chapter 3 describes the implementation of the Multiple QTL mapping (MQM) routine into R/qtl, adding a ‘new’ algorithm to the R/qtl toolset to provide a wider range of QTL mapping tools for inbred crosses. R/qtl is the basis of a toolset built around a unified data structure allowing easy adaptation and extension of the software. R/qtl allows researchers to analyse data from different sources, and to quickly compare different approaches. This chapter showcases our contributions to the R/qtl package such as: MQM, visualizations, parallel computation of QTLs and an improved permutation scheme.
Chapter 4 describes our current work on using differences in correlation to generate interaction networks and detect cell-type-specific QTL effects. Correlated Traits Locus analysis (or CTL mapping) enables researchers to find genetic loci controlling correlation differences in segregating phenotypes. A variation on this method has proven valuable in discovering cell-type-specific eQTL effects. Using these effects it is possible to untangle cell mixtures seen in whole blood.
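To illustrate the idea of a locus controlling correlation differences, the following is a minimal sketch in Python. It is not the CTL mapping implementation described in the thesis: the function name `correlation_difference`, the simulated data and the choice of a simple spread between per-genotype Pearson correlations are all assumptions made for illustration. It only shows that the correlation between two traits can be computed separately within each genotype class at a marker and then compared.

```python
import numpy as np


def correlation_difference(trait_a, trait_b, genotypes):
    """Pearson correlation of two traits within each genotype class.

    Hypothetical helper for illustration only. Returns a dict mapping
    genotype class to correlation, plus the spread (max minus min).
    """
    cors = {}
    for g in np.unique(genotypes):
        mask = genotypes == g
        cors[int(g)] = float(np.corrcoef(trait_a[mask], trait_b[mask])[0, 1])
    values = list(cors.values())
    return cors, max(values) - min(values)


# Simulated data (an assumption, not taken from the thesis):
# one marker with two genotype classes, as in an inbred cross.
rng = np.random.default_rng(0)
n = 200
genotypes = rng.integers(0, 2, size=n)
trait_a = rng.normal(size=n)
noise = rng.normal(size=n)
# trait_b follows trait_a only for individuals in genotype class 1.
trait_b = np.where(genotypes == 1, trait_a + 0.3 * noise, noise)

cors, diff = correlation_difference(trait_a, trait_b, genotypes)
print(cors, diff)
```

In this simulation the two traits are strongly correlated in one genotype class and essentially uncorrelated in the other, so the marker would stand out when the spread is scanned across markers. A real analysis would also need a significance test, for example by permutation.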
Chapter 5 details our work to provide computational infrastructure for the Life Sciences. Our system xQTL workbench is currently being used as a back end to the WormQTL and WormQTL-HD databases. xQTL workbench allows users to store and share their data in a local or web environment, and run analysis across data sets using the power of distributed computing. It comes standard with QTL mapping tools such as: R/qtl, PLINK and qtlbim but also provides a web interface, data importers, APIs and visualizations.
I trust you enjoy reading this thesis as much as I enjoyed creating it during the last four years.
Original language | English |
---|---|
Place of Publication | Groningen |
Publisher | Danny Arends |
Number of pages | 169 |
Volume | 1 |
Edition | 1 |
ISBN (Electronic) | 978-90-367-7209-9 |
ISBN (Print) | 978-90-367-7210-5 |
Publication status | Published - 2014 |