Data standardization in the omics field

Chimusa Emile Rugamika, Judit Kumuthini, Lyndon Zass, Melek Chaouch, Zoe Gill, Verena Ras, Zahra Mungloo-Dilmohamud, Dassen Sathan, Anisah W. Ghoorah, Faisal Fadlelmola, Christopher Fields, John Van Horn, Fouzia Radouani, Melissa Konopko, Shakuntala Baichoo

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review


In the past decade, the decreased cost of advanced high-throughput technologies has revolutionized biomedical sciences in terms of data volume and diversity. Quantitative techniques such as machine learning have been employed to handle the sheer volume of sequencing data and to find meaning in it. The need to integrate complex and multidimensional datasets poses one of the grand challenges of modern bioinformatics. Integrating data from various sources to create larger datasets can allow for greater knowledge transfer and reuse following publication, whether data are submitted to a public repository or shared directly, and combining data from multiple sources can expand the knowledge of a subject. Standardized procedures, data formats, and comprehensive quality management considerations are the cornerstones of data integration. This chapter discusses the importance of incorporating data standardization and good data governance practices in the biomedical sciences. The chapter also describes existing standardization resources and efforts, as well as the challenges related to these practices, emphasizing the critical role of standardization in the omics era. The discussion is supplemented with practical examples from different “omics” fields.
Original language: English
Title of host publication: Genomic Data Sharing: Case Studies, Challenges, and Opportunities for Precision Medicine
Editors: Jennifer Mccormick, Jyotishman Pathak
ISBN (Print): 9780128198032
Publication status: Published - Jan 2023
