The Bias in Genomic Databases
Since the completion of the Human Genome Project, scientists have worked to populate and update the many databases that serve as the foundation for genomics research [7]. These databases store a wealth of information, but most of the data come from people of European descent. While people of European ancestry make up only about 16% of the world’s population, approximately 80% of participants in Genome-Wide Association Studies (GWAS) are of European descent [10]. This narrow representation restricts scientific advancement and limits the understanding scientists can reach in genomic research. European-based data serve as a valuable comparative source and a basis for studying other populations worldwide, but diversity is critical to reducing existing health disparities and gaining novel insights into the genetics of disease.
The predominance of European participants in genomic data limits the success of genetic diagnosis and the range of therapeutic choices available to the global population [5]. Findings drawn from a narrow sample may not generalize to other populations, which increases the margin of error when they are applied more broadly. For example, C-reactive protein (CRP) is a blood-based biomarker used in the detection and management of inflammation-related diseases such as infections, lupus, and rheumatoid arthritis (RA) [3]. A genetic variant that lowers CRP levels has been found to be more common in people of African ancestry. As a result, a patient of African ancestry could have RA yet go undiagnosed or untreated because their CRP levels fall below the diagnostic criteria. More broadly, the underrepresentation of non-European populations may leave undetected those gene-disease relationships whose effects are weaker or absent in Europeans [4]. Further, a study of 26 traits found that effect sizes (the strength of the association between a genetic variant and a trait) in African Americans were approximately 58% of those in European Americans, suggesting that predictions are less accurate for this minority population [3]. Whether due to inadvertent selection bias or other factors, the lack of population diversity also hinders the translation of genetic research into clinical practice and precision medicine [8].
Although genetic and genomic data collected primarily from people of European ancestry can serve as a good reference, applying those data to non-European populations can lead to weaker predictions, misidentifications, or omissions [3]. For example, Matoba et al. compared genome-wide genetic effects on smoking behaviors in East Asian and European samples [6]. Although most of the smoking-related single-nucleotide polymorphisms (SNPs) identified in the study were shared between the two populations, two SNPs were found at a higher frequency in the Japanese population [10]. In this case, the European samples could have served as a basis for interpreting Japanese smoking-related genomic data, but had the East Asian and European samples not been studied comparatively, the two exceptional SNPs would not have been identified.
Collecting data from all ethnic groups will help reduce existing health disparities. For example, kidney disease is known to be more prevalent among people of African ancestry. Variants in the APOL1 gene are strongly associated with the risk of progressive chronic kidney disease and end-stage kidney disease [9]. Although the relationships between these variants and disease in the African American population are not yet fully understood, uncovering this genetic risk factor will lead to significant advances in understanding the pathophysiology of kidney disease [1]. The ethnic disparities already observed in kidney transplantation outcomes, which have been attributed to these variants, could be reduced if more diverse populations are studied.
Inclusivity in scientific studies will also yield new insights into the genetics of disease. PCSK9 inhibitors, a new class of cholesterol-lowering drugs, were developed following genomic analysis of people of African ancestry [2]. The study found that carrying a single nonfunctional copy of the PCSK9 gene was associated with low cholesterol levels. Had researchers not analyzed an ethnically diverse cohort, this finding might never have been made.
Many efforts are underway to address these disparities. The 1000 Genomes Project, for example, has cataloged SNPs and other DNA variants that occur at a frequency of at least 1 percent in the 26 populations it studied, which range from East and South Asian to African populations [7].
Expanding the diversity of research data will allow genomic science to contribute to the welfare of all people rather than exacerbate ethnic disparities. Greater diversity in genetic and genomic data will not only advance scientific research and clinical application but also help close existing gaps in medicine, benefiting humanity as a whole.
Works Cited
- Bentley, Amy R, et al. “Diversity and Inclusion in Genomic Research: Why the Uneven Progress?” Journal of Community Genetics, Springer Berlin Heidelberg, Oct. 2017, www.ncbi.nlm.nih.gov/pmc/articles/PMC5614884/.
- Nebula Genomics. “Increasing Diversity in Genomic Research.” Nebula Genomics Blog, 7 Jan. 2021, nebula.org/blog/increasing-diversity-in-genomic-research/.
- “Lack of Diversity in Genetic Research a Problem.” Fred Hutch, 19 June 2019, www.fredhutch.org/en/news/center-news/2019/06/lack-diversity-genetic-research-problem.html.
- Landry, Latrice G., et al. “Lack of Diversity in Genomic Databases Is a Barrier to Translating Precision Medicine Research into Practice.” Health Affairs, vol. 37, no. 5, 2018, pp. 780–785, doi:10.1377/hlthaff.2017.1595.
- Lewis, Ricki. “Are Eurocentric Genetic Databases Hampering Health Care?” DNA Science, 18 May 2020, dnascience.plos.org/2019/03/21/are-eurocentric-genetic-databases-hampering-health-care/.
- Matoba, Nana, et al. “GWAS of Smoking Behaviour in 165,436 Japanese People Reveals Seven New Loci and Shared Genetic Architecture.” Nature Human Behaviour, 25 Mar. 2019, www.nature.com/articles/s41562-019-0557-y.
- “Opinion: Greater Diversity Is Needed in Human Genomic Data.” The Scientist Magazine®, www.the-scientist.com/critic-at-large/diversify-our-human-genomic-data-66308.
- Popejoy, Alice B. “Diversity in Precision Medicine and Pharmacogenetics: Methodological and Conceptual Considerations for Broadening Participation.” Pharmacogenomics and Personalized Medicine, Dove, 14 Oct. 2019, www.ncbi.nlm.nih.gov/pmc/articles/PMC6800456/.
- Sirugo, Giorgio, et al. “The Missing Diversity in Human Genetic Studies.” Cell, vol. 177, no. 4, 2019, p. 1080, doi:10.1016/j.cell.2019.04.032.
- “Whose Genomics?” Nature Human Behaviour, 14 May 2019, www.nature.com/articles/s41562-019-0619-1.