Physical traits can be predicted with whole-genome sequence data and machine learning
A research team from Human Longevity, Inc. has conducted a study showing that machine learning and whole-genome sequencing can be used to predict individual faces and other physical features. The lead author, Christoph Lippert, and the senior author, J. Craig Venter, said that the research offers new techniques for forensics and has significant implications for data privacy, de-identification, and adequately informed consent. They concluded that considerably more public deliberation is needed as more and more genomes are generated and deposited in public databases.
The study, approved by an Institutional Review Board, included 1,061 volunteers aged 18 to 82 from diverse ethnic groups, whose genomes were sequenced to a depth of at least 30x. The phenotype data gathered from these volunteers included skin and eye color, age, height, weight, 3-D facial images, and voice samples.
The researchers could accurately predict eye color, sex, and skin color; however, other, more complex genetic traits proved harder to predict. Although their predictive models worked, the researchers would need much larger cohorts to improve prediction accuracy.
The team developed a novel algorithm, called a maximum entropy algorithm, to find the best combination of predictive models for matching whole-genome sequencing data with phenotypic and demographic data. The algorithm correctly identified roughly 8 out of 10 volunteers from diverse ethnic groups and 5 out of 10 African American or European volunteers.
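The general idea of matching a genome to a person through predicted traits can be illustrated with a minimal sketch. This is not the team's maximum entropy algorithm; it swaps in a simple softmax ranking over weighted trait distances, and every name, trait value, and weight below is a hypothetical assumption made for illustration.

```python
import math

def match_probabilities(predicted, candidates, weights):
    """Turn weighted squared distances between genome-predicted traits
    and each candidate's observed traits into softmax probabilities."""
    scores = []
    for observed in candidates:
        distance = sum(w * (p - o) ** 2
                       for w, p, o in zip(weights, predicted, observed))
        scores.append(-distance)  # smaller distance -> higher score
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_match(predicted, candidates, weights):
    """Return the index of the candidate most likely to match."""
    probs = match_probabilities(predicted, candidates, weights)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical, normalized traits: [height, skin tone, sex]
predicted = [0.62, 0.30, 1.0]      # traits predicted from one genome
pool = [
    [0.10, 0.80, 0.0],             # candidate 0
    [0.60, 0.35, 1.0],             # candidate 1
    [0.90, 0.30, 1.0],             # candidate 2
]
weights = [1.0, 1.0, 2.0]          # assumed trust in each trait model

probabilities = match_probabilities(predicted, pool, weights)
chosen = best_match(predicted, pool, weights)
```

In this toy pool, the candidate whose observed traits lie closest to the genome-predicted traits receives the highest probability. Traits that can be predicted more reliably would be given larger weights.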
Venter, a co-founder of Human Longevity, said, “We intended to do this research to confirm that the genome is what codes for all that makes you, you. This is evidently a proof of concept with a small cohort but we consider that as we amplify the numbers of individuals in this research and in the database of Human Longevity, Inc. to hundreds of thousands, it would enable us to precisely guess all that can be estimated from the genomes of individuals.”
He added that neither the scientific community nor the general public is sufficiently concerned about the need for policies and protections for the confidentiality of individuals' genomic data, and he called for better technical solutions, in-depth analysis, and continued discussion.
According to Lippert, a data scientist at Human Longevity, Inc., the study demonstrates the effectiveness of imaging methods used to assess the traits of large numbers of individuals. Machine learning plays a crucial part in scientific discovery and enables fully automated data interpretation.