HOW THE COMPUTER SCIENCE METHOD OF MACHINE LEARNING CAN BE APPLIED TO THE GENETICS PROBLEM
24 December 2006 - Duke University
“If I tell you there’s an island the size of Greenland, and I have buried 600 treasure chests somewhere on the island, you know nothing,” Hartemink explains. “We’ve identified genetic regions, or parts of the landscape, that are more likely to be where the ‘treasure’ of imprinted genes is buried. In that sense it’s like a treasure map.”
Their map, Jirtle says, reduces from tens of thousands to only 600 the number of genes likely to be imprinted. “I can handle 600,” he says. “I can’t handle 25,000.”
For years, Jirtle had been investigating these curious genes, whose pattern for turning on and off differs from the dominant/recessive model of classical genetics. During the formation of the sperm and egg, a molecular process called methylation imprints the genes with a mark that silences the copy coming from either the mother or the father. The silenced copy of the gene is unavailable to compensate for possible flaws in the active copy, including flaws that may lead to disease.
In his lab, Jirtle had identified imprinted genes in mice and sheep and the corresponding genes in humans. In one project, his lab determined that a rare characteristic in sheep, unusually big and muscular bottoms, was caused by an imprinted gene. Such discoveries of individual imprinted genes were the essential first step to understanding the genes’ broader role in diseases. But Jirtle found it prohibitively difficult to identify them more widely across the entire genome.
Meanwhile, Hartemink, who graduated from Duke in 1994 with degrees in mathematics, physics and economics, had returned to campus as a computer science professor. In collaborations with biology researchers, he was exploring ways the computer science method of “machine learning” could be used to find patterns in biological systems.
In the spring of 2003, Hartemink and Jirtle began exploring how they might work together. Hartemink was teaching a course on “computational functional genomics” and asked the students whether anyone wanted to help him and Jirtle tackle the challenge of analyzing vast sequences of genetic information and other biological data to identify imprinted genes.
“I knew pretty much from the beginning I wanted to do it,” Luedi said about choosing the project. “It combined my interest in sequence analysis and statistics.”
Working with Jirtle, Luedi compiled data on mouse genes that are known to be imprinted. Then, working with Hartemink, he came up with an algorithm that could distinguish imprinted genes from non-imprinted ones. “It involves a classification of ‘yes’ or ‘no’, ‘imprinted’ or ‘not imprinted,’” he said. “The statistics community calls that ‘regression;’ computer science calls it ‘machine learning.’”
Next, they ran the algorithm against the entire mouse genome, most of whose genes’ imprinted status is unknown. That analysis identified 600 likely-imprinted genes out of 23,788. In the final step, the researchers looked for human genes that are located in regions of the human genome believed to contribute to certain diseases and that correspond to the mouse genes they predicted to be imprinted.
The result is a paper that opens new avenues for scientists to decode the mysteries of imprinted genes, which have become the focus of intense scientific inquiry worldwide. “These collaborations are really truly collaborations because neither group would have been able to pull it off alone,” Jirtle said.
Hartemink, Jirtle and Luedi are not the only scientists at Duke combining genetics experiments and computational methods. Part two of this story explores how the merger of math and genetics affects how young scientists are trained.
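For readers curious what the workflow described above looks like in code (label a set of known genes, fit a classifier, then score the rest of the genome), here is a minimal sketch in Python. It is not the researchers' algorithm: the article does not say which model or features they used, so plain logistic regression stands in for the classifier, and the two numeric features, the synthetic data, the gene names and the 0.9 cutoff are all assumptions made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(features, labels, epochs=500, lr=0.1):
    """Fit logistic regression weights by batch gradient descent on log-loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model, x):
    """Return the modeled probability that a gene is imprinted."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def make_synthetic_gene(rng, imprinted):
    """Hypothetical stand-in for two sequence-derived features of a gene."""
    center = 1.0 if imprinted else -1.0
    return [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Step 1: a small labeled set (1 = known imprinted, 0 = known not imprinted).
train_labels = [1] * 40 + [0] * 40
train_features = [make_synthetic_gene(rng, y == 1) for y in train_labels]

# Step 2: fit the yes/no classifier on the labeled genes.
model = train_logistic(train_features, train_labels)

# Step 3: score a "genome" of genes whose status is treated as unknown.
genome = {f"gene_{i}": make_synthetic_gene(rng, i % 10 == 0) for i in range(200)}
scores = {name: predict_proba(model, x) for name, x in genome.items()}

# Keep only high-confidence predictions as candidates for lab follow-up.
THRESHOLD = 0.9  # assumed cutoff, not taken from the article
candidates = sorted(
    (name for name, p in scores.items() if p >= THRESHOLD),
    key=lambda name: -scores[name],
)
print(len(candidates), "of", len(genome), "genes flagged as likely imprinted")
```

The point of the sketch is the shape of the process rather than the model: a small set of labeled examples is used to rank a much larger unlabeled set, which is how a list of tens of thousands of genes can be narrowed to a few hundred candidates.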
http://www.duke.edu
About: Duke University
Tracing its origins to a rural schoolhouse in 1838, Duke University has evolved into one of the world's leading institutions for education, research and medical care.