New Map Reveals Dynamic Variation in Human Genome
HHMI Bulletin, May 1, 2008

A team of researchers led by Howard Hughes Medical Institute investigator Evan E. Eichler at the University of Washington has produced the first high-resolution map showing the structural variation that exists in the human genome. With the map, researchers can now begin to see how the underlying structure of one person's genome differs from that of another.

Eichler and a team of 45 colleagues examined the genomes of eight people: four of African descent, two of Asian descent, and two of western European descent. They compared those eight genomes with the DNA sequence derived from the Human Genome Project, which is known as the reference sequence.

The resulting picture of the genome is much more complex than geneticists envisioned just a few years ago, but capturing this complexity is vital. "These are the details we need to find further associations with disease," Eichler said.

In a research article published on May 1, 2008, in the journal Nature, Eichler and his colleagues analyzed long variant stretches of DNA ranging from a few thousand to a few million base pairs in length. The new information uncovered in the study will help researchers understand how humans are genetically different from one another. "This is the first time that individual variants have been comprehensively cloned and sequenced to high quality," said Eichler. "That information suggests mechanisms for genetic change that previously could not be inferred."

Geneticists traditionally have focused on changes in single "letters" - or base pairs - of a DNA sequence. But in recent years, several groups of researchers - including Eichler's group - have demonstrated that some of the most important genetic differences between humans involve larger segments of DNA.

"Structural changes - insertions, duplications, deletions, and inversions of DNA - are extremely common in the human population," says Eichler. "In fact, more bases are involved in structural changes in the genome than are involved in single-base-pair changes."

In various parts of our genome, some people have segments of DNA sequence that other people do not have. In other parts of our DNA, large genetic regions may be flipped in one person compared with another. These genetic differences can influence a person's susceptibility to heart disease, autism, lupus, HIV infection, and many other diseases.

Across the nine genomes analyzed in the Nature article (the eight new genomes plus the reference sequence), the researchers found 1,695 regions where people had DNA insertions, deletions, or inversions longer than about 6,000 base pairs. In some of these locations, all nine genomes were structurally different. At other sites, just one or a few people had structural variants.

The new analyses showed that the eight new genomes also contain 525 segments of DNA that are not in the original reference genome. These segments range in size from a few thousand to 130,000 base pairs. "These results strongly argue that the human genome sequence is still incomplete," Eichler and his colleagues write in their paper. The authors suggest that it will be necessary to sequence additional genomes to fill the remaining gaps.

Taking a closer look, the researchers sequenced 261 of the structural variants letter by letter. In these regions, they found many structural variants smaller than 6,000 base pairs, ranging all the way down to insertions, deletions, or inversions of just a few base pairs. For unknown reasons, some parts of our genome are much more variable than others. For example, areas containing genes that are involved in the structural integrity of the body, such as the skin or the lining of the gut, are remarkably diverse. "That was surprising to us," Eichler said.

An analysis of the 1,695 variable regions reveals that many occur where segments of DNA are repeated. These repeated segments of DNA have a tendency to misalign during the process that produces sperm and egg cells, which results in insertions and deletions of DNA. "Roughly half of the insertions and deletions appear to be caused by that mechanism," said Jeffrey M. Kidd, a graduate student in Eichler's laboratory who was the lead author of the paper. "To assign these events to misalignments, you need high-quality data."
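The misalignment mechanism Kidd describes is generally known as nonallelic homologous recombination. The toy sketch below is only an illustration, not the lab's analysis code: it represents a chromosome as a list of labeled segments in which "R" marks two identical, directly oriented repeat copies flanking a unique segment "B". If two homologs pair at the wrong repeat copy and exchange material there, one product gains an extra copy of B (a duplication) and the other loses B (a deletion). The function name and segment labels are hypothetical.

```python
from typing import List, Tuple

Chromosome = List[str]


def unequal_crossover(x: Chromosome, y: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Toy exchange between homolog x at repeat copy x[i] and homolog y at repeat copy y[j].

    The aligned copies must carry the same (homologous) label. When i != j the
    homologs are misaligned, so the exchange is unequal.
    """
    if x[i] != y[j]:
        raise ValueError("exchange requires homologous segments at the aligned positions")
    # Each product keeps the left part of one homolog and the right part of the other;
    # the repeat copy at the exchange point comes from the right-hand partner.
    product1 = x[:i] + y[j:]
    product2 = y[:j] + x[i:]
    return product1, product2


if __name__ == "__main__":
    homolog = ["A", "R", "B", "R", "C"]  # two copies of repeat R flank unique segment B
    # Misaligned pairing: the second R of one homolog lines up with the first R of the other.
    duplication, deletion = unequal_crossover(homolog, homolog, 3, 1)
    print("duplication product:", "-".join(duplication))  # A-R-B-R-B-R-C
    print("deletion product:   ", "-".join(deletion))     # A-R-C
```

When the homologs pair correctly (i equal to j), the same function simply returns two copies of the original arrangement, which is why the misalignment itself, not the exchange, creates the copy-number change.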

Understanding these mutational processes will be critical as an increasing number of human genomes are sequenced. "2008 will be a big year for sequencing genomes," Eichler said. The eight people that Eichler's team studied are part of a much larger group whose genomes will be sequenced as part of the 1,000 Genomes Project, an international effort to sequence the genomes of people from around the world. "Having eight new genomes like this will allow us to benchmark what we can and cannot detect."

Understanding structural variation also is essential in developing new technologies designed to detect the genetic differences among people. For example, so-called "SNP chips," whether used in research or in clinical applications, need to reflect this structural variation to find links between particular gene variants and diseases. "If you depended on the latest and greatest chip, you wouldn't find an association for about 50 percent of these sites," said Eichler.

Besides their potential applications, the new results provide a wealth of data to explore hypotheses and make discoveries, according to Eichler. "What's exciting to me is that we now have, in essence, eight new reference human genomes."

Related reading from the same lab:
Duplication-Mediated Variation, Disease, and Adaptive Evolution

Genomic duplication followed by adaptive mutation is considered one of the primary forces for evolution of new function. Duplicated sequences are also dynamic regions of rapid structural change during chromosome evolution. My long-term goal is to understand the evolution, pathology, and mechanism(s) of recent gene duplication and DNA transposition within the human genome. Our work involves the systematic discovery of these regions, the development of methods to assess their variation, the detection of signatures of rapid gene evolution, and ultimately the correlation of this genetic variation with phenotypic differences within and between species.

My research addresses a new paradigm that has emerged in the past few years regarding the dynamic nature of human genome structure. Particular chromosomal regions have been shown to be active in the acquisition, duplication, and dispersal of large gene-containing genomic segments. I hypothesize that these "jumping genomic segments," also known as segmental duplications, are part of an ongoing evolutionary process that results in a novel form of large-scale DNA variation and contributes to rapid primate gene evolution. At a structural level, duplications may be viewed as dynamic mutations: an initial event increases the probability of a second event. Sequence homology created as a result of duplication increases the probability of additional rounds of gene conversion, unequal crossing-over, and subsequent rearrangement. Not surprisingly, many of the largest blocks of sequence similarity generated by this process are substrates for recurrent chromosomal structural rearrangements associated with certain human diseases and disease susceptibility. Compared with unique nonfunctional or "neutral" DNA, these particular areas of the genome represent hot spots of evolutionary and contemporary change. Their impact on evolution and disease is only beginning to be understood. Our research falls into three broad categories.

Human Variation and Disease
The combined incidence of detected de novo rearrangements that are mediated by segmental duplications is estimated at 1 in 1,000 live births. This figure includes 3 percent of all birth defects in which mental retardation is the primary diagnosis. We have identified ~130 regions of the human genome that we believe show a predilection to segmental aneusomy. Our paralogy map of the human genome therefore provides a "road map" to investigate regions with an increased probability of rearrangement. Children with undiagnosed mental retardation provide a sensitized background for the study of copy-number variation. One goal of our research is to assess the frequency of duplication-mediated segmental aneusomy within (1) the normal human population and (2) a population of patients with idiopathic mental retardation. Our aim is to address two fundamental questions: What is the nature and frequency of duplication-mediated structural polymorphisms within the human genome? Is there an excess of de novo events among children with mental retardation and congenital birth defects?

Our primary method for detecting copy-number variation is based on array comparative genomic hybridization (array CGH), using a well-characterized set of probes flanked by low-copy repeat sequences. As a second method, we have developed a computational approach that assesses paired-end sequence against the reference genome. The latter has identified hundreds of sites of potential structural polymorphism, of which 82 encompass genes. I hypothesize that copy-number variation (deletion and duplication) is an underestimated mutational force contributing to genetic disease, particularly at susceptibility loci. The characterization of this variation will provide the basis for developing the assays needed to perform association studies of simple Mendelian and complex human genetic diseases.
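The paired-end approach rests on a simple geometric test: both ends of each cloned fragment are sequenced and mapped to the reference, and their mapped distance and orientation are compared with what the known insert size predicts. The sketch below is illustrative only. It assumes conventional forward/reverse end orientation, an insert-size mean and standard deviation estimated from the clone library, and a cutoff of three standard deviations; these thresholds, the field names, and the category labels are assumptions, not the parameters of the published pipeline. A real analysis would also typically require several independent clones supporting the same site before calling a variant.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Reference mappings of the two end sequences from one cloned fragment (hypothetical fields)."""
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify_pair(pair: EndPair, mean_insert: float, sd_insert: float, n_sd: float = 3.0) -> str:
    """Label one end pair as concordant or as a candidate structural variant."""
    if pair.chrom_a != pair.chrom_b:
        return "interchromosomal"
    # Order the two ends by reference coordinate.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos_a, pair.strand_a), (pair.pos_b, pair.strand_b)]
    )
    if left_strand == right_strand:
        # Both ends on the same strand: part of the fragment is flipped relative to the reference.
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # Ends point away from each other: consistent with a tandem duplication.
        return "everted"
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd_insert:
        # Ends map farther apart than the fragment is long: the sample lacks reference sequence.
        return "deletion"
    if span < mean_insert - n_sd * sd_insert:
        # Ends map closer together: the sample carries sequence absent from the reference.
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    # Assumed library: 40,000 bp mean insert, 3,000 bp standard deviation.
    pairs = [
        EndPair("chr1", 100_000, "+", "chr1", 140_500, "-"),  # concordant
        EndPair("chr1", 100_000, "+", "chr1", 160_000, "-"),  # deletion
        EndPair("chr1", 100_000, "+", "chr1", 120_000, "-"),  # insertion
        EndPair("chr1", 100_000, "+", "chr1", 140_000, "+"),  # inversion
    ]
    for p in pairs:
        print(classify_pair(p, mean_insert=40_000, sd_insert=3_000))
```

Classifying each clone independently keeps the logic easy to check; grouping discordant clones that overlap the same reference interval would be the natural next step.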

Phylogenetics and the Mechanism of Origin
As a complement to our understanding of human variation, we focus on understanding natural genomic variation between humans and other primates. Because of the limitations of assembled genome sequence, we employ computational tools we developed during the analysis of the human genome to characterize lineage-specific and shared duplications between humans and great apes. In addition to genome-wide analyses, targeted high-quality sequencing of specific regions will provide long-range continuity to model evolutionary processes within these regions. There are two objectives. First, we will reconstruct the evolutionary history of every recent (<40 million years) segmental duplication within the human genome. In collaboration with Pavel Pevzner (University of California, San Diego), we are developing computational methods to identify ancestral states based on outgroup genomic data and to extract historical associations by applying graph theory, which promises to deconvolute the subrepeat structure of mosaic duplications. Using comparative sequence from these regions, we are also modeling the frequency of gene conversion and its impact on the structure of these regions.

Our second objective will be to understand the underlying mechanism of segmental duplications. We have recently developed a donor-acceptor model for human duplications that indicates that Alu repeats are key elements for the mobilization of duplications, while low-complexity (GC- and AT-rich) sequences may account for the preferential integration of these elements into specific chromosomal regions. We propose to test this model directly by identifying and characterizing lineage-specific duplications within humans, chimpanzees, and gorillas. Studying the phylogenetic relationship of such sequences to their antecedents will provide fundamental insight into putative donor and acceptor sequences at the sites of transposition and integration, respectively. To date, in collaboration with Eric Green (National Human Genome Research Institute), we have cloned and mapped 12 of these new insertion sites within gorilla, chimpanzee, and orangutan. Large-scale sequence analyses of the integration sites suggest coordinated deletion of the insertion site during segmental duplication. Ultimately, these data will serve as the basis for future experimental modeling of this process.

Gene and Transcript Innovations
The process of segmental duplication provides a vehicle for primate gene innovation in two different ways. First, duplications may lead to the adaptive evolution of genes "liberated" from the selective constraints of ancestral function. Second, the accumulation of diverse duplications at prescribed locations in the genome juxtaposes different gene cassettes in novel genomic texts. This has led to the formation of "chimeric" transcripts in a process akin to "exon shuffling." Although most random mutations create duplicate pseudogenes, occasionally functional products may emerge. One highlight of our research has been the discovery of both rapidly evolving genes and fusion genes specific to the human and great ape lineages. The latter genes show a bias toward germline expression. We will extend this work to identify such novel gene products and compare the expression profile of each with that of its progenitor gene. Functionality will be determined by identifying signatures of either significant positive or purifying selection. Genes that show evidence of significant positive selection will be assessed for intraspecific variation as a test for evidence of a selective sweep through the population. The determination of the function of such genes in the absence of model organisms is a significant challenge and will remain my long-term objective.
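Selection on a protein-coding gene is often summarized by the ratio ω = dN/dS of nonsynonymous to synonymous substitutions per site: ω well above 1 suggests positive selection, and ω well below 1 suggests purifying selection. The statement does not say which method the lab uses, so the sketch below is an assumption throughout. It takes site and difference counts that have already been tallied (as a Nei-Gojobori-style count would produce) and applies the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3). It reports only a point estimate; deciding whether a departure from 1 is significant requires a statistical test, such as a likelihood-ratio test, that is not implemented here. All function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple substitutions."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS from precomputed site and difference counts."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("no synonymous divergence; dN/dS is undefined")
    return d_n / d_s


def describe(w: float) -> str:
    """Label the direction of the point estimate (no significance test is applied)."""
    if w > 1.0:
        return "excess nonsynonymous change (consistent with positive selection)"
    if w < 1.0:
        return "deficit of nonsynonymous change (consistent with purifying selection)"
    return "no departure from neutral expectation"


if __name__ == "__main__":
    # Hypothetical counts for a duplicate gene compared with its progenitor.
    w = omega(nonsyn_diffs=10, nonsyn_sites=300, syn_diffs=10, syn_sites=100)
    print(f"dN/dS = {w:.2f}: {describe(w)}")
```

Comparing each duplicate with its progenitor in this way matches the pairwise framing of the paragraph above; codon-model methods that estimate ω across a whole phylogeny would be the more rigorous alternative.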

My research program is committed to understanding the significance of human segmental duplications at the structural, genic, and phenotypic levels. It is nontraditional, in that we work on some of the most biologically complex regions of the genome, which are not readily tractable by available genomic technologies. Furthermore, many evolutionary biologists are focused on understanding the significance of highly conserved genomic sequence among distantly related species. We strive to unravel the significance of regions undergoing rapid evolutionary change among closely related primates. Our research challenges the notion of a static genome that simply decays under a neutral model of evolution. Rather, the data implicate local dynamism, where hypermutable regions are nonrandomly distributed. A comprehensive assessment of this form of genetic variation will forge new links between evolutionary biology and human genetic disease. My research philosophy combines various disciplines (evolutionary, human genetics/genomics, and bioinformatics) to understand the mechanisms and consequences of novel forms of variation in the human genome. Such a synergism of various disciplines provides a powerful strategy to address biological processes of genome evolution. The development of tools and the conditions required to pursue such a holistic approach, with respect to studies of genome evolution, are unprecedented. With the advent of large-scale comparative sequencing and the integration of experimental and computational genomic approaches, such multifaceted research objectives have become increasingly tractable endeavors. My overall goal is to contribute to this new era of genomic science as it applies to evolution and medicine and to impart the value of this scientific design, through teaching and mentorship, to the next generation of scientists.

Grants from the National Institutes of Health, the March of Dimes, the National Science Foundation, and the Department of Energy provided support for these projects.
