Human Genome Project
U.S. Department of Energy

DOE BERAC Advisory Subcommittee Meetings to Define 5-Year Goals

U.S. Human Genome Project 5-Year Research Goals 1998-2003

Meetings

Alta, Utah (December 1-2, 1997): Elbert Branscomb, Mario Capecchi, Marvin Frazier, Harold Garner, Raymond Gesteland (Chairman), Richard Gibbs, Phil Green, Trevor Hawkins, Mike Knotek, Miriam Meisler, Ari Patrinos, Jane Peterson (NIH staff), Lloyd Smith, Monte Westerfield

Gaithersburg, Maryland (March 12-13, 1998): Elbert Branscomb, Ken Buetow, George Church, Dan Drell (DOE), Elise Feingold (NIH), Marvin Frazier (DOE), Rainer Fuchs, Harold Garner, Raymond Gesteland (Chairman), Betty Graham (NIH), Trevor Hawkins, Elke Jordan (NIH), Miriam Meisler, Ari Patrinos (DOE), Lloyd Smith, Randy Smith

Summary

The DOE Genome Project is in a transitional phase, evolving to balance the demands of large-scale sequencing and technology development, while at the same time setting the stage for genomic analysis of gene function. Discussion of progress and goals emphasized the following points:

  • The JGI will carry out the majority of DOE’s share of human genome sequencing, with a 5-year goal of over 600 Mb. The highest priority must be ensuring the success of the JGI.
  • Sequencing costs must be reduced by 2-4 fold, with special emphasis on “hardening” of incremental technological improvements that will contribute to sequencing during the next 5 years.
  • Development of new sequencing technologies should be supported but they should hold the promise of 20-100 fold improvements over current methods.
  • The current modest investments in functional genomics should be enhanced as funds become available.

Sequencing

Microbial Genomes
The DOE microbial genome project has made a very substantial contribution to our understanding of the diversity of microbial life and the complexities of evolution. A large fraction of the genes (30-50%) found in any newly sequenced microbial genome have no known relatives. Sequencing of additional genomes should be a high priority, both to understand our biological world and to enlarge the repertoire of genes that may be of practical importance.

Joint Genome Institute
The DOE genome project has made a major commitment to supporting a large-scale DNA sequencing facility, the Joint Genome Institute (JGI), under the direction of Dr. Elbert Branscomb. Resources and scientific talent from genome efforts at three national laboratories have been pooled and brought to bear. Very challenging goals for ramping up production have been set, and every effort must be made to ensure the JGI’s success.

Currently LBL, LANL and LLNL are pursuing sequencing within their own structures to meet the production goal of 20 megabases of finished sequence for fiscal 1998. So far, production is on track to meet this goal. This is being accomplished even under the pressure of formulating a unified scheme for factory production and planning the new facility. These pressures will escalate in the coming months with the stress of the move to the new facility and the need to double sequence output to 40 Mb in fiscal 1999. This latter goal will be very challenging because of the many distractions. Some patience should be shown so that the factory can get up and running effectively and have a real shot at meeting future sequencing goals.

JGI needs to show that it can sustain production of high-quality, contiguous sequence. This is imperative even if it comes at the expense of throughput and cost during the crucial first and second years. However, every effort within reason must be made to keep to the proposed aggressive ramp-up. The proposed goals are:

Year    Output (Mb)    Cost/bp
1999         40         $0.50
2000        100         $0.35
2001        150         $0.30
2002        200         $0.25
2003        200         $0.20
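
As a rough cross-check of the proposed goals above, the following Python sketch totals the planned output and multiplies each year's output by its target cost per base pair. The implied yearly production cost is an illustrative calculation, not a figure from the subcommittee; it assumes that each year's cost/bp target applies uniformly to every base finished in that year. The names GOALS, implied_cost_musd and summarize are hypothetical and introduced only for this example.

```python
# Proposed JGI goals as listed above: year -> (output in megabases, target cost in $/bp).
GOALS = {
    1999: (40, 0.50),
    2000: (100, 0.35),
    2001: (150, 0.30),
    2002: (200, 0.25),
    2003: (200, 0.20),
}


def implied_cost_musd(output_mb, cost_per_bp):
    """Implied production cost in millions of dollars.

    Assumption (not from the report): the cost/bp target applies uniformly
    to all bases finished in the year.
    (output_mb * 1e6 bp) * ($/bp) / 1e6 simplifies to output_mb * cost_per_bp.
    """
    return output_mb * cost_per_bp


def summarize(goals):
    """Return (total output in Mb, total implied cost in $M) over all years."""
    total_mb = sum(mb for mb, _ in goals.values())
    total_cost = sum(implied_cost_musd(mb, cost) for mb, cost in goals.values())
    return total_mb, total_cost


if __name__ == "__main__":
    for year, (mb, cost) in sorted(GOALS.items()):
        print(f"{year}: {mb:>3} Mb at ${cost:.2f}/bp -> about ${implied_cost_musd(mb, cost):.0f}M")
    total_mb, total_cost = summarize(GOALS)
    print(f"Total: {total_mb} Mb, about ${total_cost:.0f}M")
```

Under these assumptions the 1999-2003 goals sum to 690 Mb, which is consistent with the 5-year goal of over 600 Mb stated in the Summary.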

In light of the major investment of the DOE genome project in the JGI, and the importance of the success of this venture, the first funding priority for genome research must go to strategic ventures that will help ensure the success of the JGI. The necessity of “getting on with sequencing” requires commitment of substantial funds to production at the JGI, which will limit the ability of the DOE genome project to support technology development.

Yet current technologies must be augmented by improvements in automation, in sequencing chemistries and in computer tools for assembling and interpreting DNA sequences in order to improve both efficiency and cost.

As the JASON report on the genome project points out, this is just the time when technology development is very important for the genome project, and technology is DOE’s forte. Both the JASONs and the BERAC genome subcommittee agree that, despite the immediate need for production sequencing, funding must be ensured both for short-term developments that enhance current production and for longer-term technologies that will provide the key tools for the future.

Technology Development
Current technology for genomic DNA sequencing has about equal cost contributions from labor and reagents (capital investment in equipment is quickly amortized), with an overall cost of about $0.50/base pair, although the many vagaries of calculating costs make this number quite soft and subject to individual lab interpretation. However, for the JGI (and the genome project in general) to meet its goals, the cost must come down by a factor of 3 to 4 without compromising sequence accuracy. To have any effect during the major sequence accumulation phase of the project (until 2005), only incremental improvements to current technology are likely to pay off. Incremental improvements to the sequencing process itself, including alternative chemistries and longer read lengths, are resulting in overall cost improvements. Automation of sample preparation and handling continues to hold the promise of cutting labor costs and improving reliability of the process. (See addendum “A” for a more complete list of needed improvements.)

However, developing improved technologies is not sufficient by itself. Promising technologies often languish because of the difficulties of moving them into production streams. Disruption of the production effort and dependence on a new, untested technology make the risks of implementation too high. Thus a major challenge is to find ways to support “hardening” of incremental technologies so that they can be moved into production with minimal risk. A targeted funding method is needed to solve this problem; otherwise the investment in many incremental technologies will have been wasted. Perhaps new cooperative agreements can be the tool. However, it is crucial that appropriate measures be in place to ensure that the value of incremental technologies is assessed during this hardening phase. This will require monitoring usefulness and establishing milestones for performance.

There is a crucial need for development of new sequencing technologies that will be the tools for the future. The appetite for sequencing will only increase, but the costs of current methods, even with incremental improvements, will greatly limit sequencing capacity. While some new approaches are in the wings (see addendum “A” for a list), they are not likely to contribute to the large-scale sequence accumulation needed by 2005 to meet the primary goals of the genome project. This reality should not discourage investment in longer-term development of new sequencing technology. However, support should be predicated on new technologies being able to reduce the cost of sequencing by 20 to 100 fold. Anything less will be too little, too late. In addition, support for long-term technologies cannot come at the expense of JGI’s success.
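
To make the fold-improvement targets in this section concrete, the Python sketch below converts them into cost-per-base-pair ranges. It assumes the roughly $0.50/bp figure quoted above as the baseline, which the text itself describes as soft; the constant and function names are hypothetical and used only for illustration.

```python
# Assumed baseline: the approximate current cost quoted in the text ($ per base pair).
BASELINE_COST_PER_BP = 0.50


def target_cost_range(baseline, min_fold, max_fold):
    """Return (cost at min_fold, cost at max_fold) after dividing the baseline
    cost by each fold-reduction factor."""
    return baseline / min_fold, baseline / max_fold


# Incremental improvements: a factor of 3 to 4, as stated for current technology.
incremental = target_cost_range(BASELINE_COST_PER_BP, 3, 4)

# New technologies: 20 to 100 fold, the threshold proposed for long-term support.
new_tech = target_cost_range(BASELINE_COST_PER_BP, 20, 100)

if __name__ == "__main__":
    print(f"3-4 fold:    ${incremental[0]:.3f} to ${incremental[1]:.3f} per bp")
    print(f"20-100 fold: ${new_tech[0]:.3f} to ${new_tech[1]:.3f} per bp")
```

On this assumed baseline, a 3 to 4 fold reduction corresponds to roughly 12.5 to 17 cents per base pair, and a 20 to 100 fold reduction to roughly 0.5 to 2.5 cents per base pair.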

Functional genomics

It is important to lay the groundwork now for this next stage of the genome project. Genome sequences are only starting points: the human sequence is a tool for identifying the information for each of the estimated 100,000 genes, with the ultimate goal of determining the function of each gene. Defining the very complex network of interactions of gene products will be the heart of biomedical research for many decades.

Genome sequence is also a tool that permits examination of human variation, with direct applicability to understanding individual susceptibility to disease and to environmental insults such as exposure to low radiation doses. With the reference sequence in hand, genes that play a role in susceptibilities will be identified, leading to an understanding of differential susceptibility in the population. Using the mouse as a model organism is particularly powerful.

There are many aspects of the application of genome technology to DOE missions. Analysis should be expanded on a number of these fronts if the resources can be found. This is the payoff – the harvest of the genome project.

  1. Efforts to sequence human and mouse cDNAs in order to get a reasonably complete picture of the DNA regions that are expressed should be continued and expanded if possible.
  2. Expression levels of large numbers of genes can be determined at one time with new chip technologies. For example, the 6,000 yeast genes can be monitored on one chip, and changes in expression patterns with environmental or genetic differences can be determined. This important technology should be supported and enhanced for more complex organisms.
  3. Attention is beginning to focus on the complexity of gene products – the proteins. It is clear that one gene can yield multiple protein products but so far there is little understanding of the complexity on a genome wide basis. Development of new tools in this arena is important.
  4. The DOE has a major investment in the mouse as a model system for studying the effects of mutations in individual genes. The groundwork should be laid for application of new technologies that will make it possible to systematically assess gene functions in the mouse on a genome-wide scale. What is learned about the mouse is directly relevant to humans. Zebrafish should be assessed as another appropriate model system.
  5. Comparative sequencing of syntenic regions of mouse and human should be supported.

However, given the stringent demands of the production-sequencing phase of the Genome Project, presently available resources are far too constrained to do justice to the scope and importance of developing tools of this nature.

Informatics

Development of informatics tools continues to be a high priority. However, there is still the nagging concern that tools doing equivalent jobs are being developed independently in many different labs. While it may be true that each large-scale sequencing center will need to develop informatics tools to support its own technologies, sharing of solutions needs to be encouraged. The development of databases and their tools needs to be driven by the user community. It is anticipated that the joint NIH-DOE workshop to consider appropriate informatics goals will provide the needed guidance.

ELSI

Input will come from the joint NIH-DOE ERPEG.

Addendum A

  1. Sequencing Technology
    1. Near-term: areas for improvement include
      1. automated finishing
      2. improved assembly programs
      3. improved sequence analysis programs
      4. improved processes and protocols for all aspects of the sequencing process (e.g. library construction, template production, reaction automation…)
      5. improved Front End automation
      6. longer reads
      7. higher speeds
      8. more lanes
      9. pumpable and/or pre-fabricated gels
      10. systems integration
      11. improved methods for sequence-ready map construction and validation
      12. resequencing
      13. improved methods for “sequence back to the genome” validation
    2. Longer-term approaches include
      1. mass spectrometric approaches
      2. microfabricated system approaches
      3. array-based approaches
      4. single molecule approaches, including “pore” sequencing, scanning probe methodologies such as STM, AFM, and NSOM, flow cytometry
      5. free solution electrophoresis
  2. Functional Genomics Tools Needed for Analysis and Study of:
    1. Networks and Interactions
    2. Gene Expression (at the RNA level)
    3. Gene Expression (at the protein level)
    4. Variation analysis (at the DNA, RNA, and protein levels)
    5. Proteome Analysis
    6. Attaching Function to Genes
    7. Genome-Wide Protein Structure Determination