Summary- 4th Malaria Genome Meeting
For subsequent meeting reports please refer to MFI's Malaria Genome Database page at www.malaria.org/genome.html



A meeting was held on 11-12 December 1997 in Orlando, Florida to discuss progress in the malaria genome project, develop consensus on policy issues, and identify needs for future activities that will translate the knowledge gained from sequencing the genome into new knowledge about the disease.


 

Updates from funders

Michael Gottlieb outlined the National Institute of Allergy and Infectious Diseases' activities related to malaria and genomics, saying that a number of new sequencing initiatives have started and others are in the planning stages. NIAID will provide support, pending advisory council approval, to TIGR (Leda Cummings, PI) for sequencing chromosomes 10 and 11. John Dame will receive support for obtaining ESTs and GSTs from P. vivax and P. berghei. The Institute is committed to expanding its malaria research activities and has developed a ten year plan for the development of malaria vaccines. The plan is available at www.niaid.nih.gov/Dmid/malvacdv/toc.htm

In addition, the Institute is committed over the next eight years to the development of a malaria reference reagent repository (www.niaid.nih.gov/reposit/malrep.htm).

Martha Peck summarized the Burroughs Wellcome Fund's ongoing support of malaria research. The Fund has earmarked $4 million for malaria genomics, and approximately $2.9 million of this has been spent. Approximately $3 million has also been given for malaria research through the Fund's regular award programs.

Cathy Fletcher of the Wellcome Trust reported that the Trust remains strongly committed to the malaria genome sequencing effort. In July 1997 the Trust approved an award of £4.8 million to the Sanger Centre for sequencing approximately half of the P. falciparum genome. When added to previous awards amounting to £3.2 million (including pilot funding), this means that a total of approximately £8 million has been made available for this project by the Trust. No applications for sequencing other Plasmodium genomes are under consideration at present.

Stephen Hoffman outlined the US Department of Defense's efforts in this area. DoD began working on malaria genomics two years ago and met with TIGR in January of 1996 to begin looking for ways to fund a malaria genome project. The DoD's objectives are to finish and annotate the complete sequence of the P. falciparum genome and to produce significant quantities of genomic sequence from P. vivax. DoD's budget for sequencing in malaria is projected to be $8 million during the next 5 years, with $1.425 million to be spent directly on the project in FY98. The majority of the funds are allocated to the TIGR/NMRI team with additional funds going to WRAIR for P. vivax cloning and development of bioinformatics expertise for use of the sequence data for malaria drug development.

Database issues
David Lipman described the evolving centralized database for the project. The database will provide "one stop shopping" for users and aims to be comprehensible to researchers, making access to genomic information easy and intuitive. Although the database will be centralized in the sense of bringing together data from the project, the data can be physically dispersed at sites around the world. Information now at the site includes EST data from John Dame's group; mapping, marker and sequencing data from Tom Wellems' group; and data at the web sites of the three large scale sequencing groups. After chromosome 2 is published, the data from that project will be made browsable. Lipman's group is setting up research collaborations to gain a better understanding of how the data will be used.

Lipman's malaria web site is at www.ncbi.nlm.nih.gov/Malaria/

Ross Coppell talked about the need for a next generation of archiving. His work with databases started well before the malaria genome project with a WHO-sponsored database giving researchers in endemic areas access to a copy of available malaria sequences. His database currently is built on an ACedb shell, which can pull together sequence information and the literature. The next generation of the database, Coppell says, needs to help biological labs make sense of the data that is stored there. Sequence browsing tools need to be developed so that bench scientists can ask questions of the genomic data. 

One issue to be addressed in putting together a centralized database will be giving people fast, useful access to the data. Centralized databases have a disadvantage: extensive BLAST searches cannot be done effectively by distant users. For this reason, the database needs to be mirrored, at least to Australia and Europe. It was suggested that a curator/moderator should be identified so that eventually one researcher could be in charge of maintaining the database.  

Michael Ashburner talked about making the data widely available, even in countries where net access is not good. WHO/TDR funds parasite genome databases, which are hosted at EBI. WHO provides money for the database manager for that project to go into endemic areas, install the database locally, and provide updates, either by FTP or by sending compressed data on tape.

Later in the meeting, Daniel Lawson indicated that tools for the community, including better keyword searching, virtual PCR machines and an informatics FAQ, are on his "to do list".

Next steps for the centralized database:
Developing "user friendly tools"
Community input- encourage researchers to contribute to annotation of the sequence.

 

Chromosome-specific projects

 

Stanford- Chromosome 12
Richard Hyman reported on progress in sequencing the 2.5 megabase chromosome 12. Stanford is doing shotgun sequencing of YACs. They will do 20 YACs at low coverage to generate sequencing bins, then run shotgun sequenced chromosome 12 through the bins. ABRA, a well characterized 500 bp sequence, will be used as an "anchor". ABRA is about in the middle of the chromosome, and sequencing will proceed outward from ABRA toward both ends. The YACs that have been characterized cover the middle third of the chromosome.  

The Stanford group has also been doing comparative sequencing using dye primer and dye terminator chemistries to compare the costs of the two technologies. So far the sequence data and mapping data are largely in agreement. They hope to finish the shotgun sequencing phase of the project in the next 6 months to a year.

Sanger Centre- Chromosomes 1, 3, 13 and the blob
Sharen Bowman reported Sanger's progress on chromosomes 1 and 3. Sanger's strategy is a whole chromosome shotgun skim, with YAC reads used to group contigs from the whole chromosomes into sets for finishing.  

Most of the data for chromosome 3 is from whole chromosome shotgun, but 10 YACs have been used to pull it all together. The library for chromosome 3 is overrepresented for telomeric sequences. They are now doing gap filling on this chromosome using oligo walking, shotgun sequencing of PCR products, and shotgun sequencing of pUC bridges across gaps.  

The shotgun portion of the chromosome 1 project is nearly finished. More than 18,000 reads from chromosome 1 have been entered into the database for a total contig length of nearly 900 kb at about 8-fold coverage. There are 166 contigs greater than 1 kb and 74 contigs greater than 2 kb. For chromosome 4, more than 21,000 reads have been done for a total contig length of 1.6 Mb, generating about 5-fold coverage.
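As a rough consistency check on these figures, fold coverage is the total number of bases read divided by the length of the target. The sketch below inverts that relation to see what average read length the reported numbers imply. The function name is illustrative, and it assumes coverage was computed against total contig length, which the report does not state; the inputs are the rounded values quoted above ("more than" and "nearly"), so the results are approximate.

```python
def implied_read_length(reads: int, contig_length_bp: float, fold_coverage: float) -> float:
    """Average read length (bp) implied by: coverage = reads * read_length / length."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return fold_coverage * contig_length_bp / reads


# Rounded figures from the report (assumed to be comparable across chromosomes).
chr1 = implied_read_length(reads=18_000, contig_length_bp=900_000, fold_coverage=8)
chr4 = implied_read_length(reads=21_000, contig_length_bp=1_600_000, fold_coverage=5)

print(f"chromosome 1: about {chr1:.0f} bp per read")
print(f"chromosome 4: about {chr4:.0f} bp per read")
```

Both values come out at roughly 400 bp per read, so the two sets of reported numbers are at least consistent with each other.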

The Sanger group is waiting for DNA from chromosome 13 from which to make a library, and libraries have been prepared for 2 fractions of the blob. Dan Carucci from NMRI agreed to send the chromosome 13 material. A small number of shotgun reads of sequences from the blob have been generated.

Daniel Lawson will be working full time on the malaria project starting in January. For chromosome 1, analysis has predicted 185 ORFs, and there is an estimated gene density of 1 gene/4.5 kb, or a total of 200-250 genes on the chromosome. If this density is typical for the organism, then approximately 6500 genes are predicted for the whole organism. 26% of the genes identified are spliced, mostly edited with small splices at their 5' ends.
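The arithmetic behind these estimates can be laid out explicitly. The sketch below uses only the reported density of 1 gene per 4.5 kb and works backward from the quoted gene counts to the sequence lengths they imply; the function names are illustrative, and the implied lengths are derived values, not figures given at the meeting.

```python
KB_PER_GENE = 4.5  # reported density: 1 gene per 4.5 kb


def predicted_genes(length_kb: float, kb_per_gene: float = KB_PER_GENE) -> float:
    """Number of genes expected in a sequence of the given length."""
    return length_kb / kb_per_gene


def implied_length_kb(genes: float, kb_per_gene: float = KB_PER_GENE) -> float:
    """Sequence length (kb) implied by a gene count at the given density."""
    return genes * kb_per_gene


# 200-250 genes on chromosome 1 corresponds to 900-1125 kb of sequence.
print(implied_length_kb(200), implied_length_kb(250))

# 6500 genes genome-wide corresponds to 29,250 kb, i.e. about 29 Mb.
print(implied_length_kb(6500))
```

The lower bound of 200 genes matches the nearly 900 kb of chromosome 1 contig sequence reported by Sharen Bowman.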

TIGR/NMRI- chromosomes 2 and 14
Steve Hoffman pointed out how much has been learned from the intensive effort that has gone into the assembly and closure phase of the chromosome 2 project after the end of the random sequencing phase. He also touched on how critical it will be to use this experience to develop the most efficient methods for assembling and closing the entire genome, since this level of effort may be impossible to sustain for the entire project.

Ham Smith made the libraries that are being used for this project. He has been sticking to short (~1.5 kb) inserts for making libraries and has added a pol I/ligase step for making covalently closed circles before transforming Plasmodium DNA into E. coli. Both of these approaches come out of Dyann Wirth's pilot projects examining Plasmodium DNA stability in bacterial hosts. Chromosome 14 libraries have been generated within the last several weeks.

Dan Carucci talked about functional genomics approaches, including generating microarrays from sequences in GenBank and from sequences generated from chromosome 2. They would like to get to the point of using the genomic information for vaccine development and drug discovery.

NMRI is also working on genomic projects in P. vivax and in the rodent malarias P. yoelii and P. berghei.

Malcolm Gardner reported on sequencing progress and the near-completion of chromosome 2. TIGR's sequencing is done shotgun from PFG-purified material. The TIGR assembler has been improved and new software also links contigs into groups. These groups can then be pinned onto the STS map of the chromosome. Sequence gap closure in the central part of the chromosome and physical gap closure in the subtelomeric regions are still being done on chromosome 2, using PCR with primers from adjacent contigs.

Chromosome 2 has 80.2% AT as determined by sequence. At the time of the meeting there were 6 large contigs, 5 sequence gaps, and 3 physical gaps. There was 949 kb assembled, and closure was anticipated for January. TIGR's coverage criteria are that there will be at least double clone coverage at every base pair, and that sequencing of each base pair will either be in both directions or by two chemistries.
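The coverage criteria described here amount to a per-base check, which can be sketched in a few lines. The `Read` record, its field names, and the function below are hypothetical and are not TIGR's software; the example reads are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    clone: str      # clone the read was derived from
    start: int      # 0-based start position, inclusive
    end: int        # 0-based end position, exclusive
    strand: str     # "+" or "-"
    chemistry: str  # e.g. "dye_primer" or "dye_terminator"


def failing_positions(reads, length):
    """Positions that lack double clone coverage, or that are covered
    in only one direction AND by only one chemistry."""
    failing = []
    for pos in range(length):
        covering = [r for r in reads if r.start <= pos < r.end]
        clones = {r.clone for r in covering}
        strands = {r.strand for r in covering}
        chemistries = {r.chemistry for r in covering}
        meets = len(clones) >= 2 and (len(strands) >= 2 or len(chemistries) >= 2)
        if not meets:
            failing.append(pos)
    return failing


# Made-up example on a 10 bp region.
reads = [
    Read("A", 0, 6, "+", "dye_primer"),
    Read("B", 2, 10, "-", "dye_primer"),
    Read("C", 8, 10, "+", "dye_primer"),
]
print(failing_positions(reads, 10))  # positions covered by a single clone only
```

In this example positions 0-1 and 6-7 are covered by one clone only, so they would be flagged for further sequencing.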

 

Technology Development Projects

Dyann Wirth discussed cloning strategies worked out by her lab. They have been developing methods for cloning P. falciparum DNA in E. coli, and have found that large fragments are unstable in E. coli. This has been more problematic with DNA from P. falciparum than from the other plasmodia. Traditional phage and cosmid libraries made with P. falciparum DNA are not representative of the entire genome.

Wirth's lab has looked at a large number of different E. coli rec mutations and found that regardless of the genotype of the host, there is a great deal of rearrangement of falciparum DNA. These sequences are not inherently unstable in E. coli- rather, they are unstable in cloning: they have to be closed circular DNA to be stable when cloned. After searching through strains deficient in recombination and DNA repair, the SRB strain from Stratagene has been identified as a good host for cloning. SRB is very different from the newer cloning host strain SURE, which is now being advertised as a good host for cloning unstable DNA. The only difference in the advertised genotypes of SURE and SRB is that SURE is recB- and SRB is recB+, but the advertised genotypes may not be complete and the strains may differ in other ways.

David Schwartz gave the group an overview of how his optical mapping technique works. Briefly, DNA is applied to a charged microscope slide, where it adheres. It is then treated with restriction enzymes, which cleave the DNA on the slide surface without displacing the fragments from each other or from the slide. Quantitative fluorescence allows determination of the length of the resulting fragments. Since the fragments remain in their proper physical order, it is possible to establish maps of large pieces of DNA- e.g. chromosomes. [Visuals are helpful: www.med.nyu.edu/Research/D.Schwartz-res.html gives a brief summary.] Schwartz has produced an optical map for chromosome 2, and showed a "weekend project"- a first rough optical map of the complete P. falciparum genome.
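The reason ordered fragments yield a map is simple: once fragment lengths are known in their physical order, the restriction site positions are just running totals. The sketch below illustrates that one step; the function name and the fragment lengths are invented for illustration and do not represent Schwartz's analysis software or real data.

```python
from itertools import accumulate


def restriction_map(fragment_lengths_kb):
    """Return (cut site positions, total length) from fragment lengths
    listed in their physical order along the molecule."""
    if not fragment_lengths_kb:
        raise ValueError("at least one fragment is required")
    ends = list(accumulate(fragment_lengths_kb))
    # Every fragment end except the last is an internal cut site.
    return ends[:-1], ends[-1]


# Hypothetical ordered fragment lengths (kb) measured from one molecule.
cuts, total = restriction_map([120, 45, 300, 80])
print(cuts, total)
```

A real optical map would also have to average over many molecules and allow for sizing error and missed cuts, which this sketch ignores.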

 

Mapping Groups and Other Genome Projects

John Dame, with funding from NIH, has been working on preparation of gene sequence tag databases and indexed clone banks from P. berghei and P. vivax. He is working toward 10,000 genomic clones and 1,500 cDNA clones. The clones are being identified by comparing translated sequence tags with sequences already in the database. The sequence tagged clones will be provided to the malaria repository now being established by NIH.

Jane Carlton, a postdoc in Dame's lab, has been looking at the degree of synteny between different species of Plasmodium. She has looked at 50 genes and has found that most are conserved in location between species. Studies of 40 genes have shown a 60% conservation of location between P. falciparum and rodent malaria species. She is currently looking at the conservation of genes between the four human malarias.

David Roos discussed sequencing efforts in toxoplasma. A collaboration between several groups led to the establishment of a toxoplasma EST project. 12,000 ESTs were established at a cost of $250,000, a low price made possible by the ongoing human EST project at Washington University. The informatics group at Penn has generated a tool for working with the EST database, allowing researchers to look at clusters of ESTs and identify "consensus ESTs". There may be important connections between work in toxoplasma and Plasmodium: in toxoplasma, gene knockouts are easier to establish and transformation is available.

David Kemp was not able to attend the meeting, but sent along a letter that was read to the group. In it he asked whether the community feels the completion of the chromosome 13 map would still be useful. The status of the YAC mapping project was shown to the group. Two chromosomes, 10 and 11, have no YACs, and chromosomes 6 and 7 haven't had much work done. Since the availability of optical mapping may change the project's need for other kinds of physical mapping, consideration of the necessity for completion of the chromosome 13 map has been put on hold for the moment.

Data validation, usage, and publication

Malcolm Gardner reiterated TIGR's coverage criteria- sequencing in two directions or sequencing by two chemistries. They are trying to make the best use they can of the chromosome 2 YACs in assembling the sequence. Daniel Lawson said that Sanger also looks for multiple clones over regions. Richard Hyman said Stanford looks for multiple clones or confirmation by two chemistries. Hyman pointed out that with all the groups using the same techniques, it is possible that there is systematic error that won't be detected.

Improved physical maps should provide an important check against such systematic error.

Discussion revealed that the data usage/acknowledgment statement circulated before the meeting was still not right, as it left doubt about when data could be used and what constituted fair use. Concern was expressed that those who are working hard to finish chromosomes could miss out on the chance to use the data that they have generated. The evolving data release policy represents a balance between recognition of the genome centers' need to publish fully annotated sequence and the need to bring the genomic data rapidly to the greater malaria research community. It was agreed that there were no fundamental and insoluble disagreements- rather, the objections to the statement could be solved by "better wordsmithing". It was agreed that another pass at the statement should be made, and Michael Gottlieb took responsibility for circulating a reworded statement.

Getting Genomic Information to the Community

Mary Galinski talked about the Malaria Foundation web site (www.malaria.org). This site provides or will provide access to the genome databases; regular, succinct summaries of the malaria genome meetings; introductory presentations (i.e. PowerPoint slides) for the community to access; topic discussions through the Malaria Research Network; and more. Attention also needs to be paid to serving malariologists in places where web access is not common.

Dyann Wirth said that as sequence becomes available, there will be more interaction with population biologists and epidemiologists looking at sequences that are polymorphic. This is going to give people who have never thought about the genome a sudden interest in the outcome of the genome project. Having a user-friendly interface is going to be very important. It may be time, she said, to start thinking about coordinating an effort to develop DNA chip technologies for malaria. The developing Malaria Repository may have a role to play in these new opportunities arising from the genome project.

Wirth also suggested developing fellowships for getting endemic country scientists involved in the genome project. ICCBNet, an international bioinformatics network for the third world, is a UN-sponsored project being handled by the Weizmann Institute. It may be a model for developing better outreach to scientists in endemic countries.

Michael Gottlieb talked about the Malaria Repository: NIAID has made a commitment for 1 year (this year) and put out a request for proposals for a 7 year contract to build and maintain a malaria repository. The repository will offer reference standards for the community: strains, antigens, etc. In the future, it might be expanded to include things like the DNA microarray technology. Within the first year there won't be any live materials, but the initial repository will have field samples, dot blots, and other non-living samples.

Where Do We Go From Here?

The whole group discussed future directions and priorities:
1. Goals for this project

6-month goals: before the next meeting, it is anticipated that-

Longer-term goals:
  • Finish sequencing the genome!
  • Make maps of the genome- Optical mapping could make exciting progress toward this goal
  • Make full length cDNAs of everything
  • Making full length cDNAs of chromosome 14 is to be funded as a 2 yr project
  • Verification/validation- checking whole chromosome vs. known sequences
  • Develop the database- ultimately will be, effectively, the encyclopedia of malaria
  • Develop better sequence tools for the biologist- something up, running, and ready to be played with.
  • Identify potential curators to maintain the db in the long term- ultimately, db should reside in a malaria lab
  • Establish better access away from the US: the entirety of the information at NCBI can't be mirrored because some information there is designed to be handled there


2. Applications of this project

A. Develop new targets for therapy/intervention: drugs, vaccines, diagnostics, and better understanding of basic biology

The 5th malaria genome meeting was held 30 June - 1 July 1998 in Hinxton, UK.