Methods


Mycoplasma strain

The strain Mycoplasma pneumoniae M129 (ATCC 29342) in the 18th broth passage was used to construct an ordered cosmid library containing the complete genome (15). This cosmid library was the basis for the DNA sequence analysis. We selected this strain because it has been used in cytadherence and pathogenicity studies (2, 16, 17). The strain in the 20th broth passage was still infectious in hamsters (H. Brunner, unpublished).

Sequencing strategy

The general strategy is to sequence both strands in a directed fashion by primer walking and to limit random ("shotgun") sequencing to a minimum. DNA sequence data are being generated by the Sanger method using fluorescent dye-labelled primers or dideoxynucleotides in combination with a semi-automated DNA sequencing unit.
An ordered cosmid library containing the complete M. pneumoniae genome in 34 overlapping cosmids, two λ phages and one plasmid was the starting point for the project. The cosmid library was constructed by partial digestion of the M. pneumoniae genome with the restriction endonuclease EcoRI. The individual EcoRI DNA fragments from the cosmids are being further subcloned into a plasmid vector, resulting in a plasmid library consisting of clones each carrying one individual EcoRI DNA fragment. These fragments are between 0.1 and 28 kbp long and are sequenced individually. The following methods are applied depending on the insert size.
i) Inserts up to 3 kbp long are sequenced by primer walking only.

ii) Sets of nested deletions are constructed by the exonuclease III method from clones with inserts between 3 and 10 kbp long. A set comprising 20 nested deletions is normally sufficient to obtain the sequence from one strand of a 6 kbp long fragment. The complementary strand and possible gaps in the first strand are then sequenced by primer walking.

iii) For sequence analysis of all other plasmids, with inserts between 10 and 28 kbp, we apply a limited "shotgun" cloning and sequencing strategy. Suitable frequently cutting restriction endonucleases such as Sau3A, AluI or HaeIII are used to establish two or three different sets of about 20 subclones carrying fragments 100 to 500 bp long. Both ends of the individual cloned fragments are sequenced and assembled into contigs. Gaps are filled by primer walking on plasmids or cosmids carrying the EcoRI fragment in question.
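
The size-dependent choice of method in i) to iii) can be summarized as a small decision function. This is only an illustrative sketch: the function name is hypothetical, and the assignment of the exact boundary values (3 and 10 kbp) to the smaller size class is an assumption, since the text does not specify it.

```python
def choose_strategy(insert_kbp: float) -> str:
    """Return the sequencing approach for an EcoRI insert of the given size in kbp.

    Boundary handling (3 and 10 kbp fall into the smaller size class) is an
    assumption made for this sketch; the text does not specify it.
    """
    if not 0.1 <= insert_kbp <= 28:
        raise ValueError("insert size outside the 0.1-28 kbp range of the library")
    if insert_kbp <= 3:
        return "primer walking"                       # case i)
    if insert_kbp <= 10:
        return "nested deletions + primer walking"    # case ii)
    return "limited shotgun + primer walking"         # case iii)


if __name__ == "__main__":
    for size in (0.5, 4.8, 17.0):
        print(size, "kbp:", choose_strategy(size))
```

In all three cases primer walking is the final step, either as the only method or to complete the second strand and close gaps.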
The project is organized in such a way that many different plasmids can be sequenced at the same time, so that waiting for the synthesis of new primers is not a limiting step. Furthermore, sequencing efforts may be shifted to any region of interest on the genome. Because the complete genome has been cloned, the rate of sequencing can be predicted, and the often painful analysis of the last few percent of a genome, caused by missing or unclonable DNA regions, will not be an extra burden. The ordered cosmid library has been used to construct an EcoRI restriction map of the entire M. pneumoniae chromosome. Therefore, any DNA sequence can be assigned to a defined position on the physical map. This permits the construction of a detailed genetic map in parallel with the sequencing project.
Materials and Methods

Cloning of EcoRI fragments

The EcoRI fragments were subcloned from an ordered cosmid library comprising the complete M. pneumoniae genome. Standard procedures were applied, using the plasmid pBC (STRATAGENE) as vector and the E. coli strains HB101 or XL1-Blue (STRATAGENE) for propagation of these plasmids. The plasmid clones containing the individual EcoRI fragments used for sequencing were purified by Qiagen column chromatography according to the protocol provided by the manufacturer (Qiagen).
The cloning vector pBC was purified by centrifugation to equilibrium in two sequential cesium chloride-ethidium bromide gradients.
The following nomenclature was used for the plasmid clones and the EcoRI fragments. A cosmid always has the prefix pcosMP followed by a letter and a number, e.g. pcosMPD2. A plasmid carrying an EcoRI fragment from this cosmid received the prefix "p", the letter and number of the cosmid, and the size of the EcoRI fragment in kbp, e.g. pD2/4.8. The EcoRI fragment alone was named D2/4.8.
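
The naming scheme can be expressed as a short helper. This is a minimal sketch for illustration; the function name and the error handling are hypothetical and not part of the original work.

```python
from typing import Tuple


def clone_names(cosmid: str, fragment_kbp: float) -> Tuple[str, str]:
    """Derive (plasmid name, fragment name) from a cosmid name and a fragment size.

    Example of the scheme described in the text:
    cosmid pcosMPD2 + 4.8 kbp fragment -> plasmid pD2/4.8, fragment D2/4.8.
    """
    prefix = "pcosMP"
    if not cosmid.startswith(prefix) or len(cosmid) == len(prefix):
        raise ValueError("cosmid names must look like 'pcosMP' + letter + number")
    label = cosmid[len(prefix):]               # e.g. "D2"
    fragment = f"{label}/{fragment_kbp:g}"     # e.g. "D2/4.8"
    return "p" + fragment, fragment


if __name__ == "__main__":
    print(clone_names("pcosMPD2", 4.8))
```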

Long range PCR

To determine or to check the orientation of adjacent EcoRI fragments, the improved PCR method for the amplification of DNA fragments up to 45 kbp long was used. The reactions were done with the GeneAmp XL PCR kit from Perkin Elmer according to the manufacturer's protocol. Genomic M. pneumoniae DNA used for amplification was purified as described. The primers for the reactions were designed as 22-mers with a melting temperature of 68°C.
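
The text does not state how the melting temperature was estimated. Purely as an illustration, the following sketch uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a rough rule of thumb for short oligonucleotides; applying it here is an assumption and may not match the authors' calculation. Under this rule a 22-mer reaches 68°C when it contains 12 G/C and 10 A/T residues. The example sequence is artificial.

```python
def wallace_tm(primer: str) -> int:
    """Estimate the melting temperature (°C) with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This rule is an assumption of this
    sketch; the source does not say which formula was used.
    """
    primer = primer.upper()
    if set(primer) - set("ACGT"):
        raise ValueError("primer may contain only A, C, G and T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    example = "GCATGCATGCATGCATGCATGC"   # artificial 22-mer, 12 G/C and 10 A/T
    print(len(example), wallace_tm(example))
```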

Synthesis of oligonucleotides

Oligonucleotides were synthesized (R. Frank et al., ZMBH) by phosphoramidite chemistry on a solid carrier, using a model 394 DNA/RNA synthesizer from Applied Biosystems. Oligonucleotides between 17 and 20 nucleotides long were designed with the program OLIGO 4.0 (National Biosciences Inc.) and, after a standard dilution, were used in the sequencing reaction without further purification.

DNA sequencing

The sequence data for this study were generated exclusively by the enzymatic dideoxy chain-termination method described by Sanger et al. The radioactive label was replaced by a fluorescent label, and Taq polymerase was used in the reaction. The protocols were adapted from the cycle sequencing protocols introduced by Craxton; the basic principle of this method is the linear amplification of the target DNA with a single primer.
All data were generated on a fluorescence-based sequence-gel reader (Model 373A, Applied Biosystems). Either fluorescently labelled universal primers (-21M13, M13 RP, T3, and T7) or fluorescently labelled dideoxynucleotides were used as label.
Taq dye primer cycle sequencing and Taq dye deoxy cycle sequencing were done as provided in the manufacturer's protocol. In each sequencing reaction 1 µg plasmid DNA or 2.3 µg cosmid DNA and 10 pmoles primer were used.
In a typical sequence analysis about 500 nucleotides were read. Primers for primer walking were selected between nucleotides 300 and 400 of such a sequence. All sequence chromatograms were visually inspected and edited with the SeqEd program (Version 1.03) from Applied Biosystems. Sequence assembly was performed with the Sequence Project Management program of the Lasergene program package (DNASTAR).
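
The selection of walking primers from a read can be sketched as follows. The code only enumerates candidate primers that lie entirely between nucleotides 300 and 400 of a read; it is not the OLIGO 4.0 selection procedure, and the function name and default length are assumptions (the length is taken from the 17 to 20 nucleotide range given for the oligonucleotides).

```python
from typing import Iterator, Tuple


def primer_candidates(read: str, start: int = 300, end: int = 400,
                      length: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, primer) for every `length`-mer that lies
    entirely within positions start..end of the read.

    The primer has the same orientation as the read, so that it extends the
    sequence in the direction of the walk.
    """
    window_end = min(end, len(read))
    # Last valid start position is window_end - length + 1 (1-based, inclusive).
    for pos in range(start, window_end - length + 2):
        yield pos, read[pos - 1:pos - 1 + length]


if __name__ == "__main__":
    read = "ACGT" * 125                      # artificial 500-nt read
    candidates = list(primer_candidates(read))
    print(len(candidates), candidates[0])
```

A real selection would additionally filter these candidates, for example by melting temperature or self-complementarity.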

Computer assisted analysis

Sequence assembly, map drawing and multiple alignments were done with the Lasergene program package (DNASTAR).
Other analyses were performed with the HUSAR (Heidelberg Unix Sequence Analysis Resources) program package, release 4.0, at the German Cancer Research Center, Heidelberg, Germany. This package is based on the GCG program package, version Unix-8.1, of the Genetics Computer Group, Wisconsin. For searching the DNA and protein databases (SWISS-PROT and PIR), the FASTA and BLAST programs (BLASTX, BLASTN and BLASTP) were used. Conserved motifs in proteins and peptides were identified with the program PROSITE. Open reading frames (ORFs) were identified with the program FRAMES, allowing AUG (or GUG, UUG) as start codons and using the Mycoplasma translation table, in which UGA codes for tryptophan. The G+C content was calculated with the program WINDOW. Codon usage was calculated with the program CODONFREQUENCY.
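
The effect of the Mycoplasma translation table on ORF detection can be illustrated with a minimal forward-strand scanner. This is a sketch and not the FRAMES program; the function and parameter names are hypothetical. Start codons are ATG, GTG and TTG in DNA notation, and only TAA and TAG terminate an ORF because TGA is read as tryptophan.

```python
from typing import FrozenSet, List, Tuple

STARTS = frozenset({"ATG", "GTG", "TTG"})
MYCOPLASMA_STOPS = frozenset({"TAA", "TAG"})   # TGA codes for tryptophan


def find_orfs(seq: str, stops: FrozenSet[str] = MYCOPLASMA_STOPS,
              min_codons: int = 1) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for ORFs in the three forward reading frames.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Each ORF begins at the first start codon after the previous stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in stops:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    dna = "ATGTGAAAATAA"                      # ATG TGA AAA TAA
    print(find_orfs(dna))                     # TGA read through as Trp
    print(find_orfs(dna, stops=frozenset({"TAA", "TAG", "TGA"})))
```

With the Mycoplasma table the example sequence yields one ORF spanning all four codons, whereas with TGA treated as a stop the same ORF ends after the second codon. Scanning the complementary strand would require running the function on the reverse complement as well.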
The programs TopPred 1.1.1 (Manuel G. Carlos, Ecole Normale Superieure, Laboratoire de Genetique Moleculaire, Paris, France) and PSORT (http://psort.nibb.ac.jp/) were used for the prediction of transmembrane domains and the membrane topology of proteins.
The analysis of each ORF is stored in a FileMaker Pro (Claris) database that can be accessed at our World Wide Web site (http://zmbh.uni-heidelberg.de/M_pneumoniae). Besides the genome and cosmid position of each ORF/gene, it contains data on expression, availability of antibodies, comments, literature, PROSITE patterns, amino acid composition, and database search homology scores. All annotations in this paper were made on the basis of the highest score values.

Accession number

The complete, annotated M. pneumoniae sequence is available in GenBank (NCBI) under the accession number U00089.
