Methods
Mycoplasma strain
The strain Mycoplasma pneumoniae M129 (ATCC 29342) in the 18th broth passage was used to construct an ordered cosmid library containing the complete genome (15). This cosmid library was the basis for the DNA sequence analysis. We selected this strain because it has been used in cytadherence and pathogenicity studies (2, 16, 17). The strain in the 20th broth passage was still infectious in hamsters (H. Brunner, unpublished).
Sequencing strategy
The general strategy is to sequence both strands in a directed fashion
by primer walking and to limit random ("shotgun") sequencing
to a minimum. DNA sequence data are being generated by the Sanger method
using fluorescent dye-labelled primers or dideoxynucleotides in combination
with a semi-automated DNA sequencing unit.
An ordered cosmid library containing the complete M. pneumoniae genome in
34 overlapping cosmids, two λ phages and one plasmid was the starting point
for the project. The cosmid library was constructed by partial digestion
of the M. pneumoniae genome with the restriction endonuclease EcoRI. The
individual EcoRI DNA fragments from the cosmids are being further subcloned
into a plasmid vector, resulting in a plasmid library consisting of clones
each carrying one individual EcoRI DNA fragment. These fragments are between
0.1 and 28 kbp long and are sequenced individually. The following methods
are applied depending on the insert size.
i) Inserts up to 3 kbp long are sequenced by primer walking only.
ii) Sets of nested deletions are constructed by the exonuclease III method
from clones with inserts between 3 and 10 kbp long. A set comprising 20
nested deletions is normally sufficient to obtain the sequence from one
strand of a 6 kbp long fragment. The complementary strand and possible gaps
in the first strand are then sequenced by primer walking.
iii) Plasmids with inserts between 10 and 28 kbp are analysed by a limited
"shotgun" cloning and sequencing strategy. Suitable frequently cutting
restriction endonucleases such as Sau3A, AluI or HaeIII are used to
establish two or three different sets of about 20 subclones carrying
fragments 100 to 500 bp long. Both ends of the individual cloned fragments
are sequenced and assembled into contigs. Gaps are filled by primer walking
on plasmids or cosmids carrying the EcoRI fragment in question.
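The size-dependent choice of method in i) to iii) can be summarized in a short Python sketch. It is only an illustration of the rules stated above: the function names are hypothetical, and the handling of inserts of exactly 3 or 10 kbp is an assumption, since the text does not say to which class the boundary sizes belong.

```python
def choose_strategy(insert_kbp: float) -> str:
    """Return the sequencing approach for an EcoRI insert of the given size.

    Size classes follow i) to iii) in the text. Assigning exactly 3 kbp to
    class i) and exactly 10 kbp to class ii) is an assumption.
    """
    if not 0.1 <= insert_kbp <= 28:
        raise ValueError("insert size outside the reported 0.1 to 28 kbp range")
    if insert_kbp <= 3:
        return "primer walking"
    if insert_kbp <= 10:
        return "nested deletions, then primer walking"
    return "limited shotgun, then primer walking"


def deletion_spacing_bp(insert_kbp: float, n_deletions: int = 20) -> float:
    """Average distance in bp between neighbouring nested-deletion end points."""
    return insert_kbp * 1000 / n_deletions


print(choose_strategy(2.5))       # class i)
print(choose_strategy(6))         # class ii)
print(deletion_spacing_bp(6))     # 20 deletions across a 6 kbp fragment
```

For the 6 kbp example given in ii), 20 deletions correspond to an average spacing of 300 bp, which is well within the length of a single read.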
The project is organized in such a way that many different plasmids can
be sequenced at the same time, so that waiting for the synthesis of new
primers is not a limiting step. Furthermore, sequencing efforts may be
shifted to any region of interest on the genome. The speed of sequencing
can be calculated in advance because the complete genome has already been
cloned; the frequently painful analysis of the last few percent of a genome,
caused by missing or unclonable DNA regions, will therefore not be an extra
burden. The ordered cosmid library has been used to construct an EcoRI
restriction map of the entire M. pneumoniae chromosome. Therefore, any DNA
sequence can be assigned to a defined position on the physical map. This
permits a detailed genetic map to be established in parallel with the
sequencing project.
Materials and Methods
Cloning of EcoRI fragments
The EcoRI fragments were subcloned from an ordered cosmid library comprising
the complete M. pneumoniae genome. Standard procedures were applied, using
the plasmid pBC (STRATAGENE) as vector and the E. coli strains HB101 or
XL1-Blue (STRATAGENE) for propagation of these plasmids. The plasmid clones
containing the individual EcoRI fragments used for sequencing were purified
by Qiagen column chromatography according to the protocol provided by the
manufacturer (Qiagen).
The cloning vector pBC was purified by centrifugation to equilibrium in
two sequential cesium chloride-ethidium bromide gradients.
The following nomenclature was used for the plasmid clones and the EcoRI
fragments. Each cosmid has the prefix pcosMP followed by a letter and a
number, e.g. pcosMPD2. A plasmid carrying an EcoRI fragment from this cosmid
received the prefix "p", followed by the letter and number of the cosmid
and the size of the EcoRI fragment in kbp, e.g. pD2/4.8. The EcoRI fragment
alone was named D2/4.8.
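The naming scheme can be expressed as a small Python helper. This is a minimal sketch of the convention as described here; the function name and the way the fragment size is formatted (shortest decimal representation) are assumptions made for illustration.

```python
def clone_names(cosmid_id: str, fragment_kbp: float) -> dict:
    """Build cosmid, plasmid and fragment names from a cosmid identifier
    (letter plus number, e.g. "D2") and an EcoRI fragment size in kbp.

    The ":g" size formatting is an assumption; the text only shows "4.8".
    """
    fragment = f"{cosmid_id}/{fragment_kbp:g}"
    return {
        "cosmid": f"pcosMP{cosmid_id}",  # prefix pcosMP + letter + number
        "plasmid": f"p{fragment}",       # prefix p + cosmid id + size in kbp
        "fragment": fragment,            # EcoRI fragment alone
    }


print(clone_names("D2", 4.8))
```

With the example from the text, the helper returns pcosMPD2, pD2/4.8 and D2/4.8.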
Long range PCR
To determine or check the orientation of adjacent EcoRI fragments, the
improved PCR method for the amplification of DNA fragments up to 45 kbp long
was used. The reactions were done with the GeneAmp XL PCR kit from Perkin
Elmer according to the manufacturer's protocol. Genomic M. pneumoniae DNA
used for amplification was purified as described. The primers for the reactions
were designed as 22-mers with a melting temperature of 68°C.
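The text does not state how the melting temperature of the primers was calculated. As an illustration only, the following Python sketch assumes the simple Wallace rule (2°C per A or T, 4°C per G or C); under that assumption a 22-mer reaches 68°C when it contains 12 G or C residues. The function name and the example primer are hypothetical and are not taken from the project.

```python
def wallace_tm(primer: str) -> int:
    """Estimate the melting temperature (in deg C) with the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C). Assumed here for illustration; the
    source does not name the method that was actually used."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical 22-mer with 12 G/C and 10 A/T residues.
example = "GCTAGCTAGGCTAGCATCGGAT"
print(len(example), wallace_tm(example))  # 22 68
```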
Synthesis of oligonucleotides
Oligonucleotides were synthesized (R. Frank et al., ZMBH) by
phosphoramidite chemistry on a solid support using a model 394 DNA/RNA
synthesizer from Applied Biosystems. Oligonucleotides 17 to 20 nucleotides
long were designed using the program OLIGO 4.0 (National Biosciences
Inc.) and were used in the sequencing reaction after a standard dilution,
without further purification.
DNA sequencing
The sequence data for this study were generated exclusively by the enzymatic
dideoxy chain-termination method described by Sanger et al. The radioactive
label was replaced by a fluorescent label, and Taq polymerase was used
in the reaction. The protocols were adapted from the cycle sequencing
protocols introduced by Craxton; the basic principle of this method is the
linear amplification of the target DNA with a single primer.
All data were generated on a fluorescence-based sequence-gel reader (Model
373A, Applied Biosystems). Either fluorescently labelled universal primers
(-21M13, M13 RP, T3, and T7) or fluorescently labelled dideoxynucleotides
were used for labelling.
Taq dye primer cycle sequencing and Taq dye deoxy cycle sequencing were
done as described in the manufacturer's protocol. In each sequencing reaction
1 µg plasmid DNA or 2.3 µg cosmid DNA and 10 pmoles primer were used.
In a typical sequence analysis about 500 nucleotides were read. Primers
for primer walking were selected between nucleotides 300 and 400 of such
a sequence. All sequence chromatograms were visually inspected and edited
with the SeqEd program (Version 1.03) from Applied Biosystems. Sequence
assembly was performed with the Sequence Project Management program of the
Lasergene program package (DNASTAR).
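The read length and primer positions given above allow a rough estimate of how many primer-walking reads are needed to cover one strand of an insert. The following Python sketch is an approximation under stated assumptions: each read yields about 500 nucleotides, and each new primer advances the read start by a fixed distance (350 nucleotides by default, the middle of the 300 to 400 window). The function name and the default values are hypothetical.

```python
import math


def walking_reads(insert_bp: int, read_len: int = 500, advance: int = 350) -> int:
    """Estimate the number of reads needed to walk along one strand.

    Assumes every read covers read_len nucleotides and every new primer
    moves the start of the next read forward by `advance` nucleotides.
    """
    if insert_bp <= 0:
        raise ValueError("insert length must be positive")
    if insert_bp <= read_len:
        return 1  # a single read from a vector primer covers the insert
    return 1 + math.ceil((insert_bp - read_len) / advance)


print(walking_reads(500))    # 1
print(walking_reads(3000))   # 9
```

Under these assumptions a 3 kbp insert, the upper limit for sequencing by primer walking alone, needs about nine reads per strand.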
Computer assisted analysis
Sequence assembly, map drawing and multiple alignments were done with the Lasergene program package (DNASTAR).
Other analyses were performed with the HUSAR (Heidelberg Unix Sequence Analysis Resources) program package release 4.0 at the German Cancer Research Center, Heidelberg, Germany. This package is based on the GCG program package version Unix-8.1 of the Genetics Computer Group, Wisconsin. The DNA and protein databases (SWISS-PROT and PIR) were searched with the FASTA and BLAST programs (BLASTX, BLASTN and BLASTP). Conserved motifs in proteins and peptides were identified with the program PROSITE. Open reading frames (ORFs) were identified with the program FRAMES, allowing AUG (or GUG, UUG) as start codons and using the Mycoplasma translation table, in which UGA codes for tryptophan. The G+C content was calculated with the program WINDOW. Codon usage was calculated with the program CODONFREQUENCY.
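The effect of the Mycoplasma translation table on ORF detection can be illustrated with a short Python sketch. It is not a reimplementation of FRAMES, whose internals are not described here; it only applies the rules stated above (start codons ATG, GTG or TTG at the DNA level, and TGA read as tryptophan rather than as a stop, as in NCBI translation table 4). The function names and the minimum-length parameter are hypothetical.

```python
STARTS = {"ATG", "GTG", "TTG"}   # AUG, GUG, UUG at the DNA level
STOPS = {"TAA", "TAG"}           # TGA is NOT a stop: it encodes tryptophan


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (strand, frame, start, end) for every ORF that runs from the
    first start codon after the previous stop to the next stop codon.

    Coordinates are 0-based, end-exclusive, include the stop codon, and
    refer to the scanned strand (for "-" this is the reverse complement).
    ORFs without a stop codon inside the sequence are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


# Toy sequence: ATG, three TGA (Trp) codons, two GCT codons, then TAA.
toy = "ATG" + "TGA" * 3 + "GCT" * 2 + "TAA"
print(find_orfs(toy, min_codons=1))  # [('+', 0, 0, 21)]
```

With the standard genetic code the toy ORF would end at the first TGA codon; with UGA read as tryptophan it extends to the TAA codon.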
The programs TopPred 1.1.1 (Manuel G. Carlos, Ecole Normale Superieure, Laboratoire de Genetique Moleculaire, Paris, France) and PSORT (http://psort.nibb.ac.jp/) were used for the prediction of transmembrane domains and the membrane topology of proteins.
Each ORF analysis is stored in a FileMaker Pro (Claris) database that can be accessed at our World Wide Web (WWW) site (http://zmbh.uni-heidelberg.de/M_pneumoniae). Besides the genome and cosmid position of each ORF/gene, it contains data on expression, availability of antibodies, comments, literature, PROSITE patterns, amino acid composition, and database search homology scores. All annotations in this paper were made on the basis of the highest score values.
Accession number
The complete, annotated M. pneumoniae sequence is available in GenBank (NCBI) under the accession number U00089.