Asymmetry of chromosomes in their coding properties

S. Cebrat, M. R. Dudek, A. Gierlik, M. Kowalczuk, P. Mackiewicz, 1999, Effect of replication on the third base of codons. Physica A, 265(1-2), 78-84.

P. Mackiewicz, A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat, 1999, How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Research, 9(5), 409-416. www.genome.org

P. Mackiewicz, A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat, 1999, Asymmetry of nucleotide composition of prokaryotic chromosomes. Journal of Applied Genetics, 40(1), 1-14.

P. Mackiewicz, A. Gierlik, M. Kowalczuk, D. Szczepanik, M. R. Dudek, S. Cebrat, 1999, Mechanisms generating correlation in nucleotide composition in Borrelia burgdorferi genome. Physica A, 273, 103-115.

M. Kowalczuk, A. Gierlik, P. Mackiewicz, S. Cebrat, M. R. Dudek, 1999, Optimization of Gene Sequences under Constant Mutational Pressure and Selection. Physica A, 273, 116-131.

    Traditional DNA walks supply a lot of information on the chemical structure of DNA. Nevertheless, it is easy to modify DNA walks in a way that enables logical analyses of DNA.
    The most important approach is the analysis of the coding capacity of DNA and the coding properties of genomes. We have already shown how DNA walks can depict differences in coding density between leading and lagging DNA strands (DNA walks). But our walker can recognise any feature of the analysed sequence, not only nucleotides. It can recognise individual codons, or classes of codons grouped according to their nucleotide composition or coding sense. Thus, we can get information on many specific properties of genomes.
    In the section DNA walks we have shown that many bacterial genomes exhibit a specific asymmetry in nucleotide composition between leading and lagging strands. There are many mechanisms which introduce or may introduce compositional asymmetry into a DNA molecule. A random DNA sequence should not exhibit any statistically significant compositional bias between the two complementary strands. Nevertheless, there are some processes which do not treat the two strands of a natural DNA molecule equally. One of these processes is replication. The main cause of the unequal fidelity of leading and lagging strand replication is still not clear. It is controversial whether replication of only one or of both strands is discontinuous (Okazaki et al. 1968, Kornberg and Baker 1992, Wang and Chen 1992, 1994). Nevertheless, the topology of the replication fork itself requires the involvement of different enzymatic mechanisms in the replication of each DNA strand (Kunkel 1992, Waga and Stillman 1994). Besides the above-mentioned mechanisms, differences in the processivity of replication of the leading and lagging DNA strands may be responsible for the differential accuracy of replication of these two strands (Fijalkowska et al. 1998). Thus, the two strands are exposed to different mutational pressures, and a compositional bias has been found between them as a result (Lobry 1996a and 1996b; Blattner et al. 1997; Mrazek and Karlin 1998; Grigoriev 1998; Freeman et al. 1998; McLean et al. 1998).

The asymmetry in the nucleotide composition of DNA raises a question:
Are there any differences in the amino acid composition of proteins coded by genes located on leading and lagging strands?

    It is possible to get a kind of "degenerate" information about the influence of replication-associated mutational pressure on amino acid composition. Some substitutions in the third positions of codons, e.g. almost all transitions, are silent, but others are not and belong to the class of missense mutations. If we assume that most of the accumulated mutations are in the four-fold degenerate codons, where each mutation in the third position is silent, we should find differences in the accumulation of mutations in codons where transversions in the third position are missense (two-fold degenerate codons). To check this, we have performed separate walks on the third positions of two-fold and four-fold degenerate codons. Both classes of codons accumulate mutations, and some of these mutations (transversions in two-fold degenerate codons) are missense. In Fig. 1 we present the subtraction of DNA walks (DNA walks) for many bacterial genomes. In these walks the walker moved up when the analysed nucleotide in the third codon position was a purine and down when it was a pyrimidine. Each plot shows walks done separately on two-fold degenerate codons (blue lines) and four-fold degenerate codons (red lines). For each genome the two plots were normalised so that the shapes of the curves for both kinds of codons can be compared. We can observe two different relations. In the Chlamydia trachomatis, Escherichia coli and Haemophilus influenzae genomes the accumulation of transversions in two-fold degenerate codons is almost exactly the same as in four-fold degenerate codons. On the other hand, in the Borrelia burgdorferi genome the number of substitutions accumulated in the two-fold degenerate codons is four times lower than the number accumulated in the four-fold degenerate codons.
Note: the beginning of each plot is at the origin of replication (also for the linear B. burgdorferi genome).

Fig. 1. Detrended DNA walks (subtraction of walks on C strands from walks on W strands) on two-fold degenerate (blue) and four-fold degenerate (red) codons. Walkers move up when the visited nucleotide at the third position of a codon is a purine and down when it is a pyrimidine. Note: in two-fold degenerate codons all transversions are missense mutations. Numbers on the X-axis represent positions on the chromosome in bp.
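
The third-position walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce Fig. 1: the standard genetic code is assumed, a codon is treated as two-fold or four-fold degenerate according to how many third-position bases encode the same amino acid, stop codons are skipped, and the function names (third_position_degeneracy, third_position_walk) are hypothetical. The sketch records one value per visited codon and leaves out the mapping to chromosome positions, the subtraction of strands and the normalisation.

```python
from itertools import product

# Standard genetic code (assumed here), codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_degeneracy(codon):
    """Count the third-position bases that keep the encoded amino acid unchanged."""
    aa = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == aa for base in BASES)


def third_position_walk(orf, fold):
    """Walk over an in-frame ORF, visiting only codons of the given degeneracy.

    The walker steps up (+1) when the third base is a purine (A or G)
    and down (-1) when it is a pyrimidine (C or T).
    Returns the walker's height after each visited codon.
    """
    height, walk = 0, []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3].upper()
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":        # skip ambiguous bases and stop codons
            continue
        if third_position_degeneracy(codon) != fold:
            continue
        height += 1 if codon[2] in "AG" else -1
        walk.append(height)
    return walk


# Toy ORF: GGA GCT (four-fold), AAA GAG TTT (two-fold)
toy_orf = "GGAGCTAAAGAGTTT"
print(third_position_walk(toy_orf, fold=4))   # [1, 0]
print(third_position_walk(toy_orf, fold=2))   # [1, 2, 1]
```

Under this definition Ile (three codons), Met and Trp fall into neither class, which matches the point of the comparison: in the two-fold class every third-position transversion changes the amino acid, while in the four-fold class none does.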
 

Since even in the third positions a transversion can change the encoded amino acid, we have performed walks on amino acids coded by ORFs lying on the two DNA strands, and we have subtracted and added the resulting walks to separate the effect of replication-associated mutational pressure from the effect of transcription and/or other effects. Fig. 2 shows the effect of replication on the amino acid composition of proteins coded by genes lying on the leading and lagging strands of many bacterial genomes. Analysing the results of the subtraction of walks, we have found amino acids which prevail on the leading or on the lagging strand in different genomes. In the genomes of E. coli, B. subtilis, T. pallidum, B. burgdorferi and C. trachomatis, Gly, Val, and Asp were relatively more frequently coded on the leading strand, while Ile, Thr, and His were more frequently coded on the lagging strand. Nevertheless, eubacterial genomes differ significantly in the prevalence of specific amino acids on leading or lagging strands. These results show that the previously found skew in the prevalence of some codons in genes transcribed in the direction of replication (Fraser et al., 1998) is connected to replication-associated mutational pressure.
 
 
[Fig. 2 panels, one per amino acid (aa): Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val]

Fig. 2. The result of subtraction of walks "on amino acids" for eight prokaryotic genomes. Numbers on the y-axis indicate the relative cumulative abundance of the amino acids. Numbers on the x-axis represent positions on the chromosome in triplets.
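
The subtraction and addition of walks can be illustrated with a minimal sketch. It assumes that the walks for ORFs on the W and C strands have already been computed and sampled at the same chromosome positions, so that they are sequences of equal length; the function name combine_walks and the toy numbers are hypothetical. A trend that has opposite signs on the two strands is reinforced by subtraction and cancelled by addition, whereas a trend with the same sign on both strands behaves the other way round.

```python
def combine_walks(walk_w, walk_c):
    """Return (W - C, W + C) for two walks sampled at the same positions."""
    if len(walk_w) != len(walk_c):
        raise ValueError("walks must be sampled at the same positions")
    difference = [w - c for w, c in zip(walk_w, walk_c)]
    total = [w + c for w, c in zip(walk_w, walk_c)]
    return difference, total


# Toy walks with exactly opposite trends on the two strands (illustrative values only).
walk_w = [1, 2, 3, 2, 1, 0]
walk_c = [-1, -2, -3, -2, -1, 0]

difference, total = combine_walks(walk_w, walk_c)
print(difference)   # [2, 4, 6, 4, 2, 0]  opposite trends add up
print(total)        # [0, 0, 0, 0, 0, 0]  opposite trends cancel
```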

In all the examined genomes, no significant effects on protein composition have been observed other than those connected with the leading/lagging role of the DNA strands. However, in large genomes (E. coli and B. subtilis) the addition of DNA walks done for ORFs from the W and C strands differentiates regions proximal and distal to the origin of replication of the chromosome (Fig. 3). Note that replication-associated effects divide chromosomes into two replichores (left and right), with extrema in the centre of the plots. The other effects which we have observed are connected with the proximal/distal parts of chromosomes, with extrema near the middle of the replichores. The trends at the left and right ends of the plot (Fig. 3) are the same and opposite to the trends in the central part of the plot. The central part of the plot corresponds to the region close to the terminus of replication (from both sides), and both ends of the plot correspond to regions close to the origin of replication (from both sides).

Fig. 3. Addition of walks on the B. subtilis genome done for codons coding twelve amino acids with significant proximal/distal trends. Numbers on the X-axis represent positions on the chromosome in bp.