UTRs were predicted by identifying operon boundaries, defined as sharp declines in coverage in the regions upstream of the start codon (5’UTR) or downstream of the stop codon (3’UTR) (Methods).
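The boundary rule described above can be illustrated with a minimal sketch. This is not the authors' pipeline (the Methods are not reproduced here): the function name `utr_length`, the `drop_ratio` threshold, the `max_len` cap, and the toy coverage values are all assumptions chosen only for illustration. The sketch walks outward from a start or stop codon over per-base read coverage and stops at the first position where coverage falls sharply relative to the neighbouring base.

```python
def utr_length(coverage, anchor, step, drop_ratio=0.2, max_len=500):
    """Estimate UTR length (nt) by walking away from a codon position.

    coverage   : list of per-base read depths (hypothetical input format)
    anchor     : index of the start codon (5'UTR) or stop codon (3'UTR)
    step       : -1 to walk upstream, +1 to walk downstream
    drop_ratio : assumed threshold; a base whose coverage is below
                 drop_ratio * (coverage of the previous base) marks the boundary
    max_len    : assumed upper limit on the reported UTR length
    """
    if coverage[anchor] == 0:
        return 0  # no signal at the codon, so no boundary can be called

    length = 0
    prev = coverage[anchor]
    pos = anchor + step
    while 0 <= pos < len(coverage) and length < max_len:
        cur = coverage[pos]
        if cur < drop_ratio * prev:
            break  # sharp decline in coverage: treat as the transcript boundary
        length += 1
        prev = cur
        pos += step
    return length


# Toy coverage track (invented values): a gene body flanked by short UTRs.
toy_coverage = [0, 1, 2, 40, 42, 45, 50, 52, 50, 48, 3, 1]
five_prime = utr_length(toy_coverage, anchor=6, step=-1)   # walk upstream
three_prime = utr_length(toy_coverage, anchor=8, step=+1)  # walk downstream
```

A real analysis would also need to handle strand orientation, noisy coverage, and neighbouring genes; a single fixed ratio is only the simplest possible definition of a "sharp decline".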

Accordingly, 745 5’UTRs were identified, with a median length of approximately 29 nucleotides (nt) (Sheet 1 of Additional file 2). Although most 5’UTRs were short, as in many other bacteria [24, 34], 8.86% of the identified 5’UTRs were longer than 100 nt. Long 5’UTRs, particularly in prokaryotes, may contain cis-regulatory element(s) such as the Shine-Dalgarno (SD) sequence, which mediates mRNA translational efficiency. 5’UTRs longer than 15 nt were scanned for potential RNA elements against Rfam [35], but no conserved elements were identified. These observations agree with previous work [36] and suggest that Prochlorococcus may contain unknown cis-regulatory sequences, such as targets for ncRNAs.

We also identified 337 3’UTRs (Sheet 2 of Additional file 2). When the 3’UTRs longer than 10 nt were searched with ARNold [37], only 11 significant termination signals were identified (Sheet 2 of Additional file 2). However, the high proportion (35.6%) of long 3’UTRs (> 60 nt) suggests that these regions may have other important roles that require further exploration.

To identify new ORFs and ncRNAs, we analyzed the intergenic regions determined by the current gene annotation (Sheet 2 of Additional file 3). Seven transcript units were identified with high confidence, including two ORFs and five ncRNAs (Additional file 4). The two ORFs encode conserved hypothetical proteins present in related subspecies such as P. marinus MIT9202, P. marinus W9, and P. marinus MIT9515. All five identified ncRNAs were expressed in at least eight conditions (Additional file 4). In particular, TibYfr5 was the most highly expressed of the five predicted ncRNAs, whereas TibYfr1 consistently showed the highest abundance under the light–dark conditions [38]. This suggests that the expression levels of TibYfr1 and TibYfr5 may be influenced by changes in light.

Highly expressed genes were overrepresented in the core genome but not in the flexible genome

Using genome-wide expression data, we compared gene expression profiles between the MED4 core and flexible genomes [6]. Up to 94.3% of the 1251 genes in the core genome were expressed, significantly more than the 84.9% of genes expressed in the flexible genome (P < 0.001). Furthermore, a moderate but significant negative correlation was observed between gene expression levels (mean RPKM of ten samples for each gene) and the corresponding protein nonsynonymous substitution rates (Ka) (N = 1275, Spearman’s r = -0.68, P < 0.001; Figure 2). This suggests that the tendency of highly expressed genes to evolve slowly, which has been observed in various organisms [13, 15, 17], might also hold in Prochlorococcus MED4.

Figure 2 Correlation between the gene expression levels and nonsynonymous substitution rates (Ka).
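The expression versus Ka comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the helper names (`average_ranks`, `pearson`, `spearman`) and the toy RPKM and Ka values are invented, and the only details taken from the text are that expression is summarized as the mean RPKM across samples and that the statistic is Spearman's rank correlation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's r: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (invented): RPKM per gene across three samples, and Ka per gene.
rpkm_by_gene = [[9, 10, 11], [19, 20, 21], [29, 30, 31], [39, 40, 41], [49, 50, 51]]
ka_by_gene = [0.5, 0.4, 0.3, 0.2, 0.1]

mean_rpkm = [sum(row) / len(row) for row in rpkm_by_gene]
r = spearman(mean_rpkm, ka_by_gene)
```

In this toy set Ka falls strictly as expression rises, so the coefficient sits at the negative extreme; real data would give an intermediate value, and a P value would require an additional significance test that is not shown here.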