To address this question we randomized the O-glycosylation
positions for all the proteins. In this new data set, each protein kept the same number of O-glycosylation sites as predicted by NetOGlyc, but their positions were chosen at random. When these hypothetical proteins were searched for pHGRs, we obtained the results presented in Figure 3. The number of proteins displaying pHGRs was considerably smaller when the positions of the O-glycosylation sites were randomized. Between 42.6% (S. cerevisiae) and 75.7% (M. grisea) of the proteins that displayed pHGRs with the O-glycosylation sites predicted by NetOGlyc lost them when the positions were randomized, indicating that, at least in the majority of proteins, there is indeed selective pressure to cluster the O-glycosylation sites in pHGRs. The total number of pHGRs
was also lower with the randomized data (Figure 3B), although in this case the difference was smaller, and in S. cerevisiae the total number of pHGRs actually increased when the O-glycosylation positions were randomized. The reason for this result may be related to the presence in this yeast of proteins predicted to have a very high number of O-glycosylation sites, for which randomization scatters the sites throughout the whole protein and produces a greater number of smaller pHGRs. As discussed before, S. cerevisiae differs from the rest of the organisms under study in that it possesses a higher proportion of these highly O-glycosylated proteins (Figure 2).

Figure 3 Effect of the randomization of the position of the O-glycosylation sites on pHGR prediction. Number of proteins with pHGRs (A) and total number of pHGRs (B) found in every genome with the O-glycosylation positions predicted by NetOGlyc (blue columns) or the randomized positions (red columns).

pHGRs show a small tendency to be located at protein ends

We then addressed the question of whether pHGRs are randomly distributed along the length of the proteins or whether, alternatively, there is a preference for particular regions such as the C- or N-terminus. The central positions of all pHGRs detected for any given organism were calculated and classified into ten groups according to their relative location along their respective protein. The first group contained those pHGRs having their center in the N-terminal 10% of the protein sequence; the second group those with their center in the second 10%, and so on. Figure 4A shows the frequency distribution of these ten groups for the eight fungi and indicates that there is no clear preference for any protein region, although slightly higher frequencies are observed for the N- and C-termini, especially the latter, for almost all fungi examined. The clearest exception is S.