Analysis Strategy and Research Method for N-linked Intact Glyco

    3.5. Selection of glycan and peptide databases

    The choice of database also affects the search time and the identification results. The following two types of database are currently in common use:

    3.5.1. Whole-genome databases

    For example, Qin et al. searched for the glycosylation sites of fetuin-A in the UniProt database. Woo et al. identified glycopeptides of human cancer cell line proteins against the human protein database. Woodin et al. provided a list of existing resources and detailed descriptions of glycosylation databases, which will not be repeated here. It should be pointed out that a glycopeptide database is generally built by taking every peptide in the genome database that contains the N-X-S/T sequence (where X cannot be proline) and pairing it with every possible glycan, one by one (that is, all possible combinations). A complete glycopeptide database based on the whole genome is therefore generally large. Identification against a whole-genome database can cover glycopeptides more comprehensively, but because the database is large, matching is slow and the false positive rate rises considerably. Since O-glycosylation sites have no fixed conserved sequence, an O-glycopeptide database built from the whole-genome protein sequence database is even larger, which is one of the important reasons why O-glycopeptides are more difficult to analyze.
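
    The size argument can be made concrete with a minimal Python sketch. It locates N-X-S/T sequons with a regular expression and multiplies the number of sequon-containing peptides by the number of glycans. The toy sequence, the function names, and the example counts are hypothetical, and the count assumes a single glycan per peptide; peptides with several occupied sites would enlarge the database further.

```python
import re

# N-glycosylation sequon: N, then any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return the 1-based position of the Asn in every N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def candidate_count(n_sequon_peptides, n_glycans):
    """Entries in a database that pairs every sequon peptide with every glycan."""
    return n_sequon_peptides * n_glycans


toy = "MKNGSAPNPTLLNATQ"  # hypothetical sequence, not a real protein
print(find_sequons(toy))           # [3, 13]; the N at position 8 is followed by P
print(candidate_count(1000, 300))  # 300000 (hypothetical peptide and glycan counts)
```

    Because the candidate count is a product, it grows quickly as either list grows, which is the source of the slower matching and higher false positive rate described above.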

    3.5.2. Sample-specific databases built from identification results

    Parker et al. used N-glycan data obtained by tandem mass spectrometry to build a database of N-glycan modifications. They then treated the enriched glycopeptides with PNGase F to remove the N-glycans and, after mass spectrometry analysis and database searching, built a glycopeptide-centric concatenated database containing each glycopeptide fragment. By searching CID and HCD MS2 spectra against these two self-built databases, they efficiently identified a total of 863 intact N-linked glycopeptides from 161 rat brain tissue glycoproteins. Toghi Eshghi et al. used tandem mass spectrometry to analyze the glycosylation site peptides of a sample and built a sample-specific library of glycosylation site peptides. Glycopeptide spectra are then compared against this library, so that the spectrum of each intact glycopeptide is assigned to a glycosylation site peptide, and the glycan occupying each site is determined from the difference in molecular mass. A sample-specific database is relatively small, so identification is accurate and searches run quickly, but the results are less comprehensive than those obtained with a whole-genome database.
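
    The mass-difference step can be illustrated with a short Python sketch. It subtracts the mass of a library peptide from the mass of the intact glycopeptide and looks for a glycan composition whose residue mass explains the difference. The monosaccharide residue masses are standard monoisotopic values; the three-entry glycan list, the tolerance, and the function names are assumptions made for illustration and are not taken from the cited work.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
MONO = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791, "NeuAc": 291.09542}

# Hypothetical, very small glycan list; a real search would use a glycan database.
GLYCANS = {
    "HexNAc2Hex5": {"HexNAc": 2, "Hex": 5},
    "HexNAc4Hex5NeuAc2": {"HexNAc": 4, "Hex": 5, "NeuAc": 2},
    "HexNAc4Hex5Fuc1NeuAc2": {"HexNAc": 4, "Hex": 5, "Fuc": 1, "NeuAc": 2},
}


def glycan_mass(composition):
    """Sum of residue masses for a composition such as {"HexNAc": 2, "Hex": 5}."""
    return sum(MONO[name] * count for name, count in composition.items())


def assign_glycan(intact_mass, peptide_mass, tol_ppm=10.0):
    """Return glycans whose mass explains intact_mass - peptide_mass within tol_ppm."""
    delta = intact_mass - peptide_mass
    hits = []
    for name, composition in GLYCANS.items():
        error_ppm = abs(delta - glycan_mass(composition)) / intact_mass * 1e6
        if error_ppm <= tol_ppm:
            hits.append(name)
    return hits


peptide = 1500.0  # hypothetical neutral peptide mass from the site-peptide library
intact = peptide + glycan_mass(GLYCANS["HexNAc2Hex5"])
print(assign_glycan(intact, peptide))  # ['HexNAc2Hex5']
```

    Mass alone gives a composition rather than a structure, so isomeric glycans with the same composition cannot be told apart by this step.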

    3.6. Evaluation and control of false positive rate

    After the MS2 spectra have been searched against the database, possible false positive matches must be evaluated and controlled. Building a decoy database is the most commonly used method for estimating the false discovery rate (FDR). The method adds obviously incorrect "decoy" sequences to the search space. A search with the same parameters then returns some matches to these decoys, which are known to be wrong but which the identification method may nevertheless accept as correct. The number of such wrongly accepted matches is a good estimate of the number of false positives; that is, the FDR is estimated from the percentage of spectra matched to decoy glycopeptides. Implementations of the decoy database method differ, and the following two are commonly used:

    3.6.1. Sequence reversal

    Reversing the peptide sequences to create a reverse database is the most common approach. The reverse database is built by reversing the amino acid sequence of every identified glycosylation site peptide, which gives decoy peptides of the same length as those in the target database. The reverse database and the target database therefore have similar peptide counts, peptide lengths, and precursor ion masses, but completely different MS2 spectra, so a match between a target spectrum and the reverse database behaves like a random match in the target database. GPQuest uses this method to estimate FDR. Parker et al. reversed the peptide sequences of the peptide-spectrum matches (PSMs), repeated the fragment ion matching, and estimated FDR from the matching frequency, which follows a similar principle.
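
    A minimal Python sketch of the reversed-decoy idea follows. It builds decoys by reversing target peptides and estimates FDR as the number of decoy matches divided by the number of target matches above a score threshold. That ratio is one common convention and is an assumption here; the tools cited above may compute it differently. The scores, peptides, and function names are hypothetical.

```python
def reverse_decoy(peptide):
    """Reverse a peptide; length and composition (and so precursor mass) are kept."""
    return peptide[::-1]


def build_decoys(targets):
    """Reversed decoys, skipping any that equal a target (e.g. palindromes)."""
    target_set = set(targets)
    return [d for d in map(reverse_decoy, targets) if d not in target_set]


def estimate_fdr(matches, threshold):
    """matches: (score, is_decoy) pairs from one search of targets plus decoys.

    Returns decoy matches / target matches among matches scoring >= threshold.
    """
    accepted = [is_decoy for score, is_decoy in matches if score >= threshold]
    n_decoy = sum(accepted)
    n_target = len(accepted) - n_decoy
    return n_decoy / n_target if n_target else 0.0


print(build_decoys(["NGSK", "AKA"]))  # ['KSGN']; "AKA" reverses to itself

# Hypothetical search results: (score, matched a decoy?)
toy = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
print(estimate_fdr(toy, 0.8))  # 0.0
print(estimate_fdr(toy, 0.6))  # one decoy against three targets
```

    Raising the score threshold until the estimate falls below a chosen level, for example 1%, is the usual way to turn this estimate into a filtering rule.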

    3.6.2. Random matching

    Yang et al. matched the b/y ions of deglycosylated gp120 peptides against glycopeptide data that did not contain any gp120 glycopeptides; in other words, a completely unrelated database was used for matching against the target database.

    3.6.3. False positive control

    The methods above only estimate the false positive rate. In actual data analysis it is equally important to find the factors that affect the false positive rate and to exclude them through the choice of search parameters, so that the rate is kept as low as possible. For example, many laboratories have reported that, in the analysis of intact glycopeptides, using the Y1 ion as supporting evidence, correcting the monoisotopic peak, and taking b/y ion abundance into account can effectively reduce the false positive rate and thereby improve the identification of intact glycopeptides.
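
    As an illustration of the first of these checks, the Python sketch below computes the expected m/z of the Y1 ion (the peptide carrying a single HexNAc) and tests whether a spectrum contains it. The tolerance, the peak list, and the function names are hypothetical, and the peptide mass is assumed to be a neutral monoisotopic mass; this is a sketch of the idea, not the filter used by any particular laboratory.

```python
PROTON = 1.007276   # mass of a proton (Da)
HEXNAC = 203.07937  # monoisotopic residue mass of HexNAc (Da)


def y1_mz(peptide_mass, charge=1):
    """m/z of the Y1 ion: neutral peptide + one HexNAc, at the given charge."""
    return (peptide_mass + HEXNAC + charge * PROTON) / charge


def has_y1(peaks_mz, peptide_mass, charge=1, tol_da=0.02):
    """True if any observed peak lies within tol_da of the expected Y1 m/z."""
    expected = y1_mz(peptide_mass, charge)
    return any(abs(mz - expected) <= tol_da for mz in peaks_mz)


peaks = [300.10, 1204.09]     # hypothetical MS2 peak list (m/z)
print(has_y1(peaks, 1000.0))  # True: expected Y1 at about 1204.0866
```

    A candidate peptide whose Y1 ion is absent from the spectrum can then be down-weighted or rejected, which is one way the Y1 ion can help reduce false matches.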

    Conclusion

    In recent years, as techniques related to glycoproteomics have developed, mass spectrometry methods for identifying the peptide sequence and glycan structure of intact glycopeptides have been established one after another. A comprehensive set of analysis strategies and workflows has begun to take shape, covering intact glycopeptide enrichment, mass spectrometry analysis, spectrum identification, identification of the peptide portion, and determination of the glycan structure. The establishment and development of these analysis strategies and software tools have made it possible to characterize the glycosylation sites of glycoproteins and to obtain information on glycan structure. More accurate identification and analysis of glycopeptides, however, still depends on further exploration and breakthroughs at many steps of this workflow. In addition, high-throughput quantitative methods for intact glycopeptides need further development. With continued innovation and improvement in technical methods, the analysis and study of intact glycopeptides will continue to advance. These glycoproteomics tools will be a powerful complement to genomics, proteomics, glycomics, and metabolomics, and will support the analysis of protein glycosylation structure and function, the discovery of new biomarkers, and the study of pathogenic mechanisms.
