A systematic framework to derive N-glycan biosynthesis process and the automated construction of glycosylation networks

Glycosylation is an important and highly complex post-translational modification that generates an extensive functional capability from a limited set of genes and encompasses the biosynthesis of sugar moieties in the endoplasmic reticulum (ER) and Golgi apparatus [1–3]. Glycans are highly variable and structurally diverse compounds consisting of a large number of monosaccharides, including mannose, fucose, and galactose, linked through an enzymatic process called glycosylation [4]. Unlike protein structures, glycan structures are neither directly encoded in the genome nor arranged in a simple linear chain [5]. Instead, the structure of secreted and membrane-bound glycans is determined during their assembly in the endoplasmic reticulum and the Golgi apparatus by a controlled sequence of glycosyltransferase and glycosidase processing reactions [6].

N-linked glycans, one of the major types of glycans, are attached to asparagine residues of proteins, and their structures are determined by a manageable number of enzymes that catalyze monosaccharide attachment. N-linked glycosylation occurs co-translationally in endoplasmic reticulum compartments. Glycoproteins migrate into the Golgi apparatus once the protein has finished folding and some residues of the glycan have been trimmed [7]. Many of these enzymes accept several N-linked glycans as substrates and therefore generate a large number of glycan products and glycosylation pathways [7]. Processing involves the removal of mannose residues by mannosidases and the addition of diverse monosaccharides to the substrate glycan by specific glycosyltransferases. The glycosylation pathways of N-linked glycans therefore comprise consecutive enzymatic steps: the glycan structure produced by one enzyme becomes the substrate of the next glycosylation reaction [7].
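The idea that each enzyme's product becomes the next enzyme's substrate can be sketched as a breadth-first expansion over reaction rules. The following Python example is a toy illustration only: glycans are reduced to monosaccharide counts, and the `Glycan`, `Rule`, `RULES`, and `build_network` names, together with the simplified substrate conditions, are assumptions made for this sketch rather than the rules or software used in this work.

```python
from collections import deque
from typing import Callable, NamedTuple


class Glycan(NamedTuple):
    """Toy glycan described only by residue counts (no linkage information)."""
    man: int     # mannose residues
    glcnac: int  # N-acetylglucosamine residues (2 belong to the core)
    gal: int     # galactose residues
    fuc: int     # fucose residues


class Rule(NamedTuple):
    """Hypothetical enzyme rule: a substrate test plus a transformation."""
    name: str
    applies: Callable[[Glycan], bool]
    act: Callable[[Glycan], Glycan]


# Deliberately simplified, composition-only conditions (assumptions).
RULES = [
    Rule("ManI", lambda g: g.man > 5 and g.glcnac == 2,
         lambda g: g._replace(man=g.man - 1)),
    Rule("GnTI", lambda g: g.man == 5 and g.glcnac == 2,
         lambda g: g._replace(glcnac=3)),
    Rule("ManII", lambda g: 3 < g.man <= 5 and g.glcnac == 3,
         lambda g: g._replace(man=g.man - 1)),
    Rule("GnTII", lambda g: g.man == 3 and g.glcnac == 3,
         lambda g: g._replace(glcnac=4)),
    Rule("GalT", lambda g: g.gal < g.glcnac - 2,
         lambda g: g._replace(gal=g.gal + 1)),
    Rule("FucT8", lambda g: g.fuc == 0 and g.glcnac >= 3,
         lambda g: g._replace(fuc=1)),
]


def build_network(seed, rules):
    """Expand the network breadth-first from a seed glycan.

    Returns the set of reachable glycans and a list of
    (substrate, enzyme name, product) reactions.
    """
    species = {seed}
    reactions = []
    queue = deque([seed])
    while queue:
        substrate = queue.popleft()
        for rule in rules:
            if rule.applies(substrate):
                product = rule.act(substrate)
                reactions.append((substrate, rule.name, product))
                if product not in species:
                    # A new product becomes a substrate for later steps.
                    species.add(product)
                    queue.append(product)
    return species, reactions


if __name__ == "__main__":
    # Seed: a high-mannose glycan with 9 mannose and 2 core GlcNAc residues.
    species, reactions = build_network(Glycan(9, 2, 0, 0), RULES)
    print(len(species), "glycans,", len(reactions), "reactions")
```

The expansion terminates because every rule moves a bounded residue count in one direction only. A realistic implementation would match rules against linkage-level structures rather than compositions.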

Research conducted over the past decades has made it clear that glycosylation often differs between diseased and healthy cells and that these glycan changes contribute to pathological progression, raising the possibility that disease-specific glycan structures exist [8–12]. This has potential medical applications; for example, it may be possible to distinguish benign forms of prostate cancer from highly malignant forms based on changes in enzyme activities and intracellular processing events [13, 14]. Effective engineering of glycosylation pathways can potentially lead to improved therapeutic performance of glycoprotein products. Considerable progress has been made in Prostate-Specific Antigen (PSA) research, and analytic approaches for large data sets have been used to expand analyses from PSA to numerous cancers and diseases known to have abnormalities in glycosylation. However, more research is still needed on acquiring and interpreting datasets to characterize glycosylation completely, including the profiles of the enzymes involved and the large number of glycans those enzymes produce.

Fortunately, a wealth of data is available from glycosylation-specific databases such as the Consortium for Functional Glycomics (CFG) website [15] and GlycomeDB [16]. In addition, Liquid Chromatography (LC) and Mass Spectrometric (MS) techniques have emerged as important enabling techniques in glycomics. A number of LC/MS methods have been incorporated into glycomics workflows for permethylated and reductively aminated glycans [17, 18]. Statistical methods have been proposed to predict glycan structures from gene expression data [19–21]. However, these traditional, qualitative methods of biochemistry and cell biology research do not provide a detailed quantitative understanding of the complex glycosylation mechanism.

Systems biology-based mathematical models have been developed to overcome this limitation [6, 22–26]. In this regard, the in silico construction of glycosylation reaction networks is an important step that enables the quantitative analysis of biochemical experimental data. Although several studies have constructed glycosylation reaction networks automatically, they are limited by the lack of a systematic definition of the linkage specificity, stereochemical specificity, and reaction conditions of the enzymes involved in the reactions.

Liu and Neelamegham [27] made a significant contribution by designing an open-source MATLAB-based toolbox, Glycosylation Network Analysis Toolbox (GNAT), for studies of systems glycobiology. This toolbox enables a streamlined machine-readable definition for the glycosylation enzyme class and the construction as well as adjustment of glycosylation reaction networks [27].

This paper extends Liu and Neelamegham’s work [27] to predict a wider range of glycans produced by enzymes encountered in human cell expression systems. In addition, our model can be applied to a larger range of experimental conditions that might be encountered in a cell culture environment. We expand the scope by including additional enzyme classes represented in gene expression data. We also extend the framework of KB2005 [6] by incorporating 22 enzymes (27 enzyme reaction rules) into our network.

To the best of our knowledge, these 22 enzymes (27 enzyme reaction rules) comprise all the enzymes associated with N-glycan processing that are present in Golgi compartments. The constructed networks can be used to relate observed mass spectrometric measurements to the underlying gene expression weights. This relationship can in turn be used to understand quantitatively how changes in enzyme activities affect the profile of glycan structures produced during biosynthesis.