


Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions



• Glycan-focused language models can be used for sequence-to-function models
• Information in glycans predicts immunogenicity, pathogenicity, and taxonomic origin
• Glycan alignments shed light on bacterial virulence

Bojar et al., 2021, Cell Host & Microbe 29, 132–144, January 13, 2021. © 2020 The Author(s). Published by Elsevier Inc.


Daniel Bojar, Rani K. Powers, Diogo M. Camacho, James J. Collins

Correspondence (D.M.C.), (J.J.C.)

In Brief

Bojar et al. present a workflow that combines machine learning and bioinformatics techniques to analyze the prominent role of glycans in host-microbe interactions. The glycan-focused language models and alignments developed here allow for the prediction and analysis of glycan immunogenicity, association with pathogenicity, and taxonomic classification.







Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions

Daniel Bojar,1,2 Rani K. Powers,1,2 Diogo M. Camacho,1,4,* and James J. Collins1,2,3,4,5,*

1Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
2Department of Biological Engineering and Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
4These authors contributed equally
5Lead Contact

*Correspondence: (D.M.C.), (J.J.C.)


Glycans, the most diverse biopolymer, are shaped by evolutionary pressures stemming from host-microbe interactions. Here, we present machine learning and bioinformatics methods to leverage the evolutionary information present in glycans to gain insights into how pathogens and commensals interact with hosts. By using techniques from natural language processing, we develop deep-learning models for glycans that are trained on a curated dataset of 19,299 unique glycans and can be used to study and predict glycan functions. We show that these models can be utilized to predict glycan immunogenicity and the pathogenicity of bacterial strains, as well as investigate glycan-mediated immune evasion via molecular mimicry. We also develop glycan-alignment methods and use these to analyze virulence-determining glycan motifs in the capsular polysaccharides of bacterial pathogens. These resources enable one to identify and study glycan motifs involved in immunogenicity, pathogenicity, molecular mimicry, and immune evasion, expanding our understanding of host-microbe interactions.
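To make the language-modeling idea concrete, the following is a minimal sketch of how a glycan sequence might be turned into tokens that a sequence model can consume. It is an illustration only and not the preprocessing used in this work: it assumes glycans are written in an IUPAC-condensed-style string (linkages in parentheses, branches in square brackets), and the function names `tokenize_glycan`, `build_vocab`, and `encode` are hypothetical.

```python
import re

# Assumption: linkages appear in parentheses, e.g. "(b1-4)", and branches are
# delimited by square brackets. Everything else is treated as a monosaccharide.
_TOKEN_PATTERN = re.compile(r"(\([^()]*\)|\[|\])")


def tokenize_glycan(glycan: str) -> list[str]:
    """Split a glycan string into monosaccharide, linkage, and bracket tokens."""
    return [tok for tok in _TOKEN_PATTERN.split(glycan) if tok]


def build_vocab(glycans: list[str]) -> dict[str, int]:
    """Assign an integer index to every distinct token, in order of first appearance."""
    vocab: dict[str, int] = {}
    for glycan in glycans:
        for tok in tokenize_glycan(glycan):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(glycan: str, vocab: dict[str, int]) -> list[int]:
    """Map a glycan string to the list of integer token indices."""
    return [vocab[tok] for tok in tokenize_glycan(glycan)]


if __name__ == "__main__":
    # Illustrative input strings, not taken from the curated dataset.
    examples = ["Gal(b1-4)GlcNAc", "Fuc(a1-2)[GalNAc(a1-3)]Gal"]
    vocab = build_vocab(examples)
    for glycan in examples:
        print(glycan, tokenize_glycan(glycan), encode(glycan, vocab))
```

Integer-encoded sequences of this kind are the usual input to embedding layers in natural language processing models; how tokens are actually grouped and embedded is a modeling decision that this sketch leaves open.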


This is an open access article under the CC BY-NC-ND license.

In contrast to RNA and proteins, whose sequences can be elucidated from their associated DNA sequence, glycans are the only biopolymer outside the rules of the central dogma of molecular biology. Although glycans are synthesized by DNA-encoded enzymes (Lairson et al., 2008), an individual glycan sequence is dependent on the interplay between multiple enzymes and cellular conditions. Additionally, the expansive glycan alphabet of hundreds of different monosaccharides allows for a large number of potential oligosaccharides, built with different monosaccharides, lengths, connectivity, and branching. Glycans are present as modifications on all other biopolymers (Varki, 2017), exerting varying effects on biomolecules, including stabilization and modulation of their functionality (Dekkers et al., 2017; Solá and Griebenow, 2009). Apart from influencing the function of individual proteins, glycans are also crucial for cell-cell contact in the case of glycan-glycan interactions during the attachment of pathogenic bacteria to host cells (Day et al., 2015), and they mediate essential developmental processes such as nervous system development (Haltiwanger and Lowe, 2004). Recently, Lauc et al. hypothesized that the plethora of available glycoforms and their plasticity facilitated the evolution of complex multicellular lifeforms (Lauc et al., 2014), reasoning that is supported by the essential roles of glycans in developmental processes and cell-cell communication and emphasizes the evolutionary information in glycans.
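The size of the sequence space implied by a large monosaccharide alphabet can be illustrated with a back-of-the-envelope count. The sketch below counts only linear (unbranched) chains, so it understates the space once branching is allowed, and it says nothing about which sequences are biosynthetically feasible. The alphabet and linkage counts passed in are illustrative assumptions, not values reported here, and `count_linear_sequences` is a hypothetical helper.

```python
def count_linear_sequences(n_monomers: int, n_linkages: int, length: int) -> int:
    """Count distinct linear chains of `length` residues.

    Each of the `length` positions can hold any of `n_monomers` residues, and
    each of the `length - 1` bonds between neighbors can be any of
    `n_linkages` linkage types.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return n_monomers ** length * n_linkages ** (length - 1)


if __name__ == "__main__":
    length = 5
    # DNA and proteins: fixed alphabets and a single backbone linkage type.
    print("DNA      :", count_linear_sequences(4, 1, length))
    print("protein  :", count_linear_sequences(20, 1, length))
    # Glycans: assumed 100 monosaccharides and 8 linkage types (illustrative).
    print("glycan   :", count_linear_sequences(100, 8, length))
```

Even under these simplified assumptions, the count for glycans grows far faster with chain length than for DNA or proteins, because both the residue alphabet and the choice of linkage contribute a factor at every position.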

Because glycans make up the outermost layer of both eukaryotic and prokaryotic cells, cross-kingdom interactions will necessarily involve these molecules (Day et al., 2015). The prominent role of glycans in host-pathogen interactions (Varki, 2017) has resulted in evolutionary pressures and opportunities on both sides of the interaction: natural selection can modify host glycan receptors used by pathogens without losing their functionalities, whereas pathogens and commensals need to alter their glycans to evade the host immune system. These interactions provide a window into understanding glycan-mediated host-microbe relationships. Glycans display great phenotypic variability: sequences can be changed depending on environmental conditions, such as the level of extracellular metabolites (Park et al., 2017), without the need for genetic mutations, potentially facilitating rapid responses to changes in host-microbe


Given the aforementioned glycan-mediated host-microbe interactions, glycans could provide insights into pathogenicity and commensalism determinants, as, for instance, molecular mimicry of host glycans by both pathogens and commensals facilitates their immune evasion (Carlin et al., 2009; Varki and Gagneux, 2015). Additional therapeutic potential is enabled by the widespread usage of glycans by viruses for cell adhesion and
