Resource
Deep-Learning Resources for Studying Glycan-
Mediated Host-Microbe Interactions
Graphical Abstract
Highlights
• Glycan-focused language models can be used for sequence-to-function models
• Information in glycans predicts immunogenicity, pathogenicity, and taxonomic origin
• Glycan alignments shed light on bacterial virulence
Bojar et al., 2021, Cell Host & Microbe 29, 132–144, January 13, 2021
© 2020 The Author(s). Published by Elsevier Inc.
https://doi.org/10.1016/j.chom.2020.10.004
Authors
Daniel Bojar, Rani K. Powers,
Diogo M. Camacho, James J. Collins
Correspondence diogo.camacho@wyss.harvard.edu (D.M.C.), jimjc@mit.edu (J.J.C.)
In Brief
Bojar et al. present a workflow that
combines machine learning and
bioinformatics techniques to analyze the
prominent role of glycans in host-microbe
interactions. The glycan-focused language
models and alignments developed here
allow for the prediction and
analysis of glycan immunogenicity,
association with pathogenicity, and
taxonomic classification.
OPEN ACCESS
Resource
Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions
Daniel Bojar,1,2 Rani K. Powers,1,2 Diogo M. Camacho,1,4,* and James J. Collins1,2,3,4,5,*
1Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
2Department of Biological Engineering and Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
4These authors contributed equally
5Lead Contact
*Correspondence: diogo.camacho@wyss.harvard.edu (D.M.C.), jimjc@mit.edu (J.J.C.)
https://doi.org/10.1016/j.chom.2020.10.004
SUMMARY
Glycans, the most diverse biopolymer, are shaped by evolutionary pressures stemming from host-microbe interactions. Here, we present machine learning and bioinformatics methods to leverage the evolutionary information present in glycans to gain insights into how pathogens and commensals interact with hosts. By using techniques from natural language processing, we develop deep-learning models for glycans that are trained on a curated dataset of 19,299 unique glycans and can be used to study and predict glycan functions. We show that these models can be utilized to predict glycan immunogenicity and the pathogenicity of bacterial strains, as well as investigate glycan-mediated immune evasion via molecular mimicry. We also develop glycan-alignment methods and use these to analyze virulence-determining glycan motifs in the capsular polysaccharides of bacterial pathogens. These resources enable one to identify and study glycan motifs involved in immunogenicity, pathogenicity, molecular mimicry, and immune evasion, expanding our understanding of host-microbe interactions.
INTRODUCTION
In contrast to RNA and proteins, whose sequences can be elucidated from their associated DNA sequence, glycans are the only biopolymer outside the rules of the central dogma of molecular biology. Although glycans are synthesized by DNA-encoded enzymes (Lairson et al., 2008), an individual glycan sequence is dependent on the interplay between multiple enzymes and cellular conditions. Additionally, the expansive glycan alphabet of hundreds of different monosaccharides allows for a large number of potential oligosaccharides, built with different monosaccharides, lengths, connectivity, and branching. Glycans are present as modifications on all other biopolymers (Varki, 2017), exerting varying effects on biomolecules, including stabilization and modulation of their functionality (Dekkers et al., 2017; Solá and Griebenow, 2009). Apart from influencing the function of individual proteins, glycans are also crucial for cell-cell contact in the case of glycan-glycan interactions during the attachment of pathogenic bacteria to host cells (Day et al., 2015), and they mediate essential developmental processes such as nervous system development (Haltiwanger and Lowe, 2004). Recently, Lauc et al. hypothesized that the plethora of available glycoforms and their plasticity facilitated the evolution of complex multicellular lifeforms (Lauc et al., 2014), reasoning that is supported by the essential roles of glycans in developmental processes
and cell-cell communication and emphasizes the evolutionary
information in glycans.
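As a rough illustration of why a large monosaccharide alphabet yields so many potential oligosaccharides, the following sketch counts unbranched sequences only, ignoring branching and chemical feasibility. The alphabet size, the number of linkage types, and the function name `count_linear_glycans` are illustrative assumptions and are not values or methods taken from this work.

```python
def count_linear_glycans(alphabet_size: int, linkage_types: int, length: int) -> int:
    """Count unbranched sequences of a given length.

    Each of the `length` positions holds one monosaccharide, and each of the
    `length - 1` bonds between neighbors uses one linkage type.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return alphabet_size ** length * linkage_types ** (length - 1)


# Hypothetical values chosen for illustration only.
ALPHABET_SIZE = 100   # assumed number of distinct monosaccharides
LINKAGE_TYPES = 8     # assumed number of distinct linkage types

for n in (1, 2, 3, 5):
    print(n, count_linear_glycans(ALPHABET_SIZE, LINKAGE_TYPES, n))
```

Even under these simplified assumptions the count grows exponentially with length, and allowing branching would enlarge the space further.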
Because glycans make up the outermost layer of both eukaryotic and prokaryotic cells, cross-kingdom interactions will necessarily involve these molecules (Day et al., 2015). The prominent role of glycans in host-pathogen interactions (Varki, 2017) has resulted in evolutionary pressures and opportunities on both sides of the interaction: natural selection can modify host glycan receptors used by pathogens without losing their functionalities, whereas pathogens and commensals need to alter their glycans to evade the host immune system. These interactions provide a window into understanding glycan-mediated host-microbe relationships. Glycans display great phenotypic variability: sequences can be changed depending on environmental conditions, such as the level of extracellular metabolites (Park et al., 2017), without the need for genetic mutations, potentially facilitating rapid responses to changes in host-microbe relationships.
Given the aforementioned glycan-mediated host-microbe interactions, glycans could provide insights into pathogenicity and commensalism determinants, as, for instance, molecular mimicry of host glycans by both pathogens and commensals facilitates their immune evasion (Carlin et al., 2009; Varki and Gagneux, 2015). Additional therapeutic potential is enabled by the widespread usage of glycans by viruses for cell adhesion and
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).