TY - JOUR
T1 - Ancestral sequences of a large promiscuous enzyme family correspond to bridges in sequence space in a network representation
AU - Buchholz, Patrick C. F.
AU - van Loo, Bert
AU - Eenink, Bernard D.G.
AU - Bornberg-Bauer, Erich
AU - Pleiss, Jürgen
N1 - Funding Information:
B.v.L., B.D.G.E. and E.B.B. acknowledge funding by EU under the Horizon 2020 Research and Innovation Framework Programme (grant no. 722610). P.C.F.B. and J.P. acknowledge funding by BMBF (grant no. 031B0571A).
Publisher Copyright:
© 2021 The Author(s).
PY - 2021/11/3
Y1 - 2021/11/3
N2 - Evolutionary relationships of protein families can be characterized either by networks or by trees. Whereas trees allow for hierarchical grouping and reconstruction of the most likely ancestral sequences, networks lack a time axis but allow for thresholds of pairwise sequence identity to be chosen and, therefore, the clustering of family members with presumably more similar functions. Here, we use the large family of arylsulfatases and phosphonate monoester hydrolases to investigate similarities, strengths and weaknesses in tree and network representations. For varying thresholds of pairwise sequence identity, values of betweenness centrality and clustering coefficients were derived for nodes of the reconstructed ancestors to measure the propensity to act as a bridge in a network. Based on these properties, ancestral protein sequences emerge as bridges in protein sequence networks. Interestingly, many ancestral protein sequences appear close to extant sequences. Therefore, reconstructed ancestor sequences might also be interpreted as yet-to-be-identified homologues. The concept of ancestor reconstruction is compared to consensus sequences, too. It was found that hub sequences in a network, e.g. reconstructed ancestral sequences that are connected to many neighbouring sequences, share closer similarity with derived consensus sequences. Therefore, some reconstructed ancestor sequences can also be interpreted as consensus sequences.
AB - Evolutionary relationships of protein families can be characterized either by networks or by trees. Whereas trees allow for hierarchical grouping and reconstruction of the most likely ancestral sequences, networks lack a time axis but allow for thresholds of pairwise sequence identity to be chosen and, therefore, the clustering of family members with presumably more similar functions. Here, we use the large family of arylsulfatases and phosphonate monoester hydrolases to investigate similarities, strengths and weaknesses in tree and network representations. For varying thresholds of pairwise sequence identity, values of betweenness centrality and clustering coefficients were derived for nodes of the reconstructed ancestors to measure the propensity to act as a bridge in a network. Based on these properties, ancestral protein sequences emerge as bridges in protein sequence networks. Interestingly, many ancestral protein sequences appear close to extant sequences. Therefore, reconstructed ancestor sequences might also be interpreted as yet-to-be-identified homologues. The concept of ancestor reconstruction is compared to consensus sequences, too. It was found that hub sequences in a network, e.g. reconstructed ancestral sequences that are connected to many neighbouring sequences, share closer similarity with derived consensus sequences. Therefore, some reconstructed ancestor sequences can also be interpreted as consensus sequences.
KW - ancestor reconstruction
KW - consensus sequence
KW - network biology
KW - phylogeny
KW - protein evolution
UR - http://www.scopus.com/inward/record.url?scp=85121540556&partnerID=8YFLogxK
U2 - 10.1098/rsif.2021.0389
DO - 10.1098/rsif.2021.0389
M3 - Article
SN - 1742-5689
VL - 18
JO - Journal of the Royal Society Interface
JF - Journal of the Royal Society Interface
IS - 184
M1 - 20210389
ER -