Structural Computational Biology
EnGens
Collecting unsupervised learning algorithms under one umbrella for the analysis of protein conformational datasets
EnGens is an unsupervised learning pipeline for protein conformational datasets. It combines preprocessing steps with algorithms for dimensionality reduction (PCA, UMAP, TICA, SRV) and clustering (K-Means, GMMs), with the goal of generating representative conformational ensembles.
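The cluster-then-select idea described above can be illustrated with a small, self-contained sketch. This is not the EnGens API: the function names (`farthest_first_init`, `kmeans`, `pick_representatives`) and the toy coordinates are hypothetical and chosen only for illustration. The sketch assumes each conformation has already been reduced to a low-dimensional point (for example by PCA), clusters those points with a plain k-means that uses a deterministic farthest-first initialization, and picks the conformation closest to each cluster center as that cluster's representative.

```python
from math import dist  # Euclidean distance, Python 3.8+


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    centers = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


def pick_representatives(points, labels, centers):
    """Return, for each non-empty cluster, the index of the point closest
    to that cluster's center."""
    reps = []
    for j, c in enumerate(centers):
        idx = [i for i, l in enumerate(labels) if l == j]
        if idx:
            reps.append(min(idx, key=lambda i: dist(points[i], c)))
    return reps


# Toy stand-in for six conformations projected onto two reduced dimensions.
coords = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]

labels, centers = kmeans(coords, k=2)
reps = pick_representatives(coords, labels, centers)
print("cluster labels:", labels)
print("representative conformations (indices):", reps)
```

In practice the input points would come from a featurized and dimensionality-reduced structure set, and a library implementation of K-Means or a Gaussian mixture model would replace the hand-written loop.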
EnGens works on static and dynamic datasets
Static datasets can be collected from the Protein Data Bank for a single protein and its related isoforms. Dynamic datasets can be generated by running molecular dynamics (MD) simulations. For both types of data, EnGens streamlines analysis of the conformational content and generates representative ensembles.
Making EnGens accessible
To make the EnGens pipeline easy to use, we provide a Python package, Jupyter notebooks, and a set of Google Colab notebooks for both static and dynamic datasets.
References
If you use EnGens in your work, please cite the tool as shown:
A. Conev, M. M. Rigo, D. Devaurs, A. F. Fonseca, H. Kalavadwala, M. V. de Freitas, C. Clementi, G. Zanatta, D. A. Antunes, and L. E. Kavraki, “EnGens: a computational framework for generation and analysis of representative protein conformational ensembles,” Briefings in Bioinformatics, p. bbad242, Jul. 2023.