Structural computational biology

EnGens

EnGens collects unsupervised learning algorithms and adapts them to protein conformational datasets to generate representative conformational ensembles.

Collecting unsupervised learning algorithms under one umbrella for the analysis of protein conformational datasets

EnGens is a pipeline for unsupervised learning on protein conformational datasets. It combines algorithms for dimensionality reduction (PCA, UMAP, TICA, SRV) and clustering (K-Means, GMMs) with preprocessing steps, with the goal of generating representative conformational ensembles.
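The general idea of such a pipeline (reduce dimensionality, cluster, then pick one representative per cluster) can be illustrated with a minimal NumPy sketch. This is not the EnGens API: the function names `pca_project`, `kmeans`, and `representatives` are hypothetical, the feature matrix is synthetic, and PCA followed by K-Means is only one of the combinations listed above. Choosing the frame closest to each cluster center as the representative is likewise an assumption made for illustration.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto the top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            # Keep the old center if a cluster happens to be empty.
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers


def representatives(points, labels, centers):
    """Return, for each non-empty cluster, the index of the point nearest its center."""
    reps = []
    for j, center in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dists = np.linalg.norm(points[members] - center, axis=1)
        reps.append(int(members[dists.argmin()]))
    return reps


# Synthetic stand-in for a featurized conformational dataset:
# 150 "conformations" x 10 features, drawn around three hypothetical states.
rng = np.random.default_rng(42)
state_means = np.array([[0.0] * 10, [5.0] * 10, [-5.0] * 10])
features = np.vstack([m + 0.3 * rng.standard_normal((50, 10)) for m in state_means])

embedding = pca_project(features, n_components=2)
labels, centers = kmeans(embedding, k=3)
ensemble = representatives(embedding, labels, centers)
print("Representative conformation indices:", ensemble)
```

In a real workflow, the rows of `features` would come from a featurization of the structures (for example, internal distances or torsion angles), and the indices in `ensemble` would be used to extract the corresponding conformations. K-Means with a random start can settle in a local optimum, so the number of clusters and the initialization are worth checking on real data.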

EnGens works on static and dynamic datasets

Static datasets can be collected from the Protein Data Bank (PDB) for a single protein and its related isoforms. Dynamic datasets can be generated by running molecular dynamics (MD) simulations. For both types of data, EnGens streamlines analysis of the conformational content and generates representative ensembles.

Making EnGens accessible

To make the EnGens pipeline easy to use, we provide a Python package, Jupyter notebooks, and a set of Google Colab notebooks for both static and dynamic datasets.

Further links

Check out our GitHub repository here: GitHub
Check out our Google Colab here: Colab
You can find the paper here: Paper

References

If you use EnGens in your work, please cite the tool as shown:

A. Conev, M. M. Rigo, D. Devaurs, A. F. Fonseca, H. Kalavadwala, M. V. de Freitas, C. Clementi, G. Zanatta, D. A. Antunes, and L. E. Kavraki, “EnGens: a computational framework for generation and analysis of representative protein conformational ensembles,” Briefings in Bioinformatics, p. bbad242, Jul. 2023.