DeepGO-SE Revolutionizes Protein Function Prediction with Knowledge-Enhanced Learning

In a study recently published in the journal Nature Machine Intelligence, scientists have unveiled a method named “DeepGO-SE” for predicting Gene Ontology (GO) functions from protein sequences. To cope with the limited number of known functions and the complexity of protein interactions, the researchers leveraged a large, pre-trained protein language model to improve protein function prediction.

The Challenge of Protein Function Prediction

While advancements in protein structure prediction have been notable, the accurate prediction of protein functions remains a formidable challenge. This challenge is compounded by the limited number of known functions, intricate protein interactions, and the varying sequences of proteins with similar structures. The Gene Ontology (GO) framework, which categorizes proteins into three sub-ontologies based on molecular functions (MFO), biological processes (BPO), and cellular components (CCO), plays a crucial role in describing protein functions.

DeepGO-SE Method and Performance Evaluation

In their groundbreaking study, researchers introduced DeepGO-SE, a method that employs knowledge-enhanced learning through semantic entailment. The method comprises three crucial steps: generating an approximate model using ELEmbeddings based on GO axioms, utilizing evolutionary scale model 2 (ESM2) embeddings to represent single proteins, and repeating the process to generate multiple models for approximate semantic entailment.
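
The third step, combining several models, can be illustrated with a small sketch. The article does not say how the models' outputs are merged, so the code below rests on an assumption: a GO term counts as entailed only to the degree that every model supports it, which means taking the minimum score across models. The function name `entailed_scores` and the GO identifiers are hypothetical and are not taken from the DeepGO-SE code.

```python
from typing import Dict, List


def entailed_scores(model_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-model GO term scores for one protein.

    Assumption (not from the article): a term's combined score is the
    minimum over all models, and a term missing from a model scores 0.0.
    """
    # Collect every GO term that appears in at least one model's output.
    terms = set()
    for scores in model_scores:
        terms.update(scores)
    # A term is only as well supported as its weakest model.
    return {
        term: min(scores.get(term, 0.0) for scores in model_scores)
        for term in terms
    }


# Two hypothetical models scoring the same protein.
models = [
    {"GO:a": 0.9, "GO:b": 0.2},
    {"GO:a": 0.7, "GO:b": 0.8, "GO:c": 0.6},
]
combined = entailed_scores(models)
```

Here `GO:a` keeps a high combined score because both models agree on it, while `GO:c` drops to zero because only one model predicts it.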

Comparative analysis against five baseline methods using a UniProtKB/Swiss-Prot dataset demonstrated DeepGO-SE’s superiority. In molecular functions (MFO), DeepGO-SE achieved a maximum F measure (F max) of 0.554, outperforming the DeepGOZero and MLP methods by 7%. In biological processes (BPO), its F max of 0.432 surpassed DeepGraphGO by 8%. For cellular components (CCO), DeepGO-SE achieved an F max of 0.721. Notably, the team modified protein embeddings to incorporate additional information about the proteome and its interactions, leading to further improvements.
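
For readers unfamiliar with the metric, the sketch below shows one common way a protein-centric F max is computed: sweep a score threshold, average precision over proteins that have at least one prediction at that threshold, average recall over all proteins, and keep the best resulting F measure. This follows the usual convention in function-prediction benchmarks and is an assumption here; the article does not describe the exact evaluation protocol, and the function name `f_max` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def f_max(truth: List[Set[str]],
          scores: List[Dict[str, float]],
          steps: int = 100) -> float:
    """Best F measure over score thresholds 1/steps, 2/steps, ..., 1."""
    best = 0.0
    n = len(truth)
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # one entry per protein with any prediction at t
        recall_sum = 0.0  # summed over all proteins
        for true_terms, pred_scores in zip(truth, scores):
            predicted = {term for term, s in pred_scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy example: two proteins with true GO terms and predicted scores.
truth = [{"GO:1", "GO:2"}, {"GO:3"}]
scores = [{"GO:1": 0.9, "GO:2": 0.4, "GO:4": 0.6}, {"GO:3": 0.8}]
result = f_max(truth, scores)
```

In this toy case the best threshold is a low one that admits all three predictions for the first protein, giving a precision of 5/6 and a recall of 1.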

Incorporating protein-protein interactions (PPIs) in DeepGO-SE enhanced CCO prediction (F max: 0.736). Experiments combining ESM2 embeddings with PPIs in DeepGOGAT-SE showed mixed results, with a decrease in MFO prediction performance but marginal improvement in semantic distance. DeepGOGATMF-SE and DeepGOGATMF-SE-Pred, utilizing experimental annotations and model-derived prediction scores, respectively, demonstrated improved BPO prediction.

The team also validated their findings using the neXtProt dataset, where DeepGO-SE outperformed competitors with an F max of 0.386. A detailed ablation study underscored the importance of individual components, revealing the impact of ELEmbeddings axiom loss functions on MFO performance and the role of axioms and semantic entailment in BPO and CCO predictions.

DeepGO-SE marks a notable advance in protein function prediction, showing how knowledge-enhanced learning can overcome the limitations of earlier approaches. The method not only outperforms existing techniques on the reported benchmarks but also offers a more comprehensive view of protein functions, paving the way for further advances in molecular biology and bioinformatics.
