PPI prediction from sequences via transfer learning on balanced but yet biased datasets: an open problem

Nogueira Rodríguez, Alba (UVIGO author); González Peña, Daniel (UVIGO author); Vieira, Cristina P.; Vieira, Jorge; López Fernández, Hugo (UVIGO author)
DATE: 2024
UNIVERSAL IDENTIFIER: http://hdl.handle.net/11093/7493
UNESCO SUBJECT: 3314.99 Other
DOCUMENT TYPE: conferenceObject

ABSTRACT

Computational approaches for Protein-Protein Interaction (PPI) prediction, and particularly methods that predict interactions by leveraging only amino acid sequences, are of paramount interest. In this study, we aimed to evaluate the suitability of pre-trained protein sequence embeddings, namely ProtBert and SeqVec, as feature extractors for classical machine learning algorithms. Consistent with recent reports, we found that performance metrics calculated over random train-test splits of balanced PPI datasets, such as holdout or cross-validation, lead to highly overestimated values, mainly due to a non-evident bias present in such datasets. We demonstrate this bias using two PPI datasets and a 5-fold cross-validation, which yields relatively high values for most tested models, including a custom baseline model, named PPIIBM, that predicts the interaction status based solely on the a priori positivity of the proteins found in the train split. This baseline PPIIBM model achieves results similar to those of state-of-the-art models, even those based on deep learning, showing that predicting PPIs from sequences remains an open challenge in which careful validation pipelines should be implemented.
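The abstract describes PPIIBM only in outline: it scores a pair from the a priori positivity of its proteins in the training split. The Python sketch below is a hypothetical reconstruction, not the authors' code. It assumes that a protein's positivity is its share of positive training pairs, that a pair's score is the average of its two proteins' values (falling back to the global positive rate for proteins absent from the training split), and a 0.5 decision threshold. The names `fit_positivity_baseline`, `predict` and `random_kfold_accuracy`, and the toy "hub" dataset, are illustrative inventions; the hub pattern is only one assumed way such a bias could arise. The point is to show how a model that never reads a sequence can score well under random 5-fold cross-validation.

```python
import random
from collections import defaultdict

# One example = (protein_a, protein_b, label); label 1 = interacting, 0 = non-interacting.
Pair = tuple[str, str, int]
Model = tuple[dict[str, float], float]


def fit_positivity_baseline(train: list[Pair]) -> Model:
    """Hypothetical PPIIBM-like fit: per-protein share of positive pairs, plus the global positive rate."""
    pos: dict[str, int] = defaultdict(int)
    tot: dict[str, int] = defaultdict(int)
    for a, b, y in train:
        for p in (a, b):
            tot[p] += 1
            pos[p] += y
    prior = sum(y for _, _, y in train) / len(train)
    return {p: pos[p] / tot[p] for p in tot}, prior


def predict(model: Model, a: str, b: str) -> int:
    """Average the two proteins' positivity (prior for unseen proteins); threshold at 0.5."""
    rates, prior = model
    score = (rates.get(a, prior) + rates.get(b, prior)) / 2
    return int(score >= 0.5)


def random_kfold_accuracy(pairs: list[Pair], k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k folds obtained by randomly splitting pairs (not proteins)."""
    idx = list(range(len(pairs)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        model = fit_positivity_baseline([pairs[i] for i in idx if i not in held])
        hits = sum(predict(model, pairs[i][0], pairs[i][1]) == pairs[i][2] for i in test)
        accs.append(hits / len(test))
    return sum(accs) / k


# Toy, made-up dataset (balanced: 200 positives, 200 negatives). Every positive
# involves one of 5 "hub" proteins, while negatives only pair ordinary proteins.
hubs = [f"H{i}" for i in range(5)]
others = [f"N{i}" for i in range(40)]
positives = [(h, n, 1) for h in hubs for n in others]
negatives = [(others[j], others[(j + d) % 40], 0) for j in range(40) for d in range(1, 6)]
data = positives + negatives

print(f"random 5-fold CV accuracy: {random_kfold_accuracy(data):.3f}")
```

Because the proteins of a randomly drawn test pair almost always also appear in the training folds, the baseline only has to look up how often each protein was labelled positive; on this toy data the reported accuracy is far above the 0.5 chance level even though no sequence information is used. If proteins rather than pairs are held out, that lookup is no longer available: for two unseen proteins `predict` can only return the prior-based guess.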

Files in this item

Name: 2024_lopez_ppi_predictions.pdf
Size: 575.9 KB
Format: PDF
Description: Indefinite embargo due to copyright



University library
Rúa Leonardo da Vinci, s/n
As Lagoas, Marcosende
36310 Vigo


Information
+34 986 813 821
investigo@uvigo.gal

