
dc.contributor.author: Novo Lourés, María
dc.contributor.author: Ruano Ordás, David Alfonso
dc.contributor.author: Pavón Rial, Maria Reyes
dc.contributor.author: Laza Fidalgo, Rosalía
dc.contributor.author: Gómez Meire, Silvana
dc.contributor.author: Méndez Reboredo, José Ramón
dc.date.accessioned: 2021-11-26T08:00:28Z
dc.date.available: 2021-11-26T08:00:28Z
dc.date.issued: 2022-03
dc.identifier.citation: Information Processing & Management, 59(2): 102812 (2022) [spa]
dc.identifier.issn: 0306-4573
dc.identifier.uri: http://hdl.handle.net/11093/2761
dc.description: Funded for open-access publication: Universidade de Vigo/CISUG
dc.description.abstract: This study addresses the use of different features to complement synset-based and bag-of-words representations of texts when classical ML approaches are applied to spam filtering (Ferrara, 2019). Although a large number of complementary features exist, to make this study more widely applicable we selected only those that can be computed regardless of the communication channel used to distribute content. The features were evaluated using content distributed through different channels (social networks and email) and different classifiers (AdaBoost, Flexible Bayes, Naïve Bayes, Random Forests, and SVMs). The results revealed the usefulness of detecting some non-textual entities (such as Uniform Resource Locators, URLs) in the distribution channels studied. Moreover, we found that compression properties and/or information about the probability of correctly guessing the language of the target texts could be used successfully to improve classification in a wide range of situations. Finally, we identified features that are influenced by the specific trends and habits of users of certain Internet services (e.g. the presence of words written in capital letters) and that are not useful for spam filtering. [en]
dc.description.sponsorship: Xunta de Galicia | Ref. ED481D-2021/024 [spa]
dc.description.sponsorship: Agencia Estatal de Investigación | Ref. TIN2017-84658-C2-1-R [spa]
dc.language.iso: eng [en]
dc.publisher: Information Processing & Management [spa]
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-84658-C2-1-R/ES/INTEGRACION DE CONOCIMIENTO SEMANTICO PARA EL FILTRADO DE SPAM BASADO EN CONTENIDO
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Enhancing representation in the context of multiple-channel spam filtering [eng]
dc.type: article [spa]
dc.rights.accessRights: openAccess [spa]
dc.identifier.doi: 10.1016/j.ipm.2021.102812
dc.identifier.editor: https://doi.org/10.1016/j.ipm.2021.102812 [spa]
dc.publisher.departamento: Informática [spa]
dc.publisher.grupoinvestigacion: Sistemas Informáticos de Nova Xeración [spa]
dc.publisher.grupoinvestigacion: Grupo de Informática Gráfica y Multimedia (Gig) [spa]
dc.subject.unesco: 3304.99 Other [spa]
dc.date.updated: 2021-11-25T17:06:19Z
dc.computerCitation: pub_title=Information Processing & Management|volume=59|journal_number=2|start_pag=102812|end_pag= [spa]


Except where otherwise noted, this item's license is described as Attribution 4.0 International.