dc.contributor.author | Novo Lourés, María | |
dc.contributor.author | Ruano Ordás, David Alfonso | |
dc.contributor.author | Pavón Rial, Maria Reyes | |
dc.contributor.author | Laza Fidalgo, Rosalía | |
dc.contributor.author | Gómez Meire, Silvana | |
dc.contributor.author | Méndez Reboredo, José Ramón | |
dc.date.accessioned | 2021-11-26T08:00:28Z | |
dc.date.available | 2021-11-26T08:00:28Z | |
dc.date.issued | 2022-03 | |
dc.identifier.citation | Information Processing & Management, 59(2): 102812 (2022) | spa |
dc.identifier.issn | 03064573 | |
dc.identifier.uri | http://hdl.handle.net/11093/2761 | |
dc.description | Funded for open access publication: Universidade de Vigo/CISUG | |
dc.description.abstract | This study addresses the usage of different features to complement synset-based and bag-of-words representations of texts in the context of using classical ML approaches for spam filtering (Ferrara, 2019). Despite the existence of a large number of complementary features, in order to improve the applicability of this study, we have selected only those that can be computed regardless of the communication channel used to distribute content. Feature evaluation has been performed using content distributed through different channels (social networks and email) and classifiers (AdaBoost, Flexible Bayes, Naïve Bayes, Random Forests, and SVMs). The results have revealed the usefulness of detecting some non-textual entities (such as URLs, Uniform Resource Locators) in the addressed distribution channels. Moreover, we also found that compression properties and/or information regarding the probability of correctly guessing the language of target texts could be successfully used to improve the classification in a wide range of situations. Finally, we have also detected features that are influenced by specific fashions and habits of users of certain Internet services (e.g., the existence of words written in capital letters) that are not useful for spam filtering. | en |
dc.description.sponsorship | Xunta de Galicia | Ref. ED481D-2021/024 | spa |
dc.description.sponsorship | Agencia Estatal de Investigación | Ref. TIN2017-84658-C2-1-R | spa |
dc.language.iso | eng | en |
dc.publisher | Information Processing & Management | spa |
dc.relation | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-84658-C2-1-R/ES/INTEGRACION DE CONOCIMIENTO SEMANTICO PARA EL FILTRADO DE SPAM BASADO EN CONTENIDO | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.title | Enhancing representation in the context of multiple-channel spam filtering | eng |
dc.type | article | spa |
dc.rights.accessRights | openAccess | spa |
dc.identifier.doi | 10.1016/j.ipm.2021.102812 | |
dc.identifier.editor | https://doi.org/10.1016/j.ipm.2021.102812 | spa |
dc.publisher.departamento | Informática | spa |
dc.publisher.grupoinvestigacion | Sistemas Informáticos de Nova Xeración | spa |
dc.publisher.grupoinvestigacion | Grupo de Informática Gráfica y Multimedia (Gig) | spa |
dc.subject.unesco | 3304.99 Others | spa |
dc.date.updated | 2021-11-25T17:06:19Z | |
dc.computerCitation | pub_title=Information Processing & Management|volume=59|journal_number=2|start_pag=102812|end_pag= | spa |