Multi-Label Topic Classification in a Twitter Corpus of Public Communication of Science in Mexican Spanish
DOI:
https://doi.org/10.61467/2007.1558.2024.v15i4.510
Keywords:
natural language processing, Multi-label text classification, public communication of science, Corpus, Machine Learning, Transformers, Deep Learning, Large Language Models, scientific communication
Abstract
In Mexico, comprehensive studies of the public communication of science (PCS) through social networks remain an unaddressed area of research. To help fill this gap, the present work approaches the problem from the perspective of natural language processing (NLP). The objective of this study is to develop and evaluate an automatic multi-label topic classification system for PCS tweets published in Mexico. To this end, various machine learning models are trained, including traditional algorithms and transformer-based models. Using a manually labeled corpus annotated with eighteen distinct areas or themes of science, the study evaluates and compares several approaches to automatically identifying and classifying the thematic areas of PCS tweets. The findings indicate that transformer-based models, such as XLM-RoBERTa, outperform classic algorithms, while emerging large language models (LLMs), such as BLOOM, present a promising alternative for a range of NLP tasks.
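The task described above is multi-label: a single tweet may belong to several of the eighteen thematic areas at once. The following is a minimal plain-Python sketch of that framing, not the authors' implementation. It encodes each tweet's topic set as a 0/1 vector, turns per-label model scores into predictions with an independent sigmoid and threshold (a common setup for transformer classifiers fine-tuned on multi-label data), and computes micro- and macro-F1. The topic names, logits, the 0.5 threshold, and the choice of F1 as the metric are all illustrative assumptions; the abstract does not list the eighteen themes or state the decision rule or evaluation metrics used.

```python
import math
from typing import Dict, List, Sequence, Set


def to_multi_hot(label_sets: Sequence[Set[str]], labels: Sequence[str]) -> List[List[int]]:
    """Encode each tweet's set of topics as a 0/1 vector in a fixed label order."""
    index = {name: i for i, name in enumerate(labels)}
    rows = []
    for topics in label_sets:
        row = [0] * len(labels)
        for topic in topics:
            row[index[topic]] = 1  # raises KeyError for topics outside the label inventory
        rows.append(row)
    return rows


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_multi_label(logits: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Score each label independently (sigmoid, not softmax) and keep every label at or
    above the threshold, so a tweet can receive zero, one, or several topics."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention: 0.0 when a label is never true or predicted


def micro_macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Micro-F1 pools counts over all labels; macro-F1 averages per-label F1,
    so rare topics weigh as much as frequent ones."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    micro = _f1(sum(tp), sum(fp), sum(fn))
    macro = sum(_f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return {"micro_f1": micro, "macro_f1": macro}


# Hypothetical 3-topic example; the real corpus uses eighteen themes, which are not listed here.
LABELS = ["astronomy", "biology", "health"]
gold = to_multi_hot([{"astronomy"}, {"biology", "health"}, {"health"}], LABELS)
logits = [[2.0, -1.0, -3.0], [-2.0, 0.5, 1.5], [-1.0, 0.3, 2.0]]  # made-up model scores
pred = predict_multi_label(logits)
scores = micro_macro_f1(gold, pred)
print(pred)  # [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
print(scores)
```

In this toy run the third tweet is wrongly tagged "biology" as well as "health", which costs precision on that one label. Macro-F1 is often reported next to micro-F1 when topic frequencies are uneven, because it is not dominated by the most common themes.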
Published
2024-11-04
How to Cite
Sánchez-Montero, A., Bel-Enguix, G., & Ojeda-Trueba, S.-L. (2024). Multi-Label Topic Classification in a Twitter Corpus of Public Communication of Science in Mexican Spanish. International Journal of Combinatorial Optimization Problems and Informatics, 15(4), 199–210. https://doi.org/10.61467/2007.1558.2024.v15i4.510
Issue
Vol. 15 No. 4 (2024)
Section
Articles
License
Copyright (c) 2024 International Journal of Combinatorial Optimization Problems and Informatics
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.