NLP with Transformers for toxicity detection: corpus construction and evaluation for MisProfesores.com platform
DOI:
https://doi.org/10.61467/2007.1558.2025.v16i2.534
Keywords:
Sentiment analysis, Deep learning, BERT, Corpus, Toxicity, Transformers
Abstract
The growth of social networks as mass media has enabled faster and closer interaction between users, but it also brings challenges, such as the risk of spreading hate speech. Early detection of such harmful posts is critical. This article presents a methodology for building a unique corpus of Spanish-language comments collected from the MisProfesores.com platform, covering all states in Mexico. The process yielded a dataset of 18,000 unlabeled samples and 853 manually labeled samples. In addition to describing the corpus construction process, the article reports the evaluation of different models trained on these data and compares them with previous work on toxicity detection, highlighting the value of developing Spanish-language corpora for specific tasks. Our Transformer-based model outperformed state-of-the-art models on binary toxicity classification, reaching an accuracy of 0.9649 and an F1 score of 0.9645.
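For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and F1 can be computed for a binary toxic / non-toxic task. It is illustrative only: the function name `accuracy_and_f1`, the 0/1 label encoding, and the toy labels are assumptions, and the abstract does not say which F1 averaging scheme the authors used, so the positive-class F1 shown here may differ from theirs.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, positive-class F1) for binary labels.

    Assumed encoding (not from the article): 1 = toxic, 0 = non-toxic.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, predicted toxic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-toxic, predicted toxic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, predicted non-toxic
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only; these are not data from the corpus.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

Reporting both numbers is useful when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the toxic class.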
License
Copyright (c) 2025 International Journal of Combinatorial Optimization Problems and Informatics

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.