Method of extraction of feature in the classification of texts for authorship attribution

Omar González Brito; Jose Luis Tapia Fabela; Silvia Salas Hernández

doi:10.61467/2007.1558.2021.v12i3.225

Authors

Omar González Brito, Universidad Autónoma del Estado de México, UAP Tianguistenco
Jose Luis Tapia Fabela, Universidad Autónoma del Estado de México, UAP Tianguistenco
Silvia Salas Hernández, Universidad Autónoma del Estado de México, Centro Universitario Atlacomulco

Keywords:

Authorship attribution, feature extraction, text classification, supervised learning

Abstract

Authorship attribution has been studied mainly as a text classification problem. Feature extraction has followed two approaches, profile-based and instance-based, which analyze textual or linguistic features to capture an author's writing style. In both approaches, extracting features from each author's full set of documents yields high feature dimensionality, which can impair classification performance.

Therefore, we propose an approach in which feature extraction depends neither on the full set of documents nor on feature selection. Texts were classified with several supervised learning methods. The study examines whether a single document contains all the features that describe an author's writing style. The experiments used three corpora (C10, C50, and PAN12), selected on the basis of a literature review. The results show that the approach outperforms the state of the art on unbalanced samples, gives consistent results when evaluated in different contexts, and remains robust with either 10 or 50 authors.

With this approach, we find that 500 non-repeated words are enough to capture an author's writing style, with a classification accuracy of 79.68%.
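
One possible reading of "500 words without repeating" is the first 500 distinct word tokens of a single document. The sketch below illustrates only that reading. The tokenization rule, the lowercasing, and the function name `first_distinct_words` are assumptions made for illustration, not the procedure published by the authors.

```python
import re


def first_distinct_words(text, limit=500):
    """Return up to `limit` distinct lowercase word tokens, in order of first appearance.

    Hypothetical helper: the abstract does not specify tokenization or casing.
    """
    seen = set()
    ordered = []
    # Assumed tokenization: runs of Unicode letters, so accented words stay whole.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token not in seen:
            seen.add(token)
            ordered.append(token)
            if len(ordered) == limit:
                break
    return ordered


sample = "The cat saw the dog and the dog saw the cat"
print(first_distinct_words(sample, limit=4))
```

A word list of this kind could then be turned into a feature vector for any supervised classifier. How the features are actually weighted and classified is described in the full paper, not in this abstract.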

Author Biographies

Jose Luis Tapia Fabela, Universidad Autónoma del Estado de México, UAP Tianguistenco

Professor at the Unidad Académica Profesional Tianguistenco, Universidad Autónoma del Estado de México

Silvia Salas Hernández, Universidad Autónoma del Estado de México, Centro Universitario Atlacomulco

Graduate student in Computer Science (Posgrado en Ciencias de la Computación), Centro Universitario UAEM Atlacomulco
