Dimensionality reduction for SARS-CoV-2 antibodies prediction

Authors

  • Yasmín Hernández TecNM/Cenidet
  • Samuel Narciso TecNM/Cenidet
  • P. Alejandra Cuevas-Chávez TecNM/Cenidet
  • Javier Ortiz-Hernández TecNM Centro Nacional de Investigación y Desarrollo Tecnológico
  • Juan Antonio Miguel-Ruiz TecNM Centro Nacional de Investigación y Desarrollo Tecnológico

DOI:

https://doi.org/10.61467/2007.1558.2025.v16i4.1142

Keywords:

Dimensionality reduction, PCA, Air quality, Environment, ARIMA, Predictive model, Public health, Pollution, Guadalajara.

Abstract

The analysis of genomic data makes it possible to understand biological processes at the molecular level. A challenging application is the classification of antibodies according to the antigens they bind. Antibodies, which are central to the immune system, are proteins that bind to specific antigens to inactivate pathogens. Antibody classification requires datasets with structural and functional information about antibodies. The Observed Antibody Space (OAS) is a database that collects genomic sequences of antibodies from several species and includes information on antibody structure. This paper examines the impact of dimensionality reduction techniques on the classification of SARS-CoV-2 antibodies, using genetic sequence data from the OAS database. Specifically, we focus on transforming amino acid sequences from the Complementarity Determining Region (CDR) into word embeddings for subsequent processing in machine learning models. This transformation enables the use of unlabeled, high-dimensional data but introduces the curse of dimensionality, which can affect the models’ accuracy and efficiency. To address this problem, two dimensionality reduction techniques are applied and evaluated: Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We developed 36 classification models using Support Vector Machines, Random Forests, and K-Nearest Neighbors algorithms, testing each on the original datasets and on the reduced datasets. The objective is to determine whether dimensionality reduction improves model performance. The study provides insights into how these techniques can facilitate predictive analysis in bioinformatics and contribute to the development of efficient models for identifying relevant antibodies in immunology.
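The comparison described in the abstract (training a classifier on the original high-dimensional vectors and again on a reduced version) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes NumPy is available, uses synthetic random vectors in place of real CDR embeddings from OAS, implements only PCA (via SVD) and a simple K-Nearest Neighbors classifier, and omits UMAP, Support Vector Machines, and Random Forests. All function names, dimensions, and parameters below are hypothetical choices made for illustration.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the principal axes learned from the training set."""
    return (X - mean) @ components.T


def knn_predict(X_train, y_train, X_test, k=5):
    """Binary K-Nearest Neighbors by majority vote (k should be odd)."""
    # Squared Euclidean distance between every test row and every train row.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)


# Synthetic stand-in for embedding vectors (assumption: 100 dimensions,
# two classes, class signal confined to the first 5 dimensions).
rng = np.random.default_rng(0)
n, dim = 200, 100
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, dim))
X[:, :5] += 2.0 * y[:, None]

# Simple hold-out split: 150 training rows, 50 test rows.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]

# Fit PCA on the training rows only, then project both splits.
mean, comps = pca_fit(X[train], n_components=10)
Z_train = pca_transform(X[train], mean, comps)
Z_test = pca_transform(X[test], mean, comps)

acc_full = (knn_predict(X[train], y[train], X[test]) == y[test]).mean()
acc_pca = (knn_predict(Z_train, y[train], Z_test) == y[test]).mean()
print(f"KNN accuracy, original {dim} dims: {acc_full:.2f}")
print(f"KNN accuracy, PCA to 10 dims:   {acc_pca:.2f}")
```

Fitting the projection on the training split alone keeps information from the test rows out of the reduced representation. Whether the reduced accuracy is higher or lower than the original depends on the data, which is the question the study evaluates on real antibody sequences.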

Published

2025-10-12

How to Cite

Hernández, Y., Narciso, S., Cuevas-Chávez, P. A., Ortiz-Hernández, J., & Miguel-Ruiz, J. A. (2025). Dimensionality reduction for SARS-CoV-2 antibodies prediction. International Journal of Combinatorial Optimization Problems and Informatics, 16(4), 131–145. https://doi.org/10.61467/2007.1558.2025.v16i4.1142

Section

Advances in Computer Science