Dimensionality reduction for SARS-CoV-2 antibodies prediction
DOI:
https://doi.org/10.61467/2007.1558.2025.v16i4.1142
Keywords:
Dimensionality reduction, PCA, Air quality, Environment, ARIMA, Predictive model, Public health, Pollution, Guadalajara.
Abstract
The analysis of genomic data makes it possible to understand biological processes at the molecular level. A challenging application is the classification of antibodies according to the antigens they bind. Antibodies, which are central to the immune system, are proteins that bind to specific antigens to inactivate pathogens. Antibody classification requires datasets with structural and functional information about antibodies. The Observed Antibody Space (OAS) is a dataset that collects genomic sequences of antibodies from several species and contains information on antibody structure. This paper examines the impact of dimensionality reduction techniques on the classification of SARS-CoV-2 antibodies, using genetic sequence data from the OAS database. Specifically, we focus on transforming amino acid sequences from the Complementarity Determining Region (CDR) into word embeddings for subsequent processing in machine learning models. This transformation enables the use of unlabeled, high-dimensional data but introduces the curse of dimensionality, which can affect the models' accuracy and efficiency. To address this problem, two dimensionality reduction techniques are applied and evaluated: Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We developed 36 classification models using Support Vector Machines, Random Forests, and K-Nearest Neighbors algorithms, testing each on the original datasets and on the reduced datasets. The objective is to determine whether dimensionality reduction improves model performance. The study provides insights into how these techniques can facilitate predictive analysis in bioinformatics and contribute to the development of efficient models for identifying relevant antibodies in immunology.
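The comparison described in the abstract, the same classifiers trained on original and on reduced features, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn is available, replaces the embedded CDR sequences with synthetic 100-dimensional vectors from `make_classification`, and picks the number of components, the hyperparameters, and the helper name `evaluate` purely for illustration. Only PCA is shown; UMAP would need the third-party `umap-learn` package, which is assumed rather than used here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for embedded CDR sequences (assumption: 100-dimensional
# vectors with a binary label; no real OAS data is used here).
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=10, random_state=0
)

# The three classifier families named in the abstract; hyperparameters are
# illustrative defaults, not values taken from the paper.
classifiers = {
    "svm": lambda: SVC(kernel="rbf"),
    "random_forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": lambda: KNeighborsClassifier(n_neighbors=5),
}


def evaluate(n_components=None):
    """Return mean 5-fold accuracy per classifier, optionally after PCA."""
    scores = {}
    for name, make_clf in classifiers.items():
        steps = [StandardScaler()]
        if n_components is not None:
            # PCA is fitted inside each fold, so no test data leaks into it.
            steps.append(PCA(n_components=n_components, random_state=0))
        steps.append(make_clf())
        pipeline = make_pipeline(*steps)
        scores[name] = cross_val_score(pipeline, X, y, cv=5).mean()
    return scores


original = evaluate()
reduced = evaluate(n_components=10)
for name in classifiers:
    print(f"{name}: original={original[name]:.3f}  pca={reduced[name]:.3f}")
```

Placing the reduction step inside the pipeline keeps it within the cross-validation loop, so the two accuracy columns are comparable; whether reduction helps depends on the data and the classifier.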
License
Copyright (c) 2025 International Journal of Combinatorial Optimization Problems and Informatics

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.