Instance Selection for Hybrid and Incomplete Data based on Clustering

Authors

  • Claudia C. Tusell-Rey, Instituto Politécnico Nacional, Centro de Investigación en Computación, Juan de Dios Bátiz s/n, GAM, CDMX 07738
  • Yenny Villuendas-Rey, Instituto Politécnico Nacional, Centro de Innovación y Desarrollo Tecnológico en Cómputo
  • Viridiana Salinas-García, Instituto Politécnico Nacional, Centro de Innovación y Desarrollo Tecnológico en Cómputo, Juan de Dios Bátiz s/n, GAM, CDMX 07700
  • Oscar Camacho-Nieto, Instituto Politécnico Nacional, Centro de Innovación y Desarrollo Tecnológico en Cómputo, Juan de Dios Bátiz s/n, GAM, CDMX 07700
  • Cornelio Yáñez-Márquez, Instituto Politécnico Nacional, Centro de Investigación en Computación, Juan de Dios Bátiz s/n, GAM, CDMX 07738

DOI:

https://doi.org/10.61467/2007.1558.2025.v16i3.845

Keywords:

instance selection, hybrid and incomplete data, clustering

Abstract

This paper presents the HICCS algorithm, a novel clustering approach that handles mixed and incomplete data. HICCS improves clustering by using compact sets as initial clusters, employing holotypes to measure intergroup dissimilarity, and merging clusters based on similarity in an order-independent manner. It also accepts a user-defined similarity function, which makes it adaptable to various real-world domains. In addition, we introduce the IS-HICCS algorithm for instance selection, which reduces the instance set without compromising classifier accuracy and thereby highlights the potential of clustering to enhance supervised classification models. We evaluate HICCS and IS-HICCS on synthetic and real-life datasets and show that they perform statistically better than other clustering and instance selection methods, respectively.
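
To illustrate what a user-defined function for mixed and incomplete data might look like, the sketch below implements a dissimilarity in the style of the well-known Heterogeneous Euclidean-Overlap Metric (HEOM). It is not the function used by HICCS, which the abstract does not specify; the name `heom_dissimilarity`, the `numeric_ranges` parameter, and the use of `None` for missing values are assumptions made only for this example.

```python
import math


def heom_dissimilarity(a, b, numeric_ranges):
    """HEOM-style dissimilarity between two records with mixed attributes.

    a, b: sequences of equal length; None marks a missing value.
    numeric_ranges: dict mapping the index of each numeric attribute to its
    (max - min) range over the dataset. Every other index is treated as
    categorical. (Illustrative assumption, not the HICCS definition.)
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            # Missing value: assume the maximal per-attribute difference.
            d = 1.0
        elif i in numeric_ranges:
            # Numeric attribute: range-normalized absolute difference.
            rng = numeric_ranges[i]
            d = abs(x - y) / rng if rng > 0 else 0.0
        else:
            # Categorical attribute: overlap (0 if equal, 1 otherwise).
            d = 0.0 if x == y else 1.0
        total += d * d
    return math.sqrt(total)


# Two records: (numeric, categorical, numeric with a missing value).
r1 = (1.0, "red", None)
r2 = (3.0, "red", 5.0)
print(heom_dissimilarity(r1, r2, {0: 4.0, 2: 10.0}))
```

A clustering or instance selection method that takes the comparison function as a parameter could be given a function of this shape, or a domain-specific one, without changes to the rest of the procedure.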

Published

2025-07-14

How to Cite

Tusell-Rey, C. C., Villuendas-Rey, Y., Salinas-García, V., Camacho-Nieto, O., & Yáñez-Márquez, C. (2025). Instance Selection for Hybrid and Incomplete Data based on Clustering. International Journal of Combinatorial Optimization Problems and Informatics, 16(3), 405–419. https://doi.org/10.61467/2007.1558.2025.v16i3.845

Section

Recent Advances on Soft Computing
