Towards A Clinical Interface For Speaker Identification And Speech-To-Text Transcription For Recording Medical Consultations In Spanish

DOI:

https://doi.org/10.61467/2007.1558.2025.v16i4.900

Keywords:

NLP, Speech to Text, Speaker recognition, Natural Language Interface

Abstract

This paper presents the development of a clinical interface built on the LattePanda Sigma, an embedded device designed for edge computing. The interface integrates OpenAI language models and Whisper for automated speech-to-text transcription, together with speaker diarisation in clinical settings using the pyannote/speaker-diarization-3.1 model. A dataset of ten doctor–patient conversations in Spanish, translated and re-recorded to suit the local context, was used to evaluate the models. Automatic transcriptions generated by the models were compared with the reference transcripts using the ROUGE metric. Average ROUGE scores of 0.9028 for the Whisper Small model and 0.9260 for the Whisper Medium model indicate high transcription accuracy. The reference transcripts were also used to assess the speaker segments identified by the pyannote model. Finally, the paper analyses the system’s usefulness and effectiveness in improving Spanish-language clinical records.
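The ROUGE comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which ROUGE variant or tokenisation was used, so the code below assumes ROUGE-1 (unigram overlap F-measure) with lowercased whitespace tokens. The function name `rouge1_f` and the sentence pair are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def rouge1_f(reference: str, hypothesis: str) -> float:
    """Unigram-overlap F-measure between a reference and a hypothesis transcript.

    Assumption: ROUGE-1 with lowercased whitespace tokens; the paper may use
    a different variant or tokeniser.
    """
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not ref_tokens or not hyp_tokens:
        return 0.0
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(hyp_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair: the hypothesis drops one accent ("dias" vs "días").
reference = "el paciente refiere dolor de cabeza desde hace tres días"
hypothesis = "el paciente refiere dolor de cabeza desde hace tres dias"
score = rouge1_f(reference, hypothesis)
```

Here nine of the ten words match, so the score is about 0.9. Averaging such per-conversation scores over a dataset would yield a single figure comparable in form to those reported above, although the exact values depend on the variant and preprocessing actually used.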

Published

2025-10-12

How to Cite

Zavala Díaz, J., Olivares Rojas, J. C., Gutiérrez Gnecchi, J. A., Téllez Anguiano, A., Ramos Díaz, J. G., & Reyes Archundia, E. (2025). Towards A Clinical Interface For Speaker Identification And Speech-To-Text Transcription For Recording Medical Consultations In Spanish. International Journal of Combinatorial Optimization Problems and Informatics, 16(4), 364–374. https://doi.org/10.61467/2007.1558.2025.v16i4.900

Issue

Vol. 16 No. 4 (2025)

Section

Advances in Computer Science
