Towards A Clinical Interface For Speaker Identification And Speech-To-Text Transcription For Recording Medical Consultations In Spanish

Jonathan Zavala Díaz; Juan Carlos Olivares Rojas; José Antonio Gutiérrez Gnecchi; Adriana Téllez Anguiano; J. Guadalupe Ramos Díaz; Enrique Reyes Archundia

doi:10.61467/2007.1558.2025.v16i4.900

Authors

Jonathan Zavala Díaz Tecnológico Nacional de México / Instituto Tecnológico de Morelia https://orcid.org/0009-0002-2228-7658
Juan Carlos Olivares Rojas Tecnológico Nacional de México / Instituto Tecnológico de Morelia http://orcid.org/0000-0001-5302-1786
José Antonio Gutiérrez Gnecchi Tecnológico Nacional de México / Instituto Tecnológico de Morelia https://orcid.org/0000-0001-7898-604X
Adriana Téllez Anguiano Tecnológico Nacional de México / Instituto Tecnológico de Morelia https://orcid.org/0000-0002-0945-2076
J. Guadalupe Ramos Díaz Tecnológico Nacional de México / Instituto Tecnológico de Morelia https://orcid.org/0000-0002-7281-7461
Enrique Reyes Archundia Tecnológico Nacional de México / Instituto Tecnológico de Morelia https://orcid.org/0000-0003-3374-0059

DOI:

https://doi.org/10.61467/2007.1558.2025.v16i4.900

Keywords:

NLP, Speech to Text, Speaker recognition, Natural Language Interface

Abstract

This paper presents the development of a clinical interface built on the LattePanda Sigma, an embedded device designed for edge computing. The interface integrates OpenAI language models and Whisper for automated speech-to-text transcription, together with speaker diarisation in clinical settings using the pyannote/speaker-diarization-3.1 model. A dataset of ten doctor–patient conversations in Spanish, translated and re-recorded to suit the local context, was used to evaluate the models. Automatic transcriptions generated by the models were compared with the reference transcripts using the ROUGE metric. Average ROUGE scores of 0.9028 for the Whisper Small model and 0.9260 for the Whisper Medium model indicate high transcription accuracy. The reference transcripts were also used to assess the speaker segments identified by the pyannote model. Finally, the paper analyses the system’s usefulness and effectiveness in improving Spanish-language clinical records.
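The comparison of automatic and reference transcripts described in the abstract can be illustrated with a small, dependency-free sketch. The abstract does not say which ROUGE variant, tokenisation, or library the authors used, so everything here is an assumption: the sketch computes the ROUGE-1 and ROUGE-L F-measures over lowercased, whitespace-separated tokens, and the function names (`tokenize`, `rouge_1`, `rouge_l`, `mean_score`) are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, deliberately simple tokeniser)."""
    return text.lower().split()


def f_measure(matches: int, n_candidate: int, n_reference: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matches == 0:
        return 0.0
    precision = matches / n_candidate
    recall = matches / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_1(reference: str, candidate: str) -> float:
    """ROUGE-1 F-measure: clipped unigram overlap between the two texts."""
    ref, cand = tokenize(reference), tokenize(candidate)
    overlap = sum((Counter(ref) & Counter(cand)).values())
    return f_measure(overlap, len(cand), len(ref))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F-measure: based on the longest common subsequence of tokens."""
    ref, cand = tokenize(reference), tokenize(candidate)
    return f_measure(lcs_length(ref, cand), len(cand), len(ref))


def mean_score(pairs: list[tuple[str, str]], metric=rouge_l) -> float:
    """Average a metric over (reference, candidate) transcript pairs."""
    return sum(metric(ref, cand) for ref, cand in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented example sentences, not taken from the paper's dataset.
    reference = "el paciente refiere dolor de cabeza desde ayer"
    candidate = "el paciente refiere dolor de cabeza"
    print(f"ROUGE-1: {rouge_1(reference, candidate):.4f}")
    print(f"ROUGE-L: {rouge_l(reference, candidate):.4f}")
```

Averaging such per-conversation scores with `mean_score` over all transcript pairs would yield a single figure per model, comparable in kind (though not necessarily in method) to the averages reported in the abstract.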

