Matthew Perez
My name is Matthew Perez and I am a Senior Machine Learning Research Scientist at Rad AI.
I earned my Ph.D. in Computer Science and Engineering from the University of Michigan, where I worked with Dr. Emily Mower Provost in the Computational Human Artificial Intelligence (CHAI) Lab.
My research interest is broadly in speech- and language-based machine learning applications.
My work has spanned topics including automatic speech recognition (ASR), speaker diarization, error characterization, grapheme-to-phoneme conversion, and emotion recognition and modeling.
I received my M.S. in Computer Science and Engineering from the University of Michigan and my B.S. in Computer Science from the University of Notre Dame. I have been fortunate to receive the GEM Fellowship ('19) and the NSF Graduate Research Fellowship ('20).
|
Publications
Multimodal Classroom Diarization with GPT Re-scoring: Teacher or Student?
Matthew Perez,
Berk Coker,
Kemal Berk Kocabagli,
Jessica Vitale,
Alyssa Van Camp
[Paper] AIED, 2025
We propose a multimodal pipeline that integrates Large Language Model (LLM) re-scoring to enhance classroom speaker diarization. Our approach combines Adaptive Centroid Enrollment (ACE) for audio embedding restructuring and GPT-based refinement of clustering predictions, achieving relative improvements of 8.2% in word-level accuracy and 16.5% in F1-score on classroom datasets.
|
Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models
Matthew Perez,
Aneesha Sampath,
Minxue Niu,
Emily Mower Provost
[Paper | Code] INTERSPEECH, 2024
We introduce novel approaches for automatic paraphasia detection, leveraging a generative pretrained transformer (GPT) and end-to-end models that jointly handle ASR and classification.
Our results show that a single-sequence model outperforms GPT baselines for identifying multiple types of paraphasias in speech.
|
PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text
Yang Yu,
Matthew Perez,
Ankur Bapna,
Fadi Haik,
Siamak Tazari,
Yu Zhang
[Paper] INTERSPEECH, 2023
We present PronScribe, a novel method for phonemic transcription from speech and text input based on careful finetuning and adaptation of a massive, multilingual, multimodal speech-text pretrained model. We show that our model can phonemically transcribe pronunciations of full utterances with accurate word boundaries in a variety of languages covering diverse phonological phenomena, achieving phoneme error rates of roughly 1-2%, comparable to those of human transcribers.
|
Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition
James Tavernor,
Matthew Perez,
Emily Mower Provost
[Paper] INTERSPEECH, 2023
|
Mind the gap: On the value of silence representations to lexical-based speech emotion recognition
Matthew Perez,
Mimansa Jaiswal,
Minxue Niu,
Cristina Gorrostieta,
Matthew Roddy,
Kye Taylor,
Reza Lotfian,
John Kane,
Emily Mower Provost
[Paper] INTERSPEECH, 2022
|
Enabling Off-the-Shelf Disfluency Detection and Categorization for Pathological Speech
Amrit Romana,
Minxue Niu,
Matthew Perez,
Angela Roberts,
Emily Mower Provost
[Paper] INTERSPEECH, 2022
|
Articulatory Coordination for Speech Motor Tracking in Huntington Disease
Matthew Perez,
Amrit Romana,
Angela Roberts,
Noelle Carlozzi,
Jennifer Ann Miner,
Praveen Dayalu,
Emily Mower Provost
[Paper | Code] INTERSPEECH, 2021
|
Automatically Detecting Errors and Disfluencies in Read Speech to Predict Cognitive Impairment in People with Parkinson’s Disease
Amrit Romana,
John Bandon,
Matthew Perez,
Stephanie Gutierrez,
Richard Richter,
Angela Roberts,
Emily Mower Provost
[Paper | Code] INTERSPEECH, 2021
|
Learning Paralinguistic Attributes from Audiobooks with Voice Conversion
Zakaria Aldeneh,
Matthew Perez,
Emily Mower Provost
[Paper] NAACL, 2021
|
Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts
Matthew Perez,
Zakaria Aldeneh,
Emily Mower Provost
[Paper] INTERSPEECH, 2020
|
Classification of Huntington Disease using Acoustic and Lexical Features
Matthew Perez,
Wenyu Jin,
Duc Le,
Noelle Carlozzi,
Praveen Dayalu,
Angela Roberts,
Emily Mower Provost
[Paper] INTERSPEECH, 2018
|
Portable mTBI Assessment Using Temporal and Frequency Analysis of Speech
Louis Daudet,
Nikhil Yadav,
Matthew Perez,
Christian Poellabauer,
Sandra Schneider,
Alan Huebner
[Paper] IEEE Journal of Biomedical and Health Informatics, 2016