Matthew Perez

My name is Matthew Perez and I am a Computer Science Ph.D. candidate at the University of Michigan. I work with Dr. Emily Mower Provost at the Computational Human Artificial Intelligence (CHAI) Lab. My interest is broadly in using speech analysis and machine learning to create intelligent systems that understand principles relating to human health and behavior. Currently, my focus is on applying deep learning techniques to improve speech recognition for low-resource applications (e.g., disordered speech).

I received my M.S. in Computer Science and Engineering at the University of Michigan and my B.S. in Computer Science at the University of Notre Dame. I have been fortunate to receive the GEM Fellowship ('19) and the NSF Graduate Research Fellowship ('20).

Email  /  CV  /  Google Scholar  /  Github

Mind the gap: On the value of silence representations to lexical-based speech emotion recognition
Matthew Perez, Mimansa Jaiswal, Minxue Niu, Cristina Gorrostieta, Matthew Roddy, Kye Taylor, Reza Lotfian, John Kane, Emily Mower Provost
Paper

This work uses non-speech frames (i.e., silence) in a BERT framework to improve speech emotion recognition. We find that silence has a significant impact on predicting valence, and our token analysis suggests that the presence of and proximity to silence are important factors in the latent text features extracted from BERT.

Enabling Off-the-Shelf Disfluency Detection and Categorization for Pathological Speech
Amrit Romana, Minxue Niu, Matthew Perez, Angela Roberts, Emily Mower Provost

This work investigates the use of BERT for disfluency detection and categorization. We propose fine-tuning BERT with an additional triplet loss function to focus specifically on repetitions and revisions, two categories that underperform with a baseline BERT model. We show that the added triplet loss improves BERT performance on both revisions and repetitions while preserving performance on the other categories.

Articulatory Coordination for Speech Motor Tracking in Huntington Disease
Matthew Perez, Amrit Romana, Angela Roberts, Noelle Carlozzi, Jennifer Ann Miner, Praveen Dayalu, Emily Mower Provost
Paper | Github

Acoustic biomarkers that capture articulatory coordination are particularly promising for characterizing motor symptom progression in people affected by Huntington Disease. In this paper, we use Vocal Tract Coordination (VTC) features extracted from read speech to estimate a motor severity score and show that these features outperform other common baselines.

Automatically Detecting Errors and Disfluencies in Read Speech to Predict Cognitive Impairment in People with Parkinson’s Disease
Amrit Romana, John Bandon, Matthew Perez, Stephanie Gutierrez, Richard Richter, Angela Roberts, Emily Mower Provost

This work investigates the use of speech errors and disfluencies in people with Parkinson's Disease as a means of analyzing cognitive impairment. In this study, we focus on read speech, which offers a controlled template from which we can detect errors and disfluencies, and we analyze how errors and disfluencies vary with cognitive impairment.

Learning Paralinguistic Attributes from Audiobooks with Voice Conversion
Zakaria Aldeneh, Matthew Perez, Emily Mower Provost
NAACL, 2021

Paralinguistic tasks, specifically speech emotion recognition, have limited access to large datasets with accurate labels, which makes it difficult to train models that capture paralinguistic attributes via supervised learning. In this work, we propose the Expressive Voice Conversion Autoencoder (EVoCA), a framework for capturing paralinguistic attributes (e.g., emotion) from large-scale (i.e., 200 hours of) audio-textual data without requiring manual emotion annotations. The proposed network uses the conversion between synthesized (neutral) speech and real (expressive) speech to learn what makes speech expressive in an unsupervised manner.

Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts
Matthew Perez, Zakaria Aldeneh, Emily Mower Provost

Automatic speech recognition (ASR) is a key component of automatic aphasic speech analysis. However, the current approach of using a standard, one-size-fits-all ASR model might be sub-optimal due to the wide range of speech intelligibility that exists both within and between speakers. This work investigates how speech intelligibility can be estimated using a neural network and how intelligibility variability can be addressed within an acoustic model architecture using a mixture of experts. Our results show that this style of modeling leads to a significant improvement in phone recognition compared to a traditional, one-size-fits-all model.

Classification of Huntington Disease using Acoustic and Lexical Features
Matthew Perez, Wenyu Jin, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts, Emily Mower Provost

This work presents a pipeline for an automatic, end-to-end classification system that uses speech as the primary input for predicting Huntington Disease. We explore using transcript-based features to capture speech characteristics of interest, and we use methods such as k-Nearest Neighbors (with Euclidean and dynamic time warping distances) as well as more modern neural network approaches for classification.

Portable mTBI Assessment Using Temporal and Frequency Analysis of Speech
Louis Daudet, Nikhil Yadav, Matthew Perez, Christian Poellabauer, Sandra Schneider, Alan Huebner
IEEE Journal of Biomedical and Health Informatics, 2016

This work investigates the use of mobile devices for extracting and analyzing various acoustic features to detect mild traumatic brain injury (mTBI). Our results suggest a strong correlation between certain temporal and frequency features and the likelihood of a concussion.
