Speech Breathing
My thesis work included a rhythmic-prosodic analysis of speech breathing kinematic patterns. These patterns were recorded from more than 40 Londoners using inductance plethysmography (a technique that tracks how your chest moves when you breathe). Some of the code I wrote to analyse the breathing patterns is available on my GitHub and described in more detail in the paper listed above.
Audible inhalation appears to facilitate smooth turn-taking during conversation; however, it is unclear whether listeners form strong temporal expectations concerning the onset of speech that follows a breath sound. Across three experiments, we explored this idea using modified auditory gap detection tasks. In one version, participants reported whether or not they heard a silent gap that was imposed between a breath sound and speech. In the other, participants identified where, within an utterance containing two breath sounds, they thought the silent gap occurred. We found that, in general, listeners are sensitive to violations of the natural speech breathing time series at the level of a few hundred milliseconds. Additionally, nonverbal rhythm discrimination ability consistently predicts how well listeners can locate where a gap has occurred, but not whether or not the gap was there at all. This could mean that our sense of rhythm helps us to make relative, but not absolute, judgements about speech breathing timing. Importantly, gap detection accuracy and thresholds are superior for trials where the gap occurs after, rather than before, a breath. This suggests that breath sounds may help focus listeners' attention on a particular point in time, with potential ramifications for speech entrainment.
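To make the gap manipulation concrete, here is a minimal sketch of how a silent gap of a few hundred milliseconds could be imposed at a breath-speech boundary. It is an illustration only, not the stimulus code used in the experiments: the function name `insert_silent_gap`, its parameters, the 16 kHz sample rate, and the 0.4 s boundary are all assumptions, and the "audio" is a placeholder list of constant samples rather than a real recording.

```python
def insert_silent_gap(samples, sample_rate, boundary_s, gap_ms):
    """Return a copy of `samples` with `gap_ms` of silence inserted at `boundary_s`.

    `samples` is a plain list of amplitude values; `boundary_s` is the
    (hypothetical) time in seconds where the breath sound ends.
    """
    if gap_ms < 0:
        raise ValueError("gap_ms must be non-negative")
    boundary = round(boundary_s * sample_rate)
    if not 0 <= boundary <= len(samples):
        raise ValueError("boundary_s falls outside the recording")
    # Convert the gap duration from milliseconds to a whole number of samples.
    gap_len = round(gap_ms / 1000 * sample_rate)
    # Splice zeros (digital silence) between the two halves of the signal.
    return samples[:boundary] + [0.0] * gap_len + samples[boundary:]


# Toy example: 1 s of placeholder "audio" at an assumed 16 kHz sample rate,
# with a 300 ms gap inserted at an assumed breath offset of 0.4 s.
sample_rate = 16_000
audio = [0.1] * sample_rate
gapped = insert_silent_gap(audio, sample_rate, boundary_s=0.4, gap_ms=300)
```

A real stimulus would probably also need short amplitude ramps on either side of the gap to avoid audible clicks; the sketch leaves that out for brevity.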
Working with collaborators at Imperial College London, we used the same speech breathing data set in a deep learning application that predicts the respiratory signal from acoustic speech recordings. You can watch the model predict breathing from a speaker's voice in the accompanying video.
Machine learning of respiratory activity in speech from our @interspeech20 paper. 🫁🗣 The dashed line is the breath signal predicted by end2end DL from audio, solid line is ground truth. @georgios_rizos @Imperial_GLAM @antoniahamilton @UCL_ICN pic.twitter.com/QUu3tn1ZlG
— Alexis Deighton MacIntyre (@alexisdeighton) June 17, 2021