Speech Perception
-
3 components of Speech Perception
-
1. Physical signal
-
Process sound waves
-
Varies in 3 parameters
-
Amplitude
-
Frequency
-
Time
-
-
-
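As a rough illustration of the three parameters above, the sketch below builds a pure tone from an amplitude, a frequency and a series of sample times. The function name `pure_tone`, the 8000 Hz sample rate and the 440 Hz example values are assumptions chosen for illustration; real speech is a far more complex mixture of frequencies that changes over time.

```python
import math

def pure_tone(amplitude, frequency_hz, duration_s, sample_rate=8000):
    """Return samples of a sine wave: amplitude * sin(2 * pi * f * t)."""
    n_samples = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * (i / sample_rate))
        for i in range(n_samples)
    ]

# Hypothetical example: a 440 Hz tone lasting 10 ms at half amplitude.
tone = pure_tone(amplitude=0.5, frequency_hz=440, duration_s=0.01)
```

Changing `amplitude` alters loudness, changing `frequency_hz` alters pitch, and `duration_s` sets how long the signal lasts.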
2. Extract phonemes
-
The smallest unit in a language that is capable of conveying a distinction in meaning
-
Make fine distinctions between similar patterns of sound
-
Examples
-
Buh or Tuh
-
M of mat and B of bat
-
-
Phone = a particular sound used by any language
-
e.g. the sound [r]
-
Phoneme = a sound used in contrast to another in a particular language
-
e.g. the category /r/ as distinct from /l/
-
1. Phoneme extraction is categorical
-
i.e. if the physical characteristics of the signal are changed gradually, the perceived phoneme switches abruptly at some point rather than shifting gradually
-
-
2. The speech recognition system can clean up fuzzy or degraded input so that the listener perceives the intended sound
-
-
-
3. Perception
-
Extract meaning
-
Allows us to recognize the same sounds spoken in different ways
-
e.g. by two different people
-
-
There are no natural breaks in speech
-
We “hallucinate” word boundaries
-
Oronyms
-
Two or more phrases that sound the same but are made up of different words
-
Such as
-
Scuse me while I kiss the sky
-
Scuse me while I kiss this guy
-
-
Or
-
Some others I know
-
Some mothers I know
-
-
-
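The oronym examples above can be sketched as a segmentation problem: given an unbroken stream of sounds, find every way of cutting it into known words. This is a minimal sketch, not a model of human perception. The pseudo-phonetic spellings in `LEXICON` are hypothetical, and the rule that a word may share its first sound with the last sound of the previous word is an assumption standing in for the way doubled consonants tend to merge in fluent speech.

```python
# Hypothetical pseudo-phonetic spellings mapped to written words.
LEXICON = {
    "sum": "some",
    "uthers": "others",
    "muthers": "mothers",
    "ai": "I",
    "no": "know",
}

def segmentations(stream, lexicon, prev_last=""):
    """Return every way to split `stream` into words from `lexicon`."""
    if not stream:
        return [[]]
    results = []
    for sounds, word in lexicon.items():
        # Case 1: the word's sounds appear in full at the start of the stream.
        if stream.startswith(sounds):
            for rest in segmentations(stream[len(sounds):], lexicon, sounds[-1]):
                results.append([word] + rest)
        # Case 2 (assumption): the word's first sound was merged with the
        # last sound of the previous word, so only the remainder is present.
        if (prev_last and sounds[0] == prev_last and len(sounds) > 1
                and stream.startswith(sounds[1:])):
            for rest in segmentations(stream[len(sounds) - 1:], lexicon, sounds[-1]):
                results.append([word] + rest)
    return results

parses = segmentations("sumuthersaino", LEXICON)
```

With this toy lexicon the same stream yields both "some others I know" and "some mothers I know", because nothing in the signal marks where one word ends and the next begins.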
Speech perception errors
-
Eugene O’Neill won a Pullet Surprise (a mishearing of “Pulitzer Prize”)
-
-
-
-
Research methods
-
Voice-Onset Time
-
The amount of time between the release of a stop consonant and the onset of glottal vibrations in the following vowel
-
In English, if voicing (the laryngeal tone) starts at the very beginning of the “P” sound, the sound is perceived as “B”
-
If the onset of voicing (the voice onset time) is delayed in progressively longer small increments, there is a point at which the sound is perceived as “P”
-
VOT may be negative, zero or positive
-
Zero
-
Vocal-cord vibration begins simultaneously with the release of the plosive consonant
-
A voiceless unaspirated stop (e.g. [k]) has zero VOT
-
-
Negative
-
Vibration begins before the release
-
A pre-voiced stop (e.g. [ɡ]) has negative VOT
-
-
Positive
-
Vibration begins after the release
-
A voiceless aspirated stop (e.g. [kʰ]) has positive VOT
-
-
-
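The three VOT cases above can be sketched as a small calculation: VOT is the time of voicing onset minus the time of the release, and its sign picks out the type of stop. The function names and the example timings are hypothetical, and treating "zero" as exactly 0 ms is a simplification; real measurements would need some tolerance.

```python
def voice_onset_time(release_ms, voicing_onset_ms):
    """VOT = time voicing starts minus time the stop is released (ms)."""
    return voicing_onset_ms - release_ms

def classify_vot(vot_ms):
    """Label a stop by the sign of its VOT, following the outline above."""
    if vot_ms < 0:
        return "pre-voiced"             # voicing starts before the release
    if vot_ms == 0:
        return "voiceless unaspirated"  # voicing starts at the release
    return "voiceless aspirated"        # voicing starts after the release

# Hypothetical timings: the stop is released 100 ms into the recording.
examples = {
    onset: classify_vot(voice_onset_time(release_ms=100, voicing_onset_ms=onset))
    for onset in (40, 100, 170)
}
```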
Different languages realize this feature phonetically in different ways
-
Categorical Perception
-
Definition
-
Sharp phoneme boundary
-
Discrimination peak at phoneme boundary
-
Discrimination predicted from identification
-
Two sounds are heard as “different” only if they are identified as different phonemes
-
-
-
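The definition above (sharp boundary, discrimination peak at the boundary, discrimination predicted from identification) can be sketched numerically. This is a minimal sketch under stated assumptions: identification is modelled as a logistic curve, the 25 ms boundary and the slope are illustrative values rather than measured ones, and discrimination is predicted for an ABX task by assuming the listener only compares covert phoneme labels, which gives 0.5 + 0.5 * (p1 - p2)^2.

```python
import math

def p_voiceless(vot_ms, boundary_ms=25.0, slope=0.5):
    """Assumed identification function: probability of labelling a stimulus
    as voiceless (e.g. "P") rather than voiced (e.g. "B")."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def predicted_abx(p1, p2):
    """Predicted proportion correct in ABX when only labels are compared."""
    return 0.5 + 0.5 * (p1 - p2) ** 2

# Hypothetical VOT continuum in 10 ms steps; compare neighbouring stimuli.
continuum = [0, 10, 20, 30, 40, 50]
predicted = {
    (a, b): predicted_abx(p_voiceless(a), p_voiceless(b))
    for a, b in zip(continuum, continuum[1:])
}
best_pair = max(predicted, key=predicted.get)
```

Under these assumptions the pair that straddles the boundary, (20, 30), is discriminated well above chance, while pairs within a category stay close to 0.5 even though the physical step is the same size.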
Occurs with consonants, not vowels
-
Not restricted to speech
-
Also found in comparison of musical intervals
-
-
Not restricted to humans
-
Chinchillas and quails show the same Voice Onset Time boundary as humans
-
Macaques show discrimination peaks at human VOT and place-of-articulation boundaries
-
-
Innate & Acquired
-
Infants are born with the ability to make many speech discriminations that they subsequently lose
-
Adults have lost the ability to make distinctions that their language does not use
-
-
Each language has its own distinctive set of phonemic categories
-
English distinguishes /r/ from /l/ but Japanese doesn’t
-
Tamil distinguishes a dental /t1/, an alveolar /t2/ and a retroflex /t3/; English doesn’t.
-
-
Phonemes in a particular language are defined by minimal pairs
-
i.e. since “lice” and “rice” have different meanings in English, they contain different phonemes: /l/ and /r/
-
But there is no such minimal pair in Japanese, so Japanese has a single phoneme /r/
-
-
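The minimal-pair test above can be sketched as a search over a lexicon: two words form a minimal pair if their transcriptions differ in exactly one sound, and the differing sounds are then contrasting phonemes. The toy lexicon and its simplified transcriptions are hypothetical, and the sketch only handles pairs of equal length.

```python
def is_minimal_pair(a, b):
    """True if two transcriptions differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def contrasting_sounds(lexicon):
    """Collect the pairs of sounds that distinguish minimal pairs."""
    contrasts = set()
    entries = list(lexicon.items())
    for i, (_, sounds_a) in enumerate(entries):
        for _, sounds_b in entries[i + 1:]:
            if is_minimal_pair(sounds_a, sounds_b):
                # Exactly one position differs; record that pair of sounds.
                diff = next((x, y) for x, y in zip(sounds_a, sounds_b) if x != y)
                contrasts.add(frozenset(diff))
    return contrasts

# Hypothetical English mini-lexicon with simplified transcriptions.
english = {
    "lice": ("l", "ai", "s"),
    "rice": ("r", "ai", "s"),
    "mat": ("m", "a", "t"),
    "bat": ("b", "a", "t"),
}
contrasts = contrasting_sounds(english)
```

A lexicon with no word pair separated only by [l] versus [r] would give no /l/-/r/ contrast, which is the situation described for Japanese.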
Can Japanese speakers really not hear any difference?
-
For English speakers, the /d/-/g/ boundary is in a different position after /l/ than after /r/.
-
This is also true for Japanese speakers who can hear /r/ vs /l/
-
But it is ALSO true for those who can’t.
-
Is this because of language knowledge (implicit phonetics)? No, it’s phoneme distinction.
-
QUAILS DO IT TOO !!!
-
Auditory Agnosia
-
Definition
-
Defective recognition of auditory stimuli despite preserved hearing (as tested with audiometry)
-
Primary signs
-
Difficulty in understanding the meaning of spoken words
-
Can refer to a generalized disorder affecting perception of all types of auditory stimuli including non-verbal sounds, speech and music
-
-
Associated with bilateral lesions of the superior temporal cortex, or with unilateral lesions on the left
-
Some cases have also been described following unilateral right temporal lobe damage
-
-
By far the most common cause is cerebrovascular accident (stroke)
-
But some cases have been reported following encephalitis
-
-
Types of sound recognition disorders
-
1. Apperceptive
-
Impaired acoustical analysis of the perceptual structure of an auditory stimulus
-
(frequency, pitch, timbre)
-
-
2. Associative
-
An inability to associate a successfully perceived auditory stimulus with a conceptual (semantic) meaning
-
-
-
Spoken Word recognition
-
Morton’s 3-stage model
-
Auditory analysis system = identifies phonemes in the speech wave
-
Auditory input lexicon = identifies the phonological properties of known words
-
Semantic system = identifies the meanings of known words
-
-
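The three stages above can be sketched as a pipeline in which each stage feeds the next. This is only an illustration of the flow of information, not an implementation of Morton's model: the function names, the toy lexicon, the meanings, and the shortcut of treating the "speech wave" as a string of space-separated phoneme symbols are all assumptions.

```python
# Hypothetical toy stores for stages 2 and 3.
INPUT_LEXICON = {("m", "a", "t"): "mat", ("b", "a", "t"): "bat"}
SEMANTICS = {"mat": "a floor covering", "bat": "a flying mammal or a club"}

def auditory_analysis(speech_wave):
    """Stage 1: identify phonemes (here the input is already symbolic)."""
    return tuple(speech_wave.split())

def auditory_input_lexicon(phonemes):
    """Stage 2: match the phonological form to a known word, if any."""
    return INPUT_LEXICON.get(phonemes)

def semantic_system(word):
    """Stage 3: look up the meaning of a known word."""
    return SEMANTICS.get(word)

def recognize(speech_wave):
    """Run the three stages in order and return each stage's output."""
    phonemes = auditory_analysis(speech_wave)
    word = auditory_input_lexicon(phonemes)
    meaning = semantic_system(word) if word is not None else None
    return phonemes, word, meaning
```

For example, `recognize("b a t")` passes through all three stages, while `recognize("d a t")` yields phonemes but no word and no meaning, since "dat" is not in the toy lexicon.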