A Course in Phonetics by Peter Ladefoged

Articulatory Phonetics

The Vocal Organs:

The basic source of power is the respiratory system pushing air out of the lungs. Air from the lungs goes up the windpipe (the trachea) and into the larynx, at which point it must pass between two small muscular folds called the vocal folds. If the vocal folds are apart, the air from the lungs will have a relatively free passage into the pharynx and the mouth. But if the vocal folds are adjusted so that there is only a narrow passage between them, the airstream will cause them to vibrate. Sounds produced when the vocal folds are vibrating are said to be voiced, as opposed to those in which the vocal folds are apart, which are said to be voiceless. For example, [v] is voiced and [f] is voiceless. One can identify voiced and voiceless sounds by putting the fingertips against the larynx, by listening for the buzz of the vibration, or by comparing a pair of words that differ only in voicing. The difference between voiced and voiceless sounds is often important in distinguishing sounds. The air passages above the larynx are known as the vocal tract, which is like a tube and plays an important role in the production of speech sounds. The parts of the vocal tract that can be used to form sounds are called articulators. Given below are the human speech organs:

1. Upper lip and upper teeth (frontal incisors)
2. Alveolar ridge
3. Hard palate
4. Soft palate or Velum (can cause velic closure)
5. Uvula
6. Pharynx
7. Larynx
8. Lower lip
9. Lower teeth
10. Tongue – tip, blade or front, centre, back, root
11. Nasal Cavity
12. Epiglottis

Places of Articulation:

Speech gestures using the lips are called labial articulations; those using the tip or the blade of the tongue are called coronal articulations; those using the back of the tongue are called dorsal articulations.

 bilabial – using closing movement of both lips, e.g. /p/ and /m/
 labio-dental – using lower lip and the upper teeth, e.g. /f/ and /v/
 dental – the tongue tip is used either between the teeth or close to the upper teeth, e.g. /θ/ and /ð/
 interdental – sounds in which the tongue protrudes between the teeth
 retroflex – tongue tip and the back of the alveolar ridge, e.g. /r/ in hour, air, etc.
 alveolar – the blade of the tongue is used close to the alveolar ridge, e.g. /t/ and /d/
 palato-alveolar or post-alveolar – the blade (or tip) of the tongue is used just behind the alveolar ridge, e.g. /ʧ/ and /ʤ/ - church and judge, /ʃ/ and /ʒ/ - ship and vision
 palatal – the front of the tongue is raised close to the palate, e.g. /j/ - yes, you
 velar – the back of the tongue is used against the soft palate, e.g. /k/ and /ŋ/
 glottal - the gap between the vocal cords is used to make audible friction, e.g. /h/

The bilabial and labio-dental sounds can be classified as labial articulations. Dental, alveolar, retroflex, and palato-alveolar (post-alveolar) sounds are coronal articulations, and velar sounds are dorsal articulations. Palatal sounds can sometimes be coronal and sometimes dorsal articulations. Glottal sounds are made at the larynx rather than with the tongue, so they fall outside these three groups.

The Oro-Nasal Process:

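In most speech, the soft palate is raised so that there is a velic closure, and the sound produced is oral. If the soft palate is lowered so that air can pass out through the nose, the sound is nasal. For example, the final consonants of ‘rang’ and ‘ran’ are nasal.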

Manners of Articulation:

(a) Stop or Plosive: There are two types of stops: oral and nasal. Oral – e.g. /p/ and /b/ - pie (bilabial closure), /t/ and /d/ - tie (alveolar closure), /k/ and /g/ - guy (velar closure). Nasal – e.g. /m/ - my (bilabial closure), /n/ - night (alveolar closure), /ŋ/ - sang (velar closure).
(b) Fricative: /f/, /v/ - (labio-dental), /θ/, /ð/ - (dental), /s/, /z/ - (alveolar), /ʃ/, /ʒ/ - (palato-alveolar). In a fricative the articulators form a narrow gap rather than a complete closure. The higher-pitched fricatives with a more obvious hiss, such as /s/ and /ʃ/, are sometimes called sibilants.
(c) Approximant: An articulation in which one articulator is close to another, but without the vocal tract being narrowed to such an extent that a turbulent airstream is produced. Central - /w/, /j/, /r/ and Lateral - /l/
(d) Additional Consonantal Articulations:

- Trill (often called roll) – Scottish English, e.g. raw
- Tap (often called flap) – many forms of American English, e.g. pretty
- Affricate – combination of stop followed by fricative sounds, e.g. cheap

To Summarize:

The consonants can be described in terms of five factors:

1. State of the Vocal Folds – Voiceless or Voiced
2. Place of Articulation
3. Central or Lateral Articulation
4. Oral or Nasal
5. Manner of Articulation
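
The five factors listed above can be treated as the fields of a small record. The following is only an illustrative sketch: the class name `Consonant` and its field names are made up for this example, and the sample entry describes /s/ as a voiceless alveolar central oral fricative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    """Hypothetical record holding the five descriptive factors."""
    symbol: str
    voicing: str      # "voiced" or "voiceless"
    place: str        # e.g. "bilabial", "alveolar", "velar"
    centrality: str   # "central" or "lateral"
    nasality: str     # "oral" or "nasal"
    manner: str       # e.g. "stop", "fricative", "approximant"

    def describe(self) -> str:
        # Join the five factors in the order used in the summary list.
        return " ".join(
            [self.voicing, self.place, self.centrality, self.nasality, self.manner]
        )


s = Consonant("s", "voiceless", "alveolar", "central", "oral", "fricative")
l = Consonant("l", "voiced", "alveolar", "lateral", "oral", "approximant")
print(s.symbol, "=", s.describe())
print(l.symbol, "=", l.describe())
```

In practice the central/lateral and oral/nasal factors are often left unstated when they take their usual values (central, oral).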

The Articulation of Vowel Sounds:

In the production of vowel sounds, the articulators do not come very close together, and the passage of the airstream is relatively unobstructed. Vowel sounds may be specified in terms of the positions of the highest point of the tongue and the position of the lips.

In summary, vowels can be described in terms of three factors:

1. The height of the body of the tongue – high, mid, low
2. The front-back position of the tongue – front, central, back
3. The degree of lip rounding – rounded, unrounded

Suprasegmentals:

Vowels and consonants can be thought of as the segments of which speech is composed. Together they form the syllables, which go to make up utterances. Superimposed on the syllables are other features known as suprasegmentals. These include variations in stress and pitch. Variations in length are also usually considered to be suprasegmental features, although they can affect single segments as well as whole syllables. The pitch pattern in a sentence is called intonation.

Phonology and Phonetic Transcription

A phonetician is a person who can describe speech, who understands the mechanisms of speech production and speech perception, and who knows how languages use these mechanisms. Phonetic transcription is a useful tool that phoneticians use in the description of speech. When phoneticians transcribe an utterance, they usually do so by noting how the sounds convey differences in meaning. For the most part, they concern themselves with describing only the significant articulations rather than the total set of movements of the vocal organs. In order to understand what we transcribe and what we don’t, it is necessary to understand the basic principles of phonology. Phonology is the description of the systems and patterns of sounds that occur in a language. It involves studying a language to determine its distinctive sounds and to find out which sounds convey a difference in meaning. When two sounds can be used to differentiate words, they are said to belong to different phonemes. A phoneme is not a single sound but a name for a group of sounds. There is a group of /t/ sounds. These groups of sounds – the phonemes – are abstract units that form the basis for writing down a language systematically and unambiguously. We often want to record all and only the variations between sounds that cause a difference in meaning. Transcriptions of this kind are called phonemic transcriptions.

Phonology:

The variants of the phonemes that occur in detailed phonetic transcriptions are known as allophones. For example, the phoneme /t/ has several different allophones. The term broad transcription is often used to designate a transcription that uses a simple set of symbols. Conversely, a narrow transcription is one that shows more phonetic detail, either just by using more specific symbols or by also representing some allophonic differences. The use of diacritics, small marks that can be added to a symbol to modify its value, is a means of increasing precision. A transcription that shows the allophones in enough detail to capture all the rule-governed alternations among the sounds is called a completely systematic phonetic transcription. In practice, it is difficult to make a transcription so narrow that it shows every detail of the sounds involved. When writing down an unknown language or when transcribing a child or a patient not seen previously, one does not know what rules will apply. In these circumstances, the symbols indicate only the phonetic value of the sounds. This kind of transcription is called an impressionistic transcription.

The Transcription of Vowels:

The transcription of the contrasting vowels (the vowel phonemes) in English is more difficult than the transcription of consonants for two reasons. First, accents of English differ more in their use of vowels than in their use of consonants. Second, authorities differ widely in their views of what constitutes an appropriate description of vowels. Most speakers of British English distinguish certain words by using different diphthongs, which are movements from one vowel to another within a single syllable.

[ː] indicates a long vowel
[ˈ] indicates stress
[˞] indicates r-coloring of the vowel (a tilde [˜] marks nasalization)

The Transcription of Consonants:

We can begin searching for phonemes by considering the contrasting consonant sounds in English. A good way is to find sets of words that rhyme. A set of words, each of which differs from all the others by only one sound, is called a minimal set. For example, high and sigh form a minimal set. The sounds [tʃ] and [dʒ] (as in ‘church’ and ‘judge’) are really single units and are better transcribed with a single symbol, such as [č] and [ǰ]. The IPA chart gives the various symbols, for example /θ/ (theta), /ð/ (eth), /ŋ/ (eng or angma), /ʃ/ (esh), etc.
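
The definition of a minimal set can be checked mechanically once each word is written as a sequence of sounds. The sketch below is only illustrative: the function names are made up for this example, and the transcriptions (with the diphthong treated as one unit, [aɪ]) are simplifying assumptions.

```python
from itertools import combinations


def differs_by_one_sound(a, b):
    """True if two transcriptions have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def is_minimal_set(words):
    """True if every word differs from every other word by only one sound."""
    return all(differs_by_one_sound(a, b) for a, b in combinations(words, 2))


# Each word is a list of sound symbols, not letters.
high = ["h", "aɪ"]
sigh = ["s", "aɪ"]
tie = ["t", "aɪ"]
how = ["h", "aʊ"]

print(is_minimal_set([high, sigh, tie]))  # all three differ only in the first sound
print(is_minimal_set([high, sigh, how]))  # sigh and how differ in two sounds
```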

Task 01 - Present English Consonants in a Quadrilateral Chart incorporating Places and Manners of Articulation.
Task 02 - List the IPA symbols along with IPA symbol names, phonetic descriptions, and Unicode names.

Acoustic Phonetics

The way in which we hear a sound depends on its acoustic structure. Linguists and speech pathologists need to explain why certain sounds are confused with one another. They can also give better descriptions of some sounds (such as vowels) by describing their acoustic structure rather than by describing the articulatory movements involved. A knowledge of acoustic phonetics is also helpful for understanding how computers synthesize speech and how speech recognition works. If we want to analyze speech we have to work from a recording. We can get more information than is available from merely listening to a recording by making acoustic analyses of the sounds. We can hear that sounds with the same length can differ from one another in three ways. They can be the same or different in (1) pitch, (2) loudness, and (3) quality.

Sound Waves:

Sound consists of small variations in air pressure that occur very rapidly one after another. These variations are caused by actions of the speaker’s vocal organs that are (for the most part) superimposed on the outgoing flow of lung air. In the case of voiced sounds, the vibrating vocal folds chop up the stream of lung air so that pulses of relatively high pressure alternate with moments of lower pressure. In fricative sounds, the airstream is forced through a narrow gap so that it becomes turbulent, with irregularly occurring peaks of pressure. The same principles apply in the production of other types of sounds. Variations in air pressure in the form of sound waves move through the air somewhat like ripples on a pond. When they reach the ear of a listener, they cause the eardrum to vibrate. A graph of the sound wave is very similar to a graph of the movements of the eardrum. The waveforms of speech sounds can be readily observed on a computer.

Pitch and Frequency:

The pitch of a sound depends on the rate of vibration of the vocal folds. In a sound with a high pitch, there is a higher frequency of vibration than in a sound with a low pitch. Because each opening and closing of the vocal folds causes a peak of air pressure in the sound wave, we can estimate the pitch of a sound by observing the rate of occurrence of the peaks in the waveform. To be more exact, we can measure the frequency of the sound in this way. Frequency is the technical term for an acoustic property of the sound, namely the number of complete repetitions (cycles) of variations in air pressure occurring in a second. The unit of frequency measurement is the hertz, usually abbreviated as Hz. If the vocal folds make 220 complete opening and closing movements in a second, we say that the frequency of the sound is 220 Hz.
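
Counting peaks to obtain a frequency can be written out as a short calculation. This is a minimal sketch: the function name and the list of peak times are hypothetical, and the peak times are generated artificially rather than measured from a real recording.

```python
def frequency_from_peaks(peak_times_s):
    """Estimate frequency in Hz from the times (in seconds) of successive pressure peaks.

    The number of complete cycles is one less than the number of peaks.
    """
    if len(peak_times_s) < 2:
        raise ValueError("need at least two peaks to measure a cycle")
    cycles = len(peak_times_s) - 1
    duration_s = peak_times_s[-1] - peak_times_s[0]
    return cycles / duration_s


# Artificial example: 23 peaks spaced 1/220 of a second apart (22 complete cycles).
peaks = [i / 220 for i in range(23)]
f = frequency_from_peaks(peaks)
print(round(f, 1), "Hz")
print(round(1000 / f, 2), "ms per cycle")
```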

The pitch of a sound is the auditory property that enables a listener to place it on a scale going from low to high, without considering its acoustic properties. In practice, when a speech sound goes up in frequency, it also goes up in pitch. It is possible to determine the frequency of a sound by counting the peaks of air pressure in a record of its waveform. Computer systems will provide graphical displays corresponding to the pitch. Pcquirer/Macquirer programs determine the fundamental frequency at each moment in an utterance. Voiceless sounds have no vocal fold pulses and therefore no pitch. For a male voice, the frequency of the vocal fold vibrations in speech may be between 80 and 200 Hz. A woman’s voice may go up to about 400 Hz. The prominent frequencies in voiceless sounds are usually above 2,000 Hz.

Loudness and Intensity:

In general, the loudness of a sound depends on the size of the variations in air pressure that occur. The intensity is proportional to the average size, or amplitude, of the variations in air pressure. It is usually measured in decibels (abbreviated as dB) relative to the amplitude of some other sound. Technically, to get the dB difference one has to compare the power ratio, where the power is defined as the square of the mean amplitude (the mean variation in air pressure). The human ear can hear (perhaps tolerate would be a better word) a range of about 120 dB, although if you persist in listening to sounds 110 to 120 dB above the quietest sounds you can hear you will soon go deaf. When one sound has an intensity 5 dB greater than another, it is approximately twice as loud.
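
The decibel comparison described above can be worked through numerically. This sketch uses the standard definition of the decibel (ten times the base-10 logarithm of a power ratio); because power is taken as the square of amplitude, the same difference is twenty times the logarithm of the amplitude ratio. The function names are made up for this example.

```python
import math


def db_from_power_ratio(power, reference_power):
    """Intensity difference in dB from a ratio of powers."""
    return 10.0 * math.log10(power / reference_power)


def db_from_amplitude_ratio(amplitude, reference_amplitude):
    """Intensity difference in dB from a ratio of amplitudes.

    Power is proportional to amplitude squared, so the factor 10 becomes 20.
    """
    return 20.0 * math.log10(amplitude / reference_amplitude)


# A sound with ten times the amplitude has one hundred times the power.
print(db_from_amplitude_ratio(10.0, 1.0))
print(db_from_power_ratio(100.0, 1.0))
```

Both calls describe the same pair of sounds, so they give the same dB difference.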

Acoustic Measurements:

Within the range of pitches used by both male and female voices, a change in frequency is directly related to a change in pitch. The relation between pitch and frequency has been derived experimentally and used to form the Bark scale. Equal distances along the Bark scale correspond to equal changes in pitch. The mathematical relation between Hz and Bark is fairly complex. When dealing simply with the pitch of the voice, a straightforward linear plot of frequency is sufficient. The relation between acoustic intensity and loudness is also nonlinear, but fortunately only slightly so. For all practical purposes we can consider differences in loudness to be simply related to differences in intensity, reported in dB. Each increase of 5 dB corresponds to a doubling of the perceived loudness. Acoustic records are useful for studying various kinds of phonetic problems. Records of the waveform and the intensity provide a good way of studying variations in length.
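
To give a feel for the Hz-to-Bark relation, the sketch below uses one published approximation, Traunmüller's formula. This is an assumption made for illustration: the notes do not say which formula is intended, and other approximations exist. The function name is hypothetical.

```python
def hz_to_bark(frequency_hz):
    """Approximate Bark value for a frequency in Hz (Traunmüller's formula, assumed here)."""
    return 26.81 * frequency_hz / (1960.0 + frequency_hz) - 0.53


# Equal steps in Hz give smaller and smaller steps in Bark as frequency rises.
for f in (500, 1000, 1500, 2000, 2500):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")
```

The shrinking Bark steps show why a linear frequency plot is adequate only over a limited range, such as the range of voice pitch.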

Acoustic Analysis of Vowels:

It has been described how differences in pitch and loudness can be recorded. Now we must consider the differences in quality. A set of vowel sounds provides a suitable starting point, since vowels can all be said on the same pitch and with the same loudness. The quality of a sound such as a vowel depends on its overtone structure. Put another way, we can say that a vowel sound contains a number of different pitches simultaneously. There is the pitch at which it is actually spoken, and there are the various overtone pitches that give it its distinctive quality. We distinguish one vowel from another by the differences in the overtones. Saying the vowels at the usual rate, whispering them, whistling, using a creaky voice, and making glottal stops are all ways of hearing these differences. Vowels are largely distinguished by two characteristic pitches associated with their overtones. One of them (actually the higher of the two) goes downward throughout most of the series [i, ɪ, e, ɛ, æ, ɑ, ɔ, o, u]. The other goes up for the first four vowels and then down for the next four. These characteristic overtones are called the formants of the vowels, the lower of the two being called the first formant, and the higher the second formant. There is another characteristic overtone, the third formant, which is also present, but there is no simple way of demonstrating its pitch. The formants that characterize different vowels are the result of the different shapes of the vocal tract. Any body of air, such as that in the vocal tract or that in a bottle, will vibrate in a way that depends on its size and shape. Smaller bodies of air, like smaller piano strings or smaller organ pipes, produce higher pitches. In the case of vowel sounds, the vocal tract has a complex shape so that the different bodies of air produce a number of overtones.
A vowel has its own characteristic auditory quality, which is the result of the specific variations in air pressure due to its vocal tract shape being superimposed on the fundamental frequency produced by the vocal folds.

The general theory of formants was stated by the great German scientist Hermann Helmholtz almost 150 years ago. Even earlier, in 1829, the English physicist Robert Willis had said, ‘A given vowel is merely the rapid repetition of its peculiar note.’ Willis was one of the first people to make an instrumental analysis of the acoustic structure of speech. But the notion of a single formant (actually the second formant) had been observed several centuries earlier. In about 1665, Isaac Newton wrote in his notebook: ‘The filling of a very deepe flagon with a costant streme of beere or water sounds ye vowels in this order w, u, ω, o, a, e, i, y.’ He was in his early twenties at the time. The symbols are the best matches to the letters in Newton’s handwriting in his notebook, which is in the British Museum. A spectrogram is used to analyze sounds and show their separate components. G. Oscar Russell, one of the pioneers in x-ray studies of vowels, said, ‘Phoneticians are thinking in terms of acoustic fact, and using physiological fantasy to express the idea.’

Acoustic Analysis of Consonants:

In many cases, a consonant can be said to be a particular way of beginning or ending a vowel, and during the consonant articulation itself there is no distinguishing feature. The apparent point of origin of the formant for each place of articulation is called the locus of that place of articulation. Evidence of voicing near the baseline during a consonant closure is called a voice bar.

A wide-band spectrogram is used in this case. When the vocal folds vibrate, they produce what are called harmonics of their fundamental frequency of vibration. Harmonics are vibrations at whole-number multiples of the fundamental frequency. Thus when the vocal folds are vibrating at 100 Hz, they produce harmonics at 200, 300, 400 Hz, and so on. Narrow-band spectrograms are useful for determining the intonation, or tone, of an utterance.
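
Since harmonics are whole-number multiples of the fundamental frequency, they can be listed with a one-line calculation. This is a minimal sketch; the function name is made up for this example.

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics above the fundamental frequency.

    Each harmonic is a whole-number multiple (2x, 3x, 4x, ...) of the fundamental.
    """
    return [fundamental_hz * n for n in range(2, count + 2)]


print(harmonics(100, 3))  # vocal folds vibrating at 100 Hz
print(harmonics(220, 3))  # a higher-pitched voice at 220 Hz
```

The higher the fundamental, the more widely spaced the harmonics are.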

To summarize Spectrograms:

The most reliable measurements will be those of the length of the segments, for which purpose spectrograms are often even better than waveforms. Differences among vowels, nasals, and laterals can be seen on spectrograms, whereas it may be impossible to see these differences in the waveforms. Spectrograms are usually fairly reliable indicators of relative vowel quality. The frequency of the first formant certainly shows the relative vowel height quite accurately. The second formant reflects the degree of backness quite well, but there may be confusions due to variations in the degree of lip rounding. It is also possible to tell many things about the manner of articulation from spectrograms. For example, one can usually see whether a stop has been weakened to a fricative, or even to an approximant. Affrication of a stop can be seen on most occasions. Trills can be separated from flaps and voiced from voiceless sounds. One can also observe the relative rates of movement of different articulations. Spectrograms cannot be used to measure degrees of nasalization, nor are they much help in differentiating between adjacent places of articulation. For studying these aspects of speech, other techniques are more useful.

Individual Differences:

A speaker’s speech habits are a significant factor in studying speech sounds. Spectrograms of a person’s voice are sometimes called ‘voice-prints’ and they are said to be as individual as fingerprints. However, this is not the case. Nobody knows how many individuals share similar characteristics. Individual variation is also important from a general phonetic point of view. Spectrograms can show relative vowel quality. It is clearly true that one can use spectrograms to tell that the speaker has a higher vowel in ‘three’ than in the beginning of the vowel in ‘here’. One can also use formant plots. But it is not easy to say if the vowel in a given word as pronounced by one speaker is higher or lower than that of another speaker. In general, when two different speakers pronounce sets of vowels with the same phonetic quality, the relative positions of these vowels on a formant chart will be similar, but the absolute values of the formant frequencies will differ from speaker to speaker.

The simplest way to deal with this problem is probably to regard the average frequency of the fourth formant as an indicator of the individual’s head size, and then express the values of the other formants as percentages of the mean fourth formant frequency. An alternative method is to assume that each set of vowels is representative of the complete range of a speaker’s vowel qualities. Then we can express the formant frequencies of each vowel in terms of the total range of each formant in that speaker’s voice.

Much of the work of the applied phonetician today is concerned with computer speech technology and directed toward improving speech synthesis systems. The greatest challenges in the field of speech synthesis are concerned with improvements in intonation and rhythm. Synthetic speech often sounds unnatural because the intonation is too stereotyped. In order to get the correct pitch changes, one must know the speaker’s attitude to the world in general and to the topic under discussion. In addition, the syntax of the utterance must be taken into account, as well as various higher level pragmatic considerations, such as whether the word or a synonym of it has been used in a previous sentence.
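
Returning to the speaker differences discussed in this section, the fourth-formant method amounts to a simple percentage calculation. The sketch below is illustrative only: the function name is hypothetical and the formant values are invented round numbers, not measurements from any speaker.

```python
def normalize_by_f4(formants_hz, mean_f4_hz):
    """Express formant frequencies as percentages of a speaker's mean fourth formant."""
    return [100.0 * f / mean_f4_hz for f in formants_hz]


# Invented values: the same vowel from two speakers with different head sizes.
speaker_a = normalize_by_f4([350, 1750], mean_f4_hz=3500)
speaker_b = normalize_by_f4([400, 2000], mean_f4_hz=4000)
print(speaker_a)
print(speaker_b)
```

In this made-up case the absolute frequencies differ between the two speakers, but the normalized values coincide.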

Airstream Mechanisms and Phonation Types

In order to describe the various languages of the world, we need to consider the total range of the phonetic capabilities of humans. There are several ways in which the sets of terms that we have been using to describe English must now be enlarged. In the first place, all English sounds are initiated by the action of lung air going outward; other languages may use additional ways of producing an airstream. Second, all English sounds can be categorized as voiced or voiceless; in some languages, additional states of the glottis are used.

Airstream Mechanisms:

Air coming out of the lungs is the source of power in nearly all speech sounds. When this body of air is moved, we say that there is a pulmonic airstream mechanism. The lungs are sponge-like tissues within a cavity formed by the rib cage and the diaphragm. When the diaphragm (a dome-shaped muscle) contracts, it enlarges the lung cavity so that air flows into the lungs. The air can be pushed out of the lungs by a downward movement of the rib cage or an upward movement of the diaphragm, resulting from a contraction of the abdominal muscles. Stops that use only an egressive, or outward-moving, pulmonic airstream are called plosives. In some languages, speech sounds are produced by moving different bodies of air. If you make a glottal stop, so that air in the lungs is contained below the glottis, then the air in the vocal tract itself will form a body of air that can be moved. An upward movement of the closed glottis will move this air out of the mouth. A downward movement of the closed glottis will cause air to be sucked into the mouth. When either of these actions occurs, there is said to be a glottalic airstream mechanism. Stops made with an egressive glottalic airstream mechanism are called ejectives. Ejectives of different kinds occur in a wide variety of languages, including American Indian languages. For example, /ɬʼ/. It is also possible to use a downward movement of the larynx to suck air inward. Stops made with an ingressive glottalic airstream mechanism are called implosives.

Historically, languages seem to develop implosives from plosives that have become more and more voiced. In many languages voiced implosives are simply allophones of voiced plosives.

There is one other airstream mechanism that is used in a few languages. This is the mechanism that is used in producing clicks, such as the interjection expressing disapproval that novelists write ‘tut-tut’ or ‘tsk-tsk’. Another type of click is commonly used to show approval or to signal horses to go faster. Yet another click in common use is the single, pursed-lips type of kiss that one might drop on one’s grandmother’s cheek. Clicks occur in words (in addition to interjections or nonlinguistic gestures) in several African languages. Zulu, for example, has a number of clicks, including one that is very similar to our expression of disapproval. The IPA symbol for a dental click is [ǀ], a single vertical stroke.

Movement of the body of air in the mouth is called a velaric airstream mechanism. Clicks can also be made with the side of the tongue; the phonetic symbol for such a lateral click is [ǁ], a pair of vertical strokes. Clicks can also be made with the tip (not the blade) of the tongue touching the posterior part of the alveolar ridge. The phonetic symbol for a click of this kind is [ǃ], an exclamation point. These three possibilities all occur in Zulu and in the neighboring language Xhosa. Some of the aboriginal South African languages, such as Nama and !Xóõ, have an even wider variety of click articulations. !Xóõ, spoken in Botswana, is one of the few languages that have bilabial clicks, a sort of thin, straight-lipped kiss sound, for which the symbol is [ʘ]. In the production of click sounds, there is a velar closure, and the body of air involved is in front of this closure (that is, in the front of the mouth). The spelling system regularly used in books and newspapers in Zulu and Xhosa employs the letters c, q, x for the dental, post-alveolar, and lateral clicks, for which we have been using the symbols [ǀ, ǃ, ǁ], respectively. In the name Xhosa, the h following the x indicates a short burst of aspiration following the click.

States of the Glottis:

The glottis can be in four states:

- Voiced
- Voiceless
- Murmur
- Creaky Voice

These positions can be adjusted by the movements of the arytenoid cartilages. In a voiced sound, the vocal folds are close together and vibrating. In voiceless sounds, they are pulled apart. If there is considerable airflow, as in an h-like sound, the vocal folds will be set vibrating while remaining apart. In this way, they produce breathy voice, or murmur. The symbol for this sound is [ɦ]. In creaky voice, which is the other state of the glottis, the arytenoid cartilages are tightly together, so that the vocal folds can vibrate only at the anterior end. Creaky voice is a very low-pitched sound that occurs at the ends of falling intonations for some speakers of English. Creaky-voiced sounds may also be called laryngealized.

*Look into Larynx!

Voice Onset Time:

The terms aspirated and unaspirated refer to the presence or absence of a period of voicelessness during and after the release of an articulation. The interval between the release of a closure and the start of the voicing is called Voice Onset Time (usually abbreviated as VOT). The VOT is measured in milliseconds (ms). Some languages contrast three different voice onset times. Thai has voiced, voiceless unaspirated, and aspirated stops.
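
VOT is simply the time of voicing onset minus the time of the release, so it can be computed from two time points read off a waveform or spectrogram. The sketch below is illustrative: the function names and time values are invented, and the 30 ms boundary between unaspirated and aspirated stops is an assumed round figure for demonstration, not a value given in these notes.

```python
def voice_onset_time_ms(release_s, voicing_onset_s):
    """VOT in milliseconds: start of voicing minus release of the closure."""
    return (voicing_onset_s - release_s) * 1000.0


def describe_vot(vot_ms, aspiration_threshold_ms=30.0):
    """Rough three-way label; the threshold is an illustrative assumption."""
    if vot_ms < 0:
        return "voiced"  # voicing begins before the release
    if vot_ms < aspiration_threshold_ms:
        return "voiceless unaspirated"
    return "aspirated"


# Invented time points (seconds) for three stops.
for release, voicing in [(0.200, 0.120), (0.200, 0.210), (0.200, 0.270)]:
    vot = voice_onset_time_ms(release, voicing)
    print(round(vot), "ms:", describe_vot(vot))
```

A negative VOT means that voicing starts during the closure, before the release.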

*Look into VOT!