WFMU - 365 Days Project (2003)

2003 MARCH 3 #062

Bell Telephone Labs - Computer Speech

from the album liner notes:

This recording contains samples of synthesized speech - speech artificially constructed from the basic building blocks of the English language. A machine which produces synthesized speech is called, fittingly, a talking machine. There are many possible kinds of speech synthesizers or talking machines. Instead of building and testing a variety of them, scientists at Bell Telephone Laboratories simulate their behaviour with a high-speed, general-purpose computer. The computer is instructed (programmed) to accept in sequence on punched cards the names of the speech sounds which make up an English sentence. It then processes this information, in accordance with the linguistic rules governing the English language, and produces an output analogous to the output of the talking machine it is programmed to simulate. The talking machine simulated by the computer in this recording would normally be operated by continuously feeding it a set of nine control signals. The signals correspond to voice pitch, voice loudness, lip opening and other speech variables. When every instant of sound is specified, and every variable accounted for, such a machine produces human-sounding speech.

Setting up the computer to simulate this talking machine requires two sets of instructions or, more precisely, a two-part computer program. One part of the computer program performs the actual sound making function - it imitates the "talking" of a talking machine. The second part consists of rules for combining individual speech sounds into connected speech, and for producing the nine control signals that activate the talking machine. Scientists at Bell Telephone Laboratories have developed a computer program that permits them to feed the names of speech sounds into the computer on punched cards. They also have devised a phonetic code using the letters of the alphabet. At present, it is made up of 22 consonant and 12 vowel sounds:

CONSONANTS: P - B - T - D - K - G - M - N - NG (as in sing) - F - V - S - Z - SH (as in she) - ZH (as in azure) - H - W - R - L - Y - TH (as in thin) - DH (as in then)

VOWELS: EE (as in bee) - I (as in ill) - AY (as in rate) - E (as in end) - AE (as in add) - AH (as in ah) - AW (as in jaw) - (as in go) - OO (as in foot) - UU (as in food) - UH (as in up) - ER (as in her)

Each speech sound is specified on a separate punched card. When a sequence of cards is fed into the computer, it "operates" on the information - following the rules set up in the second part of its program - to produce the nine control signals that activate the talking machine program. For example, if the sequence of cards, H - EE - S - AW - DH - UH - K - AE - T, is fed into the computer, the machine will say "He saw the cat," in flat monotones. Proper inflection and phrasing are achieved by specifying on each card the changes in pitch and timing natural to human speech.
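As a rough modern illustration of the card sequence above, the following Python sketch checks each card against the phonetic code listed in the notes and labels it as a consonant or a vowel. It is not the Bell Labs program: the function name parse_cards and the dash-separated input format are assumptions made for this example, and the symbol for the vowel "as in go" is left out because it is missing from the list as reproduced here.

```python
# Phonetic code as listed in the liner notes (22 consonants).
CONSONANTS = {
    "P", "B", "T", "D", "K", "G", "M", "N", "NG", "F", "V",
    "S", "Z", "SH", "ZH", "H", "W", "R", "L", "Y", "TH", "DH",
}

# Only 11 of the 12 vowels appear here: the symbol for the vowel
# "as in go" is missing from the source text, so it is not guessed.
VOWELS = {"EE", "I", "AY", "E", "AE", "AH", "AW", "OO", "UU", "UH", "ER"}


def parse_cards(sequence: str) -> list[tuple[str, str]]:
    """Split a dash-separated card sequence and classify each sound.

    Hypothetical helper for illustration: one list entry per punched card.
    """
    cards = []
    for name in sequence.split("-"):
        symbol = name.strip().upper()
        if symbol in CONSONANTS:
            cards.append((symbol, "consonant"))
        elif symbol in VOWELS:
            cards.append((symbol, "vowel"))
        else:
            raise ValueError(f"unknown speech sound: {symbol!r}")
    return cards


# The example sequence from the notes ("He saw the cat").
cards = parse_cards("H - EE - S - AW - DH - UH - K - AE - T")
for symbol, kind in cards:
    print(symbol, kind)
```

The sketch covers only the naming of sounds. The pitch and timing values that the notes say are specified on each card, and the rules that turn the sequence into nine control signals, are not modelled.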

By specifying the pitch of the sounds, it also is possible to make the computer sing. In two of the samples recorded, the computer first sings a familiar tune and then, singing the same song, is accompanied by music played by another computer. The "speech" of the simulated talking machine comes out of the computer as tiny magnetized spots on half-inch magnetic tape. The tape is fed to another machine which converts the spots to a sound tape suitable for playing on an ordinary tape recorder.

The first eight and very last samples of synthesized speech on this recording are part of a research program aimed, principally, at formulating a minimum set of rules for making plausible English speech. The ninth and tenth selections were produced by analyzing a person's speech and re-constructing it synthetically on a computer. The objective of this program is to duplicate the sounds and transitions made by a human speaker, including his accent and dialect.

Knowledge developed through such research programs may be useful in devising new techniques for transmitting speech more efficiently over communications systems. In the near future, for example, a person may be able to type on a keyboard and cause a typing machine thousands of miles away to speak for him. There is also the possibility that talking machines may be built for people who are unable to speak.

- Written and directed by D.H. VanLenten

- Contributed by Russell Scholl

TT-4:58 / 4.6MB / 128kbps 44.1kHz
from 7" 33 1/3 rpm Record (1963)

Darren James writes:
I have noticed something a bit spooky, almost prophetic. If you listen towards the middle of the recording, the computer voice ("In accents almost completely human...") sings a song, solo; none other than "Daisy Daisy", the very same song that HAL sings solo in 2001 (made in 1968) as it is going mad. Prophetic or litigious?

Brian Flynn writes:
The Bell Labs Speech Synthesis clip was interesting up to the point where the computer sings "Daisy". That's when I fell off my chair! I wonder if Stanley Kubrick had heard this (or maybe a demonstration of Bell Labs' speech synthesis) back in 1963, 5 years before he made 2001.

Robert K Huselius writes:
Stanley Kubrick's idea of letting the computer HAL9000 sing "Daisy" in "2001: A Space Odyssey" actually came from those early speech synthesis experiments by Bell Labs.