Gretchen Jude


As both a vocalist and a computer musician, I find that, insofar as my music entails programming, my body is ignored while I am at the keyboard. In contrast, when I vocalize (an act rooted in the breath and the physical body), I literally rediscover my voice, which then grounds my artistic practice. Similarly, as a performance studies scholar, I find that my work as a performer moors my intellectual practice in the physical realm. Based on these experiential observations, this paper presentation will explore the problem of (dis)embodiment in human (vocal) interactions with digital technology. It will use digital audio devices to problematize the increasingly sidelined position of the body in the expanding realms of technology, and it will offer a real-time experiment in performance-as-(artistic)-practice. The academic lecture will also be denaturalized and reframed as a work of sonic art, highlighting the transformation of the voice as it travels between microphone and amplifier – a shift which is currently so normalized as to go unnoticed in both academic and artistic contexts. At the same time, this performance will in some sense be a collaboration between me and my technological medium of choice, namely the audio PA system and computer. These are tools shared by academics and artists (among others) – tools that also remain invisible yet essential to the ways that scholars now research, write and share work.

Author’s note: The following text performs best when treated as a script for real-time speaking aloud rather than as a written document for silent reading. Left-justified, plain font specifies text to be read aloud. Right-justified font should be read aloud by the performer while simultaneously playing as a prerecorded sound file. Sound files can be accessed at; the title of each recording is specified in the text by bold italic font. Nonverbal and/or extra-verbal instructions are also specified in italics.


[Appropriate introductory remarks, e.g., “Good afternoon, thank you for coming,” etc.] Today, I’d like to talk to you about talking. This fundamental human activity not only places us within contexts of meaning, it is also central in creating and maintaining social bonds. We spend a great deal of time talking, even if only to ourselves. So today, I’d like to consider what actually happens when we talk. What does talking require of us? And, of increasing importance, what is lost or gained, shifted or disrupted when audio technologies intervene in the space between my mouth and your ears?

When I speak, I begin with a breath. The muscles in my chest and belly contract to pull air into the tiny spaces of my lungs. As my chest muscles release, tiny membranes embedded deep in my throat can tighten; the combination of a continuous flow of air and changes in my muscular tension produces vibrations – small but complex vibrations – in the air as it escapes. These sonic vibrations gain resonance (meaning volume and form) as they pass through my body. And on their way out, I can consciously shape these vibrations further by moving my throat, mouth, tongue, teeth and nasal cavity.

One common way to shape the vibrations that emanate from my face is into words. Words are made of brief sounds called phonemes. In fact, every human language group (aside from sign languages) is comprised of phonemes. They are the building blocks of linguistic meaning.

DUET 1:  repeat/hold specified phonemes/words when you hear them in the recorded text >>>>>

A phoneme is a basic unit oV a language’s Fonology, which is combined with other phonemeZ to form meaningful unitS. The phoneme can be described as the smallest uniT of sound employed to form meaningful contrasts between uDDerances. In this way the difference in meaning between the English words Kill and Kiss is a result of the exchange of the phoneme /l/ for the phoneme /s/.

vvvvv fffff zzzzz sssssssss tttttttt ddddd lllllll rrrrrr kkkk kkkkk

Two words that differ in meaning through a contrast of a single phoneme are called minimal pairs.
For example, the opposition between ‘p’ and ‘b’ in English is not a salient distinction in some languages; in contrast, the uvular ‘r’ is not a phoneme in the world’s standard Englishes, while a flipped ‘r’ also functions as a marker of extra-linguistic meaning, such as national identity.

pear-bear, pop-bop  p-b p-b  uvular rrrr  flipped r-r-r-r

When I speak to you, phonemes and other sounds come out of my mouth and the other parts of my face through the air towards you. Since the early 20th century, technology has allowed words and other sounds to be recorded directly as sound – physically inscribed or magnetically recorded into wax, vinyl and similar materials. Audio playback has revolutionized people’s conceptualizations of sound, irreversibly altering human sonic behavior around the world. However, whatever the source or modality, sound always depends on air.

It is not entirely accurate to say that sound comes through the air, since in actuality, sound is MADE of air. Sound waves are by definition organized patterns of vibration in the air molecules, or more technically speaking, alternating compressions and rarefactions of local air pressure. So in other words, in order to talk to you, I must systematically disturb the air that lies between us. I must, with my breath, form patterns of vibration in the air, which somehow reach you, wherever you are located. These interactions in oxygenated spaces and times also unavoidably reflect our various social, cultural and linguistic contexts, spaces that we simultaneously create even as they shape who we are.

In her article “Doing Things with Voices…” >>>>> DUET 2: drink water, yawn, etc. >>>>> Norie Neumark writes,

“Emerging from the body, voice is marked by that enculturated body.
That is, embodied voices are always already mediated by culture: they are inherently modified by sex, gender, ethnicity, race, history and so on. Through its performance quality voice does not directly express or represent those cultural characteristics, it enacts them – it embodies them through its vocal actions.” (Neumark 2010, 97.)

I have many years of experience speaking and listening, in various contexts, but always with more or less this particular physical form. Based on these experiences of speaking and listening, on my past journeys and interactions, on my many thoughts, dreams, hopes, valuations, successes, failures, hates and loves, today I speak to you with the expectation that the movements of my belly, chest, lungs, larynx, throat, pharynx, mouth, tongue, teeth and lips are shaping the air I exhale into patterns that, since you know English, you will find meaningful, at least to some degree. But what actually happens once that air reaches you?

DUET 3: extended exhalation into mic >>>>>
According to Neumark,

“the performative voice is quintessentially paradoxical…[or] uncanny in Freud’s sense of unheimlich or unhomely. It carries a trace of its ‘home’, the body of the speaker, but leaves that home to perform speaking. And if we consider the voice in digital media, it is even more uncanny, in that it has a second home – the realm of ones and zeros – yet must leave that home, and indeed the digital realm, to perform differently, to sound analogically.” (Neumark 2010, 97.)

We do not usually notice the pressure of the air that envelops our bodies, but in fact the air is always strongly exerting itself upon us. The weight of the atmosphere is a fundamental condition of human life. Tiny variations in air pressure cause vibrations that ripple through our bodies and our senses. These sound waves of different frequencies (what we sometimes call pitches) are most noticeably sensible in different regions of the body. Very low, or bass, frequencies may shake our internal organs. The pitch range of my voice, in contrast, is most easily sensed by your eardrums.

An eardrum, like the vocal folds, is a membrane. However, it most often functions not to vibrate the air, but rather, simply to wait, tightly stretched, to be vibrated. Thus the rapid changes in local air pressure that I have sent from my throat to you vibrate your eardrum. These vibrations are transformed by your eardrum and by the tiny fluid-filled spiral chamber, or cochlea, of your inner ear, into electrical impulses that then travel through your nervous system to your brain. Interestingly, your inner ear is what also keeps you oriented upright in space, since it provides you with a sense of balance.

DUET 4: Make yourself comfortable. >>>>>

The microphone, much like an eardrum, consists of a vibrating membrane, which is moved by the sound waves in the air. This movement is then transduced via electromagnetic or electrostatic induction into electrical signals. This electricity must undergo several more transformations before it can become the electricity that animates your brain. At the very least, the microphone’s audio signal must connect to other devices that amplify the signal and transform it back into sound waves.

There is one last, unavoidable component in the extended network of hardware that produces electronic sound: the loudspeaker. A speaker is nothing like the human vocal apparatus. Rather, a speaker most commonly uses magnets to convert an amplified electrical signal into movements that vibrate a paper cone. These vibrations (which oscillate at between 20 and 20,000 times per second) push the air molecules into sound patterns. With a PA or public address system, sounds that are naturally very quiet can easily be made very loud, even painfully so. A person speaking into such a system can also choose to use much less physical effort to produce sound with her body, relying on the electrified technology of the PA rather than physical technique to powerfully transmit her message. Of course, as most of us know from experience, PA systems are not infallible, or even ideal for every public speaking situation.

So far I have been making the argument that sound technology, like a human speaker, has a body of a sort, or at least an inescapable materiality. I constantly resist discourses that either romanticize or demonize machines. However, like the air, like language, like our bodies, the materials and tools that we use (as well as how we use them) shape us at the same time as we utilize them. So what does increasingly ubiquitous digital technology tell me about myself and my body in relation to others? How do social structures mediate those relations? What use is it to reflect upon the materiality of language, or of air?

According to Brandon LaBelle, >>>>> DUET 5: Read in perfect unison. >>>>>

“The voice comes to us as an expressive signal announcing the presence of a body and an individual – it proceeds by echoing forward and away from the body while also granting that body a sense of individuation, marking vocality with a measurable paradox. The voice is that very core of an ontology that balances presence and absence, life and death, upon an unsteady and transformative axis. The voice comes to signify through a slippery and unforgettable semantics the movements of consciousness, desire, presence while also riveting language with bodily materiality.
The voice is sense and substance, mind and body, cohering in a flux of words that imparts more than singular impression or meaning. It carries words through a cavity that in turn resonates with many uncertainties, excesses, and impulses, making communication and vocality distinct yet interlocked categories.” (LaBelle 2010, 149.)

Yet, my voice, even before it leaves my body, is, in its cascade of vibrations, which disturb and rearrange my surroundings, never entirely mine. Just as I am never simply or separately, “I” alone.

DUET 6: As call and response, in sequence >>>>>

“If,” as LaBelle posits, “language is already a technology, further mediatized by the advent of radio, television, and related broadcast operations, then literary mechanisms and strategies are appropriative interventions into such technology – they begin to function as forms of hacking that aim for the mechanics at work….

These predigital voicings hint at a displacement and ultimate networked condition of the human subject, rerouting the expressive self through an alterity that turns one’s own body into a speaking machine.” (LaBelle 2010, 161.)

“Whereas modernist notions of disembodiment led to a sense of fragmentation or rupture, the digital voice seems to find a new sense of agency (and pleasure) within networked conditions…

… producing sonic projects that generate provocative instances of the human body as process…

opening up a space through which we learn to inhabit our current relational and networked geographies by an auditory fissuring and extension of voicing.” (LaBelle 2010, 167.)

Of course, the metaphor of the network is primarily a spatial image, and one which brings to mind plastic-wrapped cables, securely wiring the post-human voice to mass-produced machines under control that is tenuous at best. However, sound is not only a spatial but also a temporal phenomenon. Vibration and oscillation entail both a motion-in-stasis and an amount or duration of repetition. As Elin Diamond writes,

DUET 7: Simultaneous start and finish with backwards recording >>>>

“Mimesis then is impossibly double, simultaneously the stake and the shifting sands:
order and potential disorder, reason and madness.”

DUET 8: Speaking louder while moving away from microphone as the voice splits >>>>>

“Mimesis then is impossibly double, simultaneously the stake and the shifting sands: order and potential disorder, reason and madness. As a concept, mimesis is indeterminate… and, by its own operations, loses its conceptual footing. On the one hand, it speaks to our desire for universality, coherence, unity, tradition, and on the other, it unravels that unity through improvisations, embodied rhythm, powerful instantiations of subjectivity, and what Plato most dreaded, impersonation, the latter involving outright mimicry. In imitating… the model, the mimos becomes another, is being an other….” (Diamond 1997)

SOLO: standing away from mic, speaking loudly enough to project >>>>>
Does my electronic voice, in pointing back to my body as its point of origin, call my own unitary subjectivity into question?

DUET 9: In unison, low voice close to microphone >>>>>

“The advent of digital technologies resituates the modernist understanding of embodiment, foreclosing routes toward ‘original’ voicing through intensifications of simulated, virtual presence and the language of coding. The conditions of the digital replace the fantasies of primary beginnings with a dissolution of the original – though the fragmentation and doubling of analog technology may refer to a presumed notion of origin, to the ‘real’ voice, the digital ruptures such a link…. The digital voice may be heard not just as poetical revolution tied to subjectivity, but more as a signaling of the subject’s current pluralization and post-human future.” (LaBelle 2010, 145.)

Perhaps the digital voice, in calling into question the notion of a singular, authoritative speaking subject, can be taken as a reminder of our unavoidable and constant state of connection with Others of all kinds: other people in our various human communities, the animal-vegetable-mineral ecologies that sustain our material bodies, even our own other selves, remembered and envisioned, in the past and future.

It is also at this point that technical descriptions come full circle, sounding more and more like poetry. In Curtis Roads’ groundbreaking 1996 book, The Computer Music Tutorial, he describes granular synthesis:

DUET 10: (freely, not in unison) extend words into song-like phrases, aiming to begin and end vocalization at the same time as the recording, but otherwise simply overlapping >>>>>

“Asynchronous granular synthesis…sprays sonic grains into cloud-like formations across the audio spectrum…. Time-varying combinations of clouds lead to dramatic effects such as evaporation, coalescence, and mutations created by crossfading overlapping clouds.” (Roads 1996, 184.)

“Just as light energy can be viewed both in terms of wavelike properties and in terms of particulate properties…, so can sound. Granular synthesis builds up acoustic events from thousands of sound grains. A sound grain lasts a brief moment (typically 1 to 100 ms), which approaches the minimum perceivable event time for duration, frequency, and amplitude distinction. Granular representations are a useful way of viewing complex sound phenomena – as constellations of elementary units of energy, with each unit bounded in time and frequency… Such representations are common inside synthesis and signal-processing algorithms.” (Roads 1996, 168.)
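
[Illustrative sketch, not part of the spoken text. The asynchronous granular synthesis that Roads describes might be approximated as follows: brief windowed grains, each lasting 1 to 100 ms, are scattered at random onsets and summed into a "cloud." The sample rate, the sine-tone grain, the Hann window, the frequency range, and every function name here are assumptions made for illustration; none of this is taken from Roads' text or from the recordings used in this performance.]

```python
import math
import random

SAMPLE_RATE = 44100  # samples per second (an assumed, common audio rate)


def make_grain(freq_hz, dur_ms, sample_rate=SAMPLE_RATE):
    """Build one grain: a sine tone shaped by a Hann window so it fades in and out."""
    n = max(2, int(sample_rate * dur_ms / 1000))
    grain = []
    for i in range(n):
        window = 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        grain.append(window * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return grain


def grain_cloud(total_s, n_grains, freq_range, dur_range_ms, seed=0,
                sample_rate=SAMPLE_RATE):
    """Scatter grains at random (asynchronous) onsets and sum wherever they overlap."""
    rng = random.Random(seed)  # seeded so the cloud is repeatable
    out = [0.0] * int(total_s * sample_rate)
    for _ in range(n_grains):
        grain = make_grain(rng.uniform(*freq_range),
                           rng.uniform(*dur_range_ms), sample_rate)
        onset = rng.randrange(0, len(out) - len(grain))
        for i, sample in enumerate(grain):
            out[onset + i] += sample
    # Scale the summed cloud so its loudest sample has magnitude 1.0.
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]


# One second of sound built from 200 grains lasting between 1 and 100 ms each.
cloud = grain_cloud(total_s=1.0, n_grains=200,
                    freq_range=(200.0, 2000.0), dur_range_ms=(1.0, 100.0))
```

[Raising n_grains thickens the cloud, while narrowing freq_range gathers it toward a single pitch. These parameters are only one plausible way to expose the density and spread that the quoted passage evokes.]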

In its voyage through the digital realm, the voice is rapidly transformed into discrete numerical values, subjected to multiple calculations, then resynthesized back out into the vibrating analog atmosphere, available once again to attentive ears. As Donna Haraway writes, “the sheer messiness of life – and of technology – seems our best hope for breaking the hold of the established disorder. The world is not finished, and reconfigured knowledges and technologies must be at the center of freedom projects… One cannot know in advance what something is, not even or maybe especially if that something comes from the belly of the monster.” (Haraway 2007, 135.)
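
[Illustrative sketch, not part of the spoken text. The voyage described above, from vibrating air to discrete numerical values and back, might be pictured as below. A short sine tone stands in for the voice; the 8 kHz sample rate, the 8-bit depth, the halving of amplitude as the "calculation," and all function names are assumptions chosen for brevity, not a description of the equipment used in this performance.]

```python
import math


def digitize(signal, bits=16):
    """Round each sample in [-1.0, 1.0] to the nearest integer level."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in signal]


def resynthesize(codes, bits=16):
    """Map integer levels back to amplitudes in [-1.0, 1.0]."""
    max_level = 2 ** (bits - 1) - 1
    return [c / max_level for c in codes]


sample_rate = 8000  # assumed; samples per second
# Ten milliseconds of a 220 Hz tone standing in for the voice.
voice = [0.8 * math.sin(2 * math.pi * 220 * n / sample_rate)
         for n in range(sample_rate // 100)]

codes = digitize(voice, bits=8)               # discrete numerical values
processed = [round(c * 0.5) for c in codes]   # one simple calculation: halve the level
restored = resynthesize(processed, bits=8)    # back toward the analog atmosphere
```

[The small gap between voice and resynthesize(codes, bits=8) is the rounding error of quantization. It shrinks as the bit depth grows but never vanishes entirely.]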

In conclusion, I cannot say for certain whether digital processing entails pernicious disjunctures of organic audio phenomena, or whether human perception (either properly or wrongly) reconstitutes these minuscule, inconceivably rapid disjunctures into unitary experience – like the accumulations of floating water droplets that we speak of as clouds. But no matter what the meaning we ascribe to oscillating air molecules, the sounds of our voices are here, fleeting yet recurrent, in the space between us.

Thank you for listening. I hope you will feel free now to add your voices to this work-in-progress, with both questions and feedback.



Gretchen Jude is a performer and composer who works with digital and analog electronics, exploring the tensions and liminal spaces between human and machine. Gretchen spent eight years as a university English teacher in Tokyo, where she studied traditional Japanese music. In 2011, she earned an MFA in Electronic Music from Mills College (California); Gretchen is now enrolled in the doctoral program in Performance Studies at University of California, Davis. Current interests include presence and embodiment in electrovocal and computer music performance, site-responsive improvisation, graphic scores, and collaboration with dancers and visual artists.


Cavarero, Adriana. 2005. For More Than One Voice: Toward a Philosophy of Vocal Expression. Stanford, CA: Stanford University Press.

Diamond, Elin. 1997. Unmaking Mimesis: Essays on Feminism and Theatre. London & New York: Routledge.

Haraway, Donna. 2007. [Interview.] In Jan Olsen and Evan Selinger (eds.) Philosophy of Technology: 5 Questions. New York: Automatic Press/VIP.

Huber, David Miles and Robert Runstein. 2005. Modern Recording Techniques. 6th Edition. Amsterdam: Elsevier Press.

LaBelle, Brandon. 2010. “Raw Orality: Sound Poetry and Live Bodies.” In Norie Neumark, Ross Gibson, and Theo van Leeuwen (eds.) Voice: Vocal Aesthetics in Digital Arts and Media. Cambridge, MA: MIT Press. pp. 147–171.

Neumark, Norie. 2010. “Doing Things with Voices: Performativity and Voice.” In Norie Neumark, Ross Gibson, and Theo van Leeuwen (eds.) Voice: Vocal Aesthetics in Digital Arts and Media. Cambridge, MA: MIT Press. pp. 95–118.

Roads, Curtis. 1996. The Computer Music Tutorial. Cambridge, MA: MIT Press.