Juan Vassallo

Versificator: algorithmic poetry and music composition

In this article, I describe a generative system I created called ‘Versificator’, conceived as a metaphor for, and a way of bringing to life, the original versificator: a fictional device in George Orwell’s novel Nineteen Eighty-Four (1949) whose main purpose was to act as an automatic generator of literature and music. The versificator is a modular, rule-based system for creating a score for vocal ensemble: each module generates a different type of musical material composed of pitches, durations and text to be vocalized, while an independent module generates the formal plot of the work by determining the temporal arrangement of the material. At the end of the process, the system renders a complete music score. This work reflects on the possibility of automating some compositional decisions, and on the use of artificial intelligence and computer-assisted composition tools as means for artistic creation, with the conviction that these methods offer the artist new possibilities to expand the formal and conceptual architecture of the work, as well as the relationships between form and material.

Introduction

The idea becomes a machine that makes the art.

Sol LeWitt (1967)

In his novel Nineteen Eighty-Four (1949), George Orwell narrates an imagined future in which, through the rise of a totalitarian government, perpetual war, mass surveillance, repressive regulation, historical denialism, and propaganda have taken over Western civilization. The Ministry of Truth is one of the four ministries through which the Ingsoc Party governs Oceania, and is responsible for generating content for culture and entertainment through an automated process without human intervention. A fictitious generative machine known as the versificator is used for this purpose.

In the book, the idea that some form of art or cultural object can be produced by automated means is seen as detrimental: by using automatisms and autopoietic strategies, artistic decisions seem to be delegated to an external instance, perhaps mechanical or repetitive, simplified and normative, and this is considered a weakness of subjective autonomy (Essl 2007) and an instrument of alienation and cultural homogenization. The notion of human creative superiority manifests itself in the idea that it takes a real artist to create something fine and extraordinary, and that the mere generation of creative content by an algorithmic system is far from being a complete model of human creative production (Ventura 2016).

This idea floated in the atmosphere of the early 1960s, and the reaction to the new type of art was quite critical: a few years earlier, in 1957, the first composition created entirely through rule-based algorithmic processes on a computer had seen the light. The Illiac Suite, programmed by Lejaren Hiller and Leonard Isaacson, takes as a reference for its rules a counterpoint treatise by Johann Joseph Fux, Gradus ad Parnassum, written in 1725. Hiller proposed that it is possible to compute musical structures, which he described as the objective side of music:

The information encoded [in the score] relates to such quantitative entities as pitch and time, and is therefore accessible to rational and ultimately mathematical analysis.

(Hiller 1959, 110)

Xenakis’ reflections, following his essay “The Crisis of Serial Music” (1955) and the publication of his book Formalized Music (1962), contributed to a rearrangement of paradigms concerning the use of computers as a means for artistic creation. Xenakis’ idea that any art form implies a more or less pure determinism, and can therefore be explained through the laws of logic and general algebra, directly relates the process of artistic creation to the notion of computability; this concept, according to Nake (2010), is at the heart of algorithmic art.

Xenakis’ experiments with the FORTRAN programming language and his foray into composition based on algorithms and automation are well known; several of his works derive from them, such as the 25-minute GENDY3, created automatically from a run of the program GENDY301. Xenakis metaphorically describes his creative process involving the use of electronic computers as piloting a spaceship:

Freed from tedious calculations, the composer can devote himself to the general problems posed by the new musical form, and explore its corners while modifying the values of the input data. (…) With the help of electronic computers, the composer becomes a kind of pilot: he presses buttons, enters coordinates, and supervises the controls of a cosmic ship navigating the space of sound, across sonic constellations and galaxies that he could formerly glimpse only as a distant dream. Now he can explore them easily, sitting on a sofa.

(Xenakis 1962, 144)

What represents an artistic misfortune for Orwell, and for the Cartesian thought that places the human being at the center of the creative universe, is for the creators of the new algorithmic era the long-awaited future:

Algorithmic art is the final art form in times of industrial production. Beyond all craftsmanship and aura, the work happens automatically. This comes at the price of the artist becoming an engineer. What futurists and others may have vaguely dreamed of becomes real in the algorithmic age.

(Nake 2010, 55)

Is it possible to argue for algorithmic agency in art creation? To this day, the heuristic processes involved in a composer’s decision-making when creating a musical work remain speculative, somewhat of a mystery. Is there any a priori reason to deny creative agency to non-human entities, such as machines or algorithms (Bennett 2010)? Is it possible to discuss a new perspective on the creative agency of algorithmic composition systems? Naming the system ‘versificator’ in some way represents a reparation to the aforementioned object, and with it to any automated, generative creation system, insofar as such systems favor an expansion of the human creative process. In this sense, the main question is not whether it is possible to move more artistic decisions to the computer side, as proposed by Bown (2021), to the point of achieving autonomy in the creation of a score or a work, or of minimizing the composer’s need to search iteratively for the correct solution: a system that generates and structures musical material should be considered an extension of the basic creative mechanism, one that allows a completely new approach to musical composition, understood as the establishment of multidimensional mappings between a complex network of materials, forms and content in a musical work.

Methodology

The versificator system does not work as a fully automated music score generator; rather, it proposes an iterative or cyclical creation process, as described by Bown (2021): after each generative module runs, if the system produces an unsatisfactory outcome, the composer can modify the input parameters and expect more or less different outputs.

Fig. 1: Schematization and data flow of the operation of the versificator system

The versificator system has been fully developed in Max MSP, a visual programming language for music and multimedia that allows pre-designed building blocks called ‘objects’ to be connected and combined through cables carrying different types of data, such as numbers, characters, text strings, lists, and audio signals. Max is also open to external third-party objects, and a large community of programmers enhances the software with commercial and non-commercial extensions. The versificator system uses the external library packages “Bach”, “Cage” and “Dada”. Developed by Andrea Agostini and Daniele Ghisi (2012), Bach is a free and open-source library that brings multiple tools for computer-aided composition to the Max environment. More importantly, Bach provides a well-developed music notation interface, as well as a wide palette of functionalities.

Constraints solvers

The organization of a rule-based generator system is modular in nature: a complex system can be understood as the interaction of multiple rules of different natures, and the overall complexity arises from the chaining of simple rules. The heart of the rule-based machinery within the versificator system is constraint satisfaction algorithms. A constraint satisfaction algorithm tries to solve compositional problems by finding combinations of variables from a list of possible values such that a certain rule, or set of rules, is satisfied, either completely, by means of a deterministic solver, or at least partially, through a heuristic solver. Two constraint resolution algorithms are used within the system: the bach.constraints object and the PMC constraint engine (Laurson 1996, Sandred 2010).
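As an illustration of the deterministic case, a solver can be sketched in Python as an exhaustive search that keeps only the assignments satisfying every rule. This is a minimal sketch with hypothetical domains and rules (a three-note motif), not the actual bach.constraints or PMC implementation:

```python
from itertools import product

# Hypothetical domains: each variable is a MIDI pitch for a three-note motif.
DOMAINS = {"n1": range(60, 72), "n2": range(60, 72), "n3": range(60, 72)}

def rules(n1, n2, n3):
    """Deterministic rules: all pitches different, rising contour,
    total span no larger than a perfect fifth (7 semitones)."""
    return len({n1, n2, n3}) == 3 and n1 < n2 < n3 and n3 - n1 <= 7

# Exhaustive (deterministic) search: every assignment that satisfies all rules.
solutions = [combo for combo in product(*DOMAINS.values()) if rules(*combo)]
print(len(solutions), solutions[0])  # first solution found: (60, 61, 62)
```

A deterministic solver of this kind either returns the full feasible set or fails when the rules contradict each other, which is why the versificator also relies on heuristic rules, as described below.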

Fig. 2: Basic flowchart that illustrates the basic components of a solver for constraints satisfaction problems (Sandred 2017)
Flowchart illustrating how a heuristic rule that controls the number of letters in a word works. The search engine receives three variables: a prefix, a root and a suffix for a word. The rule tells the engine to count the letters for different combinations of variables and find the combination closest to 20. Thus, a word with 13 letters will be a better candidate than a word with 6.
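The letter-count heuristic described above can be sketched in Python. This is an illustrative reconstruction, not the actual Max/PMC code, and the morpheme lists are hypothetical:

```python
from itertools import product

# Hypothetical candidate domains for the three variables of the search.
prefixes = ["un", "inter", "pseudo"]
roots = ["vers", "cant", "form"]
suffixes = ["ation", "ic", "ificator"]

TARGET = 20  # target letter count set by the heuristic rule

def heuristic_score(word: str) -> int:
    """Lower is better: distance from the target length."""
    return abs(len(word) - TARGET)

# Score every combination and keep the best-ranked candidate.
best = min(
    ("".join(combo) for combo in product(prefixes, roots, suffixes)),
    key=heuristic_score,
)
print(best, len(best))  # prints: pseudoversificator 18
```

Unlike a deterministic rule, the heuristic never fails outright: it ranks all candidates and returns the closest fit even when no combination reaches exactly 20 letters.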

Text generators

The system has three generator modules: the first generates consonant strings, the second generates vowel strings, and the last generates ‘pseudowords’. The first two use IPA symbols; the last uses the Latin alphabet. All of them generate text strings based on rules that operate through constraint algorithms. Due to space considerations in this publication, only the operation of the pseudoword generator module will be detailed.

A pseudoword or nonsense word is a unit of speech or text that appears to be a real word in a given language, and its construction follows the phonotactic rules of the language in question, although it has no meaning in the lexicon. In the versificator system, these pseudowords are the result of stochastic combinations of prefixes, roots and suffixes of English words. This combinatorial system for word generation was originally described in the 17th century by the German poet Georg Philipp Harsdörffer, who presented in his book Fünffacher Denckring der teutschen Sprache (1636) a way to generate all existing and potential German words by combining basic syllables. A modern version of this system is the word generator ‘Words without sense’, implemented in P5js by the artist Mario Guzmán. The pseudoword generator module within the versificator takes its inspiration from Harsdörffer’s idea, adapted and implemented in Max MSP; it shares its combinatorial logic with the generators of Harsdörffer and Guzmán, but adds the ability to restrict random generation through a rule-based system using a constraint satisfaction algorithm. Some of the rules that govern the generation of pseudowords concern rhyme patterns, alliteration, the number of vowels or consonants, or the proportion between them. It is possible to generate groups of words that, when combined following these formal rules, can evoke a certain poetic sense.
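The combination of stochastic generation with a deterministic constraint can be sketched as follows. The morpheme pools and the vowel-proportion rule are hypothetical examples, not the rules actually used in the Max patch:

```python
import random
from itertools import product

# Hypothetical morpheme pools; the actual system draws on English and Latin affixes.
PREFIXES = ["be", "con", "dis", "per"]
ROOTS = ["lum", "ver", "sonor", "flect"]
SUFFIXES = ["ia", "osis", "ent", "um"]

VOWELS = set("aeiou")

def vowel_ratio(word: str) -> float:
    """Proportion of vowels in the word."""
    return sum(c in VOWELS for c in word) / len(word)

def satisfies_rules(word: str) -> bool:
    """A deterministic constraint: between 30% and 50% vowels."""
    return 0.3 <= vowel_ratio(word) <= 0.5

# Build the full combinatorial space, keep only rule-satisfying candidates,
# then sample stochastically from the feasible set.
feasible = ["".join(c) for c in product(PREFIXES, ROOTS, SUFFIXES)
            if satisfies_rules("".join(c))]
random.seed(4)
print(random.sample(feasible, 3))
```

The same structure extends to rhyme or alliteration rules by swapping the predicate passed to the filter.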

Examples of pseudowords generated by the versificator system, formed as combinations of prefixes, roots and suffixes of English and Latin words.

Relations between text and music in the generation process

The versificator system proposes the generation of text and music based on the connection between musical parameters and phonetic elements, translated into a common symbolic language through which some of these parameters can be formalized, analyzed, and ultimately mapped onto one another. The common language for these two realms is numbers –a language that can, of course, be understood by computers. This bidirectional numerical relationship between text, phonetics, and musical material provides the composer with a symbolic connection between the generated text and its subsequent embodiment in a musical score. In general, a text is a symbolic written representation of human speech. Human speech is decoded and understood by recognizing acoustic (and sometimes visual) cues present in its signal. The acoustic cues that allow us to identify the phonetic sounds of speech have been extensively studied in acoustic phonetics, and research in this field has formalized them in numerical systems, which enabled their recreation through digital synthesis and led to the birth of the first speech synthesizers and TTS (text-to-speech) systems. A TTS system generally receives some type of text input and translates it into sound information, turning normal text into a synthetic voice. Probably the most widely used synthesis method from the 1980s onwards was a type of synthesizer based on recreating the acoustic model of the voice, developed especially in the work of Dennis Klatt (1980).

A simple text-to-speech synthesis procedure (Lemmetty 1999)

Musical pitch and duration mapping process

The versificator system proposes a system of musical pitches based on translating the spectral structure of a vowel sound, composed of a fundamental frequency (F0) and four resonant bands called ‘formants’ (F1, F2, F3 and F4), into musical pitches, thus generating distinctive harmonic fields according to the original vowel sound. The source for the measurements of each formant structure is a study carried out by Hillenbrand et al. (1995). The following graph exemplifies the translation of this spectral structure into musical pitches:

Representation in musical pitches of the formant structure of twelve of the vowel sounds present in the English language.
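The frequency-to-pitch translation can be sketched in Python. The conversion formula is the standard MIDI mapping (A4 = 440 Hz = note 69); the formant values below are approximate averages for the vowel /i/ in the spirit of Hillenbrand et al. (1995), not the exact figures used in the system:

```python
import math

def freq_to_midi(freq_hz: float) -> float:
    """Convert a frequency to a (fractional) MIDI pitch number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

def round_to_quarter_tone(midi: float) -> float:
    """Quantize to the nearest quarter tone (0.5 MIDI steps)."""
    return round(midi * 2) / 2

# Approximate average values (Hz) for the vowel /i/, male speakers;
# hypothetical illustration, exact figures differ per study and speaker.
vowel_i = {"F0": 138, "F1": 342, "F2": 2322, "F3": 3000}

# The resulting harmonic field for this vowel, in quarter-tone MIDI pitches.
field = {name: round_to_quarter_tone(freq_to_midi(f)) for name, f in vowel_i.items()}
print(field)
```

Each vowel thus yields its own fixed constellation of pitches, which is what makes the harmonic fields distinctive per vowel.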

The duration of each vowel – measured in the same study by Hillenbrand et al. (1995) – gives rise to the system of musical durations within the versificator. Importantly, the conventional timescales of musical syntax differ substantially from those of human speech, where durations are generally much shorter, on the order of 100–300 milliseconds. For this reason, the versificator system proposes a translation method based on scalar time factors: musical durations derived from the phonetic durations can be scaled by a factor ranging from 1x (the original duration) to 10x (ten times the original duration). Metaphorically, these scalings can be understood through the notion of speech normalization described by Miller (1981), by which our perception is capable of decoding a spoken message regardless of the duration of the phonetic sounds and the speaker’s rate of delivery. This scaling system is also fundamental to the structural rules that come into play when determining the temporal organization of the musical material and the formal structure of the score to be rendered.
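The scaling step can be sketched as follows. The millisecond values are hypothetical (only the order of magnitude follows Hillenbrand et al. 1995), and the beat conversion assumes an illustrative tempo of 60 BPM:

```python
# Hypothetical phonetic durations in milliseconds, order of magnitude only.
PHONETIC_MS = {"i": 243, "ae": 278, "u": 237}

def scale_duration(ms: float, factor: float, tempo_bpm: float = 60) -> float:
    """Scale a phonetic duration by `factor` and express it in quarter-note beats."""
    if not 1 <= factor <= 10:
        raise ValueError("scaling factor must lie between 1x and 10x")
    seconds = ms * factor / 1000.0
    return seconds * tempo_bpm / 60.0  # beats at the given tempo

# At 60 BPM, /ae/ scaled by 4x lasts roughly 1.1 beats.
print(scale_duration(PHONETIC_MS["ae"], 4))
```

At 1x the result stays at speech speed (a fraction of a beat), while at 10x the same vowel stretches into a sustained note, which is the range the formal rules below operate on.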

Temporal and textural arrangement of musical material

The flow described above can be considered the origin of the musical material in connection with the generated text. However, the piece goes beyond a mere symbolic translation between text and musical sound, or a succession of musical elements originating in the text. The composition of a musical work implies the determination of temporal structures on a local and global scale. For this, the determining factors for formal decisions are the duration scaling factor of each presentation of material, the number of voices in which each presentation appears, and the type of material and its possibilities of combination in groups of two or three, depending on how many voices of the ensemble they appear in. For example, if a presentation of material of type ‘vowels’ appearing in three voices were to be paired with another material, say ‘words’, the latter should appear in no more than two voices, since 2 + 3 = 5, and a higher number would exceed the number of voices of the ensemble, which has been set to five for this piece.

The versificator system has an independent module that facilitates the formal and textural organization of musical material based on stochastic principles, mainly related to probabilistic distribution, combined with constraint algorithms. In short, the basic formal architecture is derived from distributing a number of presentations of musical material over time (by default, 100). Each presentation is a musical phrase generated by one of the modules, and each can occur in one to four voices. Some of the rules used to determine the formal organization are:

  • Presentations of materials can appear in duets or trios (e.g. cons–vows; cons–words; vows–cons–words).
  • Duets or trios cannot be composed of the same type of material.
  • The sum of the voices of each duet or trio of materials cannot exceed the number of voices of the vocal ensemble (by default, five).
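Assuming the default five-voice ensemble, these three deterministic rules can be sketched as a validity check. The function and material labels are hypothetical illustrations, not objects from the Max patch:

```python
MAX_VOICES = 5  # size of the vocal ensemble (default)

def valid_grouping(materials, voices):
    """Check the three deterministic rules for a duet or trio of presentations."""
    if len(materials) not in (2, 3):            # duets or trios only
        return False
    if len(set(materials)) != len(materials):   # no repeated material types
        return False
    return sum(voices) <= MAX_VOICES            # ensemble-size constraint

print(valid_grouping(("vows", "words"), (3, 2)))  # True:  3 + 2 = 5 voices
print(valid_grouping(("vows", "words"), (3, 3)))  # False: 3 + 3 = 6 > 5
print(valid_grouping(("cons", "cons"), (1, 1)))   # False: same material type
```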

Examples of heuristic rules that determine the organization of phrases based on their scalar time factor:

  • Organize the pairs/trios for the greatest possible contrast between temporal scaling factors.
  • Organize the pairs/trios as close as possible to the midpoint of the scaling range ((1 + 10) / 2 = 5.5).
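These two heuristics can be sketched as scoring functions that rank candidate pairs instead of accepting or rejecting them outright. The candidate list is a hypothetical illustration:

```python
MEAN_FACTOR = (1 + 10) / 2  # midpoint of the 1x-10x scaling range = 5.5

def contrast_score(factors):
    """Heuristic 1: prefer the greatest spread between scaling factors."""
    return max(factors) - min(factors)

def mean_score(factors):
    """Heuristic 2: prefer groups whose average lies close to the range
    midpoint (negate the distance so that higher is better)."""
    avg = sum(factors) / len(factors)
    return -abs(avg - MEAN_FACTOR)

candidates = [(1, 3), (5, 6), (1, 10)]
best_contrast = max(candidates, key=contrast_score)  # (1, 10): spread of 9
best_mean = max(candidates, key=mean_score)          # (5, 6): average 5.5
print(best_contrast, best_mean)
```

Note that the two heuristics can prefer different candidates, which is why they are chained and weighted inside the search rather than applied as hard constraints.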

These rules are chained and determine the constraints of a search process for the PMC engine. It can happen that too many rules work at the same time, or that some deterministic rules contradict one another; in that case, the engine will find no solution, or will iterate over all possible options, which might take a very long time. It is then possible to bypass some rules, or to change a rule from deterministic to heuristic. At the end of the formal determination process, the system delivers a list containing the temporal order, grouping and textural distribution of each of the 100 material presentations. This list assumes the role of a formal blueprint, and serves as a guide for generating the phrases in the modules, joining them and concatenating them.

Result and discussion

Although the music notation interface offered by the Bach library is powerful and versatile, it is still necessary to export the score produced by the versificator system in XML format and adjust certain details in professional music notation software such as Sibelius. Some problems related to the notation of microtonal accidentals and dynamic hairpins cannot be solved directly in Bach without resorting to complex scripting. The notation system of the versificator can still be considerably refined and improved, although this is tied to the future development of the Bach library and its addition of new functionalities.

The outcome of the versificator system is a score with an interesting architecture in the organization of pitches and durations. The generation of text based on phonetic articulation rules gives it a certain uniformity of sonority, and the pseudowords lend it a poetic-imaginary quality that has a certain charm. Within the versificator system, other musical parameters, such as dynamic indications and vocal articulation, are also generated by rules, but these are still at a primitive stage and open to future experimentation and refinement.

View of the versificator system user interface
Fragment of the score ‘render 3’, created by means of the versificator system, for an ensemble of five mixed voices

Conclusion and future work

My personal view is that the final score, as it emerges from the versificator system, is not yet finished, and by itself may not merit concert performance beyond informative curiosity about the musical result. I consider that having an algorithmic tool for structuring musical material is not in itself enough to generate an interesting work; rather, it must be the starting point for deeper compositional elaboration, oriented towards the sonic exploration of the material and the addition of new layers and levels of abstraction, or even of extra-musical ideas, for example layers involving theatricality or more visual-gestural aspects of performance. In other words, for it to be a complete piece, it is still necessary to generate and refine new layers of compositional construction beyond pitches, durations and dynamic parameters.

The rule-based composition system, and particularly the constraint algorithms, have proven to be a powerful tool for generating musically interesting yet audible structures. Computer-assisted composition tools based on heuristic rules, which provide greater flexibility for finding solutions in a vast space of possibilities, can be understood as an expansion of the logic of human artistic decision-making, one that fundamentally provides greater processing power for working with more advanced combinatorics and more refined searches to solve a specific problem. In a recent article, Rutz has elegantly described the processes of cognitive assembly (Hayles 2016) between the human mind and computers in the creative process:

Humans and machines come together as cognitive assemblages in which the exchange processes and dynamics themselves become the central element, and whatever emerges from this connectedness is difficult to separate and attribute to either agent.

(Rutz 2021b, 22)

If we consider that a key element in the study of the psychology of creativity is that creative tasks involve searching for a result that has not been considered before (Bown 2021), then a composition system based on rules, and particularly on constraint algorithms, can be a powerful tool for defining strategies and heuristics that contribute to an effective and innovative search for novel compositional possibilities, with the potential of generating new layers of connections between ideas, form and material.

When our technologies actively, automatically, and continually tailor themselves to us just as we do to them – then the line between tool and user becomes flimsy indeed. Such technologies will be less like tools and more like part of the mental apparatus of the person. They will remain tools in only the thin and ultimately paradoxical sense in which my own unconsciously operating neural structures (my hippocampus, my posterior parietal cortex) are tools.

(Clark 2003, 18)

References

Agostini, A., & Ghisi, D. 2012. “Bach: An environment for computer-aided composition in Max.” In ICMC 2012: Non-Cochlear Sound – Proceedings of the International Computer Music Conference 2012, 373–378.

Bennett, J. 2010. Vibrant matter: A political ecology of things. Duke University Press.

Bown, O. 2021. “Sociocultural and Design Perspectives on AI-Based Music Production: Why Do We Make Music and What Changes if AI Makes It for Us?” In Handbook of Artificial Intelligence for Music: Foundations, Advanced Approaches, and Developments for Creativity. Edited by E. R. Miranda, 1–20. Springer International Publishing.

Clark, A. 2003. Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence. Oxford: Oxford University Press.

Essl, K. 2007. Algorithmic composition. In The Cambridge Companion to Electronic Music, 107–125. Cambridge University Press. https://doi.org/10.1017/CCOL9780521868617.008.

Hayles, K. N. 2016. “Cognitive Assemblages: Technical Agency and Human Interactions.” Critical Inquiry 43, no. 1: 32–55. https://doi.org/10.1086/688293.

Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. 1995. “Acoustic characteristics of American English vowels.” Journal of the Acoustical Society of America 97, no. 5 (May 1995): 3099–3111.

Hiller, L. A., & Isaacson, L. M. 1958. “Musical Composition with a High-Speed Digital Computer.” Journal of the Audio Engineering Society 6, no. 3 (July 1958): 154–160.

Hiller, L. 1959. “Computer music.” Scientific American 201, no 6 (December 1959): 109–121.

Klatt, D. H. 1980. “Software for a cascade/parallel formant synthesizer.” The Journal of the Acoustical Society of America 67, no. 3 (1980): 971–995.

Laurson, M. 1996. PATCHWORK: A visual programming language and some musical applications (Vol. 6). Sibelius Academy.

Lemmetty, S. 1999. Review of speech synthesis technology. Master of Science thesis, Helsinki University of Technology.

Lewitt, S. 1999. “Paragraphs on Conceptual Art.” In Conceptual Art: A Critical Anthology. Edited by Alexander Alberro and Blake Stimson, 12–16. Cambridge, MA.: MIT Press.

Miller, J. L. 1981. “Some effects of speaking rate on phonetic perception.” Phonetica 38, no. 1–3 (1981): 159–180.

Nake, F. 2010. “Paragraphs on Computer Art, Past and Present.” In CAT 2010 London conference Proceedings, 55–63.

Orwell, G. 1949. 1984. Planet eBook. www.planetebook.com/1984/.

Rutz, H. H. 2021b. “Human–Machine Simultaneity in the Compositional Process.” In Handbook of Artificial Intelligence for Music. Edited by E. R. Miranda, 21–51. Springer, Cham. doi.org/10.1007/978-3-030-72116-9_2.

Sandred, Ö. 2017. The musical fundamentals of computer assisted composition. Audiospective Media.

Sandred, Ö. 2010. “PWMC: A Constraint-Solving System for Generating Music Scores.” Computer Music Journal 34, no. 2: 8–24.

Ventura, D. 2016. Mere generation: Essential barometer or dated concept. In Proceedings of the Seventh International Conference on Computational Creativity, 17–24. Sony CSL, Paris.

Xenakis, I. 1955. “La crise de la musique sérielle.” Gravesaner Blätter 1, no. 1 (1955), 2–4.

Xenakis, I. 1992 [1962]. Formalized music: thought and mathematics in composition. Revised edition. Pendragon Press.

Contributor

Juan Vassallo

Juan Vassallo (BMus, MA) is an Argentinian composer, pianist and media artist. He is currently pursuing a PhD in Artistic Research at the University of Bergen (Norway). His music has been premiered internationally and has won awards in competitions in France, China and Argentina. He is currently a member of Azul 514, an experimental musical project based on the interaction between digital sound synthesis, instrumental improvisation and real-time sound processing.