Master Thesis

Generative composition tools for video game soundtracks

Development and composition in Unity and Max for Live

Supervision: Prof. Alexander Peterhänsel & Julian Netzer, M.A.

Objectives of the work

  • Development of a native composition system for the creation of loop-based, adaptive soundtracks in the Unity game engine
  • Implementation of a generative composition system with Max for Live
  • Composition of adaptive and generative soundtracks using the implemented systems for a specially developed video game prototype

Video game prototype

In preparation for the Master’s thesis, a video game prototype was developed. It serves as a test environment for the implemented systems. The prototype is modeled on adventure games and is set in a temple. All models for the game, including a modular kit for building the rooms, were modeled and textured in Cinema 4D.

A level was then designed in Unity using these models. Various interactions and events were integrated so that the implemented system can be tested extensively in different situations. Some events are optional or have no fixed order, which makes it possible to evaluate the adaptivity and flexibility of the system.

Some events from the game:

  • Entering and leaving different areas
  • Sudden collapse of the floor
  • Collecting items
  • Opening the gate to the destination after solving a puzzle

Adaptive composition system for Unity

What is an adaptive composition system and what is it useful for?

An adaptive composition system lets the soundtrack adapt to the gameplay. Games are generally non-linear: each playthrough differs from the next, whether in timing or in the order of events. To create the best possible immersion, the soundtrack is therefore also designed to be non-linear, unlike “conventional” music. This allows it to react directly to the gameplay and support it musically. Furthermore, by exploiting relationships within the game, upcoming events can be announced musically before they occur, which makes it possible to build up suspense.

The video shows the practical application of the developed system with a specially composed soundtrack.

Why is a native adaptive composition system being developed?

Widely used solutions already exist for game engines such as Unity. The best known are the middleware tools Wwise and FMOD, with which the soundtracks of many AAA titles have been realized. Nevertheless, a custom native system was developed as part of this thesis, for two reasons.

The first reason is compatibility. The middleware does not currently support every platform that Unity supports. In VR in particular, which is currently gaining popularity, some systems are not supported. Because the custom composition system is based exclusively on native Unity engine objects, it can be used on all platforms supported by Unity.

The second reason is the steep learning curve of the middleware. Especially for small indie developers, getting started with audio middleware can be a hurdle if they have not worked with such software before. Each middleware has its own user interface, which is optimized for its purpose but is structured very differently from Unity. Depending on the middleware, communication with Unity is also less obvious than with a native solution built on objects already familiar from the game engine.

The following requirements were placed on the system:

  • Use of tracks with different characteristics such as key, time signature or tempo
  • Smooth transitions between these tracks
  • A varied soundtrack over a longer period of time, even when no events are taking place (e.g. staying in one place for a long time while solving a puzzle)
  • Fast reaction time, taking rhythmic structures into account (e.g. audio tracks start to the beat of the music)
  • Simple structure and easy use of the composition system

Basic functionality of the composition system

The system is assembled from GameObjects via drag and drop in the hierarchy. To simplify this, prefabs are available for each object type; they already contain all necessary components, such as scripts. A system consists of tracks, each of which can have its own key, time signature or tempo. A track consists of one or more loops, which are children of the track in the hierarchy. A track can also have breaks, i.e. interludes. A break interrupts the track, plays an audio clip and then resumes the interrupted track at the beginning of the bar. Transitions connect tracks with differing properties. Each composition system requires a metronome object, which synchronizes the individual loops so that they do not drift out of sync over a longer period of time. It also defines the musical intervals.
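As a rough illustration of how a metronome can define musical intervals, the following Python sketch converts a tempo and time signature into a time grid and finds the next grid point for a quantized start. It is not the thesis implementation: the function names, the seconds-based clock, the assumption that one beat is a quarter note, and the small rounding tolerance are all choices made for this example.

```python
import math


def beat_length(bpm: float) -> float:
    """Length of one beat in seconds (assumes the beat is a quarter note)."""
    return 60.0 / bpm


def bar_length(bpm: float, beats_per_bar: int) -> float:
    """Length of one bar in seconds."""
    return beats_per_bar * beat_length(bpm)


def next_boundary(now: float, grid: float, origin: float = 0.0) -> float:
    """Earliest grid point at or after `now`.

    `grid` is the spacing in seconds (a bar, a half note, an eighth note, ...)
    and `origin` is the time at which the metronome started.
    """
    # The tolerance keeps a time that already sits on the grid from being
    # pushed to the following grid point by floating-point noise.
    steps = math.ceil((now - origin) / grid - 1e-9)
    return origin + max(steps, 0) * grid


bar = bar_length(bpm=120, beats_per_bar=4)    # 2.0 seconds per bar
eighth = beat_length(120) / 2                 # 0.25 seconds per eighth note
start_on_bar = next_boundary(3.1, bar)        # next bar line after 3.1 s
start_on_eighth = next_boundary(3.1, eighth)  # next eighth note after 3.1 s
```

A finer grid (for example eighth notes) gives a faster reaction to a game event, while a bar grid keeps entries aligned with the bar line.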

Each object can be conveniently configured in its inspector (see below). To control the composition system, each object provides functions that can be called from scripts. This makes it possible to control the system using triggers or events in the game.
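The idea of driving the music from game events can be sketched as follows. This is a minimal Python illustration, not the system's actual scripting interface: `CompositionSystem`, `EventBus`, the method names and the event and clip names are all hypothetical, and the stand-in class only records which calls were made.

```python
from collections import defaultdict


class CompositionSystem:
    """Hypothetical stand-in for the composition objects; it only records calls."""

    def __init__(self):
        self.current_track = None
        self.calls = []

    def play_track(self, name):
        self.current_track = name
        self.calls.append(("track", name))

    def play_break(self, name):
        self.calls.append(("break", name))


class EventBus:
    """Maps game event names to the handlers that should run when they fire."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event):
        for handler in self._handlers[event]:
            handler()


music = CompositionSystem()
events = EventBus()

# Wire game events to music functions (event and clip names are made up).
events.on("enter_puzzle_room", lambda: music.play_track("puzzle"))
events.on("floor_collapse", lambda: music.play_break("collapse_hit"))

events.emit("enter_puzzle_room")
events.emit("floor_collapse")
```

Because the game only emits events and the handlers decide what the music does, optional events or events without a fixed order need no special handling.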

Objects and functions

  • Metronome
    • Synchronizes loops
    • Serves as the measure for musical intervals
  • Track
    • Groups the loops that form a track
  • Loop
    • Can hold any number of different clips, so that variety can be created by random selection or a build-up within the loop can be realized
    • Sets the interval, in bars, at which a loop may start
    • Random insertion and removal of loops for more dynamics
    • Attack and release times for seamless fading in and out of tracks (even in the middle of a loop)
  • Break
    • Inserted within a track
    • Is independent of the beat, so that both short clips and longer musical sequences can be used
    • Can start at any interval (at the next half note, quarter note, eighth note, …) to enable quick reactions to sudden events
    • For a smoother transition, a one-shot sound can be played after the break (e.g. a crash cymbal)
  • Transition
    • Transition between two tracks with different properties
    • Shares many properties of the break object: temporal independence, insertion at any interval, one-shot after the transition
    • One clip each for the transition from the first to the second track and vice versa
Structure of the objects in the hierarchy and the UIs of the individual objects
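Three of the loop properties listed above (random clip selection, a start interval in bars, and attack and release times) can be sketched in a few lines of Python. This is an illustration under assumptions, not the thesis code: the class and function names are hypothetical, clips are represented by plain strings, and the fade is assumed to be linear.

```python
import random


class Loop:
    """Hypothetical loop object holding several interchangeable clips."""

    def __init__(self, clips, start_every_bars=1, rng=None):
        self.clips = list(clips)
        self.start_every_bars = start_every_bars
        self.rng = rng or random.Random()

    def can_start(self, bar_index: int) -> bool:
        """A loop may only enter on bars that are multiples of its interval."""
        return bar_index % self.start_every_bars == 0

    def pick_clip(self):
        """Choose one of the clips at random to create variety."""
        return self.rng.choice(self.clips)


def fade_gain(t, attack, release, stop_time=None):
    """Linear gain (0..1) at `t` seconds after the loop was switched on.

    `stop_time` is the moment the loop was switched off, or None while it
    is still active. The linear shape is an assumption of this sketch.
    """
    gain_in = min(t / attack, 1.0) if attack > 0 else 1.0
    if stop_time is None or t < stop_time:
        return gain_in
    if release <= 0:
        return 0.0
    gain_out = max(1.0 - (t - stop_time) / release, 0.0)
    return gain_in * gain_out


loop = Loop(["pad_a", "pad_b", "pad_c"], start_every_bars=4, rng=random.Random(0))
clip = loop.pick_clip()
```

Because the gain depends only on elapsed time, a loop can be faded in or out at any moment, including in the middle of a clip.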

Generative composition system with Max for Live

What is a generative composition system and what benefits does it have for video games?

In a generative composition system, the notes to be played are generated in real time. This is done using algorithms that form a set of rules for the notes to be generated. They can be used to describe musical structures such as rhythm, tempo, range, pitch or the progression of the melody. The combination of random values with the set of rules results in unique, endless compositions.
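How a set of rules combined with random values produces a melody can be shown with a small Python sketch. It is not the patch developed in the thesis: the rules used here (stay in C major, stay within a pitch range, move at most a few scale steps at a time) and all names are assumptions chosen for illustration.

```python
import random

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the C major scale


def allowed_pitches(low, high, scale=C_MAJOR):
    """Rule 1 and 2: MIDI pitches inside the range that belong to the scale."""
    return [p for p in range(low, high + 1) if p % 12 in scale]


def generate_melody(length, low=60, high=72, max_step=2, seed=None):
    """Random walk over the allowed pitches.

    Rule 3: consecutive notes are at most `max_step` scale steps apart.
    """
    rng = random.Random(seed)
    pitches = allowed_pitches(low, high)
    idx = rng.randrange(len(pitches))
    melody = []
    for _ in range(length):
        melody.append(pitches[idx])
        step = rng.randint(-max_step, max_step)
        # Clamp so the walk never leaves the allowed range.
        idx = min(max(idx + step, 0), len(pitches) - 1)
    return melody


melody = generate_melody(16, seed=1)
```

Changing the seed gives a different melody that still obeys the same rules, and changing a rule parameter such as `max_step` or the range reshapes the output without any pre-composed material.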

The use of generative algorithms for video game soundtracks offers great potential. The composition effort is lower, as there is no need to compose a large number of loops to cover every game situation and to keep the soundtrack varied over a long period of time. By changing parameters, the generative system can immediately adapt to the gameplay without any disruptive jumps. Generative soundtracks are rarely used in video games today. The main reason for this is the high processor load caused by the necessary real-time rendering of the sounds.

The tracks in the playlist were generated exclusively with the generative patch and MIDI effects such as Scale or Chord in Ableton Live.

Functions of the developed patch

  • Interval: determines the interval at which notes are generated
  • Mode: 1 – every note is generated, 2 – only every second note is generated (offbeat)
  • Range: describes the pitch range of the voice
  • Offset: determines the pitch position (register) of the voice
  • Velocity: velocity of the notes
  • Probability: probability that the generated note will be played
  • Duration: note length (interval * factor)
  • A/B Switch: activates the function that lets the generator start or stop after a certain number of beats
    • Number field: determines after how many beats the state can change
    • Probability: the probability that the state will change
  • SliderValue: can be controlled to activate and deactivate the generator depending on its value
    • Threshold: value that the SliderValue must exceed for the generator to be activated
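To make the interplay of these parameters more concrete, the following Python sketch models a generator with the same parameter names. It is an interpretation, not a port of the Max for Live patch: the data layout, the uniform random pitch choice, the reading of mode 2 as "odd-numbered steps only", and the reading of Offset as the lowest pitch are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class PatchSettings:
    interval: float = 0.5         # beats between generated notes
    mode: int = 1                 # 1: every step, 2: every second step (offbeat)
    note_range: int = 12          # width of the pitch range in semitones
    offset: int = 60              # lowest MIDI pitch of the voice (assumed)
    velocity: int = 100           # MIDI velocity of every note
    probability: float = 1.0      # chance that a generated note is played
    duration_factor: float = 1.0  # note length = interval * factor


def is_active(slider_value: float, threshold: float) -> bool:
    """The generator runs only while the slider value exceeds the threshold."""
    return slider_value > threshold


def generate_notes(settings: PatchSettings, steps: int, seed=None):
    """Return (start_beat, pitch, velocity, duration_beats) tuples."""
    rng = random.Random(seed)
    notes = []
    for step in range(steps):
        if settings.mode == 2 and step % 2 == 0:
            continue  # assumed offbeat behaviour: skip even-numbered steps
        pitch = settings.offset + rng.randint(0, settings.note_range)
        if rng.random() >= settings.probability:
            continue  # note was generated but is not played
        start = step * settings.interval
        duration = settings.interval * settings.duration_factor
        notes.append((start, pitch, settings.velocity, duration))
    return notes


offbeat = generate_notes(PatchSettings(mode=2, duration_factor=0.5), steps=8, seed=3)
```

Lowering `probability` thins the texture out, while the slider and threshold pair can be used to switch a whole voice on or off from a single gameplay value.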
Tracks in the playlist (title, duration):

  1. Piano 1:22
  2. Ambient Music 1:50
  3. Electronic Pulse 1:04
  4. Scary Ambient 1:04
  5. Orchestral 1:06
  6. Electronic Music 0:59