Annealing Strategies is a piece that is generated by a content-aware program. It uses an audio descriptor to measure the perceptual loudness of a Fourses synthesiser alongside simulated annealing (SA), an optimisation technique that over time is used to discover a combination of complex coupled parameters for the synthesiser will create the quietest possible output. A number of iterations of this work were created by the system and through curation of those outputs and fine tuning of the SA algorithm I arrived at a final rendition of the work.
Motivations and Influences
The initial inspiration for this piece emerged when I came across a digital implementation of a Fourses synthesiser on the cycling 74 forums. I had never come across this particular synthesiser before, and there were two aspects that made me inquisitive and wanting to use this synth for something creative without any specific ideas for what that project would be. Firstly, experimenting with the collectively designed Max patch demonstrated the range of sounds it could produce, including noisy, pitched, harmonic and inharmonic timbres. In particular the quietest sounds interested me as they were novel and hard to discover amongst numerous loud sounds. Secondly, listening to the synthesiser and interacting with its 24 parameters inspired a technological question about how a computer might be able to find parameter combinations that would result in specific perceptual outputs. From this idea I imagined a piece where the synthesiser is seeded with parameters that produce a wild, untamed and loud output, and the computer traverses combinations of parameters to gradually find a quiet state. This searching process would shape the structure of the piece and the journey from the starting point to the “solution” would be able to create a simple and effective form with a strong perceptual basis.
As I researched topics related to these interests, I came across Stefano Fasciani, who in his work has developed new perceptually grounded interfaces for synthesisers. In “TSAM: A Tool for Analyzing, Modelling And Mapping The Timbre Of Sound Synthesizers” Fasciani (2016a, p. 1) describes a program that uses audio descriptors and neural networks to “enable the study of how changes to a synthesis parameter affect timbral descriptors”. This creates the possibility for real-time interaction with a synthesiser by navigating through a visual space that distributes timbral possibilities spatially rather than interacting with the individual parameters. In VIDEO 4.2.1, Fasciani demonstrates how this works by using TSAM to analyse many combinations of parameters of a synthesiser and the sound it produces. He then changes the parameters by exploring a two-dimensional space.
Coming into contact with Fasciani’s work opened an avenue in my conceptual approach to this piece. I realised that it was possible to give computer the responsibility of discerning the relationship between the synthesis parameters and their effect on the perceptual quality of the output. I was encouraged by this, despite not, at the time, having the programming skills to work with the technologies that Fasciani had been, such as neural networks or machine learning. Given these limitations, my motivations were to find or design my own methods to develop this in line with my capabilities at the time.
Changing the Balance of Computer and Human Agency
Alongside those influences spawned by engaging with the Fourses synthesiser, part of my motivation in Annealing Strategies was to re-approach some of the central aims in Stitch/Strata, particularly in affording the computer more responsibility and agency in the compositional process. Composing Stitch/Strata was frustrating and long, and considered a number of tangential approaches before arriving at the final result. A significant factor in this frustration was my inability to devise an algorithm which would allow the computer to make sophisticated high-level formal decisions that were compatible with my aesthetic preferences. What I found was that modelling my own high-level formal thinking led to an infinite regress of attempting to encode the complexity of choices that I make when composing intuitively and manually. My solution to this issue was to make the computer responsible for generating low-level and mid-level sections of music and for me to arrange, edit and apply further processing to these. Ultimately, the compromises that I made were a result of not being able to encode a desirable model for decision-making and selection in the computer. Furthermore, my inability to express what I wanted the computer to do stemmed from the fact that I did not have any prior constraints or measurable outcomes on what I wanted to achieve creatively. I was trying to experiment with the sample-based phonetic materials and encode, design and build a model for the computer to work with these at the same time.
One of the main goals of Annealing Strategies was to create more refined and constrained compositional possibilities with a clearer set of intentions creatively. Taking this simple idea of a computer program searching for a parameter combination that would result in a specific measurable perceptual outcome gave me this clarity and centred the compositional process technologically in trying to develop a simple compositional approach which would result in a complex computer interpretation and result. This is in contrast to the approach of Stitch/Strata, in which I was trying to take a complex compositional language and encode it into a simplistic computational model.
Synthesis and Annealing
Two technologies, the Fourses synthesiser and SA, are central to Annealing Strategies. This section explains how these work in detail and also explain their connection to each other in relation to my aesthetic aims and goals.
Fourses and an Interface for Control
Fourses is a synthesiser created by Ciat-Lonbarde otherwise known as Peter Blasser. It consists of four Fourse oscillators that generates triangle-shaped wave between an upper and lower boundary. The output for each Fourse is connected to the subsequent Fourse as a parametric control for the boundaries while also being siphoned off as an audio output. For example, The output of the first Fourse determines the upper limit of Fourse 2. Fourse 2 would then have an effect on Fourse 3 and so on and so forth. As such, any modifications made within the system propagate throughout it in unpredictable ways. This makes it difficult to find specific sounds with Fourses and instead encourages exploration within the complexity of this behaviour. A number of documents help to explain the underlying principle of Fourses including the Fourses Manual and the FOURSES PAPERZ. Despite the enigmatic nature FOURSES PAPERZ, images such as IMAGE 4.2.1 help in deciphering the underlying behaviour of this synth and how its interconnected and chaotic nature emerges.
The Max patch version of Fourses that was used for Annealing Strategies can be downloaded by clicking this link. In addition to this you may also want to experiment with a web version put together by Oli Larkin. This can be found at https://olilarkin.github.io/fourses/.
One-to-Many Mapping in Fourses
There are 24 interdependent parameters that control the Fourses synth. Because they interact with both the synthesis algorithm and each other, they are unpredictable, and it is difficult to “learn” how they affect the output. I wanted to create a method for exploring these sonic states of the synth, without having to interact with a single parameter at a time. Toward this end, I devised a one-to-many mapping strategy that creates a field of possible states by evaluating a procedural basis function graph. This function produces a two-dimensional map of values according to a number of different basis functions, each with their own characteristics. These are then used to control each parameter by extracting a single dimension from this graph. VIDEO 4.2.2 is a screen capture demonstrating how this single input interface creates a one-to-many mapping. As the “origin” is changed, the procedural basis function graph is evaluated at different points, in effect, shifting the values of the graph spatially in the horizontal dimension. In theory the procedural function can be evaluated with any origin value, so it is clipped between a minimum and maximum value of 0.0 and 1.0 in order to constrain the possible parameters combinations that can be generated. In addition to this, the basis function can be altered, affecting the relationship between parameters as this offset is changed.
The benefit of this approach is that the single input allowed me to broadly explore parameter combinations in exchange for specific control. Furthermore, the configuration of the procedural basis function can create a variety of novel combinations of parameters quickly that adhere to a fixed relationship and structure. This was inspired by the work of Bowers et al. (2016) and their various strategies that for mapping a single knob to several synthesis parameters.
The single input interface was devised as a way of traversing between sounds that could be made by the Fourses synthesiser in a semi-structured and computationally led manner. However, I still had not addressed my original question — how the computer could discover the relationship between parameters and the sound that Fourses produced — particularly in relation to finding delicate and quiet sounds. This led me on another path of research into combinatorics and optimisation problems. Serendipitously I came across simulated annealing (SA). SA is named in reference to annealing in metallurgy, where material is heated and gradually cooled in order to find an arrangement of molecules that is optimally strong. At the start of this process molecules can freely rearrange in a quasi-random fashion. As the material is slowly cooled, an optimal state crystallises and the tolerance for “weak” movements and thus bonds to be made are less likely to occur.
SA works by considering a set of discrete data points to have both a state and a successor or neighbour. The state is the value, or data associated to a data point and the neighbours are other states surrounding that one. The neighbours can be determined in a number of ways, such as by their organisation in a graph or in a spatial representation in two or three dimensions. SA begins its process by selecting a random state. An initial temperature parameter is set at this point. The algorithm then begins progressing through a number of epochs, in which a new nearby state is selected and the data associated to that state is compared with the previous state. This is a single iteration. If the movement to this new state results in a positive difference between the two data points, then that move is considered “good” and is retained. If the move produces a negative difference the move is considered “bad”. Instead of simply rejecting the bad move, there is a probability that this move is instead retained in the process. This probability is determined by the Boltzmann Distribution. Each iteration reduces the overall temperature of the system depending on the cooling rate, meaning that as the algorithm progresses bad moves are less likely to be retained. The rationale is that while other optimisation algorithms such as hill climbing algorithms can get stuck in local minima and maxima, SA can temporarily escape them to search for global optimal states when the temperature is high. As the temperature lowers, fewer bad states are kept, and the algorithm will have ideally explored enough to discover the global maximum. In IMAGE 4.2.2, a flowchart describes the steps of the SA process.
SA can be applied to many problems across different domains including finding the minimum or maximum value in a set, such as in this weather data (see IMAGE 4.2.3) or solving the “travelling salesman problem” in which an optimally short journey is designed between a set of spatially dispersed nodes (see IMAGE 4.2.4). An interactive example by “Nayuki” uses SA to reconstruct a digitally “shredded” image. This particular example is apt for demonstrating how the parameters of the algorithm can affect the success and output of SA.
Combining Simulated Annealing With the Single Input Interface
In Annealing Strategies, SA is used in conjunction with the single input interface to create a system of control and automatic computational exploration of parameter combinations. Using an audio descriptor for the perceptual loudness of the output of the Fourses, SA discovers how the single input that traverses through a procedural basis graph can produce perceptually quiet outputs. At its core, the SA algorithm is optimising the loudness value by changing the input value, with the black box of entangled and coupled parameters sitting between those two as a complex and initially unknown function.
A step-by-step procedure is followed to realise this.
- Generate an origin value and create new parameters
- Update the Fourses synthesis with the new parameters
- Measure the loudness of the synthesis in decibels
- SA evaluates the next move based on loudness measurement
- Generate a new parameter set based on the outcome of the SA algorithm
- Go to step 2.
This process has several hyperparameters: learning rate, cooling rate and the maximum temperature. Changing these can alter the outcome of using SA in a number of ways, perhaps most importantly by affecting the success of the algorithm in terms of discovering a quiet state. If the system is cooled too quickly, or the temperature does not start sufficiently high, then the algorithm is less likely to discover the global optimum. The learning rate is my own addition to the algorithm which determines the maximum distance from the current state the new state can be. This is scaled to the current temperature so that the algorithm can select states which are not necessarily direct neighbours at high temperatures and only nearby states when the temperature is low.
An iteration of Annealing Strategies is created by running this algorithm and recording the audio output produced while the SA solves for an optimal solution. As a result, the technical implementation contributes significantly to the shape of the work and how I interacted with the system to compose.
Selecting an Output from the Simulated Annealing System
Annealing Strategies is a single output from the system which I selected after auditioning. The selection process was judged by how well it suited my conceptual goals I made prior to the compositional process as well as how satisfying it was in relation to my aesthetic preferences. Hundreds of iterations were made from this back and forth, and were sorted into “bad” and “good” iterations. Some of these bad iterations can be found in AUDIO 4.2.2, AUDIO 4.2.3, and AUDIO 4.2.4 alongside explanations of why they were classified in this way. The iterations are named by the computer according to the random seed which is used, the type of procedural noise basis and then a random identifier.
In iteration 592_noise.simplex_529155534 there is little variation from the initial state which is set up at the start. In this case, the algorithm likely failed to discover any quiet states in its search, due to a cooling rate that was too fast, or because the initial temperature was not high enough.
The sound which dominates the first two minutes of iteration 453_noise.simplex_529163614 exhibits morphological and textural properties that I liked. This kind of sound was what I initially intended for the SA to be able to discover as it ran. However, this initial sound is never altered significantly until around 2:12, at which point a low pitched and wandering bass sound is introduced. The changes at this point seem abrupt and it contributes to a behaviour that is more like a random search than a controlled and increasingly focused progression towards quieter sounds.
Iteration 738_noise.simplex_529152854 possesses a pitched “whine” from the start which is the most pervasive musical element throughout. There is variation to this sound and for some periods of time after this, shorter iterative sounds emerge. I found this particular sound world aesthetically satisfying. However, at 3:41, a new pitched sound arises which is relatively unaltered for the remainder of the iteration. While the first section was a result that I liked, the overall structure of this iteration was unsatisfactory.
A number of good iterations were made, and those that were not chosen as the single “best” output will be discussed briefly in [4.2.4 Reflecting On Issues Of Control and Intervention]. The output that was selected to be Annealing Strategies was iteration 738_noise.simplex_529135520 which can be heard in AUDIO 4.2.5.
- List of referenced time codes
This iteration possessed a sophisticated macro-structure that satisfied my original aims and goals. Compared to any of the other iterations, 738_noise.simplex_529135023 exhibited a natural and organic taper from initially chaotic and bombastic sounds towards a narrow and controlled final section focused toward intricate and quiet sounds. Many other iterations would seem to be stuck and focused on exploring a small range of parameters, while this iteration does not have that aesthetic problem and the overall linear form is ornamented by complex and novel diversions. In particular, this iteration exhibited a set of behaviours in which pitched and noise-based material are put in opposition with each other. At the higher formal level, the overall loudness is decreasing, while different sections emerge in which these two material groups are portrayed antagonistically and exhibit their own development.
For example, the opening section from 0:00 to approximately 1:10 presents a dense mixture of interwoven noise and pitched elements. At 1:11, the opening section is concluded, demarcated by a new longer section of primarily noise-based material. From this point onwards there are increasingly longer spans of time that alternate between noise and pitched material with less interleaving. Within each of these longer sections, the prominent sound type is developed further. The period between 1:30 and 1:50 is an example of this.
At 2:39 there is a longer passage that changes the way that pitched and noise-based sounds are juxtaposed. From this point onwards their presentation is purer and the interspersing of those two sound types is largely gone. Instead, the behaviour is more homogenous. For example, from 2:39 to 2:43 noise-based material is foregrounded without any semblance of the previous pitch material present. After this at 2:43, the pitch material rapidly returns and the noise dissipates till 2:47. A similar alternation of homogenous sound type presentation occurs again from the period of 2:47 to 2:57. As this behaviour unfolds in the macro-structure of this iteration, there are noticeable evolutions to the pitch and noise-based material sound types. The noise becomes more controlled and quieter and the presence of transients disappears. This can be heard when comparing noise-based sounds from 1:13 to 1:17 to that from the previously mentioned 2:39 to 2:43 section. Pitched material becomes less striated and erratic and progresses towards quasi-melodic chromatic phrases. Again this can be heard when comparing from 0:46 to 0:53 and from 2:48 to 2:55.
In conjunction with different classes of sounds developing under the control of the SA, unintended and emergent “sectional borrowing” was produced in this iteration. The opening phrase for example is echoed at 0:26, with the contour and sound types between these two moments being very similar. Another example of this occurs prominently between two moments at 2:38 and 2:46. In both of these sections the gesture is extremely similar, despite this kind of phrase repetition not being explicitly coded into SA. A final example of this emergent “borrowing” behaviour occurs between the two sections at 0:20 and 0:55 that contain a short passage of low pitch material.
Other iterations rarely possessed such sophisticated structural properties and this was a significant factor in me selecting 738_noise.simplex_529135023. These features were not manually composed and emerged from the interaction between the SA algorithm and the Fourses synth.
Reflecting on Issues of Control and Intervention
Annealing Strategies was an artistically successful piece for me and fostered a compositional workflow where a significant amount of agency was given to the computer without having to sacrifice sophistication in the music. I endeavoured to achieve this in Stitch/Strata but was often unsuccessful in my attempts to have the computer be responsible for abstract and complex musical aspects such as high-level form. The compositional blockages of that project resulted in me utilising a workflow based on manually arranging small computer generated phrases and gestures of descriptor-driven concatenative synthesis outputs. This required me to alternate between generating those phrases and gestures, auditioning them and returning to the generation stage in order to make sense of the material and undertake further decisions about how it would be used in composition. Taking control in this way involved, to a large degree, relying on my intuition (such as for ordering of samples at the macro-level), rather than accepting what the computer had generated. Annealing Strategies was entirely different to this, with both the high-level and low-level formal decision making emerging from the control of SA, and much of the detail in the surface of the music being resolved by the system automatically.
Giving away control and agency fundamentally altered my interaction with sound materials and the ways that I was able to influence the compositional result. Without a mechanism for interacting with the system while it ran, my main influence on the computer was to alternate between listening to an output of the system and to modify the parameters or the system itself. Examples of this method of interaction and control included changing the hyperparameters of the SA process: cooling rate, starting temperature, learning rate as well as the random seed and type for the procedural noise graph. Instead of thinking about how I could achieve specific sonic moments, I was focused on creating a set of conditions in which a particular overarching musical behaviour would emerge. The system would then be responsible for creating detail and sophistication within the constraints of that process. This compositional workflow relates directly back to the notion of “Lose Control Gain Influence”, discussed in [3.5.2 Situating My Practice]. By affording the computer the role of creating structure, I was able to achieve coherence at many hierarchical levels of form without having to design specific musical moments or rely on modelling my intuitive and manual decision making. This took away the possibility for honing a specific musical section or idea that I could imagine or plan though, but allowed the computer to produce surprising and unanticipated outcomes that challenged my compositional tendencies.
Iterating on Approaches from Annealing Strategies
While there was a high degree of success in my approach, it was not a generalisable workflow that I felt I could apply to different materials or conceptual goals in the future. The combination of the Fourses synth, the SA algorithm and the single input interface came together almost serendipitously, and I was fortunate that these technologies were well-suited to achieving the aesthetic and conceptual aims of the piece. In other words, my compositional problem was relatively straightforward to express as a computational problem. This meant that I was able to more easily embrace the characteristics of the algorithm and its inflection on the result of the music. I did not experience much resistance in allowing it to contribute significantly to the compositional process.
For example, the SA converges over time on an optimal solution. In each iteration the Fourses synth has its parameters manipulated wildly, resulting in a chaotic musical surface. As SA iterates, the scope of sounds that can be produced is constrained, and the variation between parameter values is reduced significantly. In comparison to the initial sound world that results from SA’s control, the end of each iteration was stable and texturally focused. This musical behaviour was compatible with the vision I had for the piece, and therefore I did not have to bend or “tame” the algorithm drastically in order for it to produce aesthetically preferable results. That said, it is likely in the future that I might aim to create a complex and non-linear formal structure, in which simply running the algorithm will not be able to serve those aims well.
Furthermore, there were moments in the creative process where I was not sure that the algorithm would be able to produce a satisfying piece in one iteration. I auditioned several good iterations with novel characteristics but none of them were convincing to me as standalone pieces. Prior to producing 738_noise.simplex_529135023, I anticipated that to achieve a satisfying solution I would have to edit and stitch together sections from multiple iterations which I deemed to be satisfying. To me, this would have made the use of SA feel somewhat arbitrary and pointless. If I had not incorporated the inherent behaviour of SA and let it influence the high and low-level form, then another algorithm for the same purpose that was built from the ground-up would likely be controllable, customisable and required less blind experimentation to produce a desirable result. I wanted the algorithm to shape the piece entirely and with minimal intervention from me once running, and editing together numerous takes to achieve a structure that progresses from loud to quiet parameter combinations of the Fourses synth would have betrayed my initial conceptual aims.
The confluence of many aspects made Annealing Strategies successful, but did not engender a replicable workflow that could be iterated on in the future. After I had completed this project, I felt as if I had explored the semi-supervised and unsupervised range of the spectrum of computer agency. I was able to re-visit and potentially solve many issues I came across in Stitch/Strata, but it also raised new questions about how much immediacy and influence I would like to retain in future works. As such, I was motivated going forward to keep experimenting with the different balances of agency in my creative practice. The next project, Refracted Touch investigates how real-time live electronics can be used as a way of structuring formal events through improvisation. In terms of the overall portfolio, it is quite experimental and divergent, perhaps demonstrating a last effort to explore widely before honing my compositional practice toward a more stable set of techniques and practices in Reconstruction Error and Interferences.
I gave a presentation at the 2018 Electric Spring Creative Coding Lab Symposium that discussed the motivations and creative process for Annealing Strategies. While this video at times skims over detail due to time constraints of the presentation it is documentation that supports this work and thus deserves inclusion in this thesis. This can be viewed in VIDEO 4.2.3