Web Audio API – Bits of Sound

Introduction

“Da da da dum! Doesn’t that stir anything in you?”
Ford Prefect, The Hitchhiker’s Guide to the Galaxy

Sound and music have been a part of our society since time immemorial. Though they would be almost impossible to describe to a society without some musical culture of its own, the bursts of energy that vibrate our eardrums and resonate in our bones have powered many of humankind’s most memorable occasions, from early music with rocks in it to extravagant compositions.

For its part, the web has joined in this cultural phenomenon in an entirely half-hearted, non-standard and often painful fashion. The combination of poor MIDI and poppy prerecorded samples has left many an unwary browser scrabbling for the Close button as a discordant finger scratches its way across their soul.

The Web Audio API aims to change at least some of that with a full-featured set of capabilities for working with audio, including playing files, analysing sounds, applying filters and generating new sounds. Although it was not yet a finished standard at the time of writing (HTML5 audio development having been hampered by disagreements over formats), it has been receiving ever-increasing attention as its capabilities become better understood.

The API itself is a high-level JavaScript API that lets the developer abstract away the low-level, hardware-specific details of how sound is produced, while still allowing sounds to be analysed as well as generated. The central idea behind the Web Audio API is the audio routing graph, a digital counterpart of the physical setups commonly found in conventional sound engineering.

The audio routing graph is made up of a series of nodes. These nodes have inputs and outputs through which sound is passed. For example, a gain node could take the sound coming in, amplify it, then pass it on. Various effects can be created by combining these nodes in different ways. The sound source is often loaded in from a file and then introduced to the system via a sound source node – several tutorials look at this.

Instead, this tutorial will focus on generating new sounds in a somewhat musical way, assuming no prior knowledge of the Web Audio API or of music in general. Some math will be involved, but we’ll keep it as short and as well explained as possible. I leaned heavily on the generating tones with Web Audio API tutorial in developing the original wave generation code for this, but we’ll take it a couple of steps further as well.

Some other topics not directly related to the generation of sound are omitted to keep the code examples as clean as possible. This tutorial will use the cleaner code, but a link to complete working demos will be provided.

Web Audio API

The starting point for all of our Web Audio API work is an audio context. As the Web Audio API had not been completely standardized at the time of writing, the context may be retrieved differently depending on the browser being used. I’ve wrapped up the convenience functions I created as part of this experiment in an object called jAudio, for ease of use and reuse. To retrieve the audio context, we can use the following –
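A minimal sketch of such a helper is below. The method name getContext and the injectable globalObj parameter are assumptions, added so the snippet can also run outside a browser; it only checks the standard AudioContext and the older prefixed webkitAudioContext, which may not match the full list of constructors the original function checked.

```javascript
// Hypothetical jAudio-style helper (not the original source).
// Looks for the standard AudioContext constructor, falls back to the
// prefixed webkitAudioContext, and caches the single context it creates.
const jAudio = {
  context: null, // the one shared AudioContext, cached after first use

  getContext(globalObj = globalThis) {
    if (this.context) {
      return this.context; // reuse the existing context
    }
    const Ctor = globalObj.AudioContext || globalObj.webkitAudioContext;
    if (!Ctor) {
      throw new Error("Web Audio API is not supported in this environment");
    }
    this.context = new Ctor();
    return this.context;
  },
};
```

In a page, calling jAudio.getContext() with no arguments uses the real window globals. Be aware that current browsers may create the context in a suspended state until the user interacts with the page, in which case context.resume() has to be called before any sound plays.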

This function checks for the existence of an audio context constructor in each of the major browsers and returns a context created from it. It is based on the one found in Web Audio API by Boris Smus which, at the time of writing, is free. In addition, it saves a reference to our current audio context so that we can retrieve it later if needed. Only one audio context is used for all of our sound; more than one context is possible, just not particularly useful.

Waves of Sound

“… and with the right kind of eyes you can almost see the high-water mark – the place where the wave finally broke and rolled back.”
Hunter S. Thompson, Fear and Loathing in Las Vegas

The sound that most of us perceive in our day-to-day lives is made up of the peaks and dips of waves of pressure travelling through some material (such as the air around us) until they reach our eardrums.

Basically, sound is waves of pressure. This pressure travels to us via the air and smashes into our eardrums. This action is interpreted by our brain as either pleasant or unpleasant, with corresponding physical reactions – we dance or leave. All of this excitement, caused by waves.

The height of these waves (the amplitude) corresponds to the perceived loudness of the sound, whereas how closely the waves follow one another (their frequency, the number of peaks passing each second) changes the pitch of the sound we hear. The closer together the waves are, the higher their frequency and the higher pitched the sound.

If we read the picture as a graph of the wave’s value (its amplitude at each instant) on the vertical axis plotted against time (t) on the horizontal axis, our current position within a single cycle of the wave is known as the phase.

There’s something else that we can associate with having a frequency, amplitude and a phase – an old school friend.

Original Sine

f(x) = \sin(x)

Sine has all of these components: the frequency is determined by how quickly we increase x (and so how quickly the function cycles), the phase is the current point in the cycle, and the amplitude can be introduced by multiplying the function by an amplitude factor.

So how can we use sine to generate new sounds? The answer lies in the way we convert sound from continuous physical waves into digital formats that computers can work with.

Sampling

To convert a continuous wave into discrete values, we sample the wave at various points. By sampling, I mean we take the value of the wave at that point. If we do that enough times per second we end up with an approximation of what the original wave sounded like. It will never be exact, but it will be close enough that the human ear cannot tell the difference.

To create CD quality audio, for example, the incoming wave is sampled 44 100 times a second. This is the accepted sampling rate for CD audio – 44.1 kHz.

If these 44 100 samples are played back at the same rate, we get a reasonable approximation of the original sound. Hence, to create new sound, we’ll need to generate tens of thousands of samples of it every second. This may sound like a heavy computational load, but it’s nothing that would strain any modern computer.

There is one other thing that we need to consider, however: we need some control over the frequency of the sound that we generate. If we simply generated one sample for every tick of the sample rate and increased x by 1 each time, the frequency of the resulting sound would be directly proportional to our sample rate.

Not only would this frequency most likely be fairly high and unpleasant to hear, but if the frequency is tied to our sample rate then we have no control over it, which makes our generated music somewhat dull and uninteresting.

Instead, we need to decouple our frequency from our sample rate, so let’s look at doing that. Frequency can be calculated as

\text{frequency} = \frac{\text{sample rate}}{\text{cycle length}}

where the cycle length is measured in samples. Since x increases by 1 for every sample and \sin(x) repeats every 2\pi, our cycle length is 2\pi samples, so let’s substitute that in immediately.

\text{frequency} = \frac{\text{sample rate}}{2\pi}
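To put a number on this: at the CD sampling rate of 44 100 samples per second, a plain \sin(x) with x advanced by one per sample would cycle at

\text{frequency} = \frac{44\,100}{2\pi} \approx 7018.7\ \text{Hz}

roughly 7 kHz, a shrill tone that we have no way of adjusting.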

If we divide the x in our \sin(x) by this quantity, the sample rate cancels out and we end up with a normalized frequency of exactly one cycle per second, whatever the sample rate happens to be.

f(x) = \sin(\frac{x}{\frac{\text{sample rate}}{2\pi}})

This is a good first step, but we still have no control over our frequency. Let’s introduce a frequency variable that we do control.

f(x) = \sin(\frac{x \times \text{frequency}}{\frac{\text{sample rate}}{2\pi}})

Now we have an equation that lets us specify frequency independently of the sample rate. We can clean this equation up a little and introduce the amplitude variable.

f(x) = \text{amplitude} \times{} \sin(\frac{2\pi \times x \times \text{frequency}}{\text{sample rate}})
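As a quick illustration, here is that final equation translated directly into JavaScript. The names sineSample and sineSamples are illustrative and not part of any API; x is simply the index of the sample being generated.

```javascript
// Direct translation of
//   f(x) = amplitude * sin(2 * PI * x * frequency / sampleRate)
// where x is the index of the sample being generated.
function sineSample(x, frequency, sampleRate, amplitude = 1) {
  return amplitude * Math.sin((2 * Math.PI * x * frequency) / sampleRate);
}

// Generate the first `count` samples of a tone into a Float32Array,
// the same element type the Web Audio API uses for its buffers.
function sineSamples(count, frequency, sampleRate, amplitude = 1) {
  const out = new Float32Array(count);
  for (let x = 0; x < count; x++) {
    out[x] = sineSample(x, frequency, sampleRate, amplitude);
  }
  return out;
}
```

For example, with a frequency of exactly a quarter of the sample rate, one cycle spans four samples, so the output repeats the pattern 0, 1, 0, -1 (up to rounding error). One second of a 440 Hz tone at CD quality would be sineSamples(44100, 440, 44100).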

All we need now is to implement this in code and we should have a generated sound where we can control both the frequency and the amplitude – enough for a multi-platinum album by modern standards.

Let’s do it

Creating the JavaScriptNode

In order to create our sound, we’ll make use of the JavaScriptNode. (In later versions of the API this node was renamed ScriptProcessorNode, created with createScriptProcessor, and it has since been deprecated in favour of AudioWorkletNode.) This node fits into the audio graph like any other audio node. However, whenever the node needs a new buffer of audio, a callback function that we define is invoked with the current sound data. This allows us to analyse the incoming sound, modify it or, in our case, generate entirely new sound.

The JavaScriptNode can be created using the AudioContext that we acquired earlier –

The call that creates the JavaScriptNode takes three arguments: the buffer size, the number of input channels and the number of output channels. Here we’re specifying a buffer size of 2048 sample frames, which means that each time our callback is invoked, we generate 2048 samples for each channel. We’ll also use 2 input channels and 2 output channels, for the left and right audio channels.
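A sketch of the node creation follows, written to work with either name for the factory method. createJavaScriptNode is the early name used in this article; createScriptProcessor is the later standard name and takes the same three arguments. The helper name createGeneratorNode is hypothetical.

```javascript
// Hypothetical helper: create a script-processing node from an AudioContext.
// Prefers the standardized createScriptProcessor and falls back to the
// early createJavaScriptNode; both take
// (bufferSize, numberOfInputChannels, numberOfOutputChannels).
function createGeneratorNode(context, bufferSize = 2048) {
  const create = context.createScriptProcessor || context.createJavaScriptNode;
  if (!create) {
    throw new Error("No script-processing node is available on this context");
  }
  // 2 input and 2 output channels: left and right.
  // .call keeps `this` bound to the context, as the method expects.
  return create.call(context, bufferSize, 2, 2);
}
```

In a page, the returned node still has to be connected to the context's destination (node.connect(context.destination)) before anything is heard.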

Next, we need to find the sample rate that we’ll be working with. This can be read directly from the context. We’ll also set some defaults.

We’ll start with an amplitude of 1, meaning that we won’t alter the signal at all. Our phase is 0, as we start at the beginning of our window. We’ll set our initial frequency to 440Hz for reasons that will be made clear in sections to come.
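Those defaults can be gathered into a single state object. This is a sketch: the function name createToneState and the object layout are assumptions rather than the original code; only the sample rate is read from the context.

```javascript
// Hypothetical starting state for the tone generator.
function createToneState(context) {
  return {
    sampleRate: context.sampleRate, // samples per second, set by the context
    amplitude: 1,   // leave the signal unscaled
    phase: 0,       // start at the beginning of the cycle (range 0 to 1)
    frequency: 440, // in Hz
  };
}
```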

The process function

Whenever the JavaScriptNode needs a new buffer of audio, the onaudioprocess function associated with it is called. This is where we insert our own code to generate sound.

First, we extract the left and right channel buffers. Next, we fill each buffer by iterating across its length, taking a sample of our sine wave for the current point, then advancing the phase and repeating for the next point. These helper functions are presented shortly. The last thing that we need to do is set the node’s onaudioprocess function to call our code.

The getSample function is a direct JavaScript implementation of our previous equation, with a slight change to allow for the phase to be advanced.

Here we’ve wrapped the \frac{x \times \text{frequency}}{\text{sample rate}} factor up into a phase variable, which makes it more convenient to calculate how much we need to advance our current phase by. The phase variable ranges from 0 to 1, so once it has been multiplied by 2\pi we have a range from 0 to 2\pi, as required.

The phase is advanced by an amount equal to the frequency divided by the sample rate. As this function is called sample-rate times per second, the phase completes a number of cycles per second equal to our frequency, which gives us the frequency control that we need. Once the phase reaches or passes 1, we subtract 1 from it to return to the beginning of the cycle.
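Putting the process step together, here is one possible version of the handler and its two helpers. It assumes the generator's settings live in a plain state object with sampleRate, amplitude, frequency and phase fields; getSample matches the name used above, while advancePhase and processAudio are hypothetical names.

```javascript
// amplitude * sin(2 * PI * phase): phase stands in for x * frequency / sampleRate.
function getSample(state) {
  return state.amplitude * Math.sin(2 * Math.PI * state.phase);
}

// Move one sample's worth through the cycle, wrapping back into [0, 1).
function advancePhase(state) {
  state.phase += state.frequency / state.sampleRate;
  if (state.phase >= 1) {
    state.phase -= 1;
  }
}

// Body of the onaudioprocess handler: fill the left and right channels.
function processAudio(event, state) {
  const left = event.outputBuffer.getChannelData(0);
  const right = event.outputBuffer.getChannelData(1);
  for (let i = 0; i < left.length; i++) {
    const value = getSample(state);
    left[i] = value; // the same signal on both channels
    right[i] = value;
    advancePhase(state);
  }
}

// In the browser: node.onaudioprocess = (event) => processAudio(event, state);
```

Keeping phase between 0 and 1, rather than letting x grow without bound, also avoids the gradual loss of floating-point precision that very large arguments to Math.sin would suffer over a long-running tone.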

This is all we need for a basic wave generator – give it a try.

The result wasn’t particularly pleasing, was it? It was mathematically sound, just not a nice sound. We’ve achieved noise. Although we generate smooth tones, arbitrary frequencies don’t resonate particularly nicely in our brains. Well, probably. The real problem is that we are choosing frequencies in general, rather than specific ones.

It is specific frequencies that correspond to musical notes. We’ll examine music and notes briefly in the next section and then see how we can go about creating our own.

Sound of Music

“There are millions of chords. There are millions of numbers. And everyone forgets the one that is a zero. But without the zero, numbers are just arithmetic. Without the empty chord, music is just noise.”
Terry Pratchett, Soul Music

Music is made up of groups of frequencies that are sounded and silenced at intervals. The ranges of frequencies and their patterns of timing are loosely categorized into the various styles of music, each of which is found pleasing by different sections of society.

This is an obviously terrible, dry and lacking description of music. It serves our purposes, however, which are to examine music from a more technical perspective.

In Western music, an octave is commonly spanned by a run of eight notes, beginning at one note and ending on the same note one octave higher. Moving an octave higher is equivalent to doubling the frequency of the note. The notes are named using the first seven letters of the alphabet – that is,

A – B – C – D – E – F – G

A complete octave would then be A – B – C – D – E – F – G – A, with the final A at double the frequency of the original A. Not every step in this run is the same size, though: between most neighbouring letters lies an extra note, reached by making the lower note sharper (higher, written #) or the upper note flatter (lower, written b). There are five of these in-between notes, so each octave is divided into 12 half steps. The full range is traditionally written as

C – C# – D – Eb – E – F – F# – G – G# – A – Bb – B

Scientific pitch notation represents this by giving both the note and its octave. If one note were a C in the fifth octave, C5, then the C one octave higher would be C6, and a single half step above that would be written as C#6.

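Tuning is commonly fixed by setting A4 to 440 Hz (aha!), with the other notes calculated relative to it. Each octave doubles the previous octave’s frequency, and in the usual equal-tempered tuning the 12 half steps split that doubling into equal ratios, so each half step multiplies the frequency by the twelfth root of 2. We can therefore calculate the frequency of any note as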

f_n = f_0 \times a^n

where f_0 is the frequency of a fixed reference note, commonly A4, n is the number of half steps we are away from that note, and a is \sqrt[12]{2}. Using A4 as the reference note, this becomes

{f_n} = 440 \,\text{Hz} \times 2^{n/12}

For example, say we were interested in calculating the note 12 half steps up from A4. Twelve half steps make up one octave, so we should be calculating A5, which has double the frequency of A4. Our n is 12. Substituting this in,

{f_n}= 440 \,\text{Hz} \times 2^{12/12}

{f_n}= 440 \,\text{Hz} \times 2^{1}

{f_n}= 440 \,\text{Hz} \times 2 = 880 \,\text{Hz}

It is clear that we do indeed end up with a frequency that is double that of our original frequency. This may not be an exhaustive test, but it’s a nice quick sanity check we can perform.
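The same check can be run in JavaScript. This is a sketch: frequencyFromSteps is a hypothetical name, and it uses Math.pow(2, n / 12), which is mathematically the same as a^n with a = \sqrt[12]{2}.

```javascript
// Equal-tempered frequency n half steps away from a reference pitch:
//   f_n = f_0 * 2^(n / 12), with A4 = 440 Hz as the default reference.
function frequencyFromSteps(n, baseFrequency = 440) {
  return baseFrequency * Math.pow(2, n / 12);
}
```

frequencyFromSteps(3) gives about 523.25 Hz, which is C5, and frequencyFromSteps(-9) gives about 261.63 Hz, middle C (C4).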

Working back from the first formula, we can derive a formula for the number of half steps between a given frequency and the root frequency. Since \frac{f_n}{f_0} = a^n, taking the logarithm of both sides gives \log{\frac{f_n}{f_0}} = n \log{a}, and dividing through by \log{a} gives

n = \frac{\log{\frac{f_n}{f_0}}}{\log{a}}
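In JavaScript the inverse formula is a one-liner. stepsFromFrequency is a hypothetical name; the result is generally fractional, since an arbitrary frequency rarely lands exactly on a note.

```javascript
// Inverse of f_n = f_0 * a^n:
//   n = log(f_n / f_0) / log(a), with a = 2^(1/12) and A4 = 440 Hz by default.
function stepsFromFrequency(frequency, baseFrequency = 440) {
  return Math.log(frequency / baseFrequency) / Math.log(Math.pow(2, 1 / 12));
}
```

Because \log{a} = \frac{\log 2}{12}, the same value can also be written as 12 * Math.log2(frequency / baseFrequency).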

We know quite a lot now – using these formulae, we are capable of specifying the frequency we would need to generate a wave corresponding to a specific note. Let’s have a look at the JavaScript implementations of these.

 Notes.js

I’ve wrapped up all of the music-related functions in a little library I’ve imaginatively named notes.js for the moment. It’s not particularly advanced, but it is available in the demo should you want to play with it. First off, let’s have a look at the getFrequency function.

This function exists on a NoteOctave object, an object containing a value for a note as well as an octave. First, we check whether the note whose frequency we are obtaining is our base note. This is A4 in the library, but it can be changed if you need to. If the note is the base note, we return the base frequency directly – as everything is tuned relative to it, it cannot be calculated.

Otherwise, we calculate the number of steps that the current note is away from the base note. This is the n that we substitute into the previous formula, which we do on the next line. NoteOctave.A is set up as a constant equal to the 12th root of 2, as required by the formula.

The code to calculate the number of steps is fairly simple – we simply calculate the difference between the positions of the notes and the octaves, and return it as a number of steps.
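Since the notes.js source isn't reproduced here, the following is a hypothetical reconstruction of the ideas just described, not the library's actual code. Notes are indexed from C, matching scientific pitch notation in which the octave number increases at each C. Only NoteOctave.A comes from the text; BASE, BASE_FREQUENCY and stepsFrom are assumed names.

```javascript
// Hypothetical NoteOctave sketch. Note names are indexed from C so that
// the octave number changes at C, as in scientific pitch notation.
const NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"];

class NoteOctave {
  constructor(note, octave) {
    this.note = note;     // one of NOTE_NAMES
    this.octave = octave; // e.g. 4 for the octave containing A4
  }

  // Half steps from `other` up to this note (negative if this note is lower).
  stepsFrom(other) {
    const noteDiff = NOTE_NAMES.indexOf(this.note) - NOTE_NAMES.indexOf(other.note);
    return (this.octave - other.octave) * 12 + noteDiff;
  }

  getFrequency() {
    const base = NoteOctave.BASE;
    if (this.note === base.note && this.octave === base.octave) {
      return NoteOctave.BASE_FREQUENCY; // the tuning reference itself
    }
    const n = this.stepsFrom(base);
    return NoteOctave.BASE_FREQUENCY * Math.pow(NoteOctave.A, n);
  }
}

NoteOctave.A = Math.pow(2, 1 / 12);       // twelfth root of two
NoteOctave.BASE = new NoteOctave("A", 4); // assumed default base note
NoteOctave.BASE_FREQUENCY = 440;          // in Hz
```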

Next, we need to implement the inverse function: the function which, given a frequency, gives us the corresponding note.

Again, this is a direct implementation of the formula, with logCurrentOverBase extracted into a temporary variable simply to make it easier to read. The formula gives the number of steps between the frequency and the base note, so we simply add this many steps to the base note to find the new note. One change in this implementation is the introduction of Math.round – this ensures that we get the nearest true note to the given frequency.

The code to add steps is fairly basic – we simply add (or subtract) the number of steps, and then increment or decrement the octave if the result runs past the end of the octave in either direction.
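A hypothetical sketch of both pieces follows, using plain { note, octave } objects rather than NoteOctave instances. Instead of adding steps and then adjusting the octave separately, it converts the note to an absolute count of half steps and back, which gives the same answer and also copes with jumps larger than an octave.

```javascript
// Hypothetical sketch; note names indexed from C as in scientific pitch notation.
const NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"];
const BASE_NOTE = { note: "A", octave: 4 };
const BASE_FREQUENCY = 440;

// Move `steps` half steps up (positive) or down (negative). The octave
// changes whenever we pass a C.
function addSteps(noteOctave, steps) {
  const absolute = noteOctave.octave * 12 + NAMES.indexOf(noteOctave.note) + steps;
  return {
    note: NAMES[((absolute % 12) + 12) % 12], // keep the index in 0..11
    octave: Math.floor(absolute / 12),
  };
}

// Nearest equal-tempered note to `frequency`; Math.round snaps the
// fractional number of steps to a whole half step.
function fromFrequency(frequency) {
  const logCurrentOverBase = Math.log(frequency / BASE_FREQUENCY);
  const steps = Math.round(logCurrentOverBase / Math.log(Math.pow(2, 1 / 12)));
  return addSteps(BASE_NOTE, steps);
}
```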

Using these methods we can alter our slider function to produce the frequency corresponding to the closest note instead of the actual frequency it is set to. This is as simple as converting to a note and converting back to a frequency.
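The conversion there and back can be sketched as a single helper. snapToNote is a hypothetical name, and it works directly with half-step counts (12 * Math.log2 is equivalent to dividing by \log{a}) rather than going through note names.

```javascript
// Hypothetical helper: snap an arbitrary slider frequency to the nearest
// equal-tempered note by rounding to a whole number of half steps
// from the reference pitch and converting back to a frequency.
function snapToNote(frequency, baseFrequency = 440) {
  const steps = Math.round(12 * Math.log2(frequency / baseFrequency));
  return baseFrequency * Math.pow(2, steps / 12);
}
```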

You can have a look at this example here.

Our produced sound, although now tuned correctly, still lacks a certain structure. We need to scale this example up a little bit.

Scales

A scale is a series of notes ordered by pitch. Musical pieces are commonly built using the notes of a single scale. Given the math that we’ve seen so far, it would be reasonable to expect the math behind a scale to be fairly complex. It would also be wrong.

Scales are simply combinations of notes that various cultures find appealing. They are generated by taking a base note, then moving up a certain number of steps and using that note, then moving up a certain number more until the base note one octave higher is reached.

We already have the functions needed to move up a number of steps from a note, so we have the JavaScript needed to generate a scale. In this case we’re going to focus on the pentatonic scale, purely so that I can point you towards Bobby McFerrin’s awesome demonstration of the pentatonic scale at the World Science Festival. It’s worth watching, especially after some math, as a reminder that music is fun and exciting.

Anyway – the pentatonic scale is built by taking a base note, then moving up 2, 2, 3, 2 and then 3 half steps, each time taking note of the note. For example, starting from A:

Base 2 2 3 2 3
A

We’ll move up 2 steps (from A, to Bb, to B) and take that note, so the next note is B.

Base 2 2 3 2 3
A B

Continuing in this fashion we get

Base 2 2 3 2 3
A B C# E F#

And moving the final 3 steps, we’re back at A, one octave higher.

Base 2 2 3 2 3
A B C# E F# A

So how do we do this in code? Well, we need an array specifying how many steps to move up each time. This code takes in a base note, or will start from A4 if none is given.

We can also specify a total that allows us to keep generating notes further up in the scale. This is part of a Scale object, which is initialized in the library with three different scales built-in.
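A hypothetical sketch of such a Scale helper follows. Only the pentatonic intervals come from the text; the major and natural minor interval lists are standard examples added for illustration and are not necessarily the other scales built into the library. Notes are returned as strings such as "C#5", with octaves numbered from C as in scientific pitch notation.

```javascript
// Hypothetical scale generator (not the original notes.js code).
const SCALE_NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"];

// Each scale is the list of half steps to move up from one note to the next.
const Scales = {
  pentatonic: [2, 2, 3, 2, 3],    // from the text
  major: [2, 2, 1, 2, 2, 2, 1],   // illustrative assumption
  minor: [2, 1, 2, 2, 1, 2, 2],   // natural minor, illustrative assumption
};

// Walk up from `base`, recording each note, cycling through the intervals
// until `total` notes have been produced (by default, one octave).
function buildScale(intervals, base = { note: "A", octave: 4 }, total = intervals.length + 1) {
  let absolute = base.octave * 12 + SCALE_NOTE_NAMES.indexOf(base.note);
  const notes = [];
  for (let i = 0; i < total; i++) {
    notes.push(SCALE_NOTE_NAMES[absolute % 12] + Math.floor(absolute / 12));
    absolute += intervals[i % intervals.length];
  }
  return notes;
}
```

buildScale(Scales.pentatonic) walks through the same A, B, C#, E, F#, A sequence traced out above, and passing a larger total keeps climbing into the next octave.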

As you can see, adding new scales is as simple as including the number of steps that should be moved each time. This can be seen in our previous example by clicking the Play Scale text, which will play a pentatonic scale starting on the current slider note.

We can now generate a series of notes – pretty neat, huh? We have everything we need to generate music, but writing it out in code can be a pain. Instead, using my previous post on writing a guitar tablature parser and combining six of our note-generating nodes in parallel, we can generate “music” from tablature. Just take some guitar tab from anywhere on the internet and drag and drop it onto the demo. This is still relatively basic, but time permitting I’ll make it more sophisticated in the future.

I hope this article offered some enjoyment and hopefully some learning – post any questions that you have and I’ll do my best to answer them.

Posted in Javascript
