Real-Time Audio Synthesis on ESP-32

Nick Donaldson — Sat, 30 Apr 2022 16:32:00 GMT

Background

I've recently been learning the ESP-32 platform as part of some contract work I'm doing for a synthesizer-adjacent product line. The requirements for the work involve real-time, audio-rate digital synthesis via I²S which is something that is a bit lacking in the official ESP-IDF examples. With some research, prior examples, and a bit of stabbing in the dark, I was able to get a standard buffer-callback-like architecture working for providing samples to I²S DMA buffers using a FreeRTOS task, providing a simple foundation on which to build real-time audio signal processing on ESP-32 for any application.

Although there is a basic I²S DSP example provided in the official ESP-IDF SDK which creates a simple oscillator tone, it isn't all that helpful in practice. For one, it isn't really doing "real-time" synthesis, in the sense that the static oscillator tone it produces is at a frequency specifically chosen such that its period in samples is (and must be) exactly one DMA buffer in length. This means that the buffer only has to be filled once (except when changing the waveform or bit depth), and the I²S driver will continue to read the same samples out of the DMA buffer over and over, creating a sustained tone from a perfectly repeating single-cycle wave.

I published a GitHub repository containing my own example code which you can access via the link below. Read on to learn more about how it works.

>> Get the Example Code here

A Quick Overview

At a high level the example code is doing a few things:

1. Configure the I²S driver to use the ESP-32's onboard DAC at our defined sample rate
2. Start a FreeRTOS task to continuously fill the DMA buffers with samples
3. Within the task loop, generate a sine wave at an arbitrary predefined frequency

Steps 1 and 3 are the focus of this article and covered in some detail below, with a summary at the end that ties everything together with step 2.

The ESP-IDF I2S Driver

ESP-IDF, the official SDK for development on the ESP-32, provides a C driver API for I²S just as it does for other types of peripherals. The driver is fairly well-documented so we won't go into a ton of detail on how to use it, but rather focus on how it is used in the example code linked above.

Configuration Options

The first thing we see in main is a declaration of a config struct with a bunch of values:

i2s_config_t i2s_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_TX | I2S_MODE_DAC_BUILT_IN,
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_MSB,
    .dma_buf_count = DMA_NUM_BUF,
    .dma_buf_len = DMA_BUF_LEN,
    .use_apll = false,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL2
};

Here's an overview of each meaningful value for our use case:

.mode = I2S_MODE_MASTER | I2S_MODE_TX | I2S_MODE_DAC_BUILT_IN

Bitmask which puts the driver in the "master" (preferred term: "leader") mode, tells the driver we want to transmit (TX) data and not receive it, and that we want to use the ESP-32's built-in 8 bit DAC rather than an external DAC or Codec IC.

.sample_rate = SAMPLE_RATE

Sets the sample rate for the driver. This also informs how the underlying clock source will be configured/divided.

.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT

Sets the number of bits per sample. For the built-in DACs we have to use 16-bit samples, as documented, even though the DACs only support 8-bit resolution. Only the most significant byte (MSB) of each 16-bit sample is used (more on that later).

.channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT
.communication_format = I2S_COMM_FORMAT_STAND_MSB

Sets the I²S channel and communication format (including endianness). For external codecs these need to be set to something with which the codec is compatible and will inform how data is transmitted at the signal protocol level. For the internal DACs we have to use exactly these values.

.dma_buf_count = DMA_NUM_BUF
.dma_buf_len = DMA_BUF_LEN

Sets the number and size (in sample frames, NOT bytes) of the allocated DMA buffers. In common real-time audio terms, this effectively determines our buffer size and latency, as discussed in more detail below.

.use_apll = false

Tells the driver NOT to use the ostensibly more accurate and evenly divisible APLL for the main I²S source clock, in favor of using PLL_D2. The APLL does not seem to work correctly for the internal DAC, though from reference documentation it seems like APLL is preferred for external codecs in order to avoid stability issues from dividing by non-integer values.

.intr_alloc_flags = ESP_INTR_FLAG_LEVEL2

Sets the interrupt flags for allocating the I²S DMA interrupt. This is the second highest C-level interrupt we can choose. Passing 0 here instead will have the system automatically choose a dedicated interrupt level which may be a better option.

DMA Buffer Size and Latency

For the purposes of the example program, we are telling the driver to allocate 2 x 32 frame DMA buffers for our I²S output. In bytes, this comes out to 2 * 32 frames * 2 channels * 2 bytes/sample = 256 bytes. The program must be able to fill up one of these buffers in the time it takes for the DMA controller to write the samples in the other buffer out to I²S, which is about 725 microseconds at a sample rate of 44.1 kHz – this is effectively our output latency. For those acquainted with real time audio APIs in desktop or mobile operating systems, this should all sound very familiar.
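The arithmetic above can be written out as a small sketch. The helper names and the NUM_CHANNELS and BYTES_PER_SAMPLE macros are hypothetical and not part of the example code; the values simply mirror the 2 x 32 frame configuration at 44.1 kHz.

```c
#include <stddef.h>

#define DMA_NUM_BUF      2       /* number of DMA buffers */
#define DMA_BUF_LEN      32      /* frames per buffer (NOT bytes) */
#define NUM_CHANNELS     2       /* hypothetical name: stereo frames */
#define BYTES_PER_SAMPLE 2       /* hypothetical name: 16-bit samples */
#define SAMPLE_RATE      44100

/* Total DMA memory across all buffers, in bytes. */
size_t dma_total_bytes(void) {
    return (size_t)DMA_NUM_BUF * DMA_BUF_LEN * NUM_CHANNELS * BYTES_PER_SAMPLE;
}

/* Time the DMA controller needs to play out one buffer, in microseconds.
 * This is the deadline for refilling the other buffer. */
float dma_buffer_period_us(void) {
    return 1.0e6f * (float)DMA_BUF_LEN / (float)SAMPLE_RATE;
}
```

Doubling DMA_BUF_LEN doubles both the memory used and the time available per buffer, which is the latency tradeoff in a nutshell.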

Why only 2 buffers? Because we ideally want to minimize output latency. The program fills up the buffer not in use while the DMA controller is transmitting the other one, then the two swap roles, ad infinitum. The driver takes care of the "swapping" for us, as the i2s_write() function will block until the DMA controller has buffer space available for us to write into. Note that this is essentially an asynchronous block; the FreeRTOS scheduler will switch to other tasks while we're waiting (more on that a little later).

The more DSP calculations we run in our synthesis loop, the more CPU time we will be using before our "deadline" to fill the buffer. If we don't keep up, not only will our output have dropouts, but we will also end up completely consuming one of the ESP-32's CPU cores, leading to unexpected behavior and the FreeRTOS scheduler watchdog printing errors to the console. This example program does nothing but generate a simple sine wave, so it can keep up with a 32 frame buffer. More complex applications may require larger buffer sizes and/or counts, at the cost of added latency. I recommend this excellent overview as a general explanation of tradeoffs when choosing the buffer size and number of buffers.
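To get a rough feel for that deadline, we can estimate how many CPU cycles are available per sample frame. This is only a back-of-the-envelope sketch: the function name is hypothetical, the clock speed is an assumption you supply (240 MHz is the ESP-32's maximum, but check your own configuration), and the real budget is smaller because interrupts, the scheduler, and the copy inside i2s_write() also take time.

```c
#include <stdint.h>

/* Upper bound on CPU cycles available per sample frame on one core.
 * cpu_hz is an assumed value passed in by the caller, e.g. 240000000. */
uint32_t cycles_per_frame(uint32_t cpu_hz, uint32_t sample_rate_hz) {
    return cpu_hz / sample_rate_hz;
}
```

At an assumed 240 MHz and 44.1 kHz this comes out to roughly 5,400 cycles per frame, which has to cover every oscillator, filter, and effect in the signal chain.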

Installing the Driver

Once we have our configuration struct populated correctly, there are two simple steps remaining to "install" the I²S driver at the system level:

First we call i2s_driver_install() passing which I²S "port" we want to use and a pointer to our struct. The last two parameters pertain to setting up a queue used for receiving audio input, which we aren't doing in this example.

i2s_driver_install(I2S_NUM, &i2s_config, 0, NULL);

Finally, we set which GPIO pins we want to use for our chosen I²S port. For an external codec IC we would need to choose pins for each of the discrete signals that are part of the protocol, but for the internal DACs we simply pass NULL in lieu of a configuration struct.

i2s_set_pin(I2S_NUM, NULL);

These functions can both return error codes and it's probably wise to check those in a real application. If all is well, at this point we should have an installed and running I²S driver outputting silence from the allocated DMA buffers since we haven't written anything to them yet.

Simple Sine Wave Synthesis

Above the main function there is another function defined called audio_task, which contains a loop in which a simple sinusoidal oscillator is synthesized at a predefined frequency.

for (int i=0; i < DMA_BUF_LEN; i++) {
    // Scale sine sample to 0-1 for internal DAC
    // (can't output negative voltage)
    samp = (sinf(p) + 1.0f) * 0.5f;

    // Increment and wrap phase
    p += PHASE_INC;
    if (p >= TWOPI)
        p -= TWOPI;

    // Scale to 8-bit integer range
    samp *= 255.0f;

    // Shift to MSB of 16-bit int for internal DACs (interleaved buffer)
    out_buf[i*2] = out_buf[i*2+1] = (uint16_t)samp << 8;
}

There is nothing fancy or unusual going on here; for each sample, we're simply passing the current phase value into a standard sinf() function to get a sinusoidal wave sample at the current phase and then advancing the phase variable by the correct amount for our desired frequency at the output sample rate. In practice, depending on performance needs, this might instead be done with an interpolated table lookup from a table loaded into memory.

PHASE_INC is defined via the relationship between our desired oscillator frequency (in Hz) and our sample rate (also in Hz) as:

(TWOPI * WAVE_FREQ_HZ / SAMPLE_RATE)

The wave frequency and sample rate are simple preprocessor macro definitions, which may be redefined to arbitrary (sensible) values for experimentation. Changing WAVE_FREQ_HZ will produce a sine wave at another arbitrary frequency. In a more practical synth implementation, the frequency of the oscillator would be dynamic and influenced by some other means of control (directly via a knob, etc) which would not be a difficult modification to make to this simple example.
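One way to make the frequency dynamic is to keep the phase and its increment in a small state struct and recompute the increment whenever the control value changes. This is a sketch under assumptions, not the example's code: osc_t, osc_set_freq(), and osc_advance() are hypothetical names.

```c
#define TWOPI       6.28318530717958f
#define SAMPLE_RATE 44100.0f

/* Hypothetical oscillator state; the example uses plain variables instead. */
typedef struct {
    float phase;      /* current phase in radians, kept in [0, TWOPI) */
    float phase_inc;  /* radians to advance per sample */
} osc_t;

/* Recompute the per-sample increment when the target frequency changes,
 * e.g. after reading a knob. Same relationship as PHASE_INC. */
void osc_set_freq(osc_t *osc, float freq_hz) {
    osc->phase_inc = TWOPI * freq_hz / SAMPLE_RATE;
}

/* Advance by one sample and wrap, mirroring the loop in audio_task. */
void osc_advance(osc_t *osc) {
    osc->phase += osc->phase_inc;
    if (osc->phase >= TWOPI)
        osc->phase -= TWOPI;
}
```

Because the phase carries over between calls, changing the frequency mid-stream alters only the slope of the phase ramp, so the output stays continuous and does not click.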

The sine wave sample is also scaled and offset from its standard range of -1 to 1 to a range of 0 to 1 for purposes of using the internal DAC, which is unipolar (it can't produce negative voltages). With an external I²S codec IC designed for audio, you would not need to do this.

Next, the normalized floating point sample is scaled to a range of 0 - 255. While it's convenient to work with floats during the synthesis phase, the DAC requires us to write integer values into the output buffer, and the internal DAC on ESP-32 has an 8-bit resolution - hence 2^8 - 1 = 255 as the max value.

Finally, we write the sample into the temporary output buffer. This buffer is not the DMA memory buffer but rather one that has been statically allocated into SRAM, as a temporary space for us to fill a buffer before it's copied to DMA memory. Note that the buffer is interleaved, so we write each sample for each channel next to each other in memory.

Remember how we configured the I²S driver for 16-bit samples, and that the internal DACs only use the MSB? Because of that, we need to cast our sample to 16 bit integer and shift left by 8 bits before writing to the buffers. Note that in this example both DACs are given the same sample, but they could be given different samples for stereo or dual mono output.
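The scale, cast, and shift steps can be pulled into a small helper, sketched below. The names dac_word() and write_frame() are hypothetical, and the clamp is an addition of mine that the example code does not have; it guards against out-of-range floats wrapping around after the integer cast.

```c
#include <stdint.h>

/* Convert a unipolar sample in [0, 1] to the 16-bit word written for the
 * internal DAC: the 8-bit value goes in the upper byte, the lower byte is unused. */
uint16_t dac_word(float samp) {
    if (samp < 0.0f) samp = 0.0f;   /* clamp (not in the original example) */
    if (samp > 1.0f) samp = 1.0f;
    return (uint16_t)((uint16_t)(samp * 255.0f) << 8);
}

/* Write one interleaved frame with an independent value in each channel slot. */
void write_frame(uint16_t *out_buf, int frame, float first, float second) {
    out_buf[frame * 2]     = dac_word(first);
    out_buf[frame * 2 + 1] = dac_word(second);
}
```

Passing the same value for both slots reproduces the example's behavior, while passing different values gives the two DACs independent signals.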

⏱ An important note about performance

The ESP-32 has a floating point unit (FPU), but it only natively supports single-precision operations and is quite slow at division compared to multiplication. (Benchmarks for reference). Thus, explicit use of single-precision library functions, constants, and variables is encouraged for CPU efficiency, as well as pre-calculating the reciprocal of divisor constants or variables that don't change often, so you can multiply instead of divide in the DSP loop.
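As a small illustration of the reciprocal trick, the sketch below computes a phase increment using a multiply by a precomputed constant in place of a divide. INV_SAMPLE_RATE and phase_inc_for() are hypothetical names, not taken from the example code.

```c
#define TWOPI       6.28318530717958f   /* 'f' suffix keeps the constant single precision */
#define SAMPLE_RATE 44100.0f

/* Evaluated once at compile time, so no division happens at run time. */
static const float INV_SAMPLE_RATE = 1.0f / SAMPLE_RATE;

/* Multiply by the stored reciprocal rather than dividing on every call. */
float phase_inc_for(float freq_hz) {
    return TWOPI * freq_hz * INV_SAMPLE_RATE;
}
```

The same idea applies to values that change only occasionally: compute the reciprocal when the value changes, store it, and multiply by it inside the per-sample loop.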

Passing output to I2S

There is one additional line of code in the synthesis task loop, which copies the temporary output buffer (the one we just filled with samples) into the I²S driver's allocated DMA memory buffers:

i2s_write(I2S_NUM, out_buf, sizeof(out_buf), &bytes_written, portMAX_DELAY);

As previously mentioned this will block the current task and let the scheduler run other tasks until there is space in the DMA memory - i.e. one of the buffers has just been shifted out via I²S and is now free for us to write new data into. The last parameter determines the timeout interval for which the scheduler should wait for free DMA memory space until "giving up" and moving on - we pass portMAX_DELAY because for this use case, we never want it to "give up".

This function also takes a pointer to a size_t variable which is populated with the number of bytes actually written when the function returns. If there were not enough space to write the entire temporary buffer before the scheduler timeout interval elapsed, the output value would be less than the number of bytes we requested to write. In our case, since our temporary buffer out_buf is exactly the same size as one of the 2 DMA buffers we allocated when configuring the driver and because we have an infinite timeout, we don't need to check this value.

Putting it All Together

So far we have covered how to configure and install the I²S driver as well as how to synthesize some audio samples and feed them to the output buffers. The last thing we need to do is setup a FreeRTOS task to run our output loop.

In short, a task is a "routine" with its own stack and specific priority for the FreeRTOS scheduler to make time to run, balanced against other tasks and things demanding CPU time. Tasks are generally not meant to ever return; they typically contain an infinite loop which waits for some work to do, does the work, and then delays or goes back to waiting for more work to do.

A task is created by invoking xTaskCreate() with a few parameters:

xTaskCreate(audio_task, "audio", 1024, NULL, configMAX_PRIORITIES - 1, NULL);

The first parameter is a pointer to the function which represents the entry point of the task. The signature of the function should be void func_name(void *data) - it takes a void pointer to arbitrary data and returns void. Here we are passing our audio task function which contains the infinite loop to fill the output buffer.

The second parameter is a descriptive name for the task (used mainly for debugging), and the third parameter is the size of the stack that will be allocated for the task. In our case 1024 bytes is plenty, as we aren't allocating much data on the stack or invoking any deep function call trees. If it's not enough, ESP-32 will print a stack overflow error to the console and reboot.

The fourth parameter is a pointer to any arbitrary data we want to pass into the task entry point function. In our case we don't need to pass anything into the task, so we pass NULL.

The fifth parameter is the task priority. In the case of a low-latency, real-time synthesis output, our output deadline is paramount. We must under no circumstances miss the deadline or audio dropouts will occur. Thus, we want this task to have very high priority, hence we give it the system's maximum possible priority, configMAX_PRIORITIES - 1.

The final parameter is an optional pointer to a task handle variable which will be populated with a reference to the task that is created. This is useful if you need a reference to the task, but we don't, so we pass NULL.

Once this function is called, our task is created and the system starts running it. At this point it's safe for us to return from main because we have created at least one other task that is still alive. Now we should have a sine wave oscillator output at both of the internal DAC pins!

Thanks for reading, and I hope this is a useful resource to ESP-32 synth tinkerers!

Infrasonic Audio Blog