PMDX – ses1 (tryouts on polyrhythmic sequencing) – UNRELEASED

Photo credit: Manu Casir (https://www.instagram.com/manucasir)

Mar 25, 2018.

This track is unreleased.

 


Implementing low latency audio processing on low-power Linux devices (with Pure Data)

Nürnberger Knoblauchsland, by bicycle.

Here’s something I wrote a couple of years ago; better here than lost on my hard drive. Cheers. (Note: this was written before Bela.)

Overview and state of the art

Up until recently, the only way to efficiently implement audio processors on low-power platforms was to use a modular approach of separate embedded devices, with each module taking care of a specific function (e.g. user interface, signal processing or memory management) using an accordingly optimized architecture. The trade-off of this very efficient configuration is the difficulty of porting and updating, low-level development and a lack of flexibility.

Recent embedded devices manufactured under the category of System on Chip (SoC) offer the possibility of running operating systems akin to those on desktop PCs or laptops, with highly developed interfaces for their peripherals. System integration for a prospective audio processor/synthesizer is therefore more straightforward: digital signal processing algorithms can be developed in a higher layer and, since their interaction with the processing core and peripherals is already handled by the operating system, they become easier to port.

Real time performance on general-purpose systems

On the other hand, these devices are not optimized for real-time audio processing, at least by default. Task schedulers on these general-purpose operating systems are designed to maximize CPU usage within the available processing power. The audio thread, like any other task handled by the OS, will therefore be scheduled to maximize throughput and to minimize idle processor time among tasks. This is not necessarily compatible with the notion of real-time processing, where an audio frame needs to be fully treated before the next one arrives; if this deadline is not met, audio drops occur. Under a default configuration, the task scheduler will accept missing the audio deadline as long as the CPU resource usage is deemed statistically optimal. This kind of scheduling might be enough and cause no drops on higher-power systems (i.e. desktop/laptop computers) with sufficient buffering, but low-power systems quickly show its limitations [1].

In order to overcome this problem, the task scheduling of the operating system needs to be modified so that a designated thread gets the highest priority. The scheduling mechanism has to use all available resources to meet the real-time deadline of the given thread (i.e. audio), even if the resources are then not optimally distributed among all tasks.

Real-Time Linux kernel

In recent years, a modification of the Linux kernel supporting full task preemption (RT PREEMPT) has emerged, enabling the behavior described in the previous paragraph. Although initially meant for industrial control applications, its use for audio has shown levels of stability not reached before on this kind of general-purpose OS.
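As a minimal illustration (my sketch, not from the original text) of what this looks like from user space: a thread requests a real-time priority through the standard POSIX API, and an RT PREEMPT kernel honors it.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Request SCHED_FIFO real-time scheduling for the calling thread.
   priority is 1..99; requires root or a suitable rtprio rlimit. */
static int make_realtime(int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (err != 0)
        fprintf(stderr, "pthread_setschedparam failed: %d\n", err);
    return err;
}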

Modular Audio Signal Processing using Pure Data

Pure Data (PD) is a modular signal processing system created by Miller Puckette [2]. Its main focus is signal processing and synthesis of audio and multimedia streams for artistic purposes. Although the two projects’ development has diverged over the years, Pure Data bears a great amount of similarity to its commercial sibling, the visual programming environment Max/MSP (also originally created by Puckette). To a certain point, Pure Data can be considered a free, open-source counterpart of Max/MSP. Some of the reasons for using PD are:

  • It is modular, and its visual style is very similar to that of other low-level audio processing software such as Reaktor. It can implement block-wise and/or sample-based processing, so instruments/processors already developed in Reaktor should be easy to port.
  • Due to its open-source nature, its high configurability is especially well suited for low-power devices, where very specific tuning of resource handling might take place in order to reach a higher efficiency or stability.
  • It runs under Linux and is free; no software license has to be paid.
  • PD can also be embedded as a library via libpd and integrated into external apps, games, web pages and art projects [3] (see the sketch below).
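As an illustration of that last point, here is a minimal libpd sketch in C, condensed from the usage pattern in the libpd sources [3]; the patch name and buffer handling are placeholders:

#include <stdio.h>
#include "z_libpd.h"

int main(void) {
    libpd_init();
    libpd_init_audio(1, 2, 44100);     /* 1 in, 2 out channels at 44.1 kHz */

    /* equivalent of the message [; pd dsp 1( -- switch DSP on */
    libpd_start_message(1);
    libpd_add_float(1.0f);
    libpd_finish_message("pd", "dsp");

    if (!libpd_openfile("test.pd", ".")) {   /* placeholder patch name */
        fprintf(stderr, "could not open patch\n");
        return 1;
    }

    /* process a few blocks: each tick is 64 frames per channel */
    float inbuf[64] = {0}, outbuf[128];
    for (int i = 0; i < 1000; ++i)
        libpd_process_float(1, inbuf, outbuf);
    return 0;
}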

Choosing the right device

Choosing the right device for implementation requires a relatively extensive feasibility study of the different options available on the market. There are many topics to consider, among them, for example:

  • Price
  • Computational power
  • Power consumption
  • Integration on end product and development tools
  • Reliability and availability of the parts in the long term
  • Interfacing
  • Support

An extended explanation of the trade-offs involved in each of the previous items would constitute a book by itself. Nevertheless, some rough general guidelines for estimating computing-power requirements can be mentioned to motivate the next sections:

  • A first version of the software should already be written, possibly in a relatively low-level language like C or in a modular environment. This way, profiling information about the most resource-demanding modules can be gathered. It is also important to know the nature of the modules involved and whether they can be algorithmically optimized. For example, frequency analysis can be implemented very efficiently through Fast Fourier Transform algorithms, and many processor architectures include instruction-set features suited to this kind of optimization; some even have dedicated hardware. The same can be said about large matrix multiplications with the help of block processing and caching.
  • Floating-point performance can also be an issue if the software is to use this numerical representation. In short, floating-point representation has qualities that make it desirable for high-quality audio processing, such as higher dynamic range and smaller error propagation in intermediate calculations. Development time in floating point is usually shorter than in fixed point due to its ease of implementation. Nevertheless, floating-point units are expensive, and many available low-power devices perform poorly with this representation or lack it entirely. This has been changing in recent years, but floating-point performance still varies a lot among solutions within the same application range.

Example application on the BeagleBone Black

The next section outlines an example deployment of a PD patch on the BeagleBone Black. The main aspects of this example should extrapolate easily to newer-generation Linux SoCs with better performance. The general outline is as follows:

  • Installation and configuration of a real-time kernel
  • Configuring sound and additional tuning for audio performance
  • Installation/launching of Pure Data and adapting of patches for command line operation (no graphical user interface)
  • General stress/latency measurements and comparison with a non-real-time kernel.

Setup

The setup consists of a BeagleBone Black (BBB) with the following:

  • AM335x 1GHz ARM® Cortex-A8, 512MB DDR RAM
  • Revision C: 4GB 8-bit eMMC on-board flash storage
  • 3D graphics accelerator, NEON floating-point accelerator, 2x PRU 32-bit microcontrollers

For audio I/O, a Saffire USB 2i2 (first generation) is used; no additional drivers are needed, as Linux handles everything. All other configurations are considered default unless noted otherwise.

Installation and configuration of a real-time kernel

The basic, tried-and-true steps for installing a real-time kernel can be found in [4] (https://eewiki.net/display/linuxonarm/BeagleBone+Black). The kernel must be cross-compiled on a desktop PC with the GNU ARM compiler. The procedure basically consists of pointing the environment to the cross compiler and then executing a script that checks out the source code, applies the real-time patches and builds; this can take quite a while depending on the computer. The command-line sequence is:

export CC=(path-to-compiler)/bin/arm-linux-gnueabihf-

For am33x-rt-v4.4 (Longterm 4.4.x + Real-Time Linux):

cd ~/bb-kernel

git checkout origin/am33x-rt-v4.4 -b tmp

./build_kernel.sh

Once the build script prints “Script Complete”, export the resulting kernel version as indicated by eewiki.net:

export kernel_version=4.4.11-bone-rt-r10

A good sign: the board’s LED D2 flashes a heartbeat pattern after successful initialization.

In order to be able to log in via USB through PuTTY or similar on Windows, update /etc/network/interfaces to add a virtual Ethernet port:

cat >> /etc/network/interfaces <<EOF

iface usb0 inet static

 address 192.168.7.2

 netmask 255.255.255.0

 network 192.168.7.0

 gateway 192.168.7.1

EOF

PuTTY config:

host name: 192.168.7.2

port: 22

type: SSH

user: root

pass: root

Troubleshooting:

  • Root/root login not possible:

By default, the SSH server denies password-based login for root.

In /etc/ssh/sshd_config, change:

PermitRootLogin without-password

to

PermitRootLogin yes

There are a couple of similar posts suggesting that this could be a problem with spawning a shell, caused by incorrect settings for the shell path in /etc/passwd.

To check this, verify that your user’s shell path exists and is executable, for example:

# cat /etc/passwd | grep tomh
 tomh:x:1000:1000:Tom H:/home/tomh:/bin/bash <-- check this exists

Check shell exists:

# file /bin/bash
 /bin/bash: ELF 64-bit

Configuring the sound (ALSA) driver

By default, the Linux distribution on the BBB is configured to handle audio via the HDMI connection, so the configuration must be changed in order to redirect audio to (in this case) the USB card:

Execute

aplay -l

to see which card number is assigned to the USB interface, and then change it accordingly in:

vi /etc/modprobe.d/alsa-base.conf

Some other useful commands: alsamixer, aplay -L

mplayer -ao alsa:device=hw=1.0 voice.wav -format s32le

How to find the hardware address of the sound card:

https://en.wikibooks.org/wiki/Configuring_Sound_on_Linux/HW_Address

How to measure latency:

https://ferryzhou.wordpress.com/2012/10/12/linux-alsa-latency-test/

$ wget http://alsa.cybermirror.org/lib/alsa-lib-1.0.26.tar.bz2

$ tar xjvf alsa-lib-1.0.26.tar.bz2

$ cd alsa-lib-1.0.26/test

$ gcc latency.c -lasound -o latency

If gcc cannot find a header, install libasound2-dev and compile again:

$ sudo apt-get install libasound2-dev

$ ./latency -m 256 -r 16000

Other useful reference: http://elinux.org/images/8/82/Elc2011_lorriaux.pdf

Additional tuning for audio performance

The following links give a deeper insight into the whole process, if needed:

Frequency Scaling

Reference: http://derekmolloy.ie/changing-the-beaglebone-cpu-frequency/

Newer SoCs implement a Linux feature (the scaling governor) that dynamically scales voltage and frequency according to the resources in use. If the processor is idle most of the time, the operating frequency decreases to save power.

While this is important for efficiency, it is suboptimal for real-time operation, where computational power must be fully available within the deadlines of audio block processing.

Useful result: when the governor is set to performance (maximum clock), the analog-to-digital converter (for analog inputs) can be used without problems; otherwise there might be some clicking in the audio signal.

root@beaglebone:~/# cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.

analyzing CPU 0:

 driver: generic_cpu0

 CPUs which run at the same hardware frequency: 0

 CPUs which need to have their frequency coordinated by software: 0

 maximum transition latency: 300 us.

 hardware limits: 300 MHz - 1000 MHz

 available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz

 available cpufreq governors: conservative, ondemand, userspace, powersave, performance

 current policy: frequency should be within 300 MHz and 1000 MHz.

                 The governor "ondemand" may decide which speed to use

                 within this range.

 current CPU frequency is 300 MHz (asserted by call to hardware).

Then using:

root@beaglebone:~/# cpufreq-set -g performance

should set the governor to performance and the CPU clock to 1 GHz.

Disabling unneeded services

The BeagleBone comes by default with a lot of unnecessary services activated for general-purpose development. Most of these services are present on the factory Linux image and should already be stripped in a fresh install of the real-time kernel, but it is good to double-check nevertheless.

Useful result: so far, deactivating node.js (if active) can also have an impact on ADC performance. We definitely do not need JavaScript services for now.

Here is a bash script that takes care of deactivating services commonly active by default in Linux (judging by its comments, it was originally written for the Raspberry Pi, but it applies here as well):

#!/bin/bash

## Stop the ntp service

sudo service ntp stop

## Stop the triggerhappy service

sudo service triggerhappy stop

## Stop the dbus service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi

sudo service dbus stop

## Stop the console-kit-daemon service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi

sudo killall console-kit-daemon

## Stop the polkitd service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi

sudo killall polkitd

## Only needed when Jack2 is compiled with D-Bus support (Jack2 in the AutoStatic RPi audio repo is compiled without D-Bus support)

#export DBUS_SESSION_BUS_ADDRESS=unix:path=/run/dbus/system_bus_socket

## Remount /dev/shm to prevent memory allocation errors

sudo mount -o remount,size=128M /dev/shm

## Kill the userspace gnome virtual filesystem daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi

killall gvfsd

## Kill the userspace D-Bus daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi

killall dbus-daemon

## Kill the userspace dbus-launch daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi

killall dbus-launch

## Uncomment if you'd like to disable the network adapter completely

#echo -n "1-1.1:1.0" | sudo tee /sys/bus/usb/drivers/smsc95xx/unbind

## In case the above line doesn't work try the following

#echo -n "1-1.1" | sudo tee /sys/bus/usb/drivers/usb/unbind




## Set the CPU scaling governor to performance

echo -n performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Disabling kernel modules

Apparently not needed, but it might be useful to have a reference in any case: https://kernel-handbook.alioth.debian.org/ch-modules.html

Disabling virtual capes

Reference: http://wiki.beyondlogic.org/index.php?title=BeagleBoneBlack_Cape_Manager

To display the overlays currently enabled by the cape manager, type:

cat /sys/devices/bone_capemgr.*/slots

0: 54:PF---

1: 55:PF---

2: 56:PF---

3: 57:PF---

4: ff:P-O-L Bone-LT-eMMC-2G,00A0,Texas Instrument,BB-BONE-EMMC-2G

5: ff:P-O-L Bone-Black-HDMI,00A0,Texas Instrument,BB-BONELT-HDMI

uEnv.txt actually lives on the root filesystem, at /boot/uEnv.txt.
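For reference, on the older 3.8.x images the stock uEnv.txt ships with a commented-out line for disabling the HDMI cape (the exact syntax differs between kernel versions, so check the reference above for your image):

#cape_disable=capemgr.disable_partno=BB-BONELT-HDMI,BB-BONELT-HDMIN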

Installation of Pure Data

Pure Data comes by default with a graphical user interface for patching. Nevertheless, if a patch is going to be executed on a resource-constrained platform, a “headless” version (no GUI) is preferred; the patch must then be prepared beforehand and be fully functional. Pure Data can either be compiled from source following the information on its webpage or installed as a ready-compiled package on most Linux distributions. For Debian, the packages can be found via apt-get.
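For example (the Debian package is simply called puredata):

sudo apt-get install puredata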

Preparing a PD patch for “headless” runtime

PD features a DSP on/off switch that must be activated each time Pure Data starts. This can, and must, be done from within the patch if automatic initialization is intended (i.e. we want our embedded system to be an autonomous effect processor without the need for a command-line prompt).

Figure 1: example of a PD patch prepared for runtime without GUI.

Fig. 1 shows a short three-object section at the lower left that sends an execution order at startup (loadbang), followed by a 1000 ms delay and a message box that tells PD to activate DSP processing (“; pd dsp 1”). The delay was chosen empirically, to give PD some time to initialize.
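For illustration, the same section as a .pd file would look roughly like this (object coordinates are arbitrary):

#N canvas 0 50 300 200 10;
#X obj 20 20 loadbang;
#X obj 20 60 delay 1000;
#X msg 20 100 \; pd dsp 1;
#X connect 0 0 1 0;
#X connect 1 0 2 0;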

Performing Measurements

General audio latency measurements can be made with a set of available tools of varying accuracy.

Round trip latency

The latency test [5] can be used to measure the latency between capture and playback under a round-robin scheduler, SCHED_RR [6].

root@beaglebone:~/alsa_repo/alsa_lib/alsa-lib-1.1.1/test# ./latency -m 64 -M 64 -P hw:1,0 -C hw:1,0 -p -s 1 -r 48000 -f S32_LE

From [5]:

“When called, the test/latency.c program will attemp to set period/buffer sizes based on the latency entered, starting from -m,–min option (or the default minimum latency = 64 if not specified). If the run succeeds without errors with that setting, the program exits; otherwise, the latency is increased, and the run repeated – if the run is succesful here, then program exits, else the process continues until the -M,–max latency is reached.”

The problem with this approach is that it does not consider the stability of the system and the audio stream when the system is under heavy load (CPU or memory).

Real case-scenario measurement

A good measurement of system performance in an application context could be, for example, running a simple PD patch and some CPU stress program at the same time, then reducing the audio processing block size at a given sampling rate until audio drops start to occur. This should give an idea of the threshold of overall system performance under heavy computational load.

There are basically two ways PD can handle the buffering/latency trade-off:

  • Setting the sample rate and audio processing block size will automatically set the buffering time.
  • Setting the sample rate and the buffering time (in ms) will set the audio processing block size. This is usually the most convenient approach, since it maps directly to latency.
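For example, at 48 kHz a 50 ms buffer corresponds to 48000 × 0.05 = 2400 samples, i.e. 37.5 of PD’s default 64-sample blocks.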

The main block size can be set via command line as a parameter, but multiple block sizes can be used within the same patch for handling different time/frequency resolutions of the DSP algorithms.

An example command line would be:

pd -nogui -alsa -audiooutdev 3 -rt -r 48000 -audiobuf 50 -verbose -stderr -noadc sinewave.pd

where pd is called “headless” (no GUI), using the ALSA driver and device number 3 for output duties (pd -listdev gives the list of available sound devices), with real-time priority[1], a sampling rate of 48000 Hz, an audio buffer of 50 ms, no analog-to-digital converter (no input available) and errors redirected to stderr. The patch outputs a simple sine wave. Inhibiting the analog-to-digital input when it is not needed allows somewhat smaller buffering (lower latency) without audio-thread overruns.

Once the PD patch is running, we can use a second terminal to launch some kind of stress program to load the CPU and see how the audio thread responds. A good stress test is provided by the rt-tests suite [7] [8]:

root@beaglebone:~/rt_test/rt-tests# ./pi_stress --rr --uniprocessor

The priority-inversion [9] test pi_stress generates a heavy CPU load within seconds and will immediately affect the scheduling of the audio thread if the scheduling is not properly configured. The --rr switch again selects round-robin real-time scheduling, SCHED_RR, although SCHED_FIFO can also be used.

Useful result: for reference, without a real-time kernel, at 50 ms buffering @ 48 kHz in PD, audio starts glitching as soon as pi_stress is running. The non-RT kernel does not pay much attention to priorities either.

Measuring under a real time kernel

A real-time patched kernel allows setting scheduling priorities at runtime for individual processes. It is useful, then, to set everything related to audio (and possibly the interfacing with sensors, in the case of instruments) to a high priority. Priorities are numbered from 1 to 99, 99 being the highest priority for the scheduler. Additionally, as explained earlier, two real-time scheduling policies are available for each process: SCHED_FIFO and SCHED_RR.

This priority assignment must be done at runtime, either from within the program or externally. Issuing:

ps -e | grep usb

lists the currently running processes; the grep search keeps only those associated with USB traffic (in this case, the sound card). From this command the process IDs can be gathered. If the in/out processes associated with the USB sound card are, say, 68 and 69, executing:

chrt -f -p 98 68

chrt -f -p 98 69

will change these two processes to real-time (SCHED_FIFO) priority 98. This number can be lower, around 96 or even less, and can be tuned by hand depending on the other processes to be scheduled.

PD can also be assigned a higher priority when running in case it is needed. Issuing

pd -nogui -alsa -audioindev 1 -audiooutdev 1 -r 48000 -audiobuf 10 -verbose -stderr -rt sinewave.pd

will hopefully run in a stable manner even with stress tests going on at the same time.

Another possible stress test is the stress tool for Linux [10]:

  • To spawn N workers spinning on sqrt(), use the --cpu N option.
  • To spawn N workers spinning on sync(), use the --io N option.
  • To spawn N workers spinning on malloc()/free(), use the --vm N option.
  • To set the amount of memory allocated per vm worker, use --vm-bytes N.
  • To redirty memory instead of freeing and reallocating it, use --vm-keep.
  • To sleep N seconds before freeing memory, use --vm-hang N.
  • To spawn N workers spinning on write()/unlink(), use the --hdd N option.
  • To set a timeout after N seconds, use --timeout N.
  • To set a wait factor of N microseconds before any work starts, use --backoff N.
  • To show more detailed information when running stress, use -v.
  • Use --help to view the usage help, or see the manpage.
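For example (values chosen arbitrarily), the following spawns four CPU workers and two 128 MB memory workers for one minute:

stress --cpu 4 --vm 2 --vm-bytes 128M --timeout 60s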

Therefore, issuing stress -c 120000 renders the system unresponsive, but if PD also runs at high priority, the audio thread never overruns. The current system load, as well as the assigned priorities, can be seen with the Linux top [11] command.

References

[1] R. Birkett, "Enhancing Real-time Capabilities with the PRU (Sitara™ ARM® Processors)," 2015.
[2] M. Puckette, "Pure Data," [Online]. Available: https://puredata.info/.
[3] M. Puckette, "libpd," Pure Data, 2016. [Online]. Available: https://puredata.info/downloads/libpd. [Accessed 13 June 2016].
[4] eewiki.net, "Installing a RT kernel," [Online]. Available: https://eewiki.net/display/linuxonarm/BeagleBone+Black.
[5] ALSA Project, "Test latency.c," [Online]. Available: http://www.alsa-project.org/main/index.php/Test_latency.c.
[6] P. Krzyzanowski, "Process Scheduling," Rutgers University, 2015. [Online]. Available: https://www.cs.rutgers.edu/~pxk/416/notes/07-scheduling.html.
[7] C. Williams and J. Kacur, "Cyclictest," [Online]. Available: https://rt.wiki.kernel.org/index.php/Cyclictest.
[8] F. Rowand, "Using and Understanding the RT Cyclictest Benchmark," Sony Mobile Communications, [Online]. Available: http://events.linuxfoundation.org/sites/events/files/slides/cyclictest.pdf.
[9] M. Barr, "Introduction to Priority Inversion," [Online]. Available: http://www.barrgroup.com/Embedded-Systems/How-To/RTOS-Priority-Inversion.
[10] A. Kili, "How to Install 'stress' Tool in Linux," [Online]. Available: http://www.tecmint.com/linux-cpu-load-stress-test-with-stress-ng-tool/.
[11] top(1) – Linux man page. [Online]. Available: http://linux.die.net/man/1/top.

[1] The real-time switch -rt does not seem to work for some versions of PD. Nevertheless, priority can be set externally with a real-time kernel, so it is not really an issue anymore.

Implementing Barberpole Phasing in gen~ (Max4Live device)


I decided to make some space during the weekend to get acquainted with Max/MSP’s gen~ codebox environment. This is something I’ve been meaning to do for some months now, as I was promised an idyllic world of sample-accurate processing and a “procedural”, or at least object-oriented, programming environment (as opposed to visual) straight inside my DAW of choice (that’s Ableton Live).

I really like the visual programming aspect of Max/MSP but, to be honest, having learned most of my DSP knowledge in traditional ways, seeing all these boxes and cables inside patches made me cringe every once in a while, especially knowing that many times the same problem could be solved in a couple of lines of good old code. Of course, it was a small price to pay for the joy of experimenting with audio in real time and focusing on the important aspects instead of endless debugging.

Learning by doing is usually the best approach for getting used to a new environment, so I set out to search for a relatively simple, but interesting, problem to implement. On the other hand, it wouldn’t hurt if it ended up being part of the set of audio effects in my current production/performance Live set as well. This paper [1] turned out to be interesting, current, and not overly complicated.

Overall structure

The barberpole filter is based on a more or less well-studied effect known as the Shepard–Risset glissando. The goal of the authors was to come up with a flanging/phasing effect that generates the same “infinite sweeping” illusion on an input signal, by placing adequately gain-scaled cascaded notch filters at the frequencies where the Shepard-tone harmonics sit.

These conveniently placed notch filters have their cut-off frequencies one octave apart from one another, i.e. a constant separation on a logarithmic frequency scale. Moreover, their cut-off gains follow an inverted raised-cosine envelope across the frequency spectrum, preserving the equivalence to the Shepard-tone spectral structure. To generate the glissando effect, each notch filter is cyclically swept from its original cut-off frequency to twice its value. If the number of cascaded filters is large enough to cover most of the audible spectrum, and the transitions in the cyclical sweeping are smooth enough, the “infinite sweeping” sensation can be perceived.
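Roughly in symbols, anticipating the variable names of the Octave script below (m = 1 \dots M is the filter index, k \in [0,1) the sweep phase):

f_{c,m} = f_o \, 2^{m-1} \, 2^{k} \qquad L_{c,m} = L_{min} + \tfrac{1}{2}(L_{max}-L_{min})(1-\cos\theta_m) \qquad \theta_m = \tfrac{2\pi (m-1+k)}{M}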

I had no implementation reference other than the formulas in the paper, and almost no knowledge of gen~ syntax, so I decided to take a divide-and-conquer approach with the following steps:

  • Implement a notch filter with the required parametrization (namely cut-off, bandwidth and gain)
  • Cascade instances of this filter with its different parameters in the most painless way possible
  • Implement the sweeping of the filter parameters
  • Solve possible issues stemming from the implementation, such as the continuity of the audio signals when the filters roll back to their original frequency after a sweep cycle (as mentioned in the paper).

These steps should also be accomplished under one additional rule: as much as possible must be implemented within a single instance of the gen~ codebox. The cute Max boxes and patchers should only take care of controlling the parameters available to the user. So no cheating by feeding an external line~ object into the gen~ object for the sweeping, or by concatenating a series of filter objects, for example.

The reason for this narrow-mindedness is simply that I want my code to be as close as possible to a C/C++ version, in case I really like the result and decide to port it to a more efficient, multi-platform implementation.

Implementing the notch filter

There are many variants of a notch filter. Nevertheless, if we want to achieve the raised-cosine spectral envelope required for the auditory illusion, the filter gains and bandwidths have to be parametrizable across the spectrum. This is why the parametric EQ filter described here is used.

It’s always safe practice to do some plotting first in order to check everything out.

Matlab/Octave code

%octave script
%https://pmdelgado.wordpress.com
clear all;
close all;
Fs = 48000;
Q = 10;
fo = 20;

Lmin=-3;
Lmax=-40;
M=10;
k=1;
K=1;
A=zeros(M,3);
B=zeros(M,3);
fc=zeros(M,1);
Lc=zeros(M,1);
H = 1;

for m = 1:M
fc(m,1) = fo*power(2,m-1)*power(2,(k-1));
theta=2*pi*((m-1)*K+(k-1))/(M*K);
Lc(m,1) = Lmin + ((Lmax-Lmin)*(1-cos(theta)))/2;
G = power(10,Lc(m,1)/20);
Gb_square = (1 + power(G,2))/2;
dw=(2*pi*fc(m,1))/(Q*Fs);
beta=sqrt((Gb_square - 1)/(power(G,2) - Gb_square))*tan(dw/2);
a0=(1+G*beta)/(1+beta);
a1=-2*(cos(dw*Q)/(1+beta));
a2=(1-G*beta)/(1+beta);
b1=a1;
b2=(1-beta)/(1+beta);
A(m,:) = [a0 a1 a2]; %x(n-i) coefficients
B(m,:) = [1 b1 b2];
end

%plot overall transfer function

for m = 1:M
N = 1024;
upp = pi;
w = linspace(0, upp, N+1);
w(end) = [];
ze = exp(-1j*w);
H_prev = H;
H = polyval(A(m,:), ze)./polyval(B(m,:), ze); % Evaluate transfer function and take the amplitude
H = H_prev .* H; %cascade filters
Ha = abs(H);
Hdb = 20*log10(Ha/1);
wn = w/upp*(Fs/2);
xlim = ([0 1]);
end
figure
plot(wn, Hdb);
grid on;

Note how the cutoff gain parameters follow an inverse raised cosine spectral envelope (in dB).

[Figure: magnitude response of the cascaded notch filters, plotted in Octave.]

Here is a snapshot of the notch filter cascade frequency response for a white noise input in Max/MSP:

[Figure: spectrum snapshot of the notch-filter cascade in Max/MSP.]

Implementing IIR filters in gen~ codebox

gen~ code works at sample level, which means that everything written applies to the current audio sample (or defined buffer) being processed. Feedback loops that make use of past samples of an algorithm’s inputs or outputs need to be accessed in a special way: according to the documentation, the history (delay) operator is the one meant for constructing such feedback loops.

In my case, implementing a cascade of filters that makes use of past input and output samples (here, up to two samples back) would mean declaring many of these history objects. I found it easier to write every state variable of the filtering process into so-called Data structures, which can be declared as arrays and can therefore accommodate many instances of one filter for cascading. In other words, for a causal IIR filter:

y(n) = \sum_{i=0}^{2} a_i x(n-i) - \sum_{i=1}^{2} b_i  y(n-i)

I am using an array of M elements (one for each of the M cascaded filters) for each of x(n-1), x(n-2), y(n-1) and y(n-2):

Data x_n1(10);
Data x_n2(10);
Data y_n1(10);
Data y_n2(10);

And of course for the input and output of each of the m \in \{1, \dots, M\} filters. Samples can be recalled, calculated and stored as follows (index = 1…M is the filter number):

 //recurrence equation (actual filtering)

out_filter_prev = a0 * in_filter.peek(index) - a1 * x_n1.peek(index) + a2 * x_n2.peek(index) + b1 * y_n1.peek(index) - b2 * y_n2.peek(index);

//write output

out_filter.poke(out_filter_prev,index);

//feedback part
//respect time order
memory2 = x_n1.peek(index); // x(n-2)
memory4 = y_n1.peek(index); // y(n-2)
memory1 = in_filter.peek(index); // x(n-1)
memory3 = out_filter_prev; // y(n-1)

// store

poke(y_n2, memory4,index); //y(n-2)
poke(y_n1, memory3,m); //y(n-1)
poke(x_n2, memory2,index); //x(n-2)
poke(x_n1, memory1,index); //x(n-1)

Cascading can be done via a for loop that is executed at sample rate (m and index are used interchangeably because I am lazy):

// cascade filters: feed the output of filter m-1 into the input of filter m
for (m = 1; m <= M; m += 1)
{
	out_filter_temp = out_filter.peek(m-1);
	in_filter.poke(out_filter_temp, m);
}
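Since the stated goal is to keep the code close to a C/C++ version, here is a minimal stand-alone C sketch of the same direct-form-I cascade (the coefficient arrays follow the A/B convention of the Octave script above; the names are mine):

#define M 10  /* number of cascaded notch filters */

typedef struct { double x1, x2, y1, y2; } BiquadState;

/* Process one sample through the cascade; a[m] = {a0,a1,a2}, b[m] = {1,b1,b2}. */
static double cascade(double in, const double a[M][3], const double b[M][3],
                      BiquadState s[M])
{
    double x = in;
    for (int m = 0; m < M; m++) {
        /* y(n) = a0 x(n) + a1 x(n-1) + a2 x(n-2) - b1 y(n-1) - b2 y(n-2) */
        double y = a[m][0] * x + a[m][1] * s[m].x1 + a[m][2] * s[m].x2
                 - b[m][1] * s[m].y1 - b[m][2] * s[m].y2;
        s[m].x2 = s[m].x1;  s[m].x1 = x;   /* shift input history */
        s[m].y2 = s[m].y1;  s[m].y1 = y;   /* shift output history */
        x = y;  /* the output of filter m feeds filter m+1 */
    }
    return x;
}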

Sweeping filter frequencies

Filter frequencies are swept cyclically at a (possibly user-defined) rate. This can actually be implemented as a low-frequency ramp generator (somewhere around the 0.1 Hz ballpark).

If each filter frequency is defined as

fc = fo*pow(2,m-1)*pow(2,(k-1))

where k is the time index and m is the filter-number index as before, then a ramp generator at a rate LFOrate can be defined as:

turn_around = 1;
LFOinc = LFOrate/Fs; // phase increment per sample
LFOpointer += LFOinc;
if (LFOpointer >= turn_around) // wrap at the end of the sweep cycle
{
LFOpointer = 0;
}
k = LFOpointer;

The LFO wave resets when the pointer reaches the turn_around value. The filter gains also depend on k and m in a similar way, in order to form the raised-cosine spectral envelope.

Time varying filters and complexity

Time-varying filters mean that the set of filter coefficients needs to be updated over time. Depending on the filter architecture and order, updating the filters at the sample rate Fs can increase the complexity greatly. In our case the filter structure is already fairly expensive, given that multiple divisions and trigonometric functions have to be evaluated for the coefficients. I measured the first implementation (M=10) at around 50% CPU usage on an Intel i5. One alternative would be to specifically optimize the divisions and trigonometric functions involved (there are, after all, fast versions of the trigonometric functions in Max). The other alternative would be to use pre-calculated tables, given that we only need to know the filter coefficients for one cycle of each filter. The M4L device and gen~ code do not implement these alternatives.
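As a sketch of the table alternative (hypothetical; again, not part of the released M4L device), the closed-form coefficient equations of the Octave script can be evaluated once for K discrete sweep phases, reducing the per-sample work to a table lookup plus the biquad recurrences. The table resolution K is an assumption:

#include <math.h>

#define M 10    /* cascaded filters */
#define K 512   /* phase steps per sweep cycle (assumption) */

static double TA[K][M][3], TB[K][M][3];  /* precomputed a/b coefficients */

void fill_tables(double Fs, double fo, double Q, double Lmin, double Lmax)
{
    for (int k = 0; k < K; k++) {
        double phase = (double)k / K;  /* sweep phase in [0,1) */
        for (int m = 0; m < M; m++) {
            double fc    = fo * pow(2.0, m) * pow(2.0, phase); /* swept cut-off */
            double theta = 2.0 * M_PI * (m + phase) / M;
            double Lc    = Lmin + (Lmax - Lmin) * (1.0 - cos(theta)) / 2.0;
            double G     = pow(10.0, Lc / 20.0);
            double Gb2   = (1.0 + G * G) / 2.0;
            double dw    = 2.0 * M_PI * fc / (Q * Fs);
            double beta  = sqrt((Gb2 - 1.0) / (G * G - Gb2)) * tan(dw / 2.0);
            TA[k][m][0] = (1.0 + G * beta) / (1.0 + beta);
            TA[k][m][1] = -2.0 * cos(dw * Q) / (1.0 + beta); /* dw*Q = 2*pi*fc/Fs */
            TA[k][m][2] = (1.0 - G * beta) / (1.0 + beta);
            TB[k][m][0] = 1.0;
            TB[k][m][1] = TA[k][m][1];
            TB[k][m][2] = (1.0 - beta) / (1.0 + beta);
        }
    }
}
/* at audio rate: look up TA[(int)(LFOpointer * K)][m] instead of recomputing */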

Finer Details, avoiding clicks in audio

Experience taught me that the difference between a good audio plugin/processor/algorithm and a bad one lies in the details. Really. Anyone can more or less seamlessly implement a couple of equations and stick them into some kind of audio processing framework, whether it’s VST, RTAS, CSound, Max/Pure Data externals or similar. But auditory perception is very unforgiving of small glitches in an imperfectly implemented processing routine, and that is what makes audio processing a trade only truly mastered by few. Any inconsistency that comes from not fully understanding the interaction between asynchronous user or computer control signals and the audio stream turns into quite unpleasant effects when listening to the output.

In our case, and as explained in the reference paper, audio glitches can occur at the point in time when the frequency-sweeping cycle resets. Since each notch filter sweeps from a given frequency to twice its value (one octave higher), the leap from 2f back to f causes clicks if it happens within the audible frequency range. The clicking gives the ear a perceptual cue that the cycle is resetting, breaking the “infinite sweeping” illusion and rendering the effect useless.

Many methods have been developed for avoiding clicks in time-varying filters, with different trade-offs and complexity levels. The method mentioned in the paper seems to be a really efficient one, given that the state-space representation of the filter is known.

I tested my version with white noise (noisy signals like snare sounds work best with this effect), and if the filter notches extend sufficiently close to the Nyquist frequency, most of the clicking is forgiven. Nevertheless, to make a quality effect, some kind of transient suppression as described in the links should be implemented. Again, due to lack of time (this was only meant to be one weekend), I was not able to do it properly.

Since I ran out of time for this post, the efficient, click-less implementation “is left as an exercise to the reader”.

Max Device:

http://www.maxforlive.com/library/device/3606


References

[1] F. Esqueda, V. Välimäki and J. Parker, “Barberpole phasing and flanging illusions”, in Proc. 18th Int. Conf. Digital Audio Effects (DAFx-15), Trondheim, Norway, Nov. 30 – Dec. 3, 2015, pp. 87-94.