I was part of the team that produced the first embedded MPEG-H audio encoder of Fraunhofer. In particular, I was in charge of the fixed-point architecture integration of the core audio encoder and audio quality testing.
The contribution encoder also got the “Best of Show” Award at NAB 2018:
Photo credit: Manu Casir (https://www.instagram.com/manucasir)
This track is unreleased.
Here’s something I wrote a couple years ago. Better here than lost in my hard drive. Cheers. (Note: this was before Bela)
Up until recently, the only way to efficiently implement audio processors on low-power platforms was a modular approach built from separate embedded devices, with each module taking care of a specific function (e.g. user interface, signal processing or memory management) on an architecture optimized for it. The trade-off of this very efficient configuration is poor portability, difficult updates, low-level development and a lack of flexibility.
Recent embedded devices manufactured under the category of System on Chip (SoC) can run operating systems akin to those on desktop PCs or laptops, with a highly developed interface to their peripherals. System integration for a possible audio processor or synthesizer is therefore more straightforward. Digital signal processing algorithms can be developed on a higher layer and, since their interaction with the processing core and peripherals is already handled by the operating system, they become easier to port.
On the other hand, these devices are not optimized for real-time audio processing, at least by default. Task schedulers on these general-purpose operating systems are designed to maximize CPU usage within the available processing power. Therefore the audio thread, like any other task handled by the OS, will be scheduled to maximize throughput and to minimize idle processor time among tasks. This isn’t necessarily compatible with the notion of real-time processing, where an audio frame needs to be fully processed before the next one arrives. If this deadline is not met, audio dropouts will occur. Under a default configuration, the task scheduler will accept missing the audio deadline as long as CPU usage is deemed optimal in a statistical sense. This kind of scheduling might be enough and not cause any dropouts on higher-power systems (e.g. desktop/laptop computers) given sufficient buffering, but low-power systems will easily show the limitations of this scheduling mechanism.
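To put a number on that deadline: the time available for processing one audio block is the block length divided by the sampling rate. The sketch below assumes blocks of 64 frames at 48 kHz (both values are only examples), and the function name is made up for illustration.

```shell
#!/bin/sh
# Time budget for one audio block, in microseconds (integer division).
# block_period_us <frames_per_block> <sample_rate_hz>
block_period_us() {
    echo $(( $1 * 1000000 / $2 ))
}

# 64 frames at 48 kHz: the whole block must be processed in about 1.3 ms.
block_period_us 64 48000    # prints 1333
block_period_us 256 16000   # prints 16000
```

Any scheduling delay that regularly exceeds this budget, minus the time the DSP code itself needs, will be heard as a dropout unless more buffering is added.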
In order to overcome this problem, the task scheduling of the operating system needs to be modified in order to give a designated thread the highest priority. The scheduling mechanism has to be designed to use all resources available to meet the real time deadline of the given thread (i.e. audio), even if the resource use is not optimally distributed among all available tasks.
In recent years, a modification of the Linux kernel that supports full task preemption (the PREEMPT_RT patch) has emerged, which makes the behavior described in the previous paragraph possible. Although initially meant for industrial control applications, its use for audio has brought a level of stability not previously reached on this kind of general-purpose OS.
Pure Data (PD) is a modular signal processing system created by Miller Puckette. Its main focus is signal processing and synthesis of audio and multimedia streams for artistic purposes. Although their development has diverged in later years, Pure Data bears a great deal of similarity to its counterpart, the visual programming environment Max/MSP (also created by Puckette himself). To a certain point, Pure Data could be considered a free, open-source version of Max/MSP. Some of the reasons for using PD are:
Choosing the right device for implementation requires a relatively extended feasibility study of the different choices available in the market. There are many topics that need to be considered, amongst them are, for example:
An extended explanation of the trade-offs involved on each of the previous items would constitute a book by itself. Nevertheless, some rough general guidelines for estimating computing power requirements can be mentioned for motivating the next sections:
The next section outlines an example application of a PD patch on the BeagleBone Black. The main aspects of this example should carry over easily to newer-generation SoCs running Linux, which may show better performance. The general outline is as follows:
The setup consists of a BeagleBone Black (BBB) with the following:
For audio I/O a Saffire USB 2i2 (First Generation) is used. No additional drivers are needed as Linux handles everything. All other configurations are considered default unless noted otherwise.
The basic, tried and true steps for installing a real-time kernel can be found at https://eewiki.net/display/linuxonarm/BeagleBone+Black. The kernel must be cross-compiled on a desktop PC with the GNU ARM compiler. The procedure basically consists of telling the shell the path to the cross compiler and then executing a script that checks out the source code, applies the real-time patches and builds the kernel. This can take quite long depending on the computer. The command line sequence is:
export CC=(path-to-compiler)/bin/arm-linux-gnueabihf-
For am33x-rt-v4.4 (Longterm 4.4.x + Real-Time Linux), from ~/bb-kernel:
git checkout origin/am33x-rt-v4.4 -b tmp
When the build finishes, the script reports:
Script Complete
eewiki.net: [user@localhost:~$ export kernel_version=4.4.11-bone-rt-r10]
Good symptoms: LED D2 of board flashes heartbeat after successful initialization.
In order to be able to log in via USB through PuTTY or similar on Windows, update /etc/network/interfaces to add a virtual Ethernet port by appending the following lines:
cat >> /etc/network/interfaces <<EOF
iface usb0 inet static
    address 192.168.7.2
    netmask 255.255.255.0
    network 192.168.7.0
    gateway 192.168.7.1
EOF
Then connect using host name: 192.168.7.2
By default, the SSH server denies password-based login for root.
In /etc/ssh/sshd_config, change:
PermitRootLogin without-password to PermitRootLogin yes
There are a couple of similar posts suggesting that a failing login could be a problem with spawning a shell because of incorrect settings for the shell path in /etc/passwd.
To check this, verify that your user's shell path exists and is executable, for example:
# cat /etc/passwd | grep tomh
tomh:x:1000:1000:Tom H:/home/tomh:/bin/bash    <-- check this exists
Check that the shell exists:
# file /bin/bash
/bin/bash: ELF 64-bit
By default, the Linux distribution on BBB is configured to handle audio via the HDMI connection. So the configuration must be changed in order to redirect audio to (in this case) the USB card:
to see which number is assigned to USB and then change accordingly in:
Some other useful commands: alsamixer, aplay -L
mplayer -ao alsa:device=hw=1.0 voice.wav -format s32le
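As a sketch of how the card number mentioned above could be looked up from a script instead of by eye: ALSA lists its sound cards in /proc/asound/cards, and class-compliant USB interfaces appear there with the driver name USB-Audio. The helper name usb_card_index and its optional file argument are assumptions made for illustration.

```shell
#!/bin/sh
# Print the ALSA index of the first card handled by the USB-Audio driver.
# usb_card_index [cards_file]   (defaults to /proc/asound/cards)
usb_card_index() {
    awk '/USB-Audio/ { print $1; exit }' "${1:-/proc/asound/cards}"
}

# Example use: build the ALSA device name for the USB interface, e.g. hw:1,0
# card=$(usb_card_index) && echo "hw:${card},0"
```

If several USB audio devices are connected, only the first one is reported, so the result should still be checked against the output of aplay -L.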
$ wget http://alsa.cybermirror.org/lib/alsa-lib-1.0.26.tar.bz2
$ tar xjvf alsa-lib-1.0.26.tar.bz2
$ cd alsa-lib-1.0.26/test
$ gcc latency.c -lasound -o latency
If it cannot find a header, install libasound2-dev and compile again
$ sudo apt-get install libasound2-dev
$ ./latency -m 256 -r 16000
Other useful reference: http://elinux.org/images/8/82/Elc2011_lorriaux.pdf
The following links will give a deeper insight on the whole process, in case needed:
Newer SoCs implement a Linux feature (the scaling governor) that dynamically scales voltage and frequency according to the resources in use. If the processor is idle most of the time, the operating frequency decreases to save power.
While this is important for efficiency, it is suboptimal for real-time operation, where computational power must be fully exploited within the deadlines of audio block processing.
Useful result: When the governor is turned to performance (maximal power), the analog to digital converter (for analog inputs) can be used without problems, otherwise there might be some clicking in the audio signal.
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to email@example.com, please.
analyzing CPU 0:
  driver: generic_cpu0
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 300 us.
  hardware limits: 300 MHz - 1000 MHz
  available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 300 MHz and 1000 MHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 300 MHz (asserted by call to hardware).
root@beaglebone:~/# cpufreq-set -g performance
This should set the governor to performance and the CPU to 1 GHz.
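The active governor can also be checked directly through sysfs. A minimal sketch, assuming the standard cpufreq sysfs layout for cpu0; the function name and the optional directory argument are hypothetical and only there to make the check easy to try out.

```shell
#!/bin/sh
# Report whether cpu0 is using the "performance" governor.
# check_governor [cpufreq_dir]   (defaults to the standard sysfs location)
check_governor() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    gov=$(cat "$dir/scaling_governor") || return 2
    if [ "$gov" = "performance" ]; then
        echo "ok: governor is performance"
    else
        echo "warning: governor is $gov, expect possible audio clicks"
        return 1
    fi
}
```

Calling check_governor before starting the audio process is a cheap way to catch a board that has fallen back to its default governor after a reboot.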
By default, the BeagleBone comes with a lot of unnecessary services activated for general-purpose development. Most of these services are present on the factory Linux image and should already be stripped from a fresh install of the real-time kernel, but it is nevertheless good to double-check.
Useful Result: So far, deactivating Node.js (if active) can also have an impact on ADC performance. We definitely do not need JavaScript for now.
Here is a bash script that takes care of deactivating common services by default in Linux:
#!/bin/bash
## Stop the ntp service
sudo service ntp stop
## Stop the triggerhappy service
sudo service triggerhappy stop
## Stop the dbus service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
sudo service dbus stop
## Stop the console-kit-daemon service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
sudo killall console-kit-daemon
## Stop the polkitd service. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
sudo killall polkitd
## Only needed when Jack2 is compiled with D-Bus support (Jack2 in the AutoStatic RPi audio repo is compiled without D-Bus support)
#export DBUS_SESSION_BUS_ADDRESS=unix:path=/run/dbus/system_bus_socket
## Remount /dev/shm to prevent memory allocation errors
sudo mount -o remount,size=128M /dev/shm
## Kill the userspace gnome virtual filesystem daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
killall gvfsd
## Kill the userspace D-Bus daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
killall dbus-daemon
## Kill the userspace dbus-launch daemon. Warning: this can cause unpredictable behaviour when running a desktop environment on the RPi
killall dbus-launch
## Uncomment if you'd like to disable the network adapter completely
#echo -n "1-1.1:1.0" | sudo tee /sys/bus/usb/drivers/smsc95xx/unbind
## In case the above line doesn't work try the following
#echo -n "1-1.1" | sudo tee /sys/bus/usb/drivers/usb/unbind
## Set the CPU scaling governor to performance
echo -n performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Apparently not needed, but in any case it might be useful to have a reference: https://kernel-handbook.alioth.debian.org/ch-modules.html
To display the overlays currently enabled by the cape manager, type:
cat /sys/devices/bone_capemgr.*/slots
0: 54:PF---
1: 55:PF---
2: 56:PF---
3: 57:PF---
4: ff:P-O-L Bone-LT-eMMC-2G,00A0,Texas Instrument,BB-BONE-EMMC-2G
5: ff:P-O-L Bone-Black-HDMI,00A0,Texas Instrument,BB-BONELT-HDMI
uEnv.txt is actually on the root filesystem, at /boot/uEnv.txt
Pure Data comes by default with a graphical user interface for patching. Nevertheless, if a patch is going to be executed on a resource-constrained platform, a “headless” version (no GUI) is preferred. The patch must then be prepared beforehand and be fully functional. Pure Data can either be compiled from source, following the information on its webpage, or installed as a ready-compiled package in most Linux distributions. For Debian, the packages can be found via apt-get.
PD features a DSP on/off switch that must be activated each time Pure Data is started. This can, and must, be done within the patch if the system is to start up automatically (i.e. we want our embedded system to be an autonomous effect processor without the need for a command line prompt).
Figure 1: example of a PD patch prepared for runtime without GUI.
From Fig. 1 it can be seen that the short 3-block section to the lower left end takes care of sending an execution order at start “loadbang”, followed by a 1000 ms. delay and a message box that tells PD to activate DSP processing. The delay is placed empirically in order to give some time to PD for initialization.
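Since PD patches are stored as plain text, a patch of this kind can also be generated from a script. The following is a sketch and not the patch from Fig. 1: the function name, the object positions and the 440 Hz test tone are assumptions; only the loadbang, delay and "pd dsp 1" chain follows the description above.

```shell
#!/bin/sh
# write_autostart_patch <file>
# Write a minimal PD patch that switches DSP on by itself one second after
# loading and plays a quiet test tone on both output channels.
write_autostart_patch() {
    cat > "$1" <<'EOF'
#N canvas 0 0 450 300 10;
#X obj 30 30 loadbang;
#X obj 30 60 delay 1000;
#X msg 30 90 \; pd dsp 1;
#X obj 200 30 osc~ 440;
#X obj 200 60 *~ 0.1;
#X obj 200 90 dac~;
#X connect 0 0 1 0;
#X connect 1 0 2 0;
#X connect 3 0 4 0;
#X connect 4 0 5 0;
#X connect 4 0 5 1;
EOF
}
```

After write_autostart_patch autostart.pd, the file could be started with pd -nogui autostart.pd; the file name is only an example.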
General audio latency measurements can be made via a set of tools available with varying degrees of accuracy.
The latency test can be used for measuring latency between capture and playback with the round-robin scheduler SCHED_RR.
root@beaglebone:~/alsa_repo/alsa_lib/alsa-lib-1.1.1/test# ./latency -m 64 -M 64 -P hw:1,0 -C hw:1,0 -p -s 1 -r 48000 -f S32_LE
“When called, the test/latency.c program will attempt to set period/buffer sizes based on the latency entered, starting from the -m,--min option (or the default minimum latency = 64 if not specified). If the run succeeds without errors with that setting, the program exits; otherwise, the latency is increased, and the run repeated. If the run is successful here, then the program exits, else the process continues until the -M,--max latency is reached.”
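The search described in the quote can be sketched as a simple loop. This illustrates the idea only and is not the code of latency.c: the function names, the step size and the probe command (any command that exits with 0 when a run at the given latency succeeds) are all assumptions.

```shell
#!/bin/sh
# find_min_latency <min> <max> <step> <probe_command>
# Calls "<probe_command> <latency>" for increasing latencies and prints the
# first value for which the probe succeeds. Returns 1 if none does up to <max>.
find_min_latency() {
    latency=$1
    while [ "$latency" -le "$2" ]; do
        if "$4" "$latency"; then
            echo "$latency"
            return 0
        fi
        latency=$(( latency + $3 ))
    done
    return 1
}

# Stand-in probe for demonstration: pretend runs only succeed from 192 frames up.
fake_probe() { [ "$1" -ge 192 ]; }

find_min_latency 64 512 64 fake_probe   # prints 192
```

On real hardware the probe would be a wrapper around an actual capture/playback run, which is what makes the result depend on the system load at the time of the test.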
The problem with this approach is that it does not consider system and audio stream stability when the system is under heavy load (CPU or Memory).
A good measurement of system performance in an application context could be, for example, running a simple PD patch and some CPU stress program at the same time, and then reducing the audio processing block size at a given sampling rate until audio drops start to occur. This should give an idea of the threshold on overall system performance under heavy computational load.
There are basically two ways PD can handle buffering and latency tradeoffs:
The main block size can be set via command line as a parameter, but multiple block sizes can be used within the same patch for handling different time/frequency resolutions of the DSP algorithms.
An example command line would be:
pd -nogui -alsa -audiooutdev 3 -rt -r 48000 -audiobuf 50 -verbose -stderr -noadc sinewave.pd
Here pd is called “headless” (no GUI), using the ALSA driver and device number 3 for output (pd -listdev gives the list of available sound devices), with real-time priority, at a sampling rate of 48000 Hz, with an audio buffer of 50 ms, no analog-to-digital converter (no audio input), verbose output and messages sent to stderr. The patch outputs a simple sine wave. Disabling the audio input when it is not needed allows somewhat smaller buffering (lower latency) without audio thread overruns.
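For orientation, the -audiobuf value in milliseconds corresponds to a number of frames that depends on the sampling rate. A small sketch of that arithmetic follows; the helper name is hypothetical, and how PD rounds the value internally is not covered here.

```shell
#!/bin/sh
# Convert a buffer length in milliseconds to frames at a given sampling rate.
# audiobuf_frames <milliseconds> <sample_rate_hz>
audiobuf_frames() {
    echo $(( $1 * $2 / 1000 ))
}

audiobuf_frames 50 48000   # prints 2400
audiobuf_frames 10 48000   # prints 480
```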
Once the PD patch is running, we can use a second terminal to launch a stress program that loads the CPU, and see how the audio thread responds. A good stress test is provided by the rt-tests suite:
root@beaglebone:~/rt_test/rt-tests# ./pi_stress --rr --uniprocessor
The priority inversion test pi_stress provides a heavy CPU load within seconds and will immediately affect the scheduling of the audio thread if the scheduling is not properly configured. The --rr switch again selects round-robin real-time scheduling, SCHED_RR, although SCHED_FIFO can also be used.
Useful Result: For reference, without a real-time kernel, at 50 ms buffering @ 48 kHz in PD, audio starts glitching when pi_stress is running. The non-RT kernel does not pay much attention to priorities either.
A real-time patched kernel allows scheduling priorities to be set at runtime for individual processes. It is then useful to give everything related to audio (and possibly the interfacing with sensors, when it comes to instruments) a high priority. Priorities are numbered from 1 to 99, 99 being the highest priority for the scheduler. Additionally, as mentioned earlier, two real-time scheduling policies are available for each process: SCHED_FIFO and SCHED_RR.
This priority assignment must be done at runtime, either from within the program or externally. Issuing:
ps -e | grep usb
will give a list of the processes currently running, filtered by grep so that only those associated with USB traffic (in this case, the sound card) are listed. From this output a process ID can be gathered. If the in/out processes associated with the USB sound card are, say, 68 and 69, executing:
chrt -f -p 98 68
chrt -f -p 98 69
will change these two processes to real-time priority 98. This number can be lower, around 96 or even less and can be tuned by hand depending on the other processes to be scheduled.
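The two steps above (collecting the PIDs, then calling chrt for each) can be wrapped in a small helper. This is a sketch: the function name is made up, and it only prints the chrt commands so that they can be reviewed before being run as root.

```shell
#!/bin/sh
# rt_commands <priority> <pid>...
# Print one "chrt -f -p" command (SCHED_FIFO) per PID, after checking that
# the priority lies within the real-time range 1..99.
rt_commands() {
    prio=$1
    shift
    if [ "$prio" -lt 1 ] || [ "$prio" -gt 99 ]; then
        echo "priority must be between 1 and 99" >&2
        return 1
    fi
    for pid in "$@"; do
        echo "chrt -f -p $prio $pid"
    done
}

rt_commands 98 68 69
# prints:
#   chrt -f -p 98 68
#   chrt -f -p 98 69
```

Once the commands have been applied, chrt -p 68 shows the scheduling policy and priority currently assigned to that process, which is a quick way to confirm the change.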
PD can also be assigned a higher priority when running in case it is needed. Issuing
pd -nogui -alsa -audioindev 1 -audiooutdev 1 -r 48000 -audiobuf 10 -verbose -stderr -rt sinewave.pd
will hopefully run in a stable manner even with stress tests going on at the same time.
Another possible stress test is the stress tool for Linux:
Issuing stress -c 120000 renders the system unresponsive, but if PD is also running at high priority, the audio thread never overruns :). The current system load as well as the priorities assigned can be seen with the Linux top command.
R. Birkett, “Enhancing Real-time Capabilities with the PRU (Sitara™ ARM® Processors),” 2015.
M. Puckette, “Pure Data,” [Online]. Available: https://puredata.info/.
M. Puckette, “libpd,” Pure Data pdlib, 2016. [Online]. Available: https://puredata.info/downloads/libpd. [Accessed 13 June 2016].
eewiki.net, “Installing a RT kernel,” [Online]. Available: https://eewiki.net/display/linuxonarm/BeagleBone+Black.
ALSA Project, “ALSA,” [Online]. Available: http://www.alsa-project.org/main/index.php/Test_latency.c.
P. Krzyzanowski, “Process Scheduling,” Rutgers University, 2015. [Online]. Available: https://www.cs.rutgers.edu/~pxk/416/notes/07-scheduling.html.
C. Williams and J. Kacur, “Cyclictest,” [Online]. Available: https://rt.wiki.kernel.org/index.php/Cyclictest.
F. Rowand, “Using and Understanding the RT Cyclictest Benchmark,” Sony Mobile Communications, [Online]. Available: http://events.linuxfoundation.org/sites/events/files/slides/cyclictest.pdf.
M. Barr, “Introduction to Priority Inversion,” [Online]. Available: http://www.barrgroup.com/Embedded-Systems/How-To/RTOS-Priority-Inversion.
A. Kili, “How to Install ‘stress’ Tool in Linux,” [Online]. Available: http://www.tecmint.com/linux-cpu-load-stress-test-with-stress-ng-tool/.
“top(1), Linux man page,” [Online]. Available: http://linux.die.net/man/1/top.
The real-time switch -rt does not seem to work in some versions of PD. Nevertheless, with a real-time kernel the priority can be set externally, so it is not really an issue anymore.
I flew every weekend of April.
The camera of my iPhone 4s suddenly came to life after almost a year of malfunctioning and I could film this short bit. Decided to make a pause and put some improvised music.
One of my latest tracks on 8day records (Montreal)