SoundComp - A Sound Compiler

  1. What is SoundComp?

1.1. What is sound?

There are quite a lot of possible explanations on what sound is.

For our purposes and a quick start, it can be kept quite simple: sound is a variation in pressure (or displacement) of air.
More exactly, sound is air oscillating around a mean position or mean pressure with a frequency in the range of 20Hz to 20000Hz (1Hz means "one occurrence per second").

The ear interprets the movement roughly as a superposition of sine oscillations. It is quite possible for the oscillations to contain partial oscillations outside the mentioned range; these will usually not be noticed and can be disregarded. (This is not entirely true for low-frequency components of very high amplitude, but such movement is usually perceived not as sound, with the ear, but as vibration, possibly with other parts of the human body.)

1.1.1. Electronic sound representation

Many electronic devices are used to operate on sound. The most intuitive ones would be loudspeakers and microphones, which serve as a kind of "adapter" between the mechanical (air-moving) and electronic world.

On the electronic side, sound is usually represented as an oscillation of a voltage or current around a mean voltage level or around a mean current (often 0V or 0A). You may envision this by speaking into a microphone connected to an oscilloscope, or by quickly connecting and disconnecting a battery to a loudspeaker (you are not actually encouraged to do the latter: it may drain the battery quickly, and with a strong battery and a weak speaker it could ruin the speaker).

Other electronic sound equipment would be amplifiers (which change the amplitude of the signal), tape recorders (which kind of "store" and "read" a representation of the signal on magnetic tape) and so on.

All electronic sound devices are usually expected to assume a linear relation between the represented air displacement and the voltage/current they operate on. Like the air displacement itself, the voltage as a function of time is inevitably a continuous curve (we will disregard currents from here on; looking at currents adds no value to the discussion from this point).

1.1.2. Digital sound representation

Modern digital equipment operates on discrete numbers, not on continuous curves. To be able to work on sound with digital equipment like computers, we therefore need a representation with discrete numbers. Additionally, computers operate at finite speed. Therefore, the rate of numbers that can be operated on is limited.

In the digital world, sound therefore is represented as a sequence of numbers. This is done in a way that each number represents a certain voltage level.

In the time domain, the numbers are usually and ideally equidistant. The rate of numbers must be high enough that a 20kHz oscillation can be modeled without loss. According to the Shannon theorem, this is only possible if the rate of numbers is at least double the frequency, i.e. 40kHz. Audio CDs are digital storage media that contain an encoded sequence of numbers at a constant rate of 44.1kHz. DAT tapes and other similar equipment use a rate of 48kHz. Most professional equipment uses one of these frequencies or an integer multiple of these.
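For example, at the CD rate of 44.1kHz the highest frequency that can be represented without loss (the so-called Nyquist frequency) is 44.1kHz / 2 = 22.05kHz, comfortably above the 20kHz upper limit of hearing.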

In the value domain, the relation between a number and the voltage it represents should ideally be perfectly linear.
Both quantizations, in the time domain and in the value domain, lead to errors: the number stream cannot exactly represent the original signal. This quantization error introduces a certain amount of noise. Theory (and practice) shows that the error is usually low enough with a number size of 16 bits (i.e. 65536 different possible numbers). Before quantization, one has to make sure that the signal contains no frequencies higher than half the sampling rate; otherwise the number stream contains "phantom signals", sometimes also called "mirror frequencies" due to their spectral position.
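As an illustration of the value quantization (a generic Java sketch, not SoundComp code), the following snippet maps a normalized sample in the range -1..1 to one of the 65536 possible 16-bit values and back, and measures the worst-case round-trip error, which stays at about half of one quantization step:

    // Generic sketch, not SoundComp code: 16-bit quantization and its error.
    public class Quantize16 {
        static short toPcm16(double x) {
            // clamp to the representable range, then scale to a signed 16-bit value
            double clamped = Math.max(-1.0, Math.min(1.0, x));
            return (short) Math.round(clamped * 32767.0);
        }

        static double fromPcm16(short s) {
            return s / 32767.0;
        }

        public static void main(String[] args) {
            double worst = 0.0;
            for (double x = -1.0; x <= 1.0; x += 1e-5) {
                worst = Math.max(worst, Math.abs(x - fromPcm16(toPcm16(x))));
            }
            // prints roughly 1.5e-5, i.e. about half a quantization step
            System.out.println("worst-case quantization error: " + worst);
        }
    }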

1.1.3. Digital sound storage

Sound can be stored in computer files. In the simplest case, such a file would just contain the sequence of numbers in some form. Due to the sample frequency of at least 40kHz, these files quickly become large. Therefore a lot of different compression methods have been invented to reduce the necessary storage capacity (or transmission bandwidth, in case the sound has to be transmitted).
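For example, uncompressed CD-quality stereo sound takes 44100 samples/s × 2 bytes × 2 channels ≈ 176 kB per second, i.e. roughly 10 MB per minute of sound.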

1.1.4. Digital sound generation

A program generating sound could generate a sequence of numbers and feed it to a device that converts the numbers back into a voltage curve (such a device is called a "digital to analog converter", abbreviated DAC, and is contained in almost every piece of computer sound hardware).

This would require the program to operate fast enough that it can always generate the next number in real time before the converter needs it (as the converter needs to be supplied with numbers at the ideally constant sample rate of e.g. 40kHz). For simple algorithms this is possible on most computers.

If the algorithm gets more complex, it can be difficult or impossible to guarantee a minimum generation rate. Such algorithms can instead write the numbers into files that store the generated signal for later playback.
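A minimal sketch of this file-based approach (generic Java code using the standard javax.sound.sampled API, not SoundComp code): generate one second of a 440Hz sine at 44.1kHz, 16 bit, mono, and store it in a WAV file for later playback.

    // Generic sketch, not SoundComp code: write a generated sine wave to a WAV file.
    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import java.io.ByteArrayInputStream;
    import java.io.File;

    public class SineToWav {
        public static void main(String[] args) throws Exception {
            float sampleRate = 44100f;
            int samples = (int) sampleRate;                 // one second of sound
            byte[] data = new byte[samples * 2];            // 16 bit mono, little-endian
            for (int n = 0; n < samples; n++) {
                double v = Math.sin(2 * Math.PI * 440.0 * n / sampleRate);
                short s = (short) Math.round(v * Short.MAX_VALUE);
                data[2 * n]     = (byte) (s & 0xff);        // low byte
                data[2 * n + 1] = (byte) ((s >> 8) & 0xff); // high byte
            }
            AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
            AudioInputStream stream = new AudioInputStream(
                    new ByteArrayInputStream(data), format, samples);
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, new File("sine.wav"));
        }
    }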

1.2. What is a compiler?

A compiler is a computer program that converts a program written in one computer language into another representation, usually of a lower abstraction level. For example, a C compiler converts a program from a textual representation in the C programming language (in which computer actions can be described at a certain level of abstraction from the processor) into a form that is executable by a microprocessor (or at least very near to the latter form).

1.3. So what is SoundComp, finally?

SoundComp is a program that will convert a textual description of a piece of sound into files containing sequences of numbers representing the sound, for playback with regular sound playback programs.

Just as the description of the actions for a C compiler has to be written in the C programming language, the text files that SoundComp is going to compile will also have to be written in a certain language. This language is of course not a regular programming language but contains elements that make sense in the domain of generating and manipulating sound.

2. Main building blocks of SoundComp

SoundComp consists mainly of three blocks:

  1. text input analysis
  2. sound structure synthesis (control logic)
  3. sound data synthesis

2.1. [Text input analysis]

Like every compiler, SoundComp must read an input text and analyze it. This part is currently done with JFlex and byacc/j.
These generate a data structure (a parse tree) that represents the input text in a form that is more suitable for use by the control logic.
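For illustration only (a hypothetical class, not the node types SoundComp actually uses), a parse tree is typically built from small node objects that carry a node type, the matched text and a list of child nodes:

    // Hypothetical sketch of a parse tree node; SoundComp's actual classes may differ.
    import java.util.ArrayList;
    import java.util.List;

    class ParseNode {
        final String type;                        // e.g. "expression" or "statement"
        final String text;                        // the matched input text, if any
        final List<ParseNode> children = new ArrayList<>();

        ParseNode(String type, String text) {
            this.type = type;
            this.text = text;
        }

        void add(ParseNode child) {
            children.add(child);
        }
    }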

2.2. [Control logic]

From the parse tree, the control logic needs to generate a network of sound processing elements that will be used to finally generate the sound, and the timestamps of certain events controlling these sound processing elements. (This is the part that currently lacks the most work.)

2.3. [Sound data synthesis]

Sound data synthesis is the process of generating a stream of numbers representing the voltage of the oscillation at equidistant times.
It was mentioned before that a basic form of oscillation is a sine curve. A sine curve is the solution of a (second order) differential equation, so quite a lot of generating algorithms are based on numerically solving differential equations. More precisely, we calculate difference equations, not differential equations, due to the time-quantized nature of the signals. There is plenty of literature available on the net about the most important sound processing elements.
Almost all processing elements take one or more input number streams and generate one output number stream or a set of output number streams based on various forms of calculation on the current and past input values. All connected input streams get their numbers either from other processing elements or from the control logic.
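As a concrete example of such a difference equation (a generic sketch, not SoundComp's actual element code), a sine oscillator can be computed from nothing but its own two previous output samples:

    // Generic sketch, not SoundComp code: a sine oscillator as a difference equation.
    //   y[n] = 2*cos(w) * y[n-1] - y[n-2],   w = 2*pi*frequency / sampleRate
    // reproduces y[n] = sin(w*n) at the cost of one multiplication per sample.
    public class SineOscillator {
        private final double k;       // the coefficient 2*cos(w)
        private double y1, y2;        // the previous outputs y[n-1] and y[n-2]

        public SineOscillator(double frequency, double sampleRate) {
            double w = 2.0 * Math.PI * frequency / sampleRate;
            k = 2.0 * Math.cos(w);
            y1 = Math.sin(-w);        // y[-1], chosen so that the first output is sin(0)
            y2 = Math.sin(-2.0 * w);  // y[-2]
        }

        public double next() {        // returns the next sample of the sine wave
            double y = k * y1 - y2;
            y2 = y1;
            y1 = y;
            return y;
        }
    }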

To be more precise: this network of processing elements is a state machine that calculates the next state from the previous inner state and the input from the control logic. If all calculated outputs were immediately available to the connected inputs, the outcome would depend on the order of evaluation, but the user should not and will not have to care about the order of evaluation. Besides, in a state machine there can be circular dependencies, for which no "good" order of evaluation can be specified.
This is not what is intended. Therefore, calculation of the next state is a two-step process:
First, calculate the next output of each element from the current input, but without actually propagating the result to the output. This way, the order does not affect the result. After all "next output" values have been calculated, in a second iteration let each element actually propagate the already calculated value to its output (no actual calculation takes place in the second phase). This way the next state only depends on the current state and inputs, but not on the order in which the elements are calculated.
To make it even more complicated: there are elements which really need no delay. A simple "plus" element that just adds two values does not need an inner state; its outcome can be determined by just looking at the inputs in real time, as long as there is no path from its output back to one of its inputs in the calculation chain (which would incur an endless recursion). To mitigate the delay problem, such simple math elements (addition, multiplication) therefore just forward their inputs to their outputs in the calculation phase and do nothing in the propagation phase, which makes it illegal to build circular structures of them. This restriction does not really hurt: endless recursion in summing/multiplying values either leads to infinite values (which we don't want anyway) or usually has a short-cut way of being calculated in a finite number of operations.
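The two-phase scheme could look roughly like the following sketch (hypothetical interface and class names, not SoundComp's actual API): stateful elements compute their next output in the first phase and publish it in the second, while a stateless adder forwards its inputs already in the calculation phase and does nothing in the propagation phase.

    // Hypothetical sketch of the two-phase evaluation; not SoundComp's actual API.
    interface ProcessingElement {
        void calculate();   // phase 1: compute the next output, do not publish it yet
        void propagate();   // phase 2: publish the value computed in phase 1
        double output();    // the currently visible output value
    }

    // A stateful element: delays its input by exactly one sample period.
    class Delay implements ProcessingElement {
        private final ProcessingElement input;
        private double next, current;

        Delay(ProcessingElement input) { this.input = input; }

        public void calculate() { next = input.output(); }   // read, but do not publish
        public void propagate() { current = next; }          // publish in the 2nd pass
        public double output()  { return current; }
    }

    // A "very simple math" element: no inner state, forwards its inputs immediately,
    // which is why circular structures of such elements are not allowed.
    class Add implements ProcessingElement {
        private final ProcessingElement a, b;
        private double current;

        Add(ProcessingElement a, ProcessingElement b) { this.a = a; this.b = b; }

        public void calculate() { current = a.output() + b.output(); }
        public void propagate() { }                           // nothing left to publish
        public double output()  { return current; }
    }

    // One sample period: run phase 1 on all elements, then phase 2 on all elements.
    class Network {
        static void step(Iterable<ProcessingElement> elements) {
            for (ProcessingElement e : elements) e.calculate();
            for (ProcessingElement e : elements) e.propagate();
        }
    }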

There is one drawback that SoundComp has to live with by being implemented this way: each processing element (besides very simple math) incurs a delay of exactly one sample period. Even worse: two parallel chains of processing elements of different length incur different delays (which makes the outcome different from what would happen in a non-delaying analog implementation). To cope with such situations (and in the hope that they rarely matter), it is always possible to adjust the delays by adding elements that only serve for delaying.

3. Using SoundComp

SoundComp is planned to be a Java library as well as a Java executable. There may be a kind of GUI, but since it is merely a compiler, i.e. it generates files from other files, having a GUI is not really important. SoundComp will offer a simple API that you can employ, and a command line interface for users who don't want to write a Java program. The minimal GUI it will probably have exists for demonstration purposes and should not limit the way you want to use it.
We may decide during development to shift more functionality to C++, in which case the Java interface may become superfluous for us. We plan to continue supporting Java even in that case.

Real-time playback capability using the sound hardware of the computer might be added later on. Another option would be making SoundComp compatible with other sound software, by providing a VST interface or similar. Both of course require that SoundComp is really able to provide data fast enough (faster than real time). This depends on the complexity of the network of processing elements, so probably not everything will be possible in such a setup, depending on computational power.

4. Developing SoundComp

SoundComp is a mixed-language program. It uses Java, C/C++, byacc/j and JFlex. The native parts are designed to be as widely portable as we can test (big and little endian, 32 and 64 bits, Windows, Mac OS X, Linux, BSD, Solaris...). You therefore should be able to build and run SoundComp on any platform that Java is supported on.
It will continue to be mixed-language for a longer time, as there are parts that I consider easier to write this way.
We do not test building each commit on all platforms, due to our limited resources. It may and will therefore happen that on some platforms a build will not work 'out of the box' but may require certain tweaks to the buildfiles (though usually not to the code). If you find a problem on a certain platform, we are eager to learn about it.

On Windows and Linux, we support building in Eclipse. These and other platforms can use the Ant buildfiles from the shell. Small 'configure' scripts show what steps might need to be taken before building. We do not commit the buildfiles in place in the version control system, as they tend to be cluttered with installation-specific settings that should not get committed along with code changes. We therefore use out-of-place copies that the configure scripts copy over to the working place. These originals tend to get out of date on rarely tested platforms.

The Eclipse buildfiles are also specific to the installation and the Eclipse version to a certain extent. Do not commit buildfiles from an up-to-date Eclipse, as this may make the files unusable for platforms that aren't provided with bleeding-edge Eclipse versions.
The same goes for Ant files, SVN adapters etc.

SoundComp is mixed-language Java and C++, which means there is a JNI layer. This is boilerplate-code intensive. We got rid of a large amount of it by using macros extensively, but the macros themselves remain as boilerplate.

C++ is currently only used in the signal processing elements. There are plenty of examples of elements that evaluate their state in C++, and others in Java. The way this is done, neither the control logic nor the parser nor the other processing elements need to know whether an element is coded in C++ or Java. Still, jumping "back and forth" between the languages is avoided between elements that use the same language, as such language switches impose a performance penalty. It is planned to eventually code the whole signal processing part in C++, but for some elements it is easier to do it platform-independently in Java, so Java will always remain as a rapid-prototyping escape. Switching the control logic or the parser to C++ is not planned for the foreseeable future, although a pure C++ implementation might be useful in the far future.

The processing elements contain a method giving a hint whether they are preferably called from Java or from C++. This way the control logic has options to further optimize the number of language transitions, but this is really only an optimization hint; any element will always work when called from either language.
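As a rough illustration (the names below are hypothetical, not SoundComp's actual method signature), such a hint could look like this:

    // Hypothetical sketch, not SoundComp's actual interface: an element reports the
    // language its evaluation code lives in, so the control logic can try to minimize
    // Java <-> C++ transitions. The element still works when called from either language.
    enum ImplementationLanguage { JAVA, CPP }

    interface LanguageHintedElement {
        ImplementationLanguage preferredLanguage();
    }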

Presently SoundComp is a single-threaded program. Parallelization would be possible to a certain extent: since the processing elements are evaluated in a way that should be independent of the order of evaluation, it theoretically should be possible to evaluate them in parallel. There are no concrete plans for trying this yet. Pitfalls may lie in the non-delaying elements, and it is unclear whether such a high context switching rate (every calculation needs to stop at the next sample, i.e. after the next ~20µs of calculated sound) is really helpful. It may turn out that spreading the calculation over several threads becomes so expensive that the switching costs more time than it gains by using several cores, especially when we consider that most elements only consist of very few, simple steps. Maybe some more complex elements may become worthy candidates for parallelization. Or we might end up grouping certain elements into thread groups, where all elements in a group are processed sequentially but all the groups run in parallel. The algorithm on how to ideally spread the elements over the groups would be an interesting field of research then.


Related

Wiki: Control logic
Wiki: Home
Wiki: Sound data synthesis
Wiki: Text input analysis
