Scalable Bioacoustics Pre-Processing

Name	Modified	Size	Downloads / Week
src	2018-04-10		0
LICENSE.txt	2018-04-10	1.0 kB	0
README.txt	2018-04-10	3.0 kB	0
Totals: 3 Items		4.0 kB	0

Source code for the paper: "Scalable preprocessing of high volume bird acoustic data" (currently pending acceptance)

Required libraries:

Apache Commons Math (http://commons.apache.org/proper/commons-math/)
SoX (http://sox.sourceforge.net/sox.html)
JLayer for converting MP3s to wavs (http://www.javazoom.net/javalayer/javalayer.html)
To split long MP3 files (by default our code splits them into 30-minute chunks, which are split further at later stages), you will also need to download mp3splt (http://mp3splt.sourceforge.net/mp3splt_page/home.php).

(Conversion and MP3 splitting were not included in the execution times reported in the research paper.)
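
To make the default chunking concrete, the sketch below computes where the 30-minute boundaries would fall for a recording of a given length. It is only an illustration of the arithmetic: it does not call mp3splt, and the class name ChunkPlan and the method chunkStarts are hypothetical names that do not appear in the project.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project: lists the start offset of
// each fixed-length chunk in a recording.
final class ChunkPlan {
    // 30 minutes is the default chunk length stated in the README.
    static final Duration DEFAULT_CHUNK = Duration.ofMinutes(30);

    static List<Duration> chunkStarts(Duration total, Duration chunk) {
        if (chunk.isZero() || chunk.isNegative() || total.isNegative()) {
            throw new IllegalArgumentException("chunk must be positive and total non-negative");
        }
        List<Duration> starts = new ArrayList<>();
        // Step through the recording one chunk at a time; the final chunk
        // may be shorter than the others.
        for (Duration t = Duration.ZERO; t.compareTo(total) < 0; t = t.plus(chunk)) {
            starts.add(t);
        }
        return starts;
    }
}
```

For example, a 75-minute recording yields chunks starting at 0, 30, and 60 minutes, the last of which is 15 minutes long.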

To execute source code:

The main classes to consider are StartServerMultithread, StartClientMultithread, and ClientServerTest.

StartClientMultithread should be run by the slaves (the classes would be more accurately named StartSlaveMultithread, StartMasterMultithread, etc.). The master should run either StartServerMultithread for a single execution, or ClientServerTest to run repeated executions for multiple tests.

The IP address and port numbers need to be changed to match your environment. A total of 7 consecutive port numbers are used; they can be changed in the FileServer class.

StartClientMultithread contains the reference to the IP address and the first of the port numbers for connecting to the master.
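
The port layout described above can be sketched as follows. This is a minimal illustration and not the code in FileServer or StartClientMultithread: the class name PortPlan and its methods are hypothetical, and the only fact taken from this README is that 7 consecutive ports are used, starting from a first port.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of the project: derives the 7 consecutive
// ports from the first port number.
final class PortPlan {
    static final int PORT_COUNT = 7; // number of consecutive ports, per the README

    static int[] consecutivePorts(int firstPort) {
        int lastPort = firstPort + PORT_COUNT - 1;
        if (firstPort < 1 || lastPort > 65535) {
            throw new IllegalArgumentException(
                "ports " + firstPort + ".." + lastPort + " are outside 1..65535");
        }
        int[] ports = new int[PORT_COUNT];
        for (int i = 0; i < PORT_COUNT; i++) {
            ports[i] = firstPort + i;
        }
        return ports;
    }

    // Pairs the master's address with each port. The addresses are left
    // unresolved, so no network access happens here.
    static InetSocketAddress[] endpoints(String masterHost, int firstPort) {
        int[] ports = consecutivePorts(firstPort);
        InetSocketAddress[] result = new InetSocketAddress[ports.length];
        for (int i = 0; i < ports.length; i++) {
            result[i] = InetSocketAddress.createUnresolved(masterHost, ports[i]);
        }
        return result;
    }
}
```

With a first port of 5000 (an arbitrary example value), the master and every slave would need ports 5000 through 5006 to be free and reachable.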

No intermediate files should be present at the beginning of execution. The only required files are the full-length audio files (either MP3, which are converted, or WAV) in the folder in which execution takes place. If a slave and the master are going to run on the same machine, execute StartServerMultithread or ClientServerTest in one folder, and StartClientMultithread in another folder.
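
A quick pre-flight check along these lines may help catch leftover files before a run. It is a rough sketch under a strong assumption: it treats every regular file that does not end in .mp3 or .wav as unexpected, because this README does not list the names of the intermediate files. It therefore cannot detect leftover WAV chunks from a previous run, and it will also flag any program files kept in the same folder. The class name WorkingDirCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project.
final class WorkingDirCheck {
    // Returns the regular files in dir that are not MP3 or WAV audio.
    static List<Path> unexpectedFiles(Path dir) throws IOException {
        List<Path> unexpected = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> !isSourceAudio(p))
                   .sorted()
                   .forEach(unexpected::add);
        }
        return unexpected;
    }

    // Assumption: source audio is recognised purely by file extension.
    static boolean isSourceAudio(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".mp3") || name.endsWith(".wav");
    }
}
```

An empty result from unexpectedFiles suggests the folder contains only audio files; a non-empty result lists candidates to move or delete before starting the master or a slave.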

To reduce execution time, the C4.5 trees were hardcoded in the class CicadaRainFilter from a previously derived classification tree. You will need to provide your own training data, generate a C4.5 tree from it, and hardcode that tree again.

To do this, execute the (misleadingly named) SNRTest class to write an arff file containing the acoustic indices, and use this file as input in Weka (https://www.cs.waikato.ac.nz/ml/weka/). The class is set up so that the arff file can be labelled based on which folders you put files in (e.g. if you put all files containing rain in one folder, you can set up the code so that these files are all labelled as containing rain). Typically you'll want to delete the first attribute in Weka, which is the file name.

While we cannot provide the source audio (it's not ours), we can provide the generated arff files that contain the training data used in testing. Each classification task is considered separately, with the opposing class being removed during classification (e.g. 'Rain' is not treated as an attribute for 'Cicada' detection).
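
The folder-based labelling can be illustrated with a short sketch that builds arff text in which the file name is the first attribute and the class label is taken from the name of the folder containing the file. This is not the SNRTest code: the class ArffSketch, the relation name, and the attribute names passed in are all hypothetical, and the actual acoustic indices written by the project are not reproduced here.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not part of the project.
final class ArffSketch {
    static final class Row {
        final String fileName;
        final String label;     // taken from the parent folder name
        final double[] indices; // acoustic index values, in attribute order

        Row(Path file, double[] indices) {
            this.fileName = file.getFileName().toString();
            this.label = file.getParent().getFileName().toString();
            this.indices = indices;
        }
    }

    static String toArff(String relation, List<String> indexNames,
                         List<String> labels, List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // First attribute is the file name, which is typically deleted in Weka.
        sb.append("@attribute filename string\n");
        for (String name : indexNames) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");
        for (Row r : rows) {
            sb.append(quote(r.fileName));
            for (double v : r.indices) {
                sb.append(',').append(v);
            }
            sb.append(',').append(r.label).append('\n');
        }
        return sb.toString();
    }

    // Single-quote string values so names containing spaces or commas stay intact.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
}
```

With this layout, a file stored as rain/a.wav is written with the class value rain, so sorting recordings into folders is all the labelling that is needed. Folder names used as labels should avoid spaces and commas, since this sketch does not quote them.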

Many classes here are for secondary tests, such as sequential execution and individual processes. Several methods that were used in earlier versions of the code are not used in the final version.

Source: README.txt, updated 2018-04-10
