AEO-Light is open-source software that extracts audio from the optical sound tracks of motion picture film. AEO-Light is produced at the University of South Carolina by a team comprising faculty and staff from the University Libraries’ Moving Image Research Collections (MIRC) and the College of Arts and Sciences’ Interdisciplinary Mathematics Institute (IMI). Project funding comes from the Preservation and Access Division of the National Endowment for the Humanities. AEO-Light is available through an open-source licensing agreement. The complete terms are available in the AEO-Light “ReadMe” file and in the “About” menu.
AEO-Light extracts audio from film scans that meet the following requirements:







AEO-Light supports the simultaneous extraction of two or more defined audio regions. This functionality supports the production of two-channel stereo tracks. It also makes it possible to extract multiple versions of a single mono track, to compensate for damage to different areas of the optical track.
The process for setting multiple bounding boxes differs little from setting a single box:
The project settings pane allows users to adjust the default settings for the core audio extraction process, as outlined below. The following summary of the audio extraction method will orient users to the impact certain variables have on the process. AEO-Light extraction has four main steps:
The quality of the extracted audio is heavily affected by steps 2 and 3; as such, the variables for these steps should be examined first when troubleshooting the process.
THE FOLLOWING SETTINGS APPLY TO STEP ONE:
The routine AEO-Light uses to extract frames from video files may be unreliable depending on the video source (for example, when the video is on a slow external drive). If there are problems reading video, switch to safe reading mode. Safe mode uses the same routine, but takes extra time to check the frames read to verify that they are reasonable. If the check fails, the frames are read and checked again. Subsequent reads typically have a better chance of succeeding due to caching.
This parameter has no effect when reading from a frame sequence (DPX files), since that doesn't require the video reader.
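Safe reading mode, as described above, amounts to a read, check, retry loop. The Python sketch below illustrates that pattern only: `read_batch`, `looks_reasonable`, the attempt limit, and the specific checks (frame count, frame size, non-blank content) are hypothetical stand-ins, since the manual does not say what AEO-Light actually checks.

```python
from typing import Callable, Tuple

import numpy as np


def looks_reasonable(frames: np.ndarray, count: int, shape: Tuple[int, int]) -> bool:
    """Hypothetical sanity check: right number and size of frames, none blank."""
    if frames.shape != (count, *shape):
        return False
    # A frame whose pixels are all identical (e.g. all black) is treated as a bad read.
    per_frame_std = frames.reshape(count, -1).std(axis=1)
    return bool(np.all(per_frame_std > 0))


def read_frames_safely(
    read_batch: Callable[[int, int], np.ndarray],
    start: int,
    count: int,
    shape: Tuple[int, int],
    max_attempts: int = 3,
) -> np.ndarray:
    """Read `count` frames from `start`, re-reading whenever the check fails."""
    for _ in range(max_attempts):
        frames = read_batch(start, count)
        if looks_reasonable(frames, count, shape):
            return frames
        # Otherwise read again; a repeat read often succeeds thanks to caching.
    raise IOError(f"frames {start}-{start + count - 1} failed the check {max_attempts} times")
```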
When reading frames from a video file, AEO-Light can read several frames at once to reduce the overhead associated with opening the video. For example, reading two frames at a time cuts the overhead in half compared with reading one frame at a time. The achievable speed-up is limited by available memory, however. If more frames are read than can fit in memory, the system will swap them in and out to disk, crippling performance. (This is called disk thrashing.) Windows users are encouraged to use the Automatic option (below) to calculate the number of frames per batch based on available memory. The Automatic option is not available for Linux and Mac users because of a limitation in MATLAB. Mac and Linux users should experiment with the number of frames per batch to determine what is optimal for their systems; 20 frames per batch is a reasonable starting value.
This parameter has no effect when reading from a frame sequence (DPX files), since the access overhead is incurred on each frame regardless of how many frames are read at a time.
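The effect of batching on the open-video overhead can be worked through with a simple cost model. The model below is an illustrative assumption, not AEO-Light's measured behavior: each batch pays one fixed opening cost, each frame pays a fixed reading cost, and memory is assumed never to run out (so disk thrashing is not modeled).

```python
import math


def estimated_read_time(n_frames: int, frames_per_batch: int,
                        open_overhead_s: float, per_frame_s: float) -> float:
    """Total read time in seconds under a simple fixed-cost model.

    Every batch costs `open_overhead_s` to open the video; every frame costs
    `per_frame_s` to decode. Memory limits (and thus thrashing) are ignored.
    """
    n_batches = math.ceil(n_frames / frames_per_batch)
    return n_batches * open_overhead_s + n_frames * per_frame_s
```

With made-up figures of 1000 frames, 0.5 s to open the video, and 0.01 s per frame, the estimate is 510 s at one frame per batch, 260 s at two, and 35 s at twenty: the opening overhead shrinks in proportion to the batch size while the 10 s of per-frame work is unchanged.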
(Windows only.) If checked, AEO-Light automatically sets the number of frames per batch to maximize the number of frames held in memory without exceeding available memory (which would cause paging) at any stage of the algorithm. Due to limitations in MATLAB, this option is not available on OS X or Linux.
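One way to picture what the Automatic option does is to divide usable memory by the memory a single frame needs at the most demanding stage of the algorithm. The sketch below assumes both quantities are supplied by the caller, and its 80% safety margin is an arbitrary illustrative choice; the manual does not describe AEO-Light's actual formula.

```python
def automatic_frames_per_batch(available_bytes: int, peak_bytes_per_frame: int,
                               safety_fraction: float = 0.8) -> int:
    """Largest batch whose peak memory use stays within a fraction of free memory.

    `peak_bytes_per_frame` should cover everything held per frame at the most
    memory-hungry stage (raw pixels plus intermediate copies), so that no stage
    pushes the system into paging. Always returns at least 1.
    """
    usable = int(available_bytes * safety_fraction)
    return max(1, usable // peak_bytes_per_frame)
```

For example, with 8 GiB free and a hypothetical peak of 100 MiB per frame, this gives 65 frames per batch.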
THE FOLLOWING VARIABLES APPLY TO STEP TWO:
AEO-Light can attempt to adjust each frame's sound signal to compensate for the effects of uneven illumination during scanning. The calibration mask is constructed by smoothing an averaged sound signal. Each frame's sound signal is then divided by the mask. This step is optional. If quality audio can be extracted from the scans without calibration, skipping this step will expedite the extraction process.
The number of sound signals to average together to produce a calibration mask. These signals are taken consecutively from the beginning of the frame sequence. To use all available signals, enter "inf" (infinite) or use any number larger than the number of frames.
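The calibration step (average, smooth, divide) can be sketched in a few lines of Python with NumPy. This assumes each frame's sound signal is a 1-D array with one value per image row, so all signals can be stacked into a 2-D array; the `smooth` argument stands in for whichever smoothing method is selected, and `math.inf` plays the role of the "inf" setting. The function name `calibrate` is hypothetical.

```python
import math
from typing import Callable

import numpy as np


def calibrate(signals: np.ndarray, n_average: float,
              smooth: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Divide every frame's sound signal by a smoothed average of the first few.

    signals:   2-D array, one row per frame, one column per image row.
    n_average: number of signals to average, taken from the start of the
               sequence; math.inf (or any number > frame count) means all.
    smooth:    the smoothing step applied to the averaged signal.
    """
    n_frames = signals.shape[0]
    n = n_frames if math.isinf(n_average) else min(int(n_average), n_frames)
    averaged = signals[:n].mean(axis=0)
    mask = smooth(averaged)
    # Element-wise division; assumes the mask contains no zeros.
    return signals / mask
```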
The maximum number of image rows in each frame that can be excluded from consideration. Excluded rows can form one block at the top and another block at the bottom of an image. The specified number puts a cap on the total in both blocks. The value can be expressed either directly as a number of rows or as a percentage of the total number of rows in each frame.
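Because the limit on excluded rows may be given either as a row count or as a percentage, it has to be converted to rows before it can be applied. A minimal sketch, assuming a string setting such as "40" or "5%" and rounding percentages down (the manual does not state how AEO-Light rounds); `excluded_rows_cap` and `exclusion_allowed` are hypothetical names.

```python
def excluded_rows_cap(setting: str, rows_per_frame: int) -> int:
    """Convert "40" (rows) or "5%" (of rows_per_frame) into a row count."""
    setting = setting.strip()
    if setting.endswith("%"):
        # Round down so the cap never exceeds the requested percentage.
        return int(rows_per_frame * float(setting[:-1]) / 100)
    return int(setting)


def exclusion_allowed(top_rows: int, bottom_rows: int, cap: int) -> bool:
    """The top and bottom excluded blocks share one combined cap."""
    return top_rows + bottom_rows <= cap
```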
To construct the calibration mask from the averaged sound signal, AEO-Light uses one of two methods to reduce defects in the calibration curve, thereby improving the quality of the calibration mask. User experimentation can determine which of the methods, Moving Average or Polynomial Fit, provides better defect modeling.
Moving Average Options
Sweeps
The number of sweeps (passes) of the moving average over the signal. The more sweeps applied, the smoother the calibration mask becomes.
The span (in rows) in each direction, or half-span, used in producing the calibration mask. The larger the half-span, the smoother the calibration mask becomes.
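A repeated moving average of the kind controlled by the sweeps and half-span settings can be sketched as follows. The edge padding used here, which keeps the mask the same length as the signal without dragging its ends toward zero, is an assumption; the manual does not say how AEO-Light treats the ends of the signal.

```python
import numpy as np


def moving_average_mask(averaged: np.ndarray, sweeps: int, half_span: int) -> np.ndarray:
    """Smooth `averaged` with `sweeps` passes of a (2*half_span + 1)-row box filter."""
    window = 2 * half_span + 1
    kernel = np.full(window, 1.0 / window)
    mask = np.asarray(averaged, dtype=float)
    for _ in range(sweeps):
        # Repeat the end values so every output row has a full window.
        padded = np.pad(mask, half_span, mode="edge")
        mask = np.convolve(padded, kernel, mode="valid")
    return mask
```

Each extra sweep, or a wider half-span, spreads a sharp feature over more rows, which is why both settings make the mask smoother.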
Polynomial Fit Options
Polynomial of Degree
Use polynomials of the selected degrees to produce a calibration mask.
Specify the degree(s) of the polynomial to be fitted through the averaged sound signal. Degree 1 corresponds to a linear polynomial (a straight line). The polynomial from the specified class of polynomials that best fits the averaged sound signal is used as the calibration mask.
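The polynomial option can be sketched with NumPy's `Polynomial.fit`, choosing among the selected degrees the fit with the smallest sum of squared residuals. That selection criterion is an assumption made for illustration. With plain least squares a higher degree never fits worse than a lower one, so this sketch will in practice favor the highest selected degree; AEO-Light may judge "best fit" differently. The name `polynomial_mask` is hypothetical.

```python
from typing import Sequence

import numpy as np
from numpy.polynomial import Polynomial


def polynomial_mask(averaged: np.ndarray, degrees: Sequence[int]) -> np.ndarray:
    """Least-squares polynomial fit to `averaged` over the row index.

    Tries each selected degree and keeps the fit with the smallest sum of
    squared residuals. Polynomial.fit rescales the rows to [-1, 1] internally,
    which keeps higher-degree fits numerically stable.
    """
    rows = np.arange(averaged.size)
    best_fit, best_sse = None, np.inf
    for degree in degrees:
        fit = Polynomial.fit(rows, averaged, degree)(rows)
        sse = float(np.sum((averaged - fit) ** 2))
        if sse < best_sse:
            best_fit, best_sse = fit, sse
    return best_fit
```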
THE FOLLOWING VARIABLES APPLY TO STEP THREE:
AEO-Light must register the overlapping portions of the sound signals in order to piece them together into a continuous audio track for the whole movie. This is done by looking at either the sound signals themselves ("Overlap by Sound") or the whole scanned frame ("Overlap by Image").
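Once the overlap between each pair of consecutive frames is known, piecing the signals together amounts to dropping the repeated rows. A minimal sketch, assuming the overlap is measured in rows and that each frame's first `overlap` rows repeat the previous frame's last rows; `stitch` is a hypothetical name, not an AEO-Light function.

```python
from typing import List

import numpy as np


def stitch(signals: List[np.ndarray], overlaps: List[int]) -> np.ndarray:
    """Join per-frame sound signals into one track, skipping overlapping rows.

    overlaps[i] is the number of rows that signals[i + 1] shares with signals[i].
    """
    if len(overlaps) != len(signals) - 1:
        raise ValueError("need exactly one overlap per pair of consecutive frames")
    pieces = [signals[0]]
    for signal, overlap in zip(signals[1:], overlaps):
        pieces.append(signal[overlap:])
    return np.concatenate(pieces)
```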
Guess a typical amount of overlap between any two consecutive frames. This box cannot be unchecked, because guessing is fast and very beneficial when the guess is reasonably accurate.
AEO-Light first samples pairs of consecutive frames (chosen at random from the film) and registers them against each other to find the typical amount of overlap between scanned frames. This typical value is used as the starting point for registration for every pair of frames (reducing the time needed for registration, so long as the typical value is reliable).
Type in the number of frame pairs to sample to determine the typical overlap. We recommend using a minimum of 400 frame pairs.
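The overlap-guess procedure can be sketched as random sampling followed by a robust summary. Here the median stands in for the "typical" value (an assumption; the manual does not name the statistic), and `register_pair` is a hypothetical callback that returns the overlap found for one pair of frames.

```python
import random
import statistics
from typing import Callable, Optional


def typical_overlap(n_frames: int, n_pairs: int,
                    register_pair: Callable[[int, int], int],
                    seed: Optional[int] = None) -> int:
    """Estimate the typical overlap from randomly chosen consecutive frame pairs."""
    rng = random.Random(seed)
    # randrange(n_frames - 1) yields 0 .. n_frames - 2, so i + 1 is always valid.
    firsts = [rng.randrange(n_frames - 1) for _ in range(n_pairs)]
    overlaps = [register_pair(i, i + 1) for i in firsts]
    # The median ignores the occasional pair that was registered badly.
    return round(statistics.median(overlaps))
```

Sampling more pairs makes the estimate less sensitive to a few badly registered pairs, which is consistent with the recommended minimum of 400.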
Calculate the overlap between every pair of consecutive frames by using sound-like signals to represent them.
Calculate the overlap between every pair of consecutive frames by using the full frames.
If both "by sound" and "by image" are selected, both will be done but only the "by image" part will ultimately be used. The log file and visualizations will reflect both steps, however.
This is the expected radius (range in each direction) the overlap guess might need to be adjusted for a given pair of frames. E.g., if a scanner rarely slips by more than 10 rows (up or down), then enter 10. AEO-Light finds the best fit within this range, and if that fit is validated (see below), it is used as the registration for that pair of frames.
An overlap registration is considered reliable if it produces a better match than any other overlap within the validation radius of it.
The maximum number of rows by which a valid overlap can differ from the overlap guess without requiring further investigation. For example, if a scanner never slips by more than 20 rows (up or down), enter 20.
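The search radius, validation radius, and maximum deviation work together when registering one pair of frames. The sketch below shows one plausible reading of those three settings; `mismatch` is a hypothetical scoring function (lower means a better match), and returning `None` stands for "needs further investigation", whatever AEO-Light then does in that case.

```python
from typing import Callable, Optional


def register_pair(guess: int, search_radius: int, validation_radius: int,
                  max_deviation: int,
                  mismatch: Callable[[int], float]) -> Optional[int]:
    """Return a validated overlap for one frame pair, or None if it needs more work."""
    # 1. Best fit within guess +/- search_radius.
    best = min(range(guess - search_radius, guess + search_radius + 1), key=mismatch)
    # 2. Validation: the best fit must beat every other overlap within
    #    validation_radius of it (which may reach outside the search range).
    best_score = mismatch(best)
    for other in range(best - validation_radius, best + validation_radius + 1):
        if other != best and mismatch(other) <= best_score:
            return None
    # 3. A valid overlap that strays too far from the guess is also flagged.
    if abs(best - guess) > max_deviation:
        return None
    return best
```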
THE FOLLOWING VARIABLES APPLY TO STEP FOUR:
Having prepared the audio signal for extraction in steps 2 and 3, AEO-Light extracts the complete audio signal and encodes it as a .wav file.
Type in the frame rate of the original film when reading from .dpx or .tif files. The frame rate of a video source is determined automatically.
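The link between the film's frame rate and the audio sample rate can be made concrete. Assuming (for illustration) that every film frame contributes the same number of audio samples after overlaps are removed, the sample rate is that number times the frame rate, so 2,000 samples per frame at 24 fps gives 48,000 Hz. The sketch writes mono 16-bit PCM with Python's standard `wave` module; `write_wav` and its parameters are hypothetical and do not describe AEO-Light's own encoder.

```python
import wave
from typing import BinaryIO, Union

import numpy as np


def write_wav(dest: Union[str, BinaryIO], samples: np.ndarray,
              samples_per_frame: int, fps: float) -> int:
    """Write mono 16-bit PCM and return the sample rate used.

    samples: floating-point audio scaled to [-1, 1]; values outside are clipped.
    """
    sample_rate = round(samples_per_frame * fps)
    pcm16 = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(dest, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(pcm16.tobytes())
    return sample_rate
```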
OTHER VARIABLES:
Save graphs and images representing intermediate results for presentation, diagnostics, and troubleshooting. Visualizations slow down the extraction process and cause distracting flicker on screen as numerous figures are quickly drawn and deleted.
Run in parallel if two or more processors (cores) are available. The maximum number of cores available for starting a pool of parallel workers is determined automatically. Typically, four or more cores are needed for a noticeable speed-up. Running in parallel speeds up only the computational portions of the extraction process; reading times remain unaffected.
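Why reading time limits the benefit of extra cores can be estimated with Amdahl's law. In the sketch below, `read_fraction` (the share of run time spent reading frames, which stays serial) is a hypothetical figure the user would have to measure; the estimate also ignores the cost of starting the worker pool.

```python
def parallel_speedup(read_fraction: float, workers: int) -> float:
    """Amdahl's law: reading stays serial, computation is split across workers."""
    compute_fraction = 1.0 - read_fraction
    return 1.0 / (read_fraction + compute_fraction / workers)
```

For example, if reading took 30% of the run time, four workers would give at most about a 2.1x speed-up, and no number of workers could exceed 1 / 0.3, about 3.3x.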
Although combined audio and video will be the goal for most users, those interested in long-term preservation of the content should anticipate that future technologies may affect the quality of the audio that can be extracted from digital image files and/or the ability to play back the muxed audio-visual file. The AEO-Light authors therefore recommend that users retain:
To assist in recording this information, AEO-Light can produce a rudimentary PREMIS XML file. To create one, select the check box when saving the audio file.
NOTE: This PREMIS file will record the audio filename as an identifier; best practice would employ a naming convention that encourages unique and persistent filenames. The file has been designed with the expectation that it will be supplemented by information specific to a digital repository on ingest. For more on PREMIS see: http://www.loc.gov/standards/premis/
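For readers new to PREMIS, the sketch below shows how a filename can serve as an object identifier, using the PREMIS 3 namespace and Python's standard `xml.etree.ElementTree`. It is an illustrative fragment only, not the file AEO-Light writes: it omits elements a complete PREMIS record requires, and the identifier type "local filename" is a hypothetical value that a repository would replace with its own vocabulary.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"


def _q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_identifier_fragment(audio_filename: str) -> str:
    """Return a minimal PREMIS-style object identifier for an audio file."""
    ET.register_namespace("premis", PREMIS_NS)
    root = ET.Element(_q("premis"), version="3.0")
    obj = ET.SubElement(root, _q("object"))
    identifier = ET.SubElement(obj, _q("objectIdentifier"))
    ET.SubElement(identifier, _q("objectIdentifierType")).text = "local filename"
    ET.SubElement(identifier, _q("objectIdentifierValue")).text = audio_filename
    return ET.tostring(root, encoding="unicode")
```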
AEO-Light Ver. 0.9 (beta) is provided to users for testing. All feedback is vital to the development of the software, but we are keenly interested in reports on the following issues.
Share feedback with the community at http://sourceforge.net/p/aeolight/discussion/ or
submit feedback via the web at http://imi.cas.sc.edu/mirc-feedback/
All Rights Reserved [License]
AEO-Light, Ver. 0.9 (Beta)
Greg Wilsbacher, Borislav Karaivanov, Pencho Petrushev, and Mark Cooper
Additional programming by L. Scott Johnson. Testing and support by Brittany Braddock. Logo design by Ashley Blewer.