#225 Spectrogram - Log/MEL/Bark frequency scaling options

Status: open
Owner: nobody
Labels: spectrogram (2)
Priority: 5
Updated: 2023-01-19
Created: 2022-06-22
Creator: Sox User
Private: No

Currently it seems that "spectrogram" can only use a linear frequency scale. I've run into a few examples recently that show how misleading this can be by over-representing higher-frequency patterns/issues.

Some other software defaults to logarithmic or MEL scaling, either of which gives more real estate to lower frequencies in a way that better represents what we actually hear.

It would be nice to have these options in sox as well. I would put plain logarithmic as the highest priority just because it's the easiest math to implement, but if it's not a horrible amount of trouble, MEL and Bark options would be appreciated too.

My understanding is that there isn't a "standardized" MEL scale formula, but it would be reasonable to just pick a popular approach, as long as it's documented. Similar situation for Bark. I suppose the MEL break/corner frequency could be adjustable as a parameter if that's easy to do.
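
To make the request concrete, here is a minimal sketch of the three mappings in Python. None of this is existing sox code. The function names are hypothetical, the MEL formula is the common log(1 + f/700) form normalised so that 1000 Hz lands on 1000 mel (with the corner frequency exposed as a parameter), and the Bark formula is Traunmüller's 1990 approximation without its end corrections. Other published variants would give slightly different curves.

```python
import math

def hz_to_linear(f_hz):
    """Identity mapping: what the spectrogram effect does today."""
    return f_hz

def hz_to_log(f_hz):
    """Plain logarithmic scale, measured in octaves (log base 2)."""
    return math.log2(f_hz)

def hz_to_mel(f_hz, corner_hz=700.0):
    """MEL-style scale with an adjustable corner frequency (assumed variant).

    Normalised so that 1000 Hz maps to 1000 mel for any corner; with the
    default 700 Hz corner this matches 2595 * log10(1 + f / 700).
    """
    return 1000.0 * math.log(1.0 + f_hz / corner_hz) / math.log(1.0 + 1000.0 / corner_hz)

def hz_to_bark(f_hz):
    """Bark scale, Traunmueller (1990) approximation, no end corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def axis_position(f_hz, f_lo, f_hi, scale):
    """Fraction of the axis height (0 = bottom, 1 = top) at which f_hz is drawn."""
    lo, hi = scale(f_lo), scale(f_hi)
    return (scale(f_hz) - lo) / (hi - lo)

if __name__ == "__main__":
    # Where does 640 Hz land on a 20 Hz .. 20480 Hz axis under each scale?
    for name, scale in [("linear", hz_to_linear), ("log", hz_to_log),
                        ("mel", hz_to_mel), ("bark", hz_to_bark)]:
        print(f"{name:>6}: {axis_position(640.0, 20.0, 20480.0, scale):.3f}")
```

On a linear axis the bottom five of those ten octaves are squeezed into roughly 3% of the height, while the logarithmic mapping gives them half of it, which is the over-representation of high frequencies described above.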

One note for aesthetics: if someone takes this on, please attempt to keep the labels on the frequency axis reasonably useful. By contrast, spectrograms made with ffmpeg use absolutely terrible axis labels (on both axes) that are meaningless at a casual glance and require "deciphering". I assume this is because it's somewhat difficult to implement nice human-readable labels for an arbitrary scale (varied frequency ranges and durations), but sox's spectrograms are currently a lot prettier and it would be nice to keep that advantage if alternate frequency scales are implemented.
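
As an illustration of what "reasonably useful" labels could look like on a logarithmic axis, one common approach is to place ticks on a 1-2-5 progression per decade (the same 1-2-5-10-20-50 labelling that comes up in the discussion below). The following Python sketch is only an assumption about how that might be done, not anything taken from sox; `ticks_1_2_5` and `format_hz` are hypothetical names.

```python
import math

def ticks_1_2_5(f_lo, f_hi):
    """Return every value of the form 1, 2 or 5 times a power of ten in [f_lo, f_hi]."""
    if not 0 < f_lo < f_hi:
        raise ValueError("need 0 < f_lo < f_hi")
    ticks = []
    decade = math.floor(math.log10(f_lo))  # power of ten at or below f_lo
    while 10.0 ** decade <= f_hi:
        for mantissa in (1, 2, 5):
            value = mantissa * 10.0 ** decade
            if f_lo <= value <= f_hi:
                ticks.append(value)
        decade += 1
    return ticks

def format_hz(value):
    """Short human-readable label: 500 -> '500', 2000 -> '2k'."""
    if value >= 1000:
        return f"{value / 1000:g}k"
    return f"{value:g}"

if __name__ == "__main__":
    print([format_hz(t) for t in ticks_1_2_5(20, 20000)])
```

For a narrow frequency range this yields too few ticks, so a real implementation would presumably fall back to evenly spaced linear ticks in that case.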

Discussion

  • Martin Guy

    Martin Guy - 2022-08-16

    Hi! I've done quite a bit of work on this. First I used "sox spectrogram" and/or "sndfile-spectrogram" and then warped the output with an imagemagick script (!), then implemented a logarithmic frequency axis as an option for sndfile-spectrogram, which has now made it into current Debian stable. The axis labelling is 1-2-5-10-20-50 for large frequency ranges or linear for small ones, switching appropriately.
    Ultimately I wrote a whole new program, https://gitlab.com/martinwguy/spettro, which plays music while showing a scrolling log-freq spectrogram with optional axes. It has a command key to do a screen dump of the current view, as well as a command line option to do this non-interactively.
    I've long wanted to do this for sox too, but haven't yet bitten the bullet as I already have two solutions for my own work. I hope these pointers are helpful for your needs.

    Last edit: Martin Guy 2022-08-16
  • Sox User

    Sox User - 2023-01-19

    Sorry for not responding much sooner. I watched this space for a few weeks and then kind of wandered off.

    Thanks for your suggestions, Martin. I'll have a look at those projects.

    In the interim I just went back to using ffmpeg for logarithmic spectrograms. It took a bit of trial and error to figure out the gotchas in ffmpeg's implementation (there's no documentation about the effect of output image dimensions, but it's extremely sensitive to that). On the plus side, someone may have touched up the axis labelling, because it seems much more readable than I remember.
