How to make waveform rendering more interesting?
I wrote a waveform renderer that takes an audio file and creates something like this:

[image: rendered waveform]

The logic is pretty simple. I calculate the number of audio samples required for each pixel, read those samples, average them and draw a column of pixels according to the resulting value.
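
In Java, that per-column reduction might be sketched as follows (a minimal illustration; the method name, the normalized `samples` array, and taking the absolute value before averaging are my assumptions, not code from the question):

```java
public class WaveformColumns {
    // One value per pixel column: the mean of the absolute sample values
    // assigned to that column. samples is assumed normalized to [-1, 1].
    static float[] columnHeights(float[] samples, int width) {
        float[] heights = new float[width];
        int samplesPerPixel = samples.length / width;

        for (int x = 0; x < width; x++) {
            double sum = 0;
            for (int i = 0; i < samplesPerPixel; i++) {
                sum += Math.abs(samples[x * samplesPerPixel + i]);
            }
            heights[x] = (float) (sum / samplesPerPixel);
        }
        return heights;
    }
}
```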

Typically, I render a whole song in around 600–800 pixels, so the wave is heavily compressed. Unfortunately, this usually results in unappealing visuals, as almost the entire song is rendered at nearly the same height. There is no variation.

Interestingly, if you look at the waveforms on SoundCloud almost none of them are as boring as my results. They all have some variation. What could be the trick here? I don't think they just add random noise.

Sauterne answered 19/10, 2014 at 15:23 Comment(4)
Have you tried an "inverse log" vertical scale? That would accentuate the variation. Also, you could try median values instead of the average, and/or cutting off the top/bottom outlier samples. – Bremsstrahlung
@Bremsstrahlung I didn't think about taking the median, I'll give that a try. What inverse log do you mean? Would that be an exponential function? – Sauterne
Your input range is known: 0 to max volume (depending on sample depth?). You need a scale that places all low samples close to the bottom but leaves a lot of space for the large values, say val^2 divided by max_value^2. – Bremsstrahlung
> "Unfortunately this usually results in unappealing visuals as almost the entire song is just rendered at almost the same heights. There is no variation." This can be because the recording actually has too much dynamic compression (most modern records are damaged by mastering engineers in order to increase loudness as much as possible, sacrificing everything else). Did you try some good records with good dynamic range? (For example, most classical music records are not damaged that way.) – Estus
I don't think SoundCloud is doing anything particularly special. There are plenty of songs I see on their front page that are very flat. It has more to do with the way detail is perceived and what the overall dynamics of the song are like. The main difference is that SoundCloud is drawing absolute value. (The negative side of the image is just a mirror.)

For demonstration, here is a basic white noise plot with straight lines:

[image: regular plot]

Now, typically a fill is used to make the overall outline easier to see. This already does a lot for the appearance:

[image: filled plot]

Larger waveforms ("zoomed out" in particular) typically use a mirror effect because the dynamics become more pronounced:

[image: mirrored plot]

Bars are another way to visualize and can give an illusion of detail:

[image: bar plot]

A pseudo routine for a typical waveform graphic (average of abs and mirror) might look like this:

for (each pixel in width of image) {
    var sum = 0

    for (each sample in subset contained within pixel) {
        sum = sum + abs(sample)
    }

    var avg = sum / length of subset

    draw line(avg to -avg)
}

This effectively compresses the time axis, similar to taking the RMS of each window. (RMS could also be used instead of the average of absolute values, but the results are almost the same.) Now the waveform shows the overall dynamics.
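
An RMS version of the same window reduction differs only in the accumulation step; a sketch (the method name and index arguments are illustrative):

```java
public class WindowRms {
    // Root mean square of samples[from, to): sqrt(mean(sample^2)).
    // Compared to the mean of absolute values, RMS weights peaks
    // slightly more heavily, but the plots come out nearly identical.
    static float rms(float[] samples, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += (double) samples[i] * samples[i];
        }
        return (float) Math.sqrt(sum / (to - from));
    }
}
```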

That is not too different from what you are already doing: just abs, mirror, and fill. For boxes like SoundCloud uses, you would draw rectangles instead.

Just as a bonus, here is an MCVE written in Java to generate a waveform with boxes as described. (Sorry if Java is not your language.) The actual drawing code is near the top. This program also normalizes, i.e., the waveform is "stretched" to the height of the image.

This simple output is the same as the above pseudo routine:

[image: normal output]

This output with boxes is very similar to SoundCloud:

[image: box waveform]

import javax.swing.*;
import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
import java.io.*;
import javax.sound.sampled.*;

public class BoxWaveform {
    static int boxWidth = 4;
    static Dimension size = new Dimension(boxWidth == 1 ? 512 : 513, 97);

    static BufferedImage img;
    static JPanel view;

    // draw the image
    static void drawImage(float[] samples) {
        Graphics2D g2d = img.createGraphics();

        int numSubsets = size.width / boxWidth;
        int subsetLength = samples.length / numSubsets;

        float[] subsets = new float[numSubsets];

        // find average(abs) of each box subset
        int s = 0;
        for(int i = 0; i < subsets.length; i++) {

            double sum = 0;
            for(int k = 0; k < subsetLength; k++) {
                sum += Math.abs(samples[s++]);
            }

            subsets[i] = (float)(sum / subsetLength);
        }

        // find the peak so the waveform can be normalized
        // to the height of the image
        float normal = 0;
        for(float sample : subsets) {
            if(sample > normal)
                normal = sample;
        }

        // normalize and scale
        normal = 32768.0f / normal;
        for(int i = 0; i < subsets.length; i++) {
            subsets[i] *= normal;
            subsets[i] = (subsets[i] / 32768.0f) * (size.height / 2);
        }

        g2d.setColor(Color.GRAY);

        // convert to image coords and do actual drawing
        for(int i = 0; i < subsets.length; i++) {
            int sample = (int)subsets[i];

            int posY = (size.height / 2) - sample;
            int negY = (size.height / 2) + sample;

            int x = i * boxWidth;

            if(boxWidth == 1) {
                g2d.drawLine(x, posY, x, negY);
            } else {
                g2d.setColor(Color.GRAY);
                g2d.fillRect(x + 1, posY + 1, boxWidth - 1, negY - posY - 1);
                g2d.setColor(Color.DARK_GRAY);
                g2d.drawRect(x, posY, boxWidth, negY - posY);
            }
        }

        g2d.dispose();
        view.repaint();
        view.requestFocus();
    }

    // handle most WAV and AIFF files
    static void loadImage() {
        JFileChooser chooser = new JFileChooser();
        int val = chooser.showOpenDialog(null);
        if(val != JFileChooser.APPROVE_OPTION) {
            return;
        }

        File file = chooser.getSelectedFile();
        float[] samples;

        try {
            AudioInputStream in = AudioSystem.getAudioInputStream(file);
            AudioFormat fmt = in.getFormat();

            if(fmt.getEncoding() != AudioFormat.Encoding.PCM_SIGNED) {
                throw new UnsupportedAudioFileException("unsigned");
            }

            boolean big = fmt.isBigEndian();
            int chans = fmt.getChannels();
            int bits = fmt.getSampleSizeInBits();
            int bytes = bits + 7 >> 3;

            int frameLength = (int)in.getFrameLength();
            int bufferLength = chans * bytes * 1024;

            samples = new float[frameLength];
            byte[] buf = new byte[bufferLength];

            int i = 0;
            int bRead;
            while((bRead = in.read(buf)) > -1) {

                for(int b = 0; b < bRead;) {
                    double sum = 0;

                    // (sums to mono if multiple channels)
                    for(int c = 0; c < chans; c++) {
                        if(bytes == 1) {
                            sum += buf[b++] << 8;

                        } else {
                            int sample = 0;

                            // (quantizes to 16-bit)
                            if(big) {
                                sample |= (buf[b++] & 0xFF) << 8;
                                sample |= (buf[b++] & 0xFF);
                                b += bytes - 2;
                            } else {
                                b += bytes - 2;
                                sample |= (buf[b++] & 0xFF);
                                sample |= (buf[b++] & 0xFF) << 8;
                            }

                            final int sign = 1 << 15;
                            final int mask = -1 << 16;
                            if((sample & sign) == sign) {
                                sample |= mask;
                            }

                            sum += sample;
                        }
                    }

                    samples[i++] = (float)(sum / chans);
                }
            }

        } catch(Exception e) {
            problem(e);
            return;
        }

        if(img == null) {
            img = new BufferedImage(size.width, size.height, BufferedImage.TYPE_INT_ARGB);
        }

        drawImage(samples);
    }

    static void problem(Object msg) {
        JOptionPane.showMessageDialog(null, String.valueOf(msg));
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                JFrame frame = new JFrame("Box Waveform");
                JPanel content = new JPanel(new BorderLayout());
                frame.setContentPane(content);

                JButton load = new JButton("Load");
                load.addActionListener(new ActionListener() {
                    @Override
                    public void actionPerformed(ActionEvent ae) {
                        loadImage();
                    }
                });

                view = new JPanel() {
                    @Override
                    protected void paintComponent(Graphics g) {
                        super.paintComponent(g);

                        if(img != null) {
                            g.drawImage(img, 1, 1, img.getWidth(), img.getHeight(), null);
                        }
                    }
                };

                view.setBackground(Color.WHITE);
                view.setPreferredSize(new Dimension(size.width + 2, size.height + 2));

                content.add(view, BorderLayout.CENTER);
                content.add(load, BorderLayout.SOUTH);

                frame.pack();
                frame.setResizable(false);
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.setLocationRelativeTo(null);
                frame.setVisible(true);
            }
        });
    }
}

Note: for the sake of simplicity, this program loads the entire audio file into memory. Some JVMs may throw an OutOfMemoryError. To correct this, run with an increased heap size as described here.

Monotonous answered 19/10, 2014 at 18:40 Comment(14)
I thought that such a plot should visualize the difference from other songs, but it just shows us the dispersion around the average value... Maybe I was/am wrong? – Diller
@Diller I don't understand what you are unsure of. See my edit for an actual plot of the averaging I've described. At such a coarse level of detail, overall dynamics will be the only discernible difference between songs. – Monotonous
@Monotonous Thanks, your update is very interesting. My thinking about the method: the song is a function f(t) = n(t) + s(t), where t is time, n is uniformly distributed instrumental noise, and s is the singer's words or solo, the song itself (maybe periodic), which is the main difference from other songs. I think such plots should contain 90% s(t), not 50/50 of n(t) and s(t). I was expecting something like "let's find the fluctuations around the average value on each segment", but your answer was rather simple, which was a distraction for me. – Diller
@Diller The information contained in what we would normally call a song is largely in the frequency domain. It is harmonic patterns and pitch fluctuations. It would be possible but cumbersome to capture some of this information. (The harmonic information in the time domain is at best obfuscated and at worst impossible for humans to understand.) So a waveform is just a time-domain plot, and it is pretty basic. – Monotonous
PS. Did you just write this app for this question or find it somewhere? Either way, thank you very much for this excellent post! – Sauterne
@Sauterne I wrote it for the question. When I was learning this stuff myself, I found it difficult to find adequate tutorials/examples, so I typically take the time to write something nowadays. – Monotonous
Haha, submitted that PS. too early by accident; here is the rest of my comment: I think you are right. The boxes are definitely more visually pleasing. I may also have chosen a very bad song that's actually just boring, no matter how you render it. Yesterday I was experimenting with different methods to calculate a value for each pixel/box from the assigned samples (min, max, average, median...). The average seems to best represent the song. One issue I had, compared to your code, was that I didn't normalize the values. I have much better results now! – Sauterne
The problem with that is that I need to be able to change the length of the wave dynamically at runtime, i.e. stretch it and rerender. By the nature of this method, the size of the highest box could change significantly after even just a little change in resolution. Imagine a quiet song with a single loud bang in the middle. This bang could happen to be entirely within a single box, resulting in a high box. Or it could happen to be on the edge between two boxes, making two not-that-high boxes.Sauterne
The resulting normalization values would be very different, making the wave appear to scale vertically. I was thinking about using the average value of all the samples to normalize the wave. This should keep the wave at the same apparent height, no matter the resolution. But I'm not sure if this is the best solution, as it could result in boxes being higher than the actual area of the waveform. What do you think? – Sauterne
Well, the simple fact is that much of the time-related information is just lost. If you compare the two generated waveforms, you can see that the second is about the lowest level of detail a 3-minute song can be reduced to. Something you could try (and what I think @Bremsstrahlung was getting at) is doctoring the data a bit. A simple way would be, for example, sample = sample * sample / 32768f (square and scale), which will double the contrast between peaks and troughs. It's generally not done, but you might be happy with it. – Monotonous
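
The square-and-scale doctoring mentioned in this comment could be sketched like this for 16-bit-range values (an illustration; the class and method names are mine, and it assumes the input is a nonnegative post-averaging column value, since squaring discards the sign):

```java
public class Contrast {
    // Square and rescale a value in the 16-bit range [0, 32768] so that
    // the full-scale peak keeps its height while quieter material is
    // pushed toward zero, increasing visual contrast between loud and
    // quiet sections.
    static float emphasize(float sample) {
        return sample * sample / 32768f;
    }
}
```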
The flatness of a typical song is due to dynamic range compression on the master, and it is actually desired by some artists. In fact, SoundCloud may have previously used a log scale to make the waveforms appear louder. Do a Google Image Search for 'soundcloud player' to see what I mean. The site looks to me like it is just displaying a normal output now. – Monotonous
How can we modify the example to not load the whole file into memory? :) I am using it in a real open-source project :) @Monotonous – Riven
@Riven I'd find a way to reduce their length while/after reading the files. You could use aggressive sample rate conversion, or something simpler like the averaging in my code example. I'd try to reduce their length to an array of maybe 10,000 or 100,000 samples, which is long enough to recompute the graphic as needed (if e.g. the UI is resized), but still very small compared to how much memory is available nowadays. If you're only loading one file into memory at a time (and aren't on mobile), though, you can just run the JVM with a larger -Xmx argument. – Monotonous
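
The reduce-while-reading idea might look something like this (an illustration only, not code from the linked project; the class and method names are mine):

```java
public class SummaryAccumulator {
    private final float[] summary;
    private final long framesPerBin;
    private double sum;
    private long count;
    private int bin;

    // Summarizes totalFrames samples into targetLength bins (mean of
    // absolute values), so the decoded audio never has to be held in
    // memory all at once.
    SummaryAccumulator(long totalFrames, int targetLength) {
        summary = new float[targetLength];
        framesPerBin = Math.max(1, totalFrames / targetLength);
    }

    // Feed each decoded sample as it is read from the stream.
    void accept(float sample) {
        sum += Math.abs(sample);
        if (++count == framesPerBin && bin < summary.length) {
            summary[bin++] = (float) (sum / count);
            sum = 0;
            count = 0;
        }
    }

    float[] summary() { return summary; }
}
```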
@Monotonous I made a repository just for that in JavaFX (github.com/goxr3plus/Java-Audio-Wave-Spectrum-API); it's fully working, I just need a way to improve it in the way you told me :) Can you please help me with it? I will also add your code for making this amazing SoundCloud style. I am loading the whole file and reading it with a buffer, but the amplitudes array is still huge: github.com/goxr3plus/Java-Audio-Wave-Spectrum-API/blob/master/… . The amplitude calculation happens exactly in the method getWavAmplitudes, line 195. – Riven

© 2022 - 2024 — McMap. All rights reserved.