How can I draw sound data from my wav file?

First off, this is for a homework project.

I'm having trouble understanding how to draw the sound data waves onto a graph in Java for a project. I have to make this assignment entirely from scratch, with a UI and everything, so I'm basically making a .wav file editor. The main issue I'm having is getting the sound data into the graph to be drawn; right now I'm just drawing a randomly generated array of values.

So far I have a mini-program running that validates that the file is actually a wav file.

I'm reading it in with a FileInputStream and validating the RIFF chunk: the "RIFF" bytes (0-3), the file length (4-7), and the "WAVE" bytes (8-11). Then the format chunk, starting from the end of the RIFF chunk and positioning the index there: the format ID (bytes 0-3), the format chunk length (bytes 4-7), then the next 16 bytes for all the specifications of the wave file, which I store in appropriately named variables.

Once I get to the DATA chunk and its length, everything past that is my sound data. What I'm unsure of is how to store it byte for byte, or even how to translate it into values related to the amplitude of the sound. I thought it would be similar to validating, but it doesn't seem to be that way... Either that or I've been overcomplicating something super simple, since I've been staring at this for a few days now.

Any help is appreciated thanks.

Slovenia answered 14/10, 2012 at 4:5 Comment(6)
close duplicate: #11017783 – Not
Are you allowed to use the Java Sound API for this homework? I agree with @Denis that this seems a duplicate. – Hayfork
You could take a look at #12067198 or #12036299 – Geldens
"value that's related to the amplitude of the sound." This is a tricky concept to convey, but the sound amplitude only ever comes from groups of sample values. If all sample values were '128' the individual sample values might imply 'full volume', yet the result would be complete silence. – Hayfork
Yes, I'm allowed to use anything, but the teacher told us that every sample will practically be an array index and its value will be its amplitude, which had most of us confused. I've been doing this for 36 hours nonstop and feel like I haven't seen the grass outside for a week. Thanks so much for all the advice and links, everyone; I think I should have enough to figure it out soon. @AndrewThompson, your explanation actually makes it clearer; I didn't know wave file sound samples come in groups. – Slovenia
"I'm allowed to use anything" Then you will definitely want to check out Java Sound. – Hayfork

I'm not a Java programmer, but I know a fair bit about rendering audio so hopefully the following might be of some help...

Given that you will almost always have a much larger number of samples than available pixels the sensible thing to do would be to draw from a cached reduction or 'summary' of the sample data. This is typically how audio editors (such as Audacity) render audio data. In fact the most common strategy is to compute the number of samples per pixel, then find the maximum and minimum samples for each block of size SamplesPerPixel, then draw a vertical line between each max-min pair. You might want to cache this reduction, or perhaps a series of such reductions for different zoom levels. Audacity caches to temporary files ('block files') on disk.

The above is perhaps something of an oversimplification, however, because in reality you will want to compute the initial max-min pairs from a chunk of fixed size - say 256 samples - rather than from one of size SamplesPerPixel. Then you can compute further 'on the fly' reductions from that cached reduction. The point is that SamplesPerPixel will typically be a dynamic quantity - since the user might resize the canvas at any time (hope that makes sense...).
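A minimal Java sketch of the fixed-block min/max reduction described above, using 256-sample blocks. The class and method names here are illustrative, not from any particular library:

```java
// Reduce an array of 16-bit samples to cached [min, max] pairs,
// one pair per fixed-size block of 256 samples. Further "on the fly"
// reductions for a given zoom level can then be computed from these
// pairs instead of from the raw samples.
public class PeakCache {
    public static final int BLOCK_SIZE = 256;

    // Returns an array of {min, max} pairs, one per block.
    public static short[][] reduce(short[] samples) {
        int blocks = (samples.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        short[][] pairs = new short[blocks][2];
        for (int b = 0; b < blocks; b++) {
            short min = Short.MAX_VALUE, max = Short.MIN_VALUE;
            int end = Math.min((b + 1) * BLOCK_SIZE, samples.length);
            for (int i = b * BLOCK_SIZE; i < end; i++) {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            pairs[b][0] = min;
            pairs[b][1] = max;
        }
        return pairs;
    }
}
```

To draw, you would then group these cached pairs per pixel and draw one vertical line from the group's overall min to its overall max.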

Also remember that when you are drawing to your canvas you will need to scale the sample values by the width and height of the canvas. The best way to do this (in the vertical direction, at least) is to normalize the samples, then multiply by the canvas height. 16-bit audio consists of samples in the range [-32768, 32767], so to normalize just do a floating-point division by 32768. Then reverse the sign (to flip the waveform to the canvas coordinates), add 1 (to compensate for the negative values) and multiply by half the canvas height. That's how I do it, anyway.
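The vertical mapping just described can be sketched as a one-line helper (names are mine, for illustration):

```java
// Map a 16-bit sample to a canvas y coordinate: normalize to [-1, 1),
// flip the sign (canvas origin is top-left, so positive samples go up),
// shift by +1 to make the range non-negative, then scale by half the
// canvas height.
public class SampleScaler {
    public static int toY(short sample, int canvasHeight) {
        double normalized = sample / 32768.0;                      // [-1, 1)
        return (int) Math.round((-normalized + 1.0) * canvasHeight / 2.0);
    }
}
```

With this mapping, a sample of 0 lands on the vertical center of the canvas, and the most negative sample lands at the bottom.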

This page shows how to build a rudimentary waveform display with Java Swing. I haven't looked at it in detail, but I think it just downsamples the data rather than computing max-min pairs. This will, of course, not provide as accurate a reduction as the max-min method, but it's easier to calculate.

If you want to know how to do things properly you should dig into the Audacity source code (be warned, however - it's fairly gnarly C++). To get a general overview you might look at 'A Fast Data Structure for Disk-Based Audio Editing', by the original author of Audacity, Dominic Mazzoni. You will need to purchase that from CMJ, however.

Implant answered 14/10, 2012 at 5:19 Comment(4)
I'll re-read this over and over until I make some sense of it. This seems like a good idea I hadn't thought of, thanks! – Slovenia
@Kevin Heng Yes, apologies if the above seemed complicated, but it's actually a very tricky thing to get right, as I discovered myself when I began writing my own audio editor. I would suggest starting with a simple reduction of blocks of 256 samples. Take the max and min sample of each of these blocks and cache them somewhere (in memory is OK to begin with, for small files). Then just draw a vertical line at each pixel between each max and min sample, appropriately scaled. – Implant
@Kevin Heng It's important to remember that the coordinate system of your display, and hence your canvas (or whatever you want to call your drawing surface), has its origin in the top left corner, and so will be upside-down with respect to the sample data. This is why I suggested changing the sign of each (normalized) sample. – Implant
Oh, OK, I never considered the samples being reversed. We also have to implement a discrete Fourier transform of the samples (I have that working already; I just need to know how to take in the samples). Thanks so much for the help! – Slovenia

For standard WAV files, it's actually pretty easy. Once you get past the headers, you just interpret every 16 bits as a little-endian two's complement integer. A DataInputStream makes the reading easy, but note that readShort() reads big-endian, so you'll need to swap the bytes of each value (e.g. with Short.reverseBytes()).

These are the amplitude values at each sample point. You may want to average or otherwise reduce them, because most of the time there will be far more samples than horizontal pixels; trying to plot every sample on a line graph may not be the best approach.
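A sketch of reading 16-bit PCM samples this way. One caveat: WAV stores sample data little-endian, while DataInputStream.readShort() reads big-endian, so each value is byte-swapped here. The caller is assumed to have already consumed the header bytes:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Read `count` 16-bit PCM samples from a stream positioned at the
// start of the data chunk's sample bytes.
public class SampleReader {
    public static short[] readSamples(InputStream in, int count) throws IOException {
        DataInputStream data = new DataInputStream(in);
        short[] samples = new short[count];
        for (int i = 0; i < count; i++) {
            // readShort() is big-endian; WAV PCM is little-endian,
            // so swap the two bytes of each sample.
            samples[i] = Short.reverseBytes(data.readShort());
        }
        return samples;
    }
}
```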

Rusty answered 14/10, 2012 at 4:51 Comment(4)
@KevinHeng I'm not an expert on WAV, but I think 8 bits per sample would be similar, except I think it is stored as unsigned bytes. Use the DataInputStream.readUnsignedByte() method. – Rusty
Correct: for wav files, 8-bit samples are unsigned. – Thug
Oh right, our teacher also wants us to plot the samples on a graph, so I can't really go around that @user141603; it needs to have zoom functions later on, and copy-and-paste manipulation of a selected area. – Slovenia
@KevinHeng Plotting all the samples on a graph may require a ton of horizontal resolution to have any clarity. If, when you read the file in, you store the samples in an array, copy/paste becomes simply System.arraycopy. – Rusty

The first thing you need to do is read the raw data. Writing a WAV file parser is not too hard, but you can also use the Java Sound API. There are some great hints and sample code for using this API here:

http://www.jsresources.org/

If you want to write your own parser, you could start here:

https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
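If you go the own-parser route, a minimal sketch of validating the outer RIFF/WAVE header (following the layout described at that link; error handling omitted for brevity) might look like:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Check the first 12 bytes of a stream for the RIFF/WAVE signature:
// bytes 0-3 = "RIFF", bytes 4-7 = chunk size (skipped here),
// bytes 8-11 = "WAVE".
public class WavHeader {
    public static boolean isWav(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] riff = new byte[4];
        data.readFully(riff);     // bytes 0-3: should be "RIFF"
        data.skipBytes(4);        // bytes 4-7: overall chunk size
        byte[] wave = new byte[4];
        data.readFully(wave);     // bytes 8-11: should be "WAVE"
        return new String(riff, StandardCharsets.US_ASCII).equals("RIFF")
            && new String(wave, StandardCharsets.US_ASCII).equals("WAVE");
    }
}
```

From there you would walk the "fmt " and "data" sub-chunks the same way, remembering that all multi-byte integer fields are little-endian.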

Once you have the raw data, you can display it as a function of time. This is called the waveform.

However, displaying the waveform is time consuming when the user has "zoomed out" on a lot of data: an hour's worth of data would take a long time to render in this manner. Most applications, therefore, precompute some data to make drawing the zoomed out data faster. The "correct" way to do this is as follows:

  • loop over blocks of samples in the file (blocks of between 50 and 500 samples or so)
    • read the block of samples
    • take the absolute value of all those samples
    • take the maximum of the absolute value
    • store the maximum as the "zoomed out" value for that block
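The loop above can be sketched like this (block size left as a parameter, per the 50-500 range mentioned; names are illustrative):

```java
// For each block of samples, store the maximum absolute value as
// that block's "zoomed out" amplitude.
public class ZoomCache {
    public static int[] peaks(short[] samples, int blockSize) {
        int blocks = (samples.length + blockSize - 1) / blockSize;
        int[] peaks = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            int peak = 0;
            int end = Math.min((b + 1) * blockSize, samples.length);
            for (int i = b * blockSize; i < end; i++) {
                // abs() after widening to int, so -32768 is handled safely
                peak = Math.max(peak, Math.abs((int) samples[i]));
            }
            peaks[b] = peak;
        }
        return peaks;
    }
}
```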

When I say "correct" I mean this is what everybody does, so it will result in a view that looks like what people expect. If you do something different (e.g. computing logs or averaging instead of looking for the peak) you will get something that doesn't look right, as this fellow discovered:

drawing waveform - converting to DB squashes it

Thug answered 15/10, 2012 at 15:18 Comment(1)
Alright, thanks! This makes a lot of sense too; I wasn't too sure what to do with sample pairs. The most my homework project will handle is editing a wave file of only a few seconds, which could still easily be 60000+ samples, but nothing larger than that. No mp3s and the like, just uncompressed .wav files. – Slovenia
