FSK Demodulation - Parsing Japanese EWS Data
【This is not a duplicate. Similar questions are about scenarios where people have control over the source data. I do not.】

In Japan there's something called the "Emergency Warning Broadcasting System." It looks like this when activated: http://www.youtube.com/watch?v=9hjlYvp9Pxs

In the above video, at around 2:37, an FSK-modulated signal is sent. I want to parse this signal; i.e. given a WAV file that contains the signal, I want to end up with a StringBuilder that contains 0s and 1s to process them later. I have the spec for the binary data and all, but the problem is that I know nothing about audio programming. :(

This is just for a hobby project, but I became hooked. TV and radio makers can pick up this signal and have their appliances do stuff in reaction to it, so it can't be that hard, right? :(

Facts about the signal:

  • The mark tone is 1024 Hz and the space tone is 640 Hz
  • Each tone is 15.625 ms long
  • There is a 2-second pause before the signal begins and after it ends (probably for detection purposes)
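(Aside: 15.625 ms is exactly 1/64 of a second, so a bit only covers a whole number of samples at certain sample rates; quick check, illustrative Python:)

```python
# 15.625 ms = 1/64 s, so samples per bit = sample_rate / 64
for rate in (8000, 44100, 48000):
    print(rate, rate / 64)  # -> 125.0, 689.0625, 750.0
```

At 44.1 kHz a bit is a fractional number of samples, so any fixed integer window will slowly drift against the signal.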

What I did so far:

  1. Wrote a simple RIFF parser that accepts 8-bit mono WAV files and lets me read samples from them. I've tested it and it works.
  2. Wrote a loop that takes 15.625 ms of samples and:
    1. Uses RMS to look for two seconds of silence
    2. Uses the Goertzel algorithm to decide whether the signal is 1024 Hz or 640 Hz

The problems I have:

  • 0s and 1s are swallowed during the loop depending on the test data.
    • Given the clarity of the signal (YouTube-to-MP3 rip), that shouldn't happen.
    • If I generate a repeating 01 sequence in Audacity 30 times, my program will pick up around 10 of the 01 pairs, instead of 30
  • Sometimes 0s and 1s are swapped (side effect of the above?)
  • If I tweak the code so it works with one test sound file, other test sound files stop working

My questions:

  • Can anyone give me a high level overview on how FSK decoding would be done properly in software?
  • Do I need to apply some sort of filter that passes only 640 Hz and 1024 Hz and mutes everything else?
  • What is the best approach to keep the timing right? Maybe I'm doing it wrong?
  • Any links to beginner's literature on this kind of audio processing? I'd really like to learn and get this working.

The code that reads samples is (simplified):

StringBuilder ews_bits = new StringBuilder();
double[] samples = new double[(int)(samplesPerMs * 15.625D)];
int index = 0, readTo = /* current offset + RIFF subChunk2Size */;
BinaryReader br = /* at start of PCM data */;

while (br.BaseStream.Position < readTo)
{
    switch (bitsPerSample / 8)
    {
        case 1: // 8bit
            samples[index++] = ((double)br.ReadByte() - 127.5D) / 256D;
            break;
        case 2: // 16bit
            samples[index++] = (double)br.ReadInt16() / 32768D;
            break;
    }

    if (index != samples.Length)
        continue;

    /****** The sample buffer is full and we must process it. ******/

    if (AudioProcessor.IsSilence(ref samples))
    {
        silence_count++;
        if (state == ParserState.Decoding && silence_count > 150)
        {
            // End of EWS broadcast reached.
            EwsSignalParser.Parse(ews_bits.ToString());

            /* ... reset state; go back looking for silence... */
        }
        goto Done;
    }

    /****** The signal was not silence. ******/

    if (silence_count > 120 && state == ParserState.SearchingSilence)
        state = ParserState.Decoding;

    if (state == ParserState.Decoding)
    {
        AudioProcessor.Decode(ref samples, sampleRate, ref ews_bits);

        bool continue_decoding = /* check first 20 bits for signature */;
        if (continue_decoding) goto Done;

        // If we get here, we were decoding a junk signal.
        state = ParserState.SearchingSilence;
    }

    /* Not enough silence yet */
    silence_count = 0;
Done:
    index = 0;
}

The audio processor is just a class with:

public static void Decode(ref double[] samples, int sampleRate, ref StringBuilder bitHolder)
{
    double freq_640 = GoertzelMagnitude(ref samples, 640, sampleRate);
    double freq_1024 = GoertzelMagnitude(ref samples, 1024, sampleRate);

    if (freq_640 > freq_1024)
        bitHolder.Append("0");
    else
        bitHolder.Append("1");
}

public static bool IsSilence(ref double[] samples)
{
    // power_RMS = sqrt(sum(x^2) / N)

    double sum = 0;

    for (int i = 0; i < samples.Length; i++)
        sum += samples[i] * samples[i];

    double power_RMS = Math.Sqrt(sum / samples.Length);

    return power_RMS < 0.01;
}


/// <remarks>http://www.embedded.com/design/embedded/4024443/The-Goertzel-Algorithm</remarks>
private static double GoertzelMagnitude(ref double[] samples, double targetFrequency, int sampleRate)
{
    double n = samples.Length;
    int k = (int)(0.5D + ((double)n * targetFrequency) / (double)sampleRate);
    double w = (2.0D * Math.PI / n) * k;
    double cosine = Math.Cos(w);
    double sine = Math.Sin(w);
    double coeff = 2.0D * cosine;

    double q0 = 0, q1 = 0, q2 = 0;

    for (int i = 0; i < samples.Length; i++)
    {
        double sample = samples[i];

        q0 = coeff * q1 - q2 + sample;
        q2 = q1;
        q1 = q0;
    }

    double magnitude = Math.Sqrt(q1 * q1 + q2 * q2 - q1 * q2 * coeff);

    return magnitude;
}

Thanks for reading. I hope you can help me.

Ambassadoratlarge answered 6/12, 2013 at 7:41 Comment(1)
So the data you're working with was converted from something that uses lossy compression and uploaded to YouTube. You then ripped that to MP3, incurring more lossy compression. Then you converted that to WAV, which has to fill in those lossy gaps with something. It's quite possible that you're not getting exactly 1024 Hz. Does the signal look right if you examine it in an audio editor? I suspect you'll need to do some approximate matching, as MrSmith mentions in his answer. – Paraphernalia

This is how I would do it (high-level description):

  1. Run your signal through an FFT.
  2. Look for steady peaks at about 640 Hz and 1024 Hz (I would allow at least ±10 Hz).
  3. If the signal is steady for about 10 ms (by steady I mean roughly 95% of the samples fall in the same range, 640 Hz ± 10 Hz or 1024 Hz ± 10 Hz), take it as a detection of the tone. Use that detection also to synchronize the timer that tells you when to expect the next tone.
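A minimal sketch of steps 1 and 2 in Python (numpy assumed; the tone frequencies and the ±10 Hz tolerance come from the above, everything else is illustrative, not a definitive implementation):

```python
import numpy as np

MARK_HZ, SPACE_HZ, TOL_HZ = 1024.0, 640.0, 10.0

def detect_tone(window, sample_rate):
    """Return '1' (mark), '0' (space), or None for one block of samples."""
    # Hann window to limit spectral leakage, then take the FFT magnitude.
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    peak = freqs[np.argmax(spectrum)]
    # Caveat: a one-bit window (15.625 ms) gives 64 Hz wide FFT bins,
    # so the peak only lands within +/-10 Hz because both tones happen
    # to sit on (or very near) a bin; a longer window tightens this.
    if abs(peak - MARK_HZ) <= TOL_HZ:
        return "1"
    if abs(peak - SPACE_HZ) <= TOL_HZ:
        return "0"
    return None  # neither tone: silence or junk
```

Feed it one bit period of samples at a time; anything that returns None can drive the silence/junk state machine instead.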
Scale answered 6/12, 2013 at 8:11 Comment(0)

I got it about 90% working now after rewriting the sample parsing loop and silence detection parts. There were two main problems in my implementation. The first was that the silence detector was overeager, so I changed it from processing every millisecond of samples to every half-millisecond of samples. That brought me exactly to the start of FSK data.

The next problem was that I thought I could naively let the demodulator look at 15.625 ms of samples at a time as it works its way through the WAV file. It turns out that while this works great for the first 90 bits or so, eventually the tones become a little longer or shorter than expected and the demodulator drifts out of sync. The current code finds and corrects 13 bits with such a timing mismatch. Spots where the signal changes from mark to space (and vice versa) are particularly vulnerable to this.
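To illustrate the kind of correction I mean, here is a rough Python sketch (not my actual C# code): it scans a small range of offsets around the nominal bit boundary and re-anchors the bit clock whenever a mark/space transition is found. The ±25% search span is an arbitrary choice, not from the spec.

```python
import math

def goertzel_mag(samples, freq, rate):
    # Same Goertzel power estimate as in the question.
    n = len(samples)
    k = int(0.5 + n * freq / rate)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    q1 = q2 = 0.0
    for s in samples:
        q0 = coeff * q1 - q2 + s
        q2, q1 = q1, q0
    return math.sqrt(q1 * q1 + q2 * q2 - q1 * q2 * coeff)

def demodulate(samples, rate, bit_seconds=0.015625, search=0.25):
    """Decode FSK bits, re-anchoring the bit clock at tone transitions."""
    bit_len = rate * bit_seconds          # samples per bit (may be fractional)
    win = int(bit_len)
    pos, bits, prev = 0.0, [], None
    while pos + bit_len <= len(samples):
        # Try nearby offsets; keep the one with the best mark/space contrast.
        best_off, best_score, best_bit = 0, -1.0, "0"
        for off in range(-int(bit_len * search), int(bit_len * search) + 1):
            start = int(pos) + off
            if start < 0 or start + win > len(samples):
                continue
            chunk = samples[start:start + win]
            mark = goertzel_mag(chunk, 1024.0, rate)
            space = goertzel_mag(chunk, 640.0, rate)
            if abs(mark - space) > best_score:
                best_score = abs(mark - space)
                best_off = off
                best_bit = "1" if mark > space else "0"
        bits.append(best_bit)
        if prev is not None and best_bit != prev:
            pos += best_off               # a transition: trust it, re-anchor
        prev = best_bit
        pos += bit_len
    return "".join(bits)
```

Keeping `pos` as a float means the fractional samples-per-bit count does not accumulate into drift on its own; the offset search then soaks up whatever timing error remains in the recording.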

Guess there's a reason the word "analog" contains "anal". It is. I really wish I knew more about signal theory and digital signal processing. :(

How I discovered all of this: I imported the MP3 into Audacity and trimmed it down to the FSK part. Then I had Audacity generate labels for every bit. After that I went through the waveform, highlighting bits according to the labels.

Ambassadoratlarge answered 7/12, 2013 at 14:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.