Parsing GPS receiver output via regex in Python

Asked 22/11, 2008 at 20:50 Answered 26/3, 2023 at 1:45

I have a friend who is finishing up his master's degree in aerospace engineering. For his final project, he is on a small team tasked with writing a program for tracking weather balloons, rockets and satellites. The program receives input from a GPS device, does calculations with the data, and uses the results of those calculations to control a series of motors designed to orient a directional communication antenna, so the balloon, rocket or satellite always stays in focus.

Though somewhat of an (eternal) beginner myself, I have more programming experience than my friend. So when he asked me for advice, I convinced him to write the program in Python, my language of choice.

At this point in the project, we are working on the code that parses the input from the GPS device. Here is some example input, with the data we need to extract in bold:

$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F
$GPRMC,093345.679,4234.7899,N,11344.2567,W,3,02,24.5,1000.23,M,,,0000*1F
$GPRMC,044584.936,1276.5539,N,88734.1543,E,2,04,33.5,600.323,M,,,*00
$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00
$GPRMC,066487.954,4572.0089,S,45572.3345,W,3,09,15.0,35000.00,M,,,*1F

Here is some further explanation of the data:

"It looks like I'll need five things out of every line. Bear in mind that any one of these areas may be empty, meaning there will be just two commas right next to each other, such as ',,,'. There are two fields that may be full at any time. Some of them only have two or three options that they may be, but I don't think I should be counting on that."

Two days ago my friend was able to acquire the full log from the GPS receiver used to track a recent weather balloon launch. The data is quite long, so I put it all in this pastebin.

I am still rather new with regular expressions myself, so I am looking for some assistance.

Holomorphic answered 22/11, 2008 at 20:50 Comment(4)

By the way, your $GPRMC line doesn't seem to fit the standard. home.mira.net/~gnb/gps/nmea.html#gprmc Am I missing something? – Fiasco 22/11, 2008 at 21:4

Thanks for pointing that out Federico. I'll be sure to look into that. – Holomorphic 22/11, 2008 at 21:6

It seems more of a $GPGGA line. – Fiasco 22/11, 2008 at 21:14

Honestly, I am not personally familiar with the equipment in use. However, I think the pastebin I linked to (pastebin.com/f5f5cf9ab) might offer some clarification. – Holomorphic 22/11, 2008 at 21:38

Splitting should do the trick. Here's a good way to extract the data as well:

>>> line = "$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00"
>>> line = line.split(",")
>>> neededData = (float(line[2]), line[3], float(line[4]), line[5], float(line[9]))
>>> print(neededData)
(3248.778, 'N', 11355.7832, 'W', 25722.5)
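
Since the question says any field may be empty, note that float('') raises ValueError. Here is a minimal sketch that tolerates empty fields. It assumes the same field positions as above (indices 2 to 5 and 9); the helper names to_float and extract_fields are made up for illustration:

```python
def to_float(text):
    """Return float(text), or None when the field is empty."""
    return float(text) if text else None


def extract_fields(sentence):
    """Pull latitude, N/S, longitude, E/W and altitude out of one sentence.

    The field positions (2, 3, 4, 5, 9) are taken from the sample data and
    are an assumption about this particular receiver's output.
    """
    fields = sentence.strip().split(",")
    if len(fields) < 10:
        return None  # too short to contain all five values
    return (
        to_float(fields[2]),
        fields[3] or None,
        to_float(fields[4]),
        fields[5] or None,
        to_float(fields[9]),
    )


print(extract_fields("$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00"))
print(extract_fields("$GPRMC,199304.973,,,,,1,06,02.2,,M,,,*00"))
```

Empty fields come back as None, so the caller can decide whether to skip the line or reuse the last known value.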

Loni answered 22/11, 2008 at 21:0 Comment(1)

3248.7780 and 11355.7832 are not meaningful latitude and longitude values, and they cannot be used in a formula as they are. You need to know how those numbers are structured in the GPRMC string before converting them to float. Please also check this answer – Exodontist 4/2, 2023 at 14:43

You could use a library like pynmea2 for parsing the NMEA log.

>>> import pynmea2
>>> msg = pynmea2.parse('$GPGGA,142927.829,2831.4705,N,08041.0067,W,1,07,1.0,7.9,M,-31.2,M,0.0,0000*4F')
>>> msg.timestamp, msg.latitude, msg.longitude, msg.altitude
(datetime.time(14, 29, 27), 28.524508333333333, -80.683445, 7.9)

Disclaimer: I am the author of pynmea2

Performative answered 30/5, 2014 at 6:11 Comment(1)

7 years and your package still works like a charm. :) – Bailly 18/6, 2022 at 8:51

It's simpler to use split than a regex.

>>> line="$GPRMC,092204.999,4250.5589,S,14718.5084,E,1,12,24.4,89.6,M,,,0000*1F "
>>> line.split(',')
['$GPRMC', '092204.999', '4250.5589', 'S', '14718.5084', 'E', '1', '12', '24.4', '89.6', 'M', '', '', '0000*1F ']
>>>

Inweave answered 22/11, 2008 at 20:54 Comment(1)

Hah, it would be like me to pick the more complicated solution! – Holomorphic 22/11, 2008 at 20:59

Those are comma separated values, so using a csv library is the easiest solution.

I threw that sample data you have into /var/tmp/sampledata, then I did this:

>>> import csv
>>> for line in csv.reader(open('/var/tmp/sampledata')):
...     print(line)
['$GPRMC', '092204.999', '**4250.5589', 'S', '14718.5084', 'E**', '1', '12', '24.4', '**89.6**', 'M', '', '', '0000\\*1F']
['$GPRMC', '093345.679', '**4234.7899', 'N', '11344.2567', 'W**', '3', '02', '24.5', '**1000.23**', 'M', '', '', '0000\\*1F']
['$GPRMC', '044584.936', '**1276.5539', 'N', '88734.1543', 'E**', '2', '04', '33.5', '**600.323**', 'M', '', '', '\\*00']
['$GPRMC', '199304.973', '**3248.7780', 'N', '11355.7832', 'W**', '1', '06', '02.2', '**25722.5**', 'M', '', '', '\\*00']
['$GPRMC', '066487.954', '**4572.0089', 'S', '45572.3345', 'W**', '3', '09', '15.0', '**35000.00**', 'M', '', '', '\\*1F']

You can then process the data however you wish. The '**' at the start and end of some of the values looks a little odd; to strip it off, you can do:

>>> eastwest = 'E**'
>>> eastwest = eastwest.strip('*')
>>> print(eastwest)
E

You will have to convert some values to floats. For example, the 3rd value on the first line of sample data is:

>>> data = '**4250.5589'
>>> print(float(data.strip('*')))
4250.5589

Excipient answered 24/11, 2008 at 4:49 Comment(1)

It turns out that there is some extra encoding going on here. For instance, "4250.5589,S" is actually latitude 42°50.5589'S – Stultz 6/3, 2011 at 1:49

You should also first check the checksum of the data. It is calculated by XORing the characters between the $ and the * (not including them) and comparing it to the hex value at the end.

Your pastebin looks like it has some corrupt lines in it. Here is a simple check; it assumes that the line starts with $ and has no CR/LF at the end. To build a more robust parser you need to search for the '$' and work through the string until hitting the '*'.

def check_nmea0183(s):
    """
    Check a string to see if it is a valid NMEA 0183 sentence
    """
    if s[0] != '$':
        return False
    if s[-3] != '*':
        return False

    checksum = 0
    for c in s[1:-3]:
        checksum ^= ord(c)

    if int(s[-2:],16) != checksum:
        return False

    return True
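
The more robust approach described above (search for the '$', work through to the '*', then compare checksums) might look roughly like the sketch below. The names nmea_checksum and iter_valid_sentences are invented for this example, and the demo sentence is built by computing its own checksum rather than taken from the pastebin:

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def nmea_checksum(body):
    """XOR together the characters between '$' and '*'."""
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return checksum


def iter_valid_sentences(stream):
    """Yield each '$...*HH' sentence in stream whose checksum matches."""
    pos = 0
    while True:
        start = stream.find("$", pos)
        if start == -1:
            return
        star = stream.find("*", start + 1)
        if star == -1:
            return
        # Another '$' before the '*' means the earlier sentence was cut off;
        # restart the scan from the later '$'.
        restart = stream.find("$", start + 1, star)
        if restart != -1:
            pos = restart
            continue
        pos = star + 1
        hex_pair = stream[star + 1:star + 3]
        if len(hex_pair) != 2 or not set(hex_pair) <= HEX_DIGITS:
            continue
        if nmea_checksum(stream[start + 1:star]) == int(hex_pair, 16):
            yield stream[start:star + 3]


# Build a valid sentence for the demo by computing its checksum.
body = "GPGGA,142927.829,2831.4705,N,08041.0067,W,1,07,1.0,7.9,M,-31.2,M,0.0,0000"
good = "$%s*%02X" % (body, nmea_checksum(body))
noisy = "junk" + good + "\r\n$GPGGA,1429" + good + "\r\n"
print(len(list(iter_valid_sentences(noisy))))
```

Because it scans the raw text instead of trusting line boundaries, a truncated sentence followed immediately by a complete one does not cause the complete one to be lost.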

Fairlead answered 22/11, 2008 at 23:9 Comment(2)

Thanks for the example. I'll be sure to do something similar to this when writing the input checking function. – Holomorphic 23/11, 2008 at 2:19

def check_nmea(s): return s and \ s[0] == '$' and \ s[-3] == '*' and \ reduce(lambda x, y: x^ord(y), s[1:-3], 0) == int(s[-2:],16) – Deodand 18/7, 2018 at 14:31

This is a GPRMC string. After splitting the string, you need to parse latitude and longitude values.

line = "$GPRMC,199304.973,3248.7780,N,11355.7832,W,1,06,02.2,25722.5,M,,,*00"
line = line.split(",")

In the latitude and longitude part ([..., '3248.7780', 'N', '11355.7832', 'W', ...]):

The first number is not a plain decimal number; the degrees and minutes are concatenated like a string. That is, 3248.7780 means 32 degrees, 48.7780 minutes (latitude).
The second number (11355.7832) means 113 degrees, 55.7832 minutes (longitude).

They cannot be used in a formula as they are. They have to be converted to decimal degrees.

def toDD(s):
    d = float(s[:-7])
    m = float(s[-7:]) / 60
    return d + m

lat_lon = (toDD(line[2]), line[3], toDD(line[4]), line[5])
print(lat_lon)

# (32.81296666666667, 'N', 113.92972, 'W')
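
For receivers that print a different number of decimal places, here is a sketch of the same conversion that locates the decimal point instead of counting characters from the end. It also applies the hemisphere sign (south and west negative, the convention used in other answers here). The function name nmea_to_decimal is hypothetical, and it assumes the two digits before the decimal point are always whole minutes:

```python
def nmea_to_decimal(value, hemisphere):
    """Convert an NMEA (d)ddmm.mmmm field to signed decimal degrees."""
    dot = value.find(".")
    if dot == -1:
        dot = len(value)  # no fractional minutes present
    # Everything before the last two integer digits is whole degrees.
    degrees = float(value[:dot - 2] or 0)
    minutes = float(value[dot - 2:])
    decimal = degrees + minutes / 60
    return -decimal if hemisphere in ("S", "W") else decimal


print(nmea_to_decimal("3248.7780", "N"))
print(nmea_to_decimal("11355.7832", "W"))
```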

Extricate answered 20/2, 2022 at 20:56 Comment(0)

If you need to do some more extensive analysis of your GPS data streams, here is a pyparsing solution that breaks up your data into named data fields. I extracted your pastebin'ned data to a file gpsstream.txt, and parsed it with the following:

"""
 Parse NMEA 0183 codes for GPS data
 http://en.wikipedia.org/wiki/NMEA_0183

 (data formats from http://www.gpsinformation.org/dale/nmea.htm)
"""
from pyparsing import *

lead = "$"
code = Word(alphas.upper(),exact=5)
end = "*"
COMMA = Suppress(',')
cksum = Word(hexnums,exact=2).setParseAction(lambda t:int(t[0],16))

# define basic data value forms, and attach conversion actions
word = Word(alphanums)
N,S,E,W = map(Keyword,"NSEW")
integer = Regex(r"-?\d+").setParseAction(lambda t:int(t[0]))
real = Regex(r"-?\d+\.\d*").setParseAction(lambda t:float(t[0]))
timestamp = Regex(r"\d{2}\d{2}\d{2}\.\d+")
timestamp.setParseAction(lambda t: t[0][:2]+':'+t[0][2:4]+':'+t[0][4:])
def lonlatConversion(t):
    t["deg"] = int(t.deg)
    t["min"] = float(t.min)
    t["value"] = ((t.deg + t.min/60.0) 
                    * {'N':1,'S':-1,'':1}[t.ns] 
                    * {'E':1,'W':-1,'':1}[t.ew])
lat = Regex(r"(?P<deg>\d{2})(?P<min>\d{2}\.\d+),(?P<ns>[NS])").setParseAction(lonlatConversion)
lon = Regex(r"(?P<deg>\d{3})(?P<min>\d{2}\.\d+),(?P<ew>[EW])").setParseAction(lonlatConversion)

# define expression for a complete data record
value = timestamp | Group(lon) | Group(lat) | real | integer | N | S | E | W | word
item = lead + code("code") + COMMA + delimitedList(Optional(value,None))("datafields") + end + cksum("cksum")


def parseGGA(tokens):
    keys = "time lat lon qual numsats horiz_dilut alt _ geoid_ht _ last_update_secs stnid".split()
    for k,v in zip(keys, tokens.datafields):
        if k != '_':
            tokens[k] = v
    #~ print tokens.dump()

def parseGSA(tokens):
    keys = "auto_manual _3dfix prn prn prn prn prn prn prn prn prn prn prn prn pdop hdop vdop".split()
    tokens["prn"] = []
    for k,v in zip(keys, tokens.datafields):
        if k != 'prn':
            tokens[k] = v
        else:
            if v is not None:
                tokens[k].append(v)
    #~ print tokens.dump()

def parseRMC(tokens):
    keys = "time active_void lat lon speed track_angle date mag_var _ signal_integrity".split()
    for k,v in zip(keys, tokens.datafields):
        if k != '_':
            if k == 'date' and v is not None:
                v = "%06d" % v
                tokens[k] = '20%s/%s/%s' % (v[4:],v[2:4],v[:2])
            else:
                tokens[k] = v
    #~ print tokens.dump()


# process sample data
data = open("gpsstream.txt").read().expandtabs()

count = 0
for i,s,e in item.scanString(data):
    # use checksum to validate input: XOR every character between '$' and '*'
    linebody = data[s+1:e-3]
    checksum = 0
    for ch in linebody:
        checksum ^= ord(ch)
    if i.cksum != checksum:
        continue
    count += 1

    # parse out specific data fields, depending on code field
    fn = {'GPGGA' : parseGGA,
          'GPGSA' : parseGSA,
          'GPRMC' : parseRMC,}[i.code]
    fn(i)

    # print out time/position/speed values
    if i.code == 'GPRMC':
        print("%s %8.3f %8.3f %4d" % (i.time, i.lat.value, i.lon.value, i.speed or 0))


print(count)

The $GPRMC records in your pastebin don't seem to quite match with the ones you included in your post, but you should be able to adjust this example as necessary.

Stultz answered 5/3, 2011 at 10:4 Comment(0)

I suggest a small fix to your code: if it is used to parse data from the previous century, the date appears to be in the future (for instance 2094 instead of 1994).

My fix is not fully accurate, but I take the stand that prior to the 70's no GPS data existed.

In the parseRMC function, just replace the date-formatting line with:

p = int(v[4:])
print("p = %d" % p)  # debug output
if p > 70:
    tokens[k] = '19%s/%s/%s' % (v[4:],v[2:4],v[:2])
else:
    tokens[k] = '20%s/%s/%s' % (v[4:],v[2:4],v[:2])

This will look at the two yy digits of the year and assume that past year 70 we are dealing with sentences from the previous century. It could be done better by comparing to today's date and assuming that any date that falls in the future is in fact from the past century.
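
The "compare to today's date" idea could be sketched like this. The function names expand_year and format_rmc_date are made up for the example, and the comparison is done on the year only, which is an assumption that keeps it short:

```python
from datetime import date


def expand_year(two_digit_year, today=None):
    """Expand 'yy' to a full year, pushing future years back a century."""
    today = today or date.today()
    century = today.year - today.year % 100
    year = century + two_digit_year
    if year > today.year:
        year -= 100
    return year


def format_rmc_date(ddmmyy, today=None):
    """Turn an RMC 'ddmmyy' field into 'yyyy/mm/dd'."""
    day, month, yy = ddmmyy[:2], ddmmyy[2:4], int(ddmmyy[4:])
    return "%04d/%s/%s" % (expand_year(yy, today), month, day)


print(format_rmc_date("211108", today=date(2011, 8, 27)))
print(format_rmc_date("230394", today=date(2011, 8, 27)))
```

Passing today explicitly makes the behaviour reproducible; leaving it out falls back to the current date.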

Thanks for all the pieces of code you provided above... I had some fun with this.

Schoenburg answered 27/8, 2011 at 13:28 Comment(0)

This is an old question, and it was asking for a regular expression. My contribution here is an expression validated on regex101.com:

RMC,[\d\.]*,A,([\d\.]+),([NS]),([\d\.]+),([EW]),[\d\.]*,([\d\.]*)

You can use something like this:

import re

p = re.compile(r'RMC,[\d\.]*,A,([\d\.]+),([NS]),([\d\.]+),([EW]),[\d\.]*,([\d\.]*)')

nmea = r'$GPRMC,150714.696,A,2829.6203,N,08039.0335,W,0.00,,211108,,*0A'

x = p.search(nmea)
if x is not None:
  print(f"lat: {x.group(1)} {x.group(2)}")
  print(f"lon: {x.group(3)} {x.group(4)}")
  print(f"heading: {x.group(5) if len(x.group(5)) > 0 else 'None'}")

It has the 5 capturing groups that you needed (latitude, N/S, longitude, E/W, heading) and it will filter only valid fixes (those with an 'A' as 3rd parameter). Some observations:

14 years ago it was common to have only GPS receivers, so it was reasonable to use a regular expression that started with "$GPRMC", but nowadays you can have a multiconstellation GNSS receiver that outputs messages related to GPS, GLONASS, BEIDOU, GALILEO or a mix of them, and the prefix can change according to the constellation. For this reason, I've kept only the "RMC" identifier.
I've made the heading optional in case there's no movement.
Remember that you must convert the latitude and longitude to decimal values (degrees.fraction_of_degrees°) or to degrees°minutes'seconds'' format for them to make sense on regular maps. NMEA sentences use a strange format, with the degrees concatenated to the minutes and the fractional part of the minutes, so you'll have to separate the degrees and minutes and then combine them again with something like this:

  lat_nmea = float(x.group(1))
  lat_deg = lat_nmea//100
  lat_min = lat_nmea - 100*lat_deg
  lat_deg = lat_deg + lat_min/60
  print(f"conv lat {x.group(1)} -> {lat_deg}")

  lon_nmea = float(x.group(3))
  lon_deg = lon_nmea//100
  lon_min = lon_nmea - 100*lon_deg
  lon_deg = lon_deg + lon_min/60
  print(f"conv lon {x.group(3)} -> {lon_deg}")
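
For the other format mentioned above (degrees°minutes'seconds''), a similar sketch is shown here. The function name nmea_to_dms is hypothetical, and ordinary floating-point rounding applies to the seconds value:

```python
def nmea_to_dms(value):
    """Split an NMEA (d)ddmm.mmmm field into (degrees, minutes, seconds)."""
    raw = float(value)
    degrees = int(raw // 100)           # leading digits are whole degrees
    minutes_full = raw - 100 * degrees  # minutes including the fraction
    minutes = int(minutes_full)
    seconds = (minutes_full - minutes) * 60
    return degrees, minutes, seconds


deg, mins, secs = nmea_to_dms("2829.6203")
print(f"{deg}°{mins}'{secs:.3f}''")
```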

Broussard answered 26/3, 2023 at 1:45 Comment(0)
