How to extract .gpx data with python
M

6

16

I am a new Linux/Python user and have .gpx files (output from GPS tracking software) whose values I need to extract into CSV/TXT for use in a GIS program. I have looked up strings, slicing, etc. in my beginning Python book, on this website, and elsewhere online. Using a .gpx to .txt converter I can pull the longitude and latitude out into a text file, but I also need to extract the elevation data. The file has six lines of text at the top, and I only know how to open it in emacs (aside from uploading it to a website). Here is the file starting at line 7.

Ideally, I would like to know how to extract all values with Python (or Perl) into a CSV or TXT file. If anyone knows of a tutorial or a sample script, it would be appreciated.

<metadata>
<time>2012-06-13T01:51:08Z</time>
</metadata>
<trk>
<name>Track 2012-06-12 19:51</name>
<trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
<extensions>
<ogt10:accuracy>34.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49796612" lon="-112.03970968">
<ele>1410.9000244140625</ele>
<time>2012-06-13T01:57:10Z</time>
<extensions>
<gpx10:speed>3.75</gpx10:speed>
<ogt10:accuracy>13.0</ogt10:accuracy>
<gpx10:course>293.20001220703125</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49450857" lon="-112.04477274">
<ele>1406.5</ele>
<time>2012-06-13T02:02:24Z</time>
<extensions>
<ogt10:accuracy>12.0</ogt10:accuracy></extensions>
</trkpt>
</trkseg>
<trkseg>
<trkpt lat="43.49451057" lon="-112.04480354">
<ele>1398.9000244140625</ele>
<time>2012-06-13T02:54:55Z</time>
<extensions>
<ogt10:accuracy>10.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49464813" lon="-112.04472215">
<ele>1414.9000244140625</ele>
<time>2012-06-13T02:56:06Z</time>
<extensions>
<ogt10:accuracy>7.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49432573" lon="-112.04489684">
<ele>1410.9000244140625</ele>
<time>2012-06-13T02:57:27Z</time>
<extensions>
<gpx10:speed>3.288236618041992</gpx10:speed>
<ogt10:accuracy>21.0</ogt10:accuracy>
<gpx10:course>196.1999969482422</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49397445" lon="-112.04505216">
<ele>1421.699951171875</ele>
<time>2012-06-13T02:57:30Z</time>
<extensions>
<gpx10:speed>3.0</gpx10:speed>
<ogt10:accuracy>17.0</ogt10:accuracy>
<gpx10:course>192.89999389648438</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49428702" lon="-112.04265923">
<ele>1433.0</ele>
<time>2012-06-13T02:58:46Z</time>
<extensions>
<gpx10:speed>4.5</gpx10:speed>
<ogt10:accuracy>18.0</ogt10:accuracy>
<gpx10:course>32.400001525878906</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49444603" lon="-112.04263691">
<ele>1430.199951171875</ele>
<time>2012-06-13T02:58:50Z</time>
<extensions>
<gpx10:speed>4.5</gpx10:speed>
<ogt10:accuracy>11.0</ogt10:accuracy>
<gpx10:course>29.299999237060547</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49456961" lon="-112.04260058">
<ele>1430.4000244140625</ele>
<time>2012-06-13T02:58:52Z</time>
<extensions>
<gpx10:speed>4.5</gpx10:speed>
<ogt10:accuracy>8.0</ogt10:accuracy>
<gpx10:course>28.600000381469727</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49570131" lon="-112.04001132">
<ele>1418.199951171875</ele>
<time>2012-06-13T03:00:08Z</time>
<extensions>
Muddlehead answered 19/6, 2012 at 16:52 Comment(1)
Out of curiosity: Have you ever figured this out?Pyrimidine
C
18

You can install GPXpy

sudo pip install gpxpy

Then just use the library:

import gpxpy
import gpxpy.gpx

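# Open the file in a context manager so it is closed after parsing
with open('input_file.gpx', 'r') as gpx_file:
    gpx = gpxpy.parse(gpx_file)

# Tracks contain segments, and segments contain the recorded points
for track in gpx.tracks:
    for segment in track.segments:
        for point in segment.points:
            print('Point at ({0},{1}) -> {2}'.format(point.latitude, point.longitude, point.elevation))

for waypoint in gpx.waypoints:
    print('waypoint {0} -> ({1},{2})'.format(waypoint.name, waypoint.latitude, waypoint.longitude))

for route in gpx.routes:
    print('Route:')
    for point in route.points:
        print('Point at ({0},{1}) -> {2}'.format(point.latitude, point.longitude, point.elevation))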

For more info: https://pypi.python.org/pypi/gpxpy

Regards

Carew answered 18/6, 2016 at 20:20 Comment(1)
sudo pip is almost never a good idea (e.g. see this).Brandish
E
10

GPX is an XML format, so use a fitting module like lxml or the included ElementTree XML API to parse the data, then output to CSV using the python csv module.

Tutorials covering these concepts:

I also found a python GPX parsing library called gpxpy that perhaps gives a higher-level interface to the data contained in GPX files.
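
As a rough illustration of the ElementTree plus csv approach described above, here is a minimal sketch using only the standard library. The function names, the column order, and the namespace URI given for the ogt10 prefix are my own placeholders, not taken from the question's file; the sample track point itself is copied from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# One track point from the question, wrapped in a minimal <gpx> root.
# The ogt10 namespace URI is a placeholder; a real file declares its own.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:ogt10="http://example.com/ogt10" version="1.1">
<trk><trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
<extensions>
<ogt10:accuracy>34.0</ogt10:accuracy></extensions>
</trkpt>
</trkseg></trk>
</gpx>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]


def trackpoints(gpx_source):
    """Yield (lat, lon, ele, time) for every <trkpt> in a GPX path or file object."""
    root = ET.parse(gpx_source).getroot()
    for elem in root.iter():
        if local_name(elem.tag) != 'trkpt':
            continue
        # <ele> and <time> are optional in GPX, so missing ones become ''.
        children = {local_name(child.tag): (child.text or '').strip()
                    for child in elem}
        yield (elem.get('lat'), elem.get('lon'),
               children.get('ele', ''), children.get('time', ''))


def gpx_to_csv(gpx_source, csv_file):
    """Write a header row plus one CSV row per track point."""
    writer = csv.writer(csv_file)
    writer.writerow(['lat', 'lon', 'ele', 'time'])
    writer.writerows(trackpoints(gpx_source))


if __name__ == '__main__':
    out = io.StringIO()
    gpx_to_csv(io.StringIO(SAMPLE_GPX), out)
    print(out.getvalue())
```

For a real file you would pass the path instead of the sample, for example gpx_to_csv('input_file.gpx', f) with f opened via open('track.csv', 'w', newline=''). Comparing only the local tag name sidesteps the question of which GPX version (and therefore which namespace) the file declares.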

Elison answered 19/6, 2012 at 16:54 Comment(2)
I will give this a try. Someone also suggested to me that Perl might be a way to extract these. As I am equally a novice to both, I will look at your mentioned tutorials first. Thank you Martijn!Muddlehead
Perl would be equally suited for the task; there are Perl XML parsers and CSV libraries, just like for python. However, you may find Python easier to learn; in my personal opinion Perl too easily devolves into line-noise.Elison
P
7

Since Martijn posted a Python answer and said Perl would turn into line noise, I felt there was a need for a Perl answer, too.

On CPAN, the Perl module directory, there is a module called Geo::Gpx. As Martijn already said, GPX is an XML format. But fortunately, someone has already made it into a module that handles the parsing for us. All we have to do is load that module.

There are several modules available for CSV handling, but the data in this XML file is rather simple, so we don't really need one. We can do it on our own with the built-in functionality.

Please consider the following script. I'll give an explanation in a minute.

use strict;
use warnings;
use Geo::Gpx;
use DateTime;
# Open the GPX file
open my $fh_in, '<', 'fells_loop.gpx' or die "Cannot open fells_loop.gpx: $!";
# Parse GPX
my $gpx = Geo::Gpx->new( input => $fh_in );
# Close the GPX file
close $fh_in;

# Open an output file
open my $fh_out, '>', 'fells_loop.csv' or die "Cannot open fells_loop.csv: $!";
# Print the header line to the file
print $fh_out "time,lat,lon,ele,name,sym,type,desc\n";

# The waypoints method of the Geo::Gpx object returns an array ref
# which we can iterate over in a foreach loop
foreach my $wp ( @{ $gpx->waypoints() } ) {
  # Some fields seem to be optional so they are missing in the hash.
  # We have to add an empty string by iterating over all the possible
  # hash keys to put '' in them.
  $wp->{$_} ||= '' for qw( time lat lon ele name sym type desc );

  # The time is a unix timestamp, which is hard to read.
  # We can make it an ISO8601 date with the DateTime module.
  # We only do it if there already is a time, though.
  if ($wp->{'time'}) {
    $wp->{'time'} = DateTime->from_epoch( epoch => $wp->{'time'} )
                             ->iso8601();
  }
  # Join the fields with a comma and print them to the output file
  print $fh_out join(',', (
    $wp->{'time'},
    $wp->{'lat'},
    $wp->{'lon'},
    $wp->{'ele'},
    $wp->{'name'},
    $wp->{'sym'},
    $wp->{'type'},
    $wp->{'desc'},
  )), "\n"; # Add a newline at the end
}
# Close the output file
close $fh_out;

Let's take this in steps:

  • use strict and use warnings enforce rules like declaring variables and tell you about common mistakes that are the hardest to find.
  • use Geo::Gpx and use DateTime are the modules we use. Geo::Gpx is going to handle the parsing for us. We need DateTime to make unix timestamps into readable dates and times.
  • The open function opens a file. $fh_in is the variable that holds the filehandle. The GPX file we want to read is fells_loop.gpx which I took the liberty of borrowing from topografix.com. You can find more info on open in perlopentut.
  • We create a new Geo::Gpx object called $gpx and use our filehandle $fh_in to tell it where to read the XML data from. By convention, most Perl modules with an object-oriented interface provide a new method as their constructor.
  • close closes the filehandle.
  • The next open has a > to tell Perl that we want to write to this filehandle.
  • We print to a filehandle by putting it as the first argument to print. Note that there is no comma after the filehandle. The \n is a newline character.
  • The foreach loop takes the return value of the waypoints method of the Geo::Gpx object. This value is an array reference. Think of it as a pointer to an array; here each element of that array is itself a hash reference (see perlref if you want to know more about references). In each iteration of the loop, the next element of that array (which represents a waypoint in the GPX data) will be put into $wp. If printed with Data::Dumper it looks like this:

    $VAR1 = {
          'ele' => '64.008000',
          'lat' => '42.455956',
          'time' => 991452424,
          'name' => 'SOAPBOX',
          'sym' => 'Cemetery',
          'desc' => 'Soap Box Derby Track',
          'lon' => '-71.107483',
          'type' => 'Intersection'
        };
    
  • Now the postfix for is a bit tricky. As we just saw, there are 8 keys in the hashref. Unfortunately, some of them are sometimes missing. Because we have use warnings, we will get a warning if we try to access one of these missing values. We have to create these keys and put an empty string '' in there.

    foreach and for are completely interchangeable in Perl, and both can also be used in postfix syntax behind a single expression. We use the qw operator to create the list that for will iterate over. qw is short for quoted words and it does just that: it returns a list of the words in it, as if each had been quoted. We could also have written ('time', 'lat', 'lon', ...).

    In the expression, we access each key of $wp. $_ is the loop variable. In the first iteration it will hold 'time', then 'lat' and so on. Since $wp is a hashref, we need the -> to access its keys. The curly braces indicate that a hash element is being accessed. The ||= operator assigns a value to our hashref element only if its current value is false.

  • Now, if there is a time value (the empty string we just assigned for a missing time counts as 'there is none'), we replace the unix timestamp with a proper date. DateTime helps us do that. The from_epoch method takes the unix timestamp as an argument. It returns a DateTime object, on which we can directly call the iso8601 method.

    This is called chaining. Some modules can do it. It is similar to what jQuery's JavaScript objects do. The unix timestamp in our hashref is replaced with the result of the DateTime operation.

  • Now we print to our filehandle again. join is used to put commas between the values. We also put a newline at the end again.
  • Once we're done with the loop, we close the filehandle.
  • Now we're done! :)

All in all, I'd say this is pretty simple and also quite readable, isn't it? I tried to make it a healthy mix of overly verbose syntax with a Perlish flavor.
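
For readers following the Python answers on this page, the same three steps (default the missing fields, turn the unix timestamp into an ISO 8601 string, write comma-separated rows) could be sketched in Python roughly as follows. This is not a port of Geo::Gpx: the waypoint is simply the Data::Dumper hash from above retyped as a dict, and the function names are made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone

FIELDS = ["time", "lat", "lon", "ele", "name", "sym", "type", "desc"]

# The waypoint shown in the Data::Dumper output above, as a Python dict.
SAMPLE_WAYPOINT = {
    "ele": "64.008000", "lat": "42.455956", "time": 991452424,
    "name": "SOAPBOX", "sym": "Cemetery", "desc": "Soap Box Derby Track",
    "lon": "-71.107483", "type": "Intersection",
}


def waypoint_row(wp):
    """Return a dict with every field present, like the ||= line in the script."""
    # As with ||=, any missing or false value (None, 0, '') becomes ''.
    row = {key: wp.get(key) or "" for key in FIELDS}
    if row["time"]:
        # Unix timestamp -> ISO 8601 date and time, interpreted as UTC.
        moment = datetime.fromtimestamp(row["time"], tz=timezone.utc)
        row["time"] = moment.strftime("%Y-%m-%dT%H:%M:%S")
    return row


def write_waypoints(waypoints, csv_file):
    """Write the header line and one row per waypoint."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    writer.writeheader()
    for wp in waypoints:
        writer.writerow(waypoint_row(wp))


if __name__ == "__main__":
    out = io.StringIO()
    write_waypoints([SAMPLE_WAYPOINT, {"lat": "42.0", "lon": "-71.0"}], out)
    print(out.getvalue())
```

One difference from the join-based Perl version: the csv module quotes a field that itself contains a comma, which a plain join would not.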

Pyrimidine answered 19/6, 2012 at 21:18 Comment(8)
Thanks for your script! I went to CPAN, looked@ readme and having errors. the perl Makefile.PL command resulted in: Optional ExtUtils::MakeMaker::Coverage not available Argument "6.57_05" isn't numeric in numeric ge (>=) at Makefile.PL line 34. Checking if your kit is complete... Looks good Warning: prerequisite DateTime::Format::ISO8601 0 not found. Warning: prerequisite HTML::Entities 0 not found. Warning: prerequisite XML::Descent 1.01 not found. Writing Makefile for Geo::Gpx Writing MYMETA.yml proceeded w/make test & 8/10 tests& 3/3 subtests failed. tried to only run lat,lon,elev, w/noluckMuddlehead
So I have 4 pages of errors from the make test, though I attempted to remove time and all other fields from the text aside from lat, lon, elev and run it anyway, with no luck. I read the first 3 chapters of my beginning Perl book yesterday, so I'm hoping there's an easy fix. I also tried to reinstall under sudo with no luck. The script makes sense and I appreciate the explanation portion as well. Being a novice, I am scratching my head at the moment.Muddlehead
Have you read a manual on how to install cpan modules? Or did you try to download it from the CPAN website? If you use the command line tool, it will install all the dependencies.Pyrimidine
Ah, in Beginning Perl it should tell you all about cpan in Chapter 2. If you're on Windows with ActivePerl, there's also a program called ppm that will give you a nice GUI to install modules. You can use either to get the modules you need with all the dependencies in one go.Pyrimidine
Sounds great I will look into this, you are correct I did get the download from the site, though I will look to download at the command prompt in bash/ubuntu.Muddlehead
Again, don't just download but install using cpan. This is the easiest way. You could download it and install it using the make-file, but why bother if cpan and i MODULE::NAME can take care of it for you? ;)Pyrimidine
Hi Simbabque, almost there, though not quite. I installed the modules for DateTime::Format::ISO8601 & XML::Descent which moved me to Line19 of your script. It has created the headers in the .csv which I consider progress. I am just reading about arrays in the Beg.Perl book & also receiving this error: Can't use an undefined value as an ARRAY reference at gpxscript line 19. Not that I know much about arrays... the book states that an array must start with an alpha character or underscore. thats all I can think of... I have named the array @array, and that didnt help either. Im stuck...Muddlehead
@PaulM. The only thing in line 19 that is an ARRAY reference (see Intermediate Perl, ch. 4) is @{ $gpx->waypoints() }. The $gpx object's (let's call it that for simplicity) method waypoints() is called. It returns a reference to an array. @{ } turns that reference into an actual array (called dereferencing). But in this case, it does not return an array ref, but undef. Thus, there seems to be no data it can return. You have installed the module, because it does not moan about that. Is the file you are reading there? Check the open statement.Pyrimidine
T
2

GeoPandas can also open .gpx files as a dataframe, relying on GDAL to do the reading (check GDAL's list of supported vector formats). Because GPX is a nested XML format, a single file holds several layers rather than one flat table, so you have to name the layer you want to open.

To load the track-level metadata into a dataframe with a single row (the entire track may already be stored here as a linestring geometry, but without timestamps):

import geopandas as gpd
df = gpd.read_file("myfile.gpx", layer='tracks')

To get the actual track, where each trackpoint equals a single row, do:

df = gpd.read_file("myfile.gpx", layer='track_points')
Tufa answered 5/9, 2022 at 18:45 Comment(1)
Works great, and is very useful if you know Pandas. Here's a link: geopandas.orgForsake
E
1

Every time I try to do this, I scour the internet for solutions and end up writing my own regex parser.

import re
import numpy as np

GPXfile='Lunch_Walk.gpx'
data = open(GPXfile).read()

lat = np.array(re.findall(r'lat="([^"]+)',data),dtype=float)
lon = np.array(re.findall(r'lon="([^"]+)',data),dtype=float)
time = re.findall(r'<time>([^\<]+)',data)


combined = np.array(list(zip(lat,lon,time)))

This gives an array of the format:

array([['51.504613', '-0.141894', '2020-12-26T12:43:14Z'],
       ['51.504624', '-0.141901', '2020-12-26T13:10:26Z'],
       ['51.504633', '-0.141906', '2020-12-26T13:10:28Z'],
       ...)

You can then do with this whatever you desire.
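
One caveat with matching each field on its own: a GPX file usually carries a <time> inside <metadata> as well (the question's file does), so the time list can end up one entry longer than lat and lon and the zip pairs the wrong values. A variation that matches each track point as a unit avoids this and picks up the elevation too. This is only a sketch: the helper names are mine, and it assumes every <trkpt> has an explicit closing tag, as in the question's file.

```python
import re

# Sample in the shape of the question's file, including the metadata time.
SAMPLE = """<metadata>
<time>2012-06-13T01:51:08Z</time>
</metadata>
<trk><trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
</trkpt>
<trkpt lat="43.49796612" lon="-112.03970968">
<ele>1410.9000244140625</ele>
<time>2012-06-13T01:57:10Z</time>
</trkpt>
</trkseg></trk>"""

# Group 1: the attribute text of <trkpt ...>; group 2: everything up to </trkpt>.
TRKPT = re.compile(r'<trkpt\b([^>]*)>(.*?)</trkpt>', re.DOTALL)


def attribute(attrs, name):
    """Return the value of name="..." inside an opening tag, or None."""
    match = re.search(r'\b%s="([^"]*)"' % name, attrs)
    return match.group(1) if match else None


def child_text(body, tag):
    """Return the text of the first <tag>...</tag> inside body, or None."""
    match = re.search(r'<%s>([^<]*)</%s>' % (tag, tag), body)
    return match.group(1).strip() if match else None


def parse_trackpoints(data):
    """Return a list of (lat, lon, ele, time); ele is None when absent."""
    rows = []
    for attrs, body in TRKPT.findall(data):
        ele = child_text(body, 'ele')
        rows.append((
            float(attribute(attrs, 'lat')),
            float(attribute(attrs, 'lon')),
            float(ele) if ele is not None else None,
            child_text(body, 'time'),
        ))
    return rows


if __name__ == '__main__':
    for row in parse_trackpoints(SAMPLE):
        print(row)
```

Because each row is built from a single <trkpt> block, the metadata timestamp never enters the result, and a point without <ele> or <time> gets None instead of shifting the rows after it.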

Eldwen answered 26/12, 2020 at 17:18 Comment(2)
It (arguably) may be better to use with open(...) in place of the open(...) oneliner.Smackdab
Python has relatively good file handling protocols. The file should still be closed when it goes out of scope. Similarly, explicitly opening the file in read-only mode would be good practice. However, both are moot points, since this is a question about file parsing, not reading.Eldwen
V
0

While gpxpy is the popular Python answer, and I tried it myself, I found it frustrating that it was difficult, if not impossible, to get at extension data such as heart rate, and that one still has to loop through the various nested XML elements. So I wrote gpxcsv.

As easy as:

from gpxcsv import gpxtolist
import pandas as pd

df = pd.DataFrame(
    gpxtolist('myfile.gpx'))

This gives a dataframe. There is also a command line tool that writes a CSV or JSON file directly, preserving as many trackpoint columns as it finds and using the tag names as column names.

Source code of the project on github.
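
To make the idea of "tags as column names" concrete, here is a small standard-library sketch of flattening each track point, extensions included, into one row. It is not gpxcsv's implementation, only an illustration of the technique; the function names and the namespace URIs for the gpx10 and ogt10 prefixes are placeholders, while the two track points are taken from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two track points from the question; the extension namespace URIs are placeholders.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpx10="http://example.com/gpx10"
     xmlns:ogt10="http://example.com/ogt10" version="1.1">
<trk><trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
<extensions>
<ogt10:accuracy>34.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49796612" lon="-112.03970968">
<ele>1410.9000244140625</ele>
<time>2012-06-13T01:57:10Z</time>
<extensions>
<gpx10:speed>3.75</gpx10:speed>
<ogt10:accuracy>13.0</ogt10:accuracy>
<gpx10:course>293.20001220703125</gpx10:course></extensions>
</trkpt>
</trkseg></trk>
</gpx>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]


def flatten_trackpoint(trkpt):
    """Turn one <trkpt> into a dict: attributes plus every leaf element's text."""
    row = dict(trkpt.attrib)  # lat and lon
    for elem in trkpt.iter():
        text = (elem.text or '').strip()
        # Keep only leaf elements that carry text (ele, time, speed, ...).
        if elem is not trkpt and len(elem) == 0 and text:
            row[local_name(elem.tag)] = text
    return row


def gpx_rows(gpx_source):
    """Return one flattened dict per track point in a GPX path or file object."""
    root = ET.parse(gpx_source).getroot()
    return [flatten_trackpoint(elem) for elem in root.iter()
            if local_name(elem.tag) == 'trkpt']


def write_csv(rows, csv_file):
    """Write rows using the union of their keys, in first-seen order, as columns."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(csv_file, fieldnames=columns, restval='')
    writer.writeheader()
    writer.writerows(rows)


if __name__ == '__main__':
    out = io.StringIO()
    write_csv(gpx_rows(io.StringIO(SAMPLE_GPX)), out)
    print(out.getvalue())
```

Points that lack a given extension simply get an empty cell in that column. Two extension tags with the same local name in different namespaces would collide in this sketch.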

Vey answered 26/7, 2021 at 0:2 Comment(0)
