Is there a GEDCOM parser written in Python? [closed]
V

6

18

GEDCOM is a standard for exchanging genealogical data.

I've found parsers written in other languages, but none so far written in Python. The closest I've come is the file libgedcom.py from the GRAMPS project, but it is so full of references to GRAMPS modules that it is not usable for me.

I just want a simple standalone GEDCOM parser library written in Python. Does this exist?

Veteran answered 17/12, 2009 at 5:10 Comment(0)
C
10

A few years ago I wrote a simplistic GEDCOM to XML translator in Python as part of a larger project. I found that dealing with the GEDCOM data in an XML format was much easier (especially when the next step involved XSLT).

I don't have the code online at the moment, so I've pasted the module into this message. This works for me; no guarantees. Hope this helps though.

import codecs, os, re, sys
from xml.sax.saxutils import escape

fn = sys.argv[1]

ged = codecs.open(fn, encoding="cp437")
xml = codecs.open(fn+".xml", "w", "utf8")
xml.write("""<?xml version="1.0"?>\n""")
xml.write("<gedcom>")
sub = []
for s in ged:
    s = s.strip()
    m = re.match(r"(\d+) (@(\w+)@ )?(\w+)( (.*))?", s)
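    if m is None:
        # Skip lines that do not look like GEDCOM; m.group() below would fail on None.
        print("Error: unmatched line: %s" % s)
        continue
    level = int(m.group(1))
    id = m.group(3)
    tag = m.group(4)
    data = m.group(6)
    # Close any elements that are deeper than the current level.
    while len(sub) > level:
        xml.write("</%s>\n" % (sub[-1]))
        sub.pop()
    if level != len(sub):
        print("Error: unexpected level: %s" % s)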
    sub += [tag]
    if id is not None:
        xml.write("<%s id=\"%s\">" % (tag, id))
    else:
        xml.write("<%s>" % (tag))
    if data is not None:
        m = re.match(r"@(\w+)@", data)
        if m:
            xml.write(m.group(1))
        elif tag == "NAME":
            m = re.match(r"(.*?)/(.*?)/$", data)
            if m:
                xml.write("<forename>%s</forename><surname>%s</surname>" % (escape(m.group(1).strip()), escape(m.group(2))))
            else:
                xml.write(escape(data))
        elif tag == "DATE":
            m = re.match(r"(((\d+)?\s+)?(\w+)?\s+)?(\d{3,})", data)
            if m:
                if m.group(3) is not None:
                    xml.write("<day>%s</day><month>%s</month><year>%s</year>" % (m.group(3), m.group(4), m.group(5)))
                elif m.group(4) is not None:
                    xml.write("<month>%s</month><year>%s</year>" % (m.group(4), m.group(5)))
                else:
                    xml.write("<year>%s</year>" % m.group(5))
            else:
                xml.write(escape(data))
        else:
            xml.write(escape(data))
while len(sub) > 0:
    xml.write("</%s>" % sub[-1])
    sub.pop()
xml.write("</gedcom>\n")
ged.close()
xml.close()
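
For readers who want the records in memory rather than as XML, the same line-matching idea can be sketched as a small function. This is only a minimal Python 3 sketch, not part of the original module: the name parse_gedcom_lines and the dict layout are hypothetical, and it reuses the regular expression from the script above without handling CONT/CONC continuation lines or character-set detection.

```python
import re

# Same pattern as the translator above: level, optional @ID@, tag, optional data.
LINE_RE = re.compile(r"(\d+) (@(\w+)@ )?(\w+)( (.*))?")


def parse_gedcom_lines(lines):
    """Build a nested tree of dicts from GEDCOM lines (hypothetical helper)."""
    root = {"tag": "gedcom", "id": None, "data": None, "children": []}
    stack = [root]  # stack[n + 1] is the currently open record at level n
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError("unmatched line: %r" % line)
        level = int(m.group(1))
        if level > len(stack) - 1:
            raise ValueError("unexpected level: %r" % line)
        # Drop records deeper than the new line's parent.
        del stack[level + 1:]
        node = {
            "tag": m.group(4),
            "id": m.group(3),
            "data": m.group(6),
            "children": [],
        }
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "0 TRLR",
]
tree = parse_gedcom_lines(sample)
```

Here tree["children"] holds the level-0 records, and each record's own "children" list holds its sub-lines, mirroring the nesting that the script above writes out as XML elements.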
C answered 25/1, 2010 at 23:55 Comment(0)
F
7

I've taken the code from mwhite's answer, extended it a bit (OK, more than just a bit) and posted it on GitHub: http://github.com/dijxtra/simplepyged. I welcome suggestions about what else to add :-)

Frederique answered 18/10, 2010 at 16:54 Comment(0)
N
5

I know this thread is pretty old, but I found it in my searches, along with this project: https://github.com/madprime/python-gedcom/

The source is squeaky clean and very functional.

Neptunian answered 18/12, 2014 at 23:49 Comment(0)
S
2

A general-purpose GEDCOM parser in Python is linked from http://ilab.cs.byu.edu/cs460/2006w/assignments/program1.html

Synchromesh answered 22/6, 2010 at 18:46 Comment(0)
U
1

You could use the SWIG tool to wrap a C library through Python's native language interface. You'll have to make calls against the C API from within Python, but the rest of your code can be Python only.

It may sound a bit daunting, but once you get things set up, using the two together won't be bad. There may be some quirks depending on how the C library was written, but you'd have to deal with some no matter which option you used.
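
To give a feel for calling C from Python without writing a wrapper, here is a minimal sketch using ctypes from the standard library rather than SWIG. No particular GEDCOM C library is assumed: the C standard library and its strlen function stand in for one, and the helper name c_strlen is hypothetical. It assumes a POSIX system where the C library can be located.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library; a real GEDCOM C library would be
# loaded the same way by its own name (assumption: POSIX system).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the byte length of text as computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

The pattern is the same for any C function: load the shared library, declare argtypes and restype, then call it like a Python function.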

Undertrump answered 17/12, 2009 at 5:31 Comment(1)
Or use ctypes or Cython (forked from Pyrex). (Punkie)
V
0

Another basic parser for the GEDCOM 5.5 format: https://github.com/rootsdev/python-gedcom-parser

Veteran answered 7/3, 2016 at 11:42 Comment(1)
Please don't post answers on obviously off-topic questions! See: Should one advise on off topic questions? Off-topic questions can be closed and deleted, which could nullify your contribution. Here, the question is asking for an off-site resource and is on its way to closure. (Hectare)
