Use saxon with python

7

18

I need to process XSLT using Python. I'm currently using lxml, which only supports XSLT 1.0, but I now need to process XSLT 2.0. Is there any way to use the Saxon XSLT processor with Python?

Provisory answered 4/4, 2015 at 6:13 Comment(0)
18

There are two possible approaches:

  1. Set up an HTTP service that accepts transformation requests and implements them by invoking Saxon from Java; you can then send the transformation requests from Python over HTTP.

  2. Use the Saxon/C product, currently available as a prerelease. Details here: http://www.saxonica.com/saxon-c/index.xml
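As a rough illustration of the first approach, the Python side could look something like the sketch below. Everything about the service is an assumption here: the URL, the JSON payload shape, and the field names `stylesheet` and `source` are hypothetical, since the Java service is something you would write yourself.

```python
import json
import urllib.request

# Hypothetical endpoint of a self-written Java service that wraps Saxon.
SERVICE_URL = "http://localhost:8080/transform"


def build_transform_request(xml_text, stylesheet_name, url=SERVICE_URL):
    """Build (but do not send) a POST request carrying the XML to transform."""
    # The payload layout is an assumption; adapt it to whatever the service expects.
    payload = json.dumps(
        {"stylesheet": stylesheet_name, "source": xml_text}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transform_over_http(xml_text, stylesheet_name, url=SERVICE_URL, timeout=30):
    """Send the request and return the transformed document as text."""
    request = build_transform_request(xml_text, stylesheet_name, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping the request building separate from the sending makes the client easy to check without a running service.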

Zwickau answered 4/4, 2015 at 9:3 Comment(2)
@Maliqf, which approach did you end up taking, and how was your experience with it? - Hygrometric
I wrap Saxon/C in a thin Boost-Python wrapper. It's not difficult to do provided you know a bit of C/C++; it's just a bit of boilerplate on top of the C++ examples given on Saxon's website. You can use the supplied PHP API as a guide on how to structure your Python API. I did it for exactly the reasons stated: no XSLT 3 support native to Python. It works well for me. Specifically, it's fast, unlike forking a child Saxon process or making HTTP requests. - Edp
11

Saxon/C release 1.2.0 is now out, with XSLT 3.0 support for Python 3. See details:

http://www.saxonica.com/saxon-c/index.xml

Mesotron answered 18/10, 2019 at 9:53 Comment(2)
By now, this should be promoted to correct answer. Also cf. #59060268 for a step-by-step description. - Duchess
SaxonC 11 has since been released. - Mesotron
8

A Python interface for Saxon/C is in development and worth a look:

https://github.com/ajelenak/pysaxon

Mesotron answered 20/7, 2016 at 10:3 Comment(0)
5

At the moment there is not, but you could use the subprocess module to call the Saxon processor:

import subprocess

subprocess.call(["saxon", "-o:output.xml", "-s:file.xml", "file.xslt"])
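A slightly more defensive variant of the same idea might look like the sketch below. It assumes, as the call above does, that a `saxon` launcher is on the PATH; the helper names `build_saxon_command` and `run_saxon` are made up for illustration.

```python
import shutil
import subprocess


def build_saxon_command(source, stylesheet, output, executable="saxon"):
    """Return the argument list for one transformation, mirroring the call above."""
    return [executable, f"-o:{output}", f"-s:{source}", stylesheet]


def run_saxon(source, stylesheet, output, executable="saxon"):
    """Run Saxon once and raise if it is missing or exits with an error."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on the PATH")
    # check=True turns a non-zero exit status into CalledProcessError;
    # capture_output keeps Saxon's messages available on the exception.
    return subprocess.run(
        build_saxon_command(source, stylesheet, output, executable),
        check=True,
        capture_output=True,
        text=True,
    )
```

Unlike `subprocess.call`, this surfaces a failed transformation as an exception instead of a return code that is easy to ignore.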
Pradeep answered 21/10, 2016 at 3:17 Comment(0)
4

On January 13, 2023, Saxonica released their own maintained pip package for Saxon 12:

saxonche

Now all we need is:

pip install saxonche
Everyman answered 7/3, 2023 at 9:29 Comment(0)
1

If you're using Windows:

Download the zip file Saxon-HE 9.9 for Java from http://saxon.sourceforge.net/#F9.9HE and unzip the file to C:\saxon

Use this Python code:

import os
import subprocess

def file_path(relative_path):
    folder = os.path.dirname(os.path.abspath(__file__))
    path_parts = relative_path.split("/")
    new_path = os.path.join(folder, *path_parts)
    return new_path

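def transform(xml_file, xsl_file, output_file):
    """All arguments are paths relative to this Python script."""
    source = file_path(xml_file)  # renamed from `input` to avoid shadowing the builtin
    output = file_path(output_file)
    xslt = file_path(xsl_file)

    # Pass the arguments as a list so paths containing spaces survive, and use a
    # raw string so the backslashes in the jar path are not read as escapes.
    subprocess.call([
        "java", "-cp", r"C:\saxon\saxon9he.jar", "net.sf.saxon.Transform",
        "-t", f"-s:{source}", f"-xsl:{xslt}", f"-o:{output}",
    ])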
Milka answered 16/12, 2019 at 2:39 Comment(0)
0

This is in addition to the above answers suggesting subprocess and saxonche.

The example code on saxonche's PyPI page is slightly flawed in that essential indentation is missing.

Also, I know it's just an example, but it would instantiate a new_xslt30_processor() for each and every xml file you need to transform. That wouldn't be very efficient.

My use case is that I periodically get a bunch of xml files (MARC21) that I need to transform with one and the same xslt-sheet (XSLT 2.0). So assume that the xslt-sheet 'o2a.xml' produces the desired output when I run

transform -s:my.xml -xsl:o2a.xml -o:my_output.xml

So I wrote this:

from saxonche import PySaxonProcessor
from pathlib import Path

class Xslt_proc():
    proc = PySaxonProcessor(license = False)
    nuproc = proc.new_xslt30_processor()
    xform = nuproc.compile_stylesheet(stylesheet_file='o2a.xsl')
    
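def transform(processor, infile, sfx):
    # sfx carries its own leading underscore (e.g. '_done'), so none is added here
    outfname = f'{Path(infile).stem}{sfx}.xml'
    doc = processor.proc.parse_xml(xml_file_name=infile)
    out = processor.xform.transform_to_string(xdm_node=doc)
    # explicit encoding so the output does not depend on the platform default
    with open(outfname, 'w', encoding='utf-8') as f:
        f.write(out)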

def main():
    f_xml = 'some_xml_file.xml'
    P = Xslt_proc()
    transform(P, f_xml, '_done')
    
if __name__ == "__main__":
    main()
  

I was curious which method would be faster, subprocess or the code above.

So I ran 20 iterations on 5 input files, first using a subprocess call to transform.exe, and then again, with the same 5 input files, using my own module, like this:

from pathlib import Path
import saxonche_transform as st

flist = [f.name for f in Path('.').glob('*.xml')]

P = st.Xslt_proc()

for i in range(20):
    for f in flist:
        st.transform(P, f, '_python')

The latter was about 100 times faster: 2.6 seconds against 258 seconds for the subprocess test.
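For anyone who wants to repeat a comparison like this, a small timing helper along the following lines could be used. It is only a sketch: the name `time_runs` is made up, and it is not how the figures above were measured.

```python
import time


def time_runs(transform_one, files, iterations=20):
    """Call transform_one(name) for every file, `iterations` times over,
    and return the total wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        for name in files:
            transform_one(name)
    return time.perf_counter() - start
```

With the module above it could be called as `time_runs(lambda f: st.transform(P, f, '_python'), flist)`, and then again with a function that launches the subprocess instead.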

So thank you, Saxonica.

Acciaccatura answered 8/5, 2023 at 16:32 Comment(0)
