I need to process XSLT using Python. Currently I'm using lxml, which only supports XSLT 1.0, but now I need to process XSLT 2.0. Is there any way to use the Saxon XSLT processor with Python?
There are two possible approaches:
set up an HTTP service that accepts transformation requests and implements them by invoking Saxon from Java; you can then send the transformation requests from Python over HTTP
use the Saxon/C product, currently available on prerelease: details here: http://www.saxonica.com/saxon-c/index.xml
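The Python side of the first approach could look roughly like the sketch below. The service itself is not shown, and its URL, the `xsl` query parameter, and the convention of sending the source XML as the request body are all assumptions made for illustration; a real service would define its own contract.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: the Java service, its URL and its "xsl" query
# parameter are assumptions for illustration, not an existing Saxon API.
SERVICE_URL = "http://localhost:8080/transform"


def build_transform_request(xml_bytes, stylesheet_name, service_url=SERVICE_URL):
    """Build a POST request that carries the source XML in its body."""
    query = urllib.parse.urlencode({"xsl": stylesheet_name})
    return urllib.request.Request(
        f"{service_url}?{query}",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def request_transform(xml_bytes, stylesheet_name, service_url=SERVICE_URL):
    """Send the request and return the transformed document as bytes."""
    request = build_transform_request(xml_bytes, stylesheet_name, service_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

With such a service running, `request_transform(open("file.xml", "rb").read(), "style.xsl")` would return the result as bytes. Keeping Saxon in a long-lived Java process avoids paying the JVM start-up cost for every document.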
Saxon/C release 1.2.0 is now out, with XSLT 3.0 support for Python 3. See details:
A Python interface for Saxon/C is in development and worth a look:
At the moment there is not, but you could use the subprocess module to use the Saxon processor:
import subprocess
subprocess.call(["saxon", "-o:output.xml", "-s:file.xml", "file.xslt"])
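A slightly more defensive variant of the same idea is sketched below. It assumes, as the snippet above does, that a `saxon` launcher is on the PATH; the helper names are made up for this example. `subprocess.run` with `check=True` raises an exception when Saxon exits with an error instead of failing silently.

```python
import subprocess


def saxon_command(source, stylesheet, output, executable="saxon"):
    """Build the argument list for Saxon's command-line interface."""
    return [executable, f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]


def run_saxon(source, stylesheet, output, executable="saxon"):
    """Run Saxon and raise CalledProcessError if it exits with an error."""
    return subprocess.run(
        saxon_command(source, stylesheet, output, executable),
        check=True,            # a non-zero exit status becomes an exception
        capture_output=True,   # keep Saxon's messages for diagnostics
        text=True,
    )
```

Calling `run_saxon("file.xml", "file.xslt", "output.xml")` is then equivalent to the call above, and on failure the Saxon error text is available in the exception's `stderr` attribute.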
On January 13, 2023, Saxonica released its own maintained pip package for Saxon 12:
Now all we need is:
pip install saxonche
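Once the package is installed, a minimal in-memory transformation might look like the sketch below. The stylesheet and input document are made up for this example; the stylesheet uses `upper-case()`, an XPath 2.0 function that lxml's XSLT 1.0 engine does not provide.

```python
# Made-up stylesheet and input, used only to exercise an XPath 2.0 function.
XSLT = """<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="upper-case(/greeting)"/>
  </xsl:template>
</xsl:stylesheet>"""

XML = "<greeting>hello</greeting>"


def transform_text(xml_text, xslt_text):
    """Apply a stylesheet given as text to an XML document given as text."""
    # Imported here so this module still loads where saxonche is missing.
    from saxonche import PySaxonProcessor

    with PySaxonProcessor(license=False) as proc:
        xslt_proc = proc.new_xslt30_processor()
        executable = xslt_proc.compile_stylesheet(stylesheet_text=xslt_text)
        document = proc.parse_xml(xml_text=xml_text)
        return executable.transform_to_string(xdm_node=document)
```

`print(transform_text(XML, XSLT))` runs the transformation without touching the file system.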
If you're using Windows:
Download the zip file Saxon-HE 9.9 for Java from http://saxon.sourceforge.net/#F9.9HE and unzip the file to C:\saxon
Use this Python code:
import os
import subprocess

def file_path(relative_path):
    """Resolve a "/"-separated path relative to this script's folder."""
    folder = os.path.dirname(os.path.abspath(__file__))
    path_parts = relative_path.split("/")
    new_path = os.path.join(folder, *path_parts)
    return new_path

def transform(xml_file, xsl_file, output_file):
    """all args take relative paths from Python script"""
    source = file_path(xml_file)
    output = file_path(output_file)
    xslt = file_path(xsl_file)
    # Pass the command as a list so paths containing spaces survive, and use
    # a raw string so "\s" is not read as an escape sequence.
    subprocess.call(["java", "-cp", r"C:\saxon\saxon9he.jar", "net.sf.saxon.Transform",
                     "-t", f"-s:{source}", f"-xsl:{xslt}", f"-o:{output}"])
This is in addition to the above answers suggesting subprocess and saxonche.
The example code on saxonche's PyPI page is slightly flawed in that essential indentation is missing.
Also, I know it's just an example, but it would instantiate a new_xslt30_processor() for each and every XML file you need to transform. That wouldn't be very efficient.
My use case is that I periodically get a bunch of xml files (MARC21) that I need to transform with one and the same xslt-sheet (XSLT 2.0). So assume that the xslt-sheet 'o2a.xml' produces the desired output when I run
transform -s:my.xml -xsl:o2a.xml -o:my_output.xml
So I wrote this:
from saxonche import PySaxonProcessor
from pathlib import Path

class Xslt_proc():
    proc = PySaxonProcessor(license = False)
    nuproc = proc.new_xslt30_processor()
    xform = nuproc.compile_stylesheet(stylesheet_file='o2a.xsl')

def transform(processor, infile, sfx):
    outfname = f'{Path(infile).stem}_{sfx}.xml'
    doc = processor.proc.parse_xml(xml_file_name=infile)
    out = processor.xform.transform_to_string(xdm_node=doc)
    with open(outfname, 'w') as f:
        f.write(out)

def main():
    f_xml = 'some_xml_file.xml'
    P = Xslt_proc()
    transform(P, f_xml, '_done')

if __name__ == "__main__":
    main()
I was curious which method would be faster, subprocess or the code above.
So I ran 20 iterations on 5 input files, first using a subprocess call to transform.exe. Then I ran another 20 iterations on the same 5 input files with my own module, like this:
from pathlib import Path
import saxonche_transform as st

flist = [f.name for f in Path('.').glob('*.xml')]
P = st.Xslt_proc()
for i in range(20):
    for f in flist:
        st.transform(P, f, '_python')
The latter was 100 times faster: 2.6 seconds against 258 seconds for the subprocess test.
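For anyone who wants to repeat the comparison, a small timing helper along these lines would do. It is only a sketch, not the exact harness behind the numbers above, and the function name is made up.

```python
import time


def time_runs(func, inputs, iterations=20):
    """Call func on every input, `iterations` times over; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        for item in inputs:
            func(item)
    return time.perf_counter() - start
```

For example, `time_runs(lambda f: st.transform(P, f, '_python'), flist)` times the in-process variant; passing a function that launches the subprocess instead times the other one.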
So thank you, Saxonica.