Do you have any clue how to achieve this in MySQL?
Yes, go by foot and make the xml yourself with CONCAT
strings. Try
SELECT concat('<orders><employee emp_id="', emp_id, '"><customer cust_id="', cust_id, '" region="', region, '"/></employee></orders>') FROM table
I took this from a 2009 answer How to convert a MySQL DB to XML? and it still seems to work. Not very handy, and if you have large trees per item, they will all be in one concatenated value of the root item, but it works, see this test with dummies:
SELECT concat('<orders><employee emp_id="', 1, '"><customer cust_id="', 2, '" region="', 3, '"/></employee></orders>') FROM DUAL
gives
<orders><employee emp_id="1"><customer cust_id="2" region="3"/></employee></orders>
With "manual coding" you can get to this structure.
<?xml version="1.0"?>
<orders>
<employee emp_id="1">
<customer cust_id="2" region="3" />
</employee>
</orders>
I checked this with a larger tree per root item and it worked, but I had to run an additional Python code on it to get rid of the too many openings and closings generated when you have medium level nodes in an xml path. It is possible using backward-looking lists together with entries in a temporary set, and I got it done, but an object oriented way would be more professional. I just coded to drop the last x items from the list as soon as a new head item was found, and some other tricks for nested branches. Worked.
I puzzled out a Regex that found each text between tags:
string = " <some tag><another tag>test string<another tag></some tag>"
pattern = r'(?:^\s*)?(?:(?:<[^\/]*?)>)?(.*?)?(?:(?:<\/[^>]*)>)?'
p = re.compile(pattern)
val = r''.join(p.findall(string))
val_escaped = escape(val)
if val_escaped != val:
string.replace(val, val_escaped)
This Regex helps you to access the text between the tags. If you are allowed to use CDATA, it is easiest to use that everywhere. Just make the content "CDATA" (character data) already in MySQL:
<Title><![CDATA[', t.title, ']]></Title>
And you will not have any issues anymore except for very strange characters like (U+001A) which you should replace already in MySQL. You then do not need to care for escaping and replacing the rest of the special characters at all. Worked for me on a 1 Mio. lines xml file with heavy use of special characters.
Yet: you should validate the file against the needed xml schema file using Python's module xmlschema
. It will alert you when you are not allowed to use that CDATA trick.
If you need a fully UTF-8 formatted content without CDATA, which might often be the task, you can reach that even in a 1 Mio lines file by validating the code output (= xml output) step by step against the xml schema file (xsd that is the aim). It is a bit fiddly work, but it can be done with some patience.
Replacements are possible with:
- MySQL using replace()
- Python using string.replace()
- Python using Regex replace (though I did not need it in the end, it would look like:
re.sub(re.escape(val), 'xyz', i)
)
- string.encode(encoding = 'UTF-8', errors = 'strict')
Mind that encoding as utf-8 is the most powerful step, it could even put aside all three other replacement ways above. Mind also: It makes the text binary, you then need to treat it as binary b'...' and you can thus write it to a file only in binary mode using wb
.
As the end of it all, you may open the XML output in a normal browser like Firefox for a final check and watch the XML at work. Or check it in vscode/codium with an xml Extension. But these checks are not needed, in my case the xmlschema module has shown everything very well. Mind also that vscode/codium can can handle xml problems quite easily and still show a tree when Firefox cannot, therefore, you will need a validator or a browser to see all xml errors.
Quite a huge project could be done using this xml-building-with-mysql, at the end there was a triple nested xml tree with many repeating tags inside parent nodes, all made from a two-dimensional MySQL output.
PS
Instead of MySQL, you might be better of with built-in tools of MSSQL server and the like. This answer was about the mere handycraft of XML building, but there are better tools nowadays. Export to a file and work on it in another SQL if puzzling the XML together takes too much time and nerves.