Invalid XSD schema using libxmljs with nodejs

I posted an issue on the libxmljs repository, but it was closed because the maintainers don't consider this a problem in the library itself, so I'm asking here.

I'm trying to validate an XLIFF file against an XML schema provided by OASIS, but I keep getting an error about the XSD.

Error: Invalid XSD schema
    at Document.validate (/Users/fluxb0x/Tests/xliff_parser/node_modules/libxmljs/lib/document.js:73:17)
    at Request._callback (/Users/fluxb0x/Tests/xliff_parser/main.js:25:21)
    at Request.self.callback (/Users/fluxb0x/Tests/xliff_parser/node_modules/request/request.js:199:22)
    at Request.emit (events.js:98:17)
    at Request.<anonymous> (/Users/fluxb0x/Tests/xliff_parser/node_modules/request/request.js:1160:14)
    at Request.emit (events.js:117:20)
    at IncomingMessage.<anonymous> (/Users/fluxb0x/Tests/xliff_parser/node_modules/request/request.js:1111:12)
    at IncomingMessage.emit (events.js:117:20)
    at _stream_readable.js:938:16
    at process._tickCallback (node.js:419:13)

I used the Oxygen XML Editor to test the same validation, and it passes without problems.

This is the XLIFF file exported by me : en.xliff

This is the XSD file provided by OASIS : xliff_schema.xsd

Pretty big file.

Thank you for the help.

Corrody answered 10/10, 2014 at 7:48

If the XSD schema contains xsd:import elements with filesystem-relative schemaLocation attributes, the libxmljs.parseXml() function accepts a baseUrl option that can be used to set the location of these.

const xsdDocument = libxmljs.parseXml(xsdString, { baseUrl: "/path/to/xsd/" });

This avoids the need to temporarily change the working directory. Mind the trailing slash in baseUrl, too.
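The trailing slash matters because of how a relative reference is resolved against a base: without it, the last path segment is treated as a file name and dropped. The sketch below illustrates the rule with Node's built-in URL class rather than libxmljs itself. It assumes libxml2 resolves schemaLocation against baseUrl by the same standard rule, and the paths and the resolveSchemaLocation name are placeholders.

```javascript
// Illustration only: uses Node's global URL class, not libxmljs.
// Assumption: libxml2 resolves schemaLocation against baseUrl the same way.
function resolveSchemaLocation(baseUrl, schemaLocation) {
  return new URL(schemaLocation, baseUrl).href;
}

// Without the trailing slash, "xsd" is treated as a file and replaced.
const withoutSlash = resolveSchemaLocation("file:///path/to/xsd", "./xml.xsd");
// With the trailing slash, "xsd/" is kept as the containing directory.
const withSlash = resolveSchemaLocation("file:///path/to/xsd/", "./xml.xsd");

console.log(withoutSlash); // file:///path/to/xml.xsd
console.log(withSlash);    // file:///path/to/xsd/xml.xsd
```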

Kerchief answered 3/10, 2017 at 17:59 Comment(1)
This worked for me on Windows: I changed schemaLocation="internetsite.com/path/to/xsd/file.xsd" to schemaLocation="./xsd/file.xsd" and then called libxmljs.parseXml(xsdString, { baseUrl: "c:/path/to/" }); - Tibbitts

As you noted on the libxmljs bug tracker, libxmljs throws an error when validating an XML document against a schema file that imports another schema.

<xsd:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="./xml.xsd"/>

This happens because the relative path in schemaLocation is resolved against the process's current working directory, not against the location of the schema file. A workaround is to change the directory before validating:

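var fs = require('fs');
var path = require('path');
var libxml = require('libxmljs');

// schemaPath, content (the XML string) and cb come from the enclosing scope.
fs.readFile(schemaPath, { encoding: 'utf8' }, function (err, xsd) {
    if (err) return cb(err);

    var cwd = process.cwd();
    var xmlDoc;
    try {
        // Relative schemaLocation values are resolved against the cwd.
        process.chdir(path.dirname(schemaPath));
        var xsdDoc = libxml.parseXml(xsd);
        xmlDoc = libxml.parseXml(content);
        xmlDoc.validate(xsdDoc);
    } catch (e) {
        return cb(e);
    } finally {
        // Always restore the original working directory.
        process.chdir(cwd);
    }

    cb(undefined, xmlDoc.validationErrors);
});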
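To see why changing the directory helps, the following sketch uses Node's path module (not libxmljs) to show where ./xml.xsd ends up depending on what it is resolved against. The schema path is a made-up placeholder, and the comparison only mirrors the behaviour described above rather than calling into libxml.

```javascript
const path = require("path");

// Hypothetical location of the main schema; adjust to your project.
const schemaPath = path.join("schemas", "xliff", "xliff_schema.xsd");
const schemaLocation = "./xml.xsd"; // value taken from the xsd:import element

// Resolved against the current working directory.
const fromCwd = path.resolve(schemaLocation);
// Resolved against the schema's own directory (what the import intends).
const fromSchemaDir = path.resolve(path.dirname(schemaPath), schemaLocation);

console.log(fromCwd);       // <cwd>/xml.xsd
console.log(fromSchemaDir); // <cwd>/schemas/xliff/xml.xsd
```

The two results only coincide when the process is started from the schema's directory, which is what the chdir call arranges temporarily.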

I'm not sure how libxml handles this internally, though. The referenced file is probably loaded synchronously, which I suppose is suboptimal.

This workaround only works for local files. I don't know how to handle a remote schemaLocation like the one in your example (schemaLocation="http://www.w3.org/2001/xml.xsd").

Even if it's not a real solution, I think this might help.

Geosynclinal answered 28/11, 2014 at 11:56

As noted in other answers, libxmljs throws an error when validating an XML document against a schema that <import>s another schema from an http or https URL. However, libxmljs works fine if the <import> element refers to a local file URL.

This solution automatically processes each <import> element in the XSD by downloading the referenced schema and updating the in-memory XSD document to point at the local copy.

It uses axios for fetching URLs, but this could be replaced with your favorite request library. Note that the processing only goes one level deep: if the downloaded XSDs also contain <import> elements, they will not be processed, although that kind of recursion would not be difficult to add.

import * as libxml from "libxmljs"
import * as fs from "fs"
import * as os from "os"
import axios from "axios"

async function validateFile(file:string) {
  const fileContents = fs.readFileSync(file).toString()
  const doc = libxml.parseXml(fileContents)
  const root = doc.root()
  const schemaHref = root.namespace().href()
  const schemaLocation = root.attrs().find(a=>a.namespace().href() === 'http://www.w3.org/2001/XMLSchema-instance' && a.name() === "schemaLocation").value()
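  // xsi:schemaLocation holds whitespace-separated "namespace location" pairs
  const loc = (href:string):string=>{
    const pieces = schemaLocation.trim().split(/\s+/)
    for (let i=0; i<pieces.length; i+=2) {
      if (pieces[i] === href) return pieces[i+1]
    }
    throw new Error("No schemaLocation entry for namespace " + href)
  }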
  const schemaDoc = libxml.parseXml((await axios.get(loc(schemaHref))).data)
  const importElements = schemaDoc.find("//schema:import",{schema:"http://www.w3.org/2001/XMLSchema"})
  // make any imports local
  for (const e of importElements) {
    const schemaLocation = e.attr("schemaLocation").value();
    // keep digits as well, so that e.g. version numbers do not collapse into the same file name
    const newFileName = os.tmpdir()+"/"+schemaLocation.replace(/[^a-z0-9_.]/ig,'_')
    e.attr({schemaLocation:"file://"+newFileName})
    fs.writeFileSync(newFileName,(await axios.get(schemaLocation)).data)
  }
  if (!doc.validate(schemaDoc)) throw Error(doc.validationErrors.map(e=>(e.message + " at " + e.line + ":" + e.column)).join())
}
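const file = "0001.xml"
// report failures instead of leaving the promise rejection unhandled
validateFile(file).catch(e=>{ console.error(e); process.exitCode = 1 })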
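The loc helper above relies on xsi:schemaLocation being a whitespace-separated list of namespace/location pairs. As a standalone sketch of that lookup in plain JavaScript, something like the following could be used; parseSchemaLocation is a hypothetical name (not part of libxmljs) and the attribute value is a made-up sample.

```javascript
// Hypothetical helper: turns an xsi:schemaLocation value into a Map
// of namespace URI -> schema location.
function parseSchemaLocation(value) {
  const pieces = value.trim().split(/\s+/).filter(Boolean);
  if (pieces.length % 2 !== 0) {
    throw new Error("schemaLocation must contain namespace/location pairs");
  }
  const pairs = new Map();
  for (let i = 0; i < pieces.length; i += 2) {
    pairs.set(pieces[i], pieces[i + 1]);
  }
  return pairs;
}

// Sample value for illustration only.
const sample = `urn:example:ns-a  https://example.com/a.xsd
                urn:example:ns-b  https://example.com/b.xsd`;
const locations = parseSchemaLocation(sample);
console.log(locations.get("urn:example:ns-b")); // https://example.com/b.xsd
```

Building the full map up front also makes it easy to report a missing namespace explicitly instead of passing undefined to the HTTP client.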


Woodcraft answered 4/5, 2021 at 18:46
