We can find and report self-closing elements with parse5. It has a SAXParser class that should be quite robust (parse5 conforms to the HTML5 standard). When the parser finds a start tag, it raises an event that includes a boolean indicating whether the tag self-closes.
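// parse5 v2-v4 export SAXParser directly
var SAXParser = require('parse5').SAXParser;

var parser = new SAXParser();
parser.on("startTag", (name, attrs, selfClosing) => {
    if (selfClosing) {
        // check if name is void; if not, report an error
    }
});
// SAXParser is a writable stream: end() feeds it the html and finishes parsing
parser.end(html);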
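The check left as a comment in the handler could look roughly like the sketch below. The list of void elements follows the HTML Living Standard; `VOID_ELEMENTS`, `checkSelfClosing` and the wording of the error are hypothetical names chosen for illustration, not part of parse5.

```javascript
// Void elements as listed in the HTML Living Standard.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr'
]);

// Hypothetical helper: returns an error message when a non-void element
// self-closes, or null when the tag is acceptable.
function checkSelfClosing(name, selfClosing) {
  if (selfClosing && !VOID_ELEMENTS.has(name.toLowerCase())) {
    return 'self-closing element <' + name + '/> is not a void element';
  }
  return null;
}
```

Inside the `startTag` handler, calling `checkSelfClosing(name, selfClosing)` and recording any non-null result would be enough to collect the errors.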
To make use of this, I have set up a project that helps sanitize HTML using the above approach. The lint tool runs a selection of Rules, collects any errors, and returns them as a Promise, so they can then be reported to the user.
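As a rough illustration of the "run rules, collect errors, return a Promise" flow, here is a minimal sketch. It does not reproduce the real template-lint API: the `lint` function, the `{ name, check }` rule shape and the toy `noEmptyInput` rule are all assumptions made up for this example.

```javascript
// Hypothetical rule shape: an object with a name and a check(html) function
// that returns an array of error strings (or a Promise of one).
function lint(rules, html) {
  const pending = rules.map(function (rule) {
    // Promise.resolve lets a rule return either a plain array or a Promise.
    return Promise.resolve(rule.check(html)).then(function (errors) {
      return errors.map(function (message) {
        return { rule: rule.name, message: message };
      });
    });
  });
  // Flatten the per-rule results into a single list of errors.
  return Promise.all(pending).then(function (results) {
    return [].concat.apply([], results);
  });
}

// Toy rule used only to demonstrate the flow; it is not a real parser rule.
const noEmptyInput = {
  name: 'NoEmptyInput',
  check: function (html) {
    return html.trim() === '' ? ['template is empty'] : [];
  }
};

lint([noEmptyInput], '   ').then(function (errors) {
  errors.forEach(function (e) {
    console.log(e.rule + ': ' + e.message);
  });
});
```

Returning a Promise means a rule can do asynchronous work (for example, waiting for a stream parser to finish) without changing how the caller consumes the errors.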
Vanilla Html/Templates
template-lint forms the base of the tool-set. It comprises the Linter and a couple of basic rules:
- SelfClose - ensure non-void elements do not self-close
- Parser - returns errors for unclosed or ill-matched elements, as captured during parsing
gulp-template-lint is the gulp wrapper for template-lint and can be used like so:
var gulp = require('gulp');
var linter = require('gulp-template-lint');

gulp.task('build-html', function () {
    return gulp.src(['source/**/*.html'])
        .pipe(linter())
        .pipe(gulp.dest('output'));
});
Example
Given the following html:
<template>
  <custom-element/>
  <svg>
    <rect/>
  </svg>
  <div>
    <div>
    </div>
</template>
produces:
Note: the self-closed <rect/> does not produce an error. svg elements contain XML, and Rules can differentiate based on scope.
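One way a rule could differentiate based on scope is to track whether the parser is currently inside an svg subtree. The sketch below is only an assumption about how that might work, not the tool's actual implementation: `findSelfCloseErrors` and the `{ type, name, selfClosing }` event objects are hypothetical stand-ins for the start and end tag events a SAX parser would emit.

```javascript
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr'
]);

// Hypothetical rule: reports self-closing non-void elements,
// except for svg itself and anything nested inside an svg element.
function findSelfCloseErrors(events) {
  const errors = [];
  let svgDepth = 0; // greater than 0 while inside an <svg> subtree
  events.forEach(function (ev) {
    if (ev.type === 'start') {
      const inSvgScope = svgDepth > 0 || ev.name === 'svg';
      if (ev.selfClosing) {
        if (!inSvgScope && !VOID_ELEMENTS.has(ev.name)) {
          errors.push('self-closing element: ' + ev.name);
        }
      } else if (ev.name === 'svg') {
        svgDepth += 1; // entering an svg subtree
      }
    } else if (ev.type === 'end' && ev.name === 'svg' && svgDepth > 0) {
      svgDepth -= 1; // leaving an svg subtree
    }
  });
  return errors;
}

// Events corresponding to part of the example template above.
const events = [
  { type: 'start', name: 'template', selfClosing: false },
  { type: 'start', name: 'custom-element', selfClosing: true },
  { type: 'start', name: 'svg', selfClosing: false },
  { type: 'start', name: 'rect', selfClosing: true },
  { type: 'end', name: 'svg' },
  { type: 'end', name: 'template' }
];
console.log(findSelfCloseErrors(events));
```

With these events only `custom-element` is flagged, because `rect` is seen while the svg depth counter is non-zero.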
Aurelia Templates
I initially made aurelia-template-lint, but decided to split the components that are reusable outside of Aurelia into template-lint. While the two are currently separate, I will have aurelia-template-lint extend template-lint in due course. It currently has a few proof-of-concept rules:
- SelfClose - ensure non-void elements do not self-close
- Parser - returns errors for unclosed or ill-matched elements, as captured during parsing
- Template - ensure root is a template element, and no more than one template element present
- RouterView - don't allow router-view element to contain content elements
- Require - ensure require elements have a 'from' attribute
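To give an idea of how small such a rule can be, here is a sketch of the Require check. It assumes the attribute list has the `[{ name, value }]` shape that parse5 passes to its `startTag` handler; `checkRequire` and the error text are hypothetical and not taken from aurelia-template-lint.

```javascript
// Hypothetical check for the Require rule: returns an error message when a
// <require> element has no 'from' attribute, otherwise null.
function checkRequire(name, attrs) {
  if (name !== 'require') {
    return null; // the rule only concerns require elements
  }
  const hasFrom = attrs.some(function (attr) {
    return attr.name === 'from';
  });
  return hasFrom ? null : "require element is missing a 'from' attribute";
}
```

A mistyped attribute such as `frm="foo"` would be reported, since no attribute named `from` is present.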
There is a gulp wrapper that can be installed via:
npm install gulp-aurelia-template-lint
and used in a gulp build:
var gulp = require('gulp');
var linter = require('gulp-aurelia-template-lint');

gulp.task('lint-template-html', function () {
    return gulp.src('**/*.html')
        .pipe(linter())
        .pipe(gulp.dest('output'));
});
This will use the default set of rules.
Example
A simple test with the following ill-formed Aurelia template:
<link/>
<template bindable="items">
  <require from="foo"/>
  <require frm="foo"/>
  <br/>
  <div></div>
  <router-view>
    <div/>
  </router-view>
</template>
<template>
</template>
outputs:
Improvements
There are lots of improvements needed; for instance, there are a few ways to define vanilla templates without the <template> tag. There are also quite a few specific attributes introduced by Aurelia that could be sanitised.
<br/>, while in html5 you are allowed to do <br>. That custom rules in htmlhint might be the way forward. – Spur