HtmlPurifier - allow data attribute

I'm trying to allow some data attributes with HTML Purifier on all my span elements, but I can't get it to work...

I have this string:

<p>
    <span data-time-start="1" data-time-end="5" id="5">
       <word class="word">My</word>
       <word class="word">Name</word>
    </span>
    <span data-time-start="6" data-time-end="15" id="88">
       <word class="word">Is</word>
       <word class="word">Zooboo</word>
    </span>
</p>

My HTML Purifier config:

$this->HTMLpurifierConfigInverseTransform = \HTMLPurifier_Config::createDefault();
$this->HTMLpurifierConfigInverseTransform->set('HTML.Allowed', 'span,u,strong,em');
$this->HTMLpurifierConfigInverseTransform->set('HTML.ForbiddenElements', 'word,p');
$this->HTMLpurifierConfigInverseTransform->set('CSS.AllowedProperties', 'font-weight, font-style, text-decoration');
$this->HTMLpurifierConfigInverseTransform->set('AutoFormat.RemoveEmpty', true);

I purify my $value like this:

$purifier = new \HTMLPurifier($this->HTMLpurifierConfigInverseTransform);
var_dump($purifier->purify($value));die;

And I get this:

<span>My Name</span><span>Is Zooboo</span>

How can I keep the id, data-time-start and data-time-end attributes on my spans?

I need this:

<span data-time-start="1" data-time-end="5" id="5">My Name</span><span data-time-start="6" data-time-end="15" id="88">Is Zooboo</span>

I tried this config:

$this->HTMLpurifierConfigInverseTransform->set('HTML.Allowed', 'span[data-time-start],u,strong,em');

but got this error message:

User Warning: Attribute 'data-time-start' in element 'span' not supported (for information on implementing this, see the support forums)

Thanks for your help!

EDIT 1

I first tried to allow IDs with this line:

$this->HTMLpurifierConfigInverseTransform->set('Attr.EnableID', true);

It doesn't work for me...

EDIT 2

For the data-* attributes, I added these lines, but nothing happened either...

$def = $this->HTMLpurifierConfigInverseTransform->getHTMLDefinition(true);
// note: 'sub' targets the <sub> element, not <span>
$def->addAttribute('sub', 'data-time-start', 'CDATA');
$def->addAttribute('sub', 'data-time-end', 'CDATA');
Despumate asked 27/3, 2015 at 16:20
See: #17084348 - Travistravus
See also this post: #17084348 - Travistravus

HTML Purifier is aware of the structure of HTML and uses this knowledge as the basis of its whitelisting process. If you add a standard attribute to the whitelist, it doesn't allow arbitrary content for that attribute: it understands the attribute and will still reject content that makes no sense.

For example, if you had an attribute somewhere that took numeric values, HTML Purifier would still deny HTML that tried to enter the value 'foo' for that attribute.
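As a concrete illustration with a built-in attribute, here is a minimal sketch. It assumes HTML Purifier was installed via Composer and uses the default configuration, under which table cells are allowed and, to my knowledge, `colspan` is validated as a positive integer.

```php
<?php
// Sketch: assumes HTML Purifier was installed with Composer
// (composer require ezyang/htmlpurifier).
require __DIR__ . '/vendor/autoload.php';

$purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault());

// colspan is a standard attribute whose value must be a number.
$ok  = $purifier->purify('<table><tr><td colspan="2">x</td></tr></table>');
$bad = $purifier->purify('<table><tr><td colspan="foo">x</td></tr></table>');

echo $ok, PHP_EOL;  // the numeric colspan survives
echo $bad, PHP_EOL; // the attribute is dropped, the cell itself is kept
```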

If you add custom attributes, just adding it to the whitelist does not teach HTML Purifier how to handle the attributes: What data can it expect in those attributes? What data is malicious?

There's extensive documentation on how to tell HTML Purifier about the structure of your custom attributes in the official guide: Customize

It includes a code example for the target attribute of the <a> tag:

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true);
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');

That would add target as a field that accepts only the values "_blank", "_self", "_target" and "_top". That's a bit stricter than the actual HTML definition, but for most purposes entirely sufficient.

That's the general approach you will need to take for data-time-start and data-time-end. For the available attribute types, check the official HTML Purifier documentation (linked above). My best guess from your example is that you don't want Enum#... but Number, like this...

$def->addAttribute('span', 'data-time-start', 'Number');
$def->addAttribute('span', 'data-time-end', 'Number');

...but check it out and see what suits your use-case best. (While you're implementing this, don't forget you also need to list the attributes in the whitelist as you're currently doing.)

For id, you should include Attr.EnableID = true as part of your configuration.
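Putting these pieces together for the question's markup, a minimal sketch might look like this. It assumes HTML Purifier 4.8.0 or later installed via Composer; the definition ID `span-data-time` is an arbitrary placeholder. Recent releases recommend `maybeGetRawHTMLDefinition()` for customizations (it returns null once a cached definition exists) and may warn if `getHTMLDefinition(true)` is combined with `HTML.DefinitionID`. All `set()` calls come first, because fetching the definition finalizes the configuration. `Attr.ID.HTML5` is included because the question's IDs (`5`, `88`) start with a digit, which the default ID rules reject even when `Attr.EnableID` is on.

```php
<?php
// Sketch: assumes HTML Purifier >= 4.8.0 installed via Composer.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Identify the customized definition; bump the revision whenever it changes.
$config->set('HTML.DefinitionID', 'span-data-time'); // placeholder ID
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // no definition cache while experimenting

// Whitelist: the custom attributes must be listed here as well.
$config->set('HTML.Allowed', 'span[id|data-time-start|data-time-end],u,strong,em');
$config->set('Attr.EnableID', true); // id is stripped unless this is on
$config->set('Attr.ID.HTML5', true); // accept ids such as "5" that start with a digit

// Declare the custom attributes (after all set() calls).
if ($def = $config->maybeGetRawHTMLDefinition()) {
    $def->addAttribute('span', 'data-time-start', 'Number');
    $def->addAttribute('span', 'data-time-end', 'Number');
}

$purifier = new HTMLPurifier($config);

$value = '<p>
    <span data-time-start="1" data-time-end="5" id="5">
       <word class="word">My</word>
       <word class="word">Name</word>
    </span>
    <span data-time-start="6" data-time-end="15" id="88">
       <word class="word">Is</word>
       <word class="word">Zooboo</word>
    </span>
</p>';

// <p> and the unknown <word> tags are removed, their text is kept.
$clean = $purifier->purify($value);
echo $clean, PHP_EOL;
```

`Number` only accepts whole positive numbers, so as far as I can tell a start time of `0` or a fractional value like `1.5` would be stripped; if your timestamps can take those values, `Text` is a looser alternative.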

I hope that helps!

Funch answered 28/3, 2015 at 12:59
Thanks for your complete answer! I first tried to enable IDs, but $this->HTMLpurifierConfigInverseTransform->set('Attr.EnableID', true); doesn't work... As for allowing custom attributes, I'll look into it, but it seems hard for me... I'm a beginner... - Despumate
@Zagloo: Did you make sure to give your definition an ID and a revision number (I dimly recall that not doing so causes issues) and to disable the definition cache while you're working on it? Unfortunately I have no idea why Attr.EnableID would not work for you, other than a version mismatch, but it's been part of HTML Purifier almost forever, so I don't think that's it. :( - Funch

If anyone else lands here (like I did) because the id attribute isn't working, or, more weirdly, isn't working in every case:

Version 4.8.0 added Attr.ID.HTML5, which enables the relaxed ID format introduced with HTML5.

Without it, purely numeric IDs are rejected, as are IDs that start with a digit. The following examples are all valid in HTML5, but only the first three are valid pre-HTML5 (the purifier's default behaviour):

  1. foo (both)
  2. foo-bar (both)
  3. foo-10 (both)
  4. 10 (HTML5 only)
  5. 10-foo (HTML5 only)
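A quick way to see the difference is to purify the five example IDs above with Attr.ID.HTML5 off and on. This is a sketch assuming HTML Purifier 4.8.0 or later installed via Composer; `purifyWithIds` is a hypothetical helper name used only for this example.

```php
<?php
// Sketch: assumes HTML Purifier >= 4.8.0 installed via Composer.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical helper: purify $html with ids enabled, toggling the HTML5 id rules.
function purifyWithIds(string $html, bool $html5Ids): string
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Allowed', 'span[id]');
    $config->set('Attr.EnableID', true);      // ids are stripped unless enabled
    $config->set('Attr.ID.HTML5', $html5Ids); // relaxed HTML5 id format on/off
    return (new HTMLPurifier($config))->purify($html);
}

foreach (['foo', 'foo-bar', 'foo-10', '10', '10-foo'] as $id) {
    $html = '<span id="' . $id . '">x</span>';
    echo $id, PHP_EOL;
    echo '  default: ', purifyWithIds($html, false), PHP_EOL;
    echo '  HTML5:   ', purifyWithIds($html, true), PHP_EOL;
}
```

With the default rules, `10` and `10-foo` should come back as a bare `<span>x</span>`, while HTML5 mode keeps all five IDs.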
Posen answered 5/1, 2022 at 16:29
