I'm trying to wrap parts of a string in HTML tags (i.e. add HTML annotations). The positions where the tags should be inserted are given by an array of character offsets, for example:
//array(Start offset, End offset) in characters
//Note that an annotation starts at the Start offset and ends before the End offset
$annotationCharactersPositions= array(
0=>array(0,3),
1=>array(2,6),
2=>array(8,10)
);
So, to annotate the following HTML text ($source) with the following HTML tag ($tag), that is, to wrap the characters delimited by the $annotationCharactersPositions array (without counting the HTML tags already present in $source):
$source="<div>This is</div> only a test for stackoverflow";
$tag="<span class='annotation n-$cont'>";
the result should be the following (https://jsfiddle.net/cotg2pn1/):
// charPos (HTML tags not counted): T=0 h=1 i=2 s=3 (space)=4 i=5 s=6 (space)=7 o=8 n=9 l=10 y=11 ...
$output = "<div><span class='annotation n-1'>Th<span class='annotation n-2'>i</span></span><span class='annotation n-2'>s</span><span class='annotation n-2'> i</span>s</div> <span class='annotation n-3'>on</span>ly a test for stackoverflow"
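As a sanity check on the offsets, the pairs can be applied to the text with its tags stripped. This is only a sketch to show which characters each pair selects; it assumes the mbstring extension is available, and the variable $selected is introduced here purely for illustration.

```php
<?php
// Assumes the mbstring extension is loaded.
$source = "<div>This is</div> only a test for stackoverflow";
$annotationCharactersPositions = array(
    0 => array(0, 3),
    1 => array(2, 6),
    2 => array(8, 10),
);

// Drop the markup so that offsets count text characters only.
$plain = strip_tags($source);

$selected = array(); // illustrative name, not part of the question's code
foreach ($annotationCharactersPositions as $position) {
    list($start, $end) = $position;
    // Start is inclusive and end is exclusive, so the length is $end - $start.
    $selected[] = mb_substr($plain, $start, $end - $start, 'UTF-8');
}

print_r($selected);
```

This selects "Thi", "is i" and "on", which matches the characters wrapped in the expected $output.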
How can I implement the addHTMLtoString function used in the following loop?
$cont=0;
$myAnnotationClass="placesOfTheWorld";
// Apply the annotations one by one; addHTMLtoString is the function I need to write
foreach ($annotationCharactersPositions as $position) {
    $tag="<span class='annotation $myAnnotationClass'>";
    $source=addHTMLtoString($source,$tag,$position);
    $cont++;
}
The HTML tags of the input string must not be counted when interpreting the character offsets in the $annotationCharactersPositions array, and each annotation (i.e. $tag) already inserted into $source must be taken into account when wrapping the following annotations.
The idea of the whole process is that, given an input text (which may or may not contain HTML tags), groups of characters (belonging to one or several words) are annotated. An array defines where each annotation begins and ends, and the selected characters are wrapped in an HTML tag that can vary (a, span, mark) with a variable number of HTML attributes (name, class, id, data-*). In addition, the result must be well-formed, valid HTML, so if an annotation overlaps other annotations, the tags must be opened and closed in the output accordingly.
Do you know of any library or solution that does this? Maybe PHP's DOMDocument could be useful, but how would I apply the offsets to the DOMDocument functions? Any ideas or help would be appreciated.
Note 1: The input text is raw UTF-8 text with any number (0-n) of embedded HTML entities of any type.
Note 2: The input tag could be any HTML tag with a variable number of attributes (0-n).
Note 3: The start position is inclusive and the end position is exclusive, i.e. the annotation with index 1, array(2,6), starts at offset 2 (including the character 'i' at offset 2) and ends before offset 6 (excluding the character 's' at offset 6).
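One possible direction with DOMDocument, sketched under several assumptions: walk the text nodes in document order while keeping a running character count, and wrap only the part of each text node that overlaps the annotation. Because each wrapper encloses a single text node, tags can never cross element boundaries. The function name annotateHtml and its parameter layout are hypothetical; the tag is passed as a name plus an attribute array instead of the $tag string; entities are counted as one decoded character each; and the dom and mbstring extensions are assumed to be available. Entity and encoding handling on output would still need checking against real data.

```php
<?php
// Hypothetical helper, not part of any library. Assumes ext-dom and ext-mbstring.
// $annotations: list of array(start, end, tagName, attributes). Offsets are
// half-open [start, end) and count text characters only (tags are ignored).
function annotateHtml($html, array $annotations)
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    // The meta tag tells libxml the input is UTF-8; the outer <div> is a
    // temporary container so the fragment has a single root.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );
    $root = $doc->getElementsByTagName('body')->item(0)->firstChild;
    $xpath = new DOMXPath($doc);

    foreach ($annotations as $annotation) {
        list($start, $end, $tagName, $attributes) = $annotation;
        // Snapshot the text nodes first, because wrapping modifies the tree.
        $textNodes = iterator_to_array($xpath->query('.//text()', $root));
        $offset = 0; // text characters seen so far
        foreach ($textNodes as $node) {
            $length = mb_strlen($node->nodeValue, 'UTF-8');
            $from = max($start, $offset) - $offset;       // overlap start inside this node
            $to = min($end, $offset + $length) - $offset; // overlap end inside this node
            $offset += $length;
            if ($from >= $to) {
                continue; // the annotation does not touch this node
            }
            // Isolate the overlapping part as its own text node.
            $middle = $from > 0 ? $node->splitText($from) : $node;
            if ($to - $from < mb_strlen($middle->nodeValue, 'UTF-8')) {
                $middle->splitText($to - $from);
            }
            // Wrap only that text node in the annotation element.
            $wrapper = $doc->createElement($tagName);
            foreach ($attributes as $name => $value) {
                $wrapper->setAttribute($name, $value);
            }
            $middle->parentNode->replaceChild($wrapper, $middle);
            $wrapper->appendChild($middle);
        }
    }

    // Serialize the children of the temporary container, not the container itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$source = "<div>This is</div> only a test for stackoverflow";
$result = annotateHtml($source, array(
    array(0, 3, 'span', array('class' => 'annotation n-1')),
    array(2, 6, 'span', array('class' => 'annotation n-2')),
    array(8, 10, 'span', array('class' => 'annotation n-3')),
));
echo $result;
```

All annotations are applied to one parsed document and serialized once, so the offsets always refer to the same text content. The markup should be equivalent to the expected $output, except that "s i" ends up in a single n-2 span, attributes are written with double quotes, and the serializer may add line breaks after block-level start tags.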
1th span (n-2) begins before the 2nd character, but the example $annotationCharactersPositions has 1=>array(3,6). Also consider explaining the motivation for this whole process a little more clearly; it seems likely that someone will suggest a completely different approach that may work better in the long run. – Insurrection