xquery word wrap
I

4

6

I need to word-wrap a long string to 100 characters using XQuery. In other words, I need to change some spaces to newlines so that the non-patological lines are shorter than 100 characters. Is there an easy way?

Interpretive answered 22/9, 2010 at 8:18 Comment(2)
Good question (+1). See my answer for an alternative XSLT 1.0 solution, which could probably be translated into XQuery, especially XQuery 3.0, where there would be higher-order functions and folds may be part of the standard function library. Of course, @Oliver-Hallam's solution is a good one and is immediately usable. It can be tweaked a little to produce better-looking results, as I mention in a comment to his answer. - Imprecation
Could you define "non-patological"? I wasn't able to find a definition online. - Viquelia
E
9

I think you just need this XPath 2.0 expression:

replace(concat(normalize-space(text),' '),'(.{0,60}) ','$1
')

With the input from Dimitre's answer (and the same 60-character limit), the output is:

Dec. 13 — As always for a presidential inaugural, security
and surveillance were extremely tight in Washington, DC,
last January. But as George W. Bush prepared to take the
oath of office, security planners installed an extra layer
of protection: a prototype software system to detect a
biological attack. The U.S. Department of Defense, together
with regional health and emergency-planning agencies,
distributed a special patient-query sheet to military
clinics, civilian hospitals and even aid stations along the
parade route and at the inaugural balls. Software quickly
analyzed complaints of seven key symptoms — from rashes to
sore throats — for patterns that might indicate the early
stages of a bio-attack. There was a brief scare: the system
noticed a surge in flulike symptoms at military clinics.
Thankfully, tests confirmed it was just that — the flu.
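
The 60 in the expression above is hard-coded to match the line length used in Dimitre's answer, whereas the question asks for 100. A minimal sketch of a parameterised version might look like the following; the function name local:wrap and its $width parameter are assumptions made for illustration and are not part of the original answer.

```xquery
xquery version "1.0";

(: Hypothetical helper, not from the original answer:
   wraps $text so that each break falls at a space,
   at most $width characters into the line. :)
declare function local:wrap($text as xs:string, $width as xs:integer) as xs:string
{
  (: Build the same pattern as the one-liner, with the width substituted in. :)
  let $pattern := concat("(.{0,", $width, "}) ")
  (: A trailing space is appended so that the last word is matched as well. :)
  return replace(concat(normalize-space($text), " "), $pattern, "$1&#10;")
};

local:wrap("The quick brown fox jumps over the lazy dog", 10)
```

As with the original expression, normalize-space discards any existing line breaks, the result ends with a newline, and a word longer than $width is left unbroken on a line of its own.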
Eighth answered 8/10, 2010 at 17:32 Comment(0)
L
1

If you have a line that is longer than 100 characters, you want to replace the last set of consecutive spaces in the first 101 characters with a newline. If there is no space in the first 101 characters, then you want to just replace the first space in the line. You can then apply the same logic recursively to the rest of the string.

This can be done as follows:

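declare function local:wrap-line($str)
{
  (: Short lines, and lines with no space to break at, are returned unchanged. :)
  if (string-length($str) < 100 or not(contains($str," ")))
  then $str
  else
    (: Break at the last run of spaces within the first 101 characters,
       or at the first space in the line if there is none. :)
    let $wrapped := if (contains(substring($str,1,101)," "))
                    then replace($str, "^(.{0,99}[^ ]) +", "$1&#10;")
                    else replace($str, "^(.*?) ", "$1&#10;")
    (: Keep the first line and wrap whatever follows the inserted newline. :)
    return concat(substring-before($wrapped, "&#10;"), "&#10;",
                  local:wrap-line(substring-after($wrapped, "&#10;")))
};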

declare function local:word-wrap($str)
{
  string-join(
    for $line in tokenize($str,"&#10;")
    return local:wrap-line($line),
    "&#10;")
};

This works because (.{0,99}[^ ]) + matches the longest possible run of up to 100 characters that does not end in a space, followed by one or more spaces, while (.*?) followed by a space matches the shortest possible run of characters up to the first space.

The code is hardly optimal and will need improving for degenerate cases (for example, a line of text ending in several spaces), but it does work.

Left answered 22/9, 2010 at 9:35 Comment(1)
@Oliver-Hallam: this is a good solution, but, depending on the data, it may produce a long line followed by a very short one. Not exactly aesthetic, is it? To split the text well, it would be better to first replace each NL with a space. In that case the result will consist of lines of nearly equal length (around 100 characters), and only the last line will be shorter. - Imprecation
I
1

I don't know XQuery well enough to express this XSLT solution in XQuery, but I think just providing it may be helpful.

Do note that in typical real-world cases words are delimited by more than one delimiter. The following solution treats all characters specified in a parameter as delimiters and splits every line at a maximum-length boundary. It is part of the FXSL library of functions/templates for XSLT 1.0 or 2.0.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:f="http://fxsl.sf.net/"
xmlns:ext="http://exslt.org/common"
xmlns:str-split2lines-func="f:str-split2lines-func"
exclude-result-prefixes="xsl f ext str-split2lines-func"
>


   <xsl:import href="dvc-str-foldl.xsl"/>

   <str-split2lines-func:str-split2lines-func/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="/">
      <xsl:call-template name="str-split-to-lines">
        <xsl:with-param name="pStr" select="/*"/>
        <xsl:with-param name="pLineLength" select="60"/>
        <xsl:with-param name="pDelimiters" select="' &#9;&#10;&#13;'"/>
      </xsl:call-template>
    </xsl:template>

    <xsl:template name="str-split-to-lines">
      <xsl:param name="pStr"/>
      <xsl:param name="pLineLength" select="60"/>
      <xsl:param name="pDelimiters" select="' &#9;&#10;&#13;'"/>

      <xsl:variable name="vsplit2linesFun"
                    select="document('')/*/str-split2lines-func:*[1]"/>

      <xsl:variable name="vrtfParams">
       <delimiters><xsl:value-of select="$pDelimiters"/></delimiters>
       <lineLength><xsl:copy-of select="$pLineLength"/></lineLength>
      </xsl:variable>

      <xsl:variable name="vResult">
          <xsl:call-template name="dvc-str-foldl">
            <xsl:with-param name="pFunc" select="$vsplit2linesFun"/>
            <xsl:with-param name="pStr" select="$pStr"/>
            <xsl:with-param name="pA0" select="ext:node-set($vrtfParams)"/>
          </xsl:call-template>
      </xsl:variable>
      <xsl:for-each select="ext:node-set($vResult)/line">
        <xsl:for-each select="word">
          <xsl:value-of select="concat(., ' ')"/>
        </xsl:for-each>
        <xsl:value-of select="'&#xA;'"/>
      </xsl:for-each>
    </xsl:template>

    <xsl:template match="str-split2lines-func:*" mode="f:FXSL">
      <xsl:param name="arg1" select="/.."/>
      <xsl:param name="arg2"/>

      <xsl:copy-of select="$arg1/*[position() &lt; 3]"/>
      <xsl:copy-of select="$arg1/line[position() != last()]"/>

      <xsl:choose>
        <xsl:when test="contains($arg1/*[1], $arg2)">
          <xsl:if test="string($arg1/word) or string($arg1/line/word)">
             <xsl:call-template name="fillLine">
               <xsl:with-param name="pLine" select="$arg1/line[last()]"/>
               <xsl:with-param name="pWord" select="$arg1/word"/>
               <xsl:with-param name="pLineLength" select="$arg1/*[2]"/>
             </xsl:call-template>
          </xsl:if>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="$arg1/line[last()]"/>
          <word><xsl:value-of select="concat($arg1/word, $arg2)"/></word>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

      <!-- Test if the new word fits into the last line -->
    <xsl:template name="fillLine">
      <xsl:param name="pLine" select="/.."/>
      <xsl:param name="pWord" select="/.."/>
      <xsl:param name="pLineLength" />

      <xsl:variable name="vnWordsInLine" select="count($pLine/word)"/>
      <xsl:variable name="vLineLength" 
       select="string-length($pLine) + $vnWordsInLine"/>
      <xsl:choose>
        <xsl:when test="not($vLineLength + string-length($pWord) 
                           > 
                            $pLineLength)">
          <line>
            <xsl:copy-of select="$pLine/*"/>
            <xsl:copy-of select="$pWord"/>
          </line>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="$pLine"/>
          <line>
            <xsl:copy-of select="$pWord"/>
          </line>
          <word/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

When this transformation is applied to the following XML document:

<text>
Dec. 13 — As always for a presidential inaugural, security and surveillance were
extremely tight in Washington, DC, last January. But as George W. Bush prepared to
take the oath of office, security planners installed an extra layer of protection: a
prototype software system to detect a biological attack. The U.S. Department of
Defense, together with regional health and emergency-planning agencies, distributed
a special patient-query sheet to military clinics, civilian hospitals and even aid
stations along the parade route and at the inaugural balls. Software quickly
analyzed complaints of seven key symptoms — from rashes to sore throats — for
patterns that might indicate the early stages of a bio-attack. There was a brief
scare: the system noticed a surge in flulike symptoms at military clinics.
Thankfully, tests confirmed it was just that — the flu.
</text>

the wanted result is produced (lines not longer than 60 characters):

Dec. 13 — As always for a presidential inaugural, security 
and surveillance were extremely tight in Washington, DC, 
last January. But as George W. Bush prepared to take the 
oath of office, security planners installed an extra layer 
of protection: a prototype software system to detect a 
biological attack. The U.S. Department of Defense, together 
with regional health and emergency-planning agencies, 
distributed a special patient-query sheet to military 
clinics, civilian hospitals and even aid stations along the 
parade route and at the inaugural balls. Software quickly 
analyzed complaints of seven key symptoms — from rashes to 
sore throats — for patterns that might indicate the early 
stages of a bio-attack. There was a brief scare: the system 
noticed a surge in flulike symptoms at military clinics. 
Thankfully, tests confirmed it was just that — the flu. 
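
For readers who want to stay in XQuery, the same greedy idea (add words to the current line until the next one would exceed the limit) can be sketched roughly as below. This is not a translation of the FXSL stylesheet: the function names local:fill and local:split-to-lines are hypothetical, the delimiter set is fixed to whitespace rather than passed as a parameter, and no trailing space is added to each line.

```xquery
xquery version "1.0";

(: Hypothetical helper: greedily appends words to the current line
   and emits the line once the next word would not fit. :)
declare function local:fill($words as xs:string*,
                            $line as xs:string,
                            $width as xs:integer) as xs:string*
{
  if (empty($words)) then
    (: No words left: emit the pending line, if there is one. :)
    (if ($line ne "") then $line else ())
  else
    let $word := $words[1]
    let $rest := subsequence($words, 2)
    return
      if ($line eq "") then
        (: The first word of a line is always accepted, even if it is too long. :)
        local:fill($rest, $word, $width)
      else if (string-length($line) + 1 + string-length($word) le $width) then
        local:fill($rest, concat($line, " ", $word), $width)
      else
        ($line, local:fill($rest, $word, $width))
};

(: Hypothetical entry point: splits on runs of space, tab, LF and CR,
   the same characters the stylesheet receives in pDelimiters. :)
declare function local:split-to-lines($text as xs:string,
                                      $width as xs:integer) as xs:string*
{
  local:fill(tokenize(normalize-space($text), "\s+"), "", $width)
};

string-join(local:split-to-lines("alpha beta  gamma&#10;delta", 11), "&#10;")
```

The recursion depth grows with the number of lines produced, so this sketch is meant for moderately sized strings; a fold in XQuery 3.0 would be the closer analogue of the FXSL approach.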
Imprecation answered 22/9, 2010 at 13:7 Comment(0)
G
0

Try the XQuery function below; it can word-wrap any text.


declare function xf:wrap-string($str as xs:string,
                                $wrap-col as xs:integer,
                                $break-mark as xs:string,
                                $pos as xs:integer)
    as xs:string {
       if(fn:contains( $str, ' ' )) then 
        let $first-word := fn:substring-before( $str, ' ' )
        let $pos-now := xs:integer($pos + 1 + string-length($first-word))
        return
            if ($pos>0 and $pos-now>=$wrap-col) then
                 concat($break-mark,
                 xf:wrap-string($str,$wrap-col,$break-mark,xs:integer(0)))
            else
                concat($first-word,' ',
                 xf:wrap-string(substring-after( $str, ' ' ),
                               $wrap-col,
                               $break-mark,
                               $pos-now))
       else(
        if ($pos+string-length($str)>$wrap-col) then
            concat($break-mark,$str)
        else ($str)
       )
};

(: Note: xf:pad-string-to-length is not defined in this answer and must be supplied separately. :)
declare function xf:wrap-test($str as xs:string,
                              $wrap-col as xs:integer,
                              $break-mark as xs:string,
                              $pos as xs:integer)
    as xs:string {
    let $Result := xf:wrap-string($str,$wrap-col,$break-mark,$pos)
    return
        if (fn:contains($Result,'&#10;')) then
            (: Pad every short line to the wrap column and join the lines without separators. :)
            fn:string-join(for $line in tokenize($Result, '&#10;')
                           return
                               if (fn:string-length($line)<$wrap-col) then
                                   xf:pad-string-to-length(string($line),$wrap-col)
                               else ($line),
                           '')
        else $Result
};

xf:wrap-test($Text,50,"&#10;",0)

Parameters to pass:
1. $Text - the text to word-wrap
2. 50 - the column at which to wrap
3. "&#10;" - the break mark inserted at each wrap point (here a newline)
4. 0 - the starting position
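
The code above calls xf:pad-string-to-length without defining it. Assuming it simply right-pads a string with spaces up to a given length, a stand-in could look like the sketch below; the local: name, the two-argument signature, and the choice of a space as the pad character are all assumptions inferred from how the function is called.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for the undefined xf:pad-string-to-length:
   right-pads $s with spaces to exactly $len characters
   (and truncates it if it is already longer). :)
declare function local:pad-string-to-length($s as xs:string,
                                            $len as xs:integer) as xs:string
{
  (: Append $len spaces, then cut the result back to $len characters. :)
  let $padding := string-join(for $i in 1 to $len return " ", "")
  return substring(concat($s, $padding), 1, $len)
};

(: Brackets make the padding visible: yields "[abc   ]". :)
concat("[", local:pad-string-to-length("abc", 6), "]")
```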
Greenback answered 12/2, 2018 at 10:46 Comment(0)
