Select "Text" node using querySelector

I'm writing a parser that should extract "Extract This Text" from the following HTML:

<div class="a">
    <h1>some random text</h1>
    <div class="clear"></div>
    Extract This Text
    <p></p>
    <h2></h2>
</div>

I've tried to use:

document.querySelector('div.a > :nth-child(3)');

I even tried the next-sibling combinator:

document.querySelector('div.a > :nth-child(2) + *');

But both of them skip it and return only the "p" element.

The only solution I can see is to select the preceding element and then use nextSibling to reach the text.
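
A minimal sketch of that workaround, assuming the wanted text is the node immediately after an element that querySelector can reach (div.clear here). The helper name textAfter is hypothetical; it only reads the standard nextSibling, nodeType and textContent properties, so it can also be exercised with plain stand-in objects outside a browser.

```javascript
// Value of Node.TEXT_NODE in the DOM standard, spelled out so the sketch
// also runs where the global Node is not defined.
const TEXT_NODE = 3;

// Hypothetical helper: return the trimmed text of the node that directly
// follows `element`, or null if that node is missing or is not a text node.
function textAfter(element) {
  // nextSibling can be a Text node, unlike nextElementSibling.
  const next = element.nextSibling;
  if (next === null || next.nodeType !== TEXT_NODE) {
    return null;
  }
  return next.textContent.trim();
}

// In a browser, with the markup from the question:
// textAfter(document.querySelector('div.a > div.clear')); // "Extract This Text"
```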

Can querySelector select text nodes at all?
Text node: https://developer.mozilla.org/en-US/docs/Web/API/Text

Renitarenitent answered 21/2, 2019 at 14:38 Comment(2)
My workaround is to use querySelector to select the element and then extract the #text node with Array.from(element.childNodes).find(node => node.nodeName === '#text') - Zaidazailer
If there is no whitespace between the tags, the Text node is the 3rd child node, so you can access its text this way: element.childNodes[2].textContent - Peloquin
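
With the indented markup from the question, whitespace-only text nodes sit between the elements, so a plain find(node => node.nodeName === '#text') returns the line break after <div class="a"> rather than the wanted text. A sketch that skips those nodes, assuming the wanted text is the first non-blank text child of the element (firstNonBlankText is a hypothetical name):

```javascript
// Value of Node.TEXT_NODE in the DOM standard.
const TEXT_NODE = 3;

// Hypothetical helper: the first child text node of `parent` that contains
// more than whitespace, or undefined if there is none.
function firstNonBlankText(parent) {
  return Array.from(parent.childNodes).find(
    (node) => node.nodeType === TEXT_NODE && node.textContent.trim() !== ""
  );
}

// In a browser:
// firstNonBlankText(document.querySelector('div.a')).textContent.trim();
```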

As already answered, CSS has no selector for text nodes, so document.querySelector cannot return one either.

However, browsers do provide an XPath evaluator through the document.evaluate method, which supports many more node tests, axes and operators, including the selection of text nodes.

let result = document.evaluate(
  '//div[@class="a"]/div[@class="clear"]/following-sibling::text()[1]',
  document,
  null,
  XPathResult.STRING_TYPE
).stringValue;

console.log(result.trim());

<body>
  <div class="a">
    <h1>some random text</h1>
    <div class="clear"></div>
    Extract This Text
    <p></p>
    But Not This Text
    <h2></h2>
  </div>
</body>

The leading // matches the div at any depth, i.e. with any number of ancestor nodes above it.
/html/body/div[@class="a"] would address the node by its absolute path instead.

It should be mentioned that CSS queries are generally much faster than the more powerful XPath evaluation. Therefore, avoid excessive use of document.evaluate where document.querySelectorAll works as well, and reserve it for cases where you really need complex expressions to navigate the DOM.

Blaine answered 3/5, 2020 at 18:29 Comment(4)
Amazing! This is exactly what I should have been using from the start. Thanks! MDN docs for Document.evaluate() - Renitarenitent
@icl7126 Thank you! I've added a performance notice. You should decide case by case which method to use. - Blaine
Would this be faster than recursing through an entire DOM structure to find all the text nodes it contains? - Tatia
@Tatia I guess so, since it is built in. However, I have never done a performance test. - Blaine
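
For comparison, the recursive approach the comment asks about can be sketched as a depth-first walk over childNodes. This is an illustration only: collectTextNodes is a hypothetical name, and nothing here measures its speed against document.evaluate. In a browser, document.createTreeWalker(root, NodeFilter.SHOW_TEXT) performs a similar walk with a built-in API.

```javascript
const TEXT_NODE = 3;    // value of Node.TEXT_NODE
const ELEMENT_NODE = 1; // value of Node.ELEMENT_NODE

// Hypothetical helper: collect every descendant text node of `root` in
// document order. Elements are descended into; comments and other node
// types are skipped.
function collectTextNodes(root, out = []) {
  for (const child of root.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      out.push(child);
    } else if (child.nodeType === ELEMENT_NODE) {
      collectTextNodes(child, out); // recurse into nested elements
    }
  }
  return out;
}
```

Note that whitespace-only text nodes are returned too; filter on textContent.trim() if they are not wanted.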

Not directly, no. But you can access it from its parent:

const parent = document.querySelector('div.a')

const textNodes = [...parent.childNodes] // has childNodes inside, including text ones
  .filter(child => child.nodeType === Node.TEXT_NODE) // get only text nodes (nodeType 3)
  .filter(child => child.textContent.trim()) // eliminate whitespace-only text
  .map(textNode => textNode.textContent.trim()) // extract trimmed text content

console.log(textNodes[0])
// "Extract This Text"

// make it a function
const extractText = (DOMElement) => [...DOMElement.childNodes] // has childNodes inside, including text ones
  .filter(child => child.nodeType === Node.TEXT_NODE) // get only text nodes (nodeType 3)
  .filter(child => child.textContent.trim()) // eliminate whitespace-only text
  .map(textNode => textNode.textContent.trim()) // extract trimmed text content

console.log(extractText(document.querySelector('div.a'))[0])
// "Extract This Text"
Impartible answered 2/4, 2022 at 5:53 Comment(0)

It can't, though my answer isn't authoritative (you may have figured it out already).

You can check out the questions "select text node with CSS" and "Is there a CSS selector for text nodes?".

A more verbose explanation (maybe unnecessary; English is not my first language, so sorry for any misused words or grammar):

I was learning about ParentNode, and since the querySelectorAll() method returns a NodeList, I wondered whether it could select text nodes. I tried and failed, then searched and found this post.

The argument to querySelectorAll(selectors) or querySelector(selectors) is a DOMString containing one or more CSS selectors, and selectors only match elements, never plain text. (A selector containing a pseudo-element never matches anything either, so querySelector returns null for it.)
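
This also explains why the selectors in the question landed on the p element: when :nth-child() counts siblings it only counts elements, so the text node is invisible to it. The hypothetical nthElementChild below mimics that counting over childNodes; it is an illustration, not how browsers implement selector matching.

```javascript
const ELEMENT_NODE = 1; // value of Node.ELEMENT_NODE in the DOM standard

// Hypothetical helper: the element that 'parent > :nth-child(n)' would pick.
// Only element nodes are counted; text nodes are never candidates.
function nthElementChild(parent, n) {
  const elements = Array.from(parent.childNodes).filter(
    (node) => node.nodeType === ELEMENT_NODE
  );
  return elements[n - 1] ?? null; // n is 1-based, like :nth-child()
}

// In a browser, with the question's markup:
// nthElementChild(document.querySelector('div.a'), 3) is the <p>,
// the same element 'div.a > :nth-child(3)' returned.
```

In a browser, parent.children[n - 1] gives the same element, since children holds only element nodes.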

News answered 13/12, 2019 at 2:54 Comment(0)
