How to use multiple xpath selectors in a YQL query

Hey, I'd like to scrape some data from my blog using YQL:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"

How can I use several XPath expressions in one query? For example, can I do something like:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath ="//div[@class='title']"

assuming I want to get both the post and the title? I could fetch all the HTML, but I'd rather retrieve only what I need, since speed is an issue here.

Once I have the HTML, I want to extract the text from the markup. Is it OK to use PHP regular expressions for this?

I also understand that YQL accepts CSS selector syntax. If you have experience using it with YQL and could show me how to write a query like the one above with CSS rather than XPath, I'd be grateful!

Thanks.

Leschen answered 13/10, 2010 at 15:46 Comment(0)

Regarding CSS:

See the YQL website itself for this. Search Google for "YQL and CSS" (I can only post one link here, and the second one is more useful).

The example they have there no longer works, but you can try this one, which scrapes the questions from the front page of Stack Overflow.

YQL example

Multiple Selects with one XPATH:

You CAN do this directly in XPath syntax by joining the expressions with the union operator (|), e.g.

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title']|//head/meta[@name='description']|//head/meta[@name='keywords']"
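If the list of XPath expressions is assembled in code, the union query above can be built by joining the expressions with "|". This is only a minimal sketch in Python: the helper name build_union_query is hypothetical, and it simply produces the query string without sending it anywhere.

```python
def build_union_query(url, xpaths):
    """Build a YQL html-table query whose xpath is the union of `xpaths`.

    Hypothetical helper for illustration; it only formats the query string.
    """
    if not xpaths:
        raise ValueError("at least one XPath expression is required")
    for value in [url, *xpaths]:
        # The query wraps these values in double quotes, so reject any
        # value that contains one instead of producing a broken query.
        if '"' in value:
            raise ValueError(f"double quote not allowed in {value!r}")
    # "|" is the XPath union operator: nodes matching any expression are selected.
    combined = "|".join(xpaths)
    return f'SELECT * FROM html WHERE url="{url}" AND xpath="{combined}"'


query = build_union_query(
    "http://site.com/blog",
    ["//div[@class='post']", "//div[@class='title']"],
)
print(query)
```

The XPath expressions use single quotes for attribute values, so they can sit inside the double-quoted xpath="..." part of the query unchanged.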
Graticule answered 25/10, 2010 at 16:18 Comment(3)
Thanks, wasn't sure about the syntax but that's cleared it up. - Leschen
Upvoted. I figured this out myself, but wanted to know whether I can put a space or some other separator between the results of the two XPaths, so that I can later parse the result and get two different values. - Florist
Any idea how to fetch the image and meta description from amazon.in/Seiko-Premier-Analog-Blue-Watch/dp/… ? - Etsukoetta

You can also select multiple elements with one XPath by combining the conditions with "or" inside a single predicate:

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title' or @name='description']"
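When the attribute values come from a list, the "or" predicate above can be generated rather than typed by hand. The following is a rough Python sketch: or_predicate_xpath is a hypothetical helper name, and it only builds the XPath string.

```python
def or_predicate_xpath(base, attr, values):
    """Return `base` filtered by attr='v1' or attr='v2' ... in one predicate.

    Hypothetical helper for illustration; it only formats the XPath string.
    """
    if not values:
        raise ValueError("at least one attribute value is required")
    for value in values:
        # Values are wrapped in single quotes, and the whole XPath is later
        # wrapped in double quotes by the YQL query, so allow neither.
        if "'" in value or '"' in value:
            raise ValueError(f"quote characters not allowed in {value!r}")
    tests = " or ".join(f"@{attr}='{value}'" for value in values)
    return f"{base}[{tests}]"


xpath = or_predicate_xpath("//head/meta", "name", ["title", "description"])
print(xpath)
```

This form only works when all the wanted elements share the same path and differ in an attribute; for unrelated paths, the union operator (|) is the more general option.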
Iodide answered 18/7, 2016 at 12:33 Comment(0)

It is not possible. You need to execute the query twice: once for the first XPath and once for the second. Of course, you can write your own open table definition and add support for this kind of query.

Township answered 13/10, 2010 at 16:49 Comment(0)
