YQL Losing HTML Element Attributes?

YQL Console Link

Query:

select * from html where url='http://www.cbs.com/shows/big_brother/video/' and xpath='//div[@id="cbs-video-metadata-wrapper"]/div[@class="cbs-video-share"]/a'

Returns:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="1" yahoo:created="2011-07-09T23:14:02Z" yahoo:lang="en-US">
    <diagnostics>
        <publiclyCallable>true</publiclyCallable>
        <url execution-time="146" proxy="DEFAULT"><![CDATA[http://www.cbs.com/shows/big_brother/video/]]></url>
        <user-time>163</user-time>
        <service-time>146</service-time>
        <build-version>19262</build-version>
    </diagnostics> 
    <results>
        <a class="twitter-share-button" href="http://twitter.com/share"/>
    </results>
</query>

It should return something similar to this, with the data-url attribute intact:

    <results>
        <a href="http://twitter.com/share" data-url="http://www.cbs.com/shows/big_brother/video/2045825951/big-brother-episode-1" class="twitter-share-button"></a>
    </results>

If I back the XPath out one level (to the parent div), the element is stripped from the results entirely, even though I could otherwise use it to get the data I need.

Violate asked 9/7, 2011 at 23:42

We now have a new HTML parser that recognizes custom attributes such as data-url.

Add compat="html5" to the where clause of your query to trigger the new parser.

For example:

select * from html where url = "http://mydomain.com" and compat="html5"
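
Applied to the query from the question, the same condition can be appended programmatically. The following is only a minimal sketch: the helper names build_yql_query and build_request_url are hypothetical, and the endpoint URL and the format=xml parameter are assumptions that do not appear in the original posts. It only builds the query string and request URL; it does not contact the service.

```python
from urllib.parse import urlencode

# Assumption: the public YQL REST endpoint. It is not given in the posts above.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_query(page_url: str, xpath: str, html5: bool = True) -> str:
    """Build a select against the html table, optionally adding compat="html5".

    Hypothetical helper for illustration; values are interpolated as-is,
    so they must not contain single quotes.
    """
    query = f"select * from html where url='{page_url}' and xpath='{xpath}'"
    if html5:
        # The condition from the answer that switches on the newer parser.
        query += ' and compat="html5"'
    return query


def build_request_url(query: str) -> str:
    """URL-encode the query for a GET request (format=xml is an assumption)."""
    return f"{YQL_ENDPOINT}?{urlencode({'q': query, 'format': 'xml'})}"


# Values taken from the question.
page_url = "http://www.cbs.com/shows/big_brother/video/"
xpath = '//div[@id="cbs-video-metadata-wrapper"]/div[@class="cbs-video-share"]/a'

query = build_yql_query(page_url, xpath)
request_url = build_request_url(query)
print(query)
print(request_url)
```

Keeping the compat condition behind a flag makes it easy to compare the results of the two parsers for the same page and XPath.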
Hardesty answered 8/11, 2012 at 22:48