Statistics about "Microformat vs HTML+RDFa" adoption
Are there some recent and reliable statistics about "Web use" (webpages using one standard or another) of these standards?

Or a specific statistic about the scope of use of vCard (person and/or organization)?

Only statistics: this question is not about "what is the best idea?" or "how to use it?". I am looking for statistical figures that compare Microformats adoption with the adoption of (any kind of) RDFa in HTML.

For "page counting" statistics, we can consider Microdata a kind of RDFa-HTML.


NOTES

Explain context

RDFa Lite is the only W3C Recommendation involved when we talk about "Microdata vs Microformat", and Microdata maps better to RDFa Lite. HTML5 became a W3C Recommendation on 2014-10-28, and neither of the two (Microdata or Microformats) was blessed by the W3C. I understand that schema.org is the best way to adopt RDFa (reusing community schemas).

On the other hand, Microformats is older and the simplest; so it is perhaps the most used on the Web (is it?).

About "vCard data statistics"

If we need some scope for the statistics, let's use vCard as scope:

  • Microformats' hCard and h-Card are standards for displaying vCards in (any) HTML, and are used for people and organizations.

  • schema.org's Person and Organization encode vCard information with (standard) RDFa Lite or Microdata.

Other notes

Wikipedia makes an old (2012) and unverifiable assertion (no source!), "Microformats such as hCard, however, continue to be published more than schema and others on the web", and Webdatacommons is a mess, with no statistical report.

(edit) Wikipedia's citation error is now fixed.


(edit after @sashoalm comment) Note for those who disagree that this question is valid.

This question is a software problem, not a "request for off-site resource"...

PROBLEM: to decide which library, framework, data model, etc. to use in a project, we need tools that are in use today and will still be in use in the next few years... To make project decisions in software development, we need statistics about user trends, framework adoption, etc.

PS: here on Stack Overflow there are a lot of discussions about language statistics, which belong to the same "set of problems". Examples: 1, 2, 3, 4, 5, 6. See also the questions tagged with [usage-statistics].

Hopi answered 19/2, 2015 at 14:53 Comment(3)
Doesn't that fall under "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it."? - Carlina
@Carlina I edited; please check and discuss here before deleting. - Hopi
Don't worry, I can't delete the question all by myself. - Carlina

Now I see that there are some statistics (!!): the Wikipedia link was lost, and I corrected it. The data is not up to date, it is from "Winter 2013" (collected ~1.5 to 2 years ago), but it shows the reality and the trends.

http://webdatacommons.org/structureddata/index.html#toc2

This is the chart from the report (showing RDFa+HTML dominance!):

[images: two charts from the WebDataCommons report]

Interpreting:

  • Section 5, "Extraction Process", says that "on each page, we run our RDF extractor based on the Anything To Triples (Any23) library", so everything (RDF and Microformats) ended up as "triples" (not only RDF).

  • The idea behind the "per domain" statistics is that domains apply a uniform policy to all their pages... But I think this uniformity is false: only a few pages per domain adopt "semantic markup"... It is no less biased than the per-URL count, it is just another picture. Anyway, the outcome was a dead heat, ~57% vs 43%.

  • Only 21% of the "semantic markup URLs" of 2013 were Microformats; all the others are RDFa-HTML (Microdata is also a kind of RDFa).

  • Using the average of the percentages for domains (Ds) and URLs (Us), (Ds+Us)/2, the outcome is ~60% for RDFa-HTML and ~40% for Microformats.

  • Before 2013 Microformats dominated, so the big growth of "RDFa-HTML" since 2011 is evident... The trend is clear.

  • If we adopt the arithmetic mean of the "per domain" and "per URL" counts, Microformats and RDFa-HTML are near each other, but with a little less Microformats (and a strong tendency for RDFa-HTML to grow in 2014).
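The (Ds+Us)/2 averaging in the list above can be checked with a few lines of Python. This is only a sketch: the per-URL shares (21% vs 79%) are stated in the answer, while assigning the 57% per-domain share to Microformats is an inference from the comment thread, not a figure quoted directly from the report. The name `mean_share` is made up for the illustration.

```python
# Shares in percent. ASSUMPTION: Microformats hold the 57% side of the
# per-domain split; the answer only says "~57% vs 43%".
shares = {
    "per_domain": {"microformats": 57.0, "rdfa_html": 43.0},
    "per_url": {"microformats": 21.0, "rdfa_html": 79.0},
}


def mean_share(fmt: str) -> float:
    """Arithmetic mean of the per-domain and per-URL shares, (Ds + Us) / 2."""
    return (shares["per_domain"][fmt] + shares["per_url"][fmt]) / 2


for fmt in ("microformats", "rdfa_html"):
    print(f"{fmt}: {mean_share(fmt):.0f}%")
```

Under that assumption the means come out at 39% for Microformats and 61% for RDFa-HTML, which matches the "~40% vs ~60%" quoted above.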

Here is a table for the @sashoalm discussion, showing the percentages and totals:

[image: table of percentages and totals]


NOTE1: HTML5 was released only on 2014-10-28, so only around 2015-10 will we be able to check the real (definitive) impact of the new standard on the Web. An important expected impact is that Microdata was not blessed by HTML5, so the only standard is HTML+RDFa (which recommends RDFa Lite)... In the future there will perhaps be less Microdata and more schema.org.

NOTE2: there is a methodological problem in counting web pages: boilerplate text with heavily cloned "semantic markup". I think the "next generation" of statistics can use some "per domain analysis" to build per-URL substatistics (sampling) of the diversity of semantically marked pages. Ideally the boilerplate should be weighted (e.g. count the non-clones once and use 1+SQRT(count) for the clones).
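One possible reading of the weighting suggested in NOTE2, as a rough Python sketch. The assumptions are mine: pages are grouped by some "markup fingerprint" (a hypothetical key, e.g. a hash of the extracted triples), and a group of n identical pages contributes 1 + sqrt(n - 1), i.e. the original counts once and its clones are damped by a square root. The function name and the sample fingerprints are invented for the illustration.

```python
import math
from collections import Counter


def weighted_page_count(fingerprints):
    """Damped count of semantically marked pages within one domain.

    ASSUMPTION: a group of n pages sharing a fingerprint weighs
    1 + sqrt(n - 1), so a unique page weighs exactly 1.
    """
    group_sizes = Counter(fingerprints)
    return sum(1 + math.sqrt(n - 1) for n in group_sizes.values())


# Hypothetical domain: the same footer hCard cloned on 10 pages,
# plus two pages with distinct product markup.
pages = ["footer-hcard"] * 10 + ["product-a", "product-b"]
print(len(pages), weighted_page_count(pages))
```

With this weighting the ten cloned footer pages count as 4 instead of 10, so the domain contributes 6 instead of 12 to the total; other damping functions would work equally well for the same purpose.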

Conclusion

Today some people perhaps still use Microformats, but there are more pages on the Web using RDFa-HTML (Microdata, RDFa, RDFa Lite, etc.), and that share tends to grow.

If your project is meant for the coming years, the statistics say to use RDFa.


NOTE

Another interesting count for RDFa is not the use, but the reuse of vocabularies (!). See Linked Open Vocabularies (LOV).

LOV

Hopi answered 19/2, 2015 at 14:53 Comment(9)
interesting. if you look at these results per format as of 2013-11, microformats are killing the competition. webdatacommons.org/structureddata/2013-11/stats/stats.html - Repast
disagree about the repetition of semantics in a site. odds are if they're even being used, someone cares and is repeating them over. another obvious example would be an hcard in a footer, that is on every document on a site. but this is a great talk. not sure if comments or answers are the right way to go back and forth here, but i like it - Repast
Hmm... About "microformats killing the competition", what do you see that I don't see? Please check whether you mistook Microdata for Microformat (Microdata is RDFa). The charts at your link are the charts that I am showing here... And the first chart here is "URLs with Triples"; it shows only ~25% for Microformats. - Hopi
i was looking at results per format, and saw the differences in number of domains column. - Repast
@albert, (ok, edited with a table) About formats, you must use the sum "html-rdfa"+"html-microdata"... Even in the domain column (but see my comments against it!) it is not a "killing" result, it is only 57% vs 43%. In the URLs column RDFa wins with 79% (!). - Hopi
you should blog about this my friend. and accept your own answer - Repast
also compare 2013 to 2012 - i stopped once i saw how much microdata had jumped webdatacommons.org/structureddata/2012-08/stats/stats.html - Repast
and i think this slideshare says exactly what you are saying too: slideshare.net/RobertMeusel/web-data-commons - Repast
@albert, please correct and complement my answer (and my English), now it is a Wiki (you can edit!)... you can add your good findings. I also need your review of Wikipedia's RDFa/statistics article, RDFa/variants... where I am apparently working alone. - Hopi

The latest statistics from WebDataCommons are as follows:

Source: http://webdatacommons.org/structureddata/2016-10/stats/stats.html

Number of domains parsed: 34 million pay-level-domains
Number of domains with RDFa, Microdata and Microformats: 5.63 million (16.5%)

Popularity of different formats: [image: chart of format popularity]

Deploy answered 13/5, 2017 at 21:49 Comment(1)
Hi Intendia, thanks! See also this answer showing JSON-LD vs semantic markup: we can sum RDFa+Microdata+Microformats as "semantic markup", so there are 73% markup and 27% JSON-LD in the universe of domains expressing semantics (7.75 million). - Hopi