Microdata, RDFa or JSON-LD: appropriate or best usage?

6

67

I have been wondering which of those formats is "best". Schema.org, Microdata, and RDFa are a bit of a pain to implement. They can break validation and take quite an effort to put into documents.

JSON-LD is, at least for me, a much better way to implement structured data. But does it work? What level of support is there for it (at least by Google)?

Opacity answered 13/11, 2014 at 10:15 Comment(1)
Just to update: Google now shows JSON-LD-style structured data in the examples on their pages. Even the data testing tool displays them as examples. It seems they have started to prefer it for the time being. - Opacity
53

Schema.org is a vocabulary that can, like any other vocabulary, be used in many forms. The website http://schema.org/ has examples using Microdata and the RDF syntaxes RDFa and JSON-LD, but these are not the only syntaxes it can be used with. You could, for example, use it with any other RDF syntax like Turtle or RDF/XML.

There is no best syntax. They all have advantages and disadvantages. See for example my answer about differences between Microdata and RDFa. Note that you can use different syntaxes (and vocabularies) in the same document.

Now, if you have a specific consumer in mind, you should consult their documentation. However, support of syntaxes comes and goes, and not everything they might support is necessarily documented, and not everything that is documented necessarily works.

In the case of Google, you are probably interested in their Rich Snippets. Their documentation about Rich Snippets mentions Microdata, Microformats and RDFa. However, note that not all linked examples use the Schema.org vocabulary; some use the older Data-vocabulary.org or Microformats instead (as you can’t use vocabularies like Schema.org or Data-vocabulary.org with Microformats). And there are also some Rich Snippets that aren’t listed on that page, like the Sitelinks Search Box, for which they even recommend the JSON-LD syntax.

As general advice: Search engines typically favor visible content over hidden metadata. For example, having keywords as hidden metadata easily allows authors to claim that their documents are about something different than they really are (either because they are trying to trick the search engine, or because they forget to update the content in both places). Therefore, uncoupling the metadata from the content, as is the case with JSON-LD, could (possibly!) lead to the same issues search engines currently have with hidden metadata. (Whether, and which, search engines actually handle it like that is a question which is off-topic on Stack Overflow.)

Another possible advantage of coupling the metadata with the content (for example, with RDFa) is that you can easily and automatically generate the same information in JSON-LD, Turtle, etc., because everything’s just RDF. Just parse the RDFa, convert it to the formats of your preference, and embed it (in a script element) or link to it (with rel-alternate) if that makes sense.
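
To illustrate the idea of deriving JSON-LD from markup that is coupled with the content, here is a minimal Python sketch. It is a toy under stated assumptions, not a conforming RDFa processor: it only handles a single flat item that uses the vocab, typeof and property attributes, and the sample HTML and the names `FlatRdfaLiteExtractor` and `rdfa_lite_to_jsonld` are made up for this example. For real pages you would use a proper RDFa library.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample markup, not taken from any real page.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Alice Example</span>
  <span property="jobTitle">Editor</span>
</div>
"""


class FlatRdfaLiteExtractor(HTMLParser):
    """Toy extractor: one flat item, vocab/typeof/property attributes only.

    No nesting, prefixes, or resource/href/content handling.
    """

    def __init__(self):
        super().__init__()
        self.item = {}
        self._current_property = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "vocab" in attributes:
            self.item["@context"] = attributes["vocab"]
        if "typeof" in attributes:
            self.item["@type"] = attributes["typeof"]
        # Remember which property the upcoming text node belongs to.
        self._current_property = attributes.get("property")

    def handle_data(self, data):
        text = data.strip()
        if self._current_property and text:
            self.item[self._current_property] = text

    def handle_endtag(self, tag):
        self._current_property = None


def rdfa_lite_to_jsonld(html):
    """Return a JSON-LD-shaped dict for the flat item found in the markup."""
    extractor = FlatRdfaLiteExtractor()
    extractor.feed(html)
    return extractor.item


if __name__ == "__main__":
    print(json.dumps(rdfa_lite_to_jsonld(SAMPLE_HTML), indent=2))
```

The visible text stays the single source of truth; the JSON-LD is generated from it rather than maintained separately.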

But yes, adding RDFa is often more complex than adding a JSON-LD blob, because you have to adapt it to the existing markup. (However, it should not "break validation" unless you’re making mistakes.)
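
For comparison, this is roughly what "adding a JSON-LD blob" can look like when the page is generated by a script. It is only a sketch: the organization data is hypothetical and `jsonld_script_tag` is a helper name invented for this example.

```python
import json


def jsonld_script_tag(data):
    """Serialize a dict as a JSON-LD <script> element for embedding in HTML."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot end the element early.
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical organization data using the Schema.org vocabulary.
organization = {
    "@context": "http://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "http://www.example.com/",
}

if __name__ == "__main__":
    print(jsonld_script_tag(organization))
```

Nothing in the existing markup has to change, which is the convenience; the cost is that the blob can drift from the visible content, as discussed above.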

Crimp answered 13/11, 2014 at 16:5 Comment(3)
About mistakes: I had a problem with schema.org/openingHours. Their example uses the <time datetime=""> attribute, which must be in ISO format to be valid. But Schema.org has its own format, which is not compatible ("Mo-Tu 11:00-22:00", for example). Anyway, very good answer. Thank you for your time. I did not know about the difference between syntax and vocabulary. And indeed, JSON-LD could lead to overuse, just as meta tags and descriptions got overused. But Microdata can be abused too (by hiding content with CSS, for example). And you can, I think, more easily tell the difference between content and JSON-LD than between content and content. - Opacity
@Gacek: I reported the issue you mentioned last month; note that this is not an error with Microdata or Schema.org per se, it’s only their example that is wrong. You can (and should), of course, use the openingHours property with any other suitable element. - Crimp
As of January 29, 2021, data-vocabulary.org markup will no longer be eligible for Google rich result features. To be eligible after January 29, 2021, you need to replace data-vocabulary.org markup with schema.org markup. developers.google.com/search/docs/advanced/structured-data/… - Ekg
17

The lines between Microdata, RDFa, and JSON-LD are indeed currently very blurry, and there is still no widely accepted de facto standard among the three. This will have to wait for now, perhaps a couple of years or more.

Meanwhile, Microdata should not be equated with Schema.org, as in your question, because they are two different things. Schema.org is a vocabulary, so it can be used with Microdata, RDFa, and JSON-LD.

Using Schema.org as the vocabulary and using JSON-LD as the data representation is probably the most anticipated pair because of two common aspects about them:

  1. Easy to read for humans; and
  2. Lightweight and machine-readable

but even so there are still disconnects between the two like this example.

Regarding JSON-LD support: since Bing, Google, Yahoo!, and Yandex acknowledge the use of schema.org, it is perhaps safe to say they also support JSON-LD, like in this example.

2017 Update

Google has been very proactive in promoting the JSON-LD and schema.org combination over the past two or three years.

Starflower answered 25/4, 2015 at 19:3 Comment(3)
Google does recommend the JSON-LD representation. However, I see no mention of Bing, Yahoo!, or Yandex supporting JSON-LD. While they do support the schema.org vocabulary, historically they have supported the Microdata representation. CMIIW. - Anecdotist
@HendyIrawan True. Bingbot [still] doesn't understand JSON-LD. The Bing structured markup tester also does not register the data. This is probably the same for Yahoo. bing.com/webmaster/help/… - Issuable
Bing now supports JSON-LD: plus.google.com/106943062990152739506/posts/fEV3TyBhAr4 - Duma
10

It seems Google is leaning towards the use of JSON-LD but it hasn't implemented it for every use-case!

Google is in the process of adding JSON-LD support to more markup-powered features. So far, JSON-LD is supported for all Knowledge Graph features, sitelink search boxes, Event Rich Snippets, and Recipe Rich Snippets; Google recommends the use of JSON-LD for those features. For the remaining Rich Snippets types and breadcrumbs, Google recommends the use of microdata or RDFa.

http://developers.google.com/structured-data/schema-org

Donoho answered 6/10, 2015 at 10:23 Comment(2)
What's the current status? It seems like Google is very slow in updating its documentation, and I still see in the docs that they are "in the process" of updating to JSON-LD. - Golightly
In the overview section there is a "recommended" label next to JSON-LD. - Iatrics
7

(updating answers!)

About "popularity", please see this question/answers.

Microdata is the most popular today: in a universe of 34 million domains, 5.63 million (~17%) use "content markup" (I will use the jargon term markup) via RDFa (0.9 million), Microdata (2.5 million) or Microformats, while less than half as many use separated semantic descriptors, the most popular being JSON-LD, with 2.12 million (6%).
PS: we prefer per-domain statistics (instead of per-page statistics) because pages in the same domain generally share the same templates and other locally enforced conventions.

In the universe of "domains expressing semantics" (7.75 million), the statistical profile is:

  • 73% markup semantics
  • 27% separated semantics
  • (... the intersection, i.e. domains mixing "separated+markup", is taken as zero to simplify ...)
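
The percentages follow directly from the quoted domain counts. A quick check in Python, assuming (as the simplification in the last bullet does) that no domain is counted in both groups; the variable names are just for this sketch:

```python
# Figures quoted above, in millions of domains.
total_domains = 34.0
markup_domains = 5.63   # RDFa, Microdata or Microformats
jsonld_domains = 2.12   # separated semantic descriptors (JSON-LD)

# Assumes the two groups do not overlap.
semantic_domains = markup_domains + jsonld_domains


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


shares = {
    "markup, of all domains": percent(markup_domains, total_domains),
    "JSON-LD, of all domains": percent(jsonld_domains, total_domains),
    "markup, of semantic domains": percent(markup_domains, semantic_domains),
    "separated, of semantic domains": percent(jsonld_domains, semantic_domains),
}

if __name__ == "__main__":
    for label, value in shares.items():
        print(f"{label}: {value}%")
```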

Rule of thumb in 2017

Use markup semantics with Microdata and then, if you need to express something more to machines, add JSON-LD.


Use markup semantics because it is the most popular approach, and because marked-up content can be verified/audited by humans and machines at the same time.

Important: remember that Microdata, RDFa (a W3C standard) and JSON-LD (a W3C standard) can be (easily) translated to RDF, so all these formats are compatible.


PS: for HTML tables, see also W3C's tabular-metadata. For open non-HTML resources, such as CSV files, use the RDF-compatible W3C tabular-data-model and/or frictionlessdata/specs.

Nina answered 14/5, 2017 at 6:22 Comment(2)
One more good indicator of popularity could be simply opening some popular sites with the inspector and checking which format they use. From what I see, Stack Overflow uses Microdata (itemprop attributes), YouTube also uses Microdata, booking.com uses both Microdata and JSON-LD, Wikipedia uses JSON-LD, Netflix uses JSON-LD... - Abscission
@Klesun, as cited, see webdatacommons.org/structureddata/index.html#results-2022-1 - Nina
6

Google uses JSON-LD as reference examples for Structured Data SEO for their Knowledge Graph (companies and people). See https://developers.google.com/structured-data/customize/overview

I personally use a combination of JSON-LD and Microdata for my sites (for the time being).

I would say they have other means to identify if the information you provide through JSON-LD is relevant to their search engine (like checking your page is actually talking about what it claims to talk about).

Bibulous answered 5/5, 2015 at 14:49 Comment(2)
Microdata is deprecated. - Requiem
What is your source? - Bibulous
1

If you are starting from scratch, JSON-LD would be the way to go. Let's let one of the primary creators of JSON-LD, Manu Sporny, weigh in:

The desire for better Web APIs is what motivated the creation of JSON-LD, not the Semantic Web. If you want to make the Semantic Web a reality, stop making the case for it and spend your time doing something more useful, like actually making machines smarter or helping people publish data in a way that’s useful to them.

JSON-LD is all about publishing the data in ways that are useful/easy to implement because...

it’s based on technology that most web developers use today.

Crumpler answered 29/1, 2018 at 14:41 Comment(0)
