I don't suppose that the members of this list appreciate the epic
Microdata vs. RDFa battle leaking into this mailing list, but I want
to address a few inaccuracies below.
Introduction: I work for Opera Software and have been active in the
WHATWG and W3C HTML WG developing HTML5 for the last year and a half. I
believe I have a good understanding of what browser vendors are likely
and not likely to support, although I don't speak for or make any
promises on behalf of Opera Software in this mail.
I have also worked on implementing the microdata DOM API in
JavaScript, an ongoing experiment at
http://gitorious.org/microdatajs
and will be able to answer any technical questions about the
processing of microdata. In short, I can only say that it is really
quite intuitive and simple, with few surprises. It maps well to the
RDF model if you want it, but doesn't force authors to think in terms
of subject, predicate, object triples.
On Sat, Jan 16, 2010 at 06:32, Manu Sporny <msporny(a)digitalbazaar.com> wrote:
Aryeh Gregor <Simetrical+wikilist <at> gmail.com> writes:
[snip]
The compactness of the markup between Microdata and RDFa is more or less
the same in this particular example. There are some things that are
easier to express in Microdata and there are some things that are easier
to express in RDFa. We get the following Microdata out:
type     http://n.whatwg.org/work
work     http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg
title    "Emery Molyneux Terrestrial Globe"
author   "Bob Smith"
license  http://creativecommons.org/licenses/by-sa/3.0/us/
So, we get more-or-less the same number of data items out, but there is
a problem. What does "title" mean in the semantic sense? Does it mean
"job title" or does it mean "work title"? The term "title" in this case
is ambiguous.
No, as long as an item type is used (http://n.whatwg.org/work), there
is no ambiguity. This particular item type is defined at
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#…
where the title property "Gives the name of the work", without ambiguity.
Furthermore, for this particular vocabulary the mapping to RDF is
defined as follows:
title:   http://purl.org/dc/elements/1.1/title
author:  http://creativecommons.org/ns#attributionName
license: http://www.w3.org/1999/xhtml/vocab#license
In other words, you express the exact same information as with RDFa but
without the mental overhead of triples or mixing multiple
vocabularies.
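To make the mapping concrete, here is a minimal Python sketch that turns
the item above into triples using the three predicates listed. It is only
an illustration: the function name and the dict layout are mine, and using
the "work" value as the subject is my assumption about how the conversion
is meant to work, so check the spec before relying on it.

```python
# Mapping from this mail: work-vocabulary property -> RDF predicate.
WORK_TO_RDF = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://creativecommons.org/ns#attributionName",
    "license": "http://www.w3.org/1999/xhtml/vocab#license",
}


def work_item_to_triples(item):
    """Return (subject, predicate, object) tuples for one work item.

    Assumption: the value of the "work" property is used as the subject.
    """
    subject = item["work"]
    triples = []
    for name, predicate in WORK_TO_RDF.items():
        if name in item:
            triples.append((subject, predicate, item[name]))
    return triples


# The item from the example; the elided URL is kept exactly as quoted.
item = {
    "work": "http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg",
    "title": "Emery Molyneux Terrestrial Globe",
    "author": "Bob Smith",
    "license": "http://creativecommons.org/licenses/by-sa/3.0/us/",
}

for triple in work_item_to_triples(item):
    print(triple)
```

The author of the page never sees any of this; the triples only exist for
consumers that want them.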
Concern #2:
Getting Microdata and RDFa markup correct is easier if there are
templates or if the semantic markup is performed automatically by the
CMS based on a pre-defined form. For example,
http://en.wikipedia.org/wiki/Augustus, note the Infobox on the
right. It would be much better for the RDFa markup to happen
automatically via MediaWiki's template process, than for it to be marked
up by hand.
Certainly, but if wiki editors are *able* to do it by hand, then IMHO
microdata is much less error-prone.
However - XHTML1+RDFa is a published W3C Recommendation and it is safe
Is Wikipedia using XHTML served as application/xhtml+xml? It seems
that RDFa in "XHTML" as deployed only works because consumers pretend
that the data is XHTML even though it is served as text/html and
treated as such by browsers. I would assume that most pages using RDFa
today are neither valid XHTML, nor served with the XHTML MIME type.
Any attempts to use browser DOM APIs to access the data will have
surprising/confusing results, as HTML doesn't have namespaces but RDFa
uses XML namespace syntax (xmlns:prefix declarations) for its prefixes.
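A rough way to see the problem, sketched with Python's standard
html.parser as a stand-in for a text/html parser (it is not a browser, and
the snippet and helper names are made up for illustration): the xmlns:dc
declaration arrives as an ordinary attribute whose name happens to contain
a colon, so the consumer has to resolve the "dc" prefix by hand.

```python
from html.parser import HTMLParser


class AttributeDump(HTMLParser):
    """Record (tag, attribute name, value) exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            self.seen.append((tag, name, value))


# Hypothetical RDFa-style markup, parsed as HTML rather than XML.
SNIPPET = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span property="dc:title">Emery Molyneux Terrestrial Globe</span>'
    '</div>'
)

parser = AttributeDump()
parser.feed(SNIPPET)
parser.close()


def expand_curie(curie, seen):
    """Resolve "prefix:local" by searching the flat attribute list.

    Simplification: element scoping is ignored, which a real consumer
    could not get away with.
    """
    prefix, _, local = curie.partition(":")
    for _tag, name, value in seen:
        if name == "xmlns:" + prefix:
            return value + local
    return None


print(parser.seen)
print(expand_curie("dc:title", parser.seen))
```

Nothing in the parse output says that "xmlns:dc" is a namespace
declaration; that knowledge lives entirely in the consumer.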
Concern #4:
While I can't fault Aryeh's enthusiasm, I am now concerned that there
may be questions in this community that are going unanswered related to
RDFa and Microdata. I hope this will be a deliberate process as it is
easy to get semantic data markup wrong (regardless of the implementation
language - Microformats, Microdata or RDFa).
Agreed.
The microdata spec for the curious:
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
Finally, I will note that it is very likely that the microdata DOM APIs
will get implemented in browsers, making the semantic data available
to scrapers, to native browser interfaces and to browser extensions
such as user JavaScript. As an example, you might see an
icon in the address bar for saving events to a calendar, or the
license information of an image displayed in the native properties
dialog. I stress again that I don't make any promises on behalf of
Opera or any other browser vendor; these are just my predictions.
In other goodies, microdata already has a defined mapping to JSON, so
dumping all embedded data as JSON via a web interface would be quite
trivial, and it would use the same format that you will get from
browsers once they have implemented some of the DOM APIs.
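As a rough sketch of how little code such a dump needs, here is a Python
extractor for flat items (no nested items, no itemref). The sample markup
is my guess at how the example item could be written, and the
"items"/"type"/"properties" layout is only modeled on the spec's JSON
conversion, so treat the exact shape as an assumption and check the spec.

```python
import json
from html.parser import HTMLParser

# Elements whose property value comes from an attribute instead of text.
VALUE_ATTR = {"a": "href", "link": "href", "img": "src", "meta": "content"}
VOID = {"img", "link", "meta"}  # elements with no end tag


class FlatMicrodata(HTMLParser):
    """Collect top-level items; nesting and itemref are deliberately ignored."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.item = None     # item currently being filled, if any
        self.depth = 0       # open-element depth inside the current item
        self.prop = None     # property whose text content is being collected
        self.prop_depth = 0
        self.text = []

    def _add(self, name, value):
        self.item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.item is None:
            if "itemscope" in attrs:
                self.item = {"type": attrs.get("itemtype"), "properties": {}}
                self.items.append(self.item)
                self.depth = 1
            return
        if tag not in VOID:
            self.depth += 1
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag in VALUE_ATTR:
            self._add(name, attrs.get(VALUE_ATTR[tag], ""))
        else:
            self.prop, self.prop_depth, self.text = name, self.depth, []

    def handle_data(self, data):
        if self.prop is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.item is None or tag in VOID:
            return
        if self.prop is not None and self.depth == self.prop_depth:
            self._add(self.prop, "".join(self.text))
            self.prop = None
        self.depth -= 1
        if self.depth == 0:
            self.item = None


# Hypothetical markup for the example item; the elided URL is kept as quoted.
SAMPLE = """
<div itemscope itemtype="http://n.whatwg.org/work">
  <img itemprop="work" src="http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg">
  <span itemprop="title">Emery Molyneux Terrestrial Globe</span> by
  <span itemprop="author">Bob Smith</span>
  <a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/us/">CC BY-SA</a>
</div>
"""

parser = FlatMicrodata()
parser.feed(SAMPLE)
parser.close()
data = {"items": parser.items}
print(json.dumps(data, indent=2))
```

A real implementation would also need nested items, itemref and the
remaining value-carrying elements, but the overall shape stays this simple.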
--
Philip Jägenstedt