Duesentrieb checked in RDFa support for MediaWiki in r58712:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/58712
I discussed this with him at some length, and Tim commented on how it
ties into the parser. I'd like to discuss this a bit more broadly
because we're talking about extending wikitext -- whatever markup we
allow on Wikipedia (and in this case, particularly on Commons) at the
next scap is probably going to have to be allowed forever by default
in MediaWiki, because everyone will start using it and pages will
break if we disable it.
RDFa is a way to embed data in HTML more robustly than with attributes
like class and title, which are reserved for author use or have
existing functionality. It allows you to specify an external
vocabulary that adds some semantics to your page that HTML is not
capable of expressing by itself. RDFa is based on the RDF standard,
and is relatively old. Microdata is a newer competing standard,
created last year as part of HTML5, that aims to be much simpler to
use.
The major use case we have is marking up Commons image licenses.
Either RDFa or Microdata could allow machines to more easily tell what
licenses the images we use are under. But in the long term, it seems
likely that only one of these technologies will win, and the other
will die. We don't want to have to support the loser forever. So IMO
we should choose the better one and go with that alone.
Now, which to choose? RDFa is better-established, and the W3C is
still attached to it, but Microdata has much greater support among the
parties that matter, including Google, Mozilla, Apple, and Opera (as
judged from discussions in the WHATWG and W3C). It's a lot more
concise and simpler to use, is better integrated into HTML, and can
represent any semantics we'd want. At the bottom of this post is an
example exhibiting how much simpler microdata is. Both RDFa+HTML and
Microdata are Working Drafts at the W3C right now, although RDFa in
XHTML1 (which we won't be using for much longer) is a Recommendation.
I should note that currently Google and a couple of others support
RDFa but not Microdata. But come on -- we're Wikipedia. Google
already screen-scrapes our templates to figure out what licenses we
use anyway; parsing microdata has got to be easier. We shouldn't let
existing market share deter us from picking the better technology.
My personal opinion on this is that we should enable Microdata by
default (which is much less intrusive than enabling RDFa -- just
whitelist a few extra attributes) and encourage Commons to use that
instead of RDFa. We can leave RDFa support in, but disabled by
default. What does everyone else think?
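(For reference, if I'm reading the current draft right, the entire
microdata vocabulary is five global attributes:

itemscope, itemtype, itemprop, itemid, itemref

so "enable Microdata" should amount to letting the sanitizer pass
those through on the elements we already allow.)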
== Example of RDFa vs. Microdata ==
Suppose we have the following markup right now:
[[
<div id="bodyContent">
...
<img src="http://upload.wikimedia.org/wikipedia/commons/e/ef/EmeryMolyneux-terrestria…"
width="640" height="480">
...
<p>EmeryMolyneux-terrestrialglobe-1592-20061127.jpg by Bob Smith is
licensed under a <a
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>.</p>
]]
Sample RDFa code to say an image is under a CC-BY-SA 3.0 license seems
to be something like this, based on the license generator on the CC
website:
[[
<div id="bodyContent">
...
<img src="http://upload.wikimedia.org/wikipedia/commons/e/ef/EmeryMolyneux-terrestria…"
width="640" height="480" id="mw-image">
...
<p><span xmlns:dc="http://purl.org/dc/elements/1.1/"
href="http://purl.org/dc/dcmitype/StillImage" property="dc:title"
rel="dc:type">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span xmlns:cc="http://creativecommons.org/ns#" href="#mw-image"
property="cc:attributionName" rel="cc:attributionURL">Bob Smith</span>
is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>.</p>
]]
This adds an id to the image, rel="license" to the license link, and
two extra tags with lots of lengthy attributes. To be valid RDFa, we
would need to add further markup somewhere -- at least a version
attribute on the <html> element of every page, AFAIK. Equivalent
microdata is this:
<div id="bodyContent" itemscope="" itemtype="http://n.whatwg.org/work">
...
<img src="http://upload.wikimedia.org/wikipedia/commons/e/ef/EmeryMolyneux-terrestria…"
width="640" height="480" itemprop="work">
...
<p><span itemprop="title">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span itemprop="author">Bob Smith</span> is licensed under a <a
itemprop="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>.</p>
]]
This adds two attributes to an ancestor to indicate that the contents
form a work -- these could be moved to lower elements if desired,
AFAICT, but then they'd have to be duplicated. Instead of adding an
id to the <img>, it uses itemprop="work" to directly say it's the work
being referred to. Instead of <span
xmlns:dc="http://purl.org/dc/elements/1.1/"
href="http://purl.org/dc/dcmitype/StillImage" property="dc:title"
rel="dc:type">, we have <span itemprop="title">. Instead of <span
xmlns:cc="http://creativecommons.org/ns#" href="#mw-image"
property="cc:attributionName" rel="cc:attributionURL">, we have <span
itemprop="author">.
Overall, I think it's clear from this example that microdata is much
more concise and also more coherent. It's easy to see exactly how
the microdata model works: you have a bunch of stuff grouped as an
item using itemscope, itemtype tells you what type of item it is, and
then itemprop tells you what role each piece has. It's barely longer
than the un-annotated markup. RDFa, by contrast, is a mess of
boilerplate that's impossible to understand unless you actually read
the specs. Microdata's syntax has actually been refined by a
usability study that Google ran.
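As a rough illustration of how easy the consuming side is, here's a
sketch in plain DOM calls (untested, and assuming exactly the markup
above; a real consumer would follow the microdata model properly,
nested items, itemref and all):

// Pull the work's title, author and license out of the example markup.
const work = document.querySelector("[itemscope]");
const title = work?.querySelector('[itemprop="title"]')?.textContent;
const author = work?.querySelector('[itemprop="author"]')?.textContent;
const license =
  work?.querySelector<HTMLAnchorElement>('[itemprop="license"]')?.href;
console.log(title, author, license);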
Would it be possible to generate a log or statistics of searches on
Wikipedia using the "Go" button that did not immediately reach an article?
Properly anonymized of course. I think it would be useful for finding
missing articles and redirects to create. There would be a lot of crap of
course, but probably also very useful information on what people have
trouble finding.
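To make that concrete, here is a sketch of the aggregation I have in
mind (the log format is a made-up placeholder: one query<TAB>hit pair
per line, with hit=1 meaning the Go button landed on an article):

// Tally the "Go" searches that missed, most frequent first.
import { readFileSync } from "fs";

const counts = new Map<string, number>();
for (const line of readFileSync("go-searches.log", "utf8").split("\n")) {
  const [query, hit] = line.split("\t");
  if (query && hit === "0") counts.set(query, (counts.get(query) ?? 0) + 1);
}
const top = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 100);
for (const [query, n] of top) console.log(n, query);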
On 10/26/09 10:05 AM, Siebrand Mazeland wrote:
> P: I would like to try and make a proposal with other CMS projects (like
> Joomla, Drupal, Typo3, Wordpress, etc.) for a dev room. Reason for not
> applying for a "MediaWiki dev room" is that I expect that this will not be
> honored because it has too tight a scope. I spoke to one of the people of the
> program committee last year, and I was advised to find a broader scope.
> Alternatively we could request a dev room with other wiki engines (TikiWiki,
> DokuWiki, etc.). Personally I have no preference on which projects we would
> cooperate with, just as long as we will make a proposal that will get us the
> best chance to have a presence at FOSDEM 2010. Open to suggestions...
There's been some talk of trying to pair with OpenStreetMap as well;
has anybody attempted to contact any partner yet?
-- brion
%% No multiple edits on the same page %%
I think the edit page should be smarter. If a user opens the same
page twice, the second time they should be warned that the page is
already open. This may need some cross-window communication, which is
not something browsers love to do, but I guess it is possible with
DOMStorage/cookies or something else.
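A rough sketch of how that could look with DOMStorage (the key scheme
and the timings are made up):

// Warn if the same page is already open for editing in another window.
// A localStorage "lock" is refreshed while the edit page is open and
// treated as stale after 30 seconds.
const lockKey = "editlock:" + encodeURIComponent(location.pathname);
const last = Number(localStorage.getItem(lockKey) ?? "0");
if (last && Date.now() - last < 30_000) {
  alert("This page seems to be open for editing in another window.");
}
localStorage.setItem(lockKey, String(Date.now()));
const timer = setInterval(
  () => localStorage.setItem(lockKey, String(Date.now())),
  10_000
);
window.addEventListener("unload", () => {
  clearInterval(timer);
  localStorage.removeItem(lockKey);
});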
%% Autosave feature for edit %%
Again, in this day and age, losing data because of a computer crash is
a problem that should never occur. The edit page should save the
latest version of the edited text to a local persistent area
(DOMStorage?). That way, I can 'accidentally close' the edit page,
and when I reopen the edit, the page should detect "Hey, the page is
on the same revision, so my edited version is interesting" and let the
user continue editing: "You closed the edit page without saving. Do
you want to continue with the old version?" Somewhat like the
"drafts" feature in Gmail, or the autosave feature of OpenOffice.
%% Edit any file format in the browser when web editors for it are
available %%
When possible, documents should be editable inside the browser. This
means, as far as possible, adding an SVG editor, a math editor, an
HTML editor, a DOC editor or a PNG editor. "The PNG is wrong ->
download -> edit -> upload" is lame. "The PNG is wrong -> edit ->
save" is cool. The current edit page only supports wikicode :-(
%% Un-cruft mode %%
There seems to be a lot of "META" information on the wiki; all this
meta-information like "stub", etc. should be optional. There should
be a single checkbox option to disable it all. I don't want to read
even one more "citation needed" or "this is a stub" bloat. Maybe the
default should be to "show cruft".
%% Wikipedia Green, Blue and Orange %%
A way to fight deletionism could be to have something like "different
levels" on Wikipedia (a wiki). Put a group of pages in the "Blue
Book", for pages with a maintainer and pages approved by a superior
quality committee. Put a group of pages in the "Green Book" for pages
that meet the notability criteria, and in the "Yellow Book" for pages
that don't. Deletionism is binary; computers can work with more
values than 1 and 0. Hell, you could make the Green Book and Yellow
Book invisible to logged-out users, only available to logged-in users.
For this to work, templates should use different colors for the
different books. Colors would also account for quality. The German
wiki would be mostly blue, while others would have more green pages.
There would be a wiki with 90,000 green and 10,000 blue, and another
one with 10,000 green and 90,000 blue.
%% The Death of Wiki %%
Ultimately, all wikis lose the war against entropy and are abandoned.
This will hit all Wikipedia wikis, and everything based on MediaWiki.
While you can't stop that, you can code something so the resulting
dead body of the wiki is not pure shit. A possible idea could be to
"auto-protect" pages without an edit in N years (say 4 years), and
save the id of the "last known good version". This could be done by
flagging a page as "dirty" after an edit, and flagging it as "clean"
when a logged-in editor says so. Something like "page milestones".
Maybe the "History" view of a page should list the last 10 milestones
first (newest first), then the "dump" of all edits. There are
probably more than 10 years to think about this issue.
Hi,
As you may know, there are currently two entry points in MediaWiki for
JavaScript that wants to perform certain actions: action=ajax and
api.php. Only the following features still use action=ajax: AJAX
watch, upload license preview and upload warnings check. I don't
really see much point in two entry points where one would suffice.
These could all be readily migrated to the API (for the watch case,
something like the sketch below). However, this would mean that they
become unavailable if the API is disabled. Would that be considered a
problem?
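For illustration, the watch case through api.php could look something
like this (token fetching and error handling elided; action=watch is
the relevant module):

// Watch a page via the API instead of action=ajax.
async function watchPage(title: string, token: string): Promise<unknown> {
  const body = new URLSearchParams({ action: "watch", format: "json", title, token });
  const resp = await fetch("/w/api.php", { method: "POST", body });
  return resp.json();
}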
Bryan
Bawolff: various Wikinews-related extensions
Jonathan Williford: extensions developed for http://neurov.is/on
Ning Hu: Semantic NotifyMe
Rob Lanphier and Conrad Irwin have been added to the core committer group.
-- Tim Starling
I was wondering if anybody would have the time to create some sort of application status dashboard, similar to the ones found at Google (http://www.google.com/appsstatus#hl=en) or Amazon (http://status.aws.amazon.com/).
Essentially, something that acts like a simplified external-facing blog, where people could update the different pieces as problems are detected. Eventually, it would be nice to tie it into our various monitoring software, but to begin with it would be very convenient for our partners to be able to see the overall status of our different layers. For instance, if someone detects that m.wikipedia.org is not working, the first step would be to update said dashboard to inform the less technically savvy people, who are not necessarily on IRC, that there is a problem and that someone is looking at it / in the process of fixing it.
Another important feature would be to keep history on problems that have happened in the past, much like Google does. I know this is already done to some extent with the server admin log, but having an easy-to-read interface would in my opinion prove beneficial.
Anyway, any suggestions on additional features or requirements are welcome.
--Fred.
Hi all,
As part of the Wikimedia strategic planning process, we're trying to
get a sense of the current state of Wikimedia ops and MediaWiki. You
can see some claims on the strategy wiki right now:
http://strategy.wikimedia.org/wiki/Wikimedia_technology_infrastructure
http://strategy.wikimedia.org/wiki/MediaWiki
(Note: These are not my personal opinions; I just copied them over
from other places.) Would love to get people's feedback here. If you
could edit these pages with your thoughts on the current state of both
Wikimedia ops and MediaWiki, that would be great. Analysis and links
to existing docs would be much appreciated.
This information will help Wikimedia make good recommendations as to
what to invest in to improve these things.
Thanks!
=Eugene
--
======================================================================
Eugene Eric Kim ................................ http://xri.net/=eekim
Blue Oxen Associates ........................ http://www.blueoxen.com/
======================================================================
The procedure Brion used for the last couple of wmf-deployment updates
was:
* Create a new branch wmf-deployment-<date>, copied from trunk
* Spend a day or two merging in all the WMF-specific hacks and merging
out anything that's experimental or buggy
* Delete wmf-deployment
* Move wmf-deployment-<date> to wmf-deployment
I'm considering instead creating a permanent numbered branch for each
wmf-deployment major update. To deploy a major update, we would use
svn switch to change the checked-out directory. Then subsequent minor
updates would be merged to the most recent numbered branch.
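Concretely, a major update would look something like this (the
repository paths and branch name are illustrative):

# Branch a new numbered deployment branch from trunk.
svn copy $REPO/trunk/phase3 $REPO/branches/wmf/1.16wmfN \
    -m "Branch 1.16wmfN for deployment"
# ...merge the WMF hacks in, merge unready stuff out, then repoint
# the live checkout:
cd /path/to/live/checkout
svn switch $REPO/branches/wmf/1.16wmfN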
The main advantage would be that the logs would be easier to navigate.
Currently, if you want to know what happened in the wmf-deployment
branch before the last major update, you have to specify path
revisions to svn log, which is tedious and potentially confusing.
Navigating the history in viewvc is also difficult.
Another advantage is that we'd have a major version number that we can
say Wikimedia is running on. We had three deployments of 1.16 and
there may be a fourth before we branch it and start on 1.17, but these
branch points are unnamed. Instead we could call them 1.16wmf1,
1.16wmf2 and 1.16wmf3 in Special:Version, and we could keep track of
when those updates were deployed, so that users would have a better
idea of how to talk about the software that we're running.
I've used svn switch a few times in other situations, and haven't
encountered any problems with it. As long as there are no uncommitted
patches in the live working copy, it should go ahead without a hitch.
Any thoughts on that?
-- Tim Starling