Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at:
http://pedia.sztaki.hu/ (please check the video first, and then the tool itself)
which aims to implement some of the visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA-annotated rich content, supported
by natural-language background processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we built some wiki-specific components:
- After checking out many parsers, we created a new one in JavaCC
- Created lots of content helpers based on DBpedia, like link
recommendation, infobox recommendation, and infobox editor help
- Integrated external resources to help editing, like the Book
Recommendation or the Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with many showstoppers
remaining, such as handling cite references properly or editing
templates embedded in other templates.
I am aware that you are working on a new syntax, parser, and RTE, and
that they will eventually become the official ones for wiki editing
(Sztakipedia is written in Java anyway).
However, I still think that there is much to learn from our project. We
will write a paper on the subject next month, and I would be honored if
some of you read and commented on it. The main contents will be:
- problematic constructs in the current wikitext syntax that we struggled with
- usability tricks, like extracting the infobox pages to provide help
for the fields, or showing the abstracts of the articles to be linked
- recommendation and machine learning to support the user, plus the
background theory
Our plan right now is to create an API for our recommendation services
and helpers, and a MediaWiki JS plugin that brings their results into
the current wiki editor. This way I hope the results of this research --
which started out as a rather theoretical one -- will be used in a
real-world scenario by at least a few people. I hope we will be able to
extend your planned new RTE the same way in the future.
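To make the plan concrete, here is a minimal sketch of what such a
plugin could look like (the endpoint path and the 'suggestions' field
are placeholders for our planned API, not an existing service; jQuery
has shipped with MediaWiki since 1.16):

  // Minimal user-script sketch; the endpoint and response format are
  // placeholders for the planned recommendation API.
  $(function () {
      var textbox = document.getElementById('wpTextbox1'); // classic edit box
      if (!textbox) {
          return; // not on an edit page
      }
      $.getJSON('/sztakipedia/recommend', { text: textbox.value },
          function (data) {
              // assume the service answers with suggested link targets
              $.each(data.suggestions || [], function (i, title) {
                  if (window.console) {
                      console.log('Suggested link: [[' + title + ']]');
                  }
              });
          });
  });

A real version would of course render the suggestions next to the edit
box instead of logging them.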
Please share your thoughts/comments/doubts about Sztakipedia with me.
I also wanted to ask a few things:
- Which is the most wanted helper feature, in your view:
infobox/category/link recommendation? External data import from
Linked Open Data (like our Book Recommender right now, which has
millions of book records in it)? Field _value_ recommendation for
infoboxes from the text? Other?
- How do you measure the performance of a parser? I saw hints about
some 300 parser test cases somewhere...
- What is the best way to mash up external services to support the wiki
editor interface? (If you call an external REST service from JS in
MediaWiki, the browser's same-origin policy will block it, I'm afraid.)
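The only workaround I know of is JSONP, where the service wraps its JSON
answer in a callback function so it can be loaded as a script; with
jQuery that is just (the endpoint is a placeholder again):

  // 'callback=?' makes jQuery fetch the URL via a <script> tag, which
  // is exempt from the same-origin policy, provided the service
  // supports JSONP.
  $.getJSON('http://pedia.sztaki.hu/recommend?callback=?', { q: 'Budapest' },
      function (data) {
          // runs with the parsed JSON once the script has loaded
      });

But this executes the response as script, so the wiki has to trust the
service completely; the alternative seems to be a server-side proxy on
the wiki's own domain. Is there a recommended pattern?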
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
Here are some updates from the parser & visual editor front from the
last few weeks.
The wikitech-l list has had some relevant discussion around:
* a wiki object model:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054499.html
* deprecated wikitext markup:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054603.html
* the template system:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054649.html
The visual editor folks are in a research and development phase,
exploring other systems and techniques (research) and prototyping a
proof of concept (development).
Trevor and Inez have been making substantial progress on the visual
editor. Check out their work:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/tparscal
and http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/inez .
They spent part of August working on the transaction protocol/blocking
piece. There are two possible models: (1) save the entire document at
every keystroke, or (2) build the document as a series of events, so
that to undo we can simply reverse them.
Insert, move, & annotate are now reversible. They are replacing the
current system with a transaction-based system, which will give us
better collaboration, & better undo for free as well. But this is still
in research, very much in flux.
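To illustrate the second model, here is my own much-simplified sketch
(reconstructed from the meeting notes, not their actual code): every
edit becomes an operation that knows how to apply itself to the document
and how to produce its inverse, so undo just replays inverses in reverse
order.

  // Simplified sketch of an event/transaction model, not the real editor code.
  function InsertOp(pos, text) {
      this.apply = function (doc) {
          return doc.slice(0, pos) + text + doc.slice(pos);
      };
      this.invert = function () {
          return new RemoveOp(pos, text.length);
      };
  }

  function RemoveOp(pos, len) {
      var removed = '';
      this.apply = function (doc) {
          removed = doc.slice(pos, pos + len);
          return doc.slice(0, pos) + doc.slice(pos + len);
      };
      this.invert = function () {
          // only meaningful after apply() has recorded the removed text
          return new InsertOp(pos, removed);
      };
  }

  // Undo a transaction (a list of operations already applied to doc).
  function undo(doc, ops) {
      for (var i = ops.length - 1; i >= 0; i--) {
          doc = ops[i].invert().apply(doc);
      }
      return doc;
  }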
Neil is working on a demo of the Etherpad integration with MediaWiki,
but much of his time has been taken up with working on the Upload
Wizard, as you can see in
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/neilk . He
hopes to get it done ASAP.
We're interested in seeing how MediaWiki's trunk developers can reuse
existing Wikia code. Wikia and Wikimedia Foundation have been
collaborating and sharing ideas, code, and designs.
Since chat and collaborative editing share auth requirements, Neil wrote
an Identity API extension for that. I believe this is it:
/branches/extensions-realtime/IdentityApi
From the notes I'm seeing, parser progress is a blocker for visual
editor progress right now -- maybe Brion can speak more to that?
In version one of the visual editor, should we have lots of broken
features, or a few working features? It looks like we'll start by
picking a set of use cases and supporting only those for a version 1.
For example, for new page creation, we don't even need the parser. And
Trevor really wants to get something out in the wild to see what people
do with it.
Almost all of this is from meeting notes so I may have some things wrong
-- please correct me if I do!
We do continue to need help from people with significant experience in
user-facing, highly interactive applications and with serious JavaScript
skills. If you fit that description and can spare some time, take a
look at the code links above and reply to the list, or to Trevor, Neil,
or Brion.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Original thread:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054603.html
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Forwarded conversation
Subject: [Wikitech-l] Cleaning up deprecated html in WikiText
------------------------
From: *Daniel Friesen* <lists(a)nadir-seen-fire.com>
Date: Thu, Aug 11, 2011 at 5:39 AM
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Right now, markup like the following is in wide use on wikis:
{|
|-
|valign=top width=100%|
|}
The bgcolor, cellpadding, cellspacing, valign, align, width, height,
etc. presentational attributes have all been completely removed from
HTML5, and pages using these attributes aren't valid.
There's no way we can expect all the instances of valign and width to
disappear from every wiki on their own. And frankly, in the context of
authoring WikiText, I don't believe the user should have to care about
that and be forced to write a longer style line.
What are people's opinions on the idea of taking these removed
presentational attributes and turning them into sugared parts of
WikiText that are emitted as actual CSS in the output?
The change would essentially mean that this:
|valign=top width=100%|
Would become:
<td style="vertical-align: top; width: 100%;">
Instead of this:
<td valign="top" width="100%">
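In code, the mapping I have in mind is roughly this (sketched in
JavaScript for brevity and with made-up names; the real change would
live in the parser's attribute sanitizer, so this is not an actual
patch):

  // Made-up sketch of the rewrite, not an actual patch.
  var cssEquivalents = {
      valign:  'vertical-align',
      align:   'text-align',
      width:   'width',
      height:  'height',
      bgcolor: 'background-color'
      // cellpadding/cellspacing expand to per-cell rules,
      // so they need more than a one-to-one mapping
  };

  function attributesToStyle(attrs) {
      var rules = [];
      for (var name in attrs) {
          if (cssEquivalents.hasOwnProperty(name)) {
              // bare numbers like width=350 would also need a 'px' suffix
              rules.push(cssEquivalents[name] + ': ' + attrs[name]);
              delete attrs[name];
          }
      }
      if (rules.length) {
          // a real version would merge with any existing style attribute
          attrs.style = rules.join('; ') + ';';
      }
      return attrs;
  }

  // attributesToStyle({valign: 'top', width: '100%'})
  //   => {style: 'vertical-align: top; width: 100%;'}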
I can only find one downside. Text browsers like w3m do make use of
valign but don't support CSS, so the change makes the valign revert to
the normal vertically centered alignment.
I should make a few notes:
- This doesn't even affect all text browsers. lynx doesn't display
tables in a tabular form and hence doesn't care what type of alignment
attributes you have.
- This has absolutely nothing to do with web accessibility; screen
readers output to things like audio and braille, and hence don't display
things visually, so alignment means nothing to them. And the W3C appears
to assume that users with poor eyesight use proper CSS-capable browsers:
standards for web accessibility for users with poor eyesight seem
focused on things like ensuring usability with screen zooms and larger
fonts, rather than expecting users with bad eyesight to use text browsers.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
----------
From: *Daniel Kinzler* <daniel(a)brightbyte.de>
Date: Thu, Aug 11, 2011 at 6:13 AM
To: daniel(a)nadir-seen-fire.com, Wikimedia developers <
wikitech-l(a)lists.wikimedia.org>
This sounds good to me.
I'd like to take this opportunity to once more float the idea of having two
types of "tables" in wikitext:
* "grids" used for layout, that can be nested, and may be rendered as html
tables but would better be rendered using divs with appropriate styles.
* actual data tables, which can not contain block elements in their cells
(and
can not be nested), for actual tabular data.
i think this would make for cleaner wiki text and for cleaner html output.
-- daniel
Inspired by Brion's slides (couldn't make it to Haifa myself), some
random questions and musings:
- Is there a definition / "complete" example of the JSON output of the
new parser somewhere? I didn't see it on the parser pages... (I sketch a
guess at one possible shape below, after this list.)
- Will there be multiple "resolutions" of parsing? One would be
template name and key-value-pair parameters, another would be the
template replaced with the corresponding wikitext, another one the
template replaced with the corresponding wikitext parsed into JSON.
Either all-in-one large JSON object, or one of those "on demand"?
Also, extension tag/attributes/contents, rendered extension output,
WikiSource transclusions etc.
- One of the functions I have issues with in WYSIFTW is copy&paste.
Besides making it work in the new editor, would it be worth adding
special behaviour for (cut|copy)/paste between articles? Like,
automatically adding the source article link to the edit description,
so the source of text can be traced, even if it's just manually?
- Toolserver access to full wiki text is a pain. Once the new parser
is live (even if it's "only" in parallel with the old one), could we
have new, fast access capability for both raw wikitext and parser JSON
output on the toolserver? I mean that in addition to API parser
output, which I take as a given here :-)
- Will there be a JSON-in-XML dump besides the current wikitext-in-XML one?
- Will there be an interface to the parser for JavaScript tools
/outside/ edit mode? I'm thinking "Add a reference", "insert image"
etc. Just getting a char-based WikiText position from a mouse click
would be very helpful indeed, so the user can click where he wants the
reference in the rendered HTML, and JS can insert it at the
corresponding WikiText position.
- A point discussed endlessly before: As a "side effect" of the new
parser, will we store page-template-passed_value triplets in the
database? Think {{Information}} on Commons.
- Will there be an import page or JS function for parser JSON objects?
Think Word/OpenOffice export, or "paste HTML" (with JS HTML-to-JSON
converter).
That should keep us busy for a while... ;-)
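To make the first question concrete, here is a purely made-up guess at
the kind of JSON shape I mean -- emphatically *not* the actual format of
the new parser. If nodes carried source offsets like these, the
click-to-position item above would come almost for free:

  // Imagined output for: {{Information
  // |Description = A view of [[Budapest]]
  // }}
  {
      "type": "template",
      "name": "Information",
      "srcStart": 0,
      "srcEnd": 54,
      "params": {
          "Description": {
              "srcStart": 14,
              "srcEnd": 51,
              "content": [
                  { "type": "text", "value": "A view of ",
                    "srcStart": 29, "srcEnd": 39 },
                  { "type": "link", "target": "Budapest",
                    "srcStart": 39, "srcEnd": 51 }
              ]
          }
      }
  }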
Magnus
Hello!
I have one question which bothered me every time I thought about
WYSIWYG in MediaWiki (in fact, it was the only issue that seemed a
*real* obstacle to me).
How is the new WYSIWYG editor going to handle templates?
--vvv
On Mon, Aug 1, 2011 at 11:18 AM, Magnus Manske
<magnusmanske(a)googlemail.com>wrote:
> In case you missed it all the way over there in Haifa:-)
>
>
> http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-fo…
>
Their model, while interesting and an excellent reference, makes some
explicit choices that diverge from what we're currently working on:
* ugly but common structures, where e.g. tables are opened/closed across
templates, are not supported
* not all input is representable
They're not awful decisions -- and we might have made them 8-10 years ago
had anybody made an attempt to *plan* the markup language. ;) But we have an
existing data set of millions of documents that we have to support, and for
the first next-generation parser I'm hoping to basically define something
that's *very close* to how the current parser works, so that that first
decade of Wikipedia documents can be fully used with a specified parser
anytime in the future.
We can make the structures cleaner later and deprecate the old tables &
whatnot -- parser functions and such allow for beautifully nested structures
and a future wysiwyg world will take most of the low-level *markup* out of
normal editors' faces -- but for now we have to make it work with what we've
got. ;)
-- brion
Magnus sent this link and I think you'd be interested:
"Abstract: Wikipedia is a rich encyclopedia that is not only of great
use to its contributors and readers but also to researchers and
providers of third party software around Wikipedia. However, Wikipedia’s
content is only available as Wikitext, the markup language in which
articles on Wikipedia are written, and whoever needs to access the
content of an article has to implement their own parser or has to use
one of the available parser solutions. Unfortunately, those parsers
which convert Wikitext into a high-level representation like an abstract
syntax tree (AST) define their own format for storing and providing
access to this data structure. Further, the semantics of Wikitext are
only defined implicitly in the MediaWiki software itself. This situation
makes it difficult to reason about the semantic content of an article or
exchange and modify articles in a standardized and machine-accessible
way. To remedy this situation we propose a markup language, called XWML,
in which articles can be stored and an object model, called WOM, that
defines how the contents of an article can be read and modified."
-------- Original Message --------
Subject: [Wikitech-l] WOM
Date: Mon, 1 Aug 2011 19:18:48 +0100
From: Magnus Manske <magnusmanske(a)googlemail.com>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
In case you missed it all the way over there in Haifa:-)
http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-fo…