Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at:
http://pedia.sztaki.hu/ (please watch the video first, and then try the tool itself)
which aims to implement some of the visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA annotated rich content, supported
with natural language background processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we built some wiki-specific components:
-After checking out many parsers, we created a new one in JavaCC
-Created lots of content helpers based on DBpedia, like the link
recommendation, infobox recommendation and infobox editor help
-Integrated external resources to help editing, like the Book
Recommendation or Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with several show-stoppers
remaining, such as handling cite references properly or editing
templates embedded in templates.
I am aware that you are working on a new syntax, parser and RTE and
they will eventually become the official ones for Wiki editing
(Sztakipedia is in Java anyway).
However, I still think there is much to learn from our project. We will
write a paper on the subject next month, and I would be honored if some
of you read and commented on it. The main contents will be:
-problematic stuff in the current wikitext syntax we struggled with
-usability tricks, like extracting the infobox pages to provide help
for the fields, showing the abstracts of the articles to be linked
-recommendations and machine learning to support the user, plus background theory
Our plan right now is to create an API for our recommendation services
and helpers, plus a MediaWiki JS plugin that brings their results into the
current wiki editor. This way I hope the results of this research -
which started out as a rather theoretical one - will be used in a real
world scenario by at least a few people. I hope we will be able to
extend your planned new RTE the same way in the future.
Please share your thoughts/comments/doubts about Sztakipedia with me.
Also I wanted to ask some things:
-Which do you consider the most wanted helper feature:
infobox/category/link recommendation? External data import from the
Linked Open Data? (Like our Book Recommender right now which has
millions of book records in it?) Field _value_ recommendation for
infoboxes from the text? Other?
-How do you measure the performance of a parser? I saw hints about some
300 parser test cases somewhere...
-What is the best way to mash up external services to support the wiki editor
interface? (If you call an external REST service from JS in MediaWiki, it
will be blocked as a cross-site request, I'm afraid.)
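On the last question, one common workaround (not something this list prescribes) is a small same-origin proxy that forwards requests to the external service. The critical piece is a host whitelist so the proxy never becomes an open relay; a minimal sketch, with hypothetical host names:

```python
from urllib.parse import urlparse

# Hosts the proxy is allowed to forward requests to (hypothetical examples).
ALLOWED_HOSTS = {"dbpedia.org", "api.example-books.org"}

def proxied_url(raw_url):
    """Validate a client-supplied URL before the proxy fetches it.

    Returns the URL unchanged if its host is whitelisted; otherwise raises
    ValueError, so the proxy cannot be abused to reach arbitrary sites.
    """
    parsed = urlparse(raw_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs may be proxied")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not on the proxy whitelist")
    return raw_url
```

The browser then calls the proxy on the wiki's own origin, which avoids the same-origin restriction entirely; JSONP or CORS headers are the alternatives if the external service supports them.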
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
Dear all,
I have recently subscribed to this list and I wanted to introduce myself.
I have been working as a student on the 2011 edition of the Google
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
My mentor is Erik Rose.
For this purpose, we use a Python PEG parser called Pijnu [2] and
implement a grammar for it [3]. This way, we parse the wikitext into
an abstract syntax tree that we will then transform to HTML or other
formats.
One of the advantages of Pijnu is the simplicity and readability of
the grammar definition [3]. It is not finished yet, but what we have
done so far seems very promising.
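The overall idea (this is not Pijnu's actual grammar syntax, just an illustration) is to parse wikitext into a tree of nodes first and render HTML in a separate pass. A toy hand-rolled parser for a single construct shows the shape of that pipeline:

```python
def parse_inline(text):
    """Parse a tiny wikitext subset ('''bold''' plus plain text) into a
    flat AST: a list of ('text', s) and ('bold', s) nodes."""
    nodes, i = [], 0
    while i < len(text):
        if text.startswith("'''", i):
            end = text.find("'''", i + 3)
            if end == -1:                      # unclosed markup: keep as literal
                nodes.append(('text', text[i:]))
                break
            nodes.append(('bold', text[i + 3:end]))
            i = end + 3
        else:
            nxt = text.find("'''", i)
            if nxt == -1:
                nxt = len(text)
            nodes.append(('text', text[i:nxt]))
            i = nxt
    return nodes

def to_html(nodes):
    """Render the AST to HTML in a second pass, mirroring the
    parse-then-transform approach described above."""
    return ''.join('<b>%s</b>' % s if kind == 'bold' else s
                   for kind, s in nodes)
```

Because the intermediate tree is independent of the output format, the same AST could just as well be transformed to something other than HTML.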
Please don't hesitate to give advice or feedback, or even to test it if you wish!
Best regards
[1] https://github.com/peter17/mediawiki-parser
[2] https://github.com/peter17/pijnu
[3] https://github.com/peter17/mediawiki-parser/blob/master/mediawiki.pijnu
--
Peter Potrowl
Now that there are more people on this list, I thought I might bring up
tables for discussion again. There are two things that I would like to
have specified: treatment of "table garbage", and mixing of table flavours.
There are two flavours of tables: html-tables and wikitext tables. A
wikitext table has the structure:
^'{|'
table garbage
^'|' block element contents
^'|-'
table garbage
^'|}'
An html table has the structure:
'<table>'
table garbage
'<tr>'
table garbage
'<td>' block element contents '</td>'
table garbage
'</tr>'
table garbage
'</table>'
MediaWiki processes tables by extracting any recognizable part of the
table from text, and writing out the rendered html at a position right
_after_ the position where the table appears. The things that I call
"table garbage" are left in place and will thus surprisingly appear
before the table in the rendered output. (Table garbage is parsed the
same way as block element contents.)
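A toy illustration of that behaviour (not MediaWiki's real algorithm): recognizable table parts become HTML, while unrecognized lines are left in place and therefore end up before the table in the output.

```python
def render_wikitext_table(lines):
    """Toy model of the garbage behaviour described above: cell lines are
    collected into the rendered table, everything unrecognized ("table
    garbage") stays in place, i.e. ends up *before* the table."""
    garbage, cells = [], []
    for line in lines:
        if line.startswith('{|') or line.startswith('|}'):
            continue                                  # table open/close
        elif line.startswith('|-'):
            continue                                  # row separator
        elif line.startswith('|'):
            cells.append('<td>%s</td>' % line[1:].strip())
        else:
            garbage.append(line)                      # "table garbage"
    table = '<table><tr>%s</tr></table>' % ''.join(cells)
    return '\n'.join(garbage + [table])
```

Note the check order matters: `|}` and `|-` must be tested before the bare `|` cell prefix, since they share it.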
1. How should the treatment of table garbage be specified? My
recommendation is to change the semantics compared to the original
and just specify that table garbage should be ignored.
2. How should the mixing of table flavours be specified?
The behavior of MediaWiki is that the internal table tokens ('<td>',
'<tr>' etc. for html tables and ^'|', ^'|-' etc. for wikitext tables) are
activated when opening up a table of the corresponding type. But when
nesting tables of different types, the internal table tokens can be used
more or less interchangeably.
<table>
<td>
{|
| cell <td> cell <tr><td> cell
|-
| cell
|}
</table>
renders as this html:
<table>
<td>
<table>
<tr>
<td> cell </td><td> cell <tr></td><td> cell
</td></tr>
<tr>
<td> cell
</td></tr></table>
</table>
I have previously suggested that it should be specified that only the
internal table tokens of the right type can be used. Thus, opening a
wikitext table inside an html table would activate parsing of the
wikitext table tokens and deactivate parsing of html table tokens. This
is a behavior that I find appealing. But since PEGs are currently in
fashion, this is a behavior that might be problematic to implement. So
there is also a third alternative: implicitly terminate the inner table
when encountering table tokens from the outer table, which should be
straightforward to implement with a PEG grammar.
So to summarize the alternatives:
1. Once both types of tables have been opened, use internal tokens
interchangeably.
2. Let inner tables take precedence and disable tokens of outer table type.
3. Let outer tables take precedence and implicitly terminate the inner table
if table tokens of the outer table type are encountered.
Which should be specified? I recommend 2 or 3.
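To make alternatives 2 and 3 concrete, here is a hypothetical sketch (not a real wikitext parser) that tracks open tables as a stack of flavours:

```python
# Simplified internal token sets for the two table flavours.
HTML_TOKENS = {'<tr>', '<td>', '</table>'}
WIKI_TOKENS = {'|-', '|', '|}'}

def active_tokens(table_stack):
    """Alternative 2: only the innermost open table decides which internal
    table tokens the parser currently recognizes."""
    if not table_stack:
        return set()
    return WIKI_TOKENS if table_stack[-1] == 'wiki' else HTML_TOKENS

def handle_outer_token(table_stack, token_table_type):
    """Alternative 3: a token belonging to an outer table type implicitly
    terminates any inner tables above it on the stack. Returns the list of
    table types that were implicitly closed."""
    closed = []
    while table_stack and table_stack[-1] != token_table_type:
        closed.append(table_stack.pop())
    return closed
```

With `['html', 'wiki']` open, alternative 2 recognizes only the wikitext tokens, while under alternative 3 an html token such as `<tr>` would first pop the inner wikitext table.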
Best regards,
Andreas Jonsson
I've started throwing some initial notes into the various sub-sections
listed here:
http://www.mediawiki.org/wiki/Future/Parser_plan
Adding very brief stubs describing the various default & some of the common
extension parser function & tag hooks, the beginnings of some notes on the
parser<->context interface (which will need to provide access to template
fetches, page lookups, and various information such as language, available
hooks of various types, current time, etc).
http://www.mediawiki.org/wiki/Wikitext_parser/Context
http://www.mediawiki.org/wiki/Wikitext_parser/Core_tag_hooks
http://www.mediawiki.org/wiki/Wikitext_parser/Core_parser_functions
The function & tag hook descriptions can use filling out, and anything that
looks tricky to implement should get explicitly noted! We know that some
functions will not be fully implemented in the JavaScript editing versions
(no immediate need to do a standalone Latex interpreter!) while others will
probably need to be tested in this environment early on like the if/switch
stuff.
These will also need sane ways to represent them during editing --
suggestions are welcome!
I've been updating the ParserPlayground gadget files as an in-SVN version
Extension:ParserPlayground -- you can enable this version on any local trunk
test wiki by pulling it from extensions SVN and enabling it. This lets us
keep the master copy versioned more easily than just keeping the pages on
MediaWiki.org as gadget files. There are a few changes, such as making the
inspector mode enableable/disableable (offering a primitive editing feature
when it's off) and starting to integrate into the WikiEditor toolbar
infrastructure.
The gadget on MediaWiki.org will switch over to use that later this week
(prototyped by my updates to the CodeEditor gadget and extension last couple
weeks); it still needs to be made a little more pluggable, retain its state
better, and have a more editing-centric rendering output.
http://www.mediawiki.org/wiki/Extension:ParserPlayground
-- brion
A friend recommends Localwiki's new rich text editor: "it's very good,
intuitive, and elegant."
https://github.com/localwiki/sapling
http://localwiki.org/
My friend calls it "definitely worth a look" as rich text editing examples
go. The creator is Philip Neustrom, and he might be a useful resource to
provide a code tour or to share how they thought about developing its
features.
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
(Cross-posting this update to wikitech-l & wikitext-l since this ties into
Visual Editor project)
I've made some more updates to the CodeEditor extension & gadget
demonstrating integration & adaptation of the WikiEditor editor & toolbar
framework.
The syntax-highlighting mode in CodeEditor can now be toggled on and off;
when it's off, any WikiEditor functions that had been patched for CodeEditor
fall automatically through to the defaults, and you get the regular textarea
view.
(If you just want to go poke at it, scroll down to the older quoted post and
follow the extension link ;)
There are a few more things I'll want to poke at to make it suitable for dropping
the visual editor in:
* a concept of 'document format' that lets us specify that certain tools are
suitable for wikitext pages, while others are suitable for JS/CSS code --
this'll let things like the 'bold' and 'link' buttons hide themselves
automatically on regular wiki pages, while the JS/CSS pages can
automatically show tools like 'indent/deindent block', 'find declaration',
'syntax cleanup'. (Right now this can be done by manually removing/adding
tools in the JS & CSS modes, but we can integrate that better -- just like
the browser & RTL compatibility checks currently in some WikiEditor modules)
* An abstraction layer on data type / structure type? For the way tools like
text insertions, link dialogs, search & replace etc work we can in many ways
treat 'plain textarea', 'wikiEditor iframe with template folding', and 'Ace
syntax-highlighting editor' as equivalent: all are views on a plain text
document that can be addressed character by character, and all that needs to
be implemented are the pieces to get text in and out, do selection and
insert/delete, etc. For the visual editor, we'll have a document structure
that's very different, so something that 'makes a section bold' would work
differently: operating on a DOM-like model to move some nodes around, rather
than dropping bits of text in.
* cleaner implementation for toggle switches on the toolbar
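The "views on a plain text document" idea above can be captured as a minimal shared surface. A hypothetical sketch (in Python for illustration, though the real code is JavaScript) of what a textarea, the WikiEditor iframe, and a syntax-highlighting editor would all implement:

```python
# Hypothetical sketch of the minimal surface shared by "plain text view"
# editors; toolbar tools only talk to this interface, never to the widget.
class PlainTextSurface:
    def __init__(self, text=''):
        self.text = text
        self.sel = (0, 0)            # current selection as (start, end) offsets

    def get_selection(self):
        start, end = self.sel
        return self.text[start:end]

    def replace_selection(self, replacement):
        """Swap the selected span for new text and reselect the result."""
        start, end = self.sel
        self.text = self.text[:start] + replacement + self.text[end:]
        self.sel = (start, start + len(replacement))

# A toolbar action like "make bold" needs nothing beyond this surface:
def make_bold(surface):
    surface.replace_selection("'''%s'''" % surface.get_selection())
```

A rich visual editor would expose a different, DOM-like surface instead, which is exactly why an adaptor layer is needed for high-level calls like "make bold".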
In the meantime though, I should be able to get the ParserPlayground demos
more tightly integrated into the editor widget, and will try to hack up some
temporary handling for the bold/italic/link etc on the provisional dom
structure.
-- brion
On Mon, Jun 13, 2011 at 5:37 PM, Brion Vibber <brion(a)pobox.com> wrote:
> On Fri, May 6, 2011 at 11:22 AM, Brion Vibber <brion(a)pobox.com> wrote:
>
>> On Fri, May 6, 2011 at 11:20 AM, Trevor Parscal <tparscal(a)wikimedia.org>wrote:
>>
>>> The way the WikiEditor works right now, the textbox can be replaced with
>>> anything that can support a few methods, such as getSelection,
>>> encapsulateSelection, etc. There are some modules that depend on specific
>>> edit box implementations, such as the current and only alternative to the
>>> textarea we called "iframe" since it's a contentEditable iframe.
>>>
>>> If you take a look at jquery.wikiEditor.iframe.js, you will see what I
>>> mean.
>>> It should be pretty straightforward to drop anything in there, and be
>>> able
>>> to take advantage of the toolbar. There are some things, like find and
>>> replace that may need to be reworked or just turned off, but even things
>>> like the link dialog should work just fine but just supporting a few
>>> methods.
>>>
>>> The API could be better documented, and re-factored a bit to be even more
>>> generic, but the basic structure is there, and can be reused without much
>>> hacking.
>>>
>>
>> Spiffy... I'll play with it for CodeEditor, see if I can make the
>> special-char inserts for instance work on it (which would actually be useful
>> for some JS!).
>>
>
> Finally got around to poking at this recently as practice for the rich
> editor project.
>
> CodeEditor is now implemented as an extension:
> http://www.mediawiki.org/wiki/Extension:CodeEditor
> and the gadget pulls in the JS from there -- so if you're using the gadget
> on mediawiki.org, it should continue to work.
>
> I've also got it now working with WikiEditor's formatting toolbar (mostly),
> special characters list (works great!), and search & replace dialog,
> implementing most of the same interfaces that WikiEditor's iframe mode does.
>
> We'll probably want to extend that API a bit further, a few offhand notes:
>
> * Our jquery.textSelection module which implements the
> fetch-and-encapsulate-text stuff still has a few WikiEditor-specific
> assumptions, and probably needs to be generalized a little more.
>
> * Various bits of formatting & help text that are suitable for wikitext
> pages are less useful when you're on a JS or CSS page. We may want to have a
> concept of moving up from 'generic' editor (a few generic buttons) to having
> a specific data format ('wiki' pages get the wikitext help; 'js' pages get a
> MediaWiki JS API help page; 'css' pages get a list of common selectors and a
> link to CSS documentation). Those should be independent of what actual
> *editor mode* is being used as well, so we can show JS-relevant help on a JS
> page even if you don't have some fancy syntax highlighting thing.
>
> * For things that are 'fancy views of plain text' like the WikiEditor
> iframe mode and CodeEditor, the formatting toolbars etc work fairly
> straightforwardly; we just need to get at some selection of text, spit back
> a modified bit of text, and fiddle around the selection or view. This
> probably won't adapt so well for a rich visual editor; so we may need an
> adaptor layer to let plain-text and enhanced-text editors fiddle with the
> wikitext sample fragments while a rich editor has its own adaptor that turns
> high-level calls like 'wrap in a link' or 'make bold' and does the
> appropriate AST & view modifications.
>
> * A few of WikiEditor's experimental fields require the iframe mode and
> force it to switch in; may need something to avoid ambiguity when we're
> deliberately using a different mode.
>
> * Probably would be good to add a specific notion of switching editor
> modes; WikiEditor's preview tab opens up _surrounding_ the editor, but if we
> switch between plaintext & syntax-highlighting, we probably want a toggle on
> the toolbar which just swaps the guts around.
>
> -- brion
>
>
Brion Vibber this week told me about what he, Neil, and Trevor are
working on regarding parser/visual editor, so here's a snapshot. Please
correct it if it's inaccurate.
Brion is focusing on the parser and visual editor, as well as MediaWiki
code review. Brion, Trevor, and Neil are still working on the early parts!
Brion is doing preliminary test work with CodeEditor, and says
"ParserPlayground gadget will add more of that code soon".
Trevor's investigating the editing surface work and some early DOM tests.
Neil's on combining DOM transforms & planning for the editor
communication connection.
And Erik Moeller and I are grabbing some community folks, several of
whom are from Wikia, to coordinate contributions. Inez Korczyński, for
example, is interested in contributing. Maciej Brencz has just put up a
short description of how Wikia's editor internals work -
http://www.mediawiki.org/wiki/Future/Wikia_Reverse_Parser .
Right now we're strongly looking for parser test cases and Abstract
Syntax Trees. Once we have a stabler base, maybe in August or
September, there'll be more opportunity to implement plugins and UI
extensions. More info to come via this list and also live on
http://www.mediawiki.org/wiki/Future and its subpages.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Maciej Brencz, thanks for updating me! Maciej told me that Mike Schwartz from
Wikia recently added a list of Wikia's "test cases and situations we need to
fallback to source mode":
<http://www.mediawiki.org/wiki/Future/Parser_test_cases>.
I'll put together a short description of how we handle parsing of
wikitext to HTML and reverse parsing of HTML back to wikitext in our
Rich Text Editor.
I hope that together we will make significant improvement to the MW Parser!
I hope so too, Maciej. Thanks!
Welcoming parser test cases from all interested parties; stick 'em on that
page.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
We had a whole bunch of folks who've had their hands in the world of
MediaWiki parsing & rich text editing here at the Berlin Hackathon, and made
some great progress on setting out some ideas for how to start actually
working on it.
Tomorrow I'll distill our session notes into a clearer description of the
core ideas & next steps (dare I say... a manifesto? :)
In the meantime, if you're brave you can peek at the raw session notes:
http://etherpad.wikimedia.org/mwhack11Sat-Parser
We're reviving the wikitext-l mailing list for people interested in the
project; it's gotten some traffic about interesting projects but we'll be
making it an active working group. I'll also be making regular posts here on
wikitech-l, on the Wikimedia tech blog, and on the wikis -- but I don't want
to clutter wikitech-l *too* much with the nitty-gritty details. ;)
Project hub pages will go up tomorrow at
http://www.mediawiki.org/wiki/Future
-- brion vibber (brion @ wikimedia.org / brion @ pobox.com)