Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at:
http://pedia.sztaki.hu/ (please watch the video first, and then try the tool itself)
which aims to implement some of the visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA annotated rich content, supported
with natural language background processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we built some wiki-specific components:
-After checking out many parsers, we created a new one in JavaCC
-Created lots of content helpers based on DBpedia, like the link
recommendation, infobox recommendation and infobox editor help
-Integrated external resources to help editing, like the Book
Recommendation or Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with several show-stoppers
remaining, such as handling cite references properly or editing
templates embedded in templates.
I am aware that you are working on a new syntax, parser and RTE and
they will eventually become the official ones for Wiki editing
(Sztakipedia is in Java anyway).
However, I still think there is much to learn from our project. We will
write a paper on the subject next month, and I would be honored if some
of you read and commented on it. The main contents will be:
-problematic stuff in the current wikitext syntax we struggled with
-usability tricks, like extracting the infobox pages to provide help
for the fields, showing the abstracts of the articles to be linked
-recommendations and machine learning to support the user, plus background theory
Our plan right now is to create an API for our recommendation services
and helpers, plus a MediaWiki JS plugin that brings their results into the
current wiki editor. This way I hope the results of this research -
which started out as a rather theoretical one - will be used in a real
world scenario by at least a few people. I hope we will be able to
extend your planned new RTE the same way in the future.
Please share your thoughts/comments/doubts about Sztakipedia with me.
Also I wanted to ask some things:
-Which do you consider the most wanted helper feature:
infobox/category/link recommendation? External data import from the
Linked Open Data? (Like our Book Recommender right now which has
millions of book records in it?) Field _value_ recommendation for
infoboxes from the text? Other?
-How do you measure the performance of a parser? I saw hints about some
300 parser test cases somewhere...
-What is the best way to mash up external services to support the wiki editor
interface? (If you call an external REST service from JS in MediaWiki, it
will be blocked as a cross-site request, I'm afraid.)
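On the last question, one common workaround (not something this list prescribes) is a small same-origin proxy that forwards requests to the external service. The critical piece is a host whitelist so the proxy never becomes an open relay; a minimal sketch, with hypothetical host names:

```python
from urllib.parse import urlparse

# Hosts the proxy is allowed to forward requests to (hypothetical examples).
ALLOWED_HOSTS = {"dbpedia.org", "api.example-books.org"}

def proxied_url(raw_url):
    """Validate a client-supplied URL before the proxy fetches it.

    Returns the URL unchanged if its host is whitelisted; otherwise raises
    ValueError, so the proxy cannot be abused to reach arbitrary sites.
    """
    parsed = urlparse(raw_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs may be proxied")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not on the proxy whitelist")
    return raw_url
```

The browser then calls the proxy on the wiki's own origin, which avoids the same-origin restriction entirely; JSONP or CORS headers are the alternatives if the external service supports them.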
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
Dear all,
I have recently subscribed to this list and I wanted to introduce myself.
I have been working as a student on the 2011 edition of the Google
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
My mentor is Erik Rose.
For this purpose, we use a Python PEG parser called Pijnu [2] and
implement a grammar for it [3]. This way, we parse the wikitext into
an abstract syntax tree that we will then transform to HTML or other
formats.
One of the advantages of Pijnu is the simplicity and readability of
the grammar definition [3]. It is not finished yet, but what we have
done so far seems very promising.
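The overall idea (this is not Pijnu's actual grammar syntax, just an illustration) is to parse wikitext into a tree of nodes first and render HTML in a separate pass. A toy hand-rolled parser for a single construct shows the shape of that pipeline:

```python
def parse_inline(text):
    """Parse a tiny wikitext subset ('''bold''' plus plain text) into a
    flat AST: a list of ('text', s) and ('bold', s) nodes."""
    nodes, i = [], 0
    while i < len(text):
        if text.startswith("'''", i):
            end = text.find("'''", i + 3)
            if end == -1:                      # unclosed markup: keep as literal
                nodes.append(('text', text[i:]))
                break
            nodes.append(('bold', text[i + 3:end]))
            i = end + 3
        else:
            nxt = text.find("'''", i)
            if nxt == -1:
                nxt = len(text)
            nodes.append(('text', text[i:nxt]))
            i = nxt
    return nodes

def to_html(nodes):
    """Render the AST to HTML in a second pass, mirroring the
    parse-then-transform approach described above."""
    return ''.join('<b>%s</b>' % s if kind == 'bold' else s
                   for kind, s in nodes)
```

Because the intermediate tree is independent of the output format, the same AST could just as well be transformed to something other than HTML.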
Please don't hesitate to give advice or feedback, or even to test it if you wish!
Best regards
[1] https://github.com/peter17/mediawiki-parser
[2] https://github.com/peter17/pijnu
[3] https://github.com/peter17/mediawiki-parser/blob/master/mediawiki.pijnu
--
Peter Potrowl
Now that there are more people on this list, I thought I might bring up
tables for discussion again. There are two things that I would like to
have specified: treatment of "table garbage", and mixing of table flavours.
There are two flavours of tables: html-tables and wikitext tables. A
wikitext table has the structure:
^'{|'
table garbage
^'|' block element contents
^'|-'
table garbage
^'|}'
An html table has the structure:
'<table>'
table garbage
'<tr>'
table garbage
'<td>' block element contents '</td>'
table garbage
'</tr>'
table garbage
'</table>'
MediaWiki processes tables by extracting any recognizable part of the
table from text, and writing out the rendered html at a position right
_after_ the position where the table appears. The things that I call
"table garbage" are left in place and will thus surprisingly appear
before the table in the rendered output. (Table garbage is parsed the
same way as block element contents.)
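A toy illustration of that behaviour (not MediaWiki's real algorithm): recognizable table parts become HTML, while unrecognized lines are left in place and therefore end up before the table in the output.

```python
def render_wikitext_table(lines):
    """Toy model of the garbage behaviour described above: cell lines are
    collected into the rendered table, everything unrecognized ("table
    garbage") stays in place, i.e. ends up *before* the table."""
    garbage, cells = [], []
    for line in lines:
        if line.startswith('{|') or line.startswith('|}'):
            continue                                  # table open/close
        elif line.startswith('|-'):
            continue                                  # row separator
        elif line.startswith('|'):
            cells.append('<td>%s</td>' % line[1:].strip())
        else:
            garbage.append(line)                      # "table garbage"
    table = '<table><tr>%s</tr></table>' % ''.join(cells)
    return '\n'.join(garbage + [table])
```

Note the check order matters: `|}` and `|-` must be tested before the bare `|` cell prefix, since they share it.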
1. How should the treatment of table garbage be specified? My
recommendation is to change the semantics compared to the original
and just specify that table garbage should be ignored.
2. How should the mixing of table flavours be specified?
The behavior of MediaWiki is that the internal table tokens ('<td>',
'<tr>' etc. for html tables and ^'|', ^'|-' etc. for wikitext tables) are
activated when opening up a table of the corresponding type. But when
nesting tables of different types, the internal table tokens can be used
more or less interchangeably.
<table>
<td>
{|
| cell <td> cell <tr><td> cell
|-
| cell
|}
</table>
renders as this html:
<table>
<td>
<table>
<tr>
<td> cell </td><td> cell <tr></td><td> cell
</td></tr>
<tr>
<td> cell
</td></tr></table>
</table>
I have previously suggested that it should be specified that only the
internal table tokens of the right type can be used. Thus, opening a
wikitext table inside an html table would activate parsing of the
wikitext table tokens and deactivate parsing of html table tokens. This
is a behavior that I find appealing. But since PEGs are currently in
fashion, this is a behavior that might be problematic to implement. So
there is also a third alternative: implicitly terminate the inner table
when encountering table tokens from the outer table, which should be
straightforward to implement with a PEG grammar.
So to summarize the alternatives:
1. Once both types of tables have been opened, use internal tokens
interchangeably.
2. Let inner tables take precedence and disable tokens of outer table type.
3. Let outer tables take precedence and implicitly terminate the inner table
if table tokens of the outer table type are encountered.
Which should be specified? I recommend 2 or 3.
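To make alternatives 2 and 3 concrete, here is a hypothetical sketch (not a real wikitext parser) that tracks open tables as a stack of flavours:

```python
# Simplified internal token sets for the two table flavours.
HTML_TOKENS = {'<tr>', '<td>', '</table>'}
WIKI_TOKENS = {'|-', '|', '|}'}

def active_tokens(table_stack):
    """Alternative 2: only the innermost open table decides which internal
    table tokens the parser currently recognizes."""
    if not table_stack:
        return set()
    return WIKI_TOKENS if table_stack[-1] == 'wiki' else HTML_TOKENS

def handle_outer_token(table_stack, token_table_type):
    """Alternative 3: a token belonging to an outer table type implicitly
    terminates any inner tables above it on the stack. Returns the list of
    table types that were implicitly closed."""
    closed = []
    while table_stack and table_stack[-1] != token_table_type:
        closed.append(table_stack.pop())
    return closed
```

With `['html', 'wiki']` open, alternative 2 recognizes only the wikitext tokens, while under alternative 3 an html token such as `<tr>` would first pop the inner wikitext table.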
Best regards,
Andreas Jonsson
I've started throwing some initial notes into the various sub-sections
listed here:
http://www.mediawiki.org/wiki/Future/Parser_plan
Adding very brief stubs describing the various default & some of the common
extension parser function & tag hooks, the beginnings of some notes on the
parser<->context interface (which will need to provide access to template
fetches, page lookups, and various information such as language, available
hooks of various types, current time, etc).
http://www.mediawiki.org/wiki/Wikitext_parser/Context
http://www.mediawiki.org/wiki/Wikitext_parser/Core_tag_hooks
http://www.mediawiki.org/wiki/Wikitext_parser/Core_parser_functions
The function & tag hook descriptions can use filling out, and anything that
looks tricky to implement should get explicitly noted! We know that some
functions will not be fully implemented in the JavaScript editing versions
(no immediate need to do a standalone Latex interpreter!) while others will
probably need to be tested in this environment early on like the if/switch
stuff.
These will also need sane ways to represent them during editing --
suggestions are welcome!
I've been updating the ParserPlayground gadget files as an in-SVN version
Extension:ParserPlayground -- you can enable this version on any local trunk
test wiki by pulling it from extensions SVN and enabling it. This lets us
keep the master copy versioned more easily than just keeping the pages on
MediaWiki.org as gadget files. There are a few changes, such as making the
inspector mode enableable/disableable (offering a primitive editing feature
when it's off) and starting to integrate into the WikiEditor toolbar
infrastructure.
The gadget on MediaWiki.org will switch over to use that later this week
(prototyped by my updates to the CodeEditor gadget and extension last couple
weeks); it still needs to be made a little more pluggable, retain its state
better, and have a more editing-centric rendering output.
http://www.mediawiki.org/wiki/Extension:ParserPlayground
-- brion
A friend recommends Localwiki's new rich text editor: "it's very good,
intuitive, and elegant."
https://github.com/localwiki/sapling
http://localwiki.org/
My friend calls it "definitely worth a look" as rich text editing examples
go. The creator is Philip Neustrom, and he might be a useful resource to
provide a code tour or to share how they thought about developing its
features.
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
(Cross-posting this update to wikitech-l & wikitext-l since this ties into
Visual Editor project)
I've made some more updates to the CodeEditor extension & gadget
demonstrating integration & adaptation of the WikiEditor editor & toolbar
framework.
The syntax-highlighting mode in CodeEditor can now be toggled on and off;
when it's off, any WikiEditor functions that had been patched for CodeEditor
fall automatically through to the defaults, and you get the regular textarea
view.
(If you just want to go poke at it, scroll down to the older quoted post and
follow the extension link ;)
There are a few more things I'll want to poke at to make it suitable for dropping
the visual editor in:
* a concept of 'document format' that lets us specify that certain tools are
suitable for wikitext pages, while others are suitable for JS/CSS code --
this'll let things like the 'bold' and 'link' buttons hide themselves
automatically on regular wiki pages, while the JS/CSS pages can
automatically show tools like 'indent/deindent block', 'find declaration',
'syntax cleanup'. (Right now this can be done by manually removing/adding
tools in the JS & CSS modes, but we can integrate that better -- just like
the browser & RTL compatibility checks currently in some WikiEditor modules)
* An abstraction layer on data type / structure type? For the way tools like
text insertions, link dialogs, search & replace etc work we can in many ways
treat 'plain textarea', 'wikiEditor iframe with template folding', and 'Ace
syntax-highlighting editor' as equivalent: all are views on a plain text
document that can be addressed character by character, and all that needs to
be implemented are the pieces to get text in and out, do selection and
insert/delete, etc. For the visual editor, we'll have a document structure
that's very different, so something that 'makes a section bold' would work
differently: operating on a DOM-like model to move some nodes around, rather
than dropping bits of text in.
* cleaner implementation for toggle switches on the toolbar
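The "views on a plain text document" idea above can be captured as a minimal shared surface. A hypothetical sketch (in Python for illustration, though the real code is JavaScript) of what a textarea, the WikiEditor iframe, and a syntax-highlighting editor would all implement:

```python
# Hypothetical sketch of the minimal surface shared by "plain text view"
# editors; toolbar tools only talk to this interface, never to the widget.
class PlainTextSurface:
    def __init__(self, text=''):
        self.text = text
        self.sel = (0, 0)            # current selection as (start, end) offsets

    def get_selection(self):
        start, end = self.sel
        return self.text[start:end]

    def replace_selection(self, replacement):
        """Swap the selected span for new text and reselect the result."""
        start, end = self.sel
        self.text = self.text[:start] + replacement + self.text[end:]
        self.sel = (start, start + len(replacement))

# A toolbar action like "make bold" needs nothing beyond this surface:
def make_bold(surface):
    surface.replace_selection("'''%s'''" % surface.get_selection())
```

A rich visual editor would expose a different, DOM-like surface instead, which is exactly why an adaptor layer is needed for high-level calls like "make bold".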
In the meantime though, I should be able to get the ParserPlayground demos
more tightly integrated into the editor widget, and will try to hack up some
temporary handling for the bold/italic/link etc on the provisional dom
structure.
-- brion
On Mon, Jun 13, 2011 at 5:37 PM, Brion Vibber <brion(a)pobox.com> wrote:
> On Fri, May 6, 2011 at 11:22 AM, Brion Vibber <brion(a)pobox.com> wrote:
>
>> On Fri, May 6, 2011 at 11:20 AM, Trevor Parscal <tparscal(a)wikimedia.org>wrote:
>>
>>> The way the WikiEditor works right now, the textbox can be replaced with
>>> anything that can support a few methods, such as getSelection,
>>> encapsulateSelection, etc. There are some modules that depend on specific
>>> edit box implementations, such as the current and only alternative to the
>>> textarea we called "iframe" since it's a contentEditable iframe.
>>>
>>> If you take a look at jquery.wikiEditor.iframe.js, you will see what I
>>> mean.
>>> It should be pretty straightforward to drop anything in there, and be
>>> able
>>> to take advantage of the toolbar. There are some things, like find and
>>> replace that may need to be reworked or just turned off, but even things
>>> like the link dialog should work just fine but just supporting a few
>>> methods.
>>>
>>> The API could be better documented, and re-factored a bit to be even more
>>> generic, but the basic structure is there, and can be reused without much
>>> hacking.
>>>
>>
>> Spiffy... I'll play with it for CodeEditor, see if I can make the
>> special-char inserts for instance work on it (which would actually be useful
>> for some JS!).
>>
>
> Finally got around to poking at this recently as practice for the rich
> editor project.
>
> CodeEditor is now implemented as an extension:
> http://www.mediawiki.org/wiki/Extension:CodeEditor
> and the gadget pulls in the JS from there -- so if you're using the gadget
> on mediawiki.org, it should continue to work.
>
> I've also got it now working with WikiEditor's formatting toolbar (mostly),
> special characters list (works great!), and search & replace dialog,
> implementing most of the same interfaces that WikiEditor's iframe mode does.
>
> We'll probably want to extend that API a bit further, a few offhand notes:
>
> * Our jquery.textSelection module which implements the
> fetch-and-encapsulate-text stuff still has a few WikiEditor-specific
> assumptions, and probably needs to be generalized a little more.
>
> * Various bits of formatting & help text that are suitable for wikitext
> pages are less useful when you're on a JS or CSS page. We may want to have a
> concept of moving up from 'generic' editor (a few generic buttons) to having
> a specific data format ('wiki' pages get the wikitext help; 'js' pages get a
> MediaWiki JS API help page; 'css' pages get a list of common selectors and a
> link to CSS documentation). Those should be independent of what actual
> *editor mode* is being used as well, so we can show JS-relevant help on a JS
> page even if you don't have some fancy syntax highlighting thing.
>
> * For things that are 'fancy views of plain text' like the WikiEditor
> iframe mode and CodeEditor, the formatting toolbars etc work fairly
> straightforwardly; we just need to get at some selection of text, spit back
> a modified bit of text, and fiddle around the selection or view. This
> probably won't adapt so well for a rich visual editor; so we may need an
> adaptor layer to let plain-text and enhanced-text editors fiddle with the
> wikitext sample fragments while a rich editor has its own adaptor that turns
> high-level calls like 'wrap in a link' or 'make bold' and does the
> appropriate AST & view modifications.
>
> * A few of WikiEditor's experimental fields require the iframe mode and
> force it to switch in; may need something to avoid ambiguity when we're
> deliberately using a different mode.
>
> * Probably would be good to add a specific notion of switching editor
> modes; WikiEditor's preview tab opens up _surrounding_ the editor, but if we
> switch between plaintext & syntax-highlighting, we probably want a toggle on
> the toolbar which just swaps the guts around.
>
> -- brion
>
>
Brion Vibber this week told me about what he, Neil, and Trevor are
working on regarding parser/visual editor, so here's a snapshot. Please
correct it if it's inaccurate.
Brion is focusing on the parser and visual editor, as well as MediaWiki
code review. Brion, Trevor, and Neil are still working on the early parts!
Brion is doing preliminary test work with CodeEditor, and says
"ParserPlayground gadget will add more of that code soon".
Trevor's investigating the editing surface work and some early DOM tests.
Neil's on combining DOM transforms & planning for the editor
communication connection.
And Erik Moeller and I are grabbing some community folks, several of
whom are from Wikia, to coordinate contributions. Inez Korczyński, for
example, is interested in contributing. Maciej Brencz has just put up a
short description of how Wikia's editor internals work -
http://www.mediawiki.org/wiki/Future/Wikia_Reverse_Parser .
Right now we're strongly looking for parser test cases and Abstract
Syntax Trees. Once we have a stabler base, maybe in August or
September, there'll be more opportunity to implement plugins and UI
extensions. More info to come via this list and also live on
http://www.mediawiki.org/wiki/Future and its subpages.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Maciej Brencz, thanks for updating me! Maciej told me that Mike Schwartz from
Wikia recently added a list of Wikia's "test cases and situations we need to
fallback to source mode":
<http://www.mediawiki.org/wiki/Future/Parser_test_cases>.
I'll put together a short description of how we handle parsing of
wikitext to HTML and reverse parsing of HTML back to wikitext in our
Rich Text Editor.
I hope that together we will make significant improvement to the MW Parser!
I hope so too, Maciej. Thanks!
Welcoming parser test cases from all interested parties; stick 'em on that
page.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
We had a whole bunch of folks who've had their hands in the world of
MediaWiki parsing & rich text editing here at the Berlin Hackathon, and made
some great progress on setting out some ideas for how to start actually
working on it.
Tomorrow I'll distill our session notes into a clearer description of the
core ideas & next steps (dare I say... a manifesto? :)
In the meantime, if you're brave you can peek at the raw session notes:
http://etherpad.wikimedia.org/mwhack11Sat-Parser
We're reviving the wikitext-l mailing list for people interested in the
project; it's gotten some traffic about interesting projects but we'll be
making it an active working group. I'll also be making regular posts here on
wikitech-l, on the Wikimedia tech blog, and on the wikis -- but I don't want
to clutter wikitech-l *too* much with the nitty-gritty details. ;)
Project hub pages will go up tomorrow at
http://www.mediawiki.org/wiki/Future
-- brion vibber (brion @ wikimedia.org / brion @ pobox.com)