Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at:
http://pedia.sztaki.hu/ (please check the video first, and then the tool itself)
which aims to implement some of the visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA-annotated rich content, supported
by natural-language background processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we built some wiki-specific components:
- After checking out many parsers, we created a new one in JavaCC
- Created lots of content helpers based on DBpedia, like link
recommendation, infobox recommendation, and infobox editor help
- Integrated external resources to help editing, like the Book
Recommendation or the Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with many showstoppers
remaining, such as handling cite references properly or editing
templates embedded in other templates.
I am aware that you are working on a new syntax, parser, and RTE, and
that they will eventually become the official ones for wiki editing
(Sztakipedia is written in Java anyway).
However, I still think that there is much to learn from our project. We
will write a paper on the subject next month, and I would be honored if
some of you read and commented on it. The main contents will be:
- problematic constructs in the current wikitext syntax that we struggled with
- usability tricks, like extracting the infobox pages to provide help
for the fields, or showing the abstracts of the articles to be linked
- recommendation and machine learning to support the user, plus the
background theory
Our plan right now is to create an API for our recommendation services
and helpers, and a MediaWiki JS plugin that brings their results into
the current wiki editor. This way I hope the results of this research --
which started out as a rather theoretical one -- will be used in a
real-world scenario by at least a few people. I hope we will be able to
extend your planned new RTE the same way in the future.
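To make the plan concrete, here is a minimal sketch of what such a
plugin could look like (the endpoint path and the 'suggestions' field
are placeholders for our planned API, not an existing service; jQuery
has shipped with MediaWiki since 1.16):

  // Minimal user-script sketch; the endpoint and response format are
  // placeholders for the planned recommendation API.
  $(function () {
      var textbox = document.getElementById('wpTextbox1'); // classic edit box
      if (!textbox) {
          return; // not on an edit page
      }
      $.getJSON('/sztakipedia/recommend', { text: textbox.value },
          function (data) {
              // assume the service answers with suggested link targets
              $.each(data.suggestions || [], function (i, title) {
                  if (window.console) {
                      console.log('Suggested link: [[' + title + ']]');
                  }
              });
          });
  });

A real version would of course render the suggestions next to the edit
box instead of logging them.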
Please share your thoughts/comments/doubts about Sztakipedia with me.
I also wanted to ask a few things:
- Which is the most wanted helper feature, in your view:
infobox/category/link recommendation? External data import from
Linked Open Data (like our Book Recommender right now, which has
millions of book records in it)? Field _value_ recommendation for
infoboxes from the text? Other?
- How do you measure the performance of a parser? I saw hints about
some 300 parser test cases somewhere...
- What is the best way to mash up external services to support the wiki
editor interface? (If you call an external REST service from JS in
MediaWiki, the browser's same-origin policy will block it, I'm afraid.)
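The only workaround I know of is JSONP, where the service wraps its JSON
answer in a callback function so it can be loaded as a script; with
jQuery that is just (the endpoint is a placeholder again):

  // 'callback=?' makes jQuery fetch the URL via a <script> tag, which
  // is exempt from the same-origin policy, provided the service
  // supports JSONP.
  $.getJSON('http://pedia.sztaki.hu/recommend?callback=?', { q: 'Budapest' },
      function (data) {
          // runs with the parsed JSON once the script has loaded
      });

But this executes the response as script, so the wiki has to trust the
service completely; the alternative seems to be a server-side proxy on
the wiki's own domain. Is there a recommended pattern?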
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
Here are some updates from the parser & visual editor front from the
last few weeks.
The wikitech-l list has had some relevant discussion around:
* a wiki object model:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054499.html
* deprecated wikitext markup:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054603.html
* the template system:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054649.html
The visual editor folks are in a research and development phase,
exploring other systems and techniques (research) and prototyping a
proof of concept (development).
Trevor and Inez have been making substantial progress on the visual
editor. Check out their work:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/tparscal
and http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/inez .
They spent part of August working on the transaction protocol/blocking
piece. There are two possible models: (1) save the entire document at
every keystroke, or (2) build the document as a series of events, so
that to undo we can simply reverse them.
Insert, move, & annotate are now reversible. They are replacing the
current system with a transaction-based system, which will give us
better collaboration, & better undo for free as well. But this is still
in research, very much in flux.
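To illustrate the second model, here is my own much-simplified sketch
(reconstructed from the meeting notes, not their actual code): every
edit becomes an operation that knows how to apply itself to the document
and how to produce its inverse, so undo just replays inverses in reverse
order.

  // Simplified sketch of an event/transaction model, not the real editor code.
  function InsertOp(pos, text) {
      this.apply = function (doc) {
          return doc.slice(0, pos) + text + doc.slice(pos);
      };
      this.invert = function () {
          return new RemoveOp(pos, text.length);
      };
  }

  function RemoveOp(pos, len) {
      var removed = '';
      this.apply = function (doc) {
          removed = doc.slice(pos, pos + len);
          return doc.slice(0, pos) + doc.slice(pos + len);
      };
      this.invert = function () {
          // only meaningful after apply() has recorded the removed text
          return new InsertOp(pos, removed);
      };
  }

  // Undo a transaction (a list of operations already applied to doc).
  function undo(doc, ops) {
      for (var i = ops.length - 1; i >= 0; i--) {
          doc = ops[i].invert().apply(doc);
      }
      return doc;
  }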
Neil is working on a demo of the Etherpad integration with MediaWiki,
but much of his time has been taken up with working on the Upload
Wizard, as you can see in
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/neilk . He
hopes to get it done ASAP.
We're interested in seeing how MediaWiki's trunk developers can reuse
existing Wikia code. Wikia and Wikimedia Foundation have been
collaborating and sharing ideas, code, and designs.
Since chat and collaborative editing share auth requirements, Neil wrote
an Identity API extension for that. I believe this is it:
/branches/extensions-realtime/IdentityApi
From the notes I'm seeing, parser progress is a blocker for visual
editor progress right now -- maybe Brion can speak more to that?
In version one of the visual editor, should we have lots of broken
features, or a few working features? It looks like we'll start by
picking a set of use cases and supporting only those for a version 1.
For example, for new page creation, we don't even need the parser. And
Trevor really wants to get something out in the wild to see what people
do with it.
Almost all of this is from meeting notes so I may have some things wrong
-- please correct me if I do!
We do continue to need help from people with significant experience in
user-facing, highly interactive applications and with serious JavaScript
skills. If you fit that description and can spare some time, take a
look at the code links above and reply to the list, or to Trevor, Neil,
or Brion.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Original thread:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054603.html
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Forwarded conversation
Subject: [Wikitech-l] Cleaning up deprecated html in WikiText
------------------------
From: *Daniel Friesen* <lists(a)nadir-seen-fire.com>
Date: Thu, Aug 11, 2011 at 5:39 AM
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Right now, markup like the following is in wide use on wikis:
{|
|-
|valign=top width=100%|
|}
The bgcolor, cellpadding, cellspacing, valign, align, width, height,
etc. presentational attributes have all been completely removed from
HTML5, and pages using these attributes aren't valid.
There's no way we can expect all the instances of valign and width to
disappear from every wiki on their own. And frankly, in the context of
authoring WikiText, I don't believe the user should have to care about
that and be forced to write a longer style line.
What are people's opinions on the idea of taking these removed
presentational attributes and turning them into sugared parts of
WikiText that are emitted as actual CSS in the output?
The change would essentially mean that this:
|valign=top width=100%|
Would become:
<td style="vertical-align: top; width: 100%;">
Instead of this:
<td valign="top" width="100%">
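In code, the mapping I have in mind is roughly this (sketched in
JavaScript for brevity and with made-up names; the real change would
live in the parser's attribute sanitizer, so this is not an actual
patch):

  // Made-up sketch of the rewrite, not an actual patch.
  var cssEquivalents = {
      valign:  'vertical-align',
      align:   'text-align',
      width:   'width',
      height:  'height',
      bgcolor: 'background-color'
      // cellpadding/cellspacing expand to per-cell rules,
      // so they need more than a one-to-one mapping
  };

  function attributesToStyle(attrs) {
      var rules = [];
      for (var name in attrs) {
          if (cssEquivalents.hasOwnProperty(name)) {
              // bare numbers like width=350 would also need a 'px' suffix
              rules.push(cssEquivalents[name] + ': ' + attrs[name]);
              delete attrs[name];
          }
      }
      if (rules.length) {
          // a real version would merge with any existing style attribute
          attrs.style = rules.join('; ') + ';';
      }
      return attrs;
  }

  // attributesToStyle({valign: 'top', width: '100%'})
  //   => {style: 'vertical-align: top; width: 100%;'}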
I can only find one downside. Text browsers like w3m do make use of
valign but don't support CSS, so the change makes the valign revert to
the normal vertically centered alignment.
I should make a few notes:
- This doesn't even affect all text browsers. lynx doesn't display
tables in a tabular form and hence doesn't care what type of alignment
attributes you have.
- This has absolutely nothing to do with web accessibility; screen
readers output to things like audio and braille, and hence don't display
things visually, so alignment means nothing to them. And the W3C appears
to assume that users with poor eyesight use proper CSS-capable browsers:
standards for web accessibility for users with poor eyesight seem
focused on things like ensuring usability with screen zooms and larger
fonts, rather than expecting users with bad eyesight to use text browsers.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
----------
From: *Daniel Kinzler* <daniel(a)brightbyte.de>
Date: Thu, Aug 11, 2011 at 6:13 AM
To: daniel(a)nadir-seen-fire.com, Wikimedia developers <
wikitech-l(a)lists.wikimedia.org>
This sounds good to me.
I'd like to take this opportunity to once more float the idea of having two
types of "tables" in wikitext:
* "grids" used for layout, that can be nested, and may be rendered as html
tables but would better be rendered using divs with appropriate styles.
* actual data tables, which can not contain block elements in their cells
(and
can not be nested), for actual tabular data.
i think this would make for cleaner wiki text and for cleaner html output.
-- daniel
Inspired by Brion's slides (couldn't make it to Haifa myself), some
random questions and musings:
- Is there a definition / "complete" example of the JSON output of the
new parser somewhere? I didn't see it on the parser pages... (I sketch a
guess at one possible shape below, after this list.)
- Will there be multiple "resolutions" of parsing? One would be
template name and key-value-pair parameters, another would be the
template replaced with the corresponding wikitext, another one the
template replaced with the corresponding wikitext parsed into JSON.
Either all-in-one large JSON object, or one of those "on demand"?
Also, extension tag/attributes/contents, rendered extension output,
WikiSource transclusions etc.
- One of the functions I have issues with in WYSIFTW is copy&paste.
Besides making it work in the new editor, would it be worth adding
special behaviour for (cut|copy)/paste between articles? Like,
automatically adding the source article link to the edit description,
so the source of text can be traced, even if it's just manually?
- Toolserver access to full wiki text is a pain. Once the new parser
is live (even if it's "only" in parallel with the old one), could we
have new, fast access capability for both raw wikitext and parser JSON
output on the toolserver? I mean that in addition to API parser
output, which I take as a given here :-)
- Will there be a JSON-in-XML dump besides the current wikitext-in-XML one?
- Will there be an interface to the parser for JavaScript tools
/outside/ edit mode? I'm thinking "Add a reference", "insert image"
etc. Just getting a char-based WikiText position from a mouse click
would be very helpful indeed, so the user can click where he wants the
reference in the rendered HTML, and JS can insert it at the
corresponding WikiText position.
- A point discussed endlessly before: As a "side effect" of the new
parser, will we store page-template-passed_value triplets in the
database? Think {{Information}} on Commons.
- Will there be an import page or JS function for parser JSON objects?
Think Word/OpenOffice export, or "paste HTML" (with JS HTML-to-JSON
converter).
That should keep us busy for a while... ;-)
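To make the first question concrete, here is a purely made-up guess at
the kind of JSON shape I mean -- emphatically *not* the actual format of
the new parser. If nodes carried source offsets like these, the
click-to-position item above would come almost for free:

  // Imagined output for: {{Information
  // |Description = A view of [[Budapest]]
  // }}
  {
      "type": "template",
      "name": "Information",
      "srcStart": 0,
      "srcEnd": 54,
      "params": {
          "Description": {
              "srcStart": 14,
              "srcEnd": 51,
              "content": [
                  { "type": "text", "value": "A view of ",
                    "srcStart": 29, "srcEnd": 39 },
                  { "type": "link", "target": "Budapest",
                    "srcStart": 39, "srcEnd": 51 }
              ]
          }
      }
  }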
Magnus
Hello!
I have one question which bothered me every time I thought about
WYSIWYG in MediaWiki (in fact, it was the only issue that seemed a
*real* obstacle to me).
How is the new WYSIWYG editor going to handle templates?
--vvv
On Mon, Aug 1, 2011 at 11:18 AM, Magnus Manske
<magnusmanske(a)googlemail.com>wrote:
> In case you missed it all the way over there in Haifa:-)
>
>
> http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-fo…
>
Their model, while interesting and an excellent reference, makes some
explicit choices that diverge from what we're currently working on:
* ugly but common structures, where e.g. tables are opened/closed across
templates, are not supported
* not all input is representable
They're not awful decisions -- and we might have made them 8-10 years ago
had anybody made an attempt to *plan* the markup language. ;) But we have an
existing data set of millions of documents that we have to support, and for
the first next-generation parser I'm hoping to basically define something
that's *very close* to how the current parser works, so that that first
decade of Wikipedia documents can be fully used with a specified parser
anytime in the future.
We can make the structures cleaner later and deprecate the old tables &
whatnot -- parser functions and such allow for beautifully nested structures
and a future wysiwyg world will take most of the low-level *markup* out of
normal editors' faces -- but for now we have to make it work with what we've
got. ;)
-- brion
Magnus sent this link and I think you'd be interested:
"Abstract: Wikipedia is a rich encyclopedia that is not only of great
use to its contributors and readers but also to researchers and
providers of third party software around Wikipedia. However, Wikipedia’s
content is only available as Wikitext, the markup language in which
articles on Wikipedia are written, and whoever needs to access the
content of an article has to implement their own parser or has to use
one of the available parser solutions. Unfortunately, those parsers
which convert Wikitext into a high-level representation like an abstract
syntax tree (AST) define their own format for storing and providing
access to this data structure. Further, the semantics of Wikitext are
only defined implicitly in the MediaWiki software itself. This situation
makes it difficult to reason about the semantic content of an article or
exchange and modify articles in a standardized and machine-accessible
way. To remedy this situation we propose a markup language, called XWML,
in which articles can be stored and an object model, called WOM, that
defines how the contents of an article can be read and modified."
-------- Original Message --------
Subject: [Wikitech-l] WOM
Date: Mon, 1 Aug 2011 19:18:48 +0100
From: Magnus Manske <magnusmanske(a)googlemail.com>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
In case you missed it all the way over there in Haifa:-)
http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-fo…