Wikitext-l February 2008

wikitext-l@lists.wikimedia.org

13 participants
9 discussions

by Minh Lê Ngọc

I see we have a lot of trouble with wikitext. I wonder whether this idea is feasible: we concentrate on building a WYSIWYG editor that produces XML-like content, so that:

- people don't have to know any wiki language
- developers can forget about wikitext and its complexity

To prove that it is possible, a friend of mine and I have written an XML-like language for wikis (in Vietnamese). I can ask him to translate it into English (because I'm not really good at English).

16 years, 2 months

A thought about incorrect syntax

by Steve Bennett

Here's a half-thought about the difficulties of parsing near-correct syntax like this:

[[image:foo.jpg|thumb|Some '''valiant''' attempt at including an image that fails because it only has one trailing square bracket.]

This is quite expensive to parse. According to current practice, we should render this literally, except that the 'valiant' should be rendered in bold. If text like this were to be frequently parsed, that would add up to a lot of computational effort for not much gain - clearly the user didn't *really* want two square brackets, the word 'image' etc.

So some possibilities:

- Detect the error at save time, alter the text to some more parser-friendly but equivalent wikitext, e.g.: <nowiki>[[</nowiki>image:foo.jpg...
- Detect the error at save time, wrap it in some new extension/tag like <error>[[image:...</error> or {{error|[[image:...}} This could have some pretty good benefits (render it in red, generate a list of errors* somewhere...)
- Detect the error at render time, and shortcut to displaying strictly literally (i.e., not attempting to parse the bolded text within). This way at least you're only parsing it once. (Has implications for the way some security is handled, like escaping & and < chars...)

Probably in reality slow parsing of incorrect syntax is a very minor issue. But it was just a thought.

Steve

* I mean "generate a list of friendly suggestions to the user". Yes, I know everyone goes ballistic at the word "error" and assumes that the user will not be able to save error-ridden syntax. :)
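The first possibility (rewriting the text at save time) could be sketched roughly as below. This is only an illustration: the function name `escape_unclosed_links` is made up, and the scan looks at nothing but `[[` and `]]` pairs, so it ignores templates, existing <nowiki> sections and everything else a real save-time pass would have to respect.

```python
NOWIKI_OPEN = "<nowiki>[[</nowiki>"  # parser-friendly stand-in for a stray '[['


def escape_unclosed_links(text: str) -> str:
    """Replace every '[[' that never gets a matching ']]' (hypothetical helper)."""
    out = []         # output pieces, joined at the end
    open_stack = []  # indexes into `out` of '[[' pieces still waiting for ']]'
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            open_stack.append(len(out))
            out.append("[[")
            i += 2
        elif pair == "]]" and open_stack:
            open_stack.pop()  # innermost open link is now closed
            out.append("]]")
            i += 2
        else:
            out.append(text[i])
            i += 1
    # Whatever is still on the stack was never closed: escape it.
    for idx in open_stack:
        out[idx] = NOWIKI_OPEN
    return "".join(out)
```

The pass is a single left-to-right scan, so the cost of spotting the broken link is paid once at save time rather than on every render.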

16 years, 2 months

Fwd: [Wikitech-l] Preprocessor syntax in ABNF

by David Gerard

Does this make the ANTLR problem any simpler?

- d.

---------- Forwarded message ----------
From: Tim Starling <tstarling(a)wikimedia.org>
Date: 17 Feb 2008 06:19
Subject: [Wikitech-l] Preprocessor syntax in ABNF
To: wikitech-l(a)lists.wikimedia.org

Just a fun little project for my Sunday afternoon:

http://www.mediawiki.org/wiki/Preprocessor_ABNF

Turns out the production rules are pretty simple. The magic is in the disambiguation. An EBNF representation of the whole of MediaWiki wikitext, if such a thing is possible, would only go a small way towards specifying the language.

-- Tim Starling

_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

16 years, 2 months

Draft 10 published

by Steve Bennett

Hi all,

I've published what I'm calling (for no good reason) "draft 10" here:

http://www.mediawiki.org/wiki/Markup_spec/ANTLR/draft

Mostly, I got to a certain level of feature completeness. Specifically, the 4 features that were previously missing (tables, magic words, categories and inline HTML) have been implemented.

I redid the table stuff - turns out I was getting too fancy for my own good. I now do less semantic checking, and am thus much more tolerant of borderline input.

I've also cleaned it up a bit and have roughly grouped all the rules into levels, thus:

Top level, block elements:

    line:
          (table) => table^
        | (headerline) => headerline^
        | (listmarker) => listline^
        | (hrline) => hrline^
        | (spaceline) => spaceline^
        | paragraph^
        ;

Next level, inline text (generally, stuff that appears within a line, and doesn't contain new lines):

    inline_text
    @init { text_levels++; }
        : ( ( (LEFT_BRACKET LEFT_BRACKET LEFT_BRACKET) => literal_left_bracket
            | (literal_left_bracket bracketed_url) => literal_left_bracket
            | (image) => image
            | (category) => category
            | (external_link) => external_link
            | (internal_link) => internal_link
            | (magic_link) => magic_link
            | (magic_word) => magic_word
            | pre_block
            | (formatted_text_elem) => formatted_text_elem
            )
            ((nbsp_before_punctuation) => nbsp_before_punctuation)?
            ((ws) => printing_ws)?
          )+;
    finally { text_levels --; }

The exception there is <pre> blocks which really do contain newlines.

Next level down is formatted text, which can appear in places like link captions:

    formatted_text
    @init { text_levels++; }
        : ( (formatted_text_elem) => formatted_text_elem
            ((nbsp_before_punctuation) => nbsp_before_punctuation)*
            ((printing_ws) => printing_ws)?
          )+;
    finally { text_levels --; }

    formatted_text_elem:
        ( (accidental_magic_link) => accidental_magic_link
        | ((punctuation_before_nbsp) => punctuation_before_nbsp)
        | (APOSTROPHES) => bold_and_italics
        | angle_tag
        | ((html_entity) => html_entity)
        | unformatted_characters
        );

And the very lowest level is unformatted characters:

    unformatted_characters:
        ( html_dangerous
        | punctuation
        | meaningless_characters
        | digits
        );

Anyway, when I say "feature complete", I mean that most of the major features that I know of are present in some form. None of them is complete in itself (except perhaps images), but it's a start.

So what next: suggestions for more features to add would be handy. Also, I need to get around to making it do more than just generate an AST. Theoretically it's not too much work to take the AST and spit out some kind of XHTML.

It would also be nifty if someone could figure out a way of embedding wikitext into the grammar to mark it up somehow. Does section inclusion work yet? If so, would it be possible to insert comments somehow that would allow other pages to transclude sections? Then some of the documentation could be stored outside the grammar itself, yet shown alongside...

Steve
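For readers who don't use ANTLR: the top-level "line" rule quoted in this message tries its alternatives in order, each guarded by a lookahead test, and falls through to paragraph when none applies. A rough Python analogue of that ordered dispatch might look like the following. The predicates are simplified guesses at what each rule looks for (for example, that a "spaceline" is a line starting with a space); they are not taken from the grammar itself.

```python
import re

# Ordered (name, predicate) pairs. The order mirrors the alternatives of the
# 'line' rule; the predicates themselves are simplified assumptions.
LINE_KINDS = [
    ("table", lambda s: s.startswith("{|")),
    ("headerline", lambda s: re.fullmatch(r"(=+).+\1\s*", s) is not None),
    ("listline", lambda s: s[:1] in ("*", "#", ":", ";")),
    ("hrline", lambda s: s.startswith("----")),
    ("spaceline", lambda s: s.startswith(" ")),
]


def classify_line(line: str) -> str:
    """Return the name of the first alternative whose predicate accepts the line."""
    for name, predicate in LINE_KINDS:
        if predicate(line):
            return name
    return "paragraph"  # the unguarded fallback alternative
```

As in the grammar, the order matters: an earlier alternative wins whenever two predicates would both accept the same line.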

16 years, 2 months

A happy little milestone

by Steve Bennett

I have successfully parsed my first nested table. It's 3 in the morning but I'm quite happy :)

One of the really complicated bits about the nested table syntax is that the contents of multi-line cells look exactly like normal text (with lists, headers, tables and so forth) except that each row can't begin with a pipe. I tried at least 4 different ways of implementing that rule (my practical ANTLR knowledge is still pretty weak), and finally this simple method worked:

    nonpipeline:
          (table) => table^
        | (headerline) => headerline^
        | (listmarker) => listline^
        | (hrline) => hrline^
        | (spaceline) => spaceline^
        | (nonpipe paragraph?)^
        ;

It's a complete duplicate of the normal "line" rule, except with the addition of "nonpipe" before the paragraph.

Anyway, now it's on to the next round of "yes, the grammar works, now to stop ANTLR spewing 5000 warnings at me".

Steve

16 years, 2 months

So, the hardest wikitext construct to parse?

by Steve Bennett

Table rows:

{|
|-
|You parse and parse and parse and read and read and you have no idea whether this is a table cell or a style property for the cell, until you hit either a | or a ||. Oops, it was just a style property, better go back and parse it again.
|}

That's kind of evil. For big table rows, that could get very expensive to parse.

Steve
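One way to sidestep the re-parse, sketched here as an assumption rather than as what any existing parser does: treat the row line as plain text first, locate the separators, and only then decide which piece is a style property and which is cell content. The function name `split_cells` is hypothetical, and the sketch does not cope with pipes inside links such as [[a|b]].

```python
def split_cells(line: str) -> list[tuple[str, str]]:
    """Split one '|'-prefixed row line into (attributes, content) pairs."""
    if not line.startswith("|"):
        raise ValueError("expected a line starting with '|'")
    cells = []
    for raw in line[1:].split("||"):      # '||' separates cells on one line
        attrs, sep, content = raw.partition("|")
        if not sep:                       # no single pipe: it was all content
            attrs, content = "", raw
        cells.append((attrs.strip(), content.strip()))
    return cells
```

For example, `split_cells('|style="color:red" | foo || bar')` gives one cell with a style property and one without. The text before the single pipe is only classified after the pipe has been found, so nothing is parsed twice.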

16 years, 2 months

More table syntax joy

by Steve Bennett

Consider this:

{| |- |foo || boo || moo moo moo moo moo moo moo zoo |}

Being able to deal with both single-line and multi-line cell contents is quite painful. In the case above, we know that foo and boo are single-liners only by the time we hit the ||. With the moo's, we only know it's *not* a single-liner once we hit the newline. And we only know that the 'zoo' is the continuation of that multi-liner once we work our way through all that whitespace.

In my delirious state, I'm dreaming up all sorts of friendlier syntax...maybe I should write one up somewhere. :)

Steve
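A line-oriented sketch of the same idea, under the assumption that any line inside a table that does not begin with a pipe continues the most recent cell. The function name `collect_cells` is invented for illustration, and the sample layout used with it below is made up, not the exact layout of the example in this message.

```python
def collect_cells(lines: list[str]) -> list[str]:
    """Group the lines of one table into cell texts (hypothetical helper)."""
    cells: list[str] = []
    for line in lines:
        # Table and row markers also start with '|' or '{', so test them first.
        if line.startswith(("{|", "|}", "|-")):
            continue
        if line.startswith("|"):
            # A new cell line; '||' separates several single-line cells.
            cells.extend(part.strip() for part in line[1:].split("||"))
        elif cells:
            # No leading pipe: continuation of the last (multi-line) cell.
            cells[-1] = cells[-1] + "\n" + line
    return cells
```

With `["{|", "|-", "|foo || boo || moo moo", "", "  zoo", "|}"]` as input, the blank line and the indented "zoo" are both appended to the third cell. Working line by line means the single-line versus multi-line decision is made when the next line arrives, not in the middle of a cell.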

16 years, 2 months

Pipetrick weirdness

by Steve Bennett

Never noticed this before. Compare these:

[[foo|]]
[[foo| ]]

Both render as if they were [[foo]], but the first one is replaced by [[foo|foo]] at save time, while the second one isn't. Feature or bug?

I'm a bit skeptical about the need to transform pipetricks at save time. I think a developer once explained it as not wanting third party users of wikitext to have to know the transformation rules, but that sounds pretty flimsy to me.

Steve
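The difference would be consistent with a save-time substitution that only matches a strictly empty caption. The following is a guess at the shape of such a rule, not the actual MediaWiki code; the pattern and the name `expand_pipe_trick` are made up, and the real transformation does more than this (as far as I know it also trims namespace prefixes and parenthetical suffixes from the caption).

```python
import re

# Strict form only: nothing at all between the '|' and the closing ']]'.
PIPE_TRICK = re.compile(r"\[\[([^\[\]|]+)\|\]\]")


def expand_pipe_trick(text: str) -> str:
    """Rewrite [[target|]] as [[target|target]]; leave everything else alone."""
    return PIPE_TRICK.sub(lambda m: f"[[{m.group(1)}|{m.group(1)}]]", text)
```

Under this pattern `[[foo|]]` is rewritten while `[[foo| ]]` passes through untouched, which matches the behaviour described above.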

16 years, 2 months

Spaceblock and <pre> weirdness

by Steve Bennett

There is no shortage of curious behaviour in our parser when you look hard enough:

 foo

(note the space at left, this is what I call a 'spaceblock' and renders as <pre>foo</pre>)

 <pre>foo</pre>

(again with space, renders exactly as before - the parser evidently decides the extra <pre> is redundant)

But what about:

 foo </pre>

Strangely enough, this renders without a <pre> block at all.

Steve

16 years, 2 months
