Hi, just wanted to forward this note from wikitech-l and clarify what
wikitext-l seems to be for now. :-)
Thanks,
Sumana
-------- Original Message --------
Subject: Re: [Wikitech-l] VE: why editing a paragraph opens the whole page?
Date: Fri, 02 Aug 2013 07:55:49 -0400
From: Sumana Harihareswara <sumanah(a)wikimedia.org>
Organization: Wikimedia Foundation
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
CC: David Gerard <dgerard(a)gmail.com>
On 08/02/2013 07:49 AM, David Gerard wrote:
> On 2 August 2013 12:43, rupert THURNER <rupert.thurner(a)gmail.com> wrote:
>
>> I tried to create a bug to discuss splitting VE up to only edit parts of a
>> page, see below. Andre Klapper suggested this ideally be broken up into
>> requirements that are easier to implement, and that I should post to
>> wikitext-l. As I did not see any discussion going on there about VE, I
>> tried here. Please forward this to the appropriate channel if I did not
>> get it right again.
>
>
> wikitext-l has had literally no posts since early June. It's certainly
> not where VE discussion is going on. Why would someone asking about VE
> be directed there?
Wikitext-l used to be where we talked about Parsoid *and* VE. Now that
discussion of VE has moved to wikitech-l I've altered the list
description at https://lists.wikimedia.org/mailman/listinfo/wikitext-l
accordingly; I reckon wikitext-l is now just for parser/wikitext
discussion. Sorry for the confusion.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
Hi,
I'm trying to get the same output for the references from Parsoid as from
the PHP parser. I make my tests with the first reference of the following
article:
https://en.wikipedia.org/wiki/Kiwix
== Call ==
* PHP: 100+ languages<sup id="cite_ref-sourceforge_1-0"
class="reference"><a
href="#cite_note-sourceforge-1"><span>[</span>1<span>]</span></a></sup>
* Parsoid (simplified): 100+ languages<span class="reference"
id="cite_ref-sourceforge-1-0"><a href="#cite_note-sourceforge-1">[1]</a></span>
Parsoid uses a <span> tag instead of a <sup> tag. I guess this is normal,
but I'm not sure...
== References (multiple calls in this example) ==
* PHP: ^ <a
href="#cite_ref-sourceforge_1-0"><sup><i><b>a</b></i></sup></a> <a
href="#cite_ref-sourceforge_1-1"><sup><i><b>b</b></i></sup></a></span>
<span class="reference-text"><span class="citation web"><a
rel="nofollow" class="external text"
href="http://sourceforge.net/projects/kiwix/">"Kiwix"</a>
* Parsoid (simplified): ↑ <a href="#cite_ref-sourceforge-1-0">1.0</a> <a
href="#cite_ref-sourceforge-1-1">1.1</a></span><span><span
class="citation web"><a href="http://sourceforge.net/projects/kiwix/"
class="external">"Kiwix"</a>
The differences:
* "↑" vs. "^": this seems strange to me... Why?
* Each backlink label is a number instead of a letter, for example "1.0"
instead of "a". BTW, the letter should be localized: in Arabic it should
not be "a" but the first letter of the Arabic alphabet. Here again I
don't know if this is normal...
* No styling on the backlinks, for example "1.0" instead of
<sup><i><b>a</b></i></sup>. Normal?
* It seems to me that Parsoid forgets to put a space just after the "1.1"
backlink, so it is visually concatenated with the reference text. Seems
to me to be a small bug.
Could you please tell me which of these things are features and which
ones are bugs, so that I know what I should fix on my side?
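In case it is useful: here is the kind of post-processing I am
experimenting with on my side. Just a sketch using the domino npm module,
and the element/selector choices are my guesses from the output above:

var domino = require('domino');

// Sketch: rewrite Parsoid's <span class="reference"> wrappers into <sup>
// elements so the reference calls match the PHP parser's output shape.
function normalizeRefCalls(html) {
    var doc = domino.createDocument(html);
    var spans = doc.querySelectorAll('span.reference');
    for (var i = 0; i < spans.length; i++) {
        var span = spans[i];
        var sup = doc.createElement('sup');
        sup.className = span.className; // "reference"
        if (span.hasAttribute('id')) {
            sup.setAttribute('id', span.getAttribute('id'));
        }
        // Move the children (the link to the note) across.
        while (span.firstChild) {
            sup.appendChild(span.firstChild);
        }
        span.parentNode.replaceChild(sup, span);
    }
    return doc.body.innerHTML;
}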
Kind regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
Hi,
Try the following code at:
http://parsoid.wmflabs.org/_rtform/
=====================
{{Geobox|River
| country = Kyrgyzstan
| mouth_lat_d = 46| mouth_lat_m = 09| mouth_lat_s = 15| mouth_lat_NS =N
| mouth_long_d =60 | mouth_long_m =52 | mouth_long_s =25 | mouth_long_EW =E
| discharge = 1180
| discharge_note = foobar1<ref>http://myref.url</ref>
}}
foobar2<ref>http://myref.url</ref>
== References ==
<references/>
=====================
As a result, I get a clickable link only for the second reference, not
for the first, although both are absolutely identical: "http://myref.url".
Looks like a small bug to me.
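To double-check this, I count the external links per reference item in the
returned HTML with a small script. A sketch, and the 'ol.references li' /
'a.external' selectors are assumptions from the output I see:

var fs = require('fs');
var domino = require('domino');

// output.html is wherever I saved the HTML returned by the form above.
var doc = domino.createDocument(fs.readFileSync('output.html', 'utf8'));
var items = doc.querySelectorAll('ol.references li');
for (var i = 0; i < items.length; i++) {
    var links = items[i].querySelectorAll('a.external');
    console.log('reference ' + (i + 1) + ': ' + links.length +
        ' external link(s)');
}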
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
Hi!
I am Abhishek Das, a student developer from the Indian Institute of Technology Roorkee, India. I have been studying MediaWiki's code and have started contributing by fixing a few bugs marked as easy. Here is my user page: http://www.mediawiki.org/wiki/User:Abhshkdz
I have wide experience building applications with PHP, MySQL & JavaScript (both client-side & server-side: Node.js). You can see my GitHub profile here: https://github.com/abhshkdz.
Recently, I have been working on a lot of real-time web applications using server-sent events, WebSockets, and Socket.IO. I have experience building collaborative document editors. Here is a video conferencing and collaborative document editing application that I built, and won with, at a 24-hour hackathon: https://github.com/abhshkdz/hackview. It uses WebRTC for peer-to-peer video conferencing and ShareJS for collaborative doc editing.
A few days back, I built an open-source version of WorkFlowy (http://workflowy.com) using Backbone.js and Socket.IO: https://github.com/abhshkdz/HackFlowy. It has been gaining a lot of traction on GitHub, with lots of stars and forks. I'm really excited about this one :D. The tasks get synced in real time over Socket.IO, with MySQL as the database.
I would like to implement a real-time collaborative editor in Wikimedia's VisualEditor as well.
As Sumana Harihareswara pointed out, some work has already been done on this. I wanted to know what is under development at the moment and how I can contribute to it.
Looking forward to hearing from you soon.
Thanks
Abhishek Das
Abhishek Das
B. Tech. (2nd year)
Electrical Engineering
IIT Roorkee
I'd like to get involved in the parsoid effort.
I've been hanging out in the IRC channel on freenode, but there are
usually just a dozen lurkers and no activity.
In the mailing lists, parsoid seems to be mentioned about as often in
wikitext-l as in wikitech-l - which one is best for asking questions of
this nature?
I'd like to scratch my own itch rather than necessarily go after things on
the todo list and roadmap.
Basically I'm interested in what parsoid can do for parsing wikitext markup
into HTML (or other formats).
I want to use it without a mediawiki install and without an internet
connection. I see there is already some kind of support for reading in
articles from compressed dump files.
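For the record, this is roughly the sort of standalone invocation I am
imagining, based on poking around the repository -- the path and the flag
are guesses on my part, so corrections welcome:

  $ cd Parsoid/js/tests
  $ echo "'''Hello''' [[world]]" | node parse.js --fetchTemplates=false

i.e. wikitext on stdin, HTML on stdout, no MediaWiki install and (with
template fetching turned off) no network.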
Any suggestions on where I should start, or where I can hang out to chat
live with people who could help get me involved?
Andrew Dunbar (hippietrail)
For context: I've been working on replacing the html5 and jsdom modules
(which depend on the native 'contextify' module) with the pure-JavaScript
'domino' implementation of DOM4. This seems to be faster and cleaner, and
fixes some bugs caused by jsdom's eccentric DOM handling. Domino is (in my
brief experience) more reliable and standards-compliant.
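For anyone who hasn't used domino: the basic shape of the API is below (a
trivial illustration, not the actual Parsoid patch):

var domino = require('domino');

// Parse an HTML string straight into a DOM document -- pure JavaScript,
// no native contextify module required.
var doc = domino.createDocument('<p>Hello <b>world</b>!</p>');
console.log(doc.querySelector('b').textContent); // "world"
console.log(doc.body.innerHTML); // serialized per the HTML5 algorithm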
Here's a list of issues I came across in the process:
* There were 3 new failures in wt2html tests. (There were also some new
passes, so the number of correct tests increases on net.) They are:
1) "expansion of multi-line templates in attribute values (bug 6255 sanity
check 2)"
For reference, this test looks like:
> !! test
> Expansion of multi-line templates in attribute values (bug 6255 sanity
> check)
> !! input
> <div style="background:
> #00FF00">-</div>
> !! result
> <div style="background: #00FF00">-</div>
> !! end
>
> !! test
> Expansion of multi-line templates in attribute values (bug 6255 sanity
> check 2)
> !! input
> <div style="background:&#10;#00FF00">-</div>
> !! result
> <div style="background:&#10;#00FF00">-</div>
> !! end
I'm not sure how this test ever passed in jsdom -- the inputs here are
actually identical to an HTML parser, since hex-escape decoding happens
very early. But apparently the wikitext parser should defer processing of
the &#10; somehow? On the domino branch our HTML serialization now uses
the upstream standard HTML5 serialization algorithm, which doesn't escape
newlines. (
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#se…)
Note that the first test also involves whitespace normalization, which the
PHP parser does (see
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/14689) but parsoid
does not do. (I've got a patch to do whitespace normalization in parsoid
if there's interest, but it causes other tests to break.)
What's the plan to handle cases like this? Is it really important to
generate the &#10; in the output?
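(For the curious, the whitespace-normalization patch mentioned above boils
down to something like the function below. This is my reading of what the
PHP side does to attribute values, so treat the exact regexp as an
approximation:)

// Collapse tab/newline/space runs inside an attribute value to single
// spaces, roughly mirroring the PHP parser's attribute normalization.
function normalizeAttributeWhitespace(value) {
    return value.replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

normalizeAttributeWhitespace('background:\n#00FF00'); // "background: #00FF00"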
2) "Play a bit with r67090 and bug 3158"
This is a parsoid-only test which looks like:
> !! test
> Play a bit with r67090 and bug 3158
> !! options
> disabled
> !! input
> <div style="width:50% !important">&nbsp;</div>
> <div style="width:50%&nbsp;!important">&nbsp;</div>
> <div style="width:50%&#160;!important">&nbsp;</div>
> <div style="border : solid;">&nbsp;</div>
> !! result
> <div style="width:50% !important">&#160;</div>
> <div style="width:50% !important">&#160;</div>
> <div style="width:50% !important">&#160;</div>
> <div style="border&#160;: solid;">&#160;</div>
> !! end
In standard HTML serialization, U+00A0 is encoded uniformly as &nbsp;, so
even if you wanted to be bug-compatible with the 'border&#160;:' style,
you should be emitting a &nbsp;, not a &#160;, there. The other two cases
are whitespace normalization within attributes (again). I'm guessing jsdom
(incorrectly) did this by default whether you wanted it or not; you need
to explicitly add attribute normalization in the domino case if that's
desired. (But there's some other reason why the 'border :' case is failing
now which needs to be chased down, unrelated to the &#160; vs &nbsp;
issue.)
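(The uniform encoding is easy to check from node -- at least, this is what
I'd expect domino to produce:)

var domino = require('domino');
var doc = domino.createDocument(
    '<div style="border\u00a0: solid;">\u00a0</div>');
console.log(doc.body.innerHTML);
// Expected: <div style="border&nbsp;: solid;">&nbsp;</div>
// U+00A0 comes out as &nbsp; in both positions, never as &#160;.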
3) "Parsoid-only: Table with broken attribute value quoting on consecutive
lines"
> !! test
> Parsoid-only: Table with broken attribute value quoting on consecutive
> lines
> !! options
> disabled
> !! input
> {|
> | title="Hello world|Foo
> | style="color:red|Bar
> |}
> !! result
> <table>
> <tr>
> <td title="Hello world">Foo
> </td><td style="color: red;">Bar
> </td></tr></table>
> !! end
jsdom used to insert the extraneous semicolon at the end of the 'style'
attribute. domino does not. I believe this test case is broken and the
extraneous semicolon should be removed.
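An easy way to see the difference in isolation (the jsdom line is the
behavior I observed before the switch, presumably its CSSOM reserializing
the attribute):

var domino = require('domino');
var doc = domino.createDocument('<div style="color:red">Bar</div>');
console.log(doc.body.innerHTML);
// domino: <div style="color:red">Bar</div>   (attribute untouched)
// jsdom:  <div style="color: red;">Bar</div> (semicolon inserted)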
* Other observed bugs & failures:
http://parsoid.wmflabs.org/en/Pi gives:
> TypeError: Cannot assign to read only property 'ksrc' of #<KV>
> at AttributeExpander._returnAttributes
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:71:20)
> at AttributeTransformManager.process
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:1017:8)
> at AttributeExpander.onToken
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:46:7)
> at AsyncTokenTransformManager.transformTokens
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:568:17)
> at AsyncTokenTransformManager.onChunk
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:356:17)
> at SyncTokenTransformManager.EventEmitter.emit (events.js:96:17)
> at SyncTokenTransformManager.onChunk
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:904:7)
> at PegTokenizer.EventEmitter.emit (events.js:96:17)
> at PegTokenizer.process
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.tokenizer.peg.js:88:11)
> at ParserPipeline.process
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.parser.js:360:21)
http://localhost:8000/simple/Game gives:
> starting parsing of Game
> *********** ERROR: cs/s mismatch for node: A s: 3808; cs: 3821 ************
> completed parsing of Game in 1491 ms
* [[File:]] tag parsing for images appears to be incomplete:
a) alt= and class= are not parsed
b) 'thumb' and 'right' should result in <img class="thumb tright" /> or
some such, but there doesn't appear to be an indication of either option in
the parsoid output.
* I'd like to see title and revision information in the <head>
* Interwiki links are not converted to relative links when the "interwiki"
is actually the current wiki. (Maybe this isn't really a bug.)
Let's discuss these a bit and I'll file bugzilla tickets for the bits we
can agree are actually bugs. ;)
--scott
--
( http://cscott.net/ )
[Resending without the full list of articles, which caused the message to
be bounced into moderation.]
Here are the results of a quick test I ran over the weekend, comparing a
compressed excerpt from simple.wikipedia.org in mediawiki markup to the
compressed parsoid representation of the same articles. The list of
articles is attached to this message. [Not any more.]
For the base case I used the processing pipeline for the OLPC's "Wikipedia
activity", source code at github.com/cscott/wikiserver
It begins with a hand-written "portal page", then grabs all articles within
two links of the portal page. The original markup was taken from
the simplewiki-20130112-pages-articles.xml dump. Templates were then fully
expanded, and just the selected articles were written out. Articles are
separated by the character 0x01, a newline, the title of the article, a
newline, the length of the article in bytes, a newline, and the character
0x02 followed by a newline.
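For concreteness, this framing reads back with something like the sketch
below:

var fs = require('fs');

// Sketch: parse the framing described above -- 0x01, '\n', title, '\n',
// byte length, '\n', 0x02, '\n', then the article text itself.
function readArticles(path) {
    var data = fs.readFileSync(path, 'binary'); // byte-for-byte string
    var articles = [];
    var pos = 0;
    while (pos < data.length && data.charAt(pos) === '\x01') {
        var titleEnd = data.indexOf('\n', pos + 2);
        var lenEnd = data.indexOf('\n', titleEnd + 1);
        var length = parseInt(data.slice(titleEnd + 1, lenEnd), 10);
        var bodyStart = lenEnd + 3; // skip '\n', 0x02, '\n'
        articles.push({
            title: data.slice(pos + 2, titleEnd),
            text: data.slice(bodyStart, bodyStart + length)
        });
        pos = bodyStart + length;
    }
    return articles;
}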
For comparison, I took the list of articles included in the dump and wrote
a small script to fetch them from parsoid, using the HEAD of the master
branch from this weekend (2013-02-17, roughly). I wrote the full parsoid
HTML document (including top-level <html> tag, <head>, <base href>, and
<body> but not including a <!DOCTYPE>) to a file, separating articles with
the title of the article, a newline, the length of the article in bytes,
and a newline.
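The fetch script was nothing fancy -- roughly this shape (a sketch,
assuming a local Parsoid web service running on port 8000):

var http = require('http');

// Fetch one title from a locally running Parsoid web service and hand
// back the full HTML document as a Buffer.
function fetchArticle(title, cb) {
    var url = 'http://localhost:8000/simple/' + encodeURIComponent(title);
    http.get(url, function (res) {
        var chunks = [];
        res.on('data', function (c) { chunks.push(c); });
        res.on('end', function () { cb(null, Buffer.concat(chunks)); });
    }).on('error', cb);
}

fetchArticle('Game', function (err, html) {
    if (err) { throw err; }
    console.log('Game: ' + html.length + ' bytes');
});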
Results, with and without compression (# of articles: 3640):

                         Mediawiki markup    Parsoid HTML
Uncompressed             18M                 199M
gzip -9                  6.4M                26M
bzip2 -9                 4.7M                17M
lzma -9                  4.4M                15M
So there's currently a 10x expansion in the uncompressed size, but only
3-4x expansion with compression.
--scott
--
( http://cscott.net/ )