Hi, just wanted to forward this note from wikitech-l and clarify what
wikitext-l seems to be for now. :-)
Thanks,
Sumana
-------- Original Message --------
Subject: Re: [Wikitech-l] VE: why editing a paragraph opens the whole page?
Date: Fri, 02 Aug 2013 07:55:49 -0400
From: Sumana Harihareswara <sumanah(a)wikimedia.org>
Organization: Wikimedia Foundation
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
CC: David Gerard <dgerard(a)gmail.com>
On 08/02/2013 07:49 AM, David Gerard wrote:
> On 2 August 2013 12:43, rupert THURNER <rupert.thurner(a)gmail.com> wrote:
>
>> I tried to create a bug to discuss splitting VE up to only edit parts of a
>> page, see below. Andre Klapper suggested this ideally be broken up into
>> requirements that are easier to implement, and that I should post to
>> wikitext-l. As I did not see any discussion going on there about VE, I
>> tried here. Please forward this to the appropriate channel if I did not
>> get it right again.
>
>
> wikitext-l has had literally no posts since early June. It's certainly
> not where VE discussion is going on. Why would someone asking about VE
> be directed there?
Wikitext-l used to be where we talked about Parsoid *and* VE. Now that
discussion of VE has moved to wikitech-l I've altered the list
description at https://lists.wikimedia.org/mailman/listinfo/wikitext-l
accordingly; I reckon wikitext-l is now just for parser/wikitext
discussion. Sorry for the confusion.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
Hi,
I'm trying to get the same output for the references from Parsoid as from
the PHP parser. I make my tests with the first reference of the following
article:
https://en.wikipedia.org/wiki/Kiwix
== Call ==
* PHP: 100+ languages<sup id="cite_ref-sourceforge_1-0"
class="reference"><a
href="#cite_note-sourceforge-1"><span>[</span>1<span>]</span></a></sup>
* Parsoid (simplified): 100+ languages<span class="reference"
id="cite_ref-sourceforge-1-0"><a href="#cite_note-sourceforge-1">[1]</a></span>
Parsoid uses a <span> tag instead of a <sup> tag. I guess this is normal,
but I'm not sure...
== References (multiple calls in this example) ==
* PHP: ^ <a
href="#cite_ref-sourceforge_1-0"><sup><i><b>a</b></i></sup></a> <a
href="#cite_ref-sourceforge_1-1"><sup><i><b>b</b></i></sup></a></span>
<span class="reference-text"><span class="citation web"><a
rel="nofollow" class="external text"
href="http://sourceforge.net/projects/kiwix/">"Kiwix"</a>
* Parsoid (simplified): ↑ <a href="#cite_ref-sourceforge-1-0">1.0</a> <a
href="#cite_ref-sourceforge-1-1">1.1</a></span><span><span
class="citation web"><a href="http://sourceforge.net/projects/kiwix/"
class="external">"Kiwix"</a>
The differences:
* "↑" vs. "^": this seems strange to me... Why?
* Each backlink label is a number instead of a letter, for example "1.0"
instead of "a". BTW, the letter should be localized: in Arabic it should
not be "a" but the first letter of the Arabic alphabet. Here again I
don't know if this is normal...
* No styling on the backlinks, for example "1.0" instead of
<sup><i><b>a</b></i></sup>. Normal?
* It seems to me that Parsoid forgets to put a space just after the "1.1"
backlink, so it is visually concatenated with the reference text. Seems
to me to be a small bug.
Could you please tell me which of these things are features and which
ones are bugs, so that I know what I should fix on my side?
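In case it is useful: here is the kind of post-processing I am
experimenting with on my side. Just a sketch using the domino npm module,
and the element/selector choices are my guesses from the output above:

var domino = require('domino');

// Sketch: rewrite Parsoid's <span class="reference"> wrappers into <sup>
// elements so the reference calls match the PHP parser's output shape.
function normalizeRefCalls(html) {
    var doc = domino.createDocument(html);
    var spans = doc.querySelectorAll('span.reference');
    for (var i = 0; i < spans.length; i++) {
        var span = spans[i];
        var sup = doc.createElement('sup');
        sup.className = span.className; // "reference"
        if (span.hasAttribute('id')) {
            sup.setAttribute('id', span.getAttribute('id'));
        }
        // Move the children (the link to the note) across.
        while (span.firstChild) {
            sup.appendChild(span.firstChild);
        }
        span.parentNode.replaceChild(sup, span);
    }
    return doc.body.innerHTML;
}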
Kind regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
Hi,
Try the following code at:
http://parsoid.wmflabs.org/_rtform/
=====================
{{Geobox|River
| country = Kyrgyzstan
| mouth_lat_d = 46| mouth_lat_m = 09| mouth_lat_s = 15| mouth_lat_NS =N
| mouth_long_d =60 | mouth_long_m =52 | mouth_long_s =25 | mouth_long_EW =E
| discharge = 1180
| discharge_note = foobar1<ref>http://myref.url</ref>
}}
foobar2<ref>http://myref.url</ref>
== References ==
<references/>
=====================
As a result, I get a clickable link only for the second reference, not
for the first, although both are absolutely identical: "http://myref.url".
Looks like a small bug to me.
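To double-check this, I count the external links per reference item in the
returned HTML with a small script. A sketch, and the 'ol.references li' /
'a.external' selectors are assumptions from the output I see:

var fs = require('fs');
var domino = require('domino');

// output.html is wherever I saved the HTML returned by the form above.
var doc = domino.createDocument(fs.readFileSync('output.html', 'utf8'));
var items = doc.querySelectorAll('ol.references li');
for (var i = 0; i < items.length; i++) {
    var links = items[i].querySelectorAll('a.external');
    console.log('reference ' + (i + 1) + ': ' + links.length +
        ' external link(s)');
}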
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
Hi!
I am Abhishek Das, a student developer from the Indian Institute of Technology Roorkee, India. I have been studying MediaWiki's code and have started contributing by fixing a few bugs marked as easy. Here is my user page: http://www.mediawiki.org/wiki/User:Abhshkdz
I have wide experience building applications with PHP, MySQL & JavaScript (both client-side & server-side: Node.js). You can see my GitHub profile here: https://github.com/abhshkdz.
Recently, I have been working on a lot of real-time web applications using server-sent events, WebSockets, and Socket.IO. I have experience building collaborative document editors. Here is a video conferencing and collaborative document editing application that I built, and won with, at a 24-hour hackathon: https://github.com/abhshkdz/hackview. It uses WebRTC for peer-to-peer video conferencing and ShareJS for collaborative doc editing.
A few days back, I built an open-source version of WorkFlowy (http://workflowy.com) using Backbone.js and Socket.IO: https://github.com/abhshkdz/HackFlowy. It has been gaining a lot of traction on GitHub, with lots of stars and forks. I'm really excited about this one :D. The tasks get synced in real time over Socket.IO, with MySQL as the database.
I would like to implement a real-time collaborative editor in Wikimedia's VisualEditor as well.
As Sumana Harihareswara pointed out, some work has already been done on this. I wanted to know what is under development at the moment and how I can contribute to it.
Looking forward to hearing from you soon.
Thanks
Abhishek Das
Abhishek Das
B. Tech. (2nd year)
Electrical Engineering
IIT Roorkee
I'd like to get involved in the parsoid effort.
I've been hanging out in the IRC channel on freenode, but there are
usually just a dozen lurkers and no activity.
In the mailing lists, parsoid seems to be mentioned about as often in
wikitext-l as in wikitech-l - which one is best for asking questions of
this nature?
I'd like to scratch my own itch rather than necessarily go after things on
the todo list and roadmap.
Basically I'm interested in what parsoid can do for parsing wikitext markup
into HTML (or other formats).
I want to use it without a mediawiki install and without an internet
connection. I see there is already some kind of support for reading in
articles from compressed dump files.
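For the record, this is roughly the sort of standalone invocation I am
imagining, based on poking around the repository -- the path and the flag
are guesses on my part, so corrections welcome:

  $ cd Parsoid/js/tests
  $ echo "'''Hello''' [[world]]" | node parse.js --fetchTemplates=false

i.e. wikitext on stdin, HTML on stdout, no MediaWiki install and (with
template fetching turned off) no network.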
Any suggestions on where I should start, or where I can hang out to chat
live with people who could help get me involved?
Andrew Dunbar (hippietrail)
For context: I've been working on replacing the html5 and jsdom modules
(which depend on the native 'contextify' module) with the pure-JavaScript
'domino' implementation of DOM4. This seems to be faster and cleaner, and
fixes some bugs caused by jsdom's eccentric DOM handling. Domino is (in my
brief experience) more reliable and standards-compliant.
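For anyone who hasn't used domino: the basic shape of the API is below (a
trivial illustration, not the actual Parsoid patch):

var domino = require('domino');

// Parse an HTML string straight into a DOM document -- pure JavaScript,
// no native contextify module required.
var doc = domino.createDocument('<p>Hello <b>world</b>!</p>');
console.log(doc.querySelector('b').textContent); // "world"
console.log(doc.body.innerHTML); // serialized per the HTML5 algorithm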
Here's a list of issues I came across in the process:
* There were 3 new failures in wt2html tests. (There were also some new
passes, so the number of correct tests increases on net.) They are:
1) "expansion of multi-line templates in attribute values (bug 6255 sanity
check 2)"
For reference, this test looks like:
> !! test
> Expansion of multi-line templates in attribute values (bug 6255 sanity
> check)
> !! input
> <div style="background:
> #00FF00">-</div>
> !! result
> <div style="background: #00FF00">-</div>
> !! end
>
> !! test
> Expansion of multi-line templates in attribute values (bug 6255 sanity
> check 2)
> !! input
> <div style="background:&#10;#00FF00">-</div>
> !! result
> <div style="background:&#10;#00FF00">-</div>
> !! end
I'm not sure how this test ever passed in jsdom -- the inputs here are
actually identical to an HTML parser, since hex-escape decoding happens
very early. But apparently the wikitext parser should defer processing of
the &#10; somehow? On the domino branch our HTML serialization now uses
the upstream standard HTML5 serialization algorithm, which doesn't escape
newlines. (
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#se…)
Note that the first test also involves whitespace normalization, which the
PHP parser does (see
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/14689) but parsoid
does not do. (I've got a patch to do whitespace normalization in parsoid
if there's interest, but it causes other tests to break.)
What's the plan to handle cases like this? Is it really important to
generate the &#10; in the output?
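(For the curious, the whitespace-normalization patch mentioned above boils
down to something like the function below. This is my reading of what the
PHP side does to attribute values, so treat the exact regexp as an
approximation:)

// Collapse tab/newline/space runs inside an attribute value to single
// spaces, roughly mirroring the PHP parser's attribute normalization.
function normalizeAttributeWhitespace(value) {
    return value.replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

normalizeAttributeWhitespace('background:\n#00FF00'); // "background: #00FF00"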
2) "Play a bit with r67090 and bug 3158"
This is a parsoid-only test which looks like:
> !! test
> Play a bit with r67090 and bug 3158
> !! options
> disabled
> !! input
> <div style="width:50% !important">&nbsp;</div>
> <div style="width:50%&nbsp;!important">&nbsp;</div>
> <div style="width:50%&#160;!important">&nbsp;</div>
> <div style="border : solid;">&nbsp;</div>
> !! result
> <div style="width:50% !important">&#160;</div>
> <div style="width:50% !important">&#160;</div>
> <div style="width:50% !important">&#160;</div>
> <div style="border&#160;: solid;">&#160;</div>
> !! end
In standard HTML serialization, U+00A0 is encoded uniformly as &nbsp;, so
even if you wanted to be bug-compatible with the 'border&#160;:' style,
you should be emitting a &nbsp;, not a &#160;, there. The other two cases
are whitespace normalization within attributes (again). I'm guessing jsdom
(incorrectly) did this by default whether you wanted it or not; you need
to explicitly add attribute normalization in the domino case if that's
desired. (But there's some other reason why the 'border :' case is failing
now which needs to be chased down, unrelated to the &#160; vs &nbsp;
issue.)
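(The uniform encoding is easy to check from node -- at least, this is what
I'd expect domino to produce:)

var domino = require('domino');
var doc = domino.createDocument(
    '<div style="border\u00a0: solid;">\u00a0</div>');
console.log(doc.body.innerHTML);
// Expected: <div style="border&nbsp;: solid;">&nbsp;</div>
// U+00A0 comes out as &nbsp; in both positions, never as &#160;.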
3) "Parsoid-only: Table with broken attribute value quoting on consecutive
lines"
> !! test
> Parsoid-only: Table with broken attribute value quoting on consecutive
> lines
> !! options
> disabled
> !! input
> {|
> | title="Hello world|Foo
> | style="color:red|Bar
> |}
> !! result
> <table>
> <tr>
> <td title="Hello world">Foo
> </td><td style="color: red;">Bar
> </td></tr></table>
> !! end
jsdom used to insert the extraneous semicolon at the end of the 'style'
attribute. domino does not. I believe this test case is broken and the
extraneous semicolon should be removed.
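An easy way to see the difference in isolation (the jsdom line is the
behavior I observed before the switch, presumably its CSSOM reserializing
the attribute):

var domino = require('domino');
var doc = domino.createDocument('<div style="color:red">Bar</div>');
console.log(doc.body.innerHTML);
// domino: <div style="color:red">Bar</div>   (attribute untouched)
// jsdom:  <div style="color: red;">Bar</div> (semicolon inserted)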
* Other observed bugs & failures:
http://parsoid.wmflabs.org/en/Pi gives:
> TypeError: Cannot assign to read only property 'ksrc' of #<KV>
> at AttributeExpander._returnAttributes
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:71:20)
> at AttributeTransformManager.process
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:1017:8)
> at AttributeExpander.onToken
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:46:7)
> at AsyncTokenTransformManager.transformTokens
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:568:17)
> at AsyncTokenTransformManager.onChunk
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:356:17)
> at SyncTokenTransformManager.EventEmitter.emit (events.js:96:17)
> at SyncTokenTransformManager.onChunk
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:904:7)
> at PegTokenizer.EventEmitter.emit (events.js:96:17)
> at PegTokenizer.process
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.tokenizer.peg.js:88:11)
> at ParserPipeline.process
> (/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.parser.js:360:21)
http://localhost:8000/simple/Game gives:
> starting parsing of Game
> *********** ERROR: cs/s mismatch for node: A s: 3808; cs: 3821 ************
> completed parsing of Game in 1491 ms
* [[File:]] tag parsing for images appears to be incomplete:
a) alt= and class= are not parsed
b) 'thumb' and 'right' should result in <img class="thumb tright" /> or
some such, but there doesn't appear to be an indication of either option in
the parsoid output.
* I'd like to see title and revision information in the <head>
* Interwiki links are not converted to relative links when the "interwiki"
is actually the current wiki. (Maybe this isn't really a bug.)
Let's discuss these a bit and I'll file bugzilla tickets for the bits we
can agree are actually bugs. ;)
--scott
--
( http://cscott.net/ )
[Resending without the full list of articles, which caused the message to
be bounced into moderation.]
Here are the results of a quick test I ran over the weekend, comparing a
compressed excerpt from simple.wikipedia.org in mediawiki markup to the
compressed parsoid representation of the same articles. The list of
articles is attached to this message. [Not any more.]
For the base case I used the processing pipeline for the OLPC's "Wikipedia
activity", source code at github.com/cscott/wikiserver
It begins with a hand-written "portal page", then grabs all articles within
two links of the portal page. The original markup was taken from
the simplewiki-20130112-pages-articles.xml dump. Templates were then fully
expanded, and just the selected articles were written out. Articles are
separated by the character 0x01, a newline, the title of the article, a
newline, the length of the article in bytes, a newline, and the character
0x02 followed by a newline.
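For concreteness, this framing reads back with something like the sketch
below:

var fs = require('fs');

// Sketch: parse the framing described above -- 0x01, '\n', title, '\n',
// byte length, '\n', 0x02, '\n', then the article text itself.
function readArticles(path) {
    var data = fs.readFileSync(path, 'binary'); // byte-for-byte string
    var articles = [];
    var pos = 0;
    while (pos < data.length && data.charAt(pos) === '\x01') {
        var titleEnd = data.indexOf('\n', pos + 2);
        var lenEnd = data.indexOf('\n', titleEnd + 1);
        var length = parseInt(data.slice(titleEnd + 1, lenEnd), 10);
        var bodyStart = lenEnd + 3; // skip '\n', 0x02, '\n'
        articles.push({
            title: data.slice(pos + 2, titleEnd),
            text: data.slice(bodyStart, bodyStart + length)
        });
        pos = bodyStart + length;
    }
    return articles;
}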
For comparison, I took the list of articles included in the dump and wrote
a small script to fetch them from parsoid, using the HEAD of the master
branch from this weekend (2013-02-17, roughly). I wrote the full parsoid
HTML document (including top-level <html> tag, <head>, <base href>, and
<body> but not including a <!DOCTYPE>) to a file, separating articles with
the title of the article, a newline, the length of the article in bytes,
and a newline.
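The fetch script was nothing fancy -- roughly this shape (a sketch,
assuming a local Parsoid web service running on port 8000):

var http = require('http');

// Fetch one title from a locally running Parsoid web service and hand
// back the full HTML document as a Buffer.
function fetchArticle(title, cb) {
    var url = 'http://localhost:8000/simple/' + encodeURIComponent(title);
    http.get(url, function (res) {
        var chunks = [];
        res.on('data', function (c) { chunks.push(c); });
        res.on('end', function () { cb(null, Buffer.concat(chunks)); });
    }).on('error', cb);
}

fetchArticle('Game', function (err, html) {
    if (err) { throw err; }
    console.log('Game: ' + html.length + ' bytes');
});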
Results, with and without compression (# of articles: 3640):

                         Mediawiki markup    Parsoid HTML
Uncompressed             18M                 199M
gzip -9                  6.4M                26M
bzip2 -9                 4.7M                17M
lzma -9                  4.4M                15M
So there's currently a 10x expansion in the uncompressed size, but only
3-4x expansion with compression.
--scott
--
( http://cscott.net/ )