Per bug 12056, *some* systems appear to have bad interactions with the
new preprocessor, leaving strip markers in place of <nowiki>, <ref>, etc.
on page edit.
Neither Tim nor I could reproduce it on our test systems... I tried
switching the new code in briefly on the live system, but alas found
that at least some portion of our production servers are showing it.
Will do some further investigation on this; might be interaction with
another extension, or order of operations, or a weird config issue... Bleah!
Another note -- I found that *something* on the live site is triggering
very deeply nested function calls in the preprocessor expansion, which
triggers the 100-stack-frame-deep recursion bailout in Xdebug which is
running on one box for diagnostic purposes.
Stack trace: http://wikitech.leuksman.com/view/Preprocessor_stack_trace
-- brion vibber (brion @ wikimedia.org)
Wiki markup appears to render in order from left to right, taking
effect at the opening code and ending at either the closing code or the
end of the paragraph, so closing code is effectively unnecessary when
the end of a paragraph accomplishes the same result:
''italics test''<br>
... works as well as
''italics test<br>
... and so on. On the one hand, I'd like this NOT to generate an error.
On the other hand ... well, if WordPerfect would just take over
CFKeditor or FCKeditor or FCUKeditor or whatever its dang name is, then
we'd have a true reveal-codes window "dashboard" below a true WYSIWYG
window "windshield" the way WordPerfect and the web were supposed to be
... hasn't this all been worked out already 20 years ago?!? Why are we
reinventing word processors from scratch on every new toy / environment
that gets invented? </rant>
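As a toy sketch of the auto-close behaviour described above (not
MediaWiki's actual code, just the rule as I understand it):

function renderItalics( $paragraph ) {
    // '' toggles italics; anything still open when the paragraph ends
    // is closed implicitly, matching the examples above.
    $open = false;
    $html = preg_replace_callback( "/''/", function () use ( &$open ) {
        $open = !$open;
        return $open ? '<i>' : '</i>';
    }, $paragraph );
    if ( $open ) {
        $html .= '</i>'; // implicit close at end of paragraph
    }
    return $html;
}

Both renderItalics( "''italics test''" ) and renderItalics( "''italics
test" ) yield "<i>italics test</i>".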
Steve Bennett has been writing a parser grammar, and investigating how
the present parser *actually* works.
Turns out the apostrophe-italic combination only works once per paragraph. Is
this expected?
- d.
---------- Forwarded message ----------
From: Steve Bennett <stevagewp(a)gmail.com>
Date: 27 Nov 2007 15:05
Subject: Re: [Wikitext-l] Determining the behaviour of apostrophes
To: Wikitext-l <wikitext-l(a)lists.wikimedia.org>
On 11/28/07, Jared Williams <jared.williams1(a)ntlworld.com> wrote:
> The code is still missing the searching for a single letter preceding a
> bold to split at. Seems none of the tests exercise that particular bit of
> code.
That's a relief. Now that I understand this rule, I think it's a
complete load of bollocks, and should be removed from any notion of
"correct" treatement of wikitext. Mismatched apostrophe groupings
should be considered erroneous input whose rendering is undefined.
Why?
For starters, as discussed, the French Wikipedia doesn't even use this
construct. Worse, it only works *once* per paragraph. Look at how this
renders:
* L'''amour'' is great the first time. But l'''amour'' fails the second time.
You guessed it, bold from the first ''' to the second ''', and italics
from the first '' to the second ''. And why would it be any different?
The treatment of 4 apostrophes is much less offensive. This renders correctly:
* L''''amour''' is bold the first time. And l''''amour''' is still
bold the second time.
The 4 apostrophes -> apostrophe, bold rule is at least consistent,
though it's still not intuitive that this: ''''blah'''' puts the first
apostrophe in normal text, while the second one is bold. Hard to
believe the user really wants that...
Of course, the only time 4 apostrophes ever renders as anything
*other* than apostrophe followed by bold is when the crazy rule above
is invoked, turning it into two apostrophes followed by italics.
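For anyone following along, here's my reading of the run-length rules
as a sketch (hypothetical function, and it deliberately ignores the
single-letter splitting rule above; the real logic in Parser::doQuotes
is hairier):

function classifyApostropheRun( $n ) {
    if ( $n >= 5 ) {
        // Apostrophes beyond five are literal text; the last five
        // toggle bold+italic together.
        return str_repeat( "'", $n - 5 ) . '<bold+italic toggle>';
    }
    switch ( $n ) {
        case 4:  return "'" . '<bold toggle>'; // literal ', then bold
        case 3:  return '<bold toggle>';
        case 2:  return '<italic toggle>';
        default: return str_repeat( "'", $n ); // one apostrophe: literal
    }
}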
Steve (rambly late at night)
_______________________________________________
Wikitext-l mailing list
Wikitext-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitext-l
I am migrating a production mediawiki installation from 1.5.8/MySQL to
1.11.0/Postgres.
Using the backup procedure [1] to export/import the main wiki data via XML
works, but leaves me needing to run some maintenance scripts to rebuild some
of the internal metadata (e.g. file upload information).
As far as I can see, the maintenance scripts have not yet been updated for
Postgres, but don't necessarily need much patching to be made to work.
Minimal changes to the FiveUpgrade.inc file made it possible for me to get
the rebuildImages.php script to work.
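For concreteness, the sequence I'm using looks roughly like this (the
paths and the recentchanges rebuild are my assumptions; adjust to your
layout):

  # On the old 1.5.8/MySQL installation:
  php maintenance/dumpBackup.php --full > pages.xml

  # On the new 1.11.0/Postgres installation:
  php maintenance/importDump.php < pages.xml
  php maintenance/rebuildImages.php          # file upload metadata
  php maintenance/rebuildrecentchanges.php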
I'm seeking feedback as to whether I've missed something obvious about how
to perform this upgrade, or whether this is work that could usefully be
contributed back to the project, and whether anyone is already working on
this and would like to collaborate?
Adam
--
[1] http://www.mediawiki.org/wiki/Backup
Hello All,
I'm a Wikipedia user from Taiwan.
I have a small SSH server at home and use it to build a
secured connection to the internet while I'm out.
But I found that Wikipedia seems to have a problem with it.
I always get an "ERROR" page telling me to "try again later",
until I found a tiny message at the bottom of that page:
-----------------------------
If reporting this error to the Wikimedia System Administrators, please
include the following details:
Request: GET http://ja.wikipedia.org/, from 220.130.167.166 via
yf1002.yaseo.wikimedia.org (squid/2.6.STABLE13) to ()
Error: ERR_ACCESS_DENIED, errno [No Error] at Thu, 22 Nov 2007 13:09:57 GMT
-----------------------------
So it seems the Wikipedia servers have blocked my server, or blocked my
tunneled connection through it. I sent this problem to the Wikipedia
information team (Ticket#2007112210010413) and got a response telling me
to post my problem here.
I waited a few days and confirmed that the problem still exists.
Any suggestion is welcome.
Best regards,
Brian, H.S. Chen
Administrator of maid.hschen.idv.tw (220.130.167.166)
Brion said to me a couple of weeks ago "the parser is slow for large
articles, fix it". So along these lines, I have rewritten the preprocessor
phase to make it faster in PHP. I also have plans for further speed
improvement via a partial port to C.
This work was planned and started before the recent parser discussions on
wikitech-l, by Steve Bennett et al. I chose to ignore those discussions to
improve my productivity. Apologies if I'm stepping on any toes.
I'll cover the technical side of this first, and then the impact for the
user in terms of wikitext syntax change.
This text is mostly adapted from my entry in RELEASE-NOTES.
== Technical viewpoint ==
The parser pass order has changed from
* Extension tag strip and render
* HTML normalisation and security
* Template expansion
* Main section...
to
* Template and extension tag parse to intermediate representation
* Template expansion and extension rendering
* HTML normalisation and security
* Main section...
The new two-pass preprocessor can skip "dead branches" in template
expansion, such as unfollowed #if cases and unused defaults for template
arguments. This provides a significant performance improvement in
template-heavy test cases taken from Wikipedia. Parser function hooks can
participate in this performance improvement by using the new
SFH_OBJECT_ARGS flag during registration.
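For extension authors, a minimal sketch of an object-style hook (the
hook name and callback are invented for the example; the pattern
follows the core #if implementation):

$wgParser->setFunctionHook( 'myif', 'wfMyIfRender', SFH_OBJECT_ARGS );

function wfMyIfRender( $parser, $frame, $args ) {
    // $args holds unexpanded preprocessor nodes. A branch is only
    // expanded when $frame->expand() is called on it, so the untaken
    // branch remains a dead branch and costs nothing.
    $test = isset( $args[0] ) ? trim( $frame->expand( $args[0] ) ) : '';
    $index = ( $test !== '' ) ? 1 : 2; // then-branch or else-branch
    return isset( $args[$index] ) ? trim( $frame->expand( $args[$index] ) ) : '';
}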
The intermediate representation I have used is a DOM document tree, taking
advantage of PHP's standard access to libxml's efficient tree structures.
I construct the tree via an XML text stage, although it could be done
directly with DOM. My gut feeling was that the XML implementation would be
faster, but I've made the interfaces such that it could be done either
way. The XML form is not exposed.
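Purely as an illustration (the exact node names are an internal detail
and subject to change), a fragment like "{{foo|1=bar}}" parses to a
tree shaped something like:

  <root><template><title>foo</title>
    <part><name>1</name>=<value>bar</value></part>
  </template></root>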
One reason for using an intermediate representation is so that the parse
results for templates can be cached. The theory is that the cached results
can then be used to efficiently expand templates with changeable
arguments, such as {{cite web}}. (There's also an expansion cache for
templates expanded with no arguments, such as {{•}}.)
Another reason is that I couldn't see any efficient (O(N) worst-case time
order) way to implement dead branch elimination without an intermediate
representation.
The pre-expand include size limit has been removed, since there's no
efficient way to calculate such a figure, and it would now be meaningless
for performance anyway. The "preprocessor node count" takes its place,
with a generous default limit.
The context in which XML-style extension tags are called has changed, so
extensions which make use of the parser state may need compatibility
changes. Since extension tags are now rendered simultaneously with
template expansion, there is a possibility for future improvement of the
extension tag interface. For example, we could have
preprocessor-transparent tags which act like parser functions, and we
could give extension tags access to the template arguments (i.e. triple
brace expansion).
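To make the compatibility point concrete, here is the shape of a
typical tag hook (tag and function names invented for the example):

$wgParser->setHook( 'shout', 'wfShoutRender' );

function wfShoutRender( $input, $args, $parser ) {
    // Under the new order this runs during template expansion rather
    // than in an early strip pass, so code that inspects parser state
    // here is what may need compatibility review.
    return '<b>' . htmlspecialchars( strtoupper( $input ) ) . '</b>';
}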
== User viewpoint ==
The main effect of this for the user is that the rules for uncovered
syntax have changed.
Uncovered main-pass syntax, such as HTML tags, is now generally valid,
whereas previously it was escaped in some cases. For example, you can
have "<ta" in one template and "ble>" in another template, and put them
together to make a valid <table> tag. Previously the result would have
been the escaped text "&lt;table&gt;".
Uncovered preprocessor syntax is generally not recognised. For example, if
you have "{{a" in Template:A and "b}}" in Template:B, then "{{a}}{{b}}"
will be converted to a literal "{{ab}}" rather than the contents of
Template:Ab. This was the case previously in HTML output mode, and is now
uniformly the case in the other modes as well. HTML-style comments
uncovered by template expansion will not be recognised by the preprocessor
and hence will not prevent template expansion within them, but they will
be stripped by the following HTML security pass.
The rules for template expansion during message transformation were
counterintuitive, mostly accidental and buggy. There are a few small
changes in this version: for example, templates with dynamic names, as in
"{{ {{a}} }}", are fully expanded as they are in HTML mode, whereas
previously only the inner template was expanded. I'd like to make some
larger breaking changes to message transformation, after a review of
typical use cases.
The header identification routines for section edit and for numbering
section edit links have been merged. This removes a significant failure
mode and fixes a whole category of bugs (tracked by bug #4899). Wikitext
headings uncovered by template expansion or comment removal will still be
rendered into a heading tag, and will get an entry in the TOC, but will
not have a section edit link. HTML-style headings will also not have a
section edit link. Valid wikitext headings present in the template source
text will get a template section edit link. This is a major break from
previous behaviour, but I believe the effects are almost entirely beneficial.
-- Tim Starling
On 11/25/07, aaron(a)svn.wikimedia.org <aaron(a)svn.wikimedia.org> wrote:
> + /**
> + * Since we use the same small set of messages in various methods and
> + * they are called often, we fetch them once and save them in $this->message
> + */
> + function preCacheMessages() {
> + // Precache various messages
> + if( !isset( $this->message ) ) {
> + $this->message['last'] = wfMsgExt( 'last', array( 'escape' ) );
> + }
> + }
As a general remark (I've seen this technique in more than one special
page's code), do we really need this? Doesn't MessageCache handle
local message caching anyway?
On 24/11/2007, Simetrical <Simetrical+wikilist(a)gmail.com> wrote:
> But I suspect the larger lag is probably due to the fact that
> MediaWiki is likely to be slower than, for instance, mirrors that only
> need to maintain static content, or mirrors that serve much smaller
> audiences. Many American users complain of slowness too, and living
> in New York, I can testify that Wikipedia is not infrequently
> noticeably slower than mirror sites. It's not something specific to
> places far away from America.
A request:
You'll see there are fundraiser blog posts being linked from the
sitenotice. We could really do with one about the technical structure:
the *remarkable* feat of running this site on nearly nothing,
and the fact that more money = better service.
Is there anyone who knows the structure well enough to describe it
simply and has time to write something quickly? Presumably someone
should stick around for the "why don't you ..." questions in the
comments.
- d.