Rowan Collins wrote:
On 26/04/05, Tim Starling
<t.starling(a)physics.unimelb.edu.au> wrote:
On
25/04/05, Tim Starling <timstarling(a)users.sourceforge.net> wrote:
As I said on #mediawiki when I was whinging about bug 1770, if a
solution involves globals, it's the wrong solution. I'm not surprised
that my quick hack caused side-effects, but I really hate link table
corruption so I wanted to fix that first.
Well, as I say, in the code I'm working on (will upload to
http://bugzilla.wikimedia.org/show_bug.cgi?id=689 soon; no, really, I
finally finished what I'd started) - and I notice in your recent
change also - that hack doesn't require a global, because it is
calling a Parser function from within Parser.
Interesting, you suggested splitting makeImageLinkObj(), which is
exactly the idea I came up with independently, and implemented. Great
minds think alike.
But that was, of course, beside the point: the
question is, should it
be possible to replace the link-holders on only part of the text, or
is this not an intended ability? The way the function was implemented
(using a global array and a string of text as an input) suggested that
it didn't operate on the article as a whole, but emptying the array of
link-holders each call violated that.
However, changes since seem to have reverted to this behaving as
expected (the link-holders are replaced in the provided string, and
others links still work).
My original idea was to replace link placeholders in the output buffer,
not the parser output text. This is the scheme I helped X13 to
implement. This unfortunately had serious problems, and it was soon
moved to Parser.php (1.329) by Will Mahan. His CVS log entry was:
"Move replaceLinkHolders() from OutputPage to Parser, because
it needs to happen before unstripNoWiki() and before tidy.
This also makes the parser more self-contained, so there is
no need to create an OutputPage object for the parser
tester."
The use of a global variable to hold the links was an inconsistent
holdover from before this move, the parser state is the obvious place to
put this information.
Emptying the link array on each call was a misguided attempt to fix bug
1770, I'm not advocating or defending it. My more recent changes were
mostly code refactoring, I was careful not to change behaviour
significantly.
Perhaps a
better solution would be to break up Skin::makeLinkObj() to
the point where a workalike function can be put in Parser with minimal
code duplication. The function in Parser would add link holders to a
Parser member variable, and the Skin::makeLinkObj() would just resolve
links immediately. Skin::makeLinkObj() would mainly be called for
navigation links and special pages.
As far as I understand what you're suggesting here, it doesn't really
solve the current issue, which is that within an image caption, the
links need to be parsed recursively, but not replaced with
link-holders - so that, when used in alt and/or title attributes, the
HTML tags can be stripped back out and leave the text.
The current approach does seem kind of wasteful, in that it replaces
them with link-holders and then immediately resolves them afterwards;
but in order to not replace them at all, the whole of
replaceInternalLinks() needs to run in a "resolve immediately" kind of
way. Was this what the "postParseLinkColor" system (which I notice
you've just removed) used to do, or was that something different?
I don't think a "resolve immediately" mode is a good way to make the
parser faster. Optimisation of DB query count is the most important
thing. I considered replacing <!--LINK--> tags in the alt text with
<!--TLINK-->, which would be later replaced by stripped-down link text,
the only thing in my way was Sanitizer::stripAllTags(), which was
removing the comments.
The resolve immediately mode was removed from Linker because its
function has been replaced by Linker::makeLinkObj(), which always
resolves immediately, and Parser::makeLinkHolder(), which never does.
This removes the need to bracket calls to Linker::makeLinkObj() with
mode changes.
In general, which approach do you think seems better?
(A)
$foo = $this->replaceInternalLinks($foo);
$foo = $this->replaceLinkHolders($foo);
(B)
$this->resolveLinksStraightAway(true); # or $skin->... or whatever
$foo = $this->replaceInternalLinks($foo);
$this->resolveLinksStraightAway(false); # or $skin->... or whatever
I think A is better, because it requires one query, not one per link.
But if we can delay the call to replaceLinkHolders() to once per parse
rather than once per image caption, that would be much better than either.
I guess the third option is to only use placeholders
for the part of
the markup that depends on the target's existence, which is presumably
just a CSS class or whatever (thus leaving other parts of the code to
separate the HTML tags from the displayed text). The problem then
being people who want the old-style "link-name?" rather than
colour-distinctions, I guess.
Yes, I think that's a good idea. It's not just the class, it's also the
URL. You'd have to process the HTML at the end of the transformation,
find the links, do the existence check query, and modify the tags. To
handle question marks you'd have to find the matching </a> and add
something after it. If that's too hard, remove the feature, few people
will miss it. Least of all developers like me who are obsessed with
parser cache hit rate.
-- Tim Starling