On Wed, 8 Feb 2012 15:20:41 +0100, Mihaly Heder <hedermisi(a)gmail.com> wrote:
> If wikitext is going to be replaced the new language should be
> designed on an abstract level first.
This is correct, but if we're talking about a universal DOM that can
represent all potential syntax and leaves room for extensions (nodes
of new types can be safely added in the future), then the new markup
can be discussed in general terms before the DOM itself. It doesn't
really matter until we start correlating the DOM and the markup -
then it will be time for BNFs.
> So the real question is whether a new-gen wiki syntax will be
> compatible with a consensual data model we might have in the future.
I don't think it's a good idea to design the wiki DOM and the new
wiki syntax separately; otherwise it will run into the same trouble
current wikitext is stuck in.
The real problem is whether the core devs are interested in new
markup at all. I don't see anything difficult in designing a new DOM
apart from a few tricky places (templates, inclusions), and it
definitely should not take another year to complete.
On Wed, 8 Feb 2012 07:42:33 -0700, Stanton McCandlish
<smccandlish(a)gmail.com> wrote:
> I'm a "geek" and do not "dislike" or "despise" XML/[X]HTML or WYSIWYG
> or wikimarkup. They all have their uses for different users and even
> the same user in different situations for different purposes.
Indeed, but my point was that XML is hardly more usable in
text-editing environments than some convenient wiki markup. You seem
to agree with this later in your message.
> (...) applying a 'class="work-title"', would produce factually incorrect
> output (i.e., in one context at least, outright *corrupt data*) that said
> that a comma-space was part of the title of the work.
This is because wikitext allows dealing with the underlying/resulting
HTML at a low level. A proper markup must abstract all of that away
from the user, so that he can't just insert a tag wherever he feels
is pertinent. If a user does need to insert a tag, then the markup is
not well-planned and must be corrected.
This will improve security (XSS prevention, etc.), uniformity (one
person writes <b>, another <strong>) and portability - the last one
in particular, because the low-level HTML access is exactly what
makes current wikitext so problematic: all of that embedded HTML must
be transformed on every upgrade.
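To illustrate - purely a sketch with invented syntax, not any real MediaWiki API: a renderer that escapes all raw HTML first and then maps each markup construct to exactly one canonical tag gets both properties (no XSS, no <b>-vs-<strong> divergence) for free.

```python
# Hypothetical sketch: the **bold** / //italic// syntax and the mapping
# below are my assumptions for illustration only.
import html
import re

# Every construct maps to exactly one canonical HTML tag, so stored
# pages can never diverge into <b> vs <strong>.
CANONICAL = {"bold": "strong", "italic": "em"}

def render_inline(markup: str) -> str:
    # Escape everything first: no user-supplied tag survives, which
    # closes off a whole class of XSS vectors.
    text = html.escape(markup)
    # Then rewrite the wiki constructs into their canonical tags.
    text = re.sub(r"\*\*(.+?)\*\*",
                  rf"<{CANONICAL['bold']}>\1</{CANONICAL['bold']}>", text)
    text = re.sub(r"//(.+?)//",
                  rf"<{CANONICAL['italic']}>\1</{CANONICAL['italic']}>", text)
    return text
```

Raw tags typed by the user come out escaped, never interpreted.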
> It's crucial that I be able to tweak stuff at the
> character-by-character level, and alter the markup around that content
> in any way I need to.
Good point. Also, in text-only environments, text tools like Search &
Replace can be used - and not only on the text itself but on its
markup as well.
> But for actual article drafting, in prose sentences and paragraphs, as
> opposed to tweaking, I vastly prefer WYSIWYG. I seriously doubt I'm
> alone in any of this, even in the combination of preferences I've
> outlined.
This might be true. It only seconds the point everybody seems to
agree on - having both a markup editor and a WYSIWYG editor in one
place.
On Wed, 8 Feb 2012 16:06:55 +0100, "Oren Bochman" <orenbochman(a)gmail.com> wrote:
> I disagree that xhtml is a geek-only storage format or that the
> current Wikisyntax has a lower learning curve.
This is exactly the problem with current wikitext. I would compare it
with C++, and an "ideal" wiki markup - with Pascal or even BASIC.
> I think that an xml subset is the ideal underlying format.
An underlying format is NOT MEANT for direct human interaction - not
by non-geeks, anyway. This is what I meant by "storage format".
> This could provide interoperability with other wiki formats and a
> friendlier variant of the existing wiki markup.
Good point.
> easy to parse (unambiguous, won't require context or semantics to
> parse)
This definition should be extended with "context-specific", because
some items might be ambiguous yet used in different places. For
example, anything inside a code block is unprocessed and can be as
ambiguous as the editor desires - that is the whole point of a code
block. It only needs a proper end token.
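A rough sketch of what I mean, with an invented {{{ ... }}} delimiter pair: once the lexer enters a code block it scans only for the end token, so any otherwise-special characters pass through untouched.

```python
# Toy context-specific lexer; the {{{ }}} delimiters are an assumption
# for illustration, not a concrete syntax proposal.

def extract_code_blocks(text, start="{{{", end="}}}"):
    """Split text into (plain_segments, code_segments)."""
    plain, code = [], []
    pos = 0
    while True:
        s = text.find(start, pos)
        if s == -1:
            plain.append(text[pos:])
            return plain, code
        plain.append(text[pos:s])
        # Inside the block we look ONLY for the end token; [[, ==, etc.
        # have no meaning in this context.
        e = text.find(end, s + len(start))
        if e == -1:
            # Unterminated block: treat the rest as code (a design choice).
            code.append(text[s + len(start):])
            return plain, code
        code.append(text[s + len(start):e])
        pos = e + len(end)
```

Link and heading tokens inside the block are returned verbatim rather than parsed.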
> Would be fully learnable in a couple of hours...
A starting editor should be able to learn the new markup in 5
minutes, or at least have all of its basic formatting listed in a
small help box.
> If we put our heads together and come up with something like that we
> will make some real progress.
This is what I'm trying to push here. One thing that keeps me from
starting this myself and presenting the results is whether such
research would actually be used by the MediaWiki team - Gabriel says
it's "all planned", which I read as "things just won't get worse".
On Wed, 8 Feb 2012 16:27:57 +0100, Mihaly Heder <hedermisi(a)gmail.com> wrote:
> But then there are millions of pages already written in legacy
> wikitext and those must be editable with the new editor. So right now,
> instead of the rational approach, an empirical one should be taken -
> they have to rather ''find'' than invent a good enough model for those
> old articles, and also store everything in the old format.
This is bad practice. I agree that the number of pages written in the
"outdated" markup is overwhelming; however, this only means that the
migration layer must be thoroughly written and well-tested, nothing
else. If you "find" a "good enough model", you will end up with the
same millions of pages (or more by that time) that will hopefully use
slightly better markup.
After all, even if some hundreds of wiki pages cannot be converted
fully automatically, Wikipedia/WMF has enough staff to fix them in a
sane period.
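The shape of the migration I have in mind, as a toy sketch (convert() here is a placeholder, not a real wikitext translator): convert every page automatically and queue the failures for humans.

```python
# Illustrative only: pretend templates are the hard case that needs
# manual review; a real converter would parse legacy wikitext.

class ConversionError(Exception):
    pass

def convert(old_wikitext: str) -> str:
    # Placeholder for a legacy-wikitext -> new-markup translator;
    # here we only model success/failure.
    if "{{" in old_wikitext:
        raise ConversionError("template needs manual review")
    return old_wikitext  # identity stands in for real translation

def migrate(pages: dict) -> tuple[dict, list]:
    """Convert what we can; return (migrated, titles needing a human)."""
    migrated, needs_review = {}, []
    for title, text in pages.items():
        try:
            migrated[title] = convert(text)
        except ConversionError:
            needs_review.append(title)  # staff fix these in a sane period
    return migrated, needs_review
```

The point is that the legacy format disappears from storage either way; only the review queue remains.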
At this point a cardinal step must be taken to eliminate the old
syntax completely, once and for all. Otherwise the same discussion
will arise several years later (after several more years of
"searching for a model").
> I just mean that no one stood up and proposed: "Hey, this would look
> better in this different way".
And all those millions of people who edited pages didn't think it
actually looks wrong? I doubt that very much; perhaps there was just
no place to voice their thoughts, or no one to hear them because
"this is fine for amateurs".
> Anyone can create new templates, with any name and parameters he
> wishes.
Templates are a powerful but widely abused feature, since they can be
used to hide parser/markup bugs. I even think templates should only
be created by devs after a discussion; otherwise we get what we see
now.
>> 3. {{About "Something, something and something", of kind}}
>> As you can see, no character is banned from the title (...)
> What about the separator? Eg. [[The character "]]
Nothing - it's fine. Two options exist for the parser:
1. Either it treats every " as starting a new context, and thus [[The
character "]]"]] actually creates a link with the caption <The
character "]]>.
2. Or it treats ]] as the ultimate token, and a standalone " is
output as is.
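Option 2 is trivial to implement; here is a toy version (the [[target caption]] shape is my assumption for illustration).

```python
# Option 2 from above: inside [[...]], the parser scans for the first
# ]] and treats everything else, quotes included, as literal text.

def parse_link(text: str, pos: int):
    """Parse [[target caption]] at pos; ]] is the only terminator."""
    assert text.startswith("[[", pos)
    end = text.find("]]", pos + 2)
    if end == -1:
        return None  # unterminated: not a link at all
    inner = text[pos + 2:end]
    # First space splits target from caption; a standalone " is just
    # another caption character.
    target, _, caption = inner.partition(" ")
    return {"target": target,
            "caption": caption or target,
            "next": end + 2}
```

No lookahead, no context stack - exactly why the "ultimate token" reading is the simpler grammar.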
>> Right, and pipes should not appear in templates either. It's too
>> special a symbol.
> Why so? So far the only reason you gave is that it's not on all
> keyboard layouts.
And it is not used in most languages, yes. Isn't that reason enough?
Why choose it for an international project like MediaWiki if there
are alternatives?
>> * remote links can also get their title from <title> after fetching
>> first 4 KiB of that page or something
> No way. That can be good for preloading a title on link insertion and
> storing it indefinitely, but not for doing it every time.
Of course not every time - the engine might maintain a cache of
remote links, or alleviate the traffic in some other way. And the
feature can be disabled, in which case the parser will use some other
means of generating titles for titleless external links.
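Something like this sketch (all names are invented; the fetcher is injectable precisely so it can be disabled or replaced, and so the cache logic needs no network):

```python
# Cached remote-title lookup: fetch at most the first 4 KiB of a page
# once, extract <title>, and reuse the cached value afterwards.
import re

_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
_cache: dict = {}

def remote_title(url: str, fetch_head) -> str:
    """fetch_head(url) should return roughly the first bytes of the
    page as a str; only the first 4096 characters are examined."""
    if url not in _cache:
        head = fetch_head(url)[:4096]
        m = _TITLE_RE.search(head)
        # Fallback: show the URL itself for titleless pages.
        _cache[url] = m.group(1).strip() if m else url
    return _cache[url]
```

A second render of the same page costs nothing; invalidation policy (TTL, manual purge) is a separate decision.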
>> Only if pages with no spaces are more common than pages with spaces
>> in the name.
> Taking enwiki articles as a sample:
> * 7746101 articles with space.
> * 1416235 articles without space.
Thanks for the statistics. My point about "half of the cases" wasn't
fair, then; however, this doesn't change the fact that the pipe isn't
as universal as a double equals sign, which can still be typed just
as fast and is less prone to mistyping because it's doubled and has
less chance of appearing in regular text.
> However, you could take advantage of the space-is-underscore, and use
> [[Some_page this page]] (but still not 'clean')
Yes, this is not very clean and relies on parser/engine behavior. It
should be fine to have "Some page" and "Some_page" as two different
pages.
> Nitpicking, first heading has just one equal sign [at each side] :)
And this is the problem. Even DokuWiki uses no fewer than two "=" for
headings. A first-level heading appears so rarely in a document that
it can afford "==". Actually, since a document has just one
first-level heading, all the others (level 2+) can use "==" as their
minimum as well, because there's no sense in creating a second-level
heading before the document title (the first-level one).
I think MediaWiki currently even lets the user create a 6th-level
heading before the doc title?
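For illustration, a sketch of such a heading rule - at least two "=" per side, with "==" meaning level 1 (the exact limits are my assumption, not a spec):

```python
# Toy heading parser: a single "=" is never a heading, "==" is level 1,
# "======" is the deepest level (5 here, purely as an example cap).
import re

_HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1$")

def parse_heading(line: str):
    """Return (level, text) with '==' as level 1, or None if the line
    is not a heading under this rule."""
    m = _HEADING.match(line)
    if not m:
        return None
    return len(m.group(1)) - 1, m.group(2)
```

A lone "=" pair simply stays ordinary text, which removes the single-equals ambiguity entirely.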
>> Standardizing is fine unless it starts looking unnatural. The
>> following example might be argued but I can't think of another one
>> quickly:
>> tar -czf file.tar.gz .
> Not a bad example, as that's one of those utilities with odd
> parameters:
> "... The different styles were developed at different times ..."
Yes, you've got my idea.
> That's a source of problems. It's fine having dumb programs that you
> need to walk through. When the programs are smart, if they don't go
> up to the level, that's an issue.
This is true, but it just requires a more conscientious developer.
Nobody will argue that it's harder to write smart programs than dumb
programs following certain preexisting conventions (e.g. the cryptic
*nix command-line interface, which can explain everything or nearly
so).
When designing a text markup, why should we follow bad guidelines?
> No. Clean syntax of
> 1. Foo
> 2. Bar
> 3. Baz
This is the syntax I have suggested for ordered lists earlier; "1. 1.
1." only complements it.
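A sketch of why "1. 1. 1." complements the clean form: the renderer renumbers items itself and simply ignores the numbers in the source (illustrative only, not any engine's actual behavior).

```python
# Toy ordered-list renderer: source numbers are parsed but discarded;
# output numbering is always sequential.
import re

_ITEM = re.compile(r"^\d+\.\s+(.*)$")

def render_ordered_list(lines):
    items = [m.group(1) for m in map(_ITEM.match, lines) if m]
    # "1. 1. 1." and "1. 2. 3." come out identical.
    return [f"{i}. {text}" for i, text in enumerate(items, 1)]
```

So the editor can insert or reorder items without renumbering anything by hand.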
Not at all, because we are talking about a context-specific grammar.
Addresses in links can hold no formatting, and thus everything but
the context-ending tokens (]], space and ==) is ignored there.
> Oh, you're not autolinking urls.
I didn't really understand that.
Well, it seems like the thread ends here.
Signed,
P. Tkachenko