On 11.03.2016 at 11:20, Markus Kroetzsch wrote:
> Maybe the community needs a bit more explanation as to why you "consciously"
> decide to override their judgement.
The idea is to give the community a tool to explicitly model their judgement
that something is an identifier, and introduce that idea of external identifiers
into the software exactly because that need was expressed by the community.
Relevant use cases: linking, mapping, and UI structure.
> The use of property P1921 clearly tells you what the community wants. If we
> want to have URIs only for some subset of properties, then we will use P1921
> only on a subset. It is very easy and gives us complete control. The use of
> ExternalId as an additional restricting mechanism is neither helpful nor
> desired.
Can you give an example of something you want to map to a URI, but that is not
an external identifier? There are probably edge cases, and thinking about them
and deciding on the desired semantics is a good thing, I believe.
> We can decide for ourselves which properties should have URIs exported for
> them, without needing conscious but unprincipled development decisions to
> constrain us.
"unprincipled", wow. The decision followed the principle that we want to have
software that is extensible and maintainable, and we want a data model that
makes explicit the semantics of values. Following these principles, the
declaration of what a value is dictates what you can do with it. That's the
basic idea of object oriented design.
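As a rough illustration of that idea (a toy sketch, not Wikibase's actual code): the class names, the `to_url` method, and the example.org pattern below are all made up for this example, and the formatter pattern is stored on the value only to keep the sketch short. The point is that the declared type decides which operations exist at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringValue:
    """A plain string: no linking or URI mapping is defined for it."""
    text: str


@dataclass(frozen=True)
class ExternalIdValue(StringValue):
    """A string explicitly declared to be an identifier in an external system."""
    formatter_url: str  # hypothetical pattern; "$1" stands for the identifier

    def to_url(self) -> str:
        # Only values declared as identifiers offer this operation.
        return self.formatter_url.replace("$1", self.text)


ident = ExternalIdValue("12345", "https://example.org/id/$1")
note = StringValue("12345")

print(ident.to_url())           # https://example.org/id/12345
print(hasattr(note, "to_url"))  # False
```

The two values hold the same characters, but only the one declared as an identifier can be mapped to a URL.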
Of course, it would be possible to ditch these principles, and use the "duck
typing" approach: anything that has a formatter URL could be linked, etc. But
that introduces several problems:
* modeling: values can suddenly stop "being" identifiers, or become other
things, based on the statements on the property definition. This can lead to
inconsistencies in the way values are represented in dumps etc.
* implementation: we would either need to hard code a special case, or a
mechanism to apply all kinds of behaviors (formatting, mapping, parsing, etc)
based on all kinds of statements on properties. We can hard code for a few
things, but a general mechanism would hardly be scalable or maintainable. We do
have a solid and simple mechanism based on data types that works fine to cover
the use cases for external identifiers.
* stability: if we base more and more behavior of the software on properties and
statements defined by the community, the community would no longer be free to
modify such properties and statements. That would break the software. We do
compromise about this sometimes: Wikibase can be configured to know about a few
properties and items (such as P1630). But we should be careful about it, because
it takes away control from the community.
* consistency: You can't link just any kind of value based on a formatter URL.
That only works for string values, and probably shouldn't be done for string
values that have the "url" data type. So linking would only work for properties
declared to be plain strings per their data type. Again, behavior is bound to
the data type.
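To make the contrast concrete, here is a small sketch of the two decision rules. It is an assumption-laden toy, not how Wikibase is implemented: the function names, the dictionary of property statements, and the example.org pattern are invented for illustration. Only P1630, the "$1" placeholder, and the "external-id" and "url" data type names are taken from real usage.

```python
def link_duck_typed(value, property_statements):
    """Duck typing: link anything whose property has a formatter URL statement."""
    pattern = property_statements.get("P1630")
    return pattern.replace("$1", str(value)) if pattern else None


def link_by_datatype(value, datatype, formatter_url):
    """Data type rule: only values declared as external identifiers are linked."""
    if datatype != "external-id":
        return None
    return formatter_url.replace("$1", value)


pattern = "https://example.org/id/$1"

# Modeling/stability: the same value is linked or not depending on whether
# someone has edited the statements on the property.
before = link_duck_typed("42", {"P1630": pattern})
after = link_duck_typed("42", {})

# Consistency: the duck-typed rule happily "links" a value that is already a URL.
odd = link_duck_typed("https://example.com/page", {"P1630": pattern})

# The data type rule rejects it, because the declared type is "url".
rejected = link_by_datatype("https://example.com/page", "url", pattern)
accepted = link_by_datatype("42", "external-id", pattern)

print(before, after, rejected, accepted, sep="\n")
```

In the duck-typed version every caller has to know which statements matter and which value kinds they make sense for; in the data type version that knowledge sits in one place.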
These principles are actually why we have data types at all. You were there when
we decided to have them. If we didn't care about the points above, we wouldn't
need data types at all; value types would be sufficient. Everything else would
be covered by "if it quacks like a duck...". That would mean a less expressive
data model, and more complicated software. A lot more complicated, if you want
to apply this for everything.
> It would be helpful if you could share some pointers (1) to the original
> announcement and documentation for this restricting behaviour for URI exports
> (clearly, this information is vital for the ongoing discussion on property
> conversion),
It's a modeling tool, not a restriction. If there are things that should be
mapped to URIs but for some reason shouldn't have the ExternalId type, we should
look at these edge cases closely to find out what is wrong. After all, if
it's not an identifier of some sort, it can't sensibly have a URI, and if it is
an identifier of some sort, there should be no reason not to mark it as such to
the software, by making it an ExternalId.
> and (2) to the discussions that have led to this design (surely you must
> have consulted with some RDF/SPARQL users and developers to conclude that
> some P1921 should be ignored).
I do not think any should be ignored. I think that properties that use P1921
should be ExternalIds. Please explain why you would not want that.
> I am really curious to learn what "we" refers to in "we made a conscious
> decision".
Decisions about the design and implementation of the software are made by the
development team ("us"), based on requirements and considerations on the
technical as well as the product level, which in turn are informed by community
interaction, among other things.
As is often the case, solutions that have to be maintainable and scalable are
not quite as nice as one-off solutions for a special case. MediaWiki is
conservative about adding special case features for good reasons: it's quite
complex as it is; if it had tried to cater to every special case, it would have
collapsed under its own weight a long time ago.
The idea is to generalize from special cases, and implement something that will
work for many more cases, even though it perhaps covers only 90% of what you
could do by catering to the special case directly.
Of course, overly generic multi-option multi-purpose mechanisms should also be
avoided, because they are hard to understand and hard to maintain. So a balance
needs to be found.
Trying to strike that balance, in 2012 we (in this case including you, iirc)
designed data types to be a simple yet sufficiently generic mechanism for
associating behavior with values. So now we use it to associate behavior with
values (like mapping to URLs and URIs), and I am very reluctant to introduce
another mechanism for associating behavior with values.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.