On 26.08.2013 12:41, Markus Krötzsch wrote:
Hi Daniel,
if I understand you correctly, you are in favour of equating datavalue types and
property types. This would indeed solve the problems at hand.
The reason why both kinds of types are distinct in SMW and also in Wikidata is
that property types are naturally more extensible than datavalue types.
CommonsMedia is a good example of this: all you need is a custom UI and you can
handle "new" data without changing the underlying data model. This makes it
easy for contributors to add new types without far-reaching ramifications in the
backend (think of numbers, which could be decimal, natural, positive,
range-restricted, etc. but would still be treated as a "number" in the
backend).
This could be solved using polymorphism: CommonsMedia, IRI, etc. could simply
derive from StringValue. Similarly, Percentage could derive from NumberValue, etc.
This is largely academic though, I don't see a good way to transition from the
current system to what I have in mind.
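
To illustrate the polymorphism idea, here is a minimal sketch in Python. The
class names and the render_plain helper are hypothetical and chosen only for
illustration; they are not the actual Wikibase or DataValues classes.

```python
class DataValue:
    """Hypothetical base class for all data values."""

    def __init__(self, value):
        self.value = value


class StringValue(DataValue):
    """Plain string value."""


class IriValue(StringValue):
    """An IRI; still usable wherever a StringValue is accepted."""


class CommonsMediaValue(StringValue):
    """A Commons file name; also a specialised string."""


class NumberValue(DataValue):
    """Generic numeric value."""


class PercentageValue(NumberValue):
    """A percentage; still comparable as a number."""


def render_plain(data_value):
    """Generic renderer that only needs to know the base types."""
    if isinstance(data_value, StringValue):
        return data_value.value
    if isinstance(data_value, NumberValue):
        return str(data_value.value)
    raise TypeError("unsupported data value type")
```

With such a hierarchy, generic code (e.g. a diff renderer) would handle an
IriValue through the StringValue branch, while specialised code could still
test for the concrete subclass.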
Using fewer datavalue types also improves interoperability. E.g., you want to
compare two numbers, even if one is a natural number and the other is a decimal.
Indeed. Which is why I'm reluctant to add more, like the IRI type.
There is no simple rule for deciding how many datavalue types there should be.
The general guideline is to decide on datavalue types based on use cases. I am
arguing for diversifying IRIs and strings since there are many contexts and
applications where this is a crucial difference. Conversely, I don't know of any
application where it makes sense to keep the two similar (this would have to be
something where we compare strings and IRIs on a data level, e.g., if you were
looking for all websites with URLs that are alphabetically greater than the
postcode of a city in England :-p).
Currently, my primary concerns are validators and simple renderers to be used
e.g. in diffs. For validation against a max length as well as regular
expressions, it would be useful to be able to treat URLs as strings. The same is
true for basic rendering in diffs.
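
As a rough sketch of the validation use case: a check for maximum length and a
regular expression only needs the string form of the value, whether that value
is a plain string or a URL. The function name, its parameters, and the error
labels below are assumptions made for illustration, not existing Wikibase code.

```python
import re


def validate_string(text, max_length=None, pattern=None):
    """Return a list of error labels; an empty list means the value is valid."""
    errors = []
    # Length check: applies to any value that can be treated as a string.
    if max_length is not None and len(text) > max_length:
        errors.append("too long")
    # Pattern check: the whole string must match the regular expression.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        errors.append("pattern mismatch")
    return errors


# The same validator serves a URL and an ordinary string.
url_errors = validate_string("http://example.org/", max_length=255,
                             pattern=r"https?://\S+")
name_errors = validate_string("Douglas Adams", max_length=400)
```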
As for the possible confusion, I think some naming discipline would clarify
this. In SMW, there is a stronger difference between both kinds of types, and a
fixed schema for property type ids that makes it easy to recognise them.
I try to use "data value type" vs. "property type", but whenever "data type" is
used, it's unclear what is meant.
In any case, using string for IRIs does not seem to solve any problem. It does
not simplify the type system in general and it does not help with the use cases
that I mentioned.
Well, for my use cases mentioned above, URLs should be strings :)
What I do not agree with are your arguments about all of this being "internal".
We would not have this discussion if it were. The data model
of Wikidata is the primary conceptual model that specifies what Wikidata stores.
You might still be right that some of the implementation is internal, but the
arguments we both exchange are not really on the implementation level ;-).
I do not see why it is useful for a property value to expose two types. That's
the situation we currently have, and it's confusing. For a canonical
representation, there should be only one type, namely the one that is needed to
be able to fully interpret the value given. Whether a URL can be treated as a
string or not depends on the use case and should be determined by the respective
code. It seems a bad idea to me to try and provide an arbitrary set of base
types with an arbitrary mapping to concrete/semantic types. If anything, a type
hierarchy would make sense.
-- daniel