Hi Ben, hi Max, hi Wikidata-istas (another one for the list),
thanks for the support. Good question, and an important one too! I did
not want to expand the proposal much more, so there is only one little
paragraph I added about upcoming Wikidata query functionality ("Wikibase
query services" under "Tools, technologies, and techniques"). Here I
will expand a bit more on why these things are quite different.
Question: What is the difference between the planned Wikidata query
service and the proposed Wikidata Toolkit?
Answer: First, let me assure you that a Wikidata query service is
coming. This has been an important project for a while now and a lot of
work in that direction is under way. My proposal neither replaces,
prepares, nor supersedes this; they are just two different things.
The proposed toolkit is meant to support developers who want to work on
the data. A basic requirement to do this will be to run some forms of
queries, so this should somehow be supported. However, it is not clear
yet what form these queries should take. I am definitely interested in
features that resemble the "tree" feature in Magnus's Wikidata Query (a
kind of transitive closure over properties). This is something that is
not currently planned for Wikidata, and indeed it requires a type of
recursion that is best implemented in memory (like Magnus does) but hard
to get done efficiently if you delegate queries to MySQL (as Wikidata
does). I am also interested in constraints and in rules (again, Magnus
has some precedent for this in his Reasonator and bot proposals). This,
too, will require some form of recursion which is very hard to realise
on a relational DB (I tried it; others tried it; it just does not get
anywhere near the performance of in-memory systems, even on large data,
whether you use MySQL or Oracle).
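To make the "tree" idea a bit more concrete, here is a minimal Python sketch of a transitive closure over one property, computed in memory. The dictionary layout, the item names (Q_cat etc.) and the function name are made up for illustration; this is neither Magnus's implementation nor an actual Toolkit API. Only P279 ("subclass of") is a real property ID.

```python
from collections import deque

# Hypothetical toy data: item -> list of (property, target item) statements.
STATEMENTS = {
    "Q_cat": [("P279", "Q_mammal")],
    "Q_dog": [("P279", "Q_mammal")],
    "Q_mammal": [("P279", "Q_animal")],
    "Q_animal": [("P279", "Q_organism")],
    "Q_organism": [],
}

def transitive_closure(start, prop, statements):
    """Return all items reachable from `start` via one or more `prop` edges."""
    seen = set()
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for p, target in statements.get(item, []):
            # Follow only the requested property; `seen` guards against cycles.
            if p == prop and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

print(sorted(transitive_closure("Q_cat", "P279", STATEMENTS)))
# prints ['Q_animal', 'Q_mammal', 'Q_organism']
```

With the data in a plain data structure this is a simple loop; on a relational DB the same traversal needs one self-join (or one round trip) per level.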
Running some operation on data recursively is quite natural when you are
in a programming environment and your data is in a data structure. I
would like the Wikidata Toolkit to support this way of handling data.
This is very far from a (complex or easy) query language that captures
all possible queries in one general format. Of course there are query
languages that support regular expressions on binary relations
(so-called path queries), and there are also query languages with
recursion (e.g., Datalog), but my goal is not to create a service for
one such language. One could do that on top of the Toolkit for more than
one query language.
Besides all this querying, there are also some other tasks that are hard
to phrase as a query but are rather a kind of computational analysis.
You might want to do this in a programming language, or you might just
want to use the Toolkit to export a large matrix file that you can feed
into R or Matlab. Again, this is not something you would get from the
query service, even though you might need queries to get the data you need.
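As a rough illustration of such an export, the following Python sketch writes an item-by-property count matrix as CSV, which R or Matlab can read directly. The data layout, the item names and the function name are assumptions made for this example only; P19, P31 and P40 are real property IDs, but the values are invented.

```python
import csv
import io

# Hypothetical toy data: item -> {property: list of values}.
ITEMS = {
    "Q_alice": {"P31": ["Q5"], "P19": ["Q_oxford"], "P40": ["Q_bob", "Q_carol"]},
    "Q_bob": {"P31": ["Q5"], "P19": ["Q_oxford"]},
}

def write_matrix(items, out):
    """Write one row per item and one column per property (value counts)."""
    props = sorted({p for claims in items.values() for p in claims})
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item"] + props)
    for item_id in sorted(items):
        claims = items[item_id]
        writer.writerow([item_id] + [len(claims.get(p, [])) for p in props])
    return props

buf = io.StringIO()
write_matrix(ITEMS, buf)
print(buf.getvalue())
```

For a real dump one would write to a file instead of a StringIO buffer and probably stream the items rather than hold them all in a dict.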
Now compared to this, the Wikidata plans are much more focussed and (for
this particular task) much more advanced than my proposal (which
proposes to start the work by working out what to do, see Task T1 ;-).
The plans for Wikidata are based on a language in the style of SMW's
#ask, i.e., a language where you have neither JOINs nor variables
explicitly -- instead, JOINS are implicit and "tree-shaped", i.e., there
are no cyclic relationships. A simple example of a query that is not
tree-shaped is "which people were born in the same town that they
died in?"; another is "which people are children of married parents?".
Neither of these can be asked in SMW. There are many other features
where query languages can differ. It is clear to me that the
requirements should not all be satisfied by a single Wikidata query
service -- that would probably lead to a rather bloated and inefficient
service, too. Instead, Wikidata will focus on the most important
Wikipedia-based use cases first. The Toolkit should be "compatible" with
Wikidata's query support (maybe even have a representation for Wikibase
query objects), but it should also make it possible to explore other
query types.
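To illustrate why the "same town" example above is not tree-shaped: the query needs one value (the town) to appear in two conditions at once, so the two branches of the query meet again. In a programming environment this is just a comparison, as in the following Python sketch. The data layout and item names are invented for this example; P19 (place of birth) and P20 (place of death) are real property IDs.

```python
# Hypothetical toy data: person -> {property: single value}.
PEOPLE = {
    "Q_alice": {"P19": "Q_oxford", "P20": "Q_oxford"},
    "Q_bob": {"P19": "Q_berlin", "P20": "Q_paris"},
    "Q_carol": {"P19": "Q_dresden"},  # still alive: no place of death
}

def born_and_died_in_same_place(people):
    """Return the people whose place of birth equals their place of death."""
    result = []
    for person, claims in people.items():
        birth = claims.get("P19")
        death = claims.get("P20")
        # The same value must satisfy both conditions; this shared variable
        # is what makes the query cyclic rather than tree-shaped.
        if birth is not None and birth == death:
            result.append(person)
    return sorted(result)

print(born_and_died_in_same_place(PEOPLE))
# prints ['Q_alice']
```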
I hope this clarifies the differences a bit between Wikibase's upcoming
query web service and the Toolkit. Both activities should still benefit
one another: the Toolkit will be a good basis for exploring new query
features and implementation approaches; the web service will be a
convenient way to access live data (even our little Wikidata Analytics
script already accesses the Wikidata Web API when it needs data that
would take too long to seek inside a huge dump). So, summing up, all
will be good. :-)
Cheers,
Markus
On 30/09/13 21:43, Benjamin Good wrote:
Markus,
I already cast in my vote of support for you, but I had the same
question. If you could clarify the boundaries between what you are
doing and what wikidata is doing directly, that would be very helpful.
-Ben
On Mon, Sep 30, 2013 at 12:32 PM, Klein,Max <kleinm(a)oclc.org> wrote:
Hello Markus,
Your draft proposal seems so obvious when I read it, but I would
have never thought about proposing it myself. Despite the fact that
my research is cited as an example of a motivating capability, and
that it was 2 weeks of needless headache to code, I had always
thought that magical "Phase 3" was coming to solve our query woes. I
don't see that highlighted in your IEG. In fact from what I know,
there are still plans from the official Wikidata team to build
advanced query functionality. Although I do remember Denny saying
that the team was scrapping the "Phase" development paradigm, so
maybe I missed something along the way.
Anyway, I think you should address more the fact that this work is
ostensibly planned from the main Wikidata grant, and why this extra
work - or extra attention - is needed in addition.
Ps. Wikidatian, or Wikidatum are my faves so far.
Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023
________________________________________
From: wikidata-l-bounces(a)lists.wikimedia.org on behalf of Markus
Krötzsch <markus(a)semantic-mediawiki.org>
Sent: Sunday, September 29, 2013 5:11 AM
To: Discussion list for the Wikidata project.
Subject: [Wikidata-l] Wikidata Toolkit: call for feedback/support
Dear Wikidatanions (*),
I have just drafted a little proposal for creating more tools for
external people to work with Wikidata, especially to build services on
top of its data [1]. Your feedback and support are needed.
Idea: Currently, this is quite hard for people, since we only have WDA
for reading/analysing dumps [2] and Wikidata Query as a single web
service to ask queries [3]. We should have more support for programmers
who want to load, query, analyse, and otherwise use the data. The
proposal is to start such a toolkit to enable more work with the data.
The plan is to kickstart this project with a small team using
Wikimedia's Individual Engagement Grants program. For this we will need your
support -- feel free to add your voice to the wiki page [1]. Of course,
comments of all sorts are also great -- this email thread will be linked
from the page. If you would like to be involved with the project, that's
great too; let me know and I can add you to the proposal.
The proposal will already be submitted tomorrow, but support should also
be possible after that, I hope.
Cheers,
Markus
(*) Do we have a demonym yet? Wikipedian sounds natural, Wikidatan less
so. Maybe this should be another thread ... ;-)
[1]
https://meta.wikimedia.org/wiki/Grants:IEG/Wikidata_Toolkit
[2]
http://github.com/mkroetzsch/wda
[3]
http://208.80.153.172/wdq/
--
Markus Kroetzsch, Departmental Lecturer
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529
http://korrekt.org/
_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l