Thanks for the update, Erik. I am in the process of designing my own
Wikidata dataset and have some questions which I'll discuss with you
offline.
There are several issues I've found, though, that are of general
concern, so I'll post them here. All of this is based on the
WiktionaryZ tarball you released a couple of months back, so apologies
if some of it is no longer applicable:
1. Wikidata uses a single set of tables for storing all multilingual
content: shorttext and translated_content. The problem with this
approach is that it will not scale: small datasets with limited
("seed data") content will be forced to do lookups against these
gigantic and ever-growing tables, and exporting a single dataset will
bog down unnecessarily doing lookups (or, even worse, full-table scans!)
on these generic tables. I have put up a page on Meta, [[m:Multilingual
Wikidata]], which discusses an approach for striping tables with
multilingual content.
2. The tables for defining languages and language groups seem to be
shared between multilingual MediaWiki (the software) and WiktionaryZ
(user data). I think this is not only conceptually incorrect, but also
a security hole. See my comments from a while back:
[[m:Talk:Ultimate_Wiktionary_data_design#.22Dog_Food.22]]
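To make the striping idea in point 1 concrete, here is a rough sketch using SQLite for illustration. All table and column names other than translated_content are my own assumptions for the example, not the actual WiktionaryZ schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Shared approach: one ever-growing table holding every dataset's content.
# Every lookup and every export has to go through this one table.
cur.execute("""CREATE TABLE translated_content (
    dataset TEXT, content_id INTEGER, lang TEXT, text TEXT)""")

# Striped approach: each dataset gets its own content table, so lookups
# and exports only ever touch that dataset's rows.
def create_striped_table(dataset):
    # Hypothetical naming convention for the per-dataset stripe.
    cur.execute(f"""CREATE TABLE {dataset}_translated_content (
        content_id INTEGER, lang TEXT, text TEXT)""")

create_striped_table("seeddata")
cur.execute("INSERT INTO seeddata_translated_content VALUES (1, 'en', 'dog')")
cur.execute("INSERT INTO seeddata_translated_content VALUES (1, 'de', 'Hund')")

# Exporting the dataset is now a scan over a small table whose size is
# independent of how large the other datasets grow.
rows = cur.execute(
    "SELECT lang, text FROM seeddata_translated_content WHERE content_id = 1"
).fetchall()
print(rows)
```

The point of the sketch is only the shape of the schema: with a stripe per dataset, a full export of a small "seed data" set never pays for the size of the shared table.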
Anyway, it would be much appreciated if you could post more Wikidata
designs as you come up with them. Thanks, and keep up the good work.