Thanks for the update, Erik. I am in the process of designing my own
Wikidata dataset and have some questions which I'll discuss with you
offline.
There are several issues I've found, though, that are of general
concern, so I'll post them here. All of this is based on the
WiktionaryZ tarball you released a couple of months back, so apologies
if some of it is no longer applicable:
1. Wikidata uses a single set of tables for storing all multilingual
content: shorttext and translated_content. The problem with this
approach is that it will not scale: small datasets with limited
("seed data") content will be forced to do lookups against these
gigantic and ever-growing tables, and exporting a single dataset will
bog down unnecessarily doing lookups (or, even worse, full-table scans!)
on these generic tables. I have put up a page on Meta, [[m:Multilingual
Wikidata]], which discusses an approach for striping tables with
multilingual content.
2. The tables for defining languages and language groups seem to be
shared between multilingual MediaWiki (the software) and WiktionaryZ
(user data). I think this is not only conceptually incorrect, but also
a security hole. See my comments from a while back:
[[m:Talk:Ultimate_Wiktionary_data_design#.22Dog_Food.22]]
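To make the striping idea in point 1 concrete, here is a rough sketch using SQLite for illustration. All table and column names other than translated_content are my own assumptions for the example, not the actual WiktionaryZ schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Shared approach: one ever-growing table holding every dataset's content.
# Every lookup and every export has to go through this one table.
cur.execute("""CREATE TABLE translated_content (
    dataset TEXT, content_id INTEGER, lang TEXT, text TEXT)""")

# Striped approach: each dataset gets its own content table, so lookups
# and exports only ever touch that dataset's rows.
def create_striped_table(dataset):
    # Hypothetical naming convention for the per-dataset stripe.
    cur.execute(f"""CREATE TABLE {dataset}_translated_content (
        content_id INTEGER, lang TEXT, text TEXT)""")

create_striped_table("seeddata")
cur.execute("INSERT INTO seeddata_translated_content VALUES (1, 'en', 'dog')")
cur.execute("INSERT INTO seeddata_translated_content VALUES (1, 'de', 'Hund')")

# Exporting the dataset is now a scan over a small table whose size is
# independent of how large the other datasets grow.
rows = cur.execute(
    "SELECT lang, text FROM seeddata_translated_content WHERE content_id = 1"
).fetchall()
print(rows)
```

The point of the sketch is only the shape of the schema: with a stripe per dataset, a full export of a small "seed data" set never pays for the size of the shared table.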
Anyway, it would be much appreciated if you could post more Wikidata
designs as you come up with them. Thanks, and keep up the good work.