Hi all,
For many years, Denny and I have been giving talks about why we need to
improve the data management in Wikipedia. To explain and motivate this,
we have often asked the simple question: "What are the world's largest
cities with a female mayor?" The information to answer this is clearly
in Wikipedia, but it would be painfully hard to get the result by
reading articles.
I recently had occasion to actually phrase this in SPARQL, so that
an answer can now, finally, be given. The query to run at
http://milenio.dcc.uchile.cl/sparql
is as follows (with some explanatory comments inline):
PREFIX : <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?city ?citylabel ?mayorlabel WHERE {
  ?city :P31c/:P279c* :Q515 .   # find instances of subclasses of city
  ?city :P6s ?statement .       # with a P6 (head of government) statement
  ?statement :P6v ?mayor .      # ... that has the value ?mayor
  ?mayor :P21c :Q6581072 .      # ... where the ?mayor has P21 (sex or gender) female
  # ... but the statement has no P582 (end date) qualifier:
  FILTER NOT EXISTS { ?statement :P582q ?x }
  # Now select the population value of the ?city
  # (the number is reached through a chain of three properties)
  ?city :P1082s/:P1082v/<http://www.wikidata.org/ontology#numericValue> ?population .
  # Optionally, find English labels for city and mayor:
  OPTIONAL {
    ?city rdfs:label ?citylabel .
    FILTER ( LANG(?citylabel) = "en" )
  }
  OPTIONAL {
    ?mayor rdfs:label ?mayorlabel .
    FILTER ( LANG(?mayorlabel) = "en" )
  }
} ORDER BY DESC(?population) LIMIT 100
To see the results, just paste this into the box at
http://milenio.dcc.uchile.cl/sparql and press "Run query".
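Instead of pasting the query into the web form, it can also be sent over HTTP. Below is a minimal Python sketch, assuming the endpoint accepts standard SPARQL protocol GET requests with a `query` parameter and can return SPARQL 1.1 JSON results (both are assumptions, and the endpoint may no longer be online). The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the endpoint named in the message; it may no longer be online.
ENDPOINT = "http://milenio.dcc.uchile.cl/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def flatten_bindings(results):
    """Turn SPARQL 1.1 JSON results into plain {variable: value} rows.

    Variables left unbound (e.g. a missing OPTIONAL label) become None.
    """
    variables = results["head"]["vars"]
    return [
        {v: binding[v]["value"] if v in binding else None for v in variables}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return flattened rows (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=60) as resp:
        return flatten_bindings(json.load(resp))
```

Passing the query above as a string to `run_query` should return rows with `city`, `citylabel` and `mayorlabel` keys, provided the endpoint is reachable.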
The query does not select the most recent population; instead it relies on
Virtuoso to use the largest value when sorting in descending order, and on
the world to have (mostly) cities whose population grows over time. This
is also the reason why the population is not printed (it would give you
more than one row per city, even with DISTINCT). Picking the
current population will become easier once ranks are used more widely to
mark it.
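To illustrate the difference between "largest" and "most recent", here is a small Python sketch that keeps the newest value per city. It assumes each population value comes with a date (for example from a P585 point-in-time qualifier), which the query above does not fetch; the rows and the function name are hypothetical.

```python
from datetime import date


def latest_population(rows):
    """Return {city: population}, keeping the value with the latest date.

    `rows` are (city, point_in_time, population) tuples. Undated values are
    used only when a city has no dated value at all.
    """
    best = {}
    for city, when, population in rows:
        # Undated values sort before every dated value.
        key = (when is not None, when if when is not None else date.min)
        if city not in best or key > best[city][0]:
            best[city] = (key, population)
    return {city: population for city, (_, population) in best.items()}
```

For a shrinking city the maximum and the newest value differ, which is exactly the case where relying on descending order picks the wrong number.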
There might also be some inaccuracies in cases where a past mayor does
not have an "end date" set in Wikidata (Madrid has a suspiciously large
number of current mayors ...), but a query can only ever be as good as
its input data.
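Input problems like the Madrid case can be spotted mechanically: a city with more than one head-of-government statement lacking an end date probably has a stale entry. A minimal Python sketch over hypothetical, already-fetched statement records (the field names are invented for illustration, not Wikidata's format):

```python
from collections import defaultdict


def suspicious_cities(statements):
    """Return {city: [mayors]} for cities with several open-ended P6 statements.

    `statements` is a hypothetical list of dicts such as
    {"city": ..., "mayor": ..., "end": None}; "end" stands for the P582
    (end date) qualifier and is None when the qualifier is missing.
    """
    open_mayors = defaultdict(set)
    for statement in statements:
        if statement.get("end") is None:
            open_mayors[statement["city"]].add(statement["mayor"])
    return {city: sorted(mayors) for city, mayors in open_mayors.items() if len(mayors) > 1}
```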
I hope this is inspiring to some of you. One could also look for the
world's youngest or oldest current mayors with similar queries, for example.
Cheers,
Markus
Hi,
[ Aude and Christian Consonni, this should especially interest you. ]
I was throwing around ideas with a friend about how OpenStreetMap could be
integrated with Wikidata.
The thing I care about most in any software is internationalization.
Having a map in which all labels of towns, streets and everything else are
translated into all languages sounds like a super-wonderful thing.
Wikidata allows labeling everything, translating everything, and attaching
properties to everything, so it sounds like it could be a good match.
But then the question of "what IS everything" came up. Wikidata was created
mostly with Wikipedia in mind, so Wikipedia's notability policies
influenced Wikidata. Roughly, Wikidata has items for everything about
which there is, or could be, a Wikipedia article, and for other things that
are useful or that "fulfill some structural need"
<https://www.wikidata.org/wiki/Wikidata:Notability>.
Towns obviously have, or can have, a Wikipedia article about them, but
probably not every street or shop does. But do they fulfill a structural
need, or is it way too much?
If it's way too much, how can this be bridged, or federated, or whatever
the current popular word is? I don't even know exactly how OSM stores
labels and translations now, but it sounds like another instance of
Wikibase, if not Wikidata itself, could be used for it.
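To my knowledge, OpenStreetMap stores translated names as extra tags such as name:en or name:he next to the local name tag, and many objects carry a wikidata=Q... tag linking to an item. Under that assumption, a bridge could look like the following Python sketch; the tag values and the merge policy (OSM's own translations win, Wikidata fills gaps) are illustrative choices, not an existing tool.

```python
import re
from typing import Dict, Optional, Tuple

# Language-code-like suffixes such as "en", "he", "zh-Hant". This rule is an
# assumption used to skip other name:* keys like "name:etymology:wikidata".
LANG_CODE = re.compile(r"[a-z]{2,3}(-[A-Za-z0-9]+)*")


def osm_labels(tags: Dict[str, str]) -> Tuple[Optional[str], Optional[str], Dict[str, str]]:
    """Split OSM tags into (Wikidata ID, default name, {language: label})."""
    labels = {}
    for key, value in tags.items():
        prefix, sep, lang = key.partition(":")
        if prefix == "name" and sep and LANG_CODE.fullmatch(lang):
            labels[lang] = value
    return tags.get("wikidata"), tags.get("name"), labels


def merge_labels(osm: Dict[str, str], wikidata: Dict[str, str]) -> Dict[str, str]:
    """Keep OSM's own translations and fill missing languages from Wikidata."""
    merged = dict(wikidata)
    merged.update(osm)
    return merged
```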
I don't have much to add, but I'd love to hear ideas from people who do
(again, Aude and Christian Consonni, I'm looking at you :) ).
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi, there are items about the Wikibase data model in Wikidata (created by
me, among others).
If I understand correctly, they could be cited in the semantic web as
https://www.wikidata.org/entity/Q19798647
(if they are kept /o\)
Tom²
Hi all,
I am trying to use the Wikidata Toolkit to extract interlanguage links for
certain pages from Wikipedia.
So far, I've made several attempts based on the code provided in
SiteLinksExample (
https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-examples/src/…)
without any success, and I've realized that this is likely not the right
approach.
Ideally, I'd like to do this by processing a local file: I've
downloaded a pages-meta-current.xml.bz2 file, but I can't really get my
head around how to proceed.
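Since most interlanguage links now live as sitelinks on Wikidata items, one option is to read them from a Wikidata JSON dump rather than from a Wikipedia XML dump. The Python sketch below does not use the Wikidata Toolkit API; it assumes the JSON dump layout of one entity per line inside a top-level array (lines ending in a comma), and the function names are hypothetical. Note that the result includes every sitelink, so non-language sites such as commonswiki may appear.

```python
import bz2
import gzip
import json


def open_dump(path):
    """Open a (possibly compressed) Wikidata JSON dump as text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def iter_entities(lines):
    """Yield entity dicts, assuming the '[' / '{...},' / ... / ']' layout."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def interlanguage_links(lines, wanted_titles, source_site="enwiki"):
    """Map each wanted title on `source_site` to its other {site: title} links."""
    found = {}
    for entity in iter_entities(lines):
        sitelinks = entity.get("sitelinks", {})
        source = sitelinks.get(source_site)
        if source and source["title"] in wanted_titles:
            found[source["title"]] = {
                site: link["title"]
                for site, link in sitelinks.items()
                if site != source_site
            }
    return found
```

With a downloaded dump this would be called as `with open_dump(path) as f: interlanguage_links(f, {"Berlin"})`.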
Any pointers are appreciated.
Best,
Alan
--
Alan Said
Recorded Future
e: alansaid(a)acm.org
t: @alansaid
w: www.alansaid.com
Hi,
I've been trying to create a query in the online editor
<http://wdq.wmflabs.org/wdq/?q=claim[31:%28tree[12280][][279]%29]%20AND%20tr…>
such that I can retrieve a relationship, unknown to me in advance, between
two alphanumeric IDs (those Q numbers).
For instance, I have Terrell Buckley (Q5571382) and Miami Dolphins
(Q223243). I was trying to use one of them as a 'TREE', then check
whether the other exists on one of its nodes, and then take the
relationship that corresponds to the link traversed to get from one to the
other. Is that reasonable?
But so far I've not been able to figure it out nor find any illuminating
resources online.
In general, given two IDs, I'd like to write a program that queries
Wikidata and finds the relationship between them. I know about the related
problem of disambiguation, like Boston the band and Boston the city, but I
guess I'll worry about that next.
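Finding "the link traversed to get from one to the other" is a shortest-path search. A minimal breadth-first search sketch in Python, assuming the item-valued claims have already been fetched into a dict (the sample claims, such as P54 "member of sports team", are illustrative). Claims are directed, so the search may need to be run with the arguments swapped as well.

```python
from collections import deque


def find_relation_path(graph, start, goal, max_depth=3):
    """Breadth-first search for the shortest property path start -> goal.

    `graph` maps an item ID to a list of (property ID, target item ID) pairs,
    i.e. the item-valued claims of that item. Returns a list of
    (item, property, item) hops, or None if there is no path within max_depth.
    """
    if start == goal:
        return []
    parents = {start: None}
    queue = deque([(start, 0)])
    while queue:
        item, depth = queue.popleft()
        if depth == max_depth:
            continue
        for prop, target in graph.get(item, []):
            if target in parents:
                continue  # already reached by an equally short or shorter path
            parents[target] = (item, prop)
            if target == goal:
                path, node = [], goal
                while parents[node] is not None:
                    prev, p = parents[node]
                    path.append((prev, p, node))
                    node = prev
                return list(reversed(path))
            queue.append((target, depth + 1))
    return None
```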
I posted this exact question earlier to 'wiki-research-l(a)lists.wikimedia.org',
but it's actually pretty specific to Wikidata, so I think this is a
better place for it. Isn't it?
Thank you for your consideration.
Sincerely,
Matthew
Greetings,
I am pleased to announce that nominations are now being accepted for the
2015 Wikimedia Foundation Elections. This year the Board and the FDC Staff
are looking for a diverse set of candidates from regions and projects that
are traditionally under-represented on the board and in the movement, as
well as candidates with experience in technology, product or finance. To
this end, they have published letters
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_2015/Call_fo…>
describing what they think is needed. Recognizing that those who know the
community best are the community members themselves, the Elections
Committee is accepting nominations
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_2015#Informa…>
of community members you think should run, and will reach out to those
nominated to provide them with information about the job and the election
process.
This year, elections are being held for the following roles:
*Board of Trustees*
The Board of Trustees is the decision-making body that is ultimately
responsible for the long-term sustainability of the Foundation, so we value
wide input into its selection. There are three positions being filled.
More information about this role can be found at
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_2015/Board_e….
*Funds Dissemination Committee (FDC)*
The Funds Dissemination Committee (FDC) makes recommendations about how to
allocate Wikimedia movement funds to eligible entities. There are five
positions being filled. More information about this role can be found at
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_2015/FDC_ele….
*Funds Dissemination Committee (FDC) Ombud*
The FDC Ombud receives complaints and feedback about the FDC process,
investigates complaints at the request of the Board of Trustees, and
summarizes the investigations and feedback for the Board of Trustees on an
annual basis. One position is being filled. More information about this
role can be found at
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_2015/FDC_Omb….
The candidacy submission phase lasts from 00:00 UTC April 20 to 23:59 UTC
May 5 for the Board, and from 00:00 UTC April 20 to 23:59 UTC April 30 for
the FDC and FDC Ombudsperson. This year, we are accepting both
self-nominations and nominations of others. More information on this
election and the nomination process can be found at
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_2015.
Please feel free to post a note about the election on your project's
village pump. Any questions related to the election can be posted on the
talk page on Meta, or sent to the election committee's mailing list,
board-elections(a)wikimedia.org
On behalf of the Elections Committee,
-Gregory Varnum (User:Varnent)
Coordinator, 2015 Wikimedia Foundation Elections Committee
Dear all,
Thank you all for your answers. I will have a look at the different
projects you have mentioned in your emails.
In the meantime, I have spent a bit more time exploring Wikidata for
paintings, as one of our projects currently focuses on art, and comparing it
with the Europeana Data Model in terms of properties. I have noticed the
absence of some properties, and I am curious whether this is just an
oversight or whether there is a real intention behind the omission:
-Cultural heritage data usually have a description property containing a
lot of relevant free-text information: the property itself is structured,
but its content is mostly free text. I couldn't find a similar property in
Wikidata, though there is something similar in DBpedia. Is it something you
are planning to introduce, or have you decided to exclude free-text
information from Wikidata for now?
-While I was looking for paintings in Wikidata, I also noticed the absence
of information about the size/dimensions of the artwork. This information
is usually present in cultural heritage data. Is it something Wikidata is
interested in, or has it been omitted intentionally?
-The last question is about values in different languages for a given
property. How do you indicate the language in Wikidata? Do you use an
xml:lang attribute or something similar?
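As a hedged illustration of the language question: in Wikidata's JSON, each label is stored under its language code as {"language": ..., "value": ...}, and in RDF the same labels become language-tagged literals (written "..."@en in Turtle/N-Triples, or with xml:lang in RDF/XML); the SPARQL query in the first message filters them with LANG(). The Python sketch below converts JSON-style labels to N-Triples lines; the entity URI and labels are hypothetical.

```python
def escape_literal(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def label_triples(entity_uri, labels):
    """Render JSON-style labels as language-tagged rdfs:label triples.

    `labels` follows the shape {"en": {"language": "en", "value": "..."}, ...}.
    Output is sorted by language code.
    """
    rdfs_label = "<http://www.w3.org/2000/01/rdf-schema#label>"
    return [
        f'<{entity_uri}> {rdfs_label} "{escape_literal(label["value"])}"@{label["language"]} .'
        for _, label in sorted(labels.items())
    ]
```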
Thank you very much for your help
Best,
Valentine
Hi Lydia & all,
As promised before, I have collected a list of suggestions: things that I
or others have noticed to be unhandy, or that would be great to improve. I
have not searched Phabricator for these suggestions and ideas (as that is
still difficult), so some of the points may already be in Phabricator or
already being worked on.
The suggestions are *not* sorted by importance, but I have tried to group
them by subject/area.
I have tried to describe them as clearly as possible, some with examples.
Maybe some of them can be connected with Phabricator tasks.
I hope these ideas will help improve Wikidata and make it better and easier
to use.
Thanks!
Romaine
*A. General*
A1. loading: large item pages with many statements and sitelinks are too
heavy to load. For example, items about countries, like Q30 (USA), make my
(up-to-date) browser freeze (not responding) before they have loaded.
A2. missing labels: show in more places that the displayed label of an
item is not in the language of the user, but that the English label is
shown instead. (The statements section already indicates when a label is
not available in the user's language and the English one is shown; I
recommend doing this in more places.) For example, the contributions page
does not show that a displayed label is in English instead of the language
of the user.
A3. overview: make the overview of pages more compact (read: less white
space, resulting in less annoying scrolling and a full overview of
all statements at once), or have a gadget/skin that can do that. If 10
statements have been added to a page, too much scrolling is needed because
most of the page is white space.
The labels & descriptions section has been made more compact recently.
However, full labels of statements should still be shown in the future.
A4. search: make it easier to search in the fields of other languages,
especially when no label and/or description is given to an item, but also
in other cases. (A subject is often better known by its name in another
language than by its name in the language set for the user.)
A5. search: make it possible to switch off searching in the statements
section (or something like that), because searching for, for example,
"Gemini" gives too much noise. Every page that has a statement involving
(in this example) Gemini pops up in the search and makes it difficult to
find the right item.
A6. search: make it possible to easily search for where a specific property
with a specific Q or input value is used, for example by entering P717:032
or P131:Q3150.
A7. tool request: a tool that gives an overview (table) of all items with
a certain property (or Q, or label, etc.), with a second column that can be
set to show a specific statement (if added) for each of the rows.
Example: municipalities in Brazil (Q3184121). The first column lists all
pages with P31:Q3184121 (instance of: municipality of Brazil). The second
column can be set to P17 (country), and a third column to
P131 (located in the administrative territorial entity).
It should be possible to sort the overview (table) by column, so that all
municipalities can be grouped by state (with empty cells where no state has
been added, so the missing ones can be fixed).
It should also be possible to import data from an external source to match
the items on Wikidata with the external source (and to see where the
differences are).
Example: it would be great to be able to match the municipality names of
Brazil with data from an external source, for example population
numbers, elevation of the town, area in km², ID code, and so on.
A8. tool request: a list of items with coordinates but no country added,
and a tool that suggests, based on the coordinates, which country the
subject is located in, shows a map of the coordinates in that country, and
offers buttons to confirm/reject the country (similar to the Wikidata Game).
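The A6 and A7 requests can be sketched together in Python: parse a "P131:Q3150"-style condition, build one row per matching item with one column per chosen property, sort so empty cells come first, and diff a column against an external source. The in-memory data model ({item: {property: [values]}}) and all function names are hypothetical; a real tool would fetch the claims from Wikidata.

```python
def parse_condition(text):
    """Parse an A6-style search such as 'P131:Q3150' into (property, value)."""
    prop, _, value = text.partition(":")
    if not prop.startswith("P") or not value:
        raise ValueError(f"expected 'P<number>:<value>', got {text!r}")
    return prop, value


def items_matching(items, condition):
    """Return IDs of items having a statement equal to `condition`."""
    prop, value = parse_condition(condition)
    return [qid for qid, claims in items.items() if value in claims.get(prop, [])]


def overview_table(items, condition, columns):
    """Build A7-style rows: [item, value of columns[0], value of columns[1], ...].

    Missing statements become "" and sort first, so the gaps are easy to fix.
    """
    rows = []
    for qid in items_matching(items, condition):
        claims = items[qid]
        rows.append([qid] + [", ".join(claims.get(p, [])) for p in columns])
    # Sort by the last column (e.g. P131), empty cells first, then by item ID.
    rows.sort(key=lambda r: (r[-1] != "", r[-1], r[0]))
    return rows


def compare_with_external(rows, external, column_index):
    """List (item, Wikidata value, external value) where the two sources differ."""
    diffs = []
    for row in rows:
        qid, ours = row[0], row[column_index]
        theirs = external.get(qid)
        if theirs is not None and theirs != ours:
            diffs.append((qid, ours, theirs))
    return diffs
```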
*B. Labels/descriptions section*
B1. labels in other languages: make it possible (again) to see the full
label in other languages, instead of only a part ending with "...".
Especially with long labels, we should not have to press edit and do
something clumsy just to read the full label.
Example: less than half of the label is visible:
https://www.wikidata.org/wiki/Q18032311?uselang=de
Or: https://www.wikidata.org/wiki/Q18032986
B2. missing labels/descriptions: make it possible to click once in an
empty field ("No label defined yet" or "No description defined yet") to
edit that field (making that section fully editable), without having to
press [ edit ] at the top first.
B3. confusion: I still keep clicking in the title field to add a title to
an item in my language. (Maybe something to solve in combination with B2.)
B4. other languages: make it easier to change fields for other languages
which are not set in my preferences. Maybe this can be done with a second
collapsible box, so that "In more languages" shows the languages a user has
set in their preferences, and in a new collapsible "All languages" part all
other languages can be changed (instead of the not-so-handy label lister).
B5. auto adding: make it possible to add the name of a person or the
scientific name of a species to all languages that use the same
alphabet/script.
Example: https://www.wikidata.org/wiki/Q7298502 -> The name of this genus
is the same in many languages, resulting in the same label for all of
these languages.
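The B5 idea can be sketched in Python: guess the script of a name from the Unicode character names, then add the name as a label for every language that uses that script and has no label yet. The language-to-script table is a hypothetical input, and the script guess is deliberately crude; copying is most defensible for scientific names, while person names may need transliteration or different name order.

```python
import unicodedata


def script_of(text):
    """Crude script guess: first word of the Unicode name of the first letter."""
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")  # e.g. 'LATIN CAPITAL LETTER P'
            return name.split()[0] if name else None
    return None


def fill_same_script_labels(labels, name, language_scripts):
    """Return a copy of `labels` with `name` added for every language that has
    no label yet and is written in the same script as `name`.
    """
    script = script_of(name)
    filled = dict(labels)
    for lang, lang_script in language_scripts.items():
        if lang_script == script and lang not in filled:
            filled[lang] = name
    return filled
```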
*C. Statements section*
C1. order: make it possible to set the order in which statements are
shown.
Example: a user who often adds Commons categories when they are missing
should be able to set Commons to be shown first on every item (and the
Commons gallery second, ... third, etc.). Having to search for a statement
whose position varies is really annoying after a few items.
C2. order: make it possible to alphabetize the order of the statements
(like a wikitable).
C3. duplicate: when selecting a property, give a notice if that property
is already used on the item.
C4. suggester: if no statements have been added, the suggester should
suggest P31 (instance of) and P373 (Commons category) as those two are
(almost?) always needed or possible and are generic.
C5. suggester: if I select a certain property for a statement and start
typing the first character(s) of the name of the subject, the suggester
should actually suggest something that matches the property. This is
especially wanted for P17 (country).
Example: if I want to add the statement that a certain subject is located
in country (P17) Albania and I type an A, the suggester should first give
me actual names of countries that start with an A, instead of starting
with A (the letter of the alphabet), ampere, etc. Other suggestions are
fine when no (known) country names begin with the characters typed in the
field, but please start with actual names of countries that begin with the
A, etc.
C6. suggester: better suggestions requested. Example: when I have added as
statement P31:Q62832 (instance of: observatory), the next suggestion for a
property should be P717 (Minor Planet Center observatory code).
C7. feature/gadget request: one or more buttons on a page to repeat
one of the last 5 actions I have done on another page, like adding [
country: US ], or just [ country: ]. This would be especially handy for
statements, but maybe elsewhere too.
C8. character sensitivity: make the suggester insensitive to diacritics
<https://en.wikipedia.org/wiki/Diacritic>, ligatures
<https://en.wikipedia.org/wiki/Typographic_ligature>, dashes (-), etc.
Maybe this suggestion does not apply to some other languages, but in Dutch
the a and the ã (for example, also ä á à â etc.) are considered to be the
same character. For many words it is not always clear whether they are
written with or without a diacritic. For example: is it Sao Paulo or São
Paulo in Dutch? The same goes for dashes, for example Sint Maarten or
Sint-Maarten: the second one (with -) is how we normally write it, but the
official island name is without the dash. This issue concerns thousands of
pages: if we type a name with or without a diacritic/ligature/dash/etc.
while it should be the other way round, the suggester does not suggest it.
C9. missing statements: give a notice on items about which statements are
missing. The talk pages of properties (for example P717
<https://www.wikidata.org/wiki/Property_talk:P717>) state which other
statements a property must be combined with. If a notice or an invitation
to add the missing statements were shown on the item, it would be easier
to make more items complete.
C10. quick adding: have a tool/pop-up/etc to be able to add quickly the
country to an item based on a suggestion.
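C5 and C8 both describe how a suggester could match and rank candidates. A Python sketch, assuming candidates arrive as (label, class) pairs where the class (e.g. "country") is hypothetical input; a real suggester would derive it from P31 and property constraints. Matching folds case, strips diacritics via Unicode NFKD decomposition (which also splits compatibility ligatures such as ĳ into "ij"), and treats dashes as spaces.

```python
import unicodedata

DASHES = "-\u2010\u2011\u2012\u2013\u2014"  # hyphen-minus, hyphens and dashes


def normalize(text):
    """Fold case, diacritics, compatibility ligatures and dashes for matching."""
    decomposed = unicodedata.normalize("NFKD", text)  # 'ã' -> 'a' + combining tilde
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for dash in DASHES:
        stripped = stripped.replace(dash, " ")
    return " ".join(stripped.casefold().split())


def suggest(typed, candidates, preferred_class=None, limit=5):
    """Rank (label, class) candidates whose normalized label starts with `typed`.

    Candidates of `preferred_class` (e.g. "country" for P17) come first,
    then the rest; each group is ordered alphabetically.
    """
    prefix = normalize(typed)
    matches = [(label, cls) for label, cls in candidates
               if normalize(label).startswith(prefix)]
    matches.sort(key=lambda c: (c[1] != preferred_class, normalize(c[0])))
    return [label for label, _ in matches[:limit]]
```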
*D. Sitelinks section*
D1. language codes: if someone enters a language code in the field for
adding a new sitelink to an item, please actually interpret it as a
language code. Users are used to entering the language code and then the
sitelink.
The language code is more familiar to many users: it is used in the URL of
the wiki, it is used in local interwikis, users have set their preferences
to show the language code only, have preferences to show the *In
other languages* section on Wikipedia in their own language, and more.
Using the language code is also fast. In addition, the name of the language
that a language code belongs to differs for each language. Users often
only enter the language code and then press Tab to go to the next field,
but then get the wrong language.
Examples:
If you enter "es" for Spanish to add a page on a Spanish-language wiki,
the language suggester first suggests "Esperanto" (eo) and not
"Spanish" (es). (Also, español is normally alphabetized before
Esperanto!)
If you enter "ro" for Romanian to add a page on a Romanian-language wiki,
the language suggester first suggests "Romani" (rmy) and not
"Romanian" (ro).
If you enter "ru" for Russian to add a page on a Russian-language wiki,
the language suggester first suggests "Runa Simi" (qu), second
"rumantsch" (rm), and not "Russian" (ru).
D2. language suggester localisation: have the language suggester use the
language a user has set in their own preferences. Many users are
multilingual, but not everyone in the world speaks/writes English well.
D3. adding new sitelink: on the Dutch Wikipedia, various users have
indicated multiple times that they find themselves unable to add a new
sitelink after writing an article, as they find this section too difficult
to use, even after multiple other users have helped and explained it to
them. Maybe this is because 1. users need to squeeze in a code and 2. the
field for the actual page name is not noticeable enough, so users do not
see where to add the sitelink.
D4. faster saving: make saving faster; long lists of interwikis slow down
adding an interwiki very much. It currently seems that the software checks
every row for changed fields. Saving could perhaps be faster if only the
fields where the cursor has been were checked. This also consumes a lot of
time in the labels and descriptions section if someone has 3+ languages set
in their preferences.
(In the previous version of the labels and descriptions section, the alias
fields turned white when the cursor was put in that field.)
D5. easier Commons selection: make it easier to add a new Commons sitelink
to the *Other sites* section.
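The D1 ranking could work as in the following Python sketch: an exact language-code match comes first, then languages whose name starts with the typed text, then codes that start with it. The language list below is illustrative input; per D2, the names could be localised to the user's interface language.

```python
def rank_languages(typed, languages):
    """Order (code, name) pairs for the sitelink language box.

    Exact code match first, then names starting with the typed text, then
    codes starting with it; ties are broken alphabetically by name.
    """
    t = typed.strip().casefold()

    def score(entry):
        code, name = entry
        if code.casefold() == t:
            return 0
        if name.casefold().startswith(t):
            return 1
        if code.casefold().startswith(t):
            return 2
        return None  # not a match at all

    ranked = sorted(
        (s, entry[1].casefold(), entry)
        for entry in languages
        if (s := score(entry)) is not None
    )
    return [entry for _, _, entry in ranked]
```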
*E. Outside Wikidata*
E1. Commons: show somewhere on Commons when a category, gallery page,
institution page, template, file, etc. is used in a statement on Wikidata.
If a page is renamed or deleted, this must also be changed on Wikidata, but
it is not easy to notice where a page is used.
If an image is linked in a statement on Wikidata, this is shown on the
image page. Somehow this should also be implemented for categories, gallery
pages, institution pages, templates, and others.
This should be added to pages like Special:WhatLinksHere, Special:MovePage
and Special:GlobalUsage.
E2. Wikipedia/other wikis: develop an extension that communities can
enable, which shows at the bottom of articles, in the style of the category
box, an automatic box with all the identifiers used for authority control
<https://www.wikidata.org/wiki/Q18614948>, to replace templates like
https://www.wikidata.org/wiki/Q5153934.
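The E1 request boils down to a reverse index from Commons page names to the Wikidata statements that use them, which is what a WhatLinksHere-style view on Commons would need. A Python sketch over a hypothetical in-memory {item: {property: [values]}} mapping; P373 (Commons category) is the default, and other properties such as P18 (image) can be passed in.

```python
from collections import defaultdict


def commons_usage_index(items, commons_properties=("P373",)):
    """Map each Commons page name to the (item, property) pairs that use it.

    `items` is a hypothetical {item ID: {property: [values]}} mapping.
    """
    index = defaultdict(list)
    for qid, claims in items.items():
        for prop in commons_properties:
            for page in claims.get(prop, []):
                index[page].append((qid, prop))
    return dict(index)
```

Looking up a category name in the returned dict before renaming or deleting it would show which Wikidata items need updating.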