**** DEADLINE EXTENSION ****
3rd Call for Research & Innovation Papers
SEMANTiCS 2016 - The Linked Data Conference
Transfer // Engineering // Community
12th International Conference on Semantic Systems
Leipzig, Germany
September 12 -15, 2016
http://2016.semantics.cc
Important Dates (Research & Innovation)
* Abstract Submission Deadline: extended to May 3, 2016 (11:59 pm, Hawaii time)
* Paper Submission Deadline: extended to May 10, 2016 (11:59 pm, Hawaii time)
* Notification of Acceptance: extended to June 7, 2016 (11:59 pm, Hawaii time)
* Camera-Ready Paper: extended to July 1, 2016 (11:59 pm, Hawaii time)
Submissions via Easychair:
https://easychair.org/conferences/?conf=semantics2016research
As in previous years, the SEMANTiCS'16 proceedings are expected to be
published in the ACM International Conference Proceeding Series (ICPS).
The annual SEMANTiCS conference is the meeting place for professionals
who make semantic computing work, who understand its benefits and
encounter its limitations. Every year, SEMANTiCS attracts information
managers, IT-architects, software engineers and researchers from
organisations ranging from NPOs, through public administrations to the
largest companies in the world. Attendees learn from industry experts
and top researchers about emerging trends and topics in the fields of
semantic software, enterprise data, linked data & open data strategies,
methodologies in knowledge modelling and text & data analytics. The
SEMANTiCS community is highly diverse; attendees have responsibilities
in interlinking areas like knowledge management, technical
documentation, e-commerce, big data analytics, enterprise search,
document management, business intelligence and enterprise vocabulary
management.
The success of last year's conference in Vienna, with more than 280
attendees from 22 countries, shows that SEMANTiCS 2016 will continue a
long tradition of bringing together colleagues from around the world.
There will be presentations on industry implementations, use case
prototypes and best practices, as well as panels, papers and posters,
with semantic systems discussed in birds-of-a-feather sessions and
informal settings. SEMANTiCS addresses problems common among information
managers, software engineers, IT-architects and the various specialist
departments working to develop, implement and/or evaluate semantic
software systems.
The SEMANTiCS program is a rich mix of technical talks, panel
discussions of important topics and presentations by people who make
things work - just like you. In addition, attendees can network with
experts in a variety of fields. These relationships provide great value
to organisations as they encounter subtle technical issues in any stage
of implementation. The expertise gained by SEMANTiCS attendees has a
long-term impact on their careers and organisations. These factors make
SEMANTiCS the major industry-related event for our community across Europe.
SEMANTiCS 2016 will especially welcome submissions for the following hot
topics:
* Data Quality Management
* Data Science (Data Mining, Machine Learning, Network Analytics)
* Semantics on the Web, Linked (Open) Data & schema.org
* Corporate Knowledge Graphs
* Knowledge Integration and Language Technologies
* Economics of Data, Data Services and Data Ecosystems
Following the success of previous years, the ‘horizontals’ (research)
and ‘verticals’ (industries) below are of interest for the conference:
Horizontals
* Enterprise Linked Data & Data Integration
* Knowledge Discovery & Intelligent Search
* Business Models, Governance & Data Strategies
* Big Data & Text Analytics
* Data Portals & Knowledge Visualization
* Semantic Information Management
* Document Management & Content Management
* Terminology, Thesaurus & Ontology Management
* Smart Connectivity, Networking & Interlinking
* Smart Data & Semantics in IoT
* Semantics for IT Safety & Security
* Semantic Rules, Policies & Licensing
* Community, Social & Societal Aspects
Verticals
* Industry & Engineering
* Life Sciences & Health Care
* Public Administration
* Galleries, Libraries, Archives & Museums (GLAM)
* Education & eLearning
* Media & Data Journalism
* Publishing, Marketing & Advertising
* Tourism & Recreation
* Financial & Insurance Industry
* Telecommunication & Mobile Services
* Sustainable Development: Climate, Water, Air, Ecology
* Energy, Smart Homes & Smart Grids
* Food, Agriculture & Farming
* Safety, Security & Privacy
* Transport, Environment & Geospatial
Research / Innovation Papers
The Research & Innovation track at SEMANTiCS welcomes the submission of
papers on novel scientific research and/or innovations relevant to the
topics of the conference. Submissions must be original and must not have
been submitted for publication elsewhere. The Research & Innovation
track at SEMANTiCS is a single-blind review process (author names are
visible to reviewers, reviewers stay anonymous). The submitted abstract
and the topics are leveraged to find adequate reviewers for submitted
papers. Please write an email to
semantics2016researchtrack(a)easychair.org if you have any questions.
Papers should follow the ACM ICPS guidelines for formatting and must not
exceed 8 pages for full papers and 4 pages for short papers, including
references and optional appendices. The layout templates can be found here:
http://www.acm.org/sigs/publications/proceedings-templates
All accepted full papers and short papers will be published in the digital
library of the ACM ICPS series. Research & Innovation papers should be
submitted through EasyChair at:
https://easychair.org/conferences/?conf=semantics2016research
Papers must be submitted in PDF (Adobe's Portable Document Format); other
formats will not be accepted. For the camera-ready version, the source
files (LaTeX, WordPerfect, Word) will also be needed.
Research and Innovation Chairs:
* Anna Fensel, University of Innsbruck
* Amrapali Zaveri, Stanford University
Contact email address: semantics2016researchtrack(a)easychair.org
Research and Innovation Deputy Chairs:
* Bernhard Haslhofer, Austrian Institute of Technology
* Artem Revenko, Semantic Web Company
Conference Chairs:
* Sebastian Hellmann, AKSW/KILT, InfAI, Leipzig University
* Tassilo Pellegrini, UAS St. Pölten
Senior Program Committee:
* Paul Buitelaar, Insight - National University of Ireland, Galway
* Oscar Corcho, Universidad Politécnica de Madrid
* Claudia D'Amato, University of Bari
* Brian Davis, National University of Ireland, Galway
* Victor de Boer, VU Amsterdam
* Christian Dirschl, Wolters Kluwer Germany
* Michel Dumontier, Stanford University
* Agata Filipowska, Department of Information Systems, Poznan University
of Economics
* Bernhard Haslhofer, AIT-Austrian Institute of Technology
* Sebastian Hellmann, AKSW/KILT, InfAI, Leipzig University
* Andreas Hotho, University of Wuerzburg
* Jose Emilio Labra Gayo, Universidad de Oviedo
* Peter Mika, Yahoo! Research
* Axel-Cyrille Ngonga Ngomo, University of Leipzig
* Josiane Xavier Parreira, Siemens AG Österreich
* Heiko Paulheim, University of Mannheim
* Tassilo Pellegrini, University of Applied Sciences St. Pölten
* Marta Sabou, Vienna University of Technology
* Harald Sack, Hasso-Plattner-Institute for IT Systems Engineering,
University of Potsdam
* Pierre-Yves Vandenbussche, Fujitsu
* Ruben Verborgh, Ghent University - iMinds
* Maria Esther Vidal, Universidad Simon Bolivar, Dept. Computer Science
Hi all
I'm doing some work with colleagues from the education sector at UNESCO to
look at improving some of the most viewed education articles on English
language Wikipedia.
I'm trying to use TreeViews to get information on which articles in
Category:Education are the most viewed. Unfortunately, such large
categories crash my browser, which means I would have to split the query
into at least 50-100 smaller queries.
Does anyone know of a less manual way around this? Ideally the output
would be a spreadsheet of article titles and the number of page views
each article received over a 30-, 60- or 90-day period in the recent
past. I will use TreeViews if it is the only way, but I'd really love to
save myself half a day of data entry. I imagine this would also be
useful for people working with other organisations on other subjects.
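One less manual route could be the Wikimedia Pageviews REST API, which serves per-article daily counts. A minimal sketch using only Python's standard library (the endpoint layout below is the public rest_v1 per-article path; the helper names are my own):

```python
import json
import urllib.request
from urllib.parse import quote

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(title, start, end, project="en.wikipedia.org"):
    """Build a per-article daily-pageviews request.

    start/end are YYYYMMDD strings; all-access/all-agents covers
    desktop and mobile traffic from all agent types.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return "{}/{}/all-access/all-agents/{}/daily/{}/{}".format(
        API, project, article, start, end)

def total_views(payload):
    """Sum the daily counts in one API response payload."""
    return sum(item["views"] for item in payload.get("items", []))

def fetch_total(title, start, end):
    """Fetch and sum views for one article (network call)."""
    with urllib.request.urlopen(pageviews_url(title, start, end)) as r:
        return total_views(json.load(r))

# Looping fetch_total over a list of category members and writing the
# (title, views) pairs to a CSV would give the spreadsheet described
# above without manual data entry.
```

This is a sketch, not a polished tool; for thousands of titles you would want batching, rate limiting, and error handling around the network call.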
Thanks
John
On Wed, Apr 20, 2016 at 12:39 AM, <alexhinojo(a)gmail.com> wrote:
> Hi, as some of you may know, the Wikipedia gender indicator [1] tells us how many articles are biographies about women per language/country/culture.
>
> In order to compare these numbers: does anyone know if there is an existing comparison with the gender balance in classical encyclopedias (Britannica, Larousse...)? Or, if not, could someone prepare a WD query about it?
>
> I think it could be a good argument for us to use, e.g. "at cawiki 12% of bios are about women, compared to 5% in GEC, our most famous encyclopedia".
>
> We could also compare it with thematic encyclopedias or other databases existing in projects like Mix'n'match.
>
> Can someone help? Thanks in advance
>
>
> [1] http://wigi.wmflabs.org/
>
>
> Àlex Hinojo
> User:Kippelboy
> Amical Wikimedia Programme manager
Interesting question. There may be more suitable venues for it, e.g.
the research mailing list (CCed). Anyway, to start with two examples:
http://reagle.org/joseph/pelican/social/gender-bias-in-wikipedia-and-britan…
https://meta.wikimedia.org/wiki/Research:Newsletter/2015/May#Notable_women_…
Comparison of Wikipedia with, among other sources, "Human
Accomplishment", a 2003 "ranking of geniuses throughout the ages and
around the world based on their prominence in contemporary
encyclopedias" (NYT)
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
Here's another useful link to a form that helps you construct the API call:
https://wikimedia.org/api/rest_v1/?doc#!/Unique_devices_data/get_metrics_un…
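Once constructed, the call is also easy to script. A minimal sketch with Python's standard library (the endpoint layout matches the public rest_v1 unique-devices path; the helper names and the assumption that response items carry "timestamp" and "devices" fields are mine):

```python
import json
import urllib.request

API = "https://wikimedia.org/api/rest_v1/metrics/unique-devices"

def unique_devices_url(project, start, end,
                       access_site="all-sites", granularity="daily"):
    """Build a unique-devices request.

    start/end are YYYYMMDD strings; access_site is one of all-sites,
    desktop-site or mobile-site.
    """
    return "{}/{}/{}/{}/{}/{}".format(
        API, project, access_site, granularity, start, end)

def devices_by_day(payload):
    """Map timestamp -> device count from one API response payload."""
    return {item["timestamp"]: item["devices"]
            for item in payload.get("items", [])}

# Example (performs a network call):
# url = unique_devices_url("en.wikipedia.org", "20160201", "20160229")
# with urllib.request.urlopen(url) as r:
#     print(devices_by_day(json.load(r)))
```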
On Tue, Apr 19, 2016 at 12:17 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> Hello!
>
> The analytics team is happy to announce that the Unique Devices data is
> now available to be queried programmatically via an API.
>
> This means that getting the daily number of unique devices [1] for English
> Wikipedia for the month of February 2016, for all sites (desktop and
> mobile) is as easy as launching this query:
>
>
> https://wikimedia.org/api/rest_v1/metrics/unique-devices/en.wikipedia.org/a…
>
> You can get started by taking a look at our docs:
> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices#Quick_Start
>
> If you are not familiar with the Unique Devices data, the main thing you
> need to know is that it is a good proxy metric for measuring Unique
> Users; more info below.
>
> Since 2009, the Wikimedia Foundation had used comScore to report data about
> unique web visitors. In January 2016, however, we decided to stop
> reporting comScore numbers [2] because of certain limitations in the
> methodology; these limitations translated into misreported mobile usage. We
> are now ready to replace the comScore numbers with the Unique Devices dataset.
> While unique devices do not equal unique visitors, they are a good proxy for
> that metric, meaning that a major increase in the number of unique devices
> is likely to come from an increase in distinct users. We understand that
> counting uniques raises fairly big privacy concerns, so we count unique
> devices in a very privacy-conscious way: it does not involve any
> cookie by which your browsing history can be tracked [3].
>
>
> [1] https://meta.wikimedia.org/wiki/Research:Unique_Devices
> [2] https://meta.wikimedia.org/wiki/ComScore/Announcement
> [3]
> https://meta.wikimedia.org/wiki/Research:Unique_Devices#How_do_we_count_uni…
> devices.3F
>
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
I'd like to announce an IEG proposal I'm working on titled "Learning from
article revision histories" [1]. If anyone who has studied the evolution of
Wikipedia articles (e.g. the extent to which articles consistently improve in quality)
is interested in the project, please consider getting in touch with me as
I'd love to hear your thoughts. I'm excited about the possible usefulness
of this sort of research for the Wikipedia community, but I'm new to
Wikipedia and I know I am not the first one to ask some of these questions.
If more experienced Wikipedians would like to weigh in on the usefulness of
addressing the efficiency of the collaborative editing process, I've posted
some discussion topics on the IEG proposal's talk page [2].
Pierce Edmiston
[1]:
https://meta.wikimedia.org/wiki/Grants:IEG/Learning_from_article_revision_h…
[2]:
https://meta.wikimedia.org/wiki/Grants_talk:IEG/Learning_from_article_revis…
Hey folks, we have a couple of announcements for you today. First is that
ORES has a large set of new functionality that you might like to take
advantage of. We also want to talk about a *BREAKING CHANGE on April
7th*.
Don't know what ORES is? See
http://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/
*New functionality*
*Scoring UI*
Sometimes you just want to score a few revisions in ORES, and remembering
the URL structure is hard. So we've built a simple scoring user interface
<https://ores.wmflabs.org/ui/> that lets you score a
set of edits more easily.
*New API version*
We've been consistently getting requests to include more information in
ORES' responses. In order to make space for this new information, we needed
to change the structure of responses. But we wanted to do this without
breaking the tools that are already using ORES. So, we've developed a
versioning scheme that will allow you to take advantage of new
functionality when you are ready. The same old API will continue to be
available at https://ores.wmflabs.org/scores/, but we've added two
additional paths on top of this.
- https://ores.wmflabs.org/v1/scores/ is a mirror of the old scoring API
which will henceforth be referred to as "v1"
- https://ores.wmflabs.org/v2/scores/ implements a new response format
that is consistent between all sub-paths and adds some new functionality
*Swagger documentation*
Curious about the new functionality available in "v2", or what
changed from "v1"? We've implemented a structured description of both
versions of the scoring API using Swagger -- which is becoming a de facto
standard for this sort of thing. Visit https://ores.wmflabs.org/v1/ or
https://ores.wmflabs.org/v2/ to see the Swagger user interface.
Visit https://ores.wmflabs.org/v1/spec/ or https://ores.wmflabs.org/v2/spec/
to get the specification in a machine-readable format.
*Feature values & injection*
Have you wondered what ORES uses to make its predictions? You can now ask
ORES to show you the list of "feature" statistics it uses to score
revisions. For example,
https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892/?features will
return the score with a mapping of feature values used by the "wp10"
article quality model in English Wikipedia to score oldid=34567892
<https://en.wikipedia.org/wiki/Special:Diff/34567892>. You can also
"inject" features into the scoring process to see how that affects the
prediction. E.g.,
https://ores.wmflabs.org/v2/scores/enwiki/wp10/34567892?features&feature.wi…
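The URLs above can be composed programmatically. A sketch of a small helper (the v2 path layout comes from the URLs in this announcement; the function names are mine, and injected feature names are model-specific, so take them from the list a ?features request returns rather than from this example):

```python
def score_url(wiki, model, rev_id, features=False, inject=None):
    """Build an ORES v2 scoring URL.

    features=True appends ?features so ORES includes the feature
    statistics it fed to the model; inject maps feature names to
    values to override ("inject") them into the scoring process.
    """
    base = "https://ores.wmflabs.org/v2/scores/{}/{}/{}".format(
        wiki, model, rev_id)
    params = []
    if features:
        params.append("features")
    for name, value in (inject or {}).items():
        params.append("feature.{}={}".format(name, value))
    return base + ("?" + "&".join(params) if params else "")

# Example: score oldid=34567892 with the enwiki wp10 article quality
# model and ask for the feature values used.
print(score_url("enwiki", "wp10", 34567892, features=True))
```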
*Breaking change -- new models*
We've been experimenting with new learning algorithms to make ORES work
better and we've found that we get better results with gradient boosting
<https://en.wikipedia.org/wiki/Gradient_boosting> and random forest
<https://en.wikipedia.org/wiki/Random_forest> strategies than we do with
the current linear SVC
<https://en.wikipedia.org/wiki/Support_vector_machine> models. We'd like to
get these new, better models deployed as soon as possible, but with the new
algorithm comes a change in the range of probabilities returned by the
model. So, when we deploy this change, any tool that uses hard-coded
thresholds on ORES' prediction probabilities will suddenly start behaving
strangely. Regretfully, we haven't found a way around this problem, so
we're announcing the change now and we plan to deploy this *BREAKING CHANGE
on April 7th*. Please subscribe to the AI mailinglist
<https://lists.wikimedia.org/mailman/listinfo/ai> or watch our project page
[[:m:ORES <https://meta.wikimedia.org/wiki/ORES>]] to catch announcements
of future changes and new functionality.
In order to make sure we don't end up in the same situation the next time
we want to change an algorithm, we've included a suite of evaluation
statistics with each model. The filter_rate_at_recall(0.9),
filter_rate_at_recall(0.75), and recall_at_fpr(0.1) thresholds represent
three critical thresholds (should review, needs review, and definitely
damaging -- respectively) that can be used to automatically configure your
wiki tool. You can find out these thresholds for your model of choice by
adding the ?model_info parameter to requests. So, come the breaking
change, we strongly recommend basing your thresholds on these statistics.
We'll be working to submit patches to tools that use ORES in the
next week to implement this flexibility. Hopefully, all you'll need to do
is work with us on those.
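Configuring a tool against those statistics might look like the sketch below. Note the key layout ("test_stats" holding per-statistic "threshold" values) is an assumption of mine, not confirmed ORES output, so check it against a real ?model_info response before relying on it:

```python
def pick_threshold(model_info, stat_name):
    """Pull a named evaluation statistic's threshold out of a
    model_info response.

    ASSUMPTION: the 'test_stats'/'threshold' key layout is illustrative
    only -- verify against an actual ?model_info response.
    """
    return model_info["test_stats"][stat_name]["threshold"]

def needs_review(prediction_probability, threshold):
    """Flag an edit for review when its predicted damage probability
    clears the configured threshold."""
    return prediction_probability >= threshold

# Illustrative data only (not real ORES output):
info = {"test_stats": {"recall_at_fpr(0.1)": {"threshold": 0.92}}}
print(needs_review(0.95, pick_threshold(info, "recall_at_fpr(0.1)")))  # True
```

The point of the indirection is that when the model changes, only the fetched threshold changes; no hard-coded probability survives in the tool.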
-halfak & The Revision Scoring team
<https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service>
Cross-posting this request to wiki-research-l. Anyone have data on
frequently used section titles in articles (any language), or know of
datasets/publications that examined this?
I'm not aware of any off the top of my head, Amir.
- Jonathan
---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hi,
Did anybody ever try to collect statistics about frequent section titles in
Wikimedia projects?
For Wikipedia, for example, titles such as "Biography", "Early life",
"Bibliography", "External links", "References", "History", etc., appear in
a lot of articles, and their counterparts appear in a lot of languages.
There are probably similar things in Wikivoyage, Wiktionary and possibly
other projects.
Did anybody ever try to collect statistics of the most frequent section
titles in each language and project?
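Absent an existing dataset, the counts could be computed from a dump. A minimal sketch over wikitext page bodies (the ==...== regex is an approximation that ignores headings produced by templates, and the function name is mine):

```python
import re
from collections import Counter

# Wikitext headings: 2-6 '=' signs on each side, e.g. "== References ==".
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)

def section_title_counts(pages):
    """Count section titles across an iterable of wikitext page bodies."""
    counts = Counter()
    for text in pages:
        counts.update(title for _, title in HEADING.findall(text))
    return counts

# Tiny demo; in practice, iterate over pages from a dump parser.
pages = [
    "== Early life ==\ntext\n== References ==\n",
    "== History ==\ntext\n== References ==\n",
]
print(section_title_counts(pages).most_common(2))
```

Run per language and project, this would give exactly the frequency tables asked about above.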
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Hello all,
I'm preparing to participate in the Individual Engagement Grant (IEG)
program and have an idea closely linked to the Accuracy Review project
raised by James Salsman. Here is a brief summary of my proposal:
out-of-date information and references are common in Wikipedia articles,
especially in Chinese Wikipedia. I would therefore like to evaluate some
existing solutions for identifying out-of-date content, and create a new
bot to identify such information based on the test results. More detailed
tests will be arranged after that, using selected articles from Wikipedia
and the cases that we compile.
Here is the URL of the project proposal:
https://meta.wikimedia.org/wiki/Grants:IdeaLab/Searching_for_out-of-date_in…
Please comment on the proposal on its discussion board:
https://meta.wikimedia.org/wiki/Grants_talk:IdeaLab/Searching_for_out-of-da…
Li Linxuan