Hello Sophie,
Thank you for reaching out and sharing this interesting project with us. I
am intrigued by the potential of using Wikidata to generate content in
different languages and appreciate your efforts in developing open-source
systems for Natural Language Generation.
I am particularly interested in the tool you are designing to assist people
in writing Wikipedia articles. The idea of providing seed texts generated
from structured data sources like DBpedia and Wikidata aligns well with the
goals of Wikipedia to provide accurate and reliable information.
I would love to learn more about the project and explore potential
collaborations.
Thank you again for sharing this exciting initiative. I look forward to
hearing from you soon.
Best regards,
Aliyu Shaba.
On Tue, Mar 26, 2024, 4:50 PM Sophie Fitzpatrick <
sophiefitzpatrick(a)wikimedia.ie> wrote:
Hello Wikimedians,
My name is Sophie and I am the Project and Communications Manager at
Wikimedia Community Ireland.
I am reaching out to draw your attention to an interesting Natural
Language Processing project at DCU here in Ireland that uses Wikidata to
generate content in different languages. We have been collaborating
with Simon Mille from the Adapt Centre <https://www.adaptcentre.ie/> recently
and I thought it might be good to make some connections with the wider Wiki
Community who are specifically interested in or involved with AI.
Please let me introduce the project below and if you would like to learn
more or connect with Simon I would be delighted to introduce you.
Kind regards,
Sophie Fitzpatrick
*Project description:* At DCU-NLG <https://dcu-nlg.github.io/>, one of
our main research topics is the automatic generation of text from
structured data. We work with structured repositories such as DBpedia and
Wikidata (among other resources), which contain millions of triples that
can be used to generate texts about targeted entities in a particular
language. A lot of techniques exist for generating text from triple sets,
the most famous (and probably best) one being prompting a GPT model.
However, closed-source models such as the GPT series have some important
drawbacks: they are very resource-hungry, they are not easily
controllable, and they do not give researchers access to their code. At
DCU-NLG, we develop open-source systems that aim to address these issues in
the domain of Natural Language Generation. We build (i) generators based on
Large Language Models (LLMs), which can achieve very high-quality results
but still require a large amount of energy to work, (ii) fully rule-based
systems, which are extremely energy-efficient but struggle to get to the
quality level of LLMs, and (iii) hybrid systems, which aim at combining the
strengths of LLMs, rule-based systems and neural systems. We are also
interested in the real-world use of these systems, and are currently making
a tool that could help people write Wikipedia articles: we are designing an
interface that, given an entity and a language, returns small seed texts
generated using several techniques mentioned above, always using DBpedia or
Wikidata information to ensure the traceability of the source. People can
then use these seed texts as a starting point for editing a new Wikipedia
page.
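To make the rule-based option (ii) concrete, here is a minimal Python sketch of turning triples into a seed text. It is only an illustration under stated assumptions, not the DCU-NLG implementation: the `Triple` type, the `TEMPLATES` table, the property names and the function `generate_seed_text` are all hypothetical, and the example entity is made up.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) statement, as found in DBpedia or Wikidata."""
    subject: str
    predicate: str
    obj: str


# Hypothetical per-language sentence templates, keyed by property name.
# A real rule-based system would need far richer grammar rules than this.
TEMPLATES = {
    "en": {
        "birthPlace": "{subject} was born in {obj}.",
        "occupation": "{subject} worked as a {obj}.",
    },
}


def generate_seed_text(entity: str, language: str, triples: List[Triple]) -> str:
    """Return a short seed text for `entity`, using only the given triples."""
    templates = TEMPLATES.get(language, {})
    sentences = []
    for triple in triples:
        if triple.subject != entity:
            continue  # only describe the requested entity
        template = templates.get(triple.predicate)
        if template is None:
            continue  # no rule for this property: skip it rather than guess
        sentences.append(template.format(subject=triple.subject, obj=triple.obj))
    return " ".join(sentences)


# Made-up example data, for illustration only.
example_triples = [
    Triple("Maeve Example", "birthPlace", "Dublin"),
    Triple("Maeve Example", "occupation", "teacher"),
    Triple("Maeve Example", "unmappedProperty", "something"),
]
print(generate_seed_text("Maeve Example", "en", example_triples))
```

Because every sentence comes from exactly one triple, each statement in the seed text can be traced back to its source, which is the traceability property described above.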
*Some resources:*
- RTE brainstorm article
<https://www.rte.ie/brainstorm/2023/1206/1420417-gaeilge-irish-translation-ai-grammar-pronunication/>
(by the way, it's funny how they use the word "translate" in their title,
knowing the time I spend talking about how NLG is not translation xD)
- Papers from our group about using GPT
<https://aclanthology.org/2023.mmnlg-1.9/> and a rule-based system
<https://aclanthology.org/2023.pandl-1.4/> for the generation of
Irish text from DBpedia.
- The GEM shared task <https://gem-benchmark.com/shared_task> about
generation from DBpedia and Wikidata, which I co-organise.
--
Sophie Fitzpatrick
*Project and Communications Manager*
Pobal Wikimedia na hÉireann | Wikimedia Community Ireland
https://wikimedia.ie/
_______________________________________________
Languages mailing list -- languages(a)lists.wikimedia.org
To unsubscribe send an email to languages-leave(a)lists.wikimedia.org