Dear all,


Recently I met a few of the Assamese Wikipedians in Dibrugarh. We discussed many of the plans of the Assamese Wikimedia community. I am summarizing our discussions in this mail.


Minutes of discussion

Meeting with Assamese Wikipedians, Faculty members of Dibrugarh University

Venue: Dibrugarh University

Time: 6:30 - 8:30 pm, 11.02.2013

Participants:



  1. Gitartha Bordoloi, M.D. Student, Assam Medical College, Dibrugarh

  2. Abhijit Kalita, Manager, NEEPCO Ltd, Duliajan

  3. Gunadeep Chetia, Asst. Professor, Centre for Computer Studies, Dibrugarh University, Dibrugarh

  4. Dr. Bhaskarjyoti Sarma, Asst. Professor, Department of Assamese, Dibrugarh University, Dibrugarh

  5. Dr. Mridul Bordoloi, Asst. Professor, Department of English, Dibrugarh University, Dibrugarh

  6. Subhashish Panigrahi, Programme Officer, A2K, Centre for Internet and Society, New Delhi


Agenda:



  1. Creating resources for Assamese Wikimedia communities

  2. Solving technological challenges

  3. Facilitating Wikipedians for outreach


Discussion outline:



  1. Subhashish opened the meeting by explaining the role of the A2K team in catalyzing Indian language communities across the nation.

  2. Gitartha briefly explained how the Assamese community has flourished in the recent past and how it has been trying to grow organically.

  3. Gunadeep Chetia briefly discussed the technical roadblocks for the Assamese communities and the related initiatives. Gunadeep, Bhaskarjyoti, Abhijit, Mridul and a few other Assamese language enthusiasts have formed a non-profit organization, the “Society for Language Technology Development, Assam” (SLTD, Assam), and have released a few tools for typing in Assamese. The society has three teams working collaboratively: A) Technology team, B) Language team and C) Design team. Faculty members from various departments are involved in the society. Some students are also involved as volunteers in the content generation project.


Challenges:



  1. Technology gap: Most of those who have experience and expertise are not comfortable with computers, and this is becoming a roadblock for them to contribute to Wikipedia.

  2. Standardization of the Assamese language

  3. Lack of volunteers: Most volunteers drop out after some time. If some kind of remuneration could be allocated, more people could get involved and a large repository could be built in a short span of time.

Accomplishments & Plans of the Society:
Accomplishments:



  1. Rodali: SLTD indigenously developed “Rodali”, the first Assamese phonetic typing tool (offline and online). It is distributed under a free license and is constantly being upgraded.


Plans:



  1. Assamese spell checker: A tool that could be useful for correcting typos.

  2. Assamese word library: Some funding is needed to involve people to type and create a library of Assamese words, which could be used for adding the spell-check feature.

  3. Pronunciation library: Samples of various Assamese dialects were collected and analyzed using voice analysis software. The average value of the samples was taken as the standard pronunciation of a particular word. Most Indian languages lack such libraries, and once built it could be used in multiple ways:

A) Voice commands for computers

B) Text to speech
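
The spell checker and word library planned above could fit together roughly as follows. This is only a minimal sketch in Python, not the Society's design: the three-word library, the function names `is_known` and `suggest`, and the similarity cutoff are all assumptions made for illustration. It uses the standard-library `difflib` module to propose close matches from the word library.

```python
import difflib

# Toy word library (an assumption for illustration). A real library would
# hold the full list of typed and verified Assamese words.
WORD_LIBRARY = sorted({"অসম", "ভাষা", "কিতাপ"})


def is_known(word):
    """Return True if the word appears in the word library."""
    return word in WORD_LIBRARY


def suggest(word, limit=3, cutoff=0.6):
    """Return up to `limit` library words that are similar to `word`."""
    return difflib.get_close_matches(word, WORD_LIBRARY, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    typed = "ভাসা"  # an input that is not in the toy library
    if not is_known(typed):
        print(typed, "->", suggest(typed))
```

The quality of the suggestions depends almost entirely on the size and accuracy of the word library, which is why the word library is listed as a prerequisite for the spell checker.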

Primary needs of the community:



  1. Digitization: The Assamese community needs many out-of-copyright books to be digitized in text format so that they can be used on Wikisource. The community feels it would be useful to get the books typed by local DTP operators and to distribute them on Wikisource and other portals. Wikisource has very few active contributors, and it is proving difficult for the community to gather more people. A content creation drive could help.


4) Font-related issues

I. Ambiguity of characters:

The Assamese and Bengali scripts are broadly the same, but a few characters differ between them. The Unicode Consortium names the characters used for Assamese as Bengali characters. This issue has been taken to the Unicode Consortium with a request to change the name to Assamese/Bengali, but the request was never addressed. This has given rise to a few more issues:



A) Assamese Wikipedia uses a Bengali phonetic keyboard layout called Avro. Because Avro uses Bengali characters and conjuncts, a few frequently used conjuncts display incorrectly in Assamese. The community has spent considerable time correcting this. There were cases of the same article being created twice with two different spellings.
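
The naming issue and the duplicate-article problem can be illustrated with a short Python sketch. The first part prints the official Unicode names of the two letters specific to Assamese, both of which carry the word "BENGALI". The second part is purely hypothetical: `title_key` and `same_title` are names invented here, and folding the Bengali RA into the Assamese RA is only an assumption about how titles on an Assamese-only wiki might be compared, not a description of how Wikipedia handles them.

```python
import unicodedata

ASSAMESE_RA = "\u09F0"  # ৰ, used in Assamese
ASSAMESE_WA = "\u09F1"  # ৱ, used in Assamese
BENGALI_RA = "\u09B0"   # র, used in Bengali


def describe(ch):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def title_key(title):
    """Hypothetical comparison key: treat Bengali RA as Assamese RA."""
    normalized = unicodedata.normalize("NFC", title)
    return normalized.replace(BENGALI_RA, ASSAMESE_RA)


def same_title(a, b):
    """Return True if two titles differ only in which RA was typed."""
    return title_key(a) == title_key(b)


if __name__ == "__main__":
    for ch in (ASSAMESE_RA, ASSAMESE_WA, BENGALI_RA):
        print(describe(ch))
    # The same word typed with a Bengali layout and with an Assamese layout.
    print(same_title("\u09B0\u09BE\u09AE", "\u09F0\u09BE\u09AE"))
```

A check of this kind would only flag candidate duplicates; deciding which spelling is correct would still need a human editor.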


II. Non-availability of good quality Assamese fonts

Assamese Wikipedia and other language projects often rely on Bengali fonts. A good quality Assamese font would be of great use.


5) TTS: Text to Speech software development work:

The TTS project needs two major libraries: a pronunciation library and a word library.


6) Typing tool:

Rodali: Rodali was developed as an indigenous phonetic and Inscript typing tool for Assamese. It is available offline for Linux and Windows, and online. The Windows application is built with C++ (code) and .NET (interface), and the Linux application is written using iBus and Lisp. The online version uses JavaScript and is compatible with jQuery.IME (used for typing on Wikimedia projects). Rodali's development incorporated feedback from linguists and common users, and many test versions were released for testing. Suggestions were made to replace the Avro Bengali typing tool with Rodali so that the same tool is used across platforms.
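
For readers unfamiliar with phonetic typing tools, the general idea can be sketched in Python as below. This is not Rodali's code or its layout: the `RULES` table is a tiny hypothetical mapping made up for illustration, and greedy longest-match is just one common way to turn Latin keystrokes into script characters.

```python
# Hypothetical keystroke table for illustration only, NOT Rodali's layout.
RULES = {
    "kh": "খ",
    "k": "ক",
    "gh": "ঘ",
    "g": "গ",
    "m": "ম",
    "r": "ৰ",  # Assamese RA (U+09F0), not Bengali RA (U+09B0)
}
MAX_KEY = max(len(key) for key in RULES)


def transliterate(text):
    """Convert Latin keystrokes to script characters, longest match first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("khr"))
```

Trying the longest key first is what lets "kh" produce a single letter instead of being read as "k" followed by an unmapped "h".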


7) OCR (Optical character recognition) software

Optical character recognition is used for extracting text from scanned images of books and documents. A handful of OCR programs exist for a few Indian languages, and they are more or less inaccurate. The OCR for Bengali has multiple bugs, and it was assumed that it would work for Assamese as well. However, because of a few Assamese-specific characters, the OCR engine does not work properly for Assamese. There is a need for an Assamese OCR. The Society is currently planning to invest some time in OCR as well.



--
Best!
Subhashish Panigrahi
Programme Officer, Access To Knowledge
Centre for Internet and Society
@psubhashish