Corpus-driven isiZulu spellchecker launched

A concerted effort resulted in the successful launch of the isiZulu spellchecker on Thursday, 10 November 2016, at the “Launch of the UKZN isiZulu Books and Human Language Technologies” event, which was also featured on SABC 2 Morning Live and on e-news in the afternoon.

The spellchecker is the culmination of research and development that began as last year’s UCT Computer Science Honours project of Mr. Balone Ndaba, supervised by Prof. Hussein Suleman and Dr. Maria Keet at UCT. Third-year CS@UCT student Mr. Norman Pilusa turned it into a usable app this semester, and it benefited from input by Dr. Langa Khumalo and colleagues at UKZN.

It is currently the only functioning spellchecker for isiZulu.

The distinguishing feature of this spellchecker is that it is data-driven, with a statistical language model as the back-end. The isiZulu National Corpus from UKZN was used to train the model: every word in the corpus is split into character tri-grams (e.g., yebo is split into yeb and ebo), and the probability of each tri-gram occurring is calculated. A word that contains an unusual sequence of characters is very probably incorrect, so it is flagged as misspelled. The user can then correct the word or add it to the dictionary. More details about the spellchecker are available.
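
The tri-gram idea described above can be illustrated with a minimal sketch in Python. This is not the spellchecker's actual code: the function names, the toy word list, the way probabilities are estimated (simple relative frequency), and the threshold are all assumptions made for illustration only.

```python
from collections import Counter


def char_trigrams(word):
    """Return the overlapping character tri-grams of a word (yebo -> yeb, ebo)."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def train(words):
    """Estimate each tri-gram's relative frequency over all tri-grams seen.

    Assumption: plain relative frequency; the real model may be estimated differently.
    """
    counts = Counter(t for w in words for t in char_trigrams(w))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def is_suspect(word, model, user_dictionary=frozenset(), threshold=0.0):
    """Flag a word if any of its tri-grams is at or below the probability threshold.

    Words the user has added to the dictionary are never flagged.
    Words shorter than three characters have no tri-grams and are not flagged.
    The threshold value is a hypothetical choice, not taken from the project.
    """
    if word.lower() in user_dictionary:
        return False
    return any(model.get(t, 0.0) <= threshold for t in char_trigrams(word))


# Toy stand-in for a training corpus (hypothetical; the real model was
# trained on the isiZulu National Corpus).
toy_corpus = ["yebo", "sawubona", "ngiyabonga", "siyabonga", "unjani"]
model = train(toy_corpus)

print(char_trigrams("yebo"))
print(is_suspect("yebo", model))
print(is_suspect("yebq", model))
print(is_suspect("yebq", model, user_dictionary={"yebq"}))
```

With such a small word list almost any unseen word would be flagged; a large corpus is what makes the probabilities meaningful, which is why the quality of the training text matters for this approach.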

The isiZulu spellchecker can be downloaded from the ULPDO@UKZN website, and the source code is available from Pilusa’s GitHub repository. Because it is data-driven, adapting it to closely related languages (notably isiXhosa, isiNdebele, and siSwati) should be relatively easy, given sufficient text documents.

The launch itself saw an impressive line-up of speeches and introductions. Among others, the keynote address was given by Dr Zweli Mkhize, UKZN Chancellor and member of the ANC NEC; Mpho Monareng, CEO of PanSALB, gave an address and co-launched the human language technologies; and UKZN’s VC Andre van Jaarsveld provided the official welcome. In addition to the spellchecker, the isiZulu National Corpus, an isiZulu Term Bank, and the ZuluLex mobile-compatible application (Android and iPhone) were also launched, as well as two books in isiZulu.

last modified 2016-11-17 13:11