An EMNIST + KMNIST classifier

2026-03-08

#project #python #ml



github code

Introduction

The aim of this project is to develop simple models that can recognise handwritten characters. The models can recognise the following Latin alphabet characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

as well as the following Japanese hiragana characters:

あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをん

For the Latin characters, although lowercase characters are also recognised, these are grouped together with the uppercase characters.

As well as actually working on the models, another important goal of this project is to sexperiment with adding more interactive features to the website.

Demo

Loading model...
Loading model...

Development

The datasets used to train the models are the following:

After combining, the occurrences of each character in the datasets are as follows:

dataset summary

For training, the following was used:

Two CNN models were developed: mnist-large (see architecture) and mnist-small (see architecture). Both models use a similar architecture, consisting of a combination of Conv → ReLU → MaxPool layers and fully connected layers. Both models also use dropout layers to reduce overfitting.

The following plot shows the training loss for mnist-large:

small training loss

and the following plot shows the training loss for mnist-small:

small training loss