How To Populate a Foveola Shape Database: A User Tutorial
This article will detail how to create a functional Foveola database. As a case study we will look at Japanese numerical characters, as they provide an excellent range of shapes and should be largely unfamiliar. Creating a Foveola database does not require intricate knowledge of how the system works, but it does require careful preparation, choice of template shapes, and a little patience. Firstly though, a quick Japanese lesson. Below are two diagrams showing the twelve main Japanese characters (kanji) for numbers:
For example, 158 is written in kanji as. As you can see, they are an interesting mix of shapes. One through three are simple, whilst four and five are relatively complex. Six and eight have structural similarities, as do seven, ten and a thousand. The characters also combine curved, straight, enclosed and latticed areas. These properties make them an ideal (albeit slightly difficult) case study.

Preparation
Database preparation is highly dependent on the type of database and shapes you are using. Ideally, you will need to ask questions like, “Does my database need to support...”
For our case study, we will be using the standard Gothic font with no variations, although we will require a degree of robustness to allow for noisy input characters. Next, we must prepare our images for import into Foveola. While the Foveola GUI does support simple drawing functions, creating images and importing them allows the kanji to be reproduced more accurately. Foveola uses a very simple format called PBM, which stores image information in a flat ASCII format. How you create and save these files will vary between the supported Foveola platforms. Linux users have ImageMagick and its convert tool, whereas Windows users might need to download additional software that supports the P*M formats. The images for this case study were created in Adobe Photoshop 6.01 and saved using a freeware PBM exporter. Once you have set up your system to export PBMs, there are some important considerations when creating your template images: size, shape and dithering.
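To make the "flat ASCII format" concrete, here is a minimal Python sketch that writes and reads a plain PBM image. It assumes Foveola accepts the plain (P1) variant of PBM, which the article's description suggests but does not state outright. The function names and the tiny 7x5 grid are purely illustrative, and a real template would be much larger.

```python
def to_plain_pbm(grid):
    """Serialize a 2D list of 0/1 pixels as plain ('P1') PBM text; 1 means black."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    if any(len(row) != width for row in grid):
        raise ValueError("all rows must have the same width")
    lines = ["P1", f"{width} {height}"]
    for row in grid:
        bits = "".join("1" if pixel else "0" for pixel in row)
        # The PBM format asks that no line exceed 70 characters.
        for start in range(0, len(bits), 70):
            lines.append(bits[start:start + 70])
    return "\n".join(lines) + "\n"


def from_plain_pbm(text):
    """Parse plain PBM text back into a 2D list of 0/1 pixels."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())  # drop '#' comments
    if len(tokens) < 3 or tokens[0] != "P1":
        raise ValueError("not a plain PBM (P1) image")
    width, height = int(tokens[1]), int(tokens[2])
    # Pixels may be written with or without whitespace between them.
    bits = [int(char) for char in "".join(tokens[3:])]
    if len(bits) != width * height:
        raise ValueError("pixel count does not match the header")
    return [bits[r * width:(r + 1) * width] for r in range(height)]


# Illustrative only: a single horizontal stroke, loosely resembling ichi (1).
demo_grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(to_plain_pbm(demo_grid), end="")
```

Saving the returned string to a file with a .pbm extension gives an image that any PBM-aware viewer should open, which is a quick way to check an exporter's output by eye.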
Importing into Foveola: Command Line
Importing shapes into a database from the command line is very simple. For example, to create our database and add the kanji for '1' to it:

addShape Ichi_0 1 -f 1-ichi.pbm -d JapaneseNumbers.db

To break down this command into its individual components:
Running addShape with these parameters should yield:

WARNING Database not found. Attempting to create new database JapaneseNumbers.db.
OK Shape Ichi_0 added to database with id 1.

Repeating this for your other shapes will create your database. The command-line utility is excellent for scripting and for adding shapes en masse. For example, a shell script could automatically populate your database, using a predefined directory structure to determine each shape's name and type. For the purposes of this case study, though, the GUI will be the main tool used.

Importing into Foveola: GUI
Foveola ships with a Java-based GUI that can perform many of the command-line utility's functions. To run the GUI, double-click the foveola.jar icon in Windows, or type java -jar foveola.jar under Linux. This assumes you have the Java Runtime Environment (JRE) installed. Please see the Foveola release notes if you have trouble starting the GUI. When the GUI starts, it automatically loads its default database. We want to create a new database, so select “Database, New DB” and create a database called “JapaneseNumbers.db”. Note that you may need to replace or rename any database you created earlier with the command-line utilities. To add an image, select “File, Load Image File” and choose “1-ichi.pbm”. You will see the image displayed in the Shape Editor grid. To add the shape to the database, select “Database, Add Image to DB” and type “Ichi_0” in the Shape Name and “1” in the Shape Type. Add the shape and note how the Database Viewer updates with the new shape and its information. Repeat the process with the other eleven shapes, or simply open JapaneseNumbers-Stage1.db from the project directory.

Testing the Database
You now have a database containing 12 shapes, one for each of the main Japanese counting kanji.
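For readers who prefer the command-line route described earlier, the bulk import might be scripted along the following lines. This is a sketch under stated assumptions: it assumes every template file follows the `<type>-<name>.pbm` pattern of 1-ichi.pbm, that shape names follow the Ichi_0 convention, and that addShape takes the same arguments as in the single example above. File names other than 1-ichi.pbm are hypothetical, and the script only builds and prints the commands rather than running them.

```python
import re
from pathlib import Path

# Assumed naming pattern, generalized from the article's single example "1-ichi.pbm".
FILENAME_PATTERN = re.compile(r"^(?P<type>\d+)-(?P<name>[A-Za-z]+)\.pbm$")


def build_add_shape_commands(filenames, database, variant=0):
    """Return one addShape argument list per template file, ordered by shape type."""
    parsed = []
    for filename in filenames:
        match = FILENAME_PATTERN.match(Path(filename).name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        parsed.append((int(match.group("type")), match.group("name"), str(filename)))
    parsed.sort()
    return [
        ["addShape", f"{name.capitalize()}_{variant}", str(shape_type),
         "-f", path, "-d", database]
        for shape_type, name, path in parsed
    ]


# Dry run: print the commands instead of executing them.
# "2-ni.pbm" is a hypothetical file name used only for illustration.
for command in build_add_shape_commands(["2-ni.pbm", "1-ichi.pbm"], "JapaneseNumbers.db"):
    print(" ".join(command))
```

To execute the commands rather than print them, each argument list could be passed to `subprocess.run(command, check=True)`, provided addShape is on the system path.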
In the Foveola GUI, try testing the functionality of the database by drawing several of the characters in the Shape Editor grid and hitting 'Classify'. Try the first three numbers, since these are easy to draw and Foveola should classify them all correctly.
As you try the other kanji, you'll quickly realize that Foveola often fails to match the more complicated characters. This is because Foveola's classification algorithm categorizes shapes by their unique structures, which will not always be present in the test shapes. Figure 4 demonstrates how Foveola's confidence system can be used to understand this concept further. The bottom-most image is unrecognized by Foveola, despite its similarity to kyu, or 9. The middle image adds a small 'tail' to the shape, and Foveola recognizes it as kyu, but reports the confidence as 3 (the lower the better). The top-most image lengthens the tail and produces the best confidence of the three. Foveola can in fact return multiple results ranked by confidence. This is very important, since large databases can feature many shapes that look similar but have drastically different meanings. In keeping with our case study, here are two Japanese examples that highlight this:
Having said this, it is important to create a robust database to increase functionality. After all, you will rarely be presenting Foveola with a perfectly accurate shape to classify.

Reinforcing Foveola Databases
Reinforcing your Foveola database is the essence of improving its functionality and robustness. You should ask yourself the same questions that were discussed during the preparation phase: what are your target input classes? Will your input images be noisy, differently oriented, or perhaps low-resolution? For our case study, we are looking at a specific font, but we require a degree of robustness to allow for slight distortions in the characters. The GUI is one of the best ways of understanding the robustness of the existing templates. Load each template into the Shape Editor (using the 'Grid' icon in the Database Viewer), and experiment with the various morphological features such as Blunt, Erode and Dilate. If Foveola rejects a modified shape that you feel it should classify, add the modified version to the database using the naming convention we have discussed. For example, looking at the images below, the yon on the left was the original shape added to the database. Using the eraser tool, features were erased until Foveola no longer recognized the character (right):
Nevertheless, the shape should still be classified as an instance of yon. Therefore, we select “Database, Add Image to DB” and type “Yon_1” as the name and “4” as the type. Be sure to work through the database systematically, testing each template for possible shape variations that should be added. Never assume a single template will capture all variations. For example, when testing the case study database, the hachi (8) shape proved to be deceptively limited. While the shape consists of only two simple strokes, experimenting within the GUI showed that the model Foveola used was closely tied to the length of the left-hand stroke. Therefore, a second template was added (hachi_1) that allowed for shorter left-hand strokes. Similarly, it is important to understand that there is no fixed number of variations required for a given shape. It is also important to keep your target input classes in mind when testing the templates. For example, while Foveola correctly classifies both typeset and hand-drawn inputs with just one template for many of the characters (all but 4, 5, 8, and 9), yon (4) required seven templates for reasonable typeset recognition. Handwritten recognition was dubious even with this number of templates. Test images are available in the 'set1' subdirectory of the project.

Testing our Database
Testing is an important part of any development cycle, and building a Foveola database is no exception. Imagine the following greyscale image was scanned from a document, complete with creases and interfering text from the other side of the paper:
Firstly, we must turn the 8-bit image into a 1-bit bitmapped image. To do this, we must intelligently threshold the image as discussed previously. Choosing a thresholding point is important, since we want to retain the features (remembering the 5-pixel restriction Foveola imposes) while ensuring as little noise as possible remains. Setting the threshold point too low (towards 0) makes the characters thin and weak; setting it too high (towards 255) highlights the creases and makes the characters on the opposite page stand out, creating an unreadable image. The optimum threshold for this image is about 220, where the character features remain strong. Granted, the creases are also highlighted, but their width causes Foveola to disregard them. Next, each character is enclosed in a 50x50 window and exported to a PBM file. The characters have been saved in the 'set2' subdirectory. All characters are correctly classified apart from char2, char4 and char12. These characters are not present in our database, so this is the desirable result. Success!

References
[1] Matthews, James. An Introduction to Image Segmentation. Available online at: http://www.generation5.org/content/2003/segmentation.asp.
Submitted: 03/10/2004 Article content copyright © James Matthews, 2004.
All content copyright © 1998-2007, Generation5 unless otherwise noted.