I'm trying to extract letters from a game board for a project. Currently, I can detect the game board, segment it into the individual squares and extract images of every square.
The input I'm getting is like this (these are individual letters):
At first, I was counting the number of black pixels per image and using that as a way of identifying the different letters, which worked somewhat well for controlled input images. The problem I have, though, is that I can't make this work for images that differ slightly from these.
I have around 5 samples of each letter to work with for training, which should be good enough.
Does anybody know what would be a good algorithm to use for this?
My ideas were (after normalizing the image):
Any help would be appreciated!
I think this is some sort of Supervised Learning. You need to do some feature extraction on the images and then do your classification on the basis of the feature vector you've computed for each image.
Feature Extraction
At first sight, the feature extraction part looks like a good scenario for Hu moments. Just calculate the image moments, then compute cv::HuMoments from these. That gives you a 7-dimensional real-valued feature space (one feature vector per image). Alternatively, you could omit this step and use each pixel value as a separate feature. I think the suggestion in this answer goes in this direction, but adds a PCA compression to reduce the dimensionality of the feature space.
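To make the moment idea concrete, here is a minimal NumPy-only sketch that computes just the first two of the seven Hu invariants. It assumes the glyph is a 2-D array in which ink pixels are non-zero and the image is not empty; the function name `hu_first_two` and the toy glyph are made up for illustration. In OpenCV you would normally get all seven values from the moments call followed by HuMoments instead.

```python
import numpy as np

def hu_first_two(img):
    """Return the first two Hu invariants of a 2-D array of non-negative weights."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()                      # total "mass" (assumed non-zero)
    cx = (xs * img).sum() / m00          # centroid, x
    cy = (ys * img).sum() / m00          # centroid, y

    def eta(p, q):
        # central moment mu_pq, normalised for scale invariance
        mu = (((xs - cx) ** p) * ((ys - cy) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([h1, h2])

# Toy 'L'-shaped glyph (1 = ink), plus a shifted copy of it
glyph = np.zeros((12, 12))
glyph[2:9, 3] = 1
glyph[8, 3:7] = 1
shifted = np.roll(glyph, (2, 3), axis=(0, 1))

print(hu_first_two(glyph))
print(hu_first_two(shifted))   # same values: the features ignore translation
```

The point of the sketch is only that the feature vector stays the same when the letter moves inside its square (or is rotated by 90 degrees), which is what makes moments more robust than raw pixel counts.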
Classification
As for the classification part, you can use almost any classification algorithm you like. You could use an SVM for each letter (binary yes/no classification), you could use Naive Bayes (which letter is the most likely one), or you could use a k-nearest-neighbour approach (kNN, minimum distance in feature space), e.g. with FLANN.
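As a rough illustration of the kNN option, here is a small brute-force sketch in plain NumPy. The two-dimensional feature vectors and the helper name `knn_predict` are invented for the example; with real data you would plug in whatever features you extracted, and a library such as FLANN only becomes interesting once the training set is large.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up feature vectors: three samples each of 'A' and 'H'
train_X = np.array([[0.74, 0.10], [0.73, 0.12], [0.75, 0.11],
                    [0.50, 0.40], [0.52, 0.41], [0.51, 0.39]])
train_y = ['A', 'A', 'A', 'H', 'H', 'H']

predicted = knn_predict(train_X, train_y, np.array([0.72, 0.13]))
print(predicted)
```

With only about 5 samples per letter, a small k (1 or 3) is probably the sensible range.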
Especially for distance-based classifiers (e.g. kNN) you should consider normalizing your feature space (e.g. scale all dimensions to a common range before using Euclidean distance, or use something like the Mahalanobis distance). This avoids overrepresenting features with large value differences in the classification process.
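A minimal sketch of the scaling idea, assuming min-max scaling to the range 0 to 1 is good enough. The function names and the sample numbers are hypothetical. The important detail is that the offsets and spans are computed on the training data and then reused unchanged for every image you classify later.

```python
import numpy as np

def fit_minmax(X):
    """Compute per-feature offset and span from the training data."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # constant features: avoid division by zero
    return lo, span

def apply_minmax(X, lo, span):
    """Scale features with previously fitted parameters."""
    return (X - lo) / span

# Made-up features: a ratio near 1 and a pixel count in the thousands
X = np.array([[0.74, 1200.0], [0.50, 900.0], [0.80, 1500.0]])
lo, span = fit_minmax(X)
X_scaled = apply_minmax(X, lo, span)
print(X_scaled)
```

Without this step the pixel-count column would dominate any Euclidean distance simply because its numbers are larger.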
Evaluation
Of course you need training data, i.e. the feature vectors of images for which the correct letter is known, and a procedure for evaluating your classifier, e.g. cross-validation.
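With very few samples per letter, leave-one-out cross-validation is one reasonable choice: hold out each sample in turn, classify it using the rest, and count the hits. The sketch below assumes a plain 1-nearest-neighbour classifier and invented feature vectors; the helper names are made up for the example.

```python
import numpy as np

def nearest_label(train_X, train_y, x):
    """1-nearest-neighbour: label of the closest training vector."""
    return train_y[int(np.argmin(np.linalg.norm(train_X - x, axis=1)))]

def leave_one_out_accuracy(X, y):
    """Hold out each sample once and classify it with all the others."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        kept_labels = [label for label, k in zip(y, keep) if k]
        hits += nearest_label(X[keep], kept_labels, X[i]) == y[i]
    return hits / len(X)

# Made-up feature vectors: three samples each of 'A' and 'H'
X = np.array([[0.74, 0.10], [0.73, 0.12], [0.75, 0.11],
              [0.50, 0.40], [0.52, 0.41], [0.51, 0.39]])
y = ['A', 'A', 'A', 'H', 'H', 'H']

accuracy = leave_one_out_accuracy(X, y)
print(accuracy)
```

If the accuracy drops well below 1 on your own samples, that is a hint that the features (not the classifier) need work.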
You might also want to have a look at template matching. There you would convolve (strictly speaking, correlate) the candidate image with the available patterns in your training set. High values in the output image indicate a high likelihood that the pattern is located at that position.
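Since the squares are already segmented, a simplified variant may be enough: compare the candidate against each template at a single position instead of sliding it. The sketch below assumes all images have been resized to the same shape and uses a normalised correlation score; the 5x5 toy templates and function names are made up. OpenCV's matchTemplate does the sliding version of the same idea.

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation of two equally sized images (-1 to 1)."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return (a * b).sum() / denom if denom else 0.0

def best_template(candidate, templates):
    """Return the label of the best-scoring template and all scores."""
    scores = {label: ncc(candidate, t) for label, t in templates.items()}
    return max(scores, key=scores.get), scores

# Toy 5x5 templates (1 = ink)
T = np.array([[1, 1, 1, 1, 1]] + [[0, 0, 1, 0, 0]] * 4)
I = np.array([[0, 0, 1, 0, 0]] * 5)
L = np.array([[1, 0, 0, 0, 0]] * 4 + [[1, 1, 1, 1, 1]])
templates = {'T': T, 'I': I, 'L': L}

noisy_T = T.copy()
noisy_T[4, 4] = 1                  # one stray pixel

label, scores = best_template(noisy_T, templates)
print(label, scores)
```

A useful side effect is that a low best score tells you the square probably contains something that is not in your template set.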
5px by 5px - Blender 2012-04-06 04:22
This is a recognition problem. I'd personally use a combination of PCA and a machine learning technique (likely SVM). These are fairly large topics so I'm afraid I can't really elaborate too much, but here's the very basic process:
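The PCA part on its own could look roughly like the sketch below. This is an illustration under my own assumptions, not a reconstruction of the steps this answer listed: each image is flattened into one row of pixel values, random 5x5 arrays stand in for real letter images, and the function names are invented. The projected features would then go into whatever classifier you pick, e.g. an SVM.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Project flattened images onto the principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# 10 random stand-in "images", each 5x5 pixels flattened to 25 values
images = rng.integers(0, 2, size=(10, 25)).astype(float)

mean, components = fit_pca(images, n_components=3)
features = project(images, mean, components)
print(features.shape)
```

New images must be projected with the same mean and components that were fitted on the training set.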
I had a similar problem a few days back, but for digit recognition rather than letters.
I implemented a simple OCR for it using kNearestNeighbour in OpenCV.
Below is the link and code:
Simple Digit Recognition OCR in OpenCV-Python
Adapt it for letters. Hope it works.
You can try building a model by uploading your training data (~50 images of 1s,2s,3s....9s) to demo.nanonets.ai (free to use)
1) Upload your training data here:
2) Then query the API using the following (Python Code):
# Python 3 version of the query
import requests
from urllib.request import urlopen

model_name = "Enter-Your-Model-Name-Here"
# Image to classify
image_url = "http://images.clipartpanda.com/number-one-clipart-847-blue-number-one-clip-art.png"
files = {'uploadfile': urlopen(image_url).read()}
# Classification endpoint for your model
api_url = "http://demo.nanonets.ai/classify/?appId=" + model_name
r = requests.post(api_url, files=files)
print(r.json())
3) The response looks like:
{
  "message": "Model trained",
  "result": [
    {
      "label": "1",
      "probability": 0.95
    },
    {
      "label": "2",
      "probability": 0.01
    },
    ....
    {
      "label": "9",
      "probability": 0.005
    }
  ]
}
Since your images are coming off a computer screen of a board game, the variation can't be 'too crazy'. I just got something working for the same type of problem. I normalized my images by cropping right down to the 'core'.
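Cropping down to the core can be sketched as taking the bounding box of the ink pixels. The version below assumes a grayscale NumPy array with dark ink on a light background and a threshold of 128; flip the comparison if your letters are light on dark. The function name and the toy square are made up for the example.

```python
import numpy as np

def crop_to_core(img, ink_threshold=128):
    """Crop a grayscale image to the bounding box of its dark (ink) pixels."""
    img = np.asarray(img)
    ink = img < ink_threshold                   # assumption: ink is darker than background
    rows = np.flatnonzero(ink.any(axis=1))      # rows containing ink
    cols = np.flatnonzero(ink.any(axis=0))      # columns containing ink
    if rows.size == 0:
        return img                              # blank square: nothing to crop
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Toy 10x10 white square with a 4x2 black block standing in for a letter
square = np.full((10, 10), 255, dtype=np.uint8)
square[3:7, 4:6] = 0

core = crop_to_core(square)
print(core.shape)
```

After cropping, small shifts of the letter inside its square no longer change ratios computed on the image.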
With 5 samples per letter, you might already have complete coverage.
I organized my work by 'stamping' the identifier at the start of the image filename. I then could sort on the filename (=identifier). Windows Explorer allows you to view the directory with Medium Icons turned on. I would get the identifier by a 'fake-rename' action and copy it into the Python program.
Here is some working code that can be revamped for any of these problems.
import numpy as np

def getLetter(im):
    # im is assumed to be a binary PIL image (mode '1'), so each white pixel sums as 1
    area = im.height * im.width
    white_area = np.sum(np.array(im))
    black_area = area - white_area
    black_ratio = black_area / area  # between 0 and 1
    # Exact float comparison: only matches ratios seen in earlier samples
    if black_ratio == .740740740740740 or \
       black_ratio == .688034188034188 or \
       black_ratio == .7407407407407407:
        return 'A'
    if black_ratio == .797979797979798:
        return 'T'
    if black_ratio == .803030303030303:
        return 'I'
    if black_ratio == .5050505050505051 or \
       black_ratio == .5555555555555556:
        return 'H'
    ############ ... etc.
    return '@'  # when this comes out you have some more work to do
Note: It is possible that the same identifier (here we are using black_ratio) might point to more than one letter. If it happens, you'll need to take another attribute of the image to discriminate between them.
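One way to add a second attribute is to key the lookup on a pair of values and compare with a tolerance rather than exact equality. The sketch below is hypothetical throughout: the aspect ratio (width divided by height of the cropped letter) is my choice of second attribute, and the numbers in the table are invented, not measured.

```python
import math

# Hypothetical signatures: letter -> (black_ratio, aspect_ratio)
SIGNATURES = {
    'O': (0.60, 1.00),
    'Q': (0.60, 0.85),   # same black_ratio as 'O', different aspect ratio
    'I': (0.80, 0.30),
}

def lookup_letter(black_ratio, aspect_ratio, tol=0.01):
    """Return the letter whose two attributes both match within tol."""
    for letter, (ratio, aspect) in SIGNATURES.items():
        if (math.isclose(black_ratio, ratio, abs_tol=tol)
                and math.isclose(aspect_ratio, aspect, abs_tol=tol)):
            return letter
    return '@'   # unknown, same convention as above

print(lookup_letter(0.60, 1.00))
print(lookup_letter(0.601, 0.85))
```

The tolerance also means slightly different screenshots of the same letter no longer need their own entry.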