## Shape Detection and Identification

As human beings reading right now (bots trawling this page don't count), you exercise an amazing faculty just getting the letters into your brain and recognizing them, as task that is still, decades into the work, an open problem in the field! It is amazing, especially considering how we (people in general and scientists in particular) don't really know how our brains pick up the letters out of the images in the first place. If you ask a neuroscientist how vision works, you'll get a lovely explanation of the lensing properties of the eye, and how the proteins in optic nerves are light sensitive and the electrical potential gradient set up by potassium and sodium and how photon strikes cause the proteins to fold differently which opens channels that allow those ions to flow which generates a current, which triggers more neurons to fire until the signal reaches the brain and then... some... other stuff happens... the visual cortex is like a 3/10 of the cortex, they finish, not so triumphantly. And that's even not mentioning the part where you turn the letters into words and the words into ideas! But let's not get into that right now, I want to talk about kindergarten problems.

So what I want to illustrate there is that the input mechanism- eyes, optic nerves, and such- are pretty well understood. In fact, doctors are just about able to build artificial eyes (http://en.wikipedia.org/wiki/Argus_retinal_prosthesis, e.g). But the stuff that turns sight into vision is a little fuzzy. Similarly, optics and camera manufacture are very mature sciences, but computer vision is still a hot topic. The basic question, perhaps the most fundamental process of vision- recognition, is still pretty wishy-washy. There are dozens, perhaps hundreds, of algorithms that are designed to learn how to detect certain objects, based largely on interpretation of many small abstract metrics for portions of an image. Though there are many options for 'training' the various and sundry identifiers out there, they are usually highly involved, slow, and too abstract for my taste. Being a pragmatic person, I prefer a more constructive approach most of the time. This is where the problem a toddler can solve comes in.

Take a look at this picture:

So what I want to illustrate there is that the input mechanism- eyes, optic nerves, and such- are pretty well understood. In fact, doctors are just about able to build artificial eyes (http://en.wikipedia.org/wiki/Argus_retinal_prosthesis, e.g). But the stuff that turns sight into vision is a little fuzzy. Similarly, optics and camera manufacture are very mature sciences, but computer vision is still a hot topic. The basic question, perhaps the most fundamental process of vision- recognition, is still pretty wishy-washy. There are dozens, perhaps hundreds, of algorithms that are designed to learn how to detect certain objects, based largely on interpretation of many small abstract metrics for portions of an image. Though there are many options for 'training' the various and sundry identifiers out there, they are usually highly involved, slow, and too abstract for my taste. Being a pragmatic person, I prefer a more constructive approach most of the time. This is where the problem a toddler can solve comes in.

Take a look at this picture:

I imagine that you almost instantly knew that there was a blue triangle, a green rectangle, and a red rectangle in that picture. So easy, a toddler can do it. Really: https://www.youtube.com/watch?v=44ShKW_j3p0. But then we ask how. How did you identify the blue shape as a triangle? If you say you saw it and you knew, well... that's fine, I guess, but since we want a machine to be able to do it, that isn't very helpful.

Maybe instead you could say you compared it to all triangles you've ever seen, and could tell it was really similar to that. Ok, maybe that's a little better, but then you had to scan through all your memories of triangles, determine what similarity means, do the comparisons; and even after all that, you can say, the blue thing is something like 93% in agreement with triangles... but how rectangular is it? Is it more rectangularish than triangularish? Seems like it's too complicated. Well, maybe instead of actually comparing all triangles you know (which also comes with the problem that you need to know a bunch of triangles first), you just come up with some abstract measure of triangleness and rectangleness and use those to judge the type of shape. There are many ways to do this, some even automatically. In terms of simplicity, practicality, and clarity, I like to develop algorithms which are not totally abstracted.

Typically this kind of thinking starts by asking about definitions. We ask ourselves, "How do we define a triangle? As opposed to a rectangle, or a pentagon?". By looking closely at the defining properties, we can hopefully determine exactly how we extract the information we need from the image and, if we're lucky, turn that knowledge into an algorithm.

Fortunately, this is pretty simple for basic shapes: A shape is defined by the number of edges and/or corners it has. For now, we neglect funky shapes that have curved edges or inward bends. Just the basics for now. That bit of information gives the basic outline of the strategy- count the number of corners a shape has. Now, our problem is broken down into two parts: 1- find the corners on a shape; and 2- count them. The second part is so easy, we aren't even going to talk about it again. The first is a little more tricky. If you think about it, identifying the corners can also be broken into two parts- selecting out the shape itself, and picking out the corners. It's important to select out a single shape. If, for instance, we had some method of identifying corners arbitrarily in an image (hint: we will), we'd need to know which ones belonged to the triangle and which to the rectangle.

Although there are a number of options for picking out regions of an image (the process of which is often called image segmentation) based on any number of differing criteria, we will be using blob detection in this case. Blob detection basically runs through an image and finds the boundaries of different areas, based on image intensity differences, which is to say, the image is converted to gray scale and then the value of each pixel is used. Using these boundaries, the inside areas of different regions are labeled, allowing individual examination of each. This is what the shapes image looks like after blob segmentation:

Maybe instead you could say you compared it to all triangles you've ever seen, and could tell it was really similar to that. Ok, maybe that's a little better, but then you had to scan through all your memories of triangles, determine what similarity means, do the comparisons; and even after all that, you can say, the blue thing is something like 93% in agreement with triangles... but how rectangular is it? Is it more rectangularish than triangularish? Seems like it's too complicated. Well, maybe instead of actually comparing all triangles you know (which also comes with the problem that you need to know a bunch of triangles first), you just come up with some abstract measure of triangleness and rectangleness and use those to judge the type of shape. There are many ways to do this, some even automatically. In terms of simplicity, practicality, and clarity, I like to develop algorithms which are not totally abstracted.

Typically this kind of thinking starts by asking about definitions. We ask ourselves, "How do we define a triangle? As opposed to a rectangle, or a pentagon?". By looking closely at the defining properties, we can hopefully determine exactly how we extract the information we need from the image and, if we're lucky, turn that knowledge into an algorithm.

Fortunately, this is pretty simple for basic shapes: A shape is defined by the number of edges and/or corners it has. For now, we neglect funky shapes that have curved edges or inward bends. Just the basics for now. That bit of information gives the basic outline of the strategy- count the number of corners a shape has. Now, our problem is broken down into two parts: 1- find the corners on a shape; and 2- count them. The second part is so easy, we aren't even going to talk about it again. The first is a little more tricky. If you think about it, identifying the corners can also be broken into two parts- selecting out the shape itself, and picking out the corners. It's important to select out a single shape. If, for instance, we had some method of identifying corners arbitrarily in an image (hint: we will), we'd need to know which ones belonged to the triangle and which to the rectangle.

Although there are a number of options for picking out regions of an image (the process of which is often called image segmentation) based on any number of differing criteria, we will be using blob detection in this case. Blob detection basically runs through an image and finds the boundaries of different areas, based on image intensity differences, which is to say, the image is converted to gray scale and then the value of each pixel is used. Using these boundaries, the inside areas of different regions are labeled, allowing individual examination of each. This is what the shapes image looks like after blob segmentation:

Different shades of gray represent different labels. Now, since each shape is labeled individually, we can look at each one without having to worry about interference from the others.

Now we can turn our attention to finding the corners. Luckily, there is already a good tool available for this, appropriately called corner detection. Through the use of a number of different algorithms, the most common being the Shi-Tomasi method. Again, I won't bore you with the details, but the idea is that the algorithm will report a selection of possible 'corners'. This candidate list will not always exclusively contain the actual corners, as you can see in the image below- candidate points are marked in red, the centers of individual shapes are purple:

Now we can turn our attention to finding the corners. Luckily, there is already a good tool available for this, appropriately called corner detection. Through the use of a number of different algorithms, the most common being the Shi-Tomasi method. Again, I won't bore you with the details, but the idea is that the algorithm will report a selection of possible 'corners'. This candidate list will not always exclusively contain the actual corners, as you can see in the image below- candidate points are marked in red, the centers of individual shapes are purple:

Here, the rectangles' corners were both precisely selected, but the triangle has just a few extra points marked (about 22 extra, actually). And this is a bit of a canned image, one that was a photograph of some shapes might be even more off-kilter! But, we do notice that all the right points are marked, so really, we just want to prune the set down to the correct values. The question is How?

To start, notice how all the corners were detected on an edge- a line bounding the outside of the shape from the inside. Also, all the points we would like to get rid of are not on corners (obviously- but mathematical thinking requires us to be sure of simple facts, you see). That means that all the non-valid candidates must lie between two valid points, since they are on an edge which must be bounded by corners. We can therefore eliminate invalid candidates by asking whether they are collinear (as in, on the same straight line) with two other points, and in between those two points. For instance:

To start, notice how all the corners were detected on an edge- a line bounding the outside of the shape from the inside. Also, all the points we would like to get rid of are not on corners (obviously- but mathematical thinking requires us to be sure of simple facts, you see). That means that all the non-valid candidates must lie between two valid points, since they are on an edge which must be bounded by corners. We can therefore eliminate invalid candidates by asking whether they are collinear (as in, on the same straight line) with two other points, and in between those two points. For instance:

The green point is an invalid candidate for a triangle (I've made it a little off-line because real world false positives won't play as nice as our canned image). We will compare it to the other points and check whether it is collinear with them:

So here we've drawn the lines between the candidate (in the green pants) and all the comparisons (in red dresses). How can we use these lines to check collinearity? The trick is to use the angles between pairs of lines. See, when two line segments are precisely in line (as below, #1) the angle formed is 180* in measure. Other orientations form different angles (as below, any but #1):

Again, we are lucky, because there is an easy way to determine the angle between two line segments mathematically using dot products, which are a vector operation. Once more, your don't need to worry over the details, just know we can do it. Now, what we do is look at all the pairs of lines for each candidate point, and if the angle is close to 180*, we can throw out the point. We say 'close to' because, as I mentioned earlier, the real world is cruel and will not always put false points right on the connecting lines. So- the comparisons:

Now, this last one is interesting. If I had to eyeball it, I'd say it was about 165*. Is that close enough to 180* to qualify as invalid? Well, that's a place where we get to some trouble. The definition of 'closeness' is pretty vague, and it is difficult to put into an algorithmic context. For instance, the shape above is probably a triangle, but what if it's actually a wonky four sided thing? This is where we, as scientists, pawn off the responsibility of drawing those fine distinctions on the end user via what we call a 'tuning parameter'. This is a variable, set on a case-by-case basis, which tells the algorithm how off-triangular a triangle can get before we want to think of it as a funky rectangle. In any case, I'm deciding my triangles can be about 20* off line before they become weird rectangles, so green pants has to leave the party, at which point (assuming it is the last invalid candidate in the list to check) we're left with the three red-dressed candidates, which are pretty easily seen as very non-colinear, and therefor a three-cornered shape: a triangle.

We can apply this method to other shapes, and....

...voila! Shapes. Since we separated each shape from the others, the algorithm will report out a list of 'good' corners, the number of which represents the type of shape.

It looks here like we're done here! Nope. So it's good that it works on this MSpaint drawing, but nothing works if it only works in simulation. So, let's see it working in the real world, yeah? Well, I set up some cardboard shapes with colored tape on them, took photos, and processed them with the same algorithm! Except with some added stuff to restrict the field of view to the area with the shapes, maybe we'll talk about that later... probably not, it's much more boring than this stuff. Anyway, here are the end results:

It looks here like we're done here! Nope. So it's good that it works on this MSpaint drawing, but nothing works if it only works in simulation. So, let's see it working in the real world, yeah? Well, I set up some cardboard shapes with colored tape on them, took photos, and processed them with the same algorithm! Except with some added stuff to restrict the field of view to the area with the shapes, maybe we'll talk about that later... probably not, it's much more boring than this stuff. Anyway, here are the end results: