One of the most basic navigation strategies a robot can employ is to assume it is in a maze of some description, and to behave as though it needs to traverse one. This is a useful formulation because the robot need not know much about the area in which it finds itself, and can employ a simple navigational rule to explore its environment. Though this lack of global knowledge makes the robot less efficient at navigating, it allows it to adapt to changing routes and varied environments, which makes it a valuable tool.

When I talk about perception in a maze, I'm thinking of a three-sensor model: the robot detects walls to its left, right, and front, like so:

Each sensor can report either the presence or absence of a wall, resulting in 8 possible states. We're also going to define 3 actions the agent can take: turning to face left, turning to face right, or driving forward. Essentially, we're assuming the agent cannot strafe left and right. Now, using these 3 actions and 8 states, we can create a table of the action sequences we want the agent to take from each state, upon arriving in that state.
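As a quick sanity check in Python (the tuple encoding here is my own choice, not from the text), the 8 perceived states are simply the binary triples over the three sensors:

```python
from itertools import product

# Encode a perceived state as (left, front, right), with 1 = wall present.
SENSOR_STATES = list(product((0, 1), repeat=3))
assert len(SENSOR_STATES) == 8

# The three actions available to the agent.
ACTIONS = ("turn left", "drive forward", "turn right")
```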

Let's notice that we can assume the space behind the agent is open, since arriving in a new cell requires us to have driven from a previous one. Thus, any time we drive forward, the new state can be treated as a cell with no wall in the fourth position (behind the agent), and so when we determine what states will be produced by turns within a cell, we can work from the assumption that the space behind the agent is initially empty.

For instance, let's look at the following sequence of rotations within a cell:

In each phase, the upward direction represents the current orientation of the robot, and though it cannot 'see' behind itself, we keep track of what is there in order to see what states future movements will generate. In this case, we start in state 111, assuming we've just arrived here. Rotating to the right converts the perceived state to 110. Another right turn converts this to state 101.
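To make the bookkeeping concrete, here's a small Python sketch (the tuple layout is my own choice) of how a turn in place permutes the wall information, reproducing the 111, 110, 101 sequence above:

```python
def turn_right(state):
    """state = (left, front, right, behind); a right turn moves the old
    front wall to the left, the old right wall to the front, and so on."""
    left, front, right, behind = state
    return (front, right, behind, left)

def perceived(state):
    """The robot only senses the first three positions."""
    return state[:3]

state = (1, 1, 1, 0)        # just arrived: walls on three sides, open behind
print(perceived(state))     # (1, 1, 1)
state = turn_right(state)
print(perceived(state))     # (1, 1, 0)
state = turn_right(state)
print(perceived(state))     # (1, 0, 1)
```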

Now, the idea for this maze navigation algorithm is that it will follow a leftmost path (you could also do it rightmost, if you wanted), meaning that it should take every left turn it encounters. This assumes that the map of the space is open, which is to say, there are no loops within it that are independent of the wall against which the agent begins its routine. If there are such internal loops, then the agent will never visit certain cells, and thus will fail to reach certain destinations. A rough graph-theoretic definition of such mazes would be a graph in which no cycle contains a subgraph that is itself a cycle.

Working with the perceptual model we put forward earlier, what we need to do is determine actions to take from the various states that ensure we always take the leftmost path when the option is available. However, consider a segment of the maze like the following:

On the right, the red triangle represents the robot's starting position. Following our loosely defined rule, we want it to move into the next cell, turn left, and drive into the cell located one unit left and one unit above its current location, which seems simple enough. Let's say it knows to move forward in its current state, 101. This is simple enough to set up as a general rule: it can go neither left nor right here, and there's no sense in turning twice to go right back where we came from, given that there's unexplored territory just ahead.

So we move forward, into a new state 001. The natural action here is to turn left, to face the leftmost open hallway, so that's what we do:

And here is where it gets a little more complicated. The new observed state is 000, but if we'd just arrived in a 000 cell, we'd want to turn left, since it looks like there's a new leftmost path to take. Here, instead, we want to drive forward. So the action choice obviously depends not only on the current state, but also, to a certain degree, on what has come before.

We can parameterize this with an internal state variable, which we will call s. Essentially, when turns transform the space (which, recall, is predictable under our assumptions), we will create different action profiles for some states, selected by this internal state. This gives us a marker of whether we arrived in the current state via forward movement or via turning, allowing the agent to decide what action to take across a full sequence, rather than forgetting where it has just come from.

The desired sequences, which have been deduced by examination, are depicted below:

The top row represents the canonical states, and the rows in each column show the desired progression. In each successive state, a line is drawn from the actual state (with the hidden fourth wall) to the perceived state. The line is solid if the action prescribed for the canonical state is the one we'd want in the current actual state, and dotted if not.

So, for example, the turns in states 011 and 010 take us into perceived state 001. However, in canonical state 001 we'd want to turn left, while in the actual states that result from 011 and 010 we'd want to go forward, so a turn from 011 or 010 should modify the internal state. By contrast, all transitions starting from 111 agree with those prescribed for the perceived state, so we have no need of the internal state in those actions.

Writing out these combinations with the state s and the sensor readings i0, i1, i2 as the inputs, and the action a (from {0, 1, 2}, with 0 and 2 being left and right turns respectively, and 1 being drive forward) and new state s' as the outputs:

Where the Xs in the s column indicate that the internal state has no impact on the action taken.

You may notice the table is broken up into sections. These are convenient blocks sharing relationships that help in writing out a closed mathematical form of the rule, which is the next thing we're going to look at. Although the table is technically a complete representation of the function, it's usually nice to have a more compact form of a rule, and the divisions group convenient sections together to that end.

To start, we note that all cases in which i0 is 1 result in s' being 0. Additionally, in the remaining cases, s' is 1 whenever s is 0, and s' takes the value of i1 whenever s is 1. From these observations, we can write a very simple rule for s':

s' = (1 - i0)*(1 - s(1 - i1))

We can readily see that for i0 = 1 this will always return 0; otherwise, if s is 0 we get s' = 1, and if s is 1 we return i1.
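A quick brute-force check over all binary inputs confirms this reading of the update rule:

```python
def s_next(s, i0, i1):
    # The update rule for the internal state, transcribed from the text.
    return (1 - i0) * (1 - s * (1 - i1))

# Compare against the verbal description: 0 when i0 = 1; else 1 when s = 0;
# else i1.
for s in (0, 1):
    for i0 in (0, 1):
        for i1 in (0, 1):
            expected = 0 if i0 == 1 else (1 if s == 0 else i1)
            assert s_next(s, i0, i1) == expected
print("s' rule verified")
```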

Turning our attention to the action a, we can see that a is equal to s when i0 and i1 are both 0, and is 0 when i0 is 0 and i1 is 1. When i0 is 1, it's a little more complicated, but we can recognize that a is 2 when i1 and i2 are the same, and is equal to i2 when they are different. Using this collection of relations, we can, with a little contortion, write out an expression for the action:

a = (1 - i0)*(1 - i1)*(s) + (1 - i0)*(i1)*(0) + (i0)*[ (1-i1)*(1-i2)*(2) + (i1)*(i2)*(2) + (1-i1)*(i2)*(i2) + (i1)*(1-i2)*(i2) ]

... which by expansion and simplification becomes:

a = (1 - i0)*(s)*(1 - i1) + (i0)*(2 + 3i1i2 - 2i1 - i2)
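Since this kind of simplification is easy to fumble, a brute-force comparison over all eight sensor combinations (and both values of s) is a cheap way to confirm the two expressions agree:

```python
from itertools import product

def a_full(s, i0, i1, i2):
    # The long-form expression, term by term.
    return ((1 - i0) * (1 - i1) * s
            + (1 - i0) * i1 * 0
            + i0 * ((1 - i1) * (1 - i2) * 2
                    + i1 * i2 * 2
                    + (1 - i1) * i2 * i2
                    + i1 * (1 - i2) * i2))

def a_simplified(s, i0, i1, i2):
    # The expanded and simplified form.
    return (1 - i0) * s * (1 - i1) + i0 * (2 + 3 * i1 * i2 - 2 * i1 - i2)

assert all(a_full(s, *i) == a_simplified(s, *i)
           for s in (0, 1) for i in product((0, 1), repeat=3))
print("expressions agree on all 16 cases")
```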

This gives us our final policy in compact form:

s0 = 0

s' = (1 - i0)*(1 - s(1 - i1))

a = (1 - i0)*(s)*(1 - i1) + (i0)*(2 + 3i1i2 - 2i1 - i2)
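Putting the whole thing together, the compact policy translates directly into a few lines of Python. Assuming i0, i1, i2 correspond to the left, front, and right readings, the trace below reproduces the 101, 001, 000 sequence from the maze segment discussed earlier:

```python
def policy(s, i0, i1, i2):
    """One step of the compact policy: returns (action, new internal state)."""
    a = (1 - i0) * s * (1 - i1) + i0 * (2 + 3 * i1 * i2 - 2 * i1 - i2)
    s_new = (1 - i0) * (1 - s * (1 - i1))
    return a, s_new

s = 0                          # s0 = 0
a, s = policy(s, 1, 0, 1)      # corridor (101): drive forward
print(a)                       # 1
a, s = policy(s, 0, 0, 1)      # state 001: turn left, s flips to 1
print(a, s)                    # 0 1
a, s = policy(s, 0, 0, 0)      # perceiving 000 with s = 1: drive forward
print(a)                       # 1
```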

As a further extension, we can use this formula to generalize to continuous wall readings in lieu of a discrete threshold, by feeding ratios scaled 0-1 into the expression as the inputs i0, i1, and i2. In conjunction with this, we make the internal state continuous and its update an averaging procedure, so that it does not oscillate rapidly between states, and we convert the 0-2 action space into a proportional one, in which the rotation of each motor smoothly transitions from a full leftward turn, to forward motion, to a rightward turn as the action variable shifts from 0 to 2.
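A minimal sketch of that continuous extension might look like the following; the blending factor and the differential-drive mapping are my own illustrative choices, not prescribed by the text:

```python
def continuous_step(s, i0, i1, i2, blend=0.7):
    """Continuous variant: i0, i1, i2 are wall ratios in [0, 1], and the
    internal state is nudged toward its target instead of jumping."""
    a = (1 - i0) * s * (1 - i1) + i0 * (2 + 3 * i1 * i2 - 2 * i1 - i2)
    s_target = (1 - i0) * (1 - s * (1 - i1))
    s = blend * s + (1 - blend) * s_target     # averaging update
    return a, s

def motor_speeds(a):
    """Map a in [0, 2] to wheel speeds: 0 spins left, 1 drives straight,
    2 spins right."""
    a = max(0.0, min(2.0, a))                  # clamp, just in case
    left = min(a, 1.0)                         # left wheel ramps up over [0, 1]
    right = min(2.0 - a, 1.0)                  # right wheel ramps down over [1, 2]
    return left, right

print(motor_speeds(0.0))    # (0.0, 1.0) -> pure left turn
print(motor_speeds(1.0))    # (1.0, 1.0) -> straight ahead
print(motor_speeds(2.0))    # (1.0, 0.0) -> pure right turn
```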
