__Experiment 1: XOR simulations__

Of course, we will be building up to the exciting stuff; we start with a somewhat boring experiment: the XOR problems. The XOR problem set is a useful benchmark for machine learning algorithms. The problem space consists of a simulation wherein the agent attempts to learn to mimic the behavior of (at least) an XOR logic circuit (but possibly more elements). The main notion behind this test is that the mapping of inputs onto the output of an XOR gate is a linearly inseparable problem: one cannot draw a line in the input space which partitions the space into correctly classified regions (see below). This is important because the ability to cope with linear inseparability is crucial to effective machine learning algorithms; many, if not most, interesting problems have components which defy linear classification. Of course, just because an algorithm can learn an XOR simulation does not mean it can solve nonlinear problems in general, but if it fails on this simple case, there is certainly an inadequacy somewhere.
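
The inseparability claim can be checked mechanically. The sketch below (purely illustrative, not part of the experiment code) brute-forces a grid of candidate lines of the form w1*x + w2*y + bias = 0 and confirms that none of them separates the XOR outputs:

```python
# Illustrative check: search a grid of linear classifiers for one that
# separates the XOR truth table. None exists, so the search comes up empty.
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, bias):
    """True if w1*a + w2*b + bias > 0 matches the XOR output for all four points."""
    return all((w1 * a + w2 * b + bias > 0) == bool(out)
               for (a, b), out in XOR.items())

# Scan a coarse grid of candidate lines; no combination will work.
grid = [x / 4 for x in range(-8, 9)]
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False: no linear separator exists for XOR
```

The grid here is coarse, but that is enough to make the point; the impossibility holds for any choice of line.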

Additionally, it is fairly simple to increase the complexity of a logic simulation test by adding outputs corresponding to other logic functions. In order to preserve the known inseparability, we prefer to retain the original XOR output and add additional outputs without modifying the first function.
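
As a concrete illustration, the expanded circuit could be modeled as a function returning both outputs; the name `xor2` is ours, not from the experiment code, and the second output is the inverted XOR described below:

```python
def xor2(a, b):
    """Sketch of a 2XOR circuit: the original XOR output, unchanged to
    preserve the known inseparability, plus an inverted (!XOR) output."""
    x = a ^ b
    return (x, 1 - x)

# Truth table for the expanded circuit
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor2(a, b))
```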

Below we have two diagrams. On the left is an illustration of the input/output relationship of the XOR logic circuits that will be simulated: first the simple XOR case, and below it an expanded 2XOR circuit, which contains an XOR and an inverted XOR output. On the right is a graphical depiction of the output of an XOR gate as a function of its two inputs, blue representing a 0 output and red representing a 1. One can see that it is not possible to draw a line which exactly divides the red dots from the blue (a line running through either color does not count in this context, nor does a surface in higher dimensions; producing that surface is actually kind of the point).

The experiment itself is fairly straightforward in this case. We select various inputs, which are binary vectors of the form <b1,b2>. These inputs are fed into the agent, and its output recorded. Additionally, we record the output of the logic circuit for those inputs. If the agent's output matches that of the circuit, the agent receives positive reinforcement; otherwise, negative. We continue this process until observing a certain number of consecutive correct outputs (50 or so, though we truncate the plotted data before this consistent rise for visual cleanliness).
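
The loop just described can be sketched roughly as follows. The actual Murin and QL implementations are not shown here, so a trivial lookup-table learner stands in for the agent interface; all names and the stand-in's update rule are illustrative assumptions, not the experiment's code:

```python
import random

def run_trial(agent_act, agent_train, circuit, streak_target=50):
    """Sample inputs, compare the agent's output to the circuit's, deliver
    +/-1 reinforcement, and stop after streak_target consecutive matches."""
    history, streak, steps = [], 0, 0
    while streak < streak_target:
        b1, b2 = random.randint(0, 1), random.randint(0, 1)
        guess = agent_act(b1, b2)
        reward = 1 if guess == circuit(b1, b2) else -1
        agent_train(b1, b2, guess, reward)
        streak = streak + 1 if reward > 0 else 0
        history.append(reward)
        steps += 1
    return steps, history

# Stand-in "agent": a table of guesses, flipped on negative reinforcement.
table = {}
def act(b1, b2):
    return table.get((b1, b2), random.randint(0, 1))
def train(b1, b2, guess, reward):
    table[(b1, b2)] = guess if reward > 0 else 1 - guess

steps, history = run_trial(act, train, lambda a, b: a ^ b)
```

The point of the sketch is the protocol (reinforce on match, stop on a long correct streak), not the stand-in learner, which solves XOR trivially because it memorizes each input pair.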

As a control for this experiment, we also train a standard Q Learning implementation. The expectation is that, on such a simple problem, the added complexity in the effective input to the Murin agent's RL algorithm (due to the concatenation of state with action) will slow its learning relative to the QL baseline. In point of fact, this is exactly what we observe:

The graph above shows learning curves for QL (in blue) and Murin (in yellow). As predicted, the Murin agent learns more slowly than QL, reaching its meteoric rise at 2-3 times the run length (depending on where you measure from) that the QL agent takes to learn the problem. Notable is the confirmation that both agents do indeed demonstrate learning, handling the linear inseparability, and that the Murin agent does in fact learn within a time frame comparable to the QL baseline.

Next, we apply the same experimental procedure to the 2XOR expanded problem mentioned above, with the intent to examine the behavior of the agents as the complexity of the problem at hand increases. In this case, we've added a !XOR gate to the simulated system, but other than the additional output, all experiment features remain the same:

This graph shows the result: QL in blue, Murin in orange. In this case, the behavior is far more interesting; we have clear peaks and troughs in learning, where the agents make mistakes and readjust. Though the data aesthetics are fascinating, far more interesting is the reduced performance gap between the two, indicating that, as complexity increases, the Murin agent can possibly take advantage of other pattern information. The hypothesis is that the temporal elements introduced to the design naturally lead to this additional temporal advantage.

That is, admittedly, a pretty bold claim. However, to demonstrate learning of this type, we have one additional test within the XOR domain. In this final test, we compare the learning performance of the agents under differing sampling policies: in one case, with randomly delivered inputs on which to train, and in the second case, with ordered delivery, the inputs generated simply by counting up (mod 4, since we have two binary inputs).
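
The two delivery policies are simple enough to state in code. This sketch (function names are ours) generates both input schedules; the ordered schedule simply cycles 00, 01, 10, 11:

```python
import random

def random_delivery(n, seed=0):
    """Inputs sampled uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n)]

def ordered_delivery(n):
    """Inputs generated by counting up mod 4: 00, 01, 10, 11, 00, ..."""
    return [((i % 4) >> 1, (i % 4) & 1) for i in range(n)]

print(ordered_delivery(6))  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1)]
```

The ordered schedule carries an obvious temporal correlation (each input determines the next), which is exactly the pattern the test probes for.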

The first half of this test is performed with the QL agent, performance curves shown below (note that in this case, we are displaying learning as window average of percent correct, so the curves level out at 100%):
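
For reference, the windowed percent-correct metric can be computed as below. The window length here is a placeholder, since the text does not specify the one used:

```python
def windowed_accuracy(correct, window=20):
    """Rolling percent-correct over a trailing window of outcomes
    (1 = correct, 0 = incorrect); levels out at 100 once mistakes stop."""
    out = []
    for i in range(len(correct)):
        lo = max(0, i - window + 1)
        chunk = correct[lo:i + 1]
        out.append(100.0 * sum(chunk) / len(chunk))
    return out

curve = windowed_accuracy([0, 0, 1, 1, 1, 1], window=3)
print(curve[-1])  # 100.0 once the trailing window is all correct
```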

The first graph shows the QL agent learning on randomly sampled data, the second on ordered delivery, and the last image is the superimposition of the two. It is clear that there is no notable difference between the two delivery methods, and so we conclude that the agent is not sensitive to temporal patterns. (Note that this is not necessarily true of Q Learning in general: in problems where training occurs at the end of an epoch, the exact path through parameter space can be influenced by ordering, since the agent modifies its own future decisions via training; if it observes a solution early, positive feedback may restrict it to that solution and limit exploration.)

Now, we test the Murin agent using the same protocol:

As before, the first image illustrates performance when data are sampled randomly, the second for the ordered data, and the final a superposition of the two. We can see a definite difference between the two trials: not only are the two curves different, but in the test in which the data bear temporal correlations, the Murin agent learns *faster*, lending support to the hypothesis that the elements introduced allow it to take advantage of temporal patterns which the QL agent appears to ignore.

So, to conclude this section, we have demonstrated that the Murin agent is capable of learning on problems which display linear inseparability, and that it learns at a rate comparable in scale to QL on the same problems. We also observed evidence that increases in complexity can reduce the performance gap between the Murin agent and the QL agent across related classes of problems. Building on the hypothesis that the Murin agent is able to take advantage of patterns that the QL agent is restricted from, we subjected both to a situation in which we could inject additional patterns, and observed improvements in Murin's performance on the patterned case, with no comparable improvement for the QL agent under the same circumstances; this lends credence to the temporal linking hypothesis.