In the process of over analyzing a puzzle I encountered, I've fallen pretty deep into some graph theory and algorithmic analysis. I'm not going to talk about the puzzle here, because I'm in the process of writing a loooong treatment independent of this post, but the tiny puzzle (for which I took about 6 seconds to solve the original, no hyperbole- it's a real baby little one) has been an amazingly and unexpectedly rich starting point that's lead me to a remarkable number of little discoveries, one of which I'm going to put here, because I think it's kind of neat.

The focus is a graph. At one point, I asked myself when it was possible to find within a connected graph a bipartite subgraph, which is a graph that can be divided into two subsets of vertices, in which each set is connected to at least one element in the other set, and to no element within its own set. The principal motivation for this case was an observation about the puzzle space that implied that any graph that contained a solution contained a partition that split elements within the graph that were paired into two sets, which resembles in some ways a bipartite graph (the resemblance that inspired this investigation was mostly superficial, but inspiring I suppose).

The approach I took to this problem is the main reason I felt it would support an independent post. I figured to start with a tree (T), since trees are usually much more amenable to analysis than general graphs.

The focus is a graph. At one point, I asked myself when it was possible to find within a connected graph a bipartite subgraph, which is a graph that can be divided into two subsets of vertices, in which each set is connected to at least one element in the other set, and to no element within its own set. The principal motivation for this case was an observation about the puzzle space that implied that any graph that contained a solution contained a partition that split elements within the graph that were paired into two sets, which resembles in some ways a bipartite graph (the resemblance that inspired this investigation was mostly superficial, but inspiring I suppose).

The approach I took to this problem is the main reason I felt it would support an independent post. I figured to start with a tree (T), since trees are usually much more amenable to analysis than general graphs.

The question was then, how can you tell if a tree contains a connected bipartite subgraph? One way to try and figure this is to try and construct such a graph. This approach demonstrates one of the greatest utilities of trees in graph theoretic analysis: if you have a tree, you cannot remove any edges and still have a connected graph. So, by looking at a tree, we know that all edges must be included in out constructed bipartite graph, and thus the tree itself is a bipartite graph, as is.

We can proceed by looking at our two partitions, call them A and B. Now, w.l.g, we can let some vertex in the tree be an element of A (we can say this freely, because it's going to be in A or B, and the label doesn't matter). Once this vertex (call it c) is in A, we can deduce that all vertices connected to it must be in B, since nothing in A can share a vertex with c. At this point, we have c in A, and all vertices of distance 1 to c: d(c,vi) = 1, in B. Next, we know that we'll have all vertices connected to the newly added ones in B must go in A. We have to ask if this leads to a contradiction- can any of them be adjacent to c? Since we're looking at a tree (T), this cannot be so, as it would mean that we can travel from c to an element of B, from that element to another in A which is not c, and then from there to c. This would imply the existence of a loop in T, which is a contradiction, so nothing we just added to A can be connected to c, nor, by the same reasoning, to anything else added at this step.

We can continue this process, with each next step adding the vertices adjacent to the most recently allocated set which have not already been assigned, to the opposite set of the one most recently added to. In this way, we are placing, from T, elements which have d(c,v) = 2n in set A, and those of set d(c,v) = 2n+1 into B. Because there can be no loops within T, and the distance between any two elements in either set is even (eg. from a d = 2 to a d = 6 in A, or from d = 1 to d = 5 in B), we may not have distance 1 between any elements within the same set.

We can proceed by looking at our two partitions, call them A and B. Now, w.l.g, we can let some vertex in the tree be an element of A (we can say this freely, because it's going to be in A or B, and the label doesn't matter). Once this vertex (call it c) is in A, we can deduce that all vertices connected to it must be in B, since nothing in A can share a vertex with c. At this point, we have c in A, and all vertices of distance 1 to c: d(c,vi) = 1, in B. Next, we know that we'll have all vertices connected to the newly added ones in B must go in A. We have to ask if this leads to a contradiction- can any of them be adjacent to c? Since we're looking at a tree (T), this cannot be so, as it would mean that we can travel from c to an element of B, from that element to another in A which is not c, and then from there to c. This would imply the existence of a loop in T, which is a contradiction, so nothing we just added to A can be connected to c, nor, by the same reasoning, to anything else added at this step.

We can continue this process, with each next step adding the vertices adjacent to the most recently allocated set which have not already been assigned, to the opposite set of the one most recently added to. In this way, we are placing, from T, elements which have d(c,v) = 2n in set A, and those of set d(c,v) = 2n+1 into B. Because there can be no loops within T, and the distance between any two elements in either set is even (eg. from a d = 2 to a d = 6 in A, or from d = 1 to d = 5 in B), we may not have distance 1 between any elements within the same set.

Therefore T has been arranged into a connected bipartite graph. That is to say,

Now, since every tree can be partitioned into a connected bipartite graph (I'm getting sick of typing this- it's a CBG now), we know that if we can find a spanning subgraph of G which is a tree, G has a subgraph which is a CBG. But, if G is connected, then we can always remove edges which do not disconnect the graph until there are |G|-1 edges in G. More clearly, every connected graph G contains a connected subgraph which is a tree (a spanning tree), and that tree, as per the above, is a CBG. Thus, we can conclude that every graph contains a CBG. Furthermore, finding such a graph is only as computationally complex as identifying a spanning tree within G (linear time in the number of vertices: O(n)), picking a starting vertex (constant time), and sorting its tree neighbors into sets (linear time in the number of edges O(m)); which is to say, very very fast.

*every*tree is a bipartite graph, because there were no assumptions about T or c made, aside from T being a tree, and c being an element of T.Now, since every tree can be partitioned into a connected bipartite graph (I'm getting sick of typing this- it's a CBG now), we know that if we can find a spanning subgraph of G which is a tree, G has a subgraph which is a CBG. But, if G is connected, then we can always remove edges which do not disconnect the graph until there are |G|-1 edges in G. More clearly, every connected graph G contains a connected subgraph which is a tree (a spanning tree), and that tree, as per the above, is a CBG. Thus, we can conclude that every graph contains a CBG. Furthermore, finding such a graph is only as computationally complex as identifying a spanning tree within G (linear time in the number of vertices: O(n)), picking a starting vertex (constant time), and sorting its tree neighbors into sets (linear time in the number of edges O(m)); which is to say, very very fast.