CS290I Lecture notes -- Ramsey Numbers: Top Hits Played at Parties Thrown by Paul Erdos

Rich Wolski
Lecture notes: http://www.cs.ucsb.edu/~rich/class/cs290I-grid/Ramsey/index.html
The Ramsey Number Fan Club: http://mathworld.wolfram.com/RamseyNumber.html
The EveryWare paper describing how we've gone about this in the past.

You may not think about it much, but like most demanding disciplines, Math and Science have their share of heroes. During the 20th century, Paul Erdos (pronounced "err-dosh") was clearly one of them. He was an extremely prolific researcher. So prolific, in fact, that they invented a metric which describes the publishing relationship that the math community as a whole has with him: The Erdos Number. If you have published a paper with Paul, your Erdos number is 1. If you have published a paper with someone who has published a paper with Paul, your Erdos number is 2, and so on. Paul also had no home. Everything he owned, he carried in a single suitcase. He would arrive at the home of some mathematician he admired, and simply move in for two or three months. Then he'd leave and move on. It was considered a tremendous badge of recognition to be visited by Erdos even though, by all accounts, he was a terrible house guest. Toward the end of his life, he spent a great deal of time staying with Ron Graham while he was in North America.

Paul lived for mathematics.

While he was a genius and had a tremendous capacity for collaborating (he'd literally flit from problem to problem "helping" other mathematicians with their work), he had a knack for posing extremely simple problems that defy mathematical solution. The purpose of this lecture is to introduce you to one of them called The Party Problem.

We'll begin by illustrating The Party Problem with an example. Imagine that you are throwing a party.

What is the smallest number of people you can invite such that either there must be a group of three people at the party, all of whom know each other, or there must be a group of three people at the party, all of whom are complete strangers?

You have to get your head around what the question is actually asking before we can go further. Say you invite 15 people and you can invite anyone from the world's population. Surely you can invite 15 strangers or 15 mutual friends since you are free to pick anyone. The question is,

Is it possible to invite a set of people so that there is NOT a group of three complete strangers and there is NOT a group of three mutual friends?

If you can make the invitations such that both of these NOTs are true, then you know that the smallest number referred to in the question is at least 15.

More formally, the question can be stated thus:

Find the minimum number of guests that must be invited so that at least m will know each other or at least n will not know each other. The solutions are known as Ramsey numbers. This definition is due to Wolfram Research. In this class, we will study only symmetric Ramsey numbers: where m == n and in the example, both are equal to 3.

You can and should bend your brain around the English language statements of the problem, which sound to me like those brain teaser puzzles in Reader's Digest, or you can resort to graph theory for a picture. Let's define a node to be a person, and an edge to represent the property of acquaintance between two people represented by the nodes at its ends. Notice that, for any group of people (nodes) there are edges connecting them pairwise since two people either know each other or do not know each other. That is, the graph is fully connected. We'll further stipulate that the relationship is symmetric: if I know you, you know me. The graph is, therefore, undirected. Another term for such a graph is clique.

Further, we can color the edges so that they signify the binary acquaintance relationship. We'll let red indicate that the people (nodes) at the ends of an edge are strangers, and green indicate that they are acquainted. The party question for these five nodes can be restated as

Is it possible to color the edges of this 5-clique such that there is no subclique of size 3 in which all of the edges in the subclique are the same color?

If the answer to this question is "yes" then you know that the 3rd symmetric Ramsey number R(3,3) is bigger than 5. Why? Because if such a coloring exists, then it is conceivable that there are 5 people who could be invited, there would be no group of 3 complete strangers (which would be signified by a red subclique of size 3) and no group of 3 mutual friends (which would be signified by a green subclique of size 3). Therefore, the smallest number for which one or the other must be true must be bigger than 5.

On the LHS of this figure, a coloring like the one described above is depicted. Notice that it does not contain a triangle (also called a 3-clique) that is all one color. As such, R(3,3) must be bigger than 5.
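If you would rather check such a coloring by machine than by eye, here is a minimal sketch in C. The coloring it tests is an assumption on my part (the figure's coloring may differ): the five "pentagon" edges between neighboring nodes are green and the five "pentagram" diagonals are red. The names PentagonColor and PentagonMonoTriangles are made up for this illustration.

```c
#include <assert.h>

#define N5 5

/* Color of edge (a,b) in an assumed coloring of the 5-clique:
 * edges between neighbors on the pentagon (difference 1 or 4 mod 5)
 * are green (1); the pentagram diagonals are red (0). */
static int PentagonColor(int a, int b)
{
    int d = (a - b + N5) % N5;

    assert(a != b);     /* nodes have no edge to themselves */
    return (d == 1 || d == 4) ? 1 : 0;
}

/* Count the triangles whose three edges all have the same color. */
int PentagonMonoTriangles(void)
{
    int i, j, k;
    int count = 0;

    for (i = 0; i < N5; i++)
        for (j = i + 1; j < N5; j++)
            for (k = j + 1; k < N5; k++)
                if (PentagonColor(i, j) == PentagonColor(i, k) &&
                    PentagonColor(i, j) == PentagonColor(j, k))
                    count++;

    return count;
}
```

Calling PentagonMonoTriangles() should return 0 for this coloring, since both the green edges and the red edges form a 5-cycle, and a 5-cycle contains no triangle.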

Now let's ask the same question about 6. Each node has five edges leaving it and there are only two colors, so at least three of the edges originating at any node must be the same color (red or green). Pick a node (1 in the example) and a color for those three edges (say green). Consider the colors of the dotted edges in the figure, which connect the three nodes at the far ends of the green edges. If any one of them is also green, then it forms a green triangle with two of the green edges (the solid lines we said were green). If none of them is green, then they all must be red, forming a red triangle. It is, therefore, impossible to color the edges of a complete graph on six nodes without introducing a monochromatic 3-clique. Since it is possible on 5, and not possible on 6, R(3,3) is 6.
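The pigeonhole argument is the real proof, but the case is small enough to confirm by brute force. The sketch below is not how anyone attacks larger Ramsey numbers; it simply tries every 2-coloring of the complete graph on n nodes (2^15 = 32768 colorings for n = 6) and reports whether each one contains a single-color triangle. The function names and the bit encoding of a coloring are my own choices for illustration.

```c
#include <assert.h>

/* Return 1 if the coloring encoded in bits (bit e is the color of edge e)
 * of the complete graph on n nodes has a single-color triangle. */
static int HasMonoTriangle(unsigned long bits, int n)
{
    int idx[8][8];      /* idx[i][j] = bit position of edge (i,j), i < j */
    int i, j, k;
    int e = 0;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            idx[i][j] = e++;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            for (k = j + 1; k < n; k++) {
                unsigned long a = (bits >> idx[i][j]) & 1UL;
                unsigned long b = (bits >> idx[i][k]) & 1UL;
                unsigned long c = (bits >> idx[j][k]) & 1UL;
                if (a == b && a == c)
                    return 1;
            }

    return 0;
}

/* Return 1 if every 2-coloring of the complete graph on n nodes contains
 * a single-color triangle, 0 if at least one coloring avoids them all. */
int EveryColoringHasMonoTriangle(int n)
{
    unsigned long total;
    unsigned long bits;

    assert(n >= 3 && n <= 7);   /* 7 nodes means 21 edges, 2^21 colorings */
    total = 1UL << (n * (n - 1) / 2);
    for (bits = 0; bits < total; bits++)
        if (!HasMonoTriangle(bits, n))
            return 0;

    return 1;
}
```

EveryColoringHasMonoTriangle(5) should return 0 and EveryColoringHasMonoTriangle(6) should return 1, which is just another way of saying R(3,3) is 6. Note how quickly this stops being practical: the number of colorings doubles with every edge you add.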

What this means is that if you throw a party for six people and you can invite anyone, from any time, who has ever lived, on any planet, there must be a group of three complete strangers or there must be a group of three mutual acquaintances or both. Neat, huh?

It Gets Hard Fast

While the problem statement is relatively simple and the proof of R(3,3)=6 straightforward, it is a problem in graduate-level combinatorics to prove that R(4,4) is 18. Anecdotally, two very capable Ph.D.-level mathematician friends of mine decided to work out the proof for R(4,4) as a twisted form of entertainment. It took about 6 hours without the help of a combinatorics book on the subject.

R(5,5) is currently unknown as is R(k,k) for any integer value of k greater than 4. In Math terms, this situation is called an "open problem." You should think of it as a really open problem. When Math knows the answer for 3 (it is easy), 4 (it is much harder) and does not know 5 or greater (but knows that the number must exist -- unlike Fermat's Last Theorem) you know you have just stepped into the deep end of the pool. Indeed in the book "The Man Who Loved Only Numbers," (a book about Paul Erdos) many of the mathematicians interviewed said that they believed a new kind of mathematics (as yet unseen) was necessary to attack this problem successfully. One even claimed that it would be 50 years at least, before it would come to fruition.

The Current State of Things

The current Ramsey score for various colors and subclique sizes is given on this lovely page by Eric Weisstein and Wolfram Research. There are a couple of generalizations to the symmetric problem that people have worked on as well. The first is to consider asymmetric colorings using only two colors. The number R(m,n) refers to having a subclique of size m of one color and/or one of size n of another. The other generalization is to add colors. By far the coolest Ramsey numbers, though, are the symmetric ones.

At present, R(5,5) is known to be between 43 and 49. I do not know how those bounds were determined, but knowing it to be one of 7 numbers and not being able to say which one is a remarkable situation in mathematics. In 1997, two friends of mine and I generated counter examples for all sizes up through 42 for R(5,5) as a diversion at SC97. Some day, we'll get serious about it. Some day soon.

I was born a coal miner's daughter

So what does this have to do with you and this class? For the purposes of illustration, we will consider counter-examples for R(5,5) of size 42 to be rare objects that must be mined from the combinatorial earth. Your job will be to use the cloud infrastructures available to you to try and find as many of these objects as you can. Each one you will deposit in a bank that will register you as the owner. The bank will check to make sure no other depositor has already claimed your counter example and, similarly, prevent others from claiming the ones you have found.

Thus the bank will generate a global timestamp for computationally hard proofs-of-work. It isn't Bitcoin since there is a centralized authority (the Bank), but it is logically the same abstraction. It is believed (but it hasn't been proved) that there are 328 such "coins" to be mined. We'll discuss how their uniqueness will be determined, but before we can do that we need to talk a little about what mining might look like.

Please hand me that ore hauler

Logically, each graph can be represented by the upper or lower triangle in a square adjacency matrix where each element of the matrix is a binary value indicating the edge's color. For example, in my mining operation


0 0 1 1 1 0 1 1 0 0
0 0 0 1 0 1 1 1 0 1
0 0 0 0 1 1 0 1 1 0
0 0 0 0 1 0 1 0 1 1
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0

represents a two-color graph on 10 nodes. A "0" is a red edge and a "1" is a green edge. Notice that only elements on one side of the diagonal matter. That is because [2,6] and [6,2] mean the same thing: the graph is undirected, so the edge from node 2 to node 6 is the same edge that goes from node 6 to node 2. Also, the entries along the diagonal do not matter since nodes are not connected to themselves by an edge.
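One way to make the "[2,6] and [6,2] mean the same thing" rule hard to get wrong is to hide the matrix behind a pair of accessors that always read and write the upper triangle. This is only a sketch under my own assumptions: the names EdgeColor and SetEdgeColor are hypothetical, nodes are numbered from 0, and the matrix is stored row by row in a flat integer array.

```c
#include <assert.h>

/* Return the color (0 = red, 1 = green) of the edge between nodes a and b,
 * reading only the upper triangle of a gsize x gsize row-major matrix. */
int EdgeColor(const int *g, int gsize, int a, int b)
{
    int t;

    assert(a != b);     /* diagonal entries carry no information */
    assert(a >= 0 && a < gsize && b >= 0 && b < gsize);
    if (a > b) {        /* (a,b) and (b,a) are the same edge */
        t = a;
        a = b;
        b = t;
    }

    return g[a * gsize + b];
}

/* Set the color of the edge between a and b, writing only the upper triangle. */
void SetEdgeColor(int *g, int gsize, int a, int b, int color)
{
    int t;

    assert(a != b);
    assert(a >= 0 && a < gsize && b >= 0 && b < gsize);
    assert(color == 0 || color == 1);
    if (a > b) {
        t = a;
        a = b;
        b = t;
    }
    g[a * gsize + b] = color;
}
```

With these, EdgeColor(g, 10, 2, 6) and EdgeColor(g, 10, 6, 2) return the same value, and the lower triangle is never touched.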

To figure out whether this graph is a counter-example for R(5,5) or not you simply need to run a set of nested loops to count the number of monochromatic subcliques of size 5 that it has. The following routine written in C uses integers to represent node adjacency color.


int CliqueCount(int *g,int gsize)
{
    int i;
    int j;
    int k;
    int l;
    int m;
    int count = 0;
    int sgsize = 5;	/* for R(5,5) */

    for(i=0;i < gsize-sgsize+1; i++)
    {
        for(j=i+1;j < gsize-sgsize+2; j++)
        {
            for(k=j+1;k < gsize-sgsize+3; k++)
            {
                if((g[i*gsize+j] == g[i*gsize+k]) &&
                   (g[i*gsize+j] == g[j*gsize+k]))
                {
                    for(l=k+1;l < gsize-sgsize+4; l++)
                    {
                        if((g[i*gsize+j] == g[i*gsize+l]) &&
                           (g[i*gsize+j] == g[j*gsize+l]) &&
                           (g[i*gsize+j] == g[k*gsize+l]))
                        {
                            for(m=l+1;m < gsize-sgsize+5; m++)
                            {
                                if((g[i*gsize+j] == g[i*gsize+m]) &&
                                   (g[i*gsize+j] == g[j*gsize+m]) &&
                                   (g[i*gsize+j] == g[k*gsize+m]) &&
                                   (g[i*gsize+j] == g[l*gsize+m]))
                                {
                                    count++;
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    return(count);
}

This routine takes 2 arguments:

a pointer to an integer array containing adjacency colors as 1s and 0s in its upper triangle
the dimension of the array (it is assumed to be square)

It returns the number of monochromatic subcliques of size 5. If this number is 0 then the graph is a counter example for R(5,5).

Your job will be to find graphs where the dimension is 42 and the count is 0. Only counter examples of size 42 will be considered valuable in this class.

For example, the graph on 10 nodes shown above is a counter example for R(5,5). That is, the number of single-color subcliques of size 5 in this graph is 0. However, because it is size 10 and not size 42 it isn't a "coin" from the perspective of this class.
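If you want an independent check on that claim, one option is a second counter written a different way. The sketch below is an alternative I am adding for illustration, not the routine used in the mining operation: it counts single-color subcliques of any size k by recursion instead of hard-coding five nested loops. The names Extend, MonoCliqueCount and Example10 are hypothetical; Example10 is the 10-node matrix from these notes typed in row by row.

```c
#include <assert.h>

#define MAXK 16

/* Extend the partial clique chosen[0..depth-1], all of whose edges have the
 * given color, using only nodes numbered start or higher.  Returns the number
 * of ways to complete it to a clique of size k. */
static int Extend(const int *g, int gsize, int k, int color,
                  int *chosen, int depth, int start)
{
    int v, i;
    int count = 0;

    if (depth == k)
        return 1;
    for (v = start; v < gsize; v++) {
        for (i = 0; i < depth; i++)
            if (g[chosen[i] * gsize + v] != color)  /* chosen[i] < v */
                break;
        if (i == depth) {       /* v matches every node chosen so far */
            chosen[depth] = v;
            count += Extend(g, gsize, k, color, chosen, depth + 1, v + 1);
        }
    }

    return count;
}

/* Count the single-color subcliques of size k (red ones plus green ones). */
int MonoCliqueCount(const int *g, int gsize, int k)
{
    int chosen[MAXK];

    assert(k >= 2 && k <= MAXK);
    return Extend(g, gsize, k, 0, chosen, 0, 0) +
           Extend(g, gsize, k, 1, chosen, 0, 0);
}

/* The 10-node example from these notes: 0 = red, 1 = green. */
const int Example10[100] = {
    0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
    0, 0, 0, 1, 0, 1, 1, 1, 0, 1,
    0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
    0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
    0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
```

MonoCliqueCount(Example10, 10, 5) should return 0, agreeing with CliqueCount. The same graph does contain single-color subcliques of size 4 (nodes 0, 3, 4 and 6 are joined entirely by green edges, for instance), so it is a counter example for R(5,5) but it says nothing about R(4,4).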

Some helpful hints from Levi Strauss

Ramsey counter examples have some interesting properties. For example, a counter example on n nodes has embedded in it a counter example on n-1 nodes. To see why, imagine that you remove one node and all of the edges incident on it from a counter example on n nodes. Removing edges cannot create a monochromatic subclique so the remaining graph on n-1 nodes must also be a counter example.
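In matrix terms, removing a node means deleting one row and the matching column. Here is a minimal sketch of that operation; the name RemoveNode and the convention that the caller supplies the output buffer are assumptions for this illustration.

```c
#include <assert.h>

/* Copy graph g (gsize x gsize, row-major) into out, which must hold
 * (gsize-1) x (gsize-1) integers, dropping node victim and every edge
 * incident on it.  The remaining nodes keep their relative order, so
 * upper-triangle entries stay in the upper triangle. */
void RemoveNode(const int *g, int gsize, int victim, int *out)
{
    int i, j;
    int oi = 0;
    int oj;

    assert(gsize >= 2);
    assert(victim >= 0 && victim < gsize);
    for (i = 0; i < gsize; i++) {
        if (i == victim)
            continue;           /* skip the victim's row */
        oj = 0;
        for (j = 0; j < gsize; j++) {
            if (j == victim)
                continue;       /* skip the victim's column */
            out[oi * (gsize - 1) + oj] = g[i * gsize + j];
            oj++;
        }
        oi++;
    }
}
```

Every edge that survives keeps its color, which is why the smaller graph cannot contain a single-color subclique that the larger one lacked.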

However, not all counter examples on n nodes are embedded in a counter example on n+1 nodes. Some are, but some are not.

None the less, one strategy to consider is to try and find a counter example on a small number of nodes to start out with and then to add a node and a set of edges to make a graph one dimension bigger. If the smaller counter example is embedded in a larger one, then only the new edges need be recolored.
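The "add a node" step might look like the sketch below. The name AddNode and its calling convention are hypothetical: the old graph is copied unchanged into the upper-left corner of a matrix one dimension bigger, and the new node's edges (the last column) are colored from an array you supply. How to choose those colors is the actual search problem and is left to you.

```c
#include <assert.h>

/* Embed g (gsize x gsize, row-major) in out, which must hold
 * (gsize+1) x (gsize+1) integers.  The new node gets index gsize and its
 * edge to existing node i is colored newcolors[i].  Only the upper
 * triangle of out is meaningful; everything else is set to 0. */
void AddNode(const int *g, int gsize, const int *newcolors, int *out)
{
    int n = gsize + 1;
    int i, j;

    assert(gsize >= 1);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            out[i * n + j] = 0;

    for (i = 0; i < gsize; i++) {
        for (j = i + 1; j < gsize; j++)
            out[i * n + j] = g[i * gsize + j];  /* old edges, unchanged */
        out[i * n + gsize] = newcolors[i];      /* new edges: last column */
    }
}
```

After a call, only the gsize entries in the last column are new, so a search that trusts the smaller counter example has gsize edges to recolor instead of all of them.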

Another thing to realize is that the count of the number of monochromatic subcliques is a kind of "fitness function." That is, a graph with a smaller number of monochromatic subcliques is, in some sense, "better" than one with a larger number, the best being 0 subcliques. A search strategy is then

count the number of monochromatic subcliques
flip an edge color
did the count go down?
- yes => that was a good move
- no => that was a bad move
keep the best moves and repeat until the count goes to zero

A greedy search won't get very far before no move decreases the count. In that case you need to make an "uphill" move so that you can explore a different part of the space.
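To make the idea concrete, here is one possible shape for such a search, offered as a sketch and not as a recommendation. Everything specific in it is my own assumption: the names Count5 and GreedySearch, the rule of taking the single best flip at each step, the choice of a random edge as the "uphill" move, and the step limit. Count5 is a compact counter of single-color subcliques of size 5, included only so the sketch stands alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Count single-color subcliques of size 5 in an n x n upper-triangle matrix. */
static int Count5(const int *g, int n)
{
    int i, j, k, l, m, c;
    int count = 0;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++) {
            c = g[i * n + j];
            for (k = j + 1; k < n; k++) {
                if (g[i * n + k] != c || g[j * n + k] != c)
                    continue;
                for (l = k + 1; l < n; l++) {
                    if (g[i * n + l] != c || g[j * n + l] != c ||
                        g[k * n + l] != c)
                        continue;
                    for (m = l + 1; m < n; m++)
                        if (g[i * n + m] == c && g[j * n + m] == c &&
                            g[k * n + m] == c && g[l * n + m] == c)
                            count++;
                }
            }
        }

    return count;
}

/* At each step flip the one edge that lowers the count the most.  If no
 * flip lowers it, flip a random edge (an "uphill" or sideways move).
 * Returns the final count; 0 means g now holds a counter example. */
int GreedySearch(int *g, int n, int maxsteps, unsigned seed)
{
    int step, i, j, cur, best, trial, bi, bj;

    assert(n >= 2);
    srand(seed);
    cur = Count5(g, n);
    for (step = 0; step < maxsteps && cur > 0; step++) {
        best = cur;
        bi = -1;
        bj = -1;
        for (i = 0; i < n; i++)
            for (j = i + 1; j < n; j++) {
                g[i * n + j] ^= 1;          /* try flipping this edge */
                trial = Count5(g, n);
                g[i * n + j] ^= 1;          /* undo the trial flip */
                if (trial < best) {
                    best = trial;
                    bi = i;
                    bj = j;
                }
            }
        if (bi < 0) {                       /* stuck: pick a random edge */
            bi = rand() % (n - 1);
            bj = bi + 1 + rand() % (n - 1 - bi);
        }
        g[bi * n + bj] ^= 1;
        cur = Count5(g, n);
    }

    return cur;
}
```

Starting from an all-red graph on 8 nodes (56 red subcliques of size 5), a single step flips edge (0,1) and brings the count down to 36. Be warned that this version recounts the whole graph for every trial flip, which is affordable on 8 nodes and painfully slow on 42, so a serious miner will want to count only the subcliques that contain the flipped edge.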

However, the strategy or strategies you use are up to you. Any algorithm you wish to use is fine as long as it finds counter examples on 42 nodes.
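Whatever you choose, it helps to know the size of what you are up against. A graph on 42 nodes has C(42,2) = 861 edges, so there are 2^861 possible colorings, and each full count looks at up to C(42,5) = 850,668 groups of five nodes. The small helper below (Choose is a name I made up) reproduces those figures so you can try other sizes.

```c
#include <assert.h>

/* Binomial coefficient C(n, k).  After step i the running value is
 * C(n-k+i, i), so every division is exact. */
unsigned long long Choose(int n, int k)
{
    unsigned long long r = 1;
    int i;

    assert(k >= 0 && k <= n);
    for (i = 1; i <= k; i++)
        r = r * (unsigned long long)(n - k + i) / (unsigned long long)i;

    return r;
}
```

Choose(42, 2) gives the number of edges and Choose(42, 5) the number of five-node groups; for the 10-node example the corresponding figures are 45 and 252.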