PARALLEL!

It's a pun. Get over it. Anyways, now it's time to talk a little parallel GA theory. There are a lot of variations, but essentially two different ways to parallelize the genetic algorithm.

Single GA Parallelism

GA is a great algorithm for parallelization. During the evaluation step, where each member of the population is given a value, the same function gets called on each member of the population, one at a time. This is referred to as embarrassingly parallel, in that parallelizing the process requires no sharing of information or sophisticated communication between function calls. All that's required is that each function call be given its own process in which to operate. Here is some sample C-like pseudocode:

void evaluatePopulation(pop)
{
    // one pipe per population member, used to send the fitness back to the parent
    for( each member of the population i )
    {
        create pipe[i];
        open(pipe[i]);
    }

    // fork one child per member; each child remembers which member (ME) it owns
    for( each member of the population j )
    {
        fork();
        if(this is the child process)
        {
            ME = j;
            break;
        }
    }

    if(I am a child process)
    {
        // evaluate my member, report the fitness, and exit
        TheReturn = pop[ME].evaluate();
        send(pipe[ME], TheReturn);
        exit(0);
    }
    else//I am the parent
    {
        // collect one fitness value from each child
        for( each member of the population j )
        {
            pop[j].value = recv(pipe[j]);
            close(pipe[j]);
        }
    }
}
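
For the curious, here's roughly the same idea with real POSIX calls instead of hand-waving. This is just a sketch: the Member struct and its evaluate() method are stand-ins for whatever your GA actually uses.

#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>
#include <vector>

struct Member
{
    double value;
    double evaluate() const { return 42.0; } // stand-in for the real fitness calculation
};

void evaluatePopulation(std::vector<Member>& pop)
{
    size_t n = pop.size();
    std::vector<int> readEnd(n), writeEnd(n);

    // one pipe per member, so each child can report its fitness to the parent
    for (size_t i = 0; i < n; ++i)
    {
        int fds[2];
        pipe(fds);
        readEnd[i]  = fds[0];
        writeEnd[i] = fds[1];
    }

    for (size_t i = 0; i < n; ++i)
    {
        if (fork() == 0) // child: evaluate my member, report, and bail out
        {
            double fitness = pop[i].evaluate();
            write(writeEnd[i], &fitness, sizeof(fitness));
            _exit(0);
        }
    }

    // parent: collect one fitness value per child, then reap the children
    for (size_t i = 0; i < n; ++i)
    {
        close(writeEnd[i]);
        read(readEnd[i], &pop[i].value, sizeof(pop[i].value));
        close(readEnd[i]);
    }
    while (wait(NULL) > 0) {}
}
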
Multiple GA Parallelism

Another common use for parallelization is simply to scale your application. GA is an interesting case of this, because the bulk of the work is done by "random" variation. Thus, the more "random" exploration you throw at the problem, the better your chances of finding the optimal solution. The plan, then, is to create multiple instances of your GA and have each run in a separate process. Here is some sample C-like pseudocode:

void main()
{
    GeneticAlgorithm GA;
    GA.init();

    // one pipe per GA instance, used to report each instance's best member
    for( 10 times [i] )
    {
        create pipe[i];
        open(pipe[i]);
    }

    // fork 10 copies; each child remembers its own index ME
    for( 10 times [j] )
    {
        fork();
        if(this is a child process)
        {
            ME = j;
            break;
        }
    }

    if( I am a child process )
    {
        for(int i = 0; i < 100; i++)
        {
            GA.step(); // runs one generation of GA
        }

        // report my best member to the parent and quit
        send(pipe[ME], GA.pop.BestMember());
        exit(0);
    }
    else//I am the parent
    {
        for( 10 times [i] )
        {
            BestMember = recv(pipe[i]);
            cout << BestMember << " = " << BestMember.value << endl;
            close(pipe[i]);
        }
    }
}

To improve upon this code, you could allow the processes to share their best member(s) incrementally as they run. Here is some sample C-like pseudocode:

void main()
{
    GeneticAlgorithm GA;
    GA.init();

    // one pipe per GA instance; the children also use them to pass members around a ring
    for( 10 times [i] )
    {
        create pipe[i];
        open(pipe[i]);
    }

    // fork 10 copies; each child remembers its own index ME
    for( 10 times [j] )
    {
        fork();
        if(this is a child process)
        {
            ME = j;
            break;
        }
    }

    if( I am a child process )
    {
        for( 10 times )
        {
            for(int i = 0; i < 10; i++)
            {
                GA.step(); // runs one generation of GA
            }

            // pass my best member along the ring, and replace my worst
            // member with whatever my neighbor passed to me
            send(pipe[ME], GA.pop.BestMember());
            GA.pop.WorstMember = recv(pipe[(ME+1)%10]);
        }
        // report my final best member to the parent and quit
        send(pipe[ME], GA.pop.BestMember());
        exit(0);
    }
    else//I am the parent
    {
        for( 10 times [i] )
        {
            BestMember = recv(pipe[i]);
            cout << BestMember << " = " << BestMember.value << endl;
            close(pipe[i]);
        }
    }
}

The Implementation Differences

Single GA parallelization is easy to implement the way shown above, in that you do not need to pass any genetic information, just the calculated value of each member of the population. It is also very easy to distribute across a Mosix-type cluster, with one small drawback: the evaluation function needs to be very processor intensive in order to be distributed by the Mosix kernel. A fork()ed process must be in existence for about 3-5 seconds to give the kernel time to identify and migrate it. So, for tiny calculations (like the airline example from my previous post, or TSP if you're more familiar with these types of problems), your processes will never migrate and you're wasting your time.

Implementing this style of parallelization on a Beowulf cluster requires you to specify a node for each member to be evaluated on, send all of its genetic material there, and then await a response. It also requires writing a separate worker application to run on all the other nodes, to accept the incoming genetic material and process it, as mentioned in the post "Clustering, A Continuation…".

If you're already using a Beowulf cluster, Multiple GA parallelization is the easy way to go. A distribution like LAM/MPI will start an identical copy of your GA on each slave node, so you don't need to fork() off any processes. How the slave nodes communicate with each other depends on whether or not you use blocking I/O.

For blocking I/O, you'll need to use your root node as a server that relays material between slave nodes, so that no node sits idle waiting for an adjacent node to reach an accept state.

For non-blocking I/O, you can use your root node to set up something called a gene-pool, where each process intermittently drops a little genetic material into the pool, then checks to see if there's any new material in the pool to grab. If not, it can continue on without waiting. This strategy can also be implemented with blocking I/O, but it's a little bit more convoluted.
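
For the non-blocking case, here's a rough sketch of what the gene-pool idea might look like with standard MPI calls. The GeneticAlgorithm class and its methods are stubs I made up for illustration; the real point is the MPI_Isend/MPI_Iprobe pair, which lets each process drop off material and check the pool without ever sitting around waiting.

#include <mpi.h>
#include <vector>

const int GENE_COUNT  = 32;
const int MIGRATE_TAG = 42;

struct GeneticAlgorithm // hypothetical stand-in for your real GA class
{
    void init() {}
    void step() {}
    std::vector<double> bestGenes() { return std::vector<double>(GENE_COUNT, 0.0); }
    void replaceWorst(const std::vector<double>&) {}
};

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    GeneticAlgorithm GA;
    GA.init();

    int next = (rank + 1) % size; // ring neighbor I send my best member to
    std::vector<double> outgoing(GENE_COUNT), incoming(GENE_COUNT);
    int received = 0;

    for (int epoch = 0; epoch < 10; ++epoch)
    {
        for (int gen = 0; gen < 10; ++gen)
            GA.step();

        // drop my best member into my neighbor's pool without waiting
        outgoing = GA.bestGenes();
        MPI_Request req;
        MPI_Isend(outgoing.data(), GENE_COUNT, MPI_DOUBLE,
                  next, MIGRATE_TAG, MPI_COMM_WORLD, &req);

        // check whether anything has been dropped off for me; if not, keep going
        int arrived = 0;
        MPI_Iprobe(MPI_ANY_SOURCE, MIGRATE_TAG, MPI_COMM_WORLD,
                   &arrived, MPI_STATUS_IGNORE);
        if (arrived)
        {
            MPI_Recv(incoming.data(), GENE_COUNT, MPI_DOUBLE, MPI_ANY_SOURCE,
                     MIGRATE_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            GA.replaceWorst(incoming); // new blood replaces my worst member
            ++received;
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE); // make outgoing safe to reuse
    }

    // my ring predecessor sent me one message per epoch; drain any I skipped
    // so nothing is left in flight when MPI_Finalize runs
    for (; received < 10; ++received)
        MPI_Recv(incoming.data(), GENE_COUNT, MPI_DOUBLE, MPI_ANY_SOURCE,
                 MIGRATE_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
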

That just about covers the basics for GA parallelism. Next time, I'll post my research from last semester, and then perhaps get into my new clustering scheme. Until next time…

So the Genetic Algorithm (GA, since I'm a lazy immediately-post-college student) is an optimization tool. It's capable of solving all kinds of really difficult problems. The GA is best used for problems where the best answer can't be found by traditional means in a reasonable amount of time. Thus, when using the Genetic Algorithm, it's best to be looking for a good-enough solution.

"Solutions," and why that's in bunny ears:

Imagine you're trying to get from LA to NY. There's a lot of ways you could get there, or "solutions" to your problem. Different airlines, different connections, first class vs. coach; the list goes on and on. If you ask someone how to get from LA to NY, you'll get a myriad of answers, so it's not a problem where you can't find a solution, it's a matter of whether or not that solution is very good. It's not a good plan to book a first class non-stop flight when you only have $45 to spend, and if you're on a deadline, you'll need to be there without a three-day layover in Cleveland. Especially if you're allergic to Cleveland. I heard about a guy once….

Anyways, there are a lot of different variables that go into choosing the best way for you to get to NY. Now imagine this as a graph theory problem: LA and NY are at different ends of the graph, and all the different cities you could stop in on your way there are points on the graph in between the start and the destination. The edges in the graph represent where you can go (probably completely interconnected; every point connected to every other point). Now imagine that each of these edges has a weight or cost (associated with airline ticket costs, for our example problem). Since the graph is fully connected, there are a lot of ways to get from LA to NY, but which is best (AKA cheapest)?

Graph Theory and Why It Sucks So Bad

Graph problems, like the one described above, are notorious for being a royal pain. If you do the math a little, you'll notice that the number of possible paths grows at an alarming rate as you add points to the graph. These problems typically fall into a category of problems called NP-Complete. The exact definition of an NP-Complete problem is several pages long, but it goes something like this:

A problem is NP-Complete if, roughly speaking, nobody has found a polynomial time algorithm that always produces the optimal solution (the least-cost path from above), and finding one would hand you a polynomial time algorithm for every other problem in the class.

A lot of optimization problems (problems with more than one feasible solution, and possibly more than one optimal solution) fall into this class, or into a related class called NP-hard, which roughly means the problem is at least as hard as every problem in NP but may not itself sit inside NP. The typical way to prove such a problem NP-Complete is to take an already proven NP-Complete problem and reduce it to a form of your problem.

On to GA, already…

So, GA is a handy-dandy little tool for solving lots of problems like these. First, you randomly generate a bunch of "solutions" (which, for the sake of my math profs, I'll now call feasible solutions), and call them your population. Then you rank each of these members of your population.
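
In code, that first step is about as simple as it sounds. Here's a minimal sketch for the LA-to-NY problem, where each feasible solution is just a random ordering of the intermediate cities; names like CITY_COUNT and Route are made up for the sketch.

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

const int CITY_COUNT = 10; // intermediate cities between LA and NY
const int POP_SIZE   = 50;

using Route = std::vector<int>; // the order in which the intermediate cities are visited

std::vector<Route> randomPopulation(std::mt19937& rng)
{
    std::vector<Route> pop(POP_SIZE);
    for (Route& r : pop)
    {
        r.resize(CITY_COUNT);
        std::iota(r.begin(), r.end(), 0);      // cities 0..CITY_COUNT-1
        std::shuffle(r.begin(), r.end(), rng); // one random path from LA to NY
    }
    return pop;
}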

Ranking, Elitism at its Most Useful
The ranking system can be very simple or very complicated. For our example above, the simple version would be to add up the costs on the edges along our path from LA to NY. This is known as a single-objective GA. A more complicated version might also include shortest travel time, in which case the GA is attempting to minimize ticket price as well as travel time. This is known as a multiple-objective GA (MOGA).
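
Here's what that simple single-objective ranking might look like for the flight problem. The cost matrix and the LA/NY indices are invented for the sketch; a MOGA version would compute a second number (total travel time, say) the same way and try to minimize both.

#include <vector>

using Route = std::vector<int>; // ordering of the intermediate cities

// cost[i][j] = ticket price from city i to city j; LA and NY get their own indices
double routeCost(const Route& r,
                 const std::vector<std::vector<double>>& cost,
                 int LA, int NY)
{
    double total = cost[LA][r.front()];
    for (size_t i = 0; i + 1 < r.size(); ++i)
        total += cost[r[i]][r[i + 1]];
    total += cost[r.back()][NY];
    return total; // smaller is better: the cheaper trip wins
}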

Now that we've ranked our population, we need to use this information to our advantage, much like the young Hitlers we all want to be. Now that we've assigned each member a value and put them in their proper place, it's time to start the breeding.

Mating, as dirty as your nerdy mind wants it to be

There's a lot of algorithms for mating, but they all essentially do the same thing: take two members of your population (parents), and make two more (children). As in biological sexual mating, the children will share the traits of both parents.

Depending on the type of problem, you can make your mating algorithms simpler or more complex, but there are some industry standards: simple crossover, which simply takes a random combination of genes from each of the parents; and blended crossover, which takes the two sets of genes from the parents and blends them with a random weight (which works well cause sometimes kids look more like their dad than their mom, which can be really unfortunate for little Sally with the hairy back).
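
Sketched out for real-valued genes (the city-ordering chromosomes from earlier would want an order-preserving operator instead, but the idea carries over), those two standards might look something like this; all the names are made up for the sketch.

#include <random>
#include <vector>

using Genes = std::vector<double>;

// simple crossover: each gene is copied from one parent or the other at random
Genes simpleCrossover(const Genes& mom, const Genes& dad, std::mt19937& rng)
{
    std::bernoulli_distribution coin(0.5);
    Genes child(mom.size());
    for (size_t i = 0; i < mom.size(); ++i)
        child[i] = coin(rng) ? mom[i] : dad[i];
    return child;
}

// blended crossover: each gene is a randomly weighted mix of both parents,
// so the kid can end up looking a lot more like dad than mom (sorry, Sally)
Genes blendedCrossover(const Genes& mom, const Genes& dad, std::mt19937& rng)
{
    std::uniform_real_distribution<double> weight(0.0, 1.0);
    Genes child(mom.size());
    for (size_t i = 0; i < mom.size(); ++i)
    {
        double w = weight(rng);
        child[i] = w * mom[i] + (1.0 - w) * dad[i];
    }
    return child;
}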

Matching, dating for bits

The key part to dating for a bit is to put your best foot forward. Unfortunately, since it's been a long time since RAM manufacturers have included the "foot" option (though SanDisk is getting back to it with their new MP3 players), that's not always as simple as it sounds. So, we take more simplistic, and unfortunate, methods. Each of these algorithms has its benefits and drawbacks (both real and theoretical), but I won't address them here. If you read between the lines, you can probably determine my preferences. I'm not a subtle writer.

Random Pairing
I really don't feel the need to go too in-depth into this. It's random. Deal with it.

Best-First
Again, kinda simplistic. Take the two best and mate them, then the third and fourth, the fifth and sixth, etc. Every generation, you end up with a new set of super-jocks, and down at the bottom your unfortunate group that has epilepsy with a side of polio.

Tournament Pairing
Create a tournament bracket, NCAA March Madness style. Say that each member has a certain percent chance of winning each round in its bracket, based on its rank compared to its opponent. Each member is assigned a random number between 0 and its percentage, and whichever member has the larger number moves on to the next round. The better valued members will always have a better chance of winning, but statistically, the lower members still have a shot at being that Cinderella team.
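
One way you might code up a single match in that bracket; this sketch assumes fitness values are positive and that bigger is better, and everything in it is made up for illustration.

#include <random>

// returns true if member A advances to the next round, false if member B does
bool playMatch(double fitnessA, double fitnessB, std::mt19937& rng)
{
    double chanceA = fitnessA / (fitnessA + fitnessB); // A's winning percentage
    double chanceB = 1.0 - chanceA;

    // each member draws a random number between 0 and its percentage;
    // the bigger draw wins, so the favorite usually (but not always) advances
    std::uniform_real_distribution<double> drawA(0.0, chanceA);
    std::uniform_real_distribution<double> drawB(0.0, chanceB);
    return drawA(rng) > drawB(rng);
}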

Plutonium in the Water
Now I don't actually remember if plutonium is one of those horribly feared elements that'll cause you to grow a third limb if you look at it crosseyed, but everyone knows what I'm talking about.

Every family has that red-headed kid where nobody knows where his red hair came from and suspects the milkman, cause he's always been a little off and never looks mom in the eye, but maybe mom's a standup lady and that's just a random mutation. It can happen. Really.

After you've finished mating, the idea is to go through and randomly mutate a few genes, just to keep things interesting. There's actually a lot of reasons for mutation which I will address at a later date, but this will do for now.
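
A minimal sketch of that post-mating mutation pass, again for real-valued genes; the mutation rate and the range of new values are arbitrary choices for the sketch.

#include <random>
#include <vector>

const double MUTATION_RATE = 0.01; // chance that any given gene gets scrambled

void mutate(std::vector<double>& genes, std::mt19937& rng)
{
    std::uniform_real_distribution<double> chance(0.0, 1.0);
    std::uniform_real_distribution<double> newValue(0.0, 1.0);

    for (double& g : genes)
        if (chance(rng) < MUTATION_RATE) // the milkman strikes
            g = newValue(rng);
}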

Lather, rinse, repeat.

Now that you've done all this, it is highly recommended that you do it again. And again. And again. Thousands of times, actually. Each of these iterations is called a generation, and the more generations (typically), the better your answer will get. This isn't entirely true, but again, this discussion goes beyond the scope of a first time reader.
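
To tie the whole lather-rinse-repeat loop together, here's a toy end-to-end GA. The problem is deliberately trivial (maximize the number of 1 bits in a short bit string) and every name in it is invented for the sketch, but the rank-mate-mutate-repeat shape is the same one you'd use for the flight problem.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

const int    POP_SIZE      = 50;
const int    GENES         = 32;
const int    GENERATIONS   = 1000;
const double MUTATION_RATE = 0.01;

int fitness(const std::vector<int>& genes) // count the 1 bits
{
    int sum = 0;
    for (int g : genes) sum += g;
    return sum;
}

bool betterThan(const std::vector<int>& a, const std::vector<int>& b)
{
    return fitness(a) > fitness(b);
}

int main()
{
    srand(1234);
    std::vector<std::vector<int>> pop(POP_SIZE, std::vector<int>(GENES));
    for (auto& member : pop)
        for (int& g : member)
            g = rand() % 2; // random initial population

    for (int gen = 0; gen < GENERATIONS; ++gen)
    {
        std::sort(pop.begin(), pop.end(), betterThan); // rank: best members first

        // mate: the bottom half is replaced by children of the top half
        for (int i = POP_SIZE / 2; i < POP_SIZE; ++i)
        {
            const std::vector<int>& mom = pop[rand() % (POP_SIZE / 2)];
            const std::vector<int>& dad = pop[rand() % (POP_SIZE / 2)];
            for (int g = 0; g < GENES; ++g)
                pop[i][g] = (rand() % 2) ? mom[g] : dad[g]; // simple crossover

            // plutonium in the water: the occasional random mutation
            for (int g = 0; g < GENES; ++g)
                if ((double)rand() / RAND_MAX < MUTATION_RATE)
                    pop[i][g] = 1 - pop[i][g];
        }
    }

    std::sort(pop.begin(), pop.end(), betterThan); // one last ranking
    std::cout << "best fitness: " << fitness(pop[0]) << " / " << GENES << std::endl;
    return 0;
}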

Now that everyone has a good idea of what GA is all about, I can continue my fireside chats about my research. Until next time…

So, I left off last time with a brief explanation of what I want to do (clustering-wise), and then continued with why that won't work like I want.

So now, some more concepts.

My limited knowledge of systems administration now limits me to a LAM/MPI cluster. Not a shabby thing; MPI is a very powerful tool, and is much easier to configure than a Mosix cluster.

The plan then? Worker apps!

Worker Applications

So, you've got this root node and these worker nodes. All processes are created at runtime and stay in existence throughout the run of whatever it is you're doing. With GA, to mimic the work I did on Mosix clusters, I would usually want to create a process to compute each chromosome. Unfortunately, that isn't so hot for Beowulf, cause it means creating popSize*numGens processes over the course of a run. So instead, I create a worker application and run an instance of it on each worker node. That instance will sit in an endless loop, like a server application, waiting for chromosomes to be passed to it, and it will then pass back the cost or value or fitness, whatever you like to call it, of the chromosome it evaluated, signalling that it can now accept a new chromosome.
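
A bare-bones sketch of what that worker application might look like with MPI. It assumes fixed-length chromosomes of doubles; evaluate() is a stand-in for the real fitness calculation, and the WORK/DONE tags are just a convention invented for the sketch.

#include <mpi.h>
#include <vector>

const int GENE_COUNT = 32;
const int WORK_TAG   = 1;
const int DONE_TAG   = 2;

double evaluate(const std::vector<double>& genes) // stand-in fitness function
{
    double sum = 0.0;
    for (double g : genes) sum += g;
    return sum;
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    std::vector<double> genes(GENE_COUNT);
    MPI_Status status;

    while (true)
    {
        // sit in a server-style loop waiting for the root (rank 0) to hand me a chromosome
        MPI_Recv(genes.data(), GENE_COUNT, MPI_DOUBLE, 0,
                 MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == DONE_TAG)
            break; // root says we're finished

        double fitness = evaluate(genes);
        // sending the fitness back doubles as "I'm ready for more work"
        MPI_Send(&fitness, 1, MPI_DOUBLE, 0, WORK_TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}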

The Problem

With Mosix-style clustering, forking a process automatically creates a new process with all the chromosome information it needs, and that process only has to pass back a single variable (usually a float). With MPI-style clustering, you have to pass all of the genetic material to the worker application using MPI_Send() calls, much like with sockets, which means you have to specify the size of the data being transmitted. In the case of large dynamic arrays for chromosomes, that means variable-sized transmissions are being sent and received, which makes your sending code (and your worker application) problem specific. In other words, you're not writing library code that could be reused for any old genetic algorithm problem; you're writing problem-specific code that you would have to one-off to make work for another problem.
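
For what it's worth, one common way to cope with the variable-size part is to have the worker probe the incoming message and ask MPI how big it is before actually receiving it, something like the sketch below. It doesn't make the evaluation itself any less problem specific, but it keeps the plumbing generic.

#include <mpi.h>
#include <vector>

// receive a chromosome of unknown length from the root node (rank 0)
void receiveChromosome(std::vector<double>& genes)
{
    MPI_Status status;
    MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &status); // wait for a message from root

    int count = 0;
    MPI_Get_count(&status, MPI_DOUBLE, &count); // how many doubles is it?

    genes.resize(count);
    MPI_Recv(genes.data(), count, MPI_DOUBLE, 0,
             status.MPI_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}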

So, I don't really like this plan much at all. I've also noticed that I've started talking a lot about the genetic algorithm, so my next post will be a brief run-through of that.

It's now the 13th of June, and alas, I still don't have internet access at home. I could spend hours judging the people at Comcast, but instead, I'll simply state that I hate them and wish horribly itchy diseases on their children.

That aside, I'm still making progress. My main hangup has been the lack of similarity between the machines I'm using for my cluster. The problem, then, is that one machine will finish its work faster than the others and sit idle waiting for the other machines to catch up, so they can pass information between processes at the same stage in development. Solutions?

There's a couple. I guess I should start with a brief overview of clustering.

Beowulf Clustering:

Beowulf clustering is simple. You create a process on a machine, and it does lots of work, while communicating that work back to a root machine.

The Problem

My plan is to give each machine (minus the root) large amounts of data to process in the same way. With machines with different processor speeds, memory sizes, and bus speeds, the same amount of work can take very different amounts of time. Since the machines will work in cycles, all machines must finish one cycle before the next can start. Thus, the slowest machine will hold back the group (something one is usually trying to avoid in parallel processing).

Mosix Clustering:

A brilliant plan, though poorly implemented; Mosix clustering is what every programmer wants when parallel programming. Using the standard Unix system calls, a programmer can fork() a process in two, and the resulting processes will migrate to whichever machines the OS determines have the least load or can accomplish the task in the least amount of time.

How this solves my problem:

When a machine finishes its assigned work and is sitting idle, the operating system automatically sends it more work, decreasing the work left to be done on the other nodes and decreasing the amount of time necessary to finish all the distributed work.

Problems with Mosix:
Mosix has yet to be ported to the 2.6 Linux kernel. There's a beta version out there, but as far as I can tell, it's not gonna boot. This means a trip down memory lane to Fedora 1, a stable, but obviously older, OS.

Obviously, while Mosix clustering is the ideal solution to my problem, it's not very feasible.

Next time, more infeasible solutions, and possibly a feasible one.

I'll bet you're wondering what it is that might come here…. Who uses crappy computers to solve big problems?

Well, the answer is, I do.

I just graduated with a BS in CS from GU, and I've been doing research for the last year in AI and Parallel Processing. For the AI, I've been studying GA and Neural Nets and am working up a new way to look at combining the two (my "Big Problem"). I realized this was going to be hugely computationally intensive from the get-go, so I got interested in parallel processing and involved myself in a clustering project on campus. We had 20-ish 800MHz machines and roughly the same number of 333MHz machines (my "Crappy Computers") to work with, and did some really interesting work.

The point of this blog is to review some of that research, discuss some of my thought processes in doing what I did, and to continue documenting my research. I've never been one for academic writing, yet I feel like my research deserves some credit, so why not blog, eh? This process is going to start out a little slow, as I a: don't have internet at home yet, and b: am in the process of building my own cluster since the university didn't see fit to let a lowly undergrad snag a few of the machines he meticulously worked on all semester post-graduation, but I digress… The point is, be patient, and I will show you great things.

I should let you know now, I have a habit of coming up with metaphors, and more than likely this blog will be full of them. Perhaps they'll be entertaining, but let's not get ahead of ourselves.

Until next time…