GANet Implementation

For my clustering solution, I’ve chosen live CDs (ClusterKnoppix, to be specific). I purchased about $2500 in old hardware (Pentium 850 dual-core machines with 1 GB of RAM, running at 2.66 GHz per core). Combining that with my existing hardware, I end up with a cluster of 15 cores running at a combined total of a little over 40 GHz, with 6 GB of RAM. It’s not too shabby, if I do say so myself. It is an openMosix-based cluster, so the load balancing is awesome. I’ve been using it thus far to do some prime number factoring.

One of the nice things about Mosix clustering is its ease of use, while one of the nice things about Beowulf clustering is the ease of data sharing and segregation. I’ve discussed previously, in GA In Parallel, some more interesting thoughts… that load balancing is a problem with Beowulf clusters, and even with my crazy scheme of Multi-Level Hybrid Clusters (MLHC). The basic idea behind MLHC is to create a Beowulf cluster whose slave nodes are actually Mosix clusters. It was an invention more of necessity than anything else, but it solved my problem nonetheless.

My plan now is to install LAM/MPI (Beowulf software, essentially) onto one of my Mosix nodes, create all of the MPI processes on that single node, and let the Mosix kernel migrate them to balance the load across the cluster. At that point, I’ll have a load-balanced Beowulf cluster. My main issue until now has been creating multiple instances of complete populations of my GANet software, because a single member of a population can be up to 100 MB. Since some of my nodes have only 256 MB of RAM, they can’t even run one instance of GANet with a decent-sized population. The solution, then, is to create multiple small populations that share information, with each population being small enough to run on a single node. Also, I’ve learned that my population members don’t need to be quite as large as I’m making them to start with, especially considering that I allow them to grow dynamically.
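One way to picture "multiple small populations that share information" is a ring migration step, sketched below in plain single-process C++. This is only an illustration under assumptions: `Member`, `Island`, `byFitness`, and `migrateRing` are hypothetical names, a member is reduced to a bare fitness score, and the MPI messaging that would actually carry members between processes is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GANet population member: just a fitness score.
struct Member {
    double fitness;
};

// One small population, meant to be sized so it fits in a single node's RAM.
using Island = std::vector<Member>;

// Orders members from worst to best.
inline bool byFitness(const Member& a, const Member& b) {
    return a.fitness < b.fitness;
}

// Ring migration: island i sends a copy of its best member to island i+1
// (wrapping around), where it overwrites that island's worst member.
void migrateRing(std::vector<Island>& islands) {
    const std::size_t n = islands.size();
    if (n < 2) return;

    // Pick all emigrants first, so each island sends its pre-migration best.
    std::vector<Member> emigrants;
    emigrants.reserve(n);
    for (const Island& island : islands) {
        assert(!island.empty());
        emigrants.push_back(
            *std::max_element(island.begin(), island.end(), byFitness));
    }

    // Deliver each emigrant to the next island in the ring.
    for (std::size_t i = 0; i < n; ++i) {
        Island& dest = islands[(i + 1) % n];
        *std::min_element(dest.begin(), dest.end(), byFitness) = emigrants[i];
    }
}
```

In the cluster setup described above, each island would presumably live in its own MPI process, and the copy in the second loop would become a send/receive pair; how often to migrate is a tuning choice this sketch does not address.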

Today’s lesson: Beowulf is nice, but Mosix load-balances. So take both.

So, these past couple of years, there have been a few big courses that helped me acquire the knowledge necessary for doing any kind of significant computer science research, and I can only recommend that all CS students take these:

1) Operating Systems

If you’re going to do any kind of research, chances are your software is going to run for a long time, and is going to be a series of complicated processes, as opposed to your standard “Hello, world!” program.

Things I got:

Parallel processing, inter-process communication, scheduling, file systems.

2) Data Communications

Again, like OS, if you’re going to do research, chances are you’re going to need more than one machine, so it helps to know how to do networking. This was the class that gave me my basic foundation of knowledge to build my cluster.

Things I got:

Network structure, network administration, basic sockets programming, client-server architecture, multi-threaded server design.

3) Artificial Intelligence

There were two courses in our AI program at GU, and I feel like they didn’t hold the same weight. The first studied classical AI, which can be summed up as:

If A, then B. A. Thus, B.

Not particularly stimulating, am I right? There was a bit of game theory, and some state-space traversal, but nothing too horribly complicated. And for some reason, none of the state-space stuff we generated worked real well anyway…

Things I got:

Overview of Genetic Algorithms, introduction to neural networks. Overview of past failures of AI.

Now, I suppose AI isn’t a course that you really need to be a well-rounded CS student, but I enjoyed it.

What I was supposed to be talking about…

My research: it’s complicated, kinda convoluted, and totally time-consuming. Good thing I don’t have a life. As I’ve discussed before, GA is a great tool for optimization. As I haven’t discussed before, neural networks are a great tool for recognizing patterns. Neural networks can come in many different structures, and the plan of my research is to use GA to “evolve” the structure of a neural network, based on how well it learns a given training set. I’ve yet to decide what kind of training set I will use, but I’m leaning towards natural language processing.

A neural network can have several layers, and I’ve chosen to represent the links between each pair of adjacent layers as a two-dimensional array of booleans (true signifying that a link exists, false that one does not). Since there will be multiple layers, there will also be more than one of these two-dimensional arrays, thus giving birth to the three-dimensional boolean array that is the bulk of my genome (bool *** adjacencyMatrices).
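To make that layout concrete, here is a minimal sketch of such a genome. It is not the GANet code: it swaps the raw `bool ***` for nested `std::vector`s to keep the example short and memory-safe, it assumes links only run from one layer to the next, and `AdjacencyGenome`, `connect`, `linked`, and `linkCount` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One 2D boolean matrix per pair of adjacent layers:
// matrix[i][j] == true means node i of the lower layer links to node j of the next.
using LinkMatrix = std::vector<std::vector<bool>>;

struct AdjacencyGenome {
    // links[l] holds the connections from layer l to layer l + 1.
    std::vector<LinkMatrix> links;

    // layerSizes lists the node count of each layer, input layer first.
    // Every link starts out absent (false).
    explicit AdjacencyGenome(const std::vector<std::size_t>& layerSizes) {
        assert(layerSizes.size() >= 2);
        for (std::size_t l = 0; l + 1 < layerSizes.size(); ++l) {
            links.emplace_back(layerSizes[l],
                               std::vector<bool>(layerSizes[l + 1], false));
        }
    }

    // Create a link; at() throws std::out_of_range on a bad index.
    void connect(std::size_t layer, std::size_t from, std::size_t to) {
        links.at(layer).at(from).at(to) = true;
    }

    bool linked(std::size_t layer, std::size_t from, std::size_t to) const {
        return links.at(layer).at(from).at(to);
    }

    // Total number of links present in the whole network.
    std::size_t linkCount() const {
        std::size_t count = 0;
        for (const LinkMatrix& matrix : links)
            for (const std::vector<bool>& row : matrix)
                for (bool present : row)
                    if (present) ++count;
        return count;
    }
};
```

A network with layers of 3, 4, and 2 nodes would then hold two matrices, one 3×4 and one 4×2.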

I would love to use the standard templated 3DArray_Genome from GALib, but alas, I wanted more scalability. The adjacencyMatrices have the ability to “grow” in number of layers (height), and number of nodes in any individual layer (width). In 3DArray_Genome, genomes are of a fixed size as of GA initialization.
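A rough sketch of what growing in width and height could look like follows. The assumptions are mine, not taken from GANet or GALib: the genome is stored as nested `std::vector`s with one matrix per pair of adjacent layers, every layer has at least one node, new nodes start with no links, and a new layer is simply appended at the end. `Genome`, `growWidth`, and `growHeight` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using LinkMatrix = std::vector<std::vector<bool>>;  // [from node][to node]
using Genome = std::vector<LinkMatrix>;             // g[l]: layer l -> layer l + 1

// Grow in width: add one unconnected node to layer `layer` (0 = first layer).
void growWidth(Genome& g, std::size_t layer) {
    assert(layer <= g.size());
    // As a link target, the node is a new column in the matrix feeding this layer.
    if (layer > 0) {
        for (std::vector<bool>& row : g[layer - 1]) row.push_back(false);
    }
    // As a link source, the node is a new row in the matrix leaving this layer.
    if (layer < g.size()) {
        assert(!g[layer].empty());
        const std::size_t nextWidth = g[layer][0].size();
        g[layer].emplace_back(nextWidth, false);
    }
}

// Grow in height: append a new last layer of `width` unconnected nodes.
void growHeight(Genome& g, std::size_t width) {
    assert(!g.empty() && !g.back().empty() && width > 0);
    const std::size_t lastWidth = g.back()[0].size();  // nodes in current last layer
    g.emplace_back(lastWidth, std::vector<bool>(width, false));
}
```

Existing links keep their indices in both operations, so a genome that grows still describes the same network plus some unconnected extras.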

I suppose that’s enough of a start for now, so until next time…