Article Single


Genetic Programming

Genetic programming is a computer algorithm which designs and optimises programs using a process modelled upon biological evolution. This chapter introduces the family of algorithms to which genetic programming belongs, introduces genetic programming, discusses its behaviour and limitations, and reviews derivative approaches.

  Evolutionary Computation



Figure 6.1: One generation of an evolutionary algorithm. Having evaluated the solutions within the population, selection removes the relatively poor solutions (shown black) and breeding generates new solutions (white) from the surviving, relatively good, solutions (grey).


Evolutionary computation (EC) is an approach to search, optimisation and design which uses a family of approaches called evolutionary algorithms (EAs) that use evolutionary processes to evolve any kind of entity which may be represented on a computer. Most evolutionary algorithms model the biological process of evolution. Using terms from Chapter 1, this kind of process is group-based and involves both competitive and co-operative mechanisms during the evolution of the group: and in this respect is related to other group-based search algorithms such as particle swarm, memetic, tabu, and ant algorithms.

Central to an evolutionary algorithm is the data structure which holds evolving entities: the population. At the start of an evolutionary run, this structure is initialised to contain a group of candidate solutions to some problem. Typically, these solutions are randomly generated. Consequently, most will be far from optimal, but nevertheless all are evaluated using some fitness function which determines their relative ability to solve the problem and assigns each candidate solution a respective fitness value. Following evaluation, the evolutionary algorithm applies a selection mechanism which removes the least fit candidate solutions from the population. After this, a breeding process fills the vacated population slots with new candidate solutions which it derives from existing solutions. At this point the population consists only of relatively fit solutions from the initial population and their derivatives. Accordingly, it is expected that the average fitness of the population will have increased. The process of evaluation, selection, and breeding is repeated over and over again - with hopefully a higher average fitness after each iteration - until either the population contains an optimal solution, or the entire population converges upon sub-optimal solutions. Figure 6.1 illustrates this process.

Different classes of evolutionary algorithm differ in the way in which they represent solutions and the way in which they derive new solutions from existing solutions. Individual algorithms also differ markedly in the size of populations which they use, the way in which they carry out selection, and in the proportion of the population which they replace during each selection-breeding cycle. It is common to classify particular evolutionary algorithms as being either genetic algorithms [Holland, 1975], evolution strategies [Rechenberg, 1973], evolutionary programming [Fogel et al., 1966], or genetic programming [Koza, 1992]: although this form of classification arguably introduces artificial barriers to common issues. Whilst this thesis is concerned primarily with genetic programming, the following chapters also make significant mention of genetic algorithms; and for this reason, they are briefly reviewed in the following section.

 Genetic Algorithms

The genetic algorithm (GA) is an evolutionary algorithm which mimics both the representation and variation mechanisms of biological evolution. A GA candidate solution is represented using a linear string in analogy to a biological chromosome. Accordingly, GA practitioners use biological terms to describe the components of candidate solutions: whereby each position within a GA chromosome is known as a gene locus (or just a gene) and its value is an allele. Historically, GA chromosomes are fixed length binary strings (although some GAs use variable length strings and higher-cardinality alphabets - including real numbers). New solutions are constructed from existing solutions using operators whose effects resemble the action of mutation and recombination upon biological chromosomes. Mutation operators randomly change the allele of a particular gene locus whereas crossover recombines groups of contiguous genes from two or more existing chromosomes. The mechanics of individual GAs vary considerably. However, most GAs are generational: meaning that the entire population is replaced during each iteration; and most have a selection mechanism that allocates each solution's contribution to the next generation roughly in proportion to its relative fitness in the current generation. Consequently, the best solutions have their genes well represented in the next generation whilst the worst solutions tend to give no contribution to the next generation.

Crossover is traditionally considered the dominant search operator in GAs, and mutation is considered a background operator which replenishes genetic diversity lost during selection. Crossover proceeds as follows: two chromosomes are aligned, a number of crossover points are selected randomly along the chromosome lengths, and groups of genes between every other pair of crossover point are swapped between the two chromosomes. Genetic linkage is said to occur between alleles at different gene loci that remain together following recombination. Given the way that crossover works, genetic linkage is most likely to occur between alleles at gene loci which have proximal positions within the chromosome. This means that GAs must function as a result of propagating small groups of alleles found close together within chromosomes: groups that are known as building blocks within the GA literature [Goldberg, 1989]. However, this is only useful if chromosomal building blocks correspond to building blocks found within the problem domain. If this is not the case, or particularly if the problem's building blocks are represented by groups of highly spaced alleles, performance is likely to be impaired. Such problems can sometimes be solved by re-arranging gene loci within the chromosome. GAs which do this automatically are known as linkage learning GAs and are discussed in section 7.5.

 Conventional Genetic Programming

Genetic programming (GP) is a generic term used to mean an evolutionary computation system which is used to evolve programs. Early forms of GP can be traced back to Friedberg [1958] and Cramer [1985]. The first GP system to bear the name `Genetic Programming' was devised by Koza15 [Koza, 1992]; and this, by virtue of being the first successfully applied and widely recognised form of GP, forms the basis of conventional GP systems.

Koza's genetic programming represents programs by their parse trees. A parse tree is a tree-structure which captures the executional ordering of the functional components within a program: such that a program output appears at the root node; functions are internal tree nodes; a function's arguments are given by its child nodes; and terminal arguments are found at leaf nodes. A parse tree is a particularly natural structure for representing programs in LISP; the language Koza first used for genetic programming. This is one reason why the parse tree was chosen as a representation for genetic programming. A problem, in GP, is specified by a fitness function, a function set, and a terminal set. The function and terminal sets determine from which components a program may be constructed; and the fitness function measures how close a particular program's outputs are to the problem's required outputs. The initial population is filled with programs constructed randomly from components in the function and terminal sets16.


Figure 6.2: Sub-tree crossover creating new programs from existing programs. Sub-trees are selected randomly from two existing parse trees and are then swapped to produce child parse trees.

Conventional GP derives new programs from existing programs using three different methods. Point mutation randomly changes the functions or terminals found at a proportion of the nodes within a parse tree. The number of nodes to be mutated are determined by a probabilistic parameter called the mutation rate. Sub-tree mutation, by comparison, randomly changes entire sub-trees; building new sub-trees out of randomly chosen functions and terminals. Sub-tree crossover, which is traditionally considered the most important GP variation operator, constructs new program parse trees by swapping randomly chosen sub-trees between existing parse trees: an example of which is depicted in figure 6.2.

Genetic programming has been applied within a huge variety of problem domains. A number of these are used primarily for comparing the performance of particular GP approaches: algebraic symbolic regression, for example. In other domains, GP has been used to re-discover solutions which had previously been invented by humans; as well as discovering new solutions of comparable performance (see, for example, [Miller et al., 2000]). In yet other domains, GP has evolved solutions to problems which had not previously been solved by humans. This includes certain quantum computing algorithms and problems in biological understanding (e.g. [Koza, 2001]).

 Problems with Recombination

Despite its successes, GP is known to have behavioural problems which limit its applicability and performance. Most significant of these is the manner in which parse trees are affected by sub-tree crossover. First, it has been argued that sub-tree crossover does not perform meaningful recombination and does not therefore aid search performance. This argument is reviewed in this section. Second, sub-tree crossover is thought instrumental in causing program size bloat; an issue which is discussed in the following section.

In [Koza, 1992] it is argued that sub-tree crossover is the dominant operator within genetic programming: responsible for exploiting existing genetic material in the search for fitter solutions. However, experimental evidence [Angeline, 1997,Luke and Spector, 1997,Luke and Spector, 1998] suggests otherwise. In [Angeline, 1997], the author compares the performance of sub-tree crossover against an operator - headless chicken crossover - which resembles crossover, but which actually carries out a behaviour more akin to sub-tree mutation. This operator works by performing a sub-tree exchange between an existing parse tree and a randomly generated parse tree of a similar size. Across three problem domains, the difference between the performance of sub-tree crossover alone and headless chicken crossover alone is statistically insignificant: suggesting that the behaviour of sub-tree crossover is no better than that of macro-mutation. However (despite what the author suggests) the behaviour of sub-tree crossover is not equivalent to headless chicken crossover; given that sub-tree crossover is only able to use genetic material which already exists within the population. It could be argued that sub-tree crossover performed badly in these experiments because of the lack of mutation, upon which it would ordinarily rely to introduce new genetic material and maintain diversity within the population.

However, more recent work by Luke and Spector [1997],Luke and Spector [1998] also suggests that sub-tree crossover performs little better than macro-mutation. Luke and Spector compare runs with 90% sub-tree crossover and 10% sub-tree mutation with runs of 10% sub-tree crossover and 90% sub-tree mutation across a range of problem domains and parameter values. Their results indicate that whilst crossover does perform slightly better than sub-tree mutation overall, the benefit is slight and in most cases the difference in performance is statistically insignificant. The authors also note that for certain problems (including symbolic regression), crossover performs better for large populations and few generations whilst sub-tree mutation performs better for small populations and large numbers of generations. However, this does not hold for all problems.

This poor performance has perhaps less to do with crossover than it has to do with the parse tree representation. For GAs, it is fairly well accepted that crossover improves performance for problem domains with reasonably low levels of epistasis and with a reasonable arrangement of genes within a chromosome i.e. in situations where building blocks can be readily processed by crossover. After all, recombination is logically a useful operation - it enables co-operative evolution by allowing information exchange within a population. Furthermore, it appears to play a primary role within eukaryotic evolution.


Figure 6.3: Loss of context following sub-tree crossover.

Sub-tree crossover is a natural recombination operator for parse trees. However, by exchanging randomly chosen sub-trees between programs, it does not carry out a logically meaningful operation. This is illustrated in figure 6.3. In this example, a random sub-tree is selected in an existing parse tree and is replaced by a sub-tree which is randomly selected from another parse tree. These parent parse trees have significant shared behaviours: both apply an AND function to the outputs of OR and XOR functions, and both have a left-hand branch which calculates the OR of a number of input terminals; so in principal it would seem that they have compatible information to share. However, because the exchanged sub-trees are selected from different positions and have different size, shape and functionality, the behaviour of the resulting child solution bears little resemblance to either parent. The behaviour of each of these programs is determined by the output of the AND function at the top of the program. The output of a function depends upon both the function it applies and upon its input context: the inputs to which it applies the function. Consequently, if its input context changes considerably, then so does its output. Given that most possible programs will generate very poor solutions to a problem, and that the parent programs are presumably relatively good at solving the problem, a considerable change in output behaviour is likely to result in a program with low fitness. In the example, one of the AND function's inputs changes from being the OR of three inputs to the AND of a single input with itself: leading to a significant change in output behaviour.

Since it is unlikely that sub-tree crossover will exchange sub-trees with similar position, size, shape, or behaviour, most GP sub-tree crossovers will lead to child programs which are considerably less fit than their parents. In essence, this happens because a component's context is recorded in terms of its position within a parse tree; and because sub-tree crossover does not preserve the positions of components. Later in this thesis it will be argued that this failing is as much the fault of the program's representation as it is the fault of the sub-tree crossover operator.

  Solution Size Evolution and Bloat

Parse trees are variable length structures and, in the absence of any mechanisms of constraint, may change size freely during the course of evolution. Unfortunately, in GP there is a tendency for parse trees to grow, rather than shrink, over time. This leads to a phenomenon called bloat whereby programs become larger and larger without significant improvement in function. In standard GP, program bloat has nearly quadratic complexity [Langdon, 2000a]. This causes a number of problems. First, it places a strain on both storage and executional resources. Second, it leads to programs which are large, complex, inefficient and hard to interpret. Third, bloat usually takes the form of an increase in non-functional code; and given that this code comprises most of the program, it becomes the target of most crossover and mutation events: protecting the functional code from change and ultimately reducing the effectiveness of evolution. The role of non-functional code in GP is discussed in more detail in section 7.4.1. Although the exact cause of bloat is not known, a number of mechanisms have been identified which appear to contribute to the effect. These are replication accuracy, removal bias, search space bias, and operator bias; and are discussed below.

The replication accuracy theory of bloat [Blickle and Thiele, 1994,McPhee and Miller, 1995,Nordin and Banzhaf, 1995] follows from the protective role of non-functional code outlined above. Large programs containing a lot of non-functional code are more likely to survive crossover or mutation with no change to their function than smaller programs or programs with less non-functional code. Therefore, these programs have a higher replication accuracy. Given that their behaviour is less likely to change as a result of crossover; and given that behavioural changes resulting from crossover tend to be negative; these programs are more likely to be the parents of viable children; and hence will become better represented in the next generation than smaller programs and programs with less non-functional code. Consequently, there is selective pressure for programs to become larger. Furthermore, child programs derived from programs with high replication accuracy are themselves likely to be good replicators (they have high effective fitness, to use Banzhaf et al.'s terminology) and will pass this trait on to their descendants.

The removal bias theory of bloat [Soule et al., 1996] also appeals to the disruptive effect of sub-tree crossover. Following crossover, the behaviour of a child program is more likely to be like its parent (and therefore more likely to be viable) if the sub-tree being replaced occurs near the bottom of the parent parse tree. This is because: (a) components lower in a parse tree have less influence upon the program's outputs; and (b) non-functional code tends to occur at the bottom of parse trees. Consequently, the sub-tree which is replaced is likely to have below average size. However, there are no similar constraints upon the size of the sub-tree with which it is replaced; meaning that the existing sub-tree is more likely to be replaced with a larger sub-tree. Accordingly, viable child programs are more likely to be larger than their parents.

The search space bias theory of bloat [Langdon and Poli, 1997] does not appeal to the behaviour of variation operators. To use Langdon and Poli's own words: äbove a certain size threshold there are exponentially more longer programs of the same fitness as the current program than there are of the same length (or shorter)." Accordingly, it is more likely that evolutionary search will explore those programs which are longer than those which are of the same length or shorter. A similar argument is made by Miller [2001], who believes that bloat is caused by evolution exploring a program's neutral variants: most of which will be larger than the existing program. The operator bias theory of bloat suggests that certain variation operators aggravate this effect through innate imbalances which cause them to sample larger programs (see, for example, [McPhee and Poli, 2001]).


Expressiveness is the capacity for a representation to express appropriate programs. Evolvability is the capacity for a representation to evolve appropriate programs. For GP, these two issues are related: it is not possible to evolve a program which cannot be expressed and it is not possible to express a program if it cannot be evolved by the GP system. Clearly, both of these issues are important. For GP to be successful, it must be able to both search the appropriate region of the search space, i.e., express programs in that region; and be able to get to the appropriate region of the search space from those areas covered by an arbitrary starting population, i.e., evolve programs in that region. Furthermore (and certainly in the software engineering community), expressiveness is seen as a property which encourages evolvability; since if a program can be expressed, it can also be evolved by making a certain number of changes to an existing program. Nevertheless, expressiveness in itself is not sufficient to produce evolvability; since although it may make it possible to evolve a certain program, it does not necessarily make it easy to evolve the program.

Despite these relations, in GP research expressiveness and evolvability are often treated as different, sometimes conflicting, design goals. Improving expressiveness can increase the scope of GP by making it possible to evolve a greater range of programs or particular types of programs. However, in doing so it changes the search space of GP; and this can make it harder to evolve a program to solve a particular problem. Indeed, for a particular kind of problem, there is probably a trade-off between the expressiveness of the representation and its evolvability - even though in practice a knowledge of the problem domain would be required to make this trade-off.

Nonetheless, expressiveness is an important topic in GP and it does share a number of issues in common with the study of evolvability in GP. In recognition of these facts, this section provides a flavour of some of the approaches which have been taken to improve the expressiveness of GP. However, it does not attempt to be an exhaustive reference and, although the issue of evolvability is discussed at length in the next chapter, it does not attempt to artificially separate the concerns of expressive and evolvable representations.

So far, the term representation has been used somewhat generically to refer to the way in which a program is presented to the evolutionary mechanism of the GP system. For the remainder of this chapter, the term is used in a more restricted sense to mean the format or structure of the program rather than the language it is written in. The term representational language (taken from Altenberg [1994]) is used to refer to the language within which a program is expressed. This section is organised into two parts; separating those approaches which are primarily motivated by a desire to change the representational language from those which are primarily motivated by a desire to change the representation of GP.

Genetic programming (GP) is an evolutionary computation approach to automatic programming: using a computational model of evolution to design and optimise computer programs and other executable structures. GP is mechanistically similar to the genetic algorithm and other population-based search algorithms. Whilst GP has been successfully applied within a wide range of problem domains, the method has certain behavioural problems which limit its performance and scalability. Perhaps the most significant of these is the failure of its recombination operator, sub-tree crossover, to carry out meaningful recombination; limiting the capacity for co-operative interactions between members of the population. Sub-tree crossover is also thought to play a substantial role in the un-controlled program size growth phenomenon known as bloat.

In addition to reviewing conventional GP, this chapter has introduced a selection of derivative GP approaches which attempt to improve the behaviour or scope of program evolution by increasing the expressiveness of the representation used to represent programs. The next chapter continues this look at derivative GP approaches by presenting a range of techniques which aim to improve the evolvability of the GP program representation.