Researching Creation

January 18, 2010

Information Theory / My New Paper on Irreducible Complexity

JB

January 6th was a big day for me.  Huge, actually.  I finally had a paper published (titled "Irreducible Complexity and Relative Irreducible Complexity: Foundations and Applications") that I had been working on for the last 3.5 years.  You might be wondering why it took me so long to come out and announce it.  The reason is simple - most people who have read it misunderstood what I was trying to say.  Therefore, I wanted to take the time to explain the points I am trying to get at in the paper, and give a little personal history of how it came about.

At the 2006 BSG meeting, I was a complete unknown.  I knew absolutely no one there beforehand.  I had come to give a presentation on some interesting overlaps between computer metaprogramming and the way that antibody genes rearrange themselves.

At the meeting was a reporter who was writing a book on the intersection between fundamentalism and science.  Between sessions the reporter would ask various people questions about their beliefs, and what followed was usually a stimulating conversation.  In one of these conversations, we were talking about evolution, and I (perhaps naively) stated, “it is impossible for new information to be generated by evolution”.  One of the other creationists in the conversation quickly retorted that I was absolutely wrong, strongly implying that I was foolish for even saying so.  Those of you who are in creation research could probably guess who this was.

I thought this was odd (both the idea that natural selection might offer a way to create information by itself, and that I was so roughly thrown under the bus by a fellow creationist), and pondered it in the back of my mind for a while.  Did I know for sure that information could not be created?  How did I know this?  The idea that information could not be created by natural selection seemed correct to the engineer in me, but was it really?

Another event hit upon the same question.  I had recently purchased a copy of the 1986 Oxford Union debate between Edgar Andrews, A. E. Wilder-Smith, John Maynard Smith, and Richard Dawkins.  I thought that the creation side (Andrews and Wilder-Smith) was well-argued, save for one detail.  Dawkins (I think) came up with an example of information being created (I forget what it was), and Wilder-Smith (I think) argued that this was not an instance of information being created, but rather of already-existing information being merely “shuffled around”.  Dawkins retorted that since there were only four nucleotides available, all information in the genome arose through “shuffling around” of genetic information.  While my intuition sided with Wilder-Smith, I realized that his argument hinged on a separation between the creation of information and the rearrangement of information.

Again, I intuitively agreed with Wilder-Smith’s assessment.  Similar things happen in the rearrangement of antibody gene parts for the creation of novel antibodies.  About 90% of the work comes from shuffling well-defined pieces of information around, and about 10% comes from a series of focused rounds of mutation.  It seemed that most of the information already existed and was merely being shuffled around.  However, the problem was that there was no objective way of distinguishing between something being created and something being merely rearranged.

This reminded me of an old friend of mine from my days at Wolfram Research, Chris Knight.  Chris was a computer genius.  I met him when I had just graduated from college and he was just turning 16 -- and he was light years ahead of me in computer programming skills.  One thing that fascinated Chris (as it fascinates a lot of computer science types) is the idea that there is no clean separation between computer code and computer data.  Computer data can be treated as code, and computer code can be represented as data for manipulation.  There are even some languages, such as Scheme and LISP, which elevate such intertwinings of code and data into an art form.

The distinction between “information shuffling” and “information creation” is similar to the distinction between “code” and “data”.  No one doubts that data can be created without intelligence, but can code?  The fact that code and data can be so thoroughly intertwined easily leads a person to conclude that there is no such divide.  And yet, the ability to apply this usefully seems limited to cases where the code/data is very simple.  Still, I could not see the dividing line between the two.
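To make the code/data blur concrete, here is a small sketch of my own (not from the paper) in Python; the post mentions Scheme and LISP, but Python's standard `ast` module gives a similar flavor, treating a program as an ordinary data structure:

```python
import ast

# A string of code is ordinary data...
source = "x * x + 1"

# ...which can be parsed into a data structure (a syntax tree)...
tree = ast.parse(source, mode="eval")

# ...manipulated like any other data (here, rename x -> y)...
for node in ast.walk(tree):
    if isinstance(node, ast.Name) and node.id == "x":
        node.id = "y"

# ...and then turned back into executable code.
code = compile(ast.fix_missing_locations(tree), "<data>", "eval")
print(eval(code, {"y": 3}))  # 3*3 + 1 = 10
```

The same object is "data" while we walk and edit it, and "code" the moment we compile it -- exactly the intertwining that Scheme and LISP make routine.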

When I was at Wolfram Research, Stephen Wolfram was just about to finish his magnum opus, A New Kind of Science (hereafter referred to as NKS).  At the time, I was uninterested.  His work in cellular automata did not seem to have any impact on my life and work, so I basically ignored NKS for the time I worked there.  But later, I picked up a copy from the library.  And in those pages, I found the answer to my dilemma.

For the gory details, you can see my paper.  But here’s what I want you to think about.  Imagine a computer program.  There is a difference between these levels of customizability within a program:

  1. The program has no customizations, you just have to go with what the package offers.
  2. The program has a range of customization settings open to the user, and they can set these to a variety of interesting settings appropriate for their business.
  3. The program makes itself available for a programmer to customize it using a general-purpose programming language.

What you see here is not just a rising level of configurability, but also a rising level of intelligence required to make the configurations.  #1 can be configured by an idiot.  #2 can be configured by trial-and-error, and #3 can be configured only by a professional (or, more precisely, there are aspects of the configuration which can only be used by a professional).
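The three levels can be sketched in miniature.  This toy Python example is purely illustrative (all function names are hypothetical, not from the paper):

```python
# Level 1: no customization -- you get what the package offers.
def greet_fixed():
    return "Hello, world"

# Level 2: parameterized -- the user picks values for preset knobs.
def greet_config(name="world", punctuation="!"):
    return f"Hello, {name}{punctuation}"

# Level 3: open-ended -- the caller supplies arbitrary code (a function),
# so the range of behaviors is limited only by the language itself.
def greet_programmable(transform):
    return transform("Hello")

print(greet_fixed())                                  # Hello, world
print(greet_config(name="BSG"))                       # Hello, BSG!
print(greet_programmable(lambda s: s.upper() + "!!!"))  # HELLO!!!
```

Notice that at level 2 every possible output was anticipated by whoever chose the knobs, while at level 3 the designer cannot enumerate the outputs in advance -- which is the distinction the rest of this post turns on.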

It turns out that there is a type of programming system called a Universal computer.  What makes a Universal computer interesting is that, given the right program, it can compute any computable function.  So it is open-ended.  Here’s the other interesting insight -- Universal computation only arises in chaotic environments.  What makes this so interesting is that a chaotic system, in general, does not give gradual output changes in response to gradual changes in its programming.  Therefore, for something to “evolve” on a Universal computer, it would by necessity have to make several leaps at once to work.  To get the smooth output changes that natural selection requires, one would have to propose a coordinated system of code changes - something not allowed by naturalistic scenarios, because the changes would have to be coordinated to match the desired gradualistic output.
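You can get a feel for "small program change, large output change" with elementary cellular automata, the setting of much of NKS (rule 110 is known to be Universal).  The probe below is my own illustration, not an argument taken from the paper:

```python
def step(cells, rule):
    """One update of an elementary cellular automaton with wraparound edges."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule, steps=32, width=31):
    """Run from a single seed cell and return the full history of rows."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

# The "program" of an elementary CA is its 8-bit rule number, so flipping
# one bit (110 -> 111) is the smallest possible change to the program.
a, b = run(110), run(110 ^ 1)
differing = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(differing)  # a one-bit program change flips a large number of cells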

This provides an answer to my question about information creation versus information shuffling.  If the input domain is open-ended - that is, flexible enough to hold the solution to any problem given the right code - then the solution cannot be reached by gradual configurational changes alone, because that is the nature of the way Universal computers behave.  Now, you can design a programming system where gradual changes to the code lead to gradual changes in the output, and which would thus be open to natural selection.  However, such systems are not Universal computers, and therefore their potential range of results is not open-ended.

Thus, the dichotomy is not necessarily between code and data, but between parameterized programming systems and open-ended programming systems.  If the system is parameterized, then change only happens within the specified parameters.  There may be genuinely new things happening there, but the parameters for their occurrence were specified in advance.  Thus, you can see that the common ID and Creationist claim that “information cannot be created by natural selection” is both true and false.  It is true that open-ended information cannot be created, but if the solution domain is appropriately parameterized, then information can arise within those parameters.
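By contrast with the rule-flipping example, a parameterized system really is open to gradual improvement.  Here is a toy hill-climb of my own devising (the function and numbers are illustrative, not from the paper); the whole "program" is a single knob, and small changes to the knob give small changes to the output:

```python
def output(knob):
    """A smooth response to the parameter."""
    return knob * knob

target = 2.0
knob = 0.0
for _ in range(1000):
    # Try a small step in each direction; keep it only if it helps.
    for delta in (0.01, -0.01):
        if abs(output(knob + delta) - target) < abs(output(knob) - target):
            knob += delta
            break

print(round(knob, 2))  # gradual steps settle near sqrt(2) ~ 1.414
```

Selection-by-small-steps works here precisely because the designer of `output` fixed the solution space in advance: every value the knob can reach was already implicit in the parameterization.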

Obviously, this is not a rigorous proof, and if you want a more nuanced version, you should refer to the paper.  But nonetheless, I think that this should give you an idea of the questions that I was attempting to answer and the approach that I took.

There’s a lot more to say about this, but I think this is enough for now.  See the paper for a lot more information, as well as numerous applications.  I especially liked how this related to the evolutionary software Avida in section 3.4.  In any case, this background should help you make sense of what the paper is about and where I am going with it, should you decide to read it.