Researching Creation

May 10, 2012

Information Theory / Engineering and Theology Unite


For those interested in engineering and theology and philosophy, this conference is for you!  I've got two talks slated for the conference - come and listen!  Lots of fascinating stuff from a number of disciplines:

You can see the abstract list here and the conference flyer here.

June 17, 2010

Information Theory / Sanford Publishes New Bioinformatics Tool


John Sanford, a young-earth creationist biology professor at Cornell, just published a bioinformatics paper with Josiah Seaman, a bioinformatics graduate student, describing his new genomics tool, called Skittle.  You can read the paper here.  The tool allows you to color the genome and experiment with alignments to visualize patterns that are not detectable by other methods.

You can download the program from Skittle's website on SourceForge, or find more information about the program at

It runs on Mac, Windows, and Linux.

This tool allows us to detect a number of new patterns in the genome.  Not only does it help to find tandem repeats, it also helps to find structured variations in those repeats.

This holistic approach to genome analysis is precisely the sort of research that IDers and creationists are interested in.  The reductionist approaches of the last century were useful for digging deeper, but they often blinded researchers to the larger-scale activities of what was happening.

From the paper:

As we have been able to better visualize tandem repeats using Skittle, we have seen a surprising amount of internal complexity. Some of this complexity seems to be easily understood in terms of point mutations and indels, but a great deal of the complexity appears to provide an intriguing array of "puzzles" which invite further study. These puzzling patterns include co-varying deviations from a repeating theme, and internal patterns that are not simply "repeats within repeats". For lack of a better term we are referring to these patterns as structured variation.

If tandem repeats have any function, the "structured variation" described above could conceivably carry information. A perfect repeat cannot contain any information beyond the base sequence and copy number. However, a repeat with variation can contain considerably more information. Each of the three types of observable variation (substitutions, indels, and alternating repeats) has a direct analog in electronic information technology (amplitude modulation, phase modulation, and frequency modulation, respectively).

And then later, he mentions something interesting about the alignments:

Interestingly, the self-adjusting cylinder alignment, which was designed to simply optimize local alignment as would be expected in vivo, causes a marked increase in the visual coherence of all complex tandem repeats. This suggests to us that such coherence might reflect a minimal energy state, and may reflect actual structure in vivo, and might even reflect an unknown biological function. Logically, such coils could change circumference in multiples of the repeat length and so might modulate local genomic architecture.

Anyway, I am really excited about this, and hope to dig more into this as I have time.

Thanks to Sal for pointing this out to us!

February 27, 2010

Information Theory / Information Theory, Physics, and Free Will


I just finished reading a paper that is both fantastically interesting, and a little disheartening.  It is disheartening only because I thought that my senior paper for seminary was going to be freshly novel, but it turns out that someone else already made 90% of my arguments 11 years ago, and actually made most of them better than I could.  The paper is "Algorithmic Information Theory, Free Will, and the Turing Test" by Douglas Robertson (Complexity 4(3): 25-34).

Here are some quotes from the paper (note that AIT is "Algorithmic Information Theory"):

"Free will appears to create new information in precisely the manner that is forbidden to mathematics and to computers by AIT" (26)

"There would be no reason to prosecute a criminal, discipline a child, or applaud a work of genius if free will did not exist.  As Kant put it: "There is no 'ought' without a 'can'"" (26)

"A 'free will' whose decisions are determined by a random coin toss is just as illusory as one that may appear to exist in a deterministic universe" (26)

"AIT appears to forbid free will not just in a Newtonian universe, or in a quantum mechanical universe, but in every universe that can be modeled with any mathematical theory whatsoever.  AIT forbids free will to mathematics itself, and to any process that is accurately modeled by mathematics, because AIT shows that formal mathematics lacks the ability to create new information." (26)

"The fundamental secret of inspired mathematical practice lies in knowing what information should be destroyed or discarded, and what rearrangement of available information will prove to be most useful." (30)

"The very phrase "to make a decision" strongly suggests that the information is created on the spot." (31)

"If...we do accept this definition of free will, then an immediate corollary from AIT is that no combination of computer hardware and software can exercise free will, because no computer can create information." (31)

"There is perhaps no clearer demonstration of the ability of free will to create new information than the fact that mathematicians are able to devise/invent/discover new axioms for mathematics.  This is the one thing that a computer cannot do.  The new axioms produced by mathematicians contain new information, and they cannot be derived from other axioms.  If they could, they would be theorems rather than axioms." (31)

"it has long been accepted that free will is impossible in a Newtonian deterministic universe.  But now the impossibility is seen to carry over into all possible physical theories, not just Newtonian theories, because it is inherent in mathematics itself.  According to AIT, no physical model (i.e. no mathematical model for a physical process) can allow the creation of information.  In other words, free will is impossible in any physical universe whose behavior can be accurately modeled by a computer simulation." (33)

"All theory is against the freedom of the will; all experience for it" (33 citing Samuel Johnson)

"The idea that all physical processes can be modeled is an assumption that is so deeply ingrained in physics that it is seldom questioned, seldom even noticed." (33)

"It may be that physicists since the time of Newton have been exercising a careful (but generally unconscious) selection process.  Physicists may have studied only those physical processes that happen to be susceptible of mathematical modeling.  This would immediately explain the reason behind Eugene Wigner's famous remark about the "unreasonable effectiveness of mathematics."  But if it should turn out that many physical processes are not susceptible to mathematical modeling, just as nearly all numbers cannot be expressed with any mathematical formula, this would represent as deep a shock to physics as Godel's theorem was to mathematics, and one that is far greater than the shock that resulted from the loss of Newtonian determinism when quantum mechanics was developed or the loss of Euclidean geometry when general relativity was discovered." (34)

"The possibility that phenomena exist that cannot be modeled with mathematics may throw an interesting light on Weinberg's famous comment: "The more the universe seems comprehensible, the more it seems pointless."  It might turn out that only that portion of the universe that happens to be comprehensible is also pointless" (34)

"The existence of free will and the associated ability of mathematicians to devise new axioms strongly suggest that the ability of both physics and mathematics to model the physical universe may be more sharply limited than anyone has believed since the time of Newton." (34)


January 18, 2010

Information Theory / My New Paper on Irreducible Complexity


January 6th was a big day for me.  Huge, actually.  I had finally gotten a paper published (titled "Irreducible Complexity and Relative Irreducible Complexity: Foundations and Applications") that I had been working on for the last 3.5 years.  You might be wondering why it took me so long to come out and announce it.  The reason is simple - most people who have read it misunderstood what I was trying to say.  Therefore, I wanted to take the time to explain the points I am trying to get at in the paper, and a little personal history on how it came about.

At the 2006 BSG meeting, I was a complete unknown.  I knew absolutely no one from before the meeting.  I had come to give a presentation on some interesting overlaps between computer metaprogramming and the way that antibody genes rearrange themselves.

At the meeting was a reporter, who was doing a book on the intersection between fundamentalism and science.  Between meetings the reporter would ask various people questions about their beliefs, and what followed was usually a stimulating conversation.  At one of these conversations, we were talking about evolution, and I (perhaps naively) stated, “it is impossible for new information to be generated by evolution”.  One of the other creationists in the conversation quickly retorted, saying that I was absolutely wrong, and strongly implying that I was foolish for even saying so.  Those of you who are in creation research could probably guess who this was.

I thought this was odd (both the idea that natural selection might offer a way to create information by itself and that I was so roughly thrown under the bus by a fellow creationist).  I pondered it in the back of my mind for a while.  Did I know for sure that information could not be created?  How did I know this?  The idea that information could not be created by natural selection seemed correct to the engineer in me, but was this really correct?

Another event hit upon the same question.  I had recently purchased a copy of the 1986 Oxford Union debate between Edgar Andrews, A. E. Wilder-Smith, John Maynard Smith, and Richard Dawkins.  I thought that the creation side (Andrews and Wilder-Smith) was well-argued, save for one detail.  Dawkins (I think) came up with an example of information being created (I forget what it was), and Wilder-Smith (I think) argued that this was not an instance of information being created, but rather of already-existing information being merely “shuffled around”.  Dawkins retorted that since there were only four nucleotides available, all information in the genome arose through “shuffling around” of genetic information.  While my intuition sided with Wilder-Smith, I realized that his argument hinged on a separation between the creation of information and the rearrangement of information.

Again, I intuitively agree with Wilder-Smith’s assessment.  Similar things happen in the rearrangement of antibody gene parts for the creation of novel antibodies.  About 90% of the work comes from shuffling well-defined pieces of information around, and about 10% of the work comes from a series of focused rounds of mutations.  It seemed that most of the information was already existing, and merely being shuffled around.  However, the problem was that there was no objective way of distinguishing between something being created and something being rearranged.

This reminded me of an old friend of mine from my days at Wolfram Research, Chris Knight.  Chris was a computer genius.  I met him when I had just graduated from college and he was just turning 16 -- and he was light years ahead of me in computer programming skills.  One thing that enamored Chris (which enamors a lot of computer science types) is the idea that there is not a clean separation between computer code and computer data.  Computer data can be treated like a code.  And computer code can be represented as data for manipulation.  There are even some languages, such as Scheme and LISP, which elevate such intertwinings of code and data into an art form.

The distinction between “information shuffling” and “information creation” is similar to the distinction between “code” and “data”.  No one doubts that data can be created without intelligence, but can code?  The fact that code and data can be so thoroughly intertwined easily leads a person to conclude that there is no such divide.  Yet the ability to apply this usefully seems limited to cases where the code/data is very simple.  Even so, I could not yet see the dividing line between the two.

When I was at Wolfram Research, Stephen Wolfram was just about to finish his magnum opus, A New Kind of Science (hereafter referred to as NKS).  At the time, I was uninterested.  His work in cellular automata did not seem to have any impact on my life and work, so I basically ignored NKS for the time I worked there.  But later, I picked up a copy from the library.  And in those pages, I found the answer to my dilemma.

For the gory details, you can see my paper.  But here’s what I want you to think about.  Imagine a computer program.  There is a difference between these levels of customizability within a program:

  1. The program has no customizations, you just have to go with what the package offers.
  2. The program has a range of customization settings open to the user, and they can set these to a variety of interesting settings appropriate for their business.
  3. The program makes itself available for a programmer to customize it using a general-purpose programming language.

What you see here is not just a rising level of configurability, but also a rising level of intelligence required for making the configurations.  #1 can be configured by an idiot.  #2 can be configured by trial-and-error, and #3 can be configured only by a professional (or perhaps it would be more exact to say that there are aspects of the configuration which can only be used by a professional).

It turns out that there is a type of programming system called a Universal computer.  What makes a Universal computer interesting is that it can, given the right program, compute any computable function.  So it is open-ended.  Here’s the other interesting insight -- Universal computation only arises in chaotic environments.  What makes this so interesting is that a chaotic system, in general, does not give gradual output changes to gradual changes in its programming.  Therefore, to have something “evolve” on a Universal computer it would, by necessity, have to make several leaps to work.  In order to get smooth output changes, which are required by natural selection, one would have to propose a coordinated system of changing the code - something not allowed by naturalistic scenarios, because the changes would have to be coordinated to match the desired gradualistic output.

This provides an answer to my question about information creation versus information shuffling.  If the input domain is open-ended - that is, it is flexible enough to hold the solution to any problem given the right code - then the solution cannot be reached by gradual configurational changes alone, because that is the nature of the way Universal computers behave.  Now, you can design a programming system where gradual changes to the code lead to gradual changes in the output, and as such would be open to natural selection.  However, these are not Universal computers, and therefore the potential range of results is not open-ended.

Thus, the dichotomy is not necessarily between code and data, but between parameterized programming systems and open-ended programming systems.  If the system is parameterized, then change only happens within the specified parameters.  There may be genuinely new things happening there, but the parameters for their occurrence were specified in advance.  Thus, you can see that the common ID and Creationist claim that “information cannot be created by natural selection” is both true and false.  It is true that open-ended information cannot be created, but if the solution domain is appropriately parameterized, then information can arise within those parameters.
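
To make the contrast concrete, here is a rough Python sketch (an illustration only, not code from the paper). It makes the smallest possible change to two toy systems: a parameterized one, where a hypothetical `scaled_output` function has a single numeric setting, and an open-ended one, elementary cellular automaton Rule 110, which is known to be capable of universal computation. The function names, grid width, and step count are all assumptions chosen for the demonstration.

```python
def scaled_output(gain, inputs):
    """Parameterized system (hypothetical): one numeric setting, smooth response."""
    return [gain * x for x in inputs]


def step(cells, rule):
    """Apply one step of an elementary cellular automaton with wrapping edges."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(rule, width=64, steps=64):
    """Run a rule from a single live cell and return the final row."""
    cells = [0] * width
    cells[width // 2] = 1
    for _ in range(steps):
        cells = step(cells, rule)
    return cells


def hamming(a, b):
    """Count the positions where two rows differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Small change to the parameter: the output moves by a small amount.
    inputs = [1.0, 2.0, 3.0]
    before = scaled_output(2.0, inputs)
    after = scaled_output(2.01, inputs)
    print("parameter nudge:", [round(b - a, 3) for a, b in zip(before, after)])

    # Small change to the "code": flip one bit of the Rule 110 lookup table.
    baseline = run(110)
    for bit in range(8):
        variant = run(110 ^ (1 << bit))
        print(f"rule table bit {bit} flipped: {hamming(baseline, variant)} of 64 cells differ")
```

Running it prints how far each one-bit change to the rule table moves the final row. A single run like this proves nothing on its own, but it shows the kind of comparison the argument depends on.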

Obviously, this is not a rigorous proof, and if you want a more nuanced version, you should refer to the paper.  But nonetheless, I think that this should give you an idea of the questions that I was attempting to answer and the approach that I took.

There’s a lot more to say about this, but I think this is enough for now.  See the paper for a lot more information, as well as numerous applications.  I especially liked how this related to the evolutionary software Avida in section 3.4.  In any case, this background should help you make sense of what the paper is about and where I am going with it, should you decide to read it.




February 06, 2009

Information Theory / RNA Editing and Data Encapsulation Formats


I was thinking about data encapsulation today.  In a computer program, if I want to pass the words "hello world" to a website, I can't just stick it in the URL - spaces aren't allowed in URLs - they serve a different function there.  Instead, in URLs, spaces get translated to %20s, so I would pass it as "hello%20world".  Different formats have different rules for encapsulation, so if I want to take a single set of characters, and move them from one system to another, it is possible I may have to encapsulate/de-encapsulate multiple times. 
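
As a small sketch of that round trip, the Python below percent-encodes the string for a URL and then wraps the result in a second format (JSON is just an example of a second layer, and the key name `q` is made up). Unwrapping has to happen in the reverse order.

```python
import json
from urllib.parse import quote, unquote

message = "hello world"

# Layer 1: make the text safe for a URL (the space becomes %20).
url_safe = quote(message)

# Layer 2: embed the URL-safe text in a JSON document (hypothetical key "q").
wrapped = json.dumps({"q": url_safe})

# De-encapsulate in reverse order: JSON first, then the URL encoding.
inner = json.loads(wrapped)["q"]
original = unquote(inner)

print(url_safe)   # hello%20world
print(original)   # hello world
```

Each layer only understands its own escaping rules, which is why the order of unwrapping matters.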

So, I was thinking about this with regards to RNA editing.  Before I start to make this analogy, let me start by saying the instances of RNA editing I know about don't seem to be working in this way.  Nonetheless, I think it is an interesting angle to research to be sure.

What I am wondering is if there are times when the DNA code might be "encapsulated" in a slightly different format, which then gets de-encapsulated by RNA editing to be passed on to the next phase.  In computers, this happens when additional control information must be passed on using the same alphabet.  In our previous example, in the control system alphabet for web requests, the space has a special use.  Therefore, if we need to use a space within the URL itself, we have to encapsulate it so that it doesn't get confused with its special use.

Anyway, just wondering out loud (a) whether this happens at all, (b) whether it occurs through RNA editing or some other mechanism, and (c) what the different levels of control information are and how they are designated.

January 29, 2009

Information Theory / An Interesting Perspective on Intelligent Design


While I didn't agree with everything he said, I think everyone will find this talk by Kirk Durston fascinating.  The one thing that I don't think he properly took into account was that the "fitness function" on computers is necessarily finite, while the "fitness function" in real life does not necessarily have to be either specified or finite.

Kirk paints the problem as having a smart enough fitness function - therefore Darwinism is only plausible if the fitness function of life has sufficient information to form life as we know it.  However, I think the key he misses is that natural selection is not a fitness function in the same vein as a genetic algorithm fitness function.  Natural selection requires that something be usable now, while an appropriate fitness function could select for future optimality (Dawkins's WEASEL is an excellent example, but there are also much more subtle ways of doing this).  While Durston makes some great points, the problem, as I see it, will always be the generation of diversity, not its selection.
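
For readers who have not seen it, here is a minimal Python reconstruction of a WEASEL-style program. Dawkins described the idea in prose rather than publishing source code, so the population size, mutation rate, and the choice to keep the parent in the running are all assumptions. The thing to notice is that `fitness` compares every candidate against the final target, which is exactly the "selecting for future optimality" that natural selection cannot do.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "


def fitness(candidate, target=TARGET):
    """Score a candidate by how many characters already match the target."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(parent, rate, rng):
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else c for c in parent
    )


def weasel(seed=0, offspring=100, rate=0.05, max_generations=10000):
    """Run cumulative selection toward TARGET; return (final string, generations)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generation = 0
    while parent != TARGET and generation < max_generations:
        children = [mutate(parent, rate, rng) for _ in range(offspring)]
        # The parent stays in the pool, so the best score never goes down.
        parent = max(children + [parent], key=fitness)
        generation += 1
    return parent, generation


if __name__ == "__main__":
    final, generations = weasel()
    print(final, "reached after", generations, "generations")
```

Intermediate strings are rewarded even though they are gibberish, purely because they are closer to a goal the program already knows.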

December 31, 2008

Information Theory / Randomness in Creation Biology


After a year and a half of waiting, my paper on randomness and Creation Biology has finally been published in the current CRSQ!  The paper is titled "Statistical and Philosophical Notions of Chance in Creation Biology".  The main points of the paper follow:

More Than One Meaning for Random

There are different notions of chance and randomness which have very different implications, but we tend to lump them all together.  This causes sloppy thinking and can blind the way we look at things.  This is true in both Creationary and Evolutionary literature.

One type of randomness is statistical randomness.  A process is statistically random if events in that process occur with fixed percentage frequencies over every "normal" infinite subset of those events.  Another type is philosophical randomness.  A process is philosophically random if it occurs outside the constraints of a system.  The state of a slot machine after pulling its lever is the result of statistical randomness.  The state of a slot machine after being hit by a meteor is the result of philosophical randomness.

Statistical randomness is often used in engineering to great benefit.  It offers a way of counteracting unknowns.  I'll probably devote a post to statistical randomness at a later date.


The Luria/Delbrück and Lederberg experiments (which are normally used to prove the randomness of mutations) are not by themselves evidence of an unplanned process.  Another possibility explored in the paper is that many of these are pre-adaptive mechanisms.

I think that many "spontaneous" mutations are actually the result of a process to increase a population's future fitness.  Such a process forces alternate biochemical configurations into the population at a controlled rate, so that a catastrophic environmental change does not wipe out the entire population.  If a cell uses a statistically random process to create these alternate configurations, the population can keep fixed percentages of alternate configurations without any individual cell needing to know how many of each configuration are already in the population.

Therefore, I think that many (but not all) of the "spontaneous" mutations we see are not haphazard just because they occur in the absence of selection, but instead are planned mechanisms to introduce alternate biochemical configurations into the population (which are likely less fit in the current environment) to prepare for extreme changes to the environment in the future.
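
The claim that no cell needs to know the state of the population can be shown with a toy simulation. In this Python sketch each cell flips its own biased coin, and the population-wide fraction of alternate configurations settles near the coin's bias. The 1% rate, the population size, and the function names are invented for illustration and do not come from the paper.

```python
import random


def choose_configuration(rng, alternate_rate):
    """One cell's decision: it consults only its own random draw."""
    return "alternate" if rng.random() < alternate_rate else "standard"


def population_fraction(size, alternate_rate, seed=0):
    """Fraction of cells that ended up in the alternate configuration."""
    rng = random.Random(seed)
    configs = [choose_configuration(rng, alternate_rate) for _ in range(size)]
    return configs.count("alternate") / size


if __name__ == "__main__":
    fraction = population_fraction(size=100_000, alternate_rate=0.01)
    print(f"alternate configurations: {fraction:.4%}")
```

No cell counts its neighbors or communicates with them, yet the population as a whole holds close to the intended percentage.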

Testing for Whether Mutational Hot Spots are Planned or Haphazard

Mutational "hot spots" within the genome may be either the result of a planned mutational mechanism or just the happenstance physical interactions of biochemistry.  I proposed that the proper test for this would be whether or not mutations within the hot spot are more or less likely to be biologically meaningful than a statistically random mutation with uniform probability over the entire genome.  Based on Dembski's work, if a hot spot repeatedly gives us biologically meaningful mutations more frequently than statistically random mutations over the whole genome, then this is evidence that the hot spot is part of a designed mechanism.
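
One way to put numbers on such a comparison is a one-sided exact binomial test: given the genome-wide rate of biologically meaningful mutations, how surprising is the count seen in the hot spot? This is a standard statistical tool substituted in for illustration. It is not the procedure from the paper or from Dembski's work, and the counts and rate below are invented.

```python
from math import comb


def binomial_tail(successes, trials, baseline_rate):
    """P(X >= successes) when each trial succeeds with probability baseline_rate."""
    return sum(
        comb(trials, k) * baseline_rate**k * (1 - baseline_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 1% of uniformly random mutations are meaningful,
    # and 12 of 200 observed hot-spot mutations were meaningful.
    p_value = binomial_tail(successes=12, trials=200, baseline_rate=0.01)
    print(f"chance of seeing this many under the uniform model: {p_value:.3g}")
```

A small tail probability says only that the hot spot differs from the uniform baseline. Deciding what counts as "biologically meaningful", and what a difference implies, is the hard part that the paper addresses.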

November 26, 2008

Information Theory / Optimality of the Genetic Code


Teleomechanist has a great post up with a literature review about the optimality of the genetic code.

The genetic code seems to be designed with the following features:

  • Error minimization in protein construction on simple genetic mutations
  • Minimizing bad effects of frameshift errors
  • Error detection/correction through parity checking
  • Ability to layer on additional codes (the original paper for this is worthwhile) to the protein sequence

Anyway, Teleomechanist lists a lot of other interesting aspects about the genetic code, but I thought these were the top ones.  I especially like the fact that the genome seems to be optimized for having additional codes layered on.  That is very interesting indeed.

August 27, 2008

Information Theory / Mathematics Points to Design


Progetto Cosmo has a fantastic article which deals with some of the interesting problems that have been encountered in the last century in mathematics, and how those problems relate to intelligence and design. 

They document the progress of the work of Godel, Chaitin, von Neumann, and others, and show how each of them came to the conclusion that, fundamentally in life (not just biology), more does not come from less.  It is a great tour of some of the thinking that is fundamental to design yet is oftentimes ignored in sound-bite debates.

July 27, 2008

Information Theory / Metaprogramming as a model for VDJ Rearrangements


Two years ago I gave a talk to the BSG on how the genome acts as a metaprogramming system during VDJ recombination (see R14).  I wanted to get someone to do additional testing on this, or at least write it up as a proper paper before presenting it on the blog, but since I haven't had the time or resources for either, I figure I'll just go ahead and post it.

A Quick Introduction to Metaprogramming

All computer programmers are inherently lazy - that's why the only thing we are good at is getting computers to do things for us.  In fact, we're so lazy, if we can write a program for the computer to generate code for us, we will.  Such programs - programs to generate programs - are called metaprograms.  If you're interested in the computer programming aspect of metaprograms, I wrote a three-part tutorial series on it (Part 1 | Part 2 | Part 3).

But the key parts of metaprogramming are these:

  • The programmer specifies solutions in a domain-specific manner - that is, in a format that is specialized to the task at hand
  • The metaprogramming system then rearranges, rewrites, and otherwise remakes the domain-specific program into a program in the standard language which is to be translated
  • The metaprogramming system is responsible for making the parts of the metaprogram interact correctly

Metaprogramming systems are used to abstract away three things that make programming difficult:

  • Redundant specifications
  • Interaction issues between pieces
  • Mismatches between the problem domain and the solution domain

Now, because metaprogramming systems are doing all of this, it necessarily means that metaprogramming systems are narrow in scope.
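
A tiny Python sketch of the idea follows. The "domain-specific" input is nothing more than a class name and a list of field names, and a hypothetical `generate_record_class` function writes out the repetitive Python source. The generator and its input format are invented for this example and are far simpler than a real metaprogramming system.

```python
def generate_record_class(name, fields):
    """Turn a short specification into the source code of a Python class."""
    args = ", ".join(fields)
    lines = [f"class {name}:", f"    def __init__(self, {args}):"]
    # The redundant part a programmer would otherwise type by hand.
    for field in fields:
        lines.append(f"        self.{field} = {field}")
    # The generator also keeps the pieces consistent with each other:
    # __repr__ always lists exactly the fields that __init__ accepts.
    parts = ", ".join(f"{field}={{self.{field}!r}}" for field in fields)
    lines.append("    def __repr__(self):")
    lines.append(f'        return f"{name}({parts})"')
    return "\n".join(lines)


# The specification is two short values; the generated program is much longer.
source = generate_record_class("Point", ["x", "y"])
namespace = {}
exec(source, namespace)
point = namespace["Point"](1, 2)

print(source)
print(point)
```

The generator handles only one narrow kind of program, which is the trade-off described above: it removes redundancy and manages interactions, but only inside its own domain.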

A Quick Introduction to V(D)J Recombination

The cell can generate millions or billions of antibodies out of a relatively small number of genes.  It does this by splitting antibodies into four regions - the variable region (V), the diversity region (D), the joining region (J), and the constant region (C).  Each of these regions has multiple genes associated with it, separated by Recombination Signal Sequences (RSSs) and spacers.  Antibody heavy chains use all four types of regions, while light chains use just the V, J, and C regions.  The constant regions are (surprise!) constant within an antibody class.

So, when B-cells mature, they pick a single gene from each of the V, D, and J regions, assemble them together, and join them to the constant region of the antibody.  The VDJ regions specify the affinity of the antibody towards an antigen (which is why it needs so much diversity), while the C region specifies the attachment to the cell (which is why it remains constant).
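
The arithmetic behind "many antibodies from few genes" is just multiplication, which the Python sketch below spells out. The segment counts (40 V, 25 D, 6 J) and the segment names are placeholders chosen for the example, not measured values, and the sketch ignores everything except the choice of segments.

```python
import random
from itertools import product

# Placeholder segment pools; the sizes are assumptions for illustration.
V_SEGMENTS = [f"V{i}" for i in range(1, 41)]
D_SEGMENTS = [f"D{i}" for i in range(1, 26)]
J_SEGMENTS = [f"J{i}" for i in range(1, 7)]
CONSTANT = "C"


def all_heavy_chains():
    """Enumerate every V-D-J combination, each joined to the constant region."""
    return [
        (v, d, j, CONSTANT)
        for v, d, j in product(V_SEGMENTS, D_SEGMENTS, J_SEGMENTS)
    ]


def mature_b_cell(rng):
    """One maturing B-cell picks a single segment from each pool."""
    return (
        rng.choice(V_SEGMENTS),
        rng.choice(D_SEGMENTS),
        rng.choice(J_SEGMENTS),
        CONSTANT,
    )


if __name__ == "__main__":
    print(len(V_SEGMENTS) + len(D_SEGMENTS) + len(J_SEGMENTS), "segments")
    print(len(all_heavy_chains()), "possible combinations")
    print("one cell chose:", mature_b_cell(random.Random(0)))
```

With these placeholder pools, 71 segments yield 6,000 combinations before any other source of variation is counted.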

However, when VDJ regions are recombined, an interesting thing happens - a series of non-templated (N) and/or palindromic (P) elements are inserted between these regions.   Efforts so far (as of the 2006 paper - I haven't kept up since then) have classified these as "random" insertions.  What can the Creation model offer?

Note - This paper is a good starting point if you want to know more about V(D)J Recombination in general.

V(D)J Recombination as a Metaprogramming System

So, in V(D)J recombination, you have

  • A series of code rearrangements
  • V, D, and J segments all being pulled from a bucket of similar genes and stitched together
  • The whole thing being attached to a constant region
  • All of these parts contributing to a narrow biological focus
  • The rearrangement system is adding in non-templated code in-between the joined-together segments

So, in the metaprogramming model, what is the role of the rearranging system?  It is not only to help the parts of the code go into their correct places and add in the non-redundant parts, but also to manage the interactions of the parts, so that they recombine properly.

Therefore, if V(D)J recombination is acting as a metaprogramming system, then the probable reason for the addition of N and P elements is to manage the interaction of the parts.  This lets the V, D, and J components evolve more freely without having to worry about how they will interact with the other parts of the recombination system.  The recombination system worries about how the parts will interact.

So, is there any evidence that this is what is going on?  Actually there is.  In certain mouse antibodies, arginine is required at position 96 in order for the antibody to have proper affinity.  Interestingly, this was always generated during recombination in the cases where it was required, even if it wasn't coded for by either of the joined segments!  It appears as though the V(D)J recombination system knows that the arginine is required for affinity, and therefore is able to generate it when necessary.

Now, there are two possible pieces of counter-evidence which I am aware of:

  • Some antibodies recombine in multiple ways using the same template pieces. 
  • Some recombinations are in fact unproductive

However, the first one could simply be because (a) the recombination system is directional but non-deterministic (i.e. it biases outcomes toward those that are probably workable, but doesn't limit the outcome to a single possibility), (b) there are additional elements at play, or (c) both (a) and (b).

The second one could be the result of a non-deterministic system - one that only biases toward good results but doesn't guarantee them.

Obviously, this needs experimenting before it is taken as fact, but I think the evidence currently points in this direction.

Other Metaprogramming Possibilities

If the V(D)J recombination system is actually a metaprogramming system, there are some other possibilities worth looking into:

  • There is currently an "unused" region of nucleotides that is a "spacer" between the RSS and the unrecombined genes.  Could that possibly contain metadata about the frequencies and occasions which that gene should be used?
  • Non-homologous end-joining uses a similar recombination method to V(D)J recombination.  Perhaps it also contains heuristics about how the affinities of genes work and how they can be recombined.

Enterprise Metaprogramming and Biology

The V(D)J Recombination system is a fairly standard metaprogramming system.  However, in Computer Science we have another type of metaprogramming facility, called an "enterprise" metaprogram.  In these, the specifications are actually specifications for multiple different subsystems.  That is, a single template is run through multiple metaprogramming systems (one for each subsystem), and it can generate a unified, interacting system.
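
Here is a minimal Python sketch of that arrangement. A single made-up specification (`SPEC`) is run through two small generators, one producing a database table definition and one producing a web form, so that a change to the specification changes both subsystems together. The specification format and both generators are invented for illustration.

```python
# One shared template describing a record (hypothetical format).
SPEC = {"name": "customer", "fields": [("name", "text"), ("age", "integer")]}

# Each subsystem has its own vocabulary for the same field kinds.
SQL_TYPES = {"text": "TEXT", "integer": "INTEGER"}
HTML_TYPES = {"text": "text", "integer": "number"}


def to_sql(spec):
    """Generator for the database subsystem."""
    columns = ", ".join(
        f"{field} {SQL_TYPES[kind]}" for field, kind in spec["fields"]
    )
    return f"CREATE TABLE {spec['name']} ({columns});"


def to_html_form(spec):
    """Generator for the user-interface subsystem."""
    inputs = "".join(
        f'<input name="{field}" type="{HTML_TYPES[kind]}">'
        for field, kind in spec["fields"]
    )
    return f'<form id="{spec["name"]}">{inputs}</form>'


if __name__ == "__main__":
    print(to_sql(SPEC))
    print(to_html_form(SPEC))
```

Adding a field to `SPEC` adds a column and a form input at the same time, which is the kind of coordinated change across subsystems described above.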

So, in biology, we would be looking for a system that recombined one way in one tissue, and recombined another way in another tissue, in such a way that variations in those genes would cause the two tissue types to change in coordination with each other.  Alternatively, we might be more likely to find, instead of a recombination system, a mechanism of alternative splicing, so that one gene is spliced in different ways, depending on the tissue, but spliced in such a way that changes within the gene through evolution would cause coordinating changes in the protein products in multiple tissues.  

NOTE - there are several claims here that are unreferenced.  If you are interested in them, mention it in the comments, and I will try to look it up for you.  As I said, it's been two years since I did this research, and it's been pretty much sitting on a shelf since then, so it may take me a few days to find.