12 Days of Christmas

I was recently tasked with figuring out an obfuscated C program. It is 822 characters of complete gibberish, yet magically prints the entirety of The Twelve Days of Christmas when run.

We can clean it up a bit by running it through clang-format. After that, we carefully replace the ternary operators with if statements and clean up the logic, making sure that the output is the same with the command $ diff <(./orig) <(./a.out). We end up with a slightly nicer-looking program.

It now looks similar to a typical C quine, except there are two data blocks here. Additionally, the call to putchar() looks a bit odd, until we remember that in C, a[b] and b[a] are the same:

        return putchar(a[31]);

Using gdb, we see that when the program is run with no arguments, the value of t is 1 (as t takes the value usually assigned to int argc. Therefore, control flow falls into the last if statement, running return main(2, 2, "%s");. However, we find through experimentation that the return values actually have no effect. We transform the calls to return main(...) with two statements, main(...); return 0. Since the nested calls to main() are effectively adding 0 to the value of a, we can unnest them, making sure to order them in the correct sequence. Our program now looks a lot less complex.

We run through it again, optimizing more aggressively this time. By tracing through the program and occasionally guessing, we find that some strings like "%s %d %d\n" are useless, as well as all the negative numbers. The useless strings (including where a is passed but not used) are replaced with the empty string "" and the negative numbers are replaced with a single value. This final pass allows us to visualize the program easier.

At this point, it's fairly obvious that the two strings have something in common. The parser prints the 32nd character of one of the strings when two characters are matching, and seems to stop when it reaches a slash (/). We write a small program to play around with this. For every character in the big string (except slashes, which are printed verbatim), we look through the small string for the corresponding character. Once we have a match, we offset our position by 31 in the small string and send it to putchar(), just like in the original program. Running this algorithm, we are greeted with the components of the poem, delimited with slashes, as expected:

On the /first/second/third/fourth/fifth/sixth/seventh/eigth/ninth/tenth/eleventh/twelfth/ day of Christmas my true love gave to me
/twelve drummers drumming, /eleven pipers piping, /ten lords a-leaping,
/nine ladies dancing, /eight maids a-milking, /seven swans a-swimming,
/six geese a-laying, /five gold rings;
/four calling birds, /three french hens, /two turtle doves
and /a partridge in a pear tree.
It turns out that the program in question is a condensed version of Ian Phillipp's 1988 ioccc submission. Although old, the challenge proved to be a fun exercise in reverse engineering.