ICFP 2007: Entangled in strings =============================== During the last weekend, I got hooked on the problem provided by this year ICFP challenge. I haven't participated officially, though. I knew in advance that I wouldn't solve problem, let alone win the contest. However, the contest's material is great. For me, it was a great learning experience, and a chance to test my new BytesIO module for Python 3000. If you are looking for a fun problem to spend your weekend on, this is the one. The only part I worked out is the DNA->RNA converter (I didn't really cared about rest of challenge). It took me about 4 hours to code my first prototype, accompanied with a full test suite. After that, I mostly spend my time trying to optimize it. My first prototype was slow; I mean really slow. It was barely able to get pass the first 5 iterations. So, I started to profile my code and remove bottlenecks. With my test suite, that was a fun and rewarding process. Then, I hit the wall of my naive approach -- i.e., about 100 iterations per second. Using a BytesIO object for storing the DNA sequence is simple, but inefficient since the data is always moving around. While optimizing, I tried some other approches too. For example, storing the DNA in reverse order to make prepending sequences efficent, but that mades other operations slower, thus making the whole thing even slower. I also tried the array module but without success. The reason why all these approches were doomed in advance is, they didn't solve the right problem. There is five critical operations on the DNA sequence: dna ← dna[n..] Removing the first n elements dna ← S + dna Prepending another sequence dna ← dna + S Appending another sequence dna[n..m] Taking the subsequence from n to m dna[..n] Taking the first nth elements Just by looking at this list, it is not hard to see why any array-based methods for representing the sequence will fail. Arrays are good for last two operations, but they are terrible on the other three. I had a few unexpected surprises too. For example, I found that using the Python implementation of StringIO was faster than the C one. After few ...