Archive for Python

Why you cannot pickle generators

Joseph Turian wrote a post about pickling generators on his blog. In his post, he says:

However, generators become problematic when you want to persist your experiment’s state in order to later restart training at the same place. Unfortunately, you can’t pickle generators in Python. And it can be a bit of a PITA to workaround this, in order to save the training state.

This caught my attention, because I was involved in the decision he cites not to allow generators to be pickled in CPython. Although Joseph’s examples are a bit convoluted, it is pretty clear why his generators cannot be pickled automatically: Python cannot pickle the operating system’s state, such as file descriptors.

Let’s ignore that problem for a moment and look at what we would need to do to pickle a generator. Since a generator is essentially a souped-up function, we would need to save its bytecode, which is not guaranteed to be backward-compatible between Python versions, and its frame, which holds the state of the generator such as local variables, closures and the instruction pointer. The latter is rather cumbersome to accomplish, since it basically requires making the whole interpreter picklable. So, any support for pickling generators would require a large number of changes to CPython’s core.

Now, if an object unsupported by pickle (e.g., a file handle, a socket, a database connection, etc.) occurs in the local variables of a generator, then that generator could not be pickled automatically, regardless of any pickle support for generators we might implement. So in that case, you would still need to provide custom __getstate__ and __setstate__ methods. This problem renders any pickling support for generators rather limited.

Anyway, if you need such a feature, then look into Stackless Python, which does all the above. And since Stackless’s interpreter is picklable, you also get process migration for free. This means you can interrupt a tasklet (the name for Stackless’s green threads), pickle it, send the pickle to another machine, unpickle it, resume the tasklet, and voilà, you’ve just migrated a process. This is a freaking cool feature!

But in my humble opinion, the best solution to this problem is to rewrite the generators as simple iterators (i.e., objects with a __next__ method). Iterators are easy and space-efficient to pickle because their state is explicit. You would still need to handle objects representing some external state explicitly, however; you cannot get around this.
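To make this concrete, here is a minimal sketch in Python 3. The Countdown class and the countdown_gen function are made-up examples, not code from Joseph’s post. It shows pickle rejecting a generator, while a half-consumed iterator whose state lives in ordinary attributes is saved and resumed without any extra work.

```python
import pickle


def countdown_gen(n):
    """Generator version: its state lives in a frame pickle cannot save."""
    while n > 0:
        yield n
        n -= 1


class Countdown:
    """Hypothetical iterator equivalent: the state is just the attribute n."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value


try:
    pickle.dumps(countdown_gen(3))
    generator_pickled = True
except (TypeError, pickle.PicklingError):
    generator_pickled = False

it = Countdown(5)
consumed = [next(it), next(it)]            # advance the iterator twice
restored = pickle.loads(pickle.dumps(it))  # save and reload mid-iteration
remaining = list(restored)                 # picks up where it left off
```

If the iterator held a file handle or a socket, you would still need __getstate__ and __setstate__ to decide how to reopen it.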

Porting your code to Python 3

The following is a write-up of the presentation I gave to a group of Python developers at Montreal Python 5 on February 26th. This is basically an HTML-ified copy of the notes I prepared before the presentation. I haven’t done any editing, so expect a few grammar mistakes here and there. My complete presentation slides are available here. A video was taped and should be released in the upcoming weeks (I will post a link here when I finally get my hands on it). Please note that if you’re looking for a more complete (and more accurate) guide to Python 3, I highly recommend that you read the What’s New In Python 3.0 document and the Python Enhancement Proposals numbered above 3000.

You may wonder why we did Python 3 after all. The motivation was simple: to fix old warts and to clean up the language before it was too late. Python 3 is not a complete rewrite of Python; it is still pretty much the good old Python you all love. But I am not going to lie. There are many changes in Python 3; many that will cause pain when you port your code; and so many that I won’t be able to cover them all in this talk. That is why I will focus only on the changes that you will need to know to port your code. If you want to learn about all the new and shiny features, you will need to visit the python.org website and the online documentation of Python 3.

In the second part of this presentation, I will go over the steps needed to port a real library to Python 3. Hopefully, this part will give you the basic knowledge and tools to tackle the problems linked to the migration.

Finally, I will give you an insider’s view of the upcoming changes in Python 3.1, which is supposed to be released later this year.

Let’s start with the most obvious change in Python 3: print is now a function. Some people really don’t like this change (mostly because it makes hello world one character longer). But making print a function is actually a good thing. First, it is more flexible; you can now change the string separator, pass print() as an argument or even override the function completely.

In addition, the syntax is much cleaner: no weird >>sys.stderr anymore. On the other hand, it is true that it takes a bit of time to get used to the extra parentheses. Thankfully, converting your code to use the new print() is easy and completely automated. You just run the 2to3 tool (I will talk more about 2to3 later) and you’re done.
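As a small illustration of these points (the strings and the buffer are arbitrary choices for the example), the following sketch uses the sep, end and file keywords of print() and passes the function itself to map():

```python
import io
import sys

buf = io.StringIO()

# A custom separator and a custom destination, both given as keywords.
print("2009", "02", "26", sep="-", file=buf)
print("no newline", end="", file=buf)

# Python 2 spelling was: print >>sys.stderr, "something went wrong"
print("something went wrong", file=sys.stderr)

# print is an ordinary function object, so it can be passed around.
list(map(print, ["spam", "eggs"]))
```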

There is one thing special about the keyword arguments of print(); they need to be explicitly written out. In other words, they can only be supplied as keywords and never as a positional argument.

This behavior is actually a new feature in Python 3, called keyword-only arguments. This is one of the things that might surprise you when you write new code with Python 3 (it did surprise me more than once), since the error message is not that great. It makes sense from an implementation point of view, but not so much from the user’s point of view. I hope someone will suggest something better in the future, but in the meantime we have to live with this funky error message.

Keyword-only arguments are really useful when you have a function that takes a variable number of arguments and you want to add options to it, just like print(). Another good use of this feature is forcing your API users to explicitly write out their intent. For example, this is currently done for the list.sort() method and the sorted() function.

Finally, the syntax for making a function take keyword-only arguments is the following:
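The original slide is not reproduced in these notes, so here is a minimal sketch of the two spellings; the log and connect functions are invented for the example. Parameters that follow *args, or a bare *, can only be supplied by keyword.

```python
def log(*messages, sep=" ", prefix=""):
    """Everything after *messages can only be passed by keyword."""
    return prefix + sep.join(messages)


def connect(host, *, timeout=10):
    """A bare * ends the positional parameters without collecting extras."""
    return (host, timeout)


joined = log("a", "b", sep=", ", prefix="> ")

try:
    connect("example.org", 30)   # timeout given positionally: rejected
    error = None
except TypeError as exc:
    error = str(exc)             # the not-so-great error message
```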

There is also a way to do the same thing in C, but that is out of scope of this presentation.

Now, let me introduce you to the big change in Python 3: Unicode throughout. (Ed. There was big applause when I announced this at Montreal Python. So, I guess the conversion pain was worth it.) This is huge; it took six Python Enhancement Proposals (PEPs, for short) to cover the changes related to Unicode. And I am pretty sure that not everything is covered in these. For this reason, I hope you will understand that I cannot cover everything today. So, what are these changes?

For one, all strings are Unicode by default. This means you can no longer treat text as bytes, or vice versa. For example, if you read some bytes from disk or a network, you will need to decide whether it is data or text; and this isn’t always obvious. Is a filename data or text? Is a command-line argument data or text? Are environment variables data or text? In many cases, Python core developers had to make compromises when converting the old APIs to Unicode.

So, let’s examine the case of filenames. The first problem we run into is: how do we detect the character encoding used by the filesystem? There is no standard way of doing this that works on every platform supported by Python. On Mac OS X, life is simple; we just use UTF-8. On Windows, we can use the Wide API and things mostly work. On Unix however, the encoding can be anything. So, we cannot tell in advance what the encoding will be; we have to detect it at runtime with the langinfo API (if present). And this leads to some interesting bootstrapping issues, since some codecs in Python are not built-in. For example, there are known problems with Python scripts running from a directory whose path contains non-ASCII characters.

Another problem we run into is: how should we handle filenames encoded incorrectly? Even if we know that the filesystem uses UTF-8, that doesn’t mean all filenames will be valid UTF-8 byte sequences. On Unix, for example, only NUL and slash cannot appear in a filename; so, it is possible to construct filenames that cannot be interpreted as a text string. And this is basically what I want to say; it is not always clear what is text and what is data. So in Python 3, most system APIs accept bytes as well as strings as a work-around.
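Here is a short sketch of that work-around with os.listdir(), which returns names of the same type as the path it was given. The file name is arbitrary, and os.fsencode() is a helper from later Python 3 releases, used here only to build the bytes path.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w"):
        pass  # create an empty file to list

    as_text = os.listdir(tmp)                # str path in, str names out
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes path in, bytes names out
```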

However, the problems I have described are not as bad as they sound. In most cases, the Unicode enhancements will lead to better code and also fewer bugs. And having Unicode throughout has opened the door for other internationalization improvements as well. One of these improvements is that non-ASCII identifiers are now supported (but not advocated).

Another feature of Python 3 is the new I/O library designed with Unicode in mind. From a core developer’s point of view, this change is fairly large: a departure from C stdio and a brand-new I/O class hierarchy completely written in Python (which is currently being rewritten in C for performance). However, from the point of view of a typical Python developer, there isn’t much that has changed. I/O still works the same as before; open() still returns a file-like object, which can be written to and read from just like before.

But if you want more control over your I/O, you can now have it. Just import the io module, and use or derive a class that fits your needs. One nice thing about the new I/O is that once you’ve defined the raw byte-based interface, you can easily add buffering and text-handling features.

Take for example a network socket. What can we do with a socket? Well, we can read some bytes from it and maybe also write to it. But we cannot seek it like a file. Usually, we call such objects streams. So, we can derive our SocketIO class from io.RawIOBase and define our methods. Need buffering? Just wrap an instance of SocketIO with io.BufferedReader or io.BufferedWriter. Need text-handling too? Well, wrap your instance with io.TextIOWrapper. And that’s all there is to it.
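The layering can be sketched as follows. This SocketIO is a simplified, illustrative class written for this example (the standard socket module ships its own, more complete version), and socket.socketpair() stands in for a real network connection.

```python
import io
import socket


class SocketIO(io.RawIOBase):
    """Raw, unseekable byte stream over a connected socket (illustrative)."""

    def __init__(self, sock):
        super().__init__()
        self._sock = sock

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, buffer):
        # Returning 0 tells the layers above that the peer closed the stream.
        return self._sock.recv_into(buffer)

    def write(self, data):
        return self._sock.send(data)


left, right = socket.socketpair()
right.sendall("héllo\nworld\n".encode("utf-8"))
right.close()  # the reader will see end-of-file after the data

raw = SocketIO(left)                                    # bytes, unbuffered
buffered = io.BufferedReader(raw)                       # adds buffering
reader = io.TextIOWrapper(buffered, encoding="utf-8")   # adds decoding
lines = reader.readlines()

reader.close()
left.close()
```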

If you’re used to Java I/O libraries, this should sound fairly similar; and this is intentional. The main difference is that the new I/O in Python is simpler. If you want to learn more about the details of the new I/O library, I encourage you to read the PEP and the online documentation.

Now, let’s talk about the change that will probably cause the most pain during the transition: the standard library reorganization. In Python 3, many modules were removed, renamed and repackaged. Initially, the reorganization was not part of the plans for Python 3. But since Python 3 was going to be backward-incompatible anyway, many developers (myself included) saw a chance to clean up the library and remove the silly old stuff all at once. So, instead of having many incompatible releases over time, we have one big one.

Thankfully, the 2to3 tool will handle almost all the work for you. Unfortunately, 2to3 won’t help with removals. This means you will need to change your code to stop using the removed modules before porting to Python 3. PEP 3108 documents all the changes we have done; it also suggests replacements for modules that were removed. So, this should be the first place to look whenever you have a problem with a reorganized module.

Also, if you use the pickle module, the standard library reorganization will make it hard for you to create pickle data streams that work on both Python 2 and 3. The problem is that pickle saves class and function objects by named reference. This means that if you have pickle data created with Python 2 containing an instance whose class was renamed in Python 3, pickle will not be able to recreate the instance in question. Unfortunately, there is nothing yet to help you with that problem. Although it is possible to subclass Unpickler and modify it to rewrite names on-the-fly, this is not very convenient.
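For the record, the Unpickler approach could look roughly like this in Python 3, where find_class() is the documented hook for resolving names. The module name oldlib and the RENAMES table are hypothetical; a real table would list the names your own pickles actually contain.

```python
import io
import pickle

# Hypothetical mapping from (old module, old name) to the new location.
RENAMES = {
    ("oldlib", "OrderedDict"): ("collections", "OrderedDict"),
}


class RenamingUnpickler(pickle.Unpickler):
    """Rewrite renamed globals on the fly while loading."""

    def find_class(self, module, name):
        module, name = RENAMES.get((module, name), (module, name))
        return super().find_class(module, name)


# A hand-written protocol 0 stream that calls oldlib.OrderedDict().
stream = b"coldlib\nOrderedDict\n(tR."
obj = RenamingUnpickler(io.BytesIO(stream)).load()
```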

In addition to the stdlib reorganization, the behavior of some well-known APIs has changed. In particular, many methods that used to return lists now return an iterator or a view. For example, dict’s keys(), items() and values() (Ed. values() is not actually a set-like object for the obvious reason that a dictionary may contain duplicate values. This was an error on my part.) are no longer lists; they return a set-like object called a view. Personally, I found this change very nice when working with graphs implemented using dicts, because I could now use standard set operations, like union and difference, on the views.
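A tiny sketch of what this looks like, using a made-up adjacency dict: key views support the usual set operators and always reflect the current state of the dictionary.

```python
# A small graph stored as an adjacency dict (illustrative data).
graph = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
visited = {"a": True, "b": True}

keys = graph.keys()                 # a view, not a list
unvisited = keys - visited.keys()   # set difference on views
shared = keys & visited.keys()      # set intersection on views

graph["d"] = set()                  # views are live: keys now contains "d"
sees_new_node = "d" in keys
```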

Similarly, many built-in functions now return iterators instead of lists. This is the case for map(), filter() and zip(). For map() and filter(), it is typically a good idea to rewrite them as list comprehensions. Another change along the same lines is that range() now behaves like the old xrange(). For most code, this requires no modifications. Again, 2to3 handles these changes for you.
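The practical consequence is that the result of map() can only be consumed once, as this short sketch with arbitrary numbers shows:

```python
numbers = [1, 2, 3, 4]

lazy = map(lambda n: n * n, numbers)
first_pass = list(lazy)    # consumes the iterator
second_pass = list(lazy)   # nothing left the second time

# List comprehensions give real lists and usually read better.
squares = [n * n for n in numbers]
evens = [n for n in numbers if n % 2 == 0]

# range() no longer builds a list, so huge ranges are cheap.
big = range(10 ** 12)
last = big[-1]
```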

Continuing on API changes, some special methods have been removed or renamed. For example, the next() method on iterators is now called __next__(). To get the next item of an iterator, use the built-in function next().
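As an illustration, here is a made-up Repeat iterator written for Python 3. The alias on the last line of the class is one simple way, among others, to keep the same class working on Python 2.

```python
class Repeat:
    """Yield the same value a fixed number of times (illustrative)."""

    def __init__(self, value, times):
        self.value = value
        self.times = times

    def __iter__(self):
        return self

    def __next__(self):
        if self.times <= 0:
            raise StopIteration
        self.times -= 1
        return self.value

    next = __next__  # Python 2 spelling, only needed for dual-version code


first = next(Repeat("y", 1))   # the built-in next() calls __next__()
items = list(Repeat("x", 2))
```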

Also, __getslice__ and friends were removed in favor of __getitem__.
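In practice this means __getitem__ receives a slice object for expressions like obj[1:4]. A minimal sketch, with an invented Window class:

```python
class Window:
    """A sequence wrapper whose __getitem__ handles indexes and slices."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # obj[a:b:c] arrives here as slice(a, b, c).
            return Window(self._items[index])
        return self._items[index]


w = Window("abcdef")
single = w[0]
part = w[1:4]
```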

The special methods __hex__ and __oct__ were removed in favor of __index__(). Generally, this requires no change in your code. Note, 2to3 will not remove the old methods.
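For instance, a class that defines only __index__ works with hex(), oct(), bin() and as a sequence index. The Register class below is invented for the example.

```python
class Register:
    """An integer-like object that only defines __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


r = Register(10)
as_hex = hex(r)
as_oct = oct(r)
as_bin = bin(r)
letter = "abcdefghijklmnop"[r]   # __index__ also drives indexing
```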

Another fairly important change in Python 3 is the simplification of the rules for ordering comparisons. In Python 3, the old three-way comparison rules have been completely replaced by a much simpler (and faster too) mechanism (Ed. There wasn’t much rejoicing when I presented this change. People kept asking why Python doesn’t generate comparison methods automatically from __lt__ and __eq__).

Clearly, 2to3 won’t translate old three-way compares, so you will need to support both three-way and rich comparisons if you want your code to work on both Python 2 and 3. The changes needed are usually straightforward, so this is generally not a problem.
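A sketch of the rich-comparison style, with an invented Version class. Note that functools.total_ordering was only added later (in Python 2.7 and 3.2); it fills in the remaining methods from __eq__ and __lt__, which is essentially what the audience was asking for.

```python
import functools


@functools.total_ordering
class Version:
    """Ordered by (major, minor) using rich comparisons only."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


ordered = sorted([Version(2, 0), Version(1, 5), Version(1, 0)])
newer = Version(1, 2) >= Version(1, 0)   # derived by total_ordering

# Sorting with a cmp function becomes sorting with a key function.
names = sorted(["banana", "Apple", "cherry"], key=str.lower)
```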

We already saw that the syntax for the print statement and Unicode strings has changed. So, the remaining changes I want to talk about are the other syntactic changes in Python 3. For the most part, the new syntax niceties are also available in Python 2.6 as optional features; the difference in Python 3 is that you’re now required to use them. But don’t worry, 2to3 will handle these changes fairly well. So what are these changes?

First, we have new syntax for catching and raising exceptions. In particular, the syntax for saving an exception was simplified to remove ambiguities.

Similarly, the syntax for raising exceptions was simplified. Note that all exceptions must derive from BaseException or, more commonly, from the Exception class. This was optional in Python 2; this is now enforced in Python 3. As a consequence of the new syntax, tracebacks must now be set explicitly via the __traceback__ attribute. However, if you need to do that, you probably want to check out a new feature called exception chaining.
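Put together, the new forms look roughly like this; parse_port is a made-up helper for the example. The exception is bound with as, raised as an instance, and chained to its cause with from.

```python
def parse_port(text):
    """Convert text to an int, chaining the low-level error on failure."""
    try:
        return int(text)
    except ValueError as exc:      # Python 2 spelling: except ValueError, exc
        raise RuntimeError("bad port: %r" % text) from exc


try:
    parse_port("http")
except RuntimeError as err:
    cause = err.__cause__          # the original ValueError
    traceback = err.__traceback__  # the traceback lives on the exception
```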

In Python 3, we also have new syntax for specifying metaclasses. To do so, we allowed keyword arguments after the list of base classes in the class definition. Currently, this is only used to support the new metaclass syntax; but this could be used for other purposes as well, as long as the metaclass used supports it.
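Here is a minimal sketch of the new syntax. The Tagged metaclass and its tag keyword are invented to show how a metaclass can accept extra keywords from the class statement.

```python
class Tagged(type):
    """Metaclass accepting an optional tag keyword (illustrative)."""

    def __new__(mcls, name, bases, namespace, tag=None):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.tag = tag
        return cls

    def __init__(cls, name, bases, namespace, tag=None):
        # The keyword is passed here as well; swallow it before delegating.
        super().__init__(name, bases, namespace)


# Python 2 spelling was a __metaclass__ attribute in the class body.
class Plain(metaclass=Tagged):
    pass


class Labeled(metaclass=Tagged, tag="demo"):
    pass
```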

Continuing, relative imports now need to use the from-dot-package syntax. If you omit the dot, it will be interpreted as an absolute import.

Now, let me show you two lovely additions to Python 3: set and dict comprehensions. But first, I need to introduce the new syntax for set literals.

From there, we can almost guess the syntax of set and dict comprehensions.
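The slides are not reproduced here, so this is a short reconstruction with arbitrary data:

```python
words = ["spam", "eggs", "spam", "ham"]

literal = {1, 2, 3}      # set literal
empty = set()            # {} is still an empty dict, not an empty set

lengths = {len(w) for w in words}              # set comprehension
counts = {w: words.count(w) for w in words}    # dict comprehension
```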

So, we are now ready for the real thing: migrating to Python 3. There is more than one way to approach the migration and there is no approach that will fit all your needs. In many cases, you have to experiment and choose whatever works best for you. Also, I am only going to cover the issue of migrating Python code. If you want to migrate C extensions, you will need to check out the online documentation.

The very first step before migrating is to verify you have excellent test coverage. If you do not have a test suite, it would be a good time to start investing time to create one. I wouldn’t even think about migrating to Python 3 without a test suite, since it is practically impossible to predict where your code is going to break.

Once you have verified that your test suite is in good shape, you should begin by porting your code to Python 2.6; generally this is effortless. Then, turn on the -3 flag of Python 2.6. This will enable warnings about features that have been removed or changed in Python 3. Run your tests and fix all the warnings you see.

It is also a good idea to modernize your code at this stage, and try to reduce the semantic gaps as much as possible. For example, start using the iterator version of dict.keys(), .values() and .items(); avoid implicit str and unicode coercions; use __getitem__ instead of __getslice__; etc. Doing this will decrease the amount of changes the 2to3 translator will have to do, and thus reduce the chances of introducing new bugs.

Once you are done with that, you are now ready to port your code to Python 3. This is where it becomes tricky. First, you will need to decide how you will maintain the Python 3 version of your project.

There are three main possibilities at this point. You can, for one, remove support for Python 2 and move your project completely to Python 3. This is not a good idea if you already have a lot of users (in the case of a library).

Another possibility is to modify your code so that the 2to3 tool can translate it, without manual intervention, to a working Python 3 version. This is the approach recommended by Python’s core-devel team if you maintain a library that needs to support both Python 2.6 and Python 3. So when you make changes to your code, you edit the Python 2.6 version and run the 2to3 tool again to forward your changes, rather than editing the Python 3 version of the source code. This approach works, but I find it unnecessarily painful as you still end up maintaining a lot of cruft.

So, the approach I prefer is to create a separate branch for Python 3 and start maintaining two lines of development. This works great if you use one of these fancy DVCSs, as you can make your changes in the Python 2 branch and then forward them to the Python 3 branch by simply merging them. And when there are incompatibilities, you can run the 2to3 tool on the Python 3 branch and it will fix these for you. An advantage of this approach is that it gives you a chance to clean up your code and remove, from the Python 3 version, all that backward-compatibility stuff you may have accumulated over the years. And for many projects, this will be the only acceptable approach (mainly because of the Unicode changes).

Now, I would like to demo some of the features that are available that will ease the transition to Python 3.

Ed. In this part, I showed a short demo of how to use 2to3 to convert feedparser to Python 3.0. This portion of the presentation was not prepared in advance and was interactive. If you want to see it, you will need to watch the video.

Concluding remarks:

Ending note. If you appreciated the content of this presentation or have suggestions, let me know! I am currently planning to do another talk at Montreal Python about extending Python and interfacing it with external code. This presentation would mostly cover how to write extensions using the C API of Python. As you can imagine, preparing a good presentation is a lot of work. So any encouragement is welcome.

Summer of Code Weekly #4

All is well for me and my project. I finished the merge of cStringIO and StringIO, and I am now moving to the more challenging cPickle/pickle merge. During the last two weeks, I mostly spent my time analyzing the pickle module and thinking about how I will clean up cPickle. My current plan is:

  1. Make cPickle’s source code conform to PEP-7.
  2. Remove the dependency on the now obsolete cStringIO.
  3. Benchmark cPickle and pickle.
  4. Add subclassing support to Pickler/Unpickler.
  5. Reduce the size of cPickle’s source code based on the bottlenecks found by the benchmarks.

Hopefully, the cPickle/pickle merge will be as smooth (and as fun) as the cStringIO/StringIO merge.

Pickle: An interesting stack language

The pickle module provides a convenient method to add data persistence to your Python programs. How it does that is pure magic to most people. However, in reality, it is simple. The output of a pickle is a “program” able to create Python data-structures. A limited stack language is used to write these programs. By limited, I mean you can’t write anything fancy like a for-loop or an if-statement. Yet, I found it interesting to learn. That is why I would like to share my little discovery.

Throughout this post, I use a simple interpreter to load pickle streams. Just copy-and-paste the following code in a file:

import code
import pickle
import sys

sys.ps1 = "pik> "
sys.ps2 = "...> "
banner = "Pik -- The stupid pickle loader.\nPress Ctrl-D to quit."

class PikConsole(code.InteractiveConsole):
    def runsource(self, source, filename="<stdin>"):
        if not source.endswith(pickle.STOP):
            return True  # more input is needed
        try:
            print repr(pickle.loads(source))
        except:
            self.showsyntaxerror(filename)
        return False

pik = PikConsole()
pik.interact(banner)

Then, launch it with Python:

$ python pik.py
Pik -- The stupid pickle loader.
Press Ctrl-D to quit.
pik>

So, nothing crazy yet. The easiest objects to create are the empty ones. For example, to create an empty list:

pik> ].
[]

Similarly, you can also create a dictionary and a tuple:

pik> }.
{}
pik> ).
()

Notice that every pickle stream ends with a period. That symbol pops the topmost object from the stack and returns it. So, let’s say you pile up a series of integers and end the stream. Then, the result will be the last item you entered:

pik> I1
...> I2
...> I3
...> .
3

As you see, an integer starts with the symbol ‘I’ and ends with a newline. Strings and floating-point numbers are represented in a similar fashion:

pik> F1.0
...> .
1.0
pik> S'abc'
...> .
'abc'
pik> Vabc
...> .
u'abc'

Now that you know the basics, we can move to something slightly more complex — constructing compound objects. As you will see later, tuples are everywhere in Python, so let’s begin with that one:

pik> (I1
...> S'abc'
...> F2.0
...> t.
(1, 'abc', 2.0)

There are two new symbols in this example, ‘(’ and ‘t’. The ‘(’ is simply a marker. It is an object in the stack that tells the tuple builder, ‘t’, when to stop. The tuple builder pops items from the stack until it reaches a marker. Then, it creates a tuple with these items and pushes this tuple back on the stack. You can use multiple markers to construct a nested tuple:

pik> (I1
...> (I2
...> I3
...> tt.
(1, (2, 3))

You use a similar method to build a list or a dictionary:

pik> (I0
...> I1
...> I2
...> l.
[0, 1, 2]
pik> (S'red'
...> I00
...> S'blue'
...> I01
...> d.
{'blue': True, 'red': False}

The only difference is that dictionary items are packed by key/value pairs. Note that I slipped in the symbols for True and False, which look like the integers 0 and 1, but with an extra zero.

Like tuples, you can nest lists and dictionaries:

pik> ((I1
...> I2
...> t(I3
...> I4
...> ld.
{(1, 2): [3, 4]}

There is another method for creating lists or dictionaries. Instead of using a marker to delimit a compound object, you create an empty one and add stuff to it:

pik> ]I0
...> aI1
...> aI2
...> a.
[0, 1, 2]

The symbol ‘a’ means “append”. It pops an item and a list; appends the item to the list; and finally, pushes the list back on the stack. Here is how you build a nested list with this method:

pik> ]I0
...> a]I1
...> aI2
...> aa.
[0, [1, 2]]

If this is not cryptic enough for you, consider this:

pik> (lI0
...> a(lI1
...> aI2
...> aa.
[0, [1, 2]]

Instead of using the empty list symbol, ‘]’, I used a marker immediately followed by a list builder to create an empty list. That is the notation the Pickler object uses, by default, when dumping objects.

Like lists, dictionaries can be constructed using a similar method:

pik> }S'red'
...> I1
...> sS'blue'
...> I2
...> s.
{'blue': 2, 'red': 1}

However, to set items in a dictionary you use the symbol ‘s’, not ‘a’. Unlike ‘a’, it takes a key/value pair instead of a single item.

You can build recursive data-structures, too:

pik> (Vzoom
...> lp0
...> g0
...> a.
[u'zoom', [...]]

The trick is to use a “register” (or, as it is called in pickle, a memo). The ‘p’ symbol (for “put”) copies the top item of the stack into a memo. Here, I used ‘0’ for the name of the memo, but it could have been anything. To get the item back, you use the symbol ‘g’ (for “get”). It copies an item from a memo and puts it on top of the stack.
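If you want to check how such a stream is read without typing it into the interpreter above, the following sketch (written for Python 3, so the stream is a bytes literal) loads the same recursive list and lists its opcodes with pickletools.genops():

```python
import pickle
import pickletools

# The same stream as above: build ['zoom'], memoize it, append it to itself.
stream = b"(Vzoom\nlp0\ng0\na."

loaded = pickle.loads(stream)

# genops() yields (opcode, argument, position) for each symbol in the stream.
names = [opcode.name for opcode, arg, pos in pickletools.genops(stream)]
```

Calling pickletools.dis(stream) prints the same information as an annotated listing.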

But, what about sets? Now, we have a small problem, since there is no special notation for building sets. The only way to build a set is to call the built-in function set() on a list (or a tuple):

pik> c__builtin__
...> set
...> ((S'a'
...> S'a'
...> S'b'
...> ltR.
set(['a', 'b'])

There are a few new things here. The ‘c’ symbol retrieves an object from a module and puts it on the stack. And the reduce symbol, ‘R’, applies a function to a tuple of arguments. The semantics are the same as before: ‘R’ pops a tuple and a function from the stack, then pushes the result back on it. So, the above example is roughly the equivalent of the following in Python:

>>> import __builtin__
>>> apply(__builtin__.set, (['a', 'a', 'b'],))

Or, using the star notation:

>>> __builtin__.set(*(['a', 'a', 'b'],))

And, that is the same thing as writing:

>>> set(['a', 'a', 'b'])

Or shorter even, using the set notation from the upcoming Python 3000:

>>> {'a', 'a', 'b'}

These two new symbols, ‘c’ and ‘R’, allow us to execute arbitrary code from the standard library. So, you must be careful to never load untrusted pickle streams. Someone malicious could easily slip into the stream a command to delete your data. Meanwhile, you can use that power for something less evil, like launching a clock:

pik> cos
...> system
...> (S'xclock'
...> tR.

Even if the language doesn’t support looping directly, that doesn’t stop you from using the implicit loops:

pik> c__builtin__
...> map
...> (cmath
...> sqrt
...> c__builtin__
...> range
...> (I1
...> I10
...> tRtR.
[1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.2360679774997898,
2.4494897427831779, 2.6457513110645907, 2.8284271247461903, 3.0]

I am sure you could fake an if-statement by defining it as a function, and then load it from a module.

def my_if(cond, then_val, else_val):
    if cond:
        return then_val
    else:
        return else_val

That works well for simple cases:

>>> my_if(True, 1, 0)
1
>>> my_if(False, 1, 0)
0

However, you run into some problems if you mix that with recursion, because both branches are evaluated before my_if is called:

>>> def factorial(n):
...     return my_if(n == 1,
...                  1, n * factorial(n - 1))
...
>>> factorial(2)
RuntimeError: maximum recursion depth exceeded in cmp

On the other hand, I don’t think you really want to create recursive pickle streams, unless you want to win an obfuscated code contest.

That is about all I had to say about this simple stack language. There are a few things I haven’t told you about, but I am sure you will be able to figure them out. Just read the source code of the pickle module. And take a look at the pickletools module, which provides a disassembler for pickle streams. As always, comments are welcome.

Summer of Code Weekly #3

During this third week of the Summer of Code, I found it very difficult to concentrate on my work; I have been a lightbulb instead of a laser. The result was little code done. On the other hand, I learned a lot about other things. For example, I now finally understand assembly language; how to use gdb; the basics of the design of the Linux kernel; etc.

I also read the book “Producing Open Source Software”, by Karl Fogel. It is a really good primer on the world of free software. If you have a burning desire to contribute to open source projects, just like me, I highly recommend that you get your own copy, or read it online.

Summer of Code Weekly #2

I can confirm it now: this second week of coding was even better. It was harder on my brain cells, though. I am mostly done with the StringIO merge. I now have working implementations in C of the BytesIO and the StringIO objects. The only thing remaining to do, for these two modules, is polishing the unit tests. And that shouldn’t take me very long to do. So, in basically one week of work, I completed the merge of cStringIO. I am certainly proud of that.

Now, I will need to attack the cPickle and cProfile modules. I don’t know yet which I will work on first. cPickle still seems very scary to me, and unlike cStringIO it’s huge. It’s about five or six times bigger. cProfile, on the other hand, is about the same size as cStringIO and well documented. I even wonder if I need to code anything for cProfile. It will be a piece of cake to merge. Now, one question remains: should I take the cake now, or keep it for the end?

Summer of Code Weekly #1

During this summer, I will post each week a short summary of what I did, the challenges I encountered and what I learned during my Summer of Code project. I am doing this to help me keep track of my progress.

So how was my first week? It was great. I don’t know why, but I love programming in C. It is just plain fun. I thought learning the Python C API was going to be hard, but it is quite easy after all. I just read the code in Python itself and check the reference manual for the things I don’t know. My biggest surprise this week was learning how to do subclassable types. It is strikingly easy, although quite verbose. You can look at my scratch extension module, if you want a minimal working example.

Other than learning the C API, I started working on the cStringIO/StringIO merge. My current plan is to separate the cStringIO module into two private submodules, _bytes_io and _string_io. One will be for bytes literals (ASCII), and the other for Unicode. This will reflect the changes made to the I/O subsystem in Python 3000. These two submodules will provide optional implementations for the speed-critical methods, like .read() and .write().

One of the best things this week was the great feedback I got from other Python developers, and particularly from my mentor Brett Cannon, who cheerfully answers all my questions. Now, I just hope the following week will be as fun as this one, or even more.

Smoked brains for dinner

Today, there will be a special quiz on Python hosted by me, in #ubuntu-trivia on FreeNode, at 20:00 UTC. Most of the quiz will be to write some simple procedures, faster than your opponents. The winner will, of course, get a superb prize: 5 Ubuntu stickers! Obviously, the real prize is the fun you will have during the quiz. And who knows, maybe you will learn a few neat tricks. So, see you there!

Boosted Python Startup

Yesterday, I was reading Peter Norvig’s excellent article about spell checking. Then, I started to look at some of his older stuff. So, I found his Python IAQ (Infrequently Answered Questions), and discovered a pretty neat trick:

import sys

h = [None]  # history

class Prompt:
    """A prompt with a history mechanism.
    From http://www.norvig.com/python-iaq.html
    """
    def __init__(self, prompt='h[%d] >>> '):
        self.prompt = prompt

    def __str__(self):
        try:
            if _ not in h: h.append(_)
        except NameError:
            pass
        return self.prompt % len(h)

    def __radd__(self, other):
        return str(other) + str(self)

sys.ps1 = Prompt()
sys.ps2 = '     ... '

This improves the interactive prompt of Python with a shell-like history mechanism. With this prompt, you can reuse any previous value returned by Python. For example:

h[1] >>> lambda x: x * 2
<function <lambda> at 0xb7dab41c>
h[2] >>> [1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
h[3] >>> map(h[1], h[2])
[2, 4, 6, 8, 10]

You can make it your default prompt, by adding the above in your .pythonrc.py. You will need to specify its location to Python with the environment variable PYTHONSTARTUP. Just add something like the following to your shell configuration (e.g., .bashrc or .zshrc).

export PYTHONSTARTUP="$HOME/.pythonrc.py"

I am sure there are a ton of other useful modifications that can be done with the startup file of Python. If you’re interested, here is my brand new startup file. And if you know any other cool tricks for Python, please tell me!

Flipping bits this summer

Dear Applicant, Congratulations! This email is being sent to inform you that your application was accepted to take part in the Summer of Code.

Today, I am truly happy. I wasn’t expecting to be accepted, really, and perhaps no other candidate was either. My accepted project is to merge the C and Python implementations of the same interface (i.e., StringIO/cStringIO, Pickle/cPickle, etc.), and my mentor is the Python star developer, Brett Cannon. This will be a challenging project; I will have to work hard and efficiently to be successful. But one thing is sure, I will have some great fun.

I would like to congratulate everyone who has been accepted. A special thanks to the students who will be working on Ubuntu this summer. There are surely some great projects for Ubuntu. And also, another special thanks to the mentors, who will be helping us this summer.