Hexspeak and Generator

EDIT: 2011-04-21 While browsing my old blog posts, I found the code here can be greatly improved. Here it is:

def words():
    with open('/usr/share/dict/words', 'r') as f:
        return (x.strip().upper() for x in f.readlines())

MAPPING = {'A':'A', 'B':'B', 'C':'C', 'D':'D',
           'E':'E', 'F':'F', 'O':'0', 'S':'5', 'I':'1'}

def main():
    is_hexword = lambda word: all(ch in MAPPING for ch in word)
    for word in filter(is_hexword, words()):
        print word, "\t", ''.join(MAPPING.get(ch, ch) for ch in word)

if __name__ == '__main__':
    main()

So, it’s a relatively slow day at work, and I was “stumbling upon” pages on Wikipedia when I found this:


It’s pretty interesting, because I remember OS161 from my operating systems course, where 0xDEADBEEF was used as a value for uninitialized pointers. So I decided to write a small Python program that finds all the “Hexspeak” words in a regular English dictionary.

I began by finding a plain-text English dictionary. I knew Linux has a file named ‘words’ somewhere in the file system. A little searching gave me its location: /usr/share/dict/words

Now onto Python coding:

Define a character-to-hex map:

HEXSPEAK_DICT={'A':'A', 'B':'B', 'C':'C', 'D':'D', 'E':'E', 'F':'F', 'O':'0', 'L':'1', 'S':'5', 'I':'1', }

Define a method to open the dictionary and return a list of words:

DICT = '/usr/share/dict/words'

def words(file=DICT):
    # Read the whole file, then normalize each line to an upper-case word.
    with open(file, 'r') as f:
        retval = f.readlines()
    return [x.strip().upper() for x in retval]

Define a method to print out the collection of hexspeak words:

def print_dict(dict):
    for key in dict.keys():
        print "%-20s\t%s"%(key, dict[key])

The program itself is pretty straightforward. We go through every character of every word in the dictionary. If a character is not a permitted Hexspeak character, we throw out the word. Otherwise, we translate the character into its hexspeak equivalent if necessary. Finally, we print out the whole thing.
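The main loop itself is not shown in this post, so here is a minimal sketch of what the paragraph describes, not the original code. The names to_hexspeak and find_hexspeak and the sample word list are made up for illustration, and the sketch is written to run on Python 3 as well:

```python
# Character-to-hex map, same entries as HEXSPEAK_DICT above.
HEXSPEAK_DICT = {'A': 'A', 'B': 'B', 'C': 'C', 'D': 'D', 'E': 'E',
                 'F': 'F', 'O': '0', 'L': '1', 'S': '5', 'I': '1'}


def to_hexspeak(word, mapping=HEXSPEAK_DICT):
    """Return the hexspeak spelling of word, or None if any character is not allowed."""
    out = []
    for ch in word:
        if ch not in mapping:
            return None          # throw out the whole word
        out.append(mapping[ch])  # translate, e.g. 'O' -> '0'
    return ''.join(out)


def find_hexspeak(words, mapping=HEXSPEAK_DICT):
    """Collect {word: hexspeak} for every non-empty word that translates cleanly."""
    result = {}
    for word in words:
        hexword = to_hexspeak(word, mapping)
        if word and hexword is not None:
            result[word] = hexword
    return result


# Hypothetical sample input standing in for the dictionary file.
sample = ['DEAD', 'BEEF', 'COFFEE', 'PYTHON', 'SEAL']
found = find_hexspeak(sample)
```

Here found should hold DEAD, BEEF, COFFEE (as C0FFEE) and SEAL (as 5EA1), while PYTHON is dropped because of the P. The resulting dict is the kind of thing print_dict expects.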

Running it gives me a list of hexspeak words. Everything is cool.

Now, the idea of generators has been around for a while. If you’re operating on a list, instead of loading the whole list into memory at once, you can use a generator that returns one item at a time, which can save some resources. Granted, I’ve never used generators before, so I decided to experiment with one.

Here’s the code. Calling xwords() returns a generator, and every time it is asked for a value, it yields the next processed word in the list.

def xwords(file=DICT):
    f = open(file, 'r')
    retval = f.readlines()

    for rv in retval:
        yield rv.strip().upper()
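To make the mechanics concrete: calling a generator function does not run its body. It hands back a generator object, and each next() call runs the body up to the next yield. A small sketch, with an in-memory list standing in for the dictionary file (upper_words is a made-up name for illustration):

```python
def upper_words(lines):
    """Yield each line stripped and upper-cased, one at a time."""
    for line in lines:
        yield line.strip().upper()


gen = upper_words(['dead\n', 'beef\n'])  # nothing has been processed yet
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after that yield
rest = list(gen)     # the generator is exhausted, so this is empty
```

A for loop does the same thing behind the scenes, calling next() until the generator runs out.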

Now we need to modify WORDS to make it point to xwords:

WORDS = xwords()

Running it gives the same result. So my experiment with generators succeeded! Hooray!

However, a little timing of the program contradicted my intuition that using a generator would be faster. I used Python’s timeit module:

python -m timeit -n 3 'import hexspeak; hexspeak.main()'

Using generator:

3 loops, best of 3: 565 msec per loop

Using list:

3 loops, best of 3: 549 msec per loop

Hmmm, so not only does using a generator not save me any time, it actually got beaten by a tad by the plain ol’ list implementation… I’m sure I’m missing something here. A good topic for a blog post for another day.
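One likely factor, offered as a guess rather than a measurement: xwords() still calls readlines(), so the whole file is loaded into a list before the first yield, and the generator only adds a bit of per-item overhead on top. Generators mostly help with memory rather than speed, and a gap this small may just be noise. A sketch of a lazier variant that iterates over the file object directly (lazy_words is a hypothetical name, and the temporary file is a stand-in for /usr/share/dict/words):

```python
import os
import tempfile


def lazy_words(path):
    """Yield upper-cased words one line at a time, without readlines()."""
    with open(path, 'r') as f:
        for line in f:  # the file object hands out lines on demand
            yield line.strip().upper()


# Small stand-in dictionary file so the example runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('dead\nbeef\ncoffee\n')

try:
    words = list(lazy_words(tmp.name))
finally:
    os.remove(tmp.name)
```

With this shape, only one line needs to be held at a time, which is where the resource saving would actually come from.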

Here’s a list of Hexspeak words
