An Example of Python Generators in Action

Here is an interesting example of using a Python generator to generate prime numbers.


>>> def is_prime(num, primes):
...     for prime in primes:
...         if num % prime == 0:
...             return False
...     return True

>>> def make_primes():
...     current_num = 3
...     primes = [2]
...     while True:
...         if is_prime(current_num, primes):
...             primes.append(current_num)
...             yield current_num
...         current_num += 1

>>> mp = make_primes()
>>> next(mp)
3
>>> next(mp)
5
>>> next(mp)
7


There is still some low-hanging fruit as far as speed optimization goes, but I'll leave that as an exercise for the reader.

As for performance as is, I was able to calculate all the prime numbers up to 10,000,000 using ~20MB of RAM.

Note that, for speed reasons, it still stores the prime numbers, so it WILL eat RAM as you go; in exchange it saves many CPU cycles, since there is no need to check against non-prime factors.
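To actually pull a bounded run of primes out of an infinite generator like this, itertools works nicely. Here's a self-contained sketch (Python 3 syntax, so no `.next()`; note the generator above starts yielding at 3, not 2):

```python
from itertools import islice, takewhile

def is_prime(num, primes):
    # Only test divisibility against the primes found so far.
    for prime in primes:
        if num % prime == 0:
            return False
    return True

def make_primes():
    current_num = 3
    primes = [2]
    while True:
        if is_prime(current_num, primes):
            primes.append(current_num)
            yield current_num
        current_num += 1

# Take a fixed count of primes...
first_five = list(islice(make_primes(), 5))
print(first_five)  # [3, 5, 7, 11, 13]

# ...or all primes below a bound (this is how you'd walk up to 10,000,000).
up_to_30 = list(takewhile(lambda p: p < 30, make_primes()))
print(up_to_30)  # [3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Since the generator is infinite, something like `islice` or `takewhile` is what keeps the consuming loop from running forever.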

Iterators vs list comprehensions

So I made a discovery today... I was profiling generator expressions (iterators) against their list comprehension counterparts and discovered that the iterators are actually slower.

Not always slower though. They're slower if you're evaluating every object in the list.

Here is my test code:


def iters():
    x = (x + 1 for x in xrange(1, 10000))
    y = (y / 1.5 for y in x)
    z = [z ** 2 for z in y]

def listcomps():
    x = [x + 1 for x in xrange(1, 10000)]
    y = [y / 1.5 for y in x]
    z = [z ** 2 for z in y]

>>> %timeit iters()
100 loops, best of 3: 5.38 ms per loop

>>> %timeit listcomps()
100 loops, best of 3: 4.06 ms per loop

I guess I shouldn't be surprised. There has to be some overhead to setting up the iterator, and if there were more elements, or if they were bigger (like Django ORM objects), it probably wouldn't be as noticeable.

I'd be interested to hear about practical applications though, so leave a comment =D

Advantages of the Iterator (even though it's slower)
  • Uses less memory, since you don't store those lists in RAM
  • May actually be faster if you don't operate on all its elements (useful if you don't know how many you'll use)
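That second point is where the iterator wins big. A quick sketch of the idea (Python 3 syntax; the 10,000,000 bound and the threshold are made up for illustration):

```python
def first_big_lazy():
    # Generator expression: nothing is computed until we ask for it.
    squares = (n ** 2 for n in range(10_000_000))
    return next(s for s in squares if s > 100)

def first_big_eager():
    # List comprehension: builds all ten million squares up front.
    squares = [n ** 2 for n in range(10_000_000)]
    return next(s for s in squares if s > 100)

print(first_big_lazy())  # 121 -- only the first dozen squares ever got computed
```

Both return the same answer, but the lazy version stops as soon as it finds a match, while the eager version pays for the entire list first.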

Another note: You can't use those iterators again. Once you've cycled through the elements... that's it. You'll need to re-create it.
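A tiny demonstration of that exhaustion behavior (Python 3 syntax):

```python
gen = (n * 2 for n in range(3))

print(list(gen))  # [0, 2, 4]
print(list(gen))  # [] -- the generator is spent; build a new one to iterate again
```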

The RAM saving and deferred execution will probably come in handy in my always-RAM-strapped django apps =D

Site Search beyond keywords

This is going to be a quick one-off, because I'm at work. But I'm putting a little time into our site search app (fixing a bug), and a thought occurred to me.

Right now we are breaking the data into keywords from various fields (title, description, tags, etc.) and giving each keyword a weight on a per-object basis.

I had a new thought today. Why not log all the past search queries and explicitly check for the top 10% or so of them in every object? Then we could assign a keyword weight to the entire query and get a perfect match on those searches (instead of breaking the query into keywords and searching for objects that contain all of them).

This seems like a reasonable way to make smart, phrase-based searches rise to the top.
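A rough sketch of the idea. Everything here is hypothetical: the query log, the weights, and the `index_object` helper are made up to illustrate weighting a whole popular query above individual keywords:

```python
from collections import Counter

# Hypothetical log of past search queries.
search_log = ["red running shoes", "laptop bag", "red running shoes",
              "red running shoes", "laptop bag", "usb cable"]

# Take the most common past queries (top two here, standing in for the
# top 10% of a real log).
top_queries = [q for q, _ in Counter(search_log).most_common(2)]

def index_object(text, phrase_weight=10):
    """Return {keyword_or_phrase: weight} for one object's text."""
    # Baseline: every individual keyword gets weight 1.
    weights = {word: 1 for word in text.lower().split()}
    # Popular whole queries that appear verbatim get a much higher weight,
    # so an exact phrase match rises straight to the top.
    for query in top_queries:
        if query in text.lower():
            weights[query] = phrase_weight
    return weights

w = index_object("Red running shoes, size 10")
print(w["red running shoes"])  # 10
```

A real implementation would live in the database index rather than a dict, but the weighting scheme is the same.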

PS - I may be able to release my django-search app as open source, since I haven't found any really good, simple search apps to date :/
