
Tuesday, January 5, 2010

Let's timeit!

Every now and then you might want to time code snippets just to make sure that you have chosen the more efficient solution. In those cases you can use the timeit module to measure the execution time of small snippets.

Introduction
It's very easy to set up timeit and measure the execution time of a snippet. The module contains a class, Timer, which is used to perform the measurement. The class has one constructor and three methods:
  • Timer([stmt='pass'[, setup='pass'[, timer=<timer function>]]]) - stmt is the statement to be timed and setup is executed once before the main statement. A timer function can be specified; the default is time.time() on all platforms except Windows, where time.clock() is used instead (according to my timeit.py)
  • timeit([number=1000000]) - Executes the main statement passed to the constructor number of times and returns the result in seconds as a float.
  • repeat([repeat=3[, number=1000000]]) - Convenience function that calls timeit(number) repeat times. Returns a list with the results.
  • print_exc([file=None]) - Helper to print a traceback from the timed snippet.
Starting with Python 2.6 the timeit module also defines two convenience functions, timeit.timeit() and timeit.repeat(). They are basically wrappers around the Timer class.
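For example, the module-level wrapper can be used like this (a quick sketch; the absolute numbers will differ on your machine, and print() is written with parentheses so it runs on Python 2 and 3 alike):

```python
import timeit

# timeit.timeit() creates a Timer internally and runs the
# statement `number` times, returning the total in seconds.
total = timeit.timeit(stmt="lst = ['c'] * 100", number=100000)
print(total)
```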

Example
Suppose that I would like to create a list containing 100 'c' strings, like this: ['c', 'c', ...]. There are at least two ways of doing this:
lst = ['c'] * 100
# or
lst = ['c' for i in xrange(100)]
Which one should I choose? Well, let's execute both statements with timeit and measure the execution time.
>>> import timeit
>>> t = timeit.Timer(stmt="lst = ['c'] * 100")
>>> print t.timeit()
1.10580182076
>>> t = timeit.Timer(stmt="lst = ['c' for x in xrange(100)]")
>>> print t.timeit()
7.66900897026
Ok, I think I'll stick with the first snippet :)

The returned result is the total execution time in seconds. In this particular case, since the snippet is executed 1000000 times, the number can also be read as the execution time in microseconds for one single pass (the factor of 1000000 passes cancels the factor of 1000000 microseconds per second).
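The arithmetic can be made explicit by passing a custom number of passes and dividing (a sketch):

```python
import timeit

t = timeit.Timer(stmt="lst = ['c'] * 100")
number = 100000
total = t.timeit(number=number)      # total seconds for all passes
per_pass_us = total / number * 1e6   # microseconds per single pass
print(per_pass_us)
```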

Normally, the timeit module doesn't have access to things that you have defined in your module. If you would like to measure a function that you have defined in your module you can specify the import statement in the setup parameter:
>>> def create_lst(size):
...    return ['c'] * size
...
>>> t = timeit.Timer(stmt="create_lst(100)", setup="from __main__ import create_lst")
>>> print t.timeit()
1.21339488029
This will introduce a little overhead since the create_lst() function is called in the measurement loop instead of just executing an inlined snippet.

Note: Timer.timeit() will by default disable garbage collection during timing. To enable GC you can pass 'gc.enable()' as a setup statement.
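For instance, to time the snippet with garbage collection left on (a sketch; I import gc explicitly in the setup string just to be safe):

```python
import timeit

# timeit disables GC around the timed statement by default;
# re-enable it as part of the setup to include GC costs.
t = timeit.Timer(stmt="lst = ['c'] * 100",
                 setup="import gc; gc.enable()")
elapsed = t.timeit(number=100000)
print(elapsed)
```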

I find the timeit module a simple and convenient way to measure execution time for small snippets.
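One last tip: the repeat() method mentioned above can be combined with min() to get more stable numbers, since the higher values are usually noise from other processes (a sketch):

```python
import timeit

t = timeit.Timer(stmt="lst = ['c'] * 100")
results = t.repeat(repeat=3, number=100000)  # three independent totals
print(min(results))  # the minimum is the least disturbed run
```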

Tuesday, December 8, 2009

Profiling your Python code

Python provides support for deterministic profiling of your application. It's very easy to set up and use. To start profiling your code, two Python modules are used:
  • cProfile - the module that collects profiling data
  • pstats - the module that makes the profiling data human readable
You can read more about Python profiling stuff at The Python Profilers page.

I'll show an example of how you can use the profiler.

Say I need to calculate the sum of all odd numbers from zero to an arbitrary positive value. My initial code might end up something like this:
def odd_numbers(max):
    """ Returns a list with all odd numbers between 0 to max (inclusive) """
    l = list()
    for i in xrange(max+1):
        if (i & 1):
            l.append(i)
    return l

def sum_odd_numbers(max):
    """ Sum all odd numbers between 0 to max (inclusive) """
    odd_nbrs = odd_numbers(max)

    res = 0
    for odd in odd_nbrs:
        res += odd
    return res

def main():
    # Run this 100 times to make it measurable
    for i in xrange(100):
        print sum_odd_numbers(1024)

if __name__ == '__main__':
    main()
Now I want to find out where my code spends most of its time, to help me optimize it where possible. To profile this snippet I run:
$ python -m cProfile sum_odd.py
This will output some statistics about the code (try it), but I'll show you a handier way to browse and examine the profile dump.
$ python -m cProfile -o profile_dump sum_odd.py
This will output the profiling statistics to a file (in non-human readable format) which can be loaded and examined with pstats. Start pstats and browse the profile dump:
$ python -m pstats
Welcome to the profile statistics browser.                    
% help                                                        

Documented commands (type help topic):
========================================
EOF  add  callees  callers  quit  read  reverse  sort  stats  strip

Undocumented commands:
======================
help

% read profile_dump
profile_dump% stats
Tue Dec  8 20:55:41 2009    profile_dump

         51405 function calls in 0.186 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    51200    0.082    0.000    0.082    0.000 {method 'append' of 'list' objects}
      100    0.099    0.001    0.181    0.002 main.py:1(odd_numbers)
        1    0.000    0.000    0.185    0.185 main.py:1(<module>)
        1    0.000    0.000    0.186    0.186 {execfile}
      100    0.004    0.000    0.184    0.002 main.py:9(sum_odd_numbers)
        1    0.000    0.000    0.186    0.186 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.001    0.001    0.185    0.185 main.py:18(main)


profile_dump%
This will of course give the same information as if you just executed cProfile without specifying an output file. The advantage of using pstats interactively is that you can view the data in different ways.
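The same steps can also be scripted with the pstats API instead of the interactive browser. A minimal sketch (it generates its own dump with cProfile.run() so it is self-contained; the profiled statement is just a stand-in):

```python
import cProfile
import pstats

# Equivalent of: python -m cProfile -o profile_dump script.py
cProfile.run("sum(x for x in range(1025) if x & 1)", "profile_dump")

# Equivalent of the interactive 'read' + 'sort time' + 'stats' commands.
p = pstats.Stats("profile_dump")
p.sort_stats("time").print_stats(10)  # top 10 entries by internal time
```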

Now I want to find out in which function we spend most time. This can be done by using the sort command:

profile_dump% sort
Valid sort keys (unique prefixes are accepted):
stdname -- standard name
nfl -- name/file/line
pcalls -- call count
file -- file name
calls -- call count
time -- internal time
line -- line number
cumulative -- cumulative time
module -- file name
name -- function name
profile_dump% sort time
profile_dump% stats
Tue Dec  8 20:55:41 2009    profile_dump

         51405 function calls in 0.186 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.099    0.001    0.181    0.002 main.py:1(odd_numbers)
    51200    0.082    0.000    0.082    0.000 {method 'append' of 'list' objects}
      100    0.004    0.000    0.184    0.002 main.py:9(sum_odd_numbers)
        1    0.001    0.001    0.185    0.185 main.py:18(main)
        1    0.000    0.000    0.186    0.186 {execfile}
        1    0.000    0.000    0.185    0.185 main.py:1(<module>)
        1    0.000    0.000    0.186    0.186 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Nice! We can see that most of the time is spent in the odd_numbers function. The time key specifies that we would like to sort the data by the time spent within a function itself, excluding calls to other functions.

Time to optimize: change the odd_numbers function to the following snippet:
def odd_numbers(max):
    """ Returns a list with all odd numbers between 0 to max (inclusive) """
    return [i for i in xrange(max+1) if (i & 1)]
Now profile the code and load the dump in pstats:
profile_dump% stats
Tue Dec  8 21:20:19 2009    profile_dump

         205 function calls in 0.020 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.015    0.000    0.015    0.000 main.py:1(odd_numbers)
      100    0.004    0.000    0.019    0.000 main.py:5(sum_odd_numbers)
        1    0.001    0.001    0.020    0.020 main.py:14(main)
        1    0.000    0.000    0.020    0.020 {execfile}
        1    0.000    0.000    0.020    0.020 main.py:1(<module>)
        1    0.000    0.000    0.020    0.020 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Wow! Not bad: we decreased the number of function calls from 51405 to 205, and cut the total time spent in the application from 0.186 to 0.020 CPU seconds, simply by writing proper Python code :)
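As a closing aside (not part of the profiling runs above), the explicit summation loop can be dropped entirely by handing a stepped range straight to the built-in sum():

```python
def sum_odd_numbers(max):
    """ Sum all odd numbers between 0 and max (inclusive) """
    # A step of 2 yields 1, 3, 5, ... directly, so no filtering
    # or intermediate list is needed (use xrange on Python 2).
    return sum(range(1, max + 1, 2))

print(sum_odd_numbers(1024))  # -> 262144
```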