- cProfile - the application that collects profiling data
- pstats - the application that makes the profling data human readable
I'll show an example of how you can use the profiler.
Say I need to calculate the sum of all odd numbers from zero to an arbitrary positive value. My initial code might end up something like this:
def odd_numbers(max): """ Returns a list with all odd numbers between 0 to max (inclusive) """ l = list() for i in xrange(max+1): if (i & 1): l.append(i) return l def sum_odd_numbers(max): """ Sum all odd numbers between 0 to max (inclusive) """ odd_nbrs = odd_numbers(max) res = 0 for odd in odd_nbrs: res += odd return res def main(): # Run this 100 times to make it measurable for i in xrange(100): print sum_odd_numbers(1024) if __name__ == '__main__': main()Now I want to find out where my code spend most if its time to help me optimize the code if possible. To profile this snippet I run:
$ python -m cProfile sum_odd.pyThis will output some statistics about the code (try it), but I'll show you a more handy way to browse and examine the profile dump.
$ python -m cProfile -o profile_dump sum_odd.pyThis will output the profiling statistics to a file (in non-human readable format) which can be loaded and examined with pstats. Start pstats and browse the profile dump:
$ python -m pstats Welcome to the profile statistics browser. % help Documented commands (type help topic): ======================================== EOF add callees callers quit read reverse sort stats strip Undocumented commands: ====================== help % read profile_dump profile_dump% stats Tue Dec 8 20:55:41 2009 profile_dump 51405 function calls in 0.186 CPU seconds Random listing order was used ncalls tottime percall cumtime percall filename:lineno(function) 51200 0.082 0.000 0.082 0.000 {method 'append' of 'list' objects} 100 0.099 0.001 0.181 0.002 main.py:1(odd_numbers) 1 0.000 0.000 0.185 0.185 main.py:1(module) 1 0.000 0.000 0.186 0.186 {execfile} 100 0.004 0.000 0.184 0.002 main.py:9(sum_odd_numbers) 1 0.000 0.000 0.186 0.186 string:1(module) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.001 0.001 0.185 0.185 main.py:18(main) profile_dump%This will of course give the same information as if you just executed cProfile without specifying an output file. The advantage of using pstats interactively is that you can view the data in different ways.
Now I want to find out in which function we spend most time. This can be done by using the sort command:
profile_dump% sort Valid sort keys (unique prefixes are accepted): stdname -- standard name nfl -- name/file/line pcalls -- call count file -- file name calls -- call count time -- internal time line -- line number cumulative -- cumulative time module -- file name name -- function name profile_dump% sort time profile_dump% stats Tue Dec 8 20:55:41 2009 profile_dump 51405 function calls in 0.186 CPU seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 100 0.099 0.001 0.181 0.002 main.py:1(odd_numbers) 51200 0.082 0.000 0.082 0.000 {method 'append' of 'list' objects} 100 0.004 0.000 0.184 0.002 main.py:9(sum_odd_numbers) 1 0.001 0.001 0.185 0.185 main.py:18(main) 1 0.000 0.000 0.186 0.186 {execfile} 1 0.000 0.000 0.185 0.185 main.py:1(module) 1 0.000 0.000 0.186 0.186 string:1(module) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}Nice! We can see that most time is spent in the odd_numbers function. The time key specifies that we would like to sort the data by the time spent in a function (exclusive calls to other functions).
Time to optimize, change the odd_numbers function to the following snippet:
def odd_numbers(max): """ Returns a list with all odd numbers between 0 to max (inclusive) """ return [i for i in xrange(max+1) if (i & 1)]Now profile the code and load the dump in pstats:
profile_dump% stats Tue Dec 8 21:20:19 2009 profile_dump 205 function calls in 0.020 CPU seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 100 0.015 0.000 0.015 0.000 main.py:1(odd_numbers) 100 0.004 0.000 0.019 0.000 main.py:5(sum_odd_numbers) 1 0.001 0.001 0.020 0.020 main.py:14(main) 1 0.000 0.000 0.020 0.020 {execfile} 1 0.000 0.000 0.020 0.020 main.py:1(module) 1 0.000 0.000 0.020 0.020 string:1(module) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}Wow! Not that bad, we decreased the number of function calls from 51405 to 205. We also decreased the total time spent in the application from 0.186 to 0.020 CPU seconds by writting proper Python code :)
maybe you already know that, but the sum of all the odd numbers from zero to m is equal to (m+1)^2, just ask wolframalpha.com ("sum 2*j+1, j=0..m")
ReplyDeletehi, what about
ReplyDeleteodd_numbers = xrange(1, max, 2)
Sir.rainbow and Joaquin: Oh boy, I knew you would comment on my odd_numbers thing :) I know, there are several 'better' ways to do this. My intention was just to show how easy it is to profile python code.
ReplyDeleteAnyways, thanks for the suggestions.
I tried xrange and it gave me 205 function calls in 0.005 CPU seconds, nice!
Well, as expected, using the (m+1)^2 expression gave best result, 105 function calls in 0.001 CPU seconds
Nice info.
ReplyDeleteI just got rid of 5000 useless calls to format method
and another 5000 calls to len function
My function iterated on a predefined list. On each iteration after some checking i overwrite some items of a list with
listA[point] = "string {0} ".format( expression with len function)
based on some conditions to ensure that undesirable values will be kept back in the list as a True or False string
After i got the pstats i simply make only once a temporary variable for len and then I used
listA[point] = "string %s" %( less cpu-intensive expression )
Thanks Dude :-)
Great! This was exactly my intention with the post, to show how easy it is to see where the code sucks :)
ReplyDeleteNice one:)
ReplyDelete