Sin number 2: too few optimized functions used
This one is pretty common especially for newcomers who have learnt enough about Python to be able to loop over data structures but haven’t yet realized how slow Python loops really are.
Whence the temptation?
If you’re fresh to Python, the temptation to create for
loops everywhere for lists and dictionaries is immense. Consider a fairly simple example, where we loop over 100000 random integers and we apply a simple mathematical operation to each of those integers. The most straightforward solution would be:
for i in random_integers:
out_list.append(perform_predefined_calculation(i))
(assumming here that perform_predefined_calculation
and random_integer
exist)
A slightly more Pythonic way to express this would be:
out_list = [perform_predefined_calculation(i) for i in random_integers]
Why is this a sin?
While it might be ok for small data structures, the problem with all kinds of loops in Python is that they’re pretty slow in comparison with functional constructs such as map
or filter
. Why is that? Python is an interpreted language, so every statement is evaluated at runtime, there is no compilation process. But functions like map
and filter
were actually implemented in C (well, at least if you’re using the most widespread CPython) and C code can be, well… pretty fast.
The test I’ve ran showed about 30% performance increase with a simple refactor from a for
loop to a list(map(f, x))
idiom.
You can do your own timings on your own and modify the examples as you please in this Python notebook.
How to recognize a sinner?
Our sinners will most likely be individuals less experienced with Python and can be recognized by profilic abuse of loops whenever working with iterable data structures. Every professional will also often use for
and less often while
, however performance-critical parts of code where a large amount of data is expected will almost always be brimming with the functional idioms.
How to repent?
- It’s okay to use
for
andwhile
but not where you need fast code. - Learn to use at least
map
,filter
andfunctools.reduce
. - Refactor code that iderates over large data structures to use the aforementioned functions.
In our particular example, the repentance is easy and gives us code nearly as pretty as the list comprehension used before:
out_list = list(map(perform_predefined_calculation, random_integers))
👀
map
appliesperform_predefined_calculation
function to each element of therandom_integers
list.map
is lazy so it will only run when we iterate over the result at some point. To load the output as a standard Python list, we can simply wrap it into alist()
call. The output will be thus exactly the same as originally but we’ll be able to obtain it much faster.