Using itertools.product instead of nested for loops
In many programming situations, you have to combine every item in one list with every item in another list, which leads to the well-known nested for loop.
results = []
for x in list_of_items:
    for y in another_list_of_items:
        results.append(func(x, y))
This is alright! But things get ugly really fast once you involve more than a couple of lists.
results = []
for x in list_of_items:
    for y in another_list_of_items:
        for z in yet_another_list_of_items:
            for a in this_list_i_found:
                results.append(func(x, y, z, a))
Nested for loops are especially annoying if you want to respect PEP 8's recommended line length of 79 characters (🤯🤯🤯).
One place where this frequently comes up in my experience is automation code for experiments, where you exhaustively iterate over a bunch of parameters and then run some code multiple times to average over the scores. Such a script might look something like this:
import numpy as np
import json

param_1 = np.array([0, 1, 2, 3])
param_2 = np.array([4, 5, 6, 7])
param_k = np.array([0, 1])
num_iters = 10

if __name__ == "__main__":
    results = {}
    for x in param_1:
        for y in param_2:
            for k in param_k:
                for i in range(num_iters):
                    # experiment is assumed to be defined elsewhere.
                    # JSON keys must be strings, hence the f-string key.
                    results[f"{x}-{y}-{k}-{i}"] = experiment(x, y, k)
    with open("results.json", "w") as f:
        json.dump(results, f)
Note that this use case is roughly equivalent to what you would do with scikit-learn's GridSearchCV, and if you are using an sklearn-compatible workflow, I highly recommend you use their stuff. It is wonderful.
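For reference, a bare-bones GridSearchCV run looks roughly like the sketch below. The estimator, the parameter grid and the data (X, y) are just placeholders for illustration, not anything from the example above.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder parameter grid; swap in whatever your estimator actually takes.
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "kernel": ["linear", "rbf"],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)  # X and y are assumed to exist already
print(search.best_params_)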
But in some cases, you still want to have multiple parameters in multiple for-loops.
In this case you can use itertools.product. product simply takes multiple iterables as input and returns a lazy iterator over the Cartesian product of these iterables.
from itertools import product
a = [1, 2, 3]
b = [4, 5]
c = list(product(a, b))
print(c)
# prints [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)]
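One detail the list(...) call hides is that product is lazy: nothing is computed until you actually iterate over it. A quick sketch, reusing a and b from above:

from itertools import product

a = [1, 2, 3]
b = [4, 5]

p = product(a, b)  # no pairs have been built yet
print(next(p))     # (1, 4)
print(next(p))     # (1, 5)
# The remaining pairs are only produced as you keep iterating,
# which is handy when the full product would be huge.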
Note that this is exactly equivalent to a nested for loop, except that it takes up way fewer lines. This is especially apparent when you use more than three iterables.
from itertools import product
a = [1, 2, 3]
b = [4, 5]
c = [7, 8, 9]
d = [10, 11, 12]
e = [4, 5, 1]
f = list(product(a, b, c, d, e))
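If you want to convince yourself that product really walks the iterables in the same order as the nested loops, a small sanity check like the following (reusing the five lists above) does the trick:

from itertools import product

a = [1, 2, 3]
b = [4, 5]
c = [7, 8, 9]
d = [10, 11, 12]
e = [4, 5, 1]

# The five-level nested loop, written as a comprehension to keep it short.
nested = [(v, w, x, y, z)
          for v in a for w in b for x in c for y in d for z in e]

assert nested == list(product(a, b, c, d, e))
print(len(nested))  # 3 * 2 * 3 * 3 * 3 = 162 tuples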
Here’s the experiment automation workflow from before, rewritten with product.
import numpy as np
import json
from itertools import product

param_1 = np.array([0, 1, 2, 3])
param_2 = np.array([4, 5, 6, 7])
param_k = np.array([0, 1])
num_iters = 10

if __name__ == "__main__":
    results = {}
    for x, y, k, i in product(param_1,
                              param_2,
                              param_k,
                              range(num_iters)):
        # experiment is assumed to be defined elsewhere.
        results[f"{x}-{y}-{k}-{i}"] = experiment(x, y, k)
    with open("results.json", "w") as f:
        json.dump(results, f)
This produces a loop header that is more readable, and that is in no danger of going over the 79-character limit.
If you use product and are not afraid to do a little bit of functional programming, you can also use it in conjunction with Python's built-in map.
from itertools import product

def simple_func(x, y, z):
    return (x ** 3) + (y ** 2) + z

a = [1, 2, 3, 4]
b = [1, 2, 3, 4]
c = [1, 2, 3, 4]

if __name__ == "__main__":
    result = list(map(simple_func, *zip(*product(a, b, c))))
Note the use of the unpacking operator *. We need it because otherwise each item coming out of product would be passed to simple_func as a single tuple of three values. Normally, we would unpack such a tuple with *, but writing *product(a, b, c) unpacks the entire output of product, which in this case is a sequence of 64 triples! This is not what the function (or you, as a user, I guess) expects.
zip, together with the first unpacking operator, turns the output of product into three tuples of the same length, in this case 64: one with all the x values, one with all the y values and one with all the z values. Applying the * operator a second time hands these three tuples to map as separate iterables, so map takes one element from each and calls simple_func 64 times with three separate values.
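If the one-liner still feels opaque, here is a sketch that pulls it apart and prints the size of each intermediate step, using the same simple_func and lists as above:

from itertools import product

def simple_func(x, y, z):
    return (x ** 3) + (y ** 2) + z

a = [1, 2, 3, 4]
b = [1, 2, 3, 4]
c = [1, 2, 3, 4]

# Without any unpacking, map would hand each triple to simple_func as a
# single argument, which fails because simple_func expects three of them:
# list(map(simple_func, product(a, b, c)))  # TypeError

# zip(*product(...)) regroups the 64 triples into three tuples of 64 values:
# all the x's, all the y's and all the z's.
columns = list(zip(*product(a, b, c)))
print(len(columns))     # 3
print(len(columns[0]))  # 64

# Unpacking those three columns into map makes it call simple_func(x, y, z)
# once per original triple, i.e. 64 times.
result = list(map(simple_func, *columns))
print(len(result))      # 64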