Recursively dealing with trees
For a project, I wanted to do some work on composition over trees. A typical way to process a tree is to build a recursive function, i.e., a function that calls itself. The motivation behind using a recursive function is that all non-terminal nodes in a tree are themselves trees.
Below is a simple example of such a recursive function:
def recursive_tree(tree):
"""
Recursively do nothing.
Parameters
----------
tree : list (of list)* of strings
A tree represented as lists. The terminals are assumed to be strings.
Example: (("dog", "cat"), "walked")
"""
if not isinstance(tree, (list, tuple)):
return "VISITED-{}".format(tree)
res = []
for x in tree:
res.append(recursive_tree(x))
return res
And an example of use:
a = recursive_tree((("dog", "cat"), "walked"))
>>> [['VISITED-dog', 'VISITED-cat'], 'VISITED-walked']
Here’s a more useful example:
import numpy as np
def recursive_compose(tree, embeddings, emb_size):
"""
Recursively compose a tree.
If a word is OOV (not in embeddings), we replace it with a zero vector.
Parameters
----------
tree : list (of list)* of strings
A tree represented as lists. The terminals are assumed to be strings.
Example: (("dog", "cat"), "walked")
embeddings : dict
A dictionary mapping from words to their word vectors.
emb_size : int
The embedding size.
"""
if not isinstance(tree, (list, tuple)):
return embeddings.get(tree, np.zeros(emb_size))
res = []
for x in tree:
res.append(recursive_compose(x, embeddings, emb_size))
return np.mean(res, 0)
This function recursively composes a parse tree applying the mean function to each sub-tree in some parse tree. A similar mechanism was used as a baseline in Socher et al. 2011, although they used binary parse trees.
Here’s an example:
# Random embeddings
emb_mtr = np.random.rand(2, 10)
# Transform them into a dictionary
embeddings = dict(zip(["dog", "cat"], emb_mtr))
a = recursive_compose((("dog", "cat"), "walked"), embeddings, 10)
>>> array([0.18, 0.19, 0.35, 0.34, 0.38, 0.23, 0.31, 0.29, 0.27, 0.2 ])
# a should be equal to np.mean(emb_mtr, 0) // 2 because "walked" is missing.
z = np.allclose(a, np.mean(emb_mtr, 0) / 2)
>>> True
So, there you have it, some simple functions to deal with parse trees represented as lists of lists.