Stéphan Tulkens

NLP Person

Recursively dealing with trees

For a project, I wanted to do some work on composition over trees. A typical way to process a tree is to build a recursive function, i.e., a function that calls itself. The motivation behind using a recursive function is that all non-terminal nodes in a tree are themselves trees.

Below is a simple example of such a recursive function:

def recursive_tree(tree):
    """
    Recursively do nothing.

    Parameters
    ----------
    tree : list (of list)* of strings
        A tree represented as lists. The terminals are assumed to be strings.
        Example: (("dog", "cat"), "walked")

    """
    if not isinstance(tree, (list, tuple)):
        return "VISITED-{}".format(tree)
    res = []
    for x in tree:
        res.append(recursive_tree(x))
    return res

And an example of use:

a = recursive_tree((("dog", "cat"), "walked"))
>>> [['VISITED-dog', 'VISITED-cat'], 'VISITED-walked']

Here’s a more useful example:

import numpy as np

def recursive_compose(tree, embeddings, emb_size):
    """
    Recursively compose a tree.

    If a word is OOV (not in embeddings), we replace it with a zero vector.

    Parameters
    ----------
    tree : list (of list)* of strings
        A tree represented as lists. The terminals are assumed to be strings.
        Example: (("dog", "cat"), "walked")
    embeddings : dict
        A dictionary mapping from words to their word vectors.
    emb_size : int
        The embedding size.

    """
    if not isinstance(tree, (list, tuple)):
        return embeddings.get(tree, np.zeros(emb_size))
    res = []
    for x in tree:
        res.append(recursive_compose(x, embeddings, emb_size))
    return np.mean(res, 0)

This function recursively composes a parse tree applying the mean function to each sub-tree in some parse tree. A similar mechanism was used as a baseline in Socher et al. 2011, although they used binary parse trees.

Here’s an example:

# Random embeddings
emb_mtr = np.random.rand(2, 10)
# Transform them into a dictionary
embeddings = dict(zip(["dog", "cat"], emb_mtr))
a = recursive_compose((("dog", "cat"), "walked"), embeddings, 10)
>>> array([0.18, 0.19, 0.35, 0.34, 0.38, 0.23, 0.31, 0.29, 0.27, 0.2 ])
# a should be equal to np.mean(emb_mtr, 0) // 2 because "walked" is missing.
z = np.allclose(a, np.mean(emb_mtr, 0) / 2)
>>> True

So, there you have it, some simple functions to deal with parse trees represented as lists of lists.