I recently read an interesting paper by Gehrmann et al. in which the rank of a language model's predictions is used as a feature vector to distinguish machine-generated text from regular text. While implementing this method in PyTorch, I ran into an interesting problem that I first solved in a really slow way, and subsequently made faster. This blog post shows you how not to do it, and how to speed it up.
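The core of the method is computing, for every position in a sequence, the rank of the token that actually occurred among the model's predicted scores. The post's code is in PyTorch; as a minimal vectorized sketch of the idea, here is the same computation in NumPy (the function name `prediction_ranks` and the toy numbers are my own, not from the paper):

```python
import numpy as np

def prediction_ranks(logits, targets):
    """Rank (0 = highest-scoring) of each observed token under the model's scores.

    logits:  (seq_len, vocab_size) array of next-token scores
    targets: (seq_len,) array of the token ids that actually occurred
    """
    # A token's rank is the number of vocabulary items the model scored higher.
    target_scores = logits[np.arange(len(targets)), targets]
    return (logits > target_scores[:, None]).sum(axis=1)

# Toy example: a vocabulary of 3 tokens, a sequence of 2 positions.
logits = np.array([[0.1, 2.0, 0.5],
                   [3.0, -1.0, 0.0]])
targets = np.array([2, 0])
ranks = prediction_ranks(logits, targets)  # array([1, 0])
```

The vectorized comparison avoids sorting the full vocabulary at every position, which is where a naive per-token loop gets slow.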
In a previous post, I talked about using itertools.product with lists. In that post, I used a typical ML experiment as an example, and made a comparison with sklearn's GridSearchCV. It occurred to me that GridSearchCV uses dictionaries, while my example only used lists, so in this post I will show you how to build a dictionary iterator using itertools.product.
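One way to build such a dictionary iterator is the following sketch (the name `dict_product` is my own, and sklearn's actual `param_grid` handling is more elaborate):

```python
from itertools import product

def dict_product(grid):
    """Yield one dict per combination of values in `grid`,
    mimicking the param_grid dictionaries used by GridSearchCV."""
    keys = list(grid)
    for values in product(*grid.values()):
        yield dict(zip(keys, values))

# Example grid: 2 learning rates x 2 batch sizes = 4 settings.
grid = {"lr": [0.1, 0.01], "batch_size": [16, 32]}
settings = list(dict_product(grid))
# e.g. {'lr': 0.1, 'batch_size': 16} is one of the 4 dicts
```

Because `product` takes the value lists as separate arguments, zipping each combination back to the keys restores the dictionary shape.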
In many programming situations, you will have to compare each item in a list to every other item in the same list, which leads to the well-known nested for-loop.
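For unordered pairwise comparisons, `itertools.combinations` expresses the same nested loop more compactly; a small sketch of the equivalence:

```python
from itertools import combinations

items = ["a", "b", "c"]

# The classic nested loop: compare each item to every later item once.
pairs_loop = []
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        pairs_loop.append((items[i], items[j]))

# itertools.combinations yields exactly the same pairs.
pairs_iter = list(combinations(items, 2))
# both: [('a', 'b'), ('a', 'c'), ('b', 'c')]
```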
In a previous post, I discussed some easy functions for recursively processing trees, something which often comes up in NLP when working with parse trees.
For a project, I wanted to do some work on composition over trees. A typical way to process a tree is to build a recursive function, i.e., a function that calls itself. The motivation behind using a recursive function is that all non-terminal nodes in a tree are themselves trees.
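A minimal sketch of such a recursive function, using nested tuples as a stand-in for parse trees (the post's actual tree representation may differ):

```python
def count_leaves(tree):
    """Count the leaves of a tree given as nested tuples/lists.

    Every non-terminal node is itself a tree, so the function calls
    itself on each child; anything that is not a tuple/list is a leaf.
    """
    if not isinstance(tree, (tuple, list)):
        return 1
    return sum(count_leaves(child) for child in tree)

# A toy parse tree: (S (NP the cat) (VP sat))
tree = (("the", "cat"), ("sat",))
n = count_leaves(tree)  # 3
```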
spaCy is a fast and easy-to-use Python package which can be used to quickly parse, tokenize, tag, and chunk with high accuracy across a variety of languages. I attempted to apply spaCy to an NER task for which I had pre-tokenized data with gold-standard BIO tags. Unfortunately, the default pipeline would still tokenize some words further, while I actually just needed it to split on space characters.
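spaCy lets you swap in a custom tokenizer: any callable that takes the raw text and returns a `Doc` can be assigned to `nlp.tokenizer`. A sketch along the lines of the whitespace tokenizer from spaCy's documentation (assuming spaCy v2+; the example sentence is my own):

```python
import spacy
from spacy.tokens import Doc

class WhitespaceTokenizer:
    """Split on single spaces only, preserving pre-tokenized input."""
    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        words = text.split(" ")
        return Doc(self.vocab, words=words)

nlp = spacy.blank("en")
nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)

doc = nlp("New York-based company")
tokens = [t.text for t in doc]
# the hyphenated word stays intact: ['New', 'York-based', 'company']
```

With the default tokenizer, "York-based" would be split further, throwing off the alignment with gold BIO tags.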
At AMLAP 2017, I gave a poster presentation on our model of word reading, which is currently christened Global Space (GS). On this page you can find the poster, as well as links to any follow-up research we have done on the model.
I recently gave a talk about lexical representations at a CoNGA meeting, which is a monthly meeting of a group of philosophers, neuroscientists, and computational linguists.