Stephan Tulkens

Fast topk in pytorch

I recently read an interesting paper by Gehrmann et al. in which the ranks of the predictions of a language model are used as a feature vector to distinguish machine-generated from regular text. While implementing this method in pytorch, I ran into an interesting problem that I first solved in a really slow way, and subsequently made faster. This blog post shows you how not to do it, and how to make it faster.
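The building block behind all of this is torch.topk, which returns the k largest values along a dimension together with their indices. A minimal sketch with made-up logits (the tensor below is just an illustration, not data from the paper):

```python
import torch

# Hypothetical logits for 2 positions over a vocabulary of 5 tokens.
logits = torch.tensor([[0.1, 2.0, 0.5, 1.5, 0.3],
                       [1.0, 0.2, 3.0, 0.4, 0.6]])

# torch.topk returns the k largest values and their indices per row.
values, indices = torch.topk(logits, k=3, dim=-1)
print(indices)  # the 3 highest-scoring token ids for each position
```

For the first row, the three largest logits are 2.0, 1.5, and 0.5, so the indices come back as [1, 3, 2].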

Using itertools.product with dictionaries

In a previous post, I talked about using itertools.product with lists. In that post, I used a typical ML experiment as an example, and made a comparison with sklearn's GridSearchCV. It occurred to me that GridSearchCV uses dictionaries, while my example only used lists, so in this post I will show you how to build a dictionary iterator using product.
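The idea can be sketched as follows. dict_product and the parameter grid below are hypothetical names for illustration, not GridSearchCV's actual API:

```python
from itertools import product

def dict_product(grid):
    """Yield one dict per combination of values, like a GridSearchCV param grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

grid = {"lr": [0.1, 0.01], "layers": [1, 2]}
combos = list(dict_product(grid))
# 2 * 2 = 4 combinations, e.g. {"lr": 0.1, "layers": 1}
```

Because product consumes the value lists in a fixed key order, every yielded dictionary has the same keys, which makes the combinations easy to pass straight into an experiment function as keyword arguments.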

Using itertools.product instead of nested for loops

In many programming situations, you will have to compare each item in a list to every other item in the same list, which leads to the well-known nested for loop.
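As a quick illustration, the nested loop and a single loop over itertools.product produce exactly the same pairs:

```python
from itertools import product

items = ["a", "b", "c"]

# The classic nested for loop:
pairs_nested = []
for x in items:
    for y in items:
        pairs_nested.append((x, y))

# One flat loop over the Cartesian product:
pairs_product = list(product(items, repeat=2))

assert pairs_nested == pairs_product  # identical pairs, same order
```

The product version keeps the loop body at a single indentation level, which pays off once you have three or four nested lists instead of two.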

Parsing a spacy json to a tree

In a previous post, I discussed some easy functions to recursively deal with trees, which often comes up in NLP when trying to deal with parse trees.

Recursively dealing with trees

For a project, I wanted to do some work on composition over trees. A typical way to process a tree is to build a recursive function, i.e., a function that calls itself. The motivation behind using a recursive function is that all non-terminal nodes in a tree are themselves trees.
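A minimal sketch of this pattern, assuming a hypothetical representation of trees as (label, children) pairs:

```python
def count_leaves(tree):
    """Count terminal nodes by recursing into every subtree."""
    label, children = tree
    if not children:  # a terminal node is a leaf
        return 1
    # Every non-terminal child is itself a tree, so we can recurse.
    return sum(count_leaves(child) for child in children)

tree = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
count_leaves(tree)  # → 3
```

The same skeleton works for any bottom-up computation over a tree: replace the base case and the way child results are combined, and you get depth, yield, or composition functions for free.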

Getting spacy to only split on spaces

spacy is a fast and easy-to-use python package which can be used to quickly parse, tokenize, tag, and chunk with high accuracy across a variety of languages. I attempted to apply spacy to an NER task for which I had pre-tokenized data with gold standard BIO tags. Unfortunately, the default pipeline would still further tokenize some words, when all I needed was for it to split on space characters.
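One way to do this is to swap out the pipeline's tokenizer for a custom one that only splits on spaces. A sketch, using a blank pipeline for brevity (a loaded model works the same way):

```python
import spacy
from spacy.tokens import Doc

class WhitespaceTokenizer:
    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        # Split strictly on single spaces; no further tokenization happens.
        return Doc(self.vocab, words=text.split(" "))

nlp = spacy.blank("en")
nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)

doc = nlp("don't re-tokenize this")
[t.text for t in doc]  # → ["don't", "re-tokenize", "this"]
```

Because the tokenizer now mirrors the pre-tokenized input exactly, the gold standard BIO tags line up one-to-one with the tokens in the Doc.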

Poster: A Self-Organizing Model of the Bilingual Reading System

At AMLAP 2017, I gave a poster presentation on our model of word reading, which is currently christened Global Space (GS). On this page you can find the poster, as well as links to any follow-up research we have done on the model.

Talk: Lexical representations in theories of word reading

I recently gave a talk about lexical representations at a CoNGA meeting, which is a monthly meeting of a group of philosophers, neuroscientists, and computational linguists.