Stephan Tulkens

Computational Psycholinguist

Fast topk in pytorch

I recently read an interesting paper by Gehrmann et al. in which the ranks of a language model's predictions are used as a feature vector to distinguish machine-generated text from human-written text. While implementing this method in pytorch, I ran into an interesting problem, which I first solved in a really slow way and later made much faster. This blog post shows the slow solution, and how to speed it up.
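I won't repeat the post here, but as a rough sketch of the kind of computation involved (the shapes are made up, and the comparison-counting trick is my own illustration, not necessarily the solution the post arrives at):

```python
import torch

# Made-up shapes: a batch of language model logits over a vocabulary,
# plus the ids of the tokens that actually occurred.
logits = torch.randn(8, 50_000)           # (batch, vocab)
targets = torch.randint(0, 50_000, (8,))  # (batch,)

# Slow: fully sort the vocabulary, then find each target's position.
order = logits.argsort(dim=-1, descending=True)
ranks_slow = (order == targets[:, None]).nonzero()[:, 1]

# Faster: a token's rank equals the number of logits strictly greater
# than its own, so no full sort is needed (assuming no ties, which is
# a safe assumption for float logits).
target_logits = logits.gather(-1, targets[:, None])
ranks_fast = (logits > target_logits).sum(dim=-1)

assert torch.equal(ranks_slow, ranks_fast)
```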

Read More

Using itertools.product with dictionaries

In a previous post, I talked about using itertools.product with lists. In that post, I used a typical ML experiment as an example, and compared the approach to sklearn's GridSearchCV. It occurred to me that GridSearchCV uses dictionaries, while my example only used lists, so in this post I will show you how to build a dictionary iterator using product.
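As a preview, such an iterator could look like the sketch below; the parameter grid is a made-up GridSearchCV-style example:

```python
from itertools import product

# Made-up parameter grid in GridSearchCV style: name -> list of values.
grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

def dict_product(grid):
    """Yield one dictionary per combination of parameter values."""
    keys = list(grid)
    for values in product(*grid.values()):
        yield dict(zip(keys, values))

for params in dict_product(grid):
    print(params)
# {'C': 0.1, 'kernel': 'linear'}
# {'C': 0.1, 'kernel': 'rbf'}
# ... (6 combinations in total)
```

Note that this relies on grid.values() following the same order as list(grid), which is guaranteed because dictionaries preserve insertion order in Python 3.7+.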

Read More

Using itertools.product instead of nested for loops

In many programming situations, you will have to compare each item in a list to every other item in that list, which leads to the well-known nested for loop.
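As a minimal sketch of the pattern and its product equivalent (the toy list is made up):

```python
from itertools import product

items = ["a", "b", "c"]

# The nested version: one explicit loop per dimension.
pairs_nested = []
for x in items:
    for y in items:
        pairs_nested.append((x, y))

# The product version: a single loop yielding the same pairs
# in the same order.
pairs_product = list(product(items, repeat=2))

assert pairs_nested == pairs_product
```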

Read More

Parsing a spacy json to a tree

In a previous post, I discussed some simple functions for recursively processing trees, something that often comes up in NLP when working with parse trees.
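As a hedged illustration of where this post is headed, the sketch below turns a flat, head-annotated token list into a nested tree; the token fields and the example sentence are made up, and the actual spacy json format differs in its details:

```python
from collections import defaultdict

# Made-up tokens in the spirit of a dependency parse: each token
# records the index of its head, and the root points at itself.
tokens = [
    {"id": 0, "orth": "I", "head": 1},
    {"id": 1, "orth": "saw", "head": 1},
    {"id": 2, "orth": "a", "head": 3},
    {"id": 3, "orth": "cat", "head": 1},
]

def to_tree(tokens):
    """Turn a flat token list into nested (word, children) tuples."""
    children = defaultdict(list)
    root = None
    for tok in tokens:
        if tok["head"] == tok["id"]:
            root = tok
        else:
            children[tok["head"]].append(tok)

    def build(tok):
        return (tok["orth"], [build(child) for child in children[tok["id"]]])

    return build(root)

print(to_tree(tokens))
# ('saw', [('I', []), ('cat', [('a', [])])])
```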

Read More

Recursively dealing with trees

For a project, I wanted to do some work on composition over trees. A typical way to process a tree is to build a recursive function, i.e., a function that calls itself. The motivation behind using a recursive function is that all non-terminal nodes in a tree are themselves trees.
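As a small sketch of the pattern (the tuple representation of trees is just one possible choice):

```python
# Trees as (label, children) tuples.
tree = ("S", [("NP", [("she", [])]),
              ("VP", [("runs", [])])])

def leaves(tree):
    """Recursively collect the terminal labels of a tree."""
    label, children = tree
    if not children:
        # Base case: a terminal node has no children.
        return [label]
    # Every child of a non-terminal is itself a tree, so we recurse.
    return [leaf for child in children for leaf in leaves(child)]

print(leaves(tree))  # ['she', 'runs']
```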

Read More

Getting spacy to only split on spaces

spacy is a fast and easy-to-use python package that can be used to quickly parse, tokenize, tag, and chunk text with high accuracy across a variety of languages. I attempted to apply spacy to an NER task for which I had pre-tokenized data with gold-standard BIO tags. Unfortunately, the default pipeline would still split some of these words into multiple tokens, while all I needed was a split on space characters.
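One way to get that behavior, along the lines of the custom-tokenizer recipe in spacy's documentation, is to replace the pipeline's tokenizer with one that only splits on spaces; the model name below is just an example and assumes you have it installed:

```python
import spacy
from spacy.tokens import Doc

class WhitespaceTokenizer:
    """A tokenizer that only splits on spaces, leaving pre-tokenized text intact."""
    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        words = text.split(" ")
        return Doc(self.vocab, words=words)

# Example model; any installed spacy pipeline works here.
nlp = spacy.load("en_core_web_sm")
nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)

doc = nlp("don't re-tokenize this")
print([token.text for token in doc])  # ["don't", 're-tokenize', 'this']
```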

Read More

Poster: A Self-Organizing Model of the Bilingual Reading System

At AMLAP 2017, I gave a poster presentation on our model of word reading, which we have tentatively christened Global Space (GS). On this page you can find the poster, as well as links to any follow-up research we have done on the model.

Read More

Talk: Lexical representations in theories of word reading

I recently gave a talk about lexical representations at a meeting of CoNGA, a monthly gathering of philosophers, neuroscientists, and computational linguists.

Read More