Poster: A Self-Organizing Model of the Bilingual Reading System

At AMLAP 2017, I gave a poster presentation on our model of word reading, which we currently call Global Space (GS). On this page you can find the poster, as well as links to any follow-up research we have done on the model.

The poster can be found here

The Idea

The main idea behind GS is that we should make as few assumptions about the word reading process as possible, an approach also known as Occam's razor.

In the case of GS, the specific area we are targeting with our razor is the notion of representation used in models of word reading.

Previous models have assumed that if some factor influences word recognition, it must also be represented in our mental lexicon. For example, if a model assumes that phonological information, e.g. two words sounding alike, influences reading times for these words, this phonological information will be included in the lexicon as a separate representation. Homophones then co-activate because their identical phonological representations are activated by a single orthographic representation. The pattern of performance seen in human subjects in priming and lexical decision experiments is then presupposed to be the result of the co-activation of, and competition between, all sorts of different representations. This becomes problematic when one asks what causes a word to be read: if orthographic and phonological information each influence word reading from separate lexicons, what decides which information is combined, and when the combined evidence is strong enough?

In GS, we do away with the assumption that phonology and orthography are represented in separate modules, and represent a word as the concatenation of its orthographic and phonological vectors. This assumes that words are represented in a single space, hence the name Global Space. During reading, only the orthographic part of the vector is presented, and the phonological part is inferred from its orthographic counterpart. We can do this because the representations in our model are transparent: they are imperfect copies of things found in the world. It is in fact this imperfection which we think can explain some of the things we see in experiments.
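To make the single-space representation concrete, here is a minimal sketch. The slot-based one-hot coding, the toy phoneme inventory, and the helper names below are assumptions made purely for illustration; the actual orthographic and phonological feature schemes used in GS are not described on this page.

```python
import numpy as np

LETTERS = list("abcdefghijklmnopqrstuvwxyz")
PHONEMES = ["f", "h", "au", "n", "d"]  # hypothetical phoneme inventory

def one_hot_slots(symbols, inventory, num_slots):
    """Encode a sequence as a fixed number of slots of one-hot vectors."""
    vec = np.zeros((num_slots, len(inventory)))
    for slot, symbol in enumerate(symbols[:num_slots]):
        vec[slot, inventory.index(symbol)] = 1.0
    return vec.ravel()

def encode_word(orthography, phonology, num_slots=5):
    """A word is the concatenation of its orthographic and phonological codes."""
    o = one_hot_slots(list(orthography), LETTERS, num_slots)
    p = one_hot_slots(phonology, PHONEMES, num_slots)
    return np.concatenate([o, p])

# Each word becomes a single vector in the global space.
found = encode_word("found", ["f", "au", "n", "d"])
hound = encode_word("hound", ["h", "au", "n", "d"])
```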

Instead of regarding the pattern of performance of human subjects as being the result of competition and co-activation across different levels of representation, we think it might be the result of competition for limited storage space.

The implementation

To test our hypothesis, we used a Self-Organizing Map (SOM) to cluster 108 Dutch and English words, which, as said before, consist of the concatenation of orthographic and phonological representations.

A SOM consists of a certain number of neurons, arranged in a grid, where each neuron has a weight vector whose dimensionality is equal to that of the input. So, unlike normal neural networks, a SOM does not transform its input into opaque hidden states, but into inexact (quantized) representations of the input. SOM training consists of finding the set of quantized representations that best approximates the training set.

An important constraint on a SOM is that, unlike in other vector quantizers, neurons that are close together on the grid need to have similar weights. SOM learning can therefore be described as a two-step process: given an input, first make the neuron most similar to the input even more similar to it, and then make the surrounding neurons on the map more similar to this input as well, as a function of their distance on the grid to the best matching neuron.
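As a rough sketch of this two-step update (not the SOMBER implementation we actually used; the Gaussian neighborhood, the toy dimensionality, and the learning rate schedule are illustrative assumptions):

```python
import numpy as np

def som_step(weights, grid_positions, x, learning_rate, sigma):
    """One SOM update: find the best matching unit, then pull every neuron
    towards the input, scaled by its distance to that unit on the grid."""
    # Step 1: best matching unit (the neuron most similar to the input).
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Step 2: Gaussian neighborhood over grid distances to the BMU.
    grid_dist = np.linalg.norm(grid_positions - grid_positions[bmu], axis=1)
    influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
    # Move each neuron towards the input, weighted by its influence.
    weights += learning_rate * influence[:, None] * (x - weights)
    return bmu

# Toy usage on a 25 x 25 grid; only the grid size mirrors the poster's setup.
rows, cols, dim = 25, 25, 40
rng = np.random.default_rng(0)
weights = rng.random((rows * cols, dim))
grid_positions = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
for x in rng.random((100, dim)):
    som_step(weights, grid_positions, x, learning_rate=1.0, sigma=3.0)
```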

Note that, given this description, there are two interpretations of SOM clustering.

In the first interpretation, a SOM is a clustering mechanism analogous to k-means: every input presented to the SOM receives a single neuron as its cluster. In the second interpretation, a SOM is a neural network which produces distributed representations: here, every input gets a vector of Euclidean distances to each neuron as its activation.
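Assuming a trained weight matrix like the one in the sketch above (an N x D array, one row per neuron), the two readouts could look like this:

```python
import numpy as np

def cluster_assignment(weights, x):
    """First interpretation: k-means-style clustering, one winning neuron per input."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

def distributed_activation(weights, x):
    """Second interpretation: a distributed code, one Euclidean distance per neuron."""
    return np.linalg.norm(weights - x, axis=1)
```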

We trained the map for 100 epochs on a set of 10,000 words, which were sampled proportionally from the set of 108 words using CELEX frequencies. We used a learning rate of 1.0, a 25 x 25 grid, and the standard SOM algorithm, as implemented in SOMBER.
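A sketch of the frequency-proportional sampling step, with placeholder words and counts standing in for the 108 items and their actual CELEX frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)
words = ["found", "hound", "hond", "mond"]        # stand-ins for the 108 types
frequencies = np.array([120.0, 3.0, 45.0, 60.0])  # hypothetical CELEX counts
probabilities = frequencies / frequencies.sum()

# Draw 10,000 training tokens, each type proportional to its frequency.
training_tokens = rng.choice(words, size=10_000, p=probabilities)
```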

Initial results using standard SOM clustering showed that very similar words, e.g. found and hound, were clustered to the same neuron, making them indistinguishable to the model, which is undesirable. On top of this, a SOM does not possess a time-dependent component, making it hard to find a good analogue for the reaction time measurements often used in psycholinguistic experiments.

While using the error, i.e. the difference between the input vector and the most similar neuron, as a stand-in for RT might look like a feasible option, it does not involve any time dependence, and hence glosses over the actual procedure humans perform when they read.

Introducing time

We devised a time-dependent procedure which simultaneously removes the problem of different words being indistinguishable because they cluster to the same neuron. This procedure is heavily based on ideas from Dynamic Systems Theory (DST).

In DST, cognition is often thought of as a point moving through an extremely high-dimensional space. The current state of the cognitive system is then nothing more than the position of the point in this high-dimensional space.

In GS, the state of the system is modeled as a vector s, which contains the activation of each of the N neurons of a trained SOM. Presenting an input to the SOM causes the neurons, and hence the state of the system, to change.

On presentation of an input to the map, neurons are activated or suppressed, depending on their similarity to the input. Concurrently, the current state vector s is also fed back into the system. Thus, the next position of the system depends on the current state of the system as well as the current input.

This procedure endows the SOM with both time dependence, since we can measure the time words take until convergence, and greater expressive power, since words are no longer assigned to a single cluster.
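To make the feedback dynamics concrete, here is one possible settling procedure. The leaky-integration update rule, the tanh nonlinearity, the normalization of the input drive, and the convergence threshold are all assumptions made for illustration; the poster's exact update equations may differ.

```python
import numpy as np

def settle(weights, x, alpha=0.5, tol=1e-4, max_steps=200):
    """Iterate the state vector s until it stops changing; the number of
    steps taken is the model's reaction-time analogue."""
    # Input-driven activation: neurons similar to the input are boosted,
    # dissimilar ones suppressed (standardized negative distance).
    drive = -np.linalg.norm(weights - x, axis=1)
    drive = (drive - drive.mean()) / (drive.std() + 1e-12)
    s = np.zeros(weights.shape[0])
    for step in range(1, max_steps + 1):
        # Next state mixes the current state with the input-driven activation.
        new_s = (1 - alpha) * s + alpha * np.tanh(drive + s)
        if np.linalg.norm(new_s - s) < tol:
            return new_s, step
        s = new_s
    return s, max_steps
```

Under this reading, a word's simulated reaction time is simply the number of settling steps, and its final state vector is a distributed representation over all neurons rather than a single cluster assignment.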