Replacing a stack with concurrency

In which I make a data structure obsolete

April 23, 2008
NLP Erlang

For some language processing task I needed a reasonably powerful parser (a program to identify the syntactic structure of a sentence). So I dug out my copy of Winograd (1983) (Language as a Cognitive Process) and set about implementing an Augmented Transition Network parser in Erlang.

Now, the first thing you learn about natural language is that it is full of ambiguities, and so there will always be several alternatives available, several possible paths through the network which defines the grammar. The traditional solution is to dump all the alternatives on a stack, and look at them when the current path has been finished with. You can either go depth-first, where you complete the current path before you get the next one off the stack, or breadth-first, where you advance all paths by one step at a time, kind of pseudo-parallel.

Having to deal with a stack is tedious, as you need to keep track of the current configuration: which network are you at, what node, what position in the sentence, etc. But then, it occurred to me, there’s an easier way to do it (at least it’s easier in Erlang!): every time you come to a point where you have multiple alternatives, you spawn a new process and pursue all of them in parallel.

The only overhead you need is a loop which keeps track of all the processes currently running. This loop receives the results of successful paths, and gets notified of unsuccessful ones (where the process terminates without having found a valid structure). No need for a stack, and hopefully very efficient processing on multi-core machines as a free side-effect.

I’m still amazed how easy it was to implement. I wouldn’t have fancied doing that in Java or even C. For my test sentences I had about 8 to 10 processes running in parallel most of the time, but it depends on the size of the grammar and the length of the sentence really. What I liked about this was that it seemed the natural way to do in Erlang, where working with processes is just so easy.

And also, another nail in the coffin for the claim that you can’t use Erlang for handling texts easily!