### Category Theory in Machine Learning

#### Posted by David Corfield

I was asked recently by someone for my opinion on the possibility that category theory might prove useful in machine learning. First of all, I wouldn’t want to give the impression that there are signs of any imminent breakthrough. But let me open a thread by suggesting one or two places where categories may come into play.

For other areas of computer science the task would be easier. Category theory features prominently in theoretical computer science as described in books such as Barr and Wells’ Category Theory for Computing Science. Then there’s Johnson and Rosebrugh’s work on databases.

As for machine learning itself, perhaps one of the most promising channels is through probability theory. One advantage of working with the Bayesian approach to machine learning is that it brings with it what I take to be more beautiful mathematics. Take a look at this paper on statistical learning theory. It belongs to the side of the cultural divide where category theory doesn’t flourish. If, on the other hand, we encounter mathematics of the culture I prefer, it is not unlikely that category theory will find its uses.

In a couple of posts (I and II) I discussed a construction of probability theory in terms of a monad. It struck me there that the natural inclination of the Bayesian to think about distributions over distributions fits this construction well. For example, Dirichlet processes, currently a hot topic, are themselves distributions over distributions.
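To make the monad structure concrete, here is a minimal sketch of the finite-distribution monad, with a distribution represented as a dict from outcomes to probabilities. All the names are illustrative, not from any particular library; the point is just that `unit` gives a point mass and `bind` pushes a distribution through a stochastic map, which is exactly how a "distribution over distributions" collapses to a single mixture.

```python
# A minimal sketch of the finite-distribution monad, with distributions
# represented as dicts from outcomes to probabilities. Illustrative only.

def unit(x):
    """The monad's unit: the point-mass (Dirac) distribution at x."""
    return {x: 1.0}

def bind(dist, f):
    """The monad's bind: push dist through a stochastic map f, where f
    sends each outcome to a distribution over new outcomes."""
    result = {}
    for x, px in dist.items():
        for y, py in f(x).items():
            result[y] = result.get(y, 0.0) + px * py
    return result

# A Bayesian "distribution over distributions": a prior over two coins,
# flattened by bind into a single mixture distribution.
coin_a = {"H": 0.9, "T": 0.1}
coin_b = {"H": 0.5, "T": 0.5}
prior_over_coins = {"a": 0.5, "b": 0.5}
mixture = bind(prior_over_coins, lambda c: coin_a if c == "a" else coin_b)
# mixture == {"H": 0.7, "T": 0.3}
```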

Graphical models, which are built on directed or undirected graphs, are another hot topic. If we remember that a category is a kind of directed graph, perhaps something can be done here. Graphical models blend graphs with probabilities, and I once tried to think of Bayesian networks, one result of this blend, as forming a symmetric monoidal category.
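One concrete way to see the symmetric monoidal flavour is through finite Markov kernels, i.e. column-stochastic matrices: composition of kernels is matrix multiplication, and running two kernels on independent subsystems is the Kronecker product. A sketch, with purely illustrative example matrices:

```python
import numpy as np

# Sketch: finite Markov kernels (column-stochastic matrices) form a
# category under matrix multiplication, and a symmetric monoidal one
# where the tensor of independent kernels is the Kronecker product.

def compose(g, f):
    """Composite kernel: first apply f, then g (matrix product)."""
    return g @ f

def tensor(f, g):
    """Run f and g on independent subsystems (Kronecker product)."""
    return np.kron(f, g)

def is_stochastic(m):
    """Columns sum to one and all entries are non-negative."""
    return bool(np.allclose(m.sum(axis=0), 1.0) and (m >= 0).all())

f = np.array([[0.8, 0.3],
              [0.2, 0.7]])   # a noisy binary channel
g = np.array([[0.6, 0.1],
              [0.4, 0.9]])   # another kernel

# Both operations preserve stochasticity, as composition and tensoring
# of morphisms should.
```

Both `compose(g, f)` and `tensor(f, g)` are again stochastic, which is the categorical point: morphisms compose and tensor without leaving the category.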

Another dimension to spaces of probability distributions is that they can be studied by differential geometry, in a field known as information geometry. In this list there are some references to the use of information geometry in machine learning. As a more distant prospect, perhaps category-theoretic aspects of differential geometry could come to play a role.
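The basic idea of information geometry can be seen in a one-parameter family. For the Bernoulli distributions, the Fisher information is $1/(p(1-p))$, and the KL divergence between nearby members is, to leading order, half the squared distance in the Fisher metric. A small numerical check (illustrative, no library assumed):

```python
import math

# Sketch: the Fisher information metric on the Bernoulli family, and a
# numerical check that KL divergence is locally half the squared Fisher
# distance -- the sense in which information geometry makes the space of
# distributions a Riemannian manifold.

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def fisher_bernoulli(p):
    """Fisher information of the Bernoulli family at parameter p."""
    return 1.0 / (p * (1.0 - p))

p, eps = 0.3, 1e-4
kl = kl_bernoulli(p, p + eps)
quadratic = 0.5 * fisher_bernoulli(p) * eps ** 2
# kl and quadratic agree to leading order in eps
```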

If Lemm were right,

> Statistical field theories, which encompass quantum mechanics and quantum field theory in their Euclidean formulation, are technically similar to a nonparametric Bayesian approach,

and we’re right here about category theory and such field theories, perhaps something interesting could happen.

Another speculative thought was to tie the kernels appearing in machine learning to John’s *Tale of Groupoidification*. Perhaps this might be done to encode invariances more intelligently. Currently, RBF kernels get used a lot, even though they don’t encode your background knowledge well. For example, two images differing in just one pixel are close in the space of images, so if one is classified as a ‘3’, there is a high probability that the other is too. But shift an image two pixels to the right and the two images are far apart in the space of images, so the kernel is agnostic about what the label of one image means for the other. One needs to encode this invariance in the kernel.
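A toy numerical illustration of that point: on raw pixels the RBF kernel rates a one-pixel change as very similar, but a two-pixel shift of the same pattern as nearly unrelated. Averaging the kernel over shifts is one crude, purely illustrative way of building the invariance in; the function names are my own, not from any library.

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    """Plain RBF kernel on raw pixel vectors."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def shift_invariant_rbf(x, y, gamma=0.5):
    """A sketch: average the RBF kernel over all cyclic shifts of y."""
    return float(np.mean([rbf(x, np.roll(y, s), gamma)
                          for s in range(len(y))]))

image = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
one_pixel_off = image.copy()
one_pixel_off[2] = 0.0                # change a single pixel
shifted = np.roll(image, 2)           # same pattern, moved two pixels

# rbf(image, one_pixel_off) is high, rbf(image, shifted) is low,
# even though 'shifted' is the same pattern translated; the
# shift-averaged kernel restores a high similarity for the shift.
```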

Two people who very much believe that the kernels used in machine learning are not the right ones for the tasks we need to perform in the world are LeCun and Bengio. The problem is with the shallowness of the architecture, they say here. Instead they advocate neural nets with deep architectures. These have the ‘catastrophic’ behaviour that small changes in the weights may lead to very different performance.

A neural net architecture is a mapping from a space of weights to a certain space of functions, but this mapping is not one-to-one: many weight settings may correspond to the same function. Singularity theory can be used to study some of its properties, as Watanabe does. That’s the kind of mathematics where category theory should show up.
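The non-injectivity is easy to exhibit: permuting the hidden units of a one-hidden-layer tanh network changes the weights but not the function it computes. A tiny illustrative network (no learning involved, all numbers arbitrary):

```python
import numpy as np

def net(x, w_in, w_out):
    """One hidden layer of tanh units with a linear output."""
    return float(w_out @ np.tanh(w_in @ x))

w_in = np.array([[1.0, -2.0],
                 [0.5, 3.0]])
w_out = np.array([2.0, -1.0])

# Swap the two hidden units: permute the rows of w_in and the entries
# of w_out together. Different weights, identical function.
perm = [1, 0]
w_in_p, w_out_p = w_in[perm], w_out[perm]

x = np.array([0.7, -0.2])
# net(x, w_in, w_out) == net(x, w_in_p, w_out_p)
```

Together with sign-flip symmetries of tanh, such permutations already give whole fibres of weight settings over a single function, which is where the singularities Watanabe studies come from.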

There is also some work directly applying category theory to neural networks, such as here, but I haven’t followed it.

All in all, you can see that there are no sure-fire bets. I very much doubt we’re as far advanced as a mathematical physicist wondering about categories in their field in 1993. If anyone else has some reasons to be optimistic, do let us know.

## Re: Category Theory in Machine Learning

Hi David.

This might be a somewhat silly question, but have people done much work on using the technology of type theory with machine learning? That’s generally where a lot of the uses of categories come into CS, at least the kind of CS I seem to read about.

Also, I had never heard of information geometry before. The idea of the metric being related to the “amount of information” between two configurations is fun. I’m surprised I haven’t seen it come up more often in quantum mechanics.