Hacker News
An invariant from category theory solves a problem in mathematical ecology [pdf] (ed.ac.uk)
124 points by peanutcrisis on July 23, 2023 | 30 comments



The book that includes the results from these slides has broader scope, and also can be downloaded for free from arXiv: https://www.maths.ed.ac.uk/~tl/ed/

"The starting point is the connection between diversity and entropy. We will discover:

• how Shannon entropy, originally defined for communications engineering, can also be understood through biological diversity (Chapter 2);

• how deformations of Shannon entropy express a spectrum of viewpoints on the meaning of biodiversity (Chapter 4);

• how these deformations provably provide the only reasonable abundance-based measures of diversity (Chapter 7);

• how to derive such results from characterization theorems for the power means, of which we prove several, some new (Chapters 5 and 9).

Complementing the classical techniques of these proofs is a large-scale categorical programme, which has produced both new mathematics and new measures of diversity now used in scientific applications. For example, we will find: [...]"

"The question of how to quantify diversity is far more mathematically profound than is generally appreciated. This book makes the case that the theory of diversity measurement is fertile soil for new mathematics, just as much as the neighbouring but far more thoroughly worked field of information theory"


Ah, I was wondering about that. Their formulas look suspiciously like the definition of Rényi entropy.

I'm not too sure where the category theoretical stuff enters though. They mention that metric spaces have a magnitude, but their end result looks more like a channel capacity (with the confusion matrix being the probability to confuse one species with another). Which, you know, makes sense, if you've got 'N' signals but they're so easily confused with one another that you can only send 'n' signals worth of data then your channels are not too diverse.

They do mention that this is equivalent to some modified version of the category theoretical heuristic, but is that really interesting? The link to Euler characteristic is intriguing, but from the way they end up at their final definition I'm not sure if metric spaces are really the natural context to talk about these things. It almost feels like they've stepped over an enriched category that would provide a more natural fit.


Metric spaces are enriched categories. They are enriched over the non-negative reals. The 'hom' between a pair of points is then simply a number: their distance.


And, these non-negative real numbers, which are these homs, are “hom objects”, so regarded as objects in “the category with as objects the non-negative real numbers, and as morphisms, the ‘being greater than or equal to’ “ ? Is that right?

So, I guess, (\R_{>= 0}, >=, +, 0) is like, a monoidal category with + as the monoidal operation?

So, for x, y, z in the metric space, from hom(x,y) and hom(y,z) I guess the idea is that there is a designated composition morphism

from hom(x,y) monoidalProduct hom(y,z) to hom(x,z)

which is specifically,

hom(x,y)+hom(y,z) >= hom(x,z)

(I said designated, but there is only the one, which is just the fact above.)

I.e. d(x,y)+d(y,z) >= d(x,z)

(Note: I didn’t manage to “just guess” this. I’ve seen it before, and was thinking it through as part of remembering how the idea worked. I am commenting this to both check my understanding in case I’m wrong, and to (assuming I’m remembering the idea correctly) provide an elaboration on what you said for anyone who might want more detail.)
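To make the reading above concrete, here is a minimal sketch that checks the two enriched-category axioms on a finite distance matrix: the identity axiom (0 >= d(x,x)) and the composition axiom (the triangle inequality). The function name `is_lawvere_metric` and the example matrices are made up for illustration; symmetry is deliberately not checked, since the enriched-category view does not require it.

```python
from itertools import product

def is_lawvere_metric(d):
    """Check the two enriched-category axioms for a finite distance matrix d.

    Identity:     0 >= d[x][x]                  (so d[x][x] == 0 for non-negative d)
    Composition:  d[x][y] + d[y][z] >= d[x][z]  (the triangle inequality)
    """
    n = len(d)
    identities = all(d[x][x] == 0 for x in range(n))
    composition = all(
        d[x][y] + d[y][z] >= d[x][z]
        for x, y, z in product(range(n), repeat=3)
    )
    return identities and composition

# Three points on a line at positions 0, 1, 3.
line = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]

# Here the direct distance from point 0 to point 2 exceeds the route via point 1.
broken = [[0, 1, 5],
          [1, 0, 2],
          [5, 2, 0]]

print(is_lawvere_metric(line))    # True
print(is_lawvere_metric(broken))  # False
```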


> are “hom objects”, so regarded as objects in “the category with as objects the non-negative real numbers, and as morphisms, the ‘being greater than or equal to’ “ ?

This works, but it's not quite what you want in most cases. There's a lot of stuff that requires you to enrich over a closed category, so instead we define `Hom(a,b)` to be `max(b - a, 0)` (which you can very roughly think of as replacing the mere proposition `a < b` with its "witnesses"). See https://www.emis.de/journals/TAC/reprints/articles/1/tr1.pdf for more.
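As a rough sanity check of why truncated subtraction is the right internal hom, the sketch below tests the closedness condition numerically: with morphisms given by >= and the monoidal product given by +, there should be a morphism from a + b to c exactly when there is one from a to `max(c - b, 0)`. The helper names are hypothetical, and a grid of small integers stands in for the non-negative reals so the arithmetic is exact.

```python
def internal_hom(a, b):
    """Truncated subtraction: the candidate internal hom max(b - a, 0)."""
    return max(b - a, 0)

def adjunction_holds(a, b, c):
    """Closedness in ([0, inf), >=, +): a + b >= c  iff  a >= internal_hom(b, c)."""
    return (a + b >= c) == (a >= internal_hom(b, c))

# Exhaustive check on a small grid of non-negative integers (exact arithmetic).
grid = range(0, 8)
all_ok = all(adjunction_holds(a, b, c) for a in grid for b in grid for c in grid)
print(all_ok)  # True
```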


Indeed they are. I'm saying it may not be the right context in this case.

At least what they seem to be doing has little to do with metrics, and a lot more to do with probability distributions.


It's not clear what you're seeking. Probabilities appear because the magnitude of a space is a way of 'measuring' it -- and thus magnitude is closely related to entropy. Of course, you can follow your nose and find your way beyond mere spaces, and this may lead you to the notion of 'magnitude homology' [1]. But it's not clear that this generalization is the best way to introduce the idea of magnitude to ecology.

[1] https://arxiv.org/abs/1711.00802


Why not define ecological diversity as number of distinct biological species living in the area?


This is precisely the question answered by the OP. The answer is, "because there is a whole spectrum of things you might mean by 'diversity', of which 'number of distinct species' is only one extremum".


And also, I assume, because the concept of "species" isn't all that well defined?


It is well defined: a group of living organisms consisting of similar individuals capable of exchanging genes or interbreeding


Ring species make your definition non-transitive. The same with species that can interbreed but exhibit hybrid breakdown.


I invite you to examine the notes of the International Ornithological Congress... The difference between species and subspecies is quite subtle, and subject to interpretation, because no one is really going to do the experiment to find out if two individuals of geographically distinct populations can actually still interbreed.


So if you have a few grams of soil and want to know how many species of microorganisms are in there, you're setting them up with dates to see which ones will end up breeding?


One of a number of definitions. It is one that allows lions and tigers to be the same species.


Does he provide an example of another definition of diversity that makes sense in a biological context?


Yes, the two extremes are captured by the common metrics of "species richness" which is the pure "how many unique species are there", and "species evenness", which depends on how evenly distributed the species are. A community in which 99% of individuals are species A and the remaining 1% are from species B-G is exactly as species rich as a community in which there are equal numbers of individuals of each species, but it is much less even (and therefore, under one extreme of diversity, less diverse). In different contexts and for different ecological questions, these two different versions of diversity can matter more or less, and there are metrics which take both into account, but this is a fully generalized solution which shows you relative diversity along the entire spectrum from "all I care about is richness" to "all I care about is evenness".

-edit- By the way, since it may not be obvious to everyone, the reason an ecologist might care about evenness is that extremely rare species are often not very important to the wider community. From an ecological function perspective, there is very little difference between my above example of the 99%/1% community and a community that is 100% species A. So a community with two equally populous species might have more functional diversity than a community with one very abundant species and several more very rare species.
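The richness/evenness contrast can be made numeric with the standard "effective number of species" (Hill number) of order q, where q = 0 counts species and larger q discounts rare ones. This is only a sketch: the function name and the exact abundances (species A at 99%, B-G sharing the remaining 1% equally) are assumptions chosen to match the example above.

```python
import math

def hill_number(p, q):
    """Effective number of species of order q for relative abundances p."""
    p = [x for x in p if x > 0]
    if q == 1:
        # Limit q -> 1: the exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))

skewed = [0.99] + [0.01 / 6] * 6   # species A at 99%, B-G sharing 1%
even = [1 / 7] * 7                 # seven equally abundant species

for q in (0, 1, 2):
    print(q, round(hill_number(skewed, q), 3), round(hill_number(even, q), 3))
```

At q = 0 both communities score 7 (pure richness). At q = 2 the even community still scores 7, while the skewed one scores barely above 1, which matches the "functionally almost 100% species A" intuition.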


I found the paper easier to follow than the slides: https://www.maths.ed.ac.uk/~tl/mdiss.pdf

(Less emphasis on the category theory, and more attention to the basic math behind entropy-like diversity measures)

(TLDR) There are two important aspects:

1. Generalize diversity measures to when the categories are not fully distinct (as assumed for the Shannon entropy calculation) but have similarities parametrized in a "Z"-matrix here.

2. A parameter "q" to represent whether you value a category highly (for diversity purposes) even when it has only a single distinct example (q --> 0) or whether you value it highly only when it has many, many examples (q --> infinity)

* With Z = identity matrix (categories completely distinct) they show how different values of q reproduce different measures that have been considered before. (nicely summarized in a table)

* The generalization (parameterized by Z) when the categories can/do have some overlap is very elegant, and feels like an important step forward. (especially how it makes the measure robust to how we partition a bunch of examples into distinct categories, so long as we keep track of the similarity between the categories)

* The paper also summarizes a bunch of sensible properties that we would want any diversity measure to satisfy (like the one I just mentioned above).
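A small sketch of the similarity-sensitive measure described in the points above, using the formula as I understand it from the paper: the diversity of order q is (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)), with the q = 1 case taken as a limit. The function name and the example matrices are illustrative assumptions. The second example demonstrates the partition-robustness property: splitting one species into two identical halves, with Z recording that they are identical, leaves the diversity unchanged.

```python
import math

def diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (finite q >= 0).

    p: relative abundances summing to 1; Z: similarity matrix with Z[i][i] == 1.
    """
    n = len(p)
    # (Zp)_i: expected similarity between species i and a randomly chosen individual.
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    return sum(p[i] * zp[i] ** (q - 1) for i in support) ** (1 / (1 - q))

# Z = identity: two completely distinct, equally abundant species.
identity = [[1, 0], [0, 1]]
p = [0.5, 0.5]
for q in (0, 1, 2):
    print(q, diversity(p, identity, q))

# Split species 2 into two identical halves; Z records that they are the same.
Z_split = [[1, 0, 0],
           [0, 1, 1],
           [0, 1, 1]]
p_split = [0.5, 0.25, 0.25]
for q in (0, 1, 2):
    print(q, diversity(p_split, Z_split, q))
```

Both communities come out with an effective number of 2 at every q tried, whereas a naive species count would report 3 for the split version.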

Fun stuff!


This presentation is getting into the idea of "magnitude", the invariant from category theory, and how it can be used in mathematical ecology, especially when we're trying to max out diversity. It's put forward as a possible answer to a head-scratcher in mathematical ecology: How do we find the maximum diversity if we're given a list of n species and a similarity matrix?

The presentation gives us a whole range of views on biodiversity, making it clear that we need to think about both common and rare species when we're figuring out diversity. It hints that this magnitude idea might give us a more detailed picture of diversity by considering how important different species are relative to each other. Then the presentation gets into the weeds of category theory, chatting about enriched categories and size-like invariants. It talks about stuff like monoidal categories, V-enriched categories, and linear categories. These ideas are brought up as tools to help us understand and calculate magnitude.

The presentation also explores how these category theory ideas relate to metric spaces. It suggests that getting a handle on this relationship could give us even more insight into how to max out diversity. Lastly, the presentation brings up the Euler characteristic, which is a concept from algebraic topology. It hints that the Euler characteristic and magnitude are pretty tight, and understanding this connection could give us even more insight into how to max out diversity.

So, to wrap it up, the presentation is a thorough look at how category theory ideas, especially magnitude, can be used to tackle problems in mathematical ecology. It suggests that these ideas can give us a more detailed understanding of diversity and might even help us figure out how to get the most diversity.

References:

https://arxiv.org/abs/2012.02113

https://arxiv.org/abs/1606.00095


I am actually sympathetic to category theory, and the research at hand. That being said, what was presented might be interesting if left in terms of linear algebra or graph theory (both legitimate fields). Trying to put a category theory gloss on it just makes it look hollow.


It’s literally the result of a category theorist “following his nose” and solving a problem - mathematical ecologists are perfectly capable of doing some graph theory/linear algebra!


Amazing to see 'magnitude' on the front page of Hacker News! If you are interested in a direct application of this invariant beyond ecology, check out our recent pre-print in which we study the generalisation behaviour of neural networks: https://arxiv.org/abs/2305.05611

My personal approach to magnitude is not based on category theory but rather based on weightings of a metric space. If your metric satisfies certain properties, you can obtain a measure of the 'effective number of points' of a metric space. This is particularly relevant when looking at the metric space from different scales---zooming in gives you a lot of disconnected points, while zooming out gives you clusters. Magnitude then captures the changes in the number of points in a principled manner.
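The weighting view can be sketched in a few lines, assuming the usual construction for a finite metric space: form the matrix Z with entries exp(-t * d(i, j)), solve Z w = (1, ..., 1) for a weighting w, and take the magnitude to be the sum of w. The scale factor `t`, the small linear solver, and the four example points are all illustrative choices, and the sketch assumes Z is invertible.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def magnitude(d, t):
    """Magnitude of the finite metric space with distance matrix d, rescaled by t."""
    n = len(d)
    Z = [[math.exp(-t * d[i][j]) for j in range(n)] for i in range(n)]
    w = solve(Z, [1.0] * n)   # a weighting: Z w = (1, ..., 1)
    return sum(w)

# Two tight pairs of points far apart: positions 0, 0.01, 10, 10.01 on a line.
pts = [0.0, 0.01, 10.0, 10.01]
d = [[abs(a - b) for b in pts] for a in pts]

for t in (0.001, 1.0, 1000.0):
    print(t, round(magnitude(d, t), 3))
```

As the scale t grows, the effective number of points moves from about 1 (everything blurs together), to about 2 (two clusters), to about 4 (all points separated).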


An interesting result, but saying that “the maximum diversity problem is completely solved by an invariant that comes from category theory” sounds like a parody. In the end it is just a mathematical result with a biological metaphor. But who knows, maybe in near future it will be used for solving diversity problems of boards of directors.


This is an interesting talk! I love reading about applications of math/computer science to ecology. The parent page has links to relevant papers that are worth reading, too. [0] The species similarity paper has some concrete examples on coral, butterflies, and gut microbiomes that I felt missing from the slides. [1]

[0] https://www.maths.ed.ac.uk/~tl/genova/

[1] https://www.maths.ed.ac.uk/~tl/mdiss.pdf


Wow. The fact that there is an objective answer that is independent of any perspective on the importance of rare species is a rare gift, at least for this part of the problem.

Some questions and thoughts.

It seems that the result could vary based on how you construct the similarity matrix Z, e.g. is it purely taxonomic? or does it try to account for the ecological roles that a species is playing in the community, etc.

A seeming limitation is that the optimization works only for a fixed set of n species. While it is useful for managing existing communities, it means that there is still a question of whether larger n is strictly better, and leaves open questions of how to deal with transient or migratory members (if the community is spatially bound).

The answer, I think, is that it depends on how the similarity matrix is constructed. If every species is fully dissimilar then increasing n is always a good thing. If you use niche space to construct it and new species do not add or enter new niches, so that they overlap with others, then they will be close to another species in the matrix and increasing n will not have much impact. On the other hand, if you use a purely taxonomic approach then you wind up balancing the number of birds and mammals regardless of niche.

It is not clear to me whether it is possible to construct a similarity matrix that can account for the interaction between n, the carrying capacity of the ecosystem, and the number of available niches (or the ability of species to create new niches). By analogy if you have a stream (sunlight) powering water wheels, how many wheels and how many levels of gears (layers in the ecosystem) can be added, created, and/or sustained? At what point does adding an additional species mean that either two species are forced to be close together in the similarity matrix or both their populations must shrink in size because they must compete for the same energy sources?

Does the model sometimes produce impractical results, e.g. that it is good to have a single member of a sexually reproducing species (this is probably an orthogonal concern and you would want to scale to real population sizes such that the minimum corresponded to the smallest viable self-sustaining population)?

Is there evidence that maximizing diversity using this measure actually produces more robust and stable ecologies?


I do wonder whether counting artificial categories is the right measure.

Let's take the mammoth - it's basically a hairy elephant and the genetic material that could make one is 99+% present in Asian elephants (possibly 100%, just not all in the same individual) today.

A bit of selective breeding and you could probably produce one quite quickly [1]

i.e. the true measure of diversity is the pool of genetic diversity out there, rather than some arbitrary classifications.

[1] Look at the variety in shape and form of domestic dogs - they all came from the same stock a mere 20-40K years ago.


I found that much more useful and convincing:

http://www.loujost.com/Statistics%20and%20Physics/Diversity%...


Can someone tl;dr?


These two talks are for a less mathematical audience.

The first explains the problem of measuring ecological diversity, and the problem with old, traditional measures like Shannon entropy or the Gini-Simpson index. The second introduces the species similarity matrix Z and the viewpoint parameter q.

https://www.maths.ed.ac.uk/~tl/riken/

Now you know how to measure the diversity of a given community.

But suppose you are given the species, their similarity matrix Z, and the viewpoint parameter q, and you get to design the community by deciding the relative abundance of each species. Which distribution of relative abundances maximizes the measured diversity? The posted talk gives the answer, along with the surprising result that the answer does not depend on q.
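A brute-force sketch of the maximization, based on my understanding of the result: for a symmetric Z, try every non-empty subset B of species, solve Z_B w = (1, ..., 1), keep the subsets whose weighting w is non-negative, and take the one with the largest sum of w. Normalizing that w gives the maximizing abundances, and the sum is the maximum diversity. The solver, the function names, and the 3-species matrix are illustrative assumptions; singular submatrices are simply skipped, and the search is exponential in n.

```python
import math
from itertools import combinations

def solve(A, b):
    """Gauss-Jordan elimination; returns None if A is (numerically) singular."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def max_diversity(Z):
    """Search all subsets for the non-negative weighting with the largest total."""
    n = len(Z)
    best_value, best_p = 0.0, None
    for k in range(1, n + 1):
        for B in combinations(range(n), k):
            ZB = [[Z[i][j] for j in B] for i in B]
            w = solve(ZB, [1.0] * k)
            if w is None or min(w) < -1e-12:
                continue
            total = sum(w)
            if total > best_value:
                p = [0.0] * n
                for i, wi in zip(B, w):
                    p[i] = wi / total
                best_value, best_p = total, p
    return best_value, best_p

def diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (finite q >= 0)."""
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    return sum(p[i] * zp[i] ** (q - 1) for i in support) ** (1 / (1 - q))

# Species 0 and 1 are very similar; species 2 is quite different from both.
Z = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]

value, p = max_diversity(Z)
print(round(value, 4), [round(x, 4) for x in p])
for q in (0, 0.5, 1, 2, 5):
    print(q, round(diversity(p, Z, q), 4))
```

In this example the two similar species each get about 26% and the distinct one about 49%, and the diversity at that distribution comes out the same (about 1.86) for every q tried, which is the q-independence mentioned above.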


This is the most confusing possible title. Why is it written this way?



