Hacker News new | past | comments | ask | show | jobs | submit login
Category Theory for Programmers (2014) [pdf] (github.com/hmemcpy)
403 points by furcyd on Dec 24, 2019 | hide | past | favorite | 129 comments



Can someone please explain to me the excitement for category theory in this forum?

I've gone through the first 10 chapters + exercises of this book, and truth be told the content is interesting but the ROI is low for programming (FWIW I started this book as part of a study group in a large unicorn co, and after approximately one month I was the only person left grinding through the exercises, hoping for some payoff).

It's cool that I know what a functor is, I guess, but where others tout CT as unlocking a higher level view of programming, all I see are little curiosities (when you do X you are really doing Y, isn't that cool!?).

I've heard it said that CT's true power comes as a "refactorer" for what would otherwise be complicated proofs. I.E. we discovered a new tool to make proofs that already existed more elegant. That's cool that it has a use! Does this somehow extend to programming?

I don't mean to sound like a grouch, but I'm genuinely baffled by the level of excitement around this topic.

If you're looking to scratch a math itch, ML has plenty of opportunities to hobby some math while advancing your career and marketability.

For programming in general, there are tens of topics I would pick before reaching for this, with major immediate payoff to our occupation as programmers.

Finally, after going through 10 chapters I skipped to the end to see if there was some culminating breakthrough that we were striving towards... and I found more definitions stacked upon other definitions. I suppose memorizing all these definitions is supposed to give me a different mental model about programming. Is that right?

I'll quote here the first line of the Analects, "Isn't it a pleasure to study and practice what you have learned?". This is what I look for in studying any topic: something that gives me joy to put into use. In the best cases it gives me a feeling of a super power - something that was previously impossible or extremely difficult is now elegant and clear. Can CT offer that in any way to programming?

-Sincerly, Guy that doesn't get it, but wants to


I think I've got the category bug. I very much was intrigued by Milewski's course and found that it made things happening in Haskell make more sense to me. There are a couple of things that really excited me.

Conal Elliott's compiling to categories. Categories make a great intermediate representation for DSLs. They are very easy to manipulate/optimize in a simple algebraic manner without issues related to named variables. This proposal showed how a plugin could take ordinary programs written in more user friendly syntax and compile it to the category IR. You can overload ordinary programs to automatically differentiable programs and more. http://conal.net/papers/compiling-to-categories/

Another thing I think is really interesting is calculating programs. It is possible in the right formalism to write down a specification of a program in a simple small form, and use algebraic manipulation to derive an executable form of the program. It feels like doing an integral or something. Check out these books on that topic. https://themattchan.com/docs/algprog.pdf http://www4.di.uminho.pt/~jno/ps/pdbc.pdf


Great example!

Here’s a video where Conal demonstrates interpreting a Haskell program (actual Haskell, not a DSL) into a) an image of a graph-like structure (visualizing the algorithm), b) Verilog (for compiling to hardware), and c) GLSL (for being rendered by a GPU in the browser).

https://youtu.be/vzLK_xE9Zy8


As a functional programmer myself, I have the same confusion.

As best I can tell, category-theory a cargo-cult thing ("I want to look like the type of engineer who CARES about monoids [because if I use words you don't know I'm smart]" much in the same way hipsters claimed vinyl sounds better).

I'd like to be wrong about this, but I've already met a handful of "phonies" who rant and rave about this stuff, but when I ask them "How does labeling an array a monad help me?" or even "Is a hash-map a monad?" they are completely stuck.


Label the string as a functor and think about lifting the array into the string functor and doing string manipulation to change it to a dict, then lift it back into a dict type.

String manipulations of a lifted type in "string space" can be faster and more efficient then actually manipulating it in "type space"


[flagged]


This is exactly what I mean. I can't tell if that's a joke or not.


Not a joke.

Take a look at my response to the other guy. I clarified my point with a thorough example. Please note that I was somewhere else and on the phone when I typed my initial response so forgive me for the brevity and lack of clarity. Hopefully the new response will clear things up.

Anyway...

I think theres a good chance that you simply aren't grasping what the category theory fans are saying. Hopefully you grasped what I'm saying.


Apologies. I was on the phone typing the above. Still, remarks like "What?" that provide no information or context are against the rules on HN. Just FYI.

The OP said that there's no point in labeling an arbitrary type as a functor or monadic value. I'm going to use an example to illustrate how there is a benefit in being aware of this concept. I will be using a "String Monad" in my example.

Imagine you have some type, like in the initial example.

And you want to convert that type into another type.

I'll give you an arbitrary example using python:

You want to convert this: {1:2 ,3:4, 5:6, 7:8}

To this: [1,2,3,4,5,6,7,8]

You can do this via loops or you can "lift" the dictionary type from "type space" to "string space" via the python functor str():

   lifted_dictionary = str({1:2 ,3:4, 5:6, 7:8})
This is similar to movement between two categories or objects via a functor. You see similar techniques in other fields of engineering when say someone "lifts" something from cartesian coordinates to polar coordinates. Which is why I'm using the word "space" it's similar to how I'm shifting representations of coordinates to have an easier time with certain calculations.

Then you do your type manipulations in "string space" rather then on the type itself. So it all ends up being string manipulation operations to get from the string "{1:2 ,3:4, 5:6, 7:8}" to the string "[1,2,3,4,5,6,7,8]". I didn't use regexp in the example below but you get a huge performance boost if you use that instead.

   lifted_array = "{1:2 ,3:4, 5:6, 7:8}".replace('{','[').replace('}',']').replace(':',', ')
   #computed value is a string of the form: "[1,2,3,4,5,6,7,8]"
Then you use the opposite functor from str() to lift the value back into "type space"

   array = eval(lifted_array)
Here eval() is the opposite functor to str()

Type conversion complete. The full code:

   original_dict = {1:2 ,3:4, 5:6, 7:8}
   array = eval(str(original_dict).replace('{','[').replace('}',']').replace(':',', '))
   #value of array is [1,2,3,4,5,6,7,8]
Essentially you use functors to lift your data structures into other categories for easier conversion, then you simply bring the stringified type back down into regular "type space".

So in short I took types and lifted it into the String Monad or "String Category" or whatever you want to call it, then brought back down into types.

It seems like an arbitrary way to do type conversion but using regexp to do string manipulation in place of a for loop that unrolls the dictionary is more performant and faster. Think about preserving order as well. When I convert the dict type into string space, ordinal properties of the string itself will be enforced on the dict. SO the keys in the dict will follow the order they appear in the string as will the array string that it is converted to.

This technique utilizes category theory. The categories you are lifting to must be isomorphic to the origin category, you must understand the notion of functors (or monadic values) to really get it.


> This technique utilizes category theory.

No, it doesn't. Category theory is not intrinsic to the development, use or explanation of this technique. Category theory is merely one possible framework for explaining why the technique works the way it does. That's an important distinction, and likewise it's not the most practical framework for understanding this technique by a long shot.

In particular, understanding why certain programming techniques are more performant than others virtually never requires the language and abstractions of category theory. Everything you're saying here is a further demonstration of the complaints others have voiced in this thread. Not because you're wrong - you're not wrong. Rather because it's just not the most practical way of understanding or leveraging most programming techniques, and it injects a significant amount of unnecessary abstraction and foreign mathematical terminology into discussion.

For what it's worth I have taken graduate math courses wholly and partly focused on category theory, so I have the background to follow what you're saying. It's essentially correct. But it's also obtuse and mostly inscrutable to other people. It reduces, rather than increases, the shared context software engineers leverage to understand each other. You can project a vast and magnificent theory of categories onto programming because it adheres to a variety of algebras, but at the end of the day it's just not clarifying anything further. Instead it's miring it in overcomplicated verbiage.

There's basically no reason for you to be explaining why a regular expression is faster than a for loop for this use case with the formalism of category theory, unless it's as an academic exercise just to show you can. Explaining this example with the language of functors and type spaces is like proving that Excel is Turing complete in order to explain how summing arbitrary rows in a column works.


>Category theory is not intrinsic to the development, use or explanation of this technique.

No it's not intrinsic to the technique, you're right on this.

>There's basically no reason for you to be explaining why a regular expression is faster than a for loop for this use case with the formalism of category theory, unless it's as an academic exercise just to show you can. Explaining this example with the language of functors and type spaces is like proving that Excel is Turing complete in order to explain how summing arbitrary rows in a column works.

You're right on this as well. The category theory part I am using here is basically saying that you can "lift" the type into "string space" and do conversions in an isomorphic category. I am not trying to say that category theory will explain why regexp is faster. More like I'm just saying that categorical insight offers you alternative design choices to reach your goal. It is up to you to determine whether that path is performant.

>For what it's worth I have taken graduate math courses wholly and partly focused on category theory, so I have the background to follow what you're saying. It's essentially correct. But it's also obtuse and mostly inscrutable to other people. It reduces, rather than increases, the shared context software engineers leverage to understand each other. You can project a vast and magnificent theory of categories onto programming because it adheres to a variety of algebras, but at the end of the day it's just not clarifying anything further. Instead it's miring it in overcomplicated verbiage.

I'm not a mathematician, I'm the farthest thing from that, so hopefully my explanation is understandable to the layman as I, being a layman myself, can offer that perspective naturally. But you aren't wrong here. I can see how the vocabulary can confuse... BUT I am not trying to inject overblown concepts into what is otherwise a straightforward technique. This is not why I brought it up and it is not my intention. I'll explain. Read on.

>That's an important distinction, and likewise it's not the most practical framework for understanding this technique by a long shot.

Honestly, I found this technique by thinking in terms of category theory. I would not have come up with it had I not learned category theory. I stated that it uses category theory because that is what I myself used to come up with it.

That is the point I am trying to make. Categories personally helped me utilize a technique. And hopefully my anecdotal experience will lend some credibility to using category theory to help with programming. I am not trying to use the terminology to sound pedantic. Apologies if it seems that way.

And also note this notion of converting from one "space" to another "space" to have an easier time doing transformations is a common technique even outside of software. To cite another example: electrical engineers move between "signal space" and "frequency space" in order to better understand the properties of signals.

I know of no universal term for this conversion that covers the general idea other then the categorical term: "functor." At the very least, category theory introduces a vocabulary for Design techniques like this. Similar to "GoF Design Patterns" but more theoretical and formally defined.


Well it might use category theory and be buzzword compliant but the code you’ve written wouldn’t pass code review at my work. This code converts numeric data to a string and back to numeric data.


Buzzwords are popular words. You put category theory on your resume it's not gonna get you anything. It's obscure.

Look the point isn't the python, the point is, the design pattern can be applied to many places and in some contexts it's faster in other contexts it is not, but the design pattern exists. My intent was to convey understanding of the pattern at a very general level. Not to argue about the validity of the python implementation.


I would simply do:

    original_dict = {1:2 ,3:4, 5:6, 7:8}
    array = sum(map(list, original_dict.items()), [])
And it would have been simpler if items returned lists instead of tuples. In my benchmarks it runs 10x faster.


This wasn't the point. Also map is not recommended for usage in python. Stylistically the convention says you should be using a list comprehension in place of filter and map.

Also did you do your tests with regular expressions? I specifically mentioned I didn't use regexps but it would be dramatically faster if you used it instead of the replace method.

I'll reiterate the point. The point is there is a pattern you can follow where you can move from one "type space" into another "type space" then back because the other type space may be easier to write transformations.

Obviously from the this thread, the serialized type space may not be the best way for python. But there are many other contexts where this it is better and my point is, category theory allowed me to derive this general pattern. That's it.


Still needlessly complex and slow.

    [x for kv in d.items() for x in kv]


aaah! thanks for explaining that its about fun things to do with isomorphic conversions and transforms -- which makes it such a fundamental thing that i can likely continue to invent the bits i need of it as i go.


It's not just fun, there's a practicality to the method as well which I believe, judging from the responses, that my example failed to convey.

In another comment, I came up with another example that uses the same pattern that uses a more "practical" example. I'll paste it below:

the goal is to convert:

   {"1":"2", "3":"4", "5":"6"} to {6:1, 2:3, 4:5}
Or essentially the ordered dict, rotated. Which space is a rotation more intuitive? A list space. [1,2,3,4,5,6] is more readily rotated into [6,1,2,3,4,5] with one operation and converted back into a dict.

If you tried to do the above directly by manipulating the dictionary it would not be as straightforward. Use a functor to lift the dict into a list, do the rotation and use the opposite functor to lift it back down to a dict.


Also totally don't appreciate the sarcasm. I hate people who have this type of attitude. What's your intention? Just to piss me off? I'd flag your post, but this thread is pretty much dead already.


[flagged]


Please don't cross into personal attack, no matter how wrong or ignorant another comment is, or seems.

https://news.ycombinator.com/newsguidelines.html


Author mentioned elsehwere that he used that trick in postgresql , not in python, where it allowed him to skip SQL acrobatics and sped up query.


Thank you for this. This is indeed where I actually used the technique. You do get dramatic speed up if you do it in SQL this is provable.

I chose to use python as an example here mainly because the SQL example would get very convoluted and be much less concise.

The main point was the general technique of moving back and forth between isomorphic types, but people are getting too involved with implementation details of python. Yeah it might(keyword) be slower, yeah it's more convoluted than a traditional list comprehension in python, but that is not the point.

Lesson learned: don't use arbitrary examples to prove a point. Use an example where all metrics are dramatically better.


First off, I don't like your sarcastic attitude. It's against the rules on HN to do that. Don't violate rules to be rude.

Second off, my example is just an example. It's not production level code. The intention of the example is to show "lifting" from from one type into another "category." The efficiency is less relevant here... in some contexts it can be faster in other contexts it could be slower or not even possible.

>Your gloss reveals further substantial misapprehensions about python basics (there are no regexps here, no it's not faster to do it that way, no it's not any more order preserving then iteration).

You sentence here reveals a failure in reading comprehension. I never mentioned what did was a regexp. I said you Could use a regexp to make it faster. A regexp if you know basic computer science is the one of the fastest possible string parsing operations. Take a course on Automata at a top university, this would be helpful in educating you, but you need to qualify first.

If there is a bottleneck it's in the casting functors str() and eval(). I didn't benchmark it and I very much doubt you did either. I suspect it is faster as the interpreter needs to "interpret" the data structure anyway regardless of whether or not it has string quotation marks around it so the parsing bottleneck is already there regardless of whether or not you have quotation marks. This is just a guess, and I'm pretty sure both of us are too lazy to generate actual specs to prove it either way. Stop throwing hard conclusions at things you have ZERO evidence for. If you do have evidence, show it, don't talk out of your ass... I'm willing to concede given evidence but without evidence you don't know what you're talking about.

The string does hold order preserving properties. You are absolutely wrong about this:

"{1:2, 3:,4}" during string manipulation can be made into "[1,2,3,4]" without sorting because the string itself is a List of ordered characters. Unserialized, the dictionary has no order. Again what's going on here is you misreading and not understanding my explanation.

>Assuming your problem is purely lack of experience, here's a word of advice: stay the hell away from abstractions for a year or two.

This is flagrantly offensive. Let me give you a word of advice. Keep your attitude in check, this kind of attitude can ruin your whole career after you get your ass fired. Any hiring manager would prefer a less technical person (which you have not demonstrated any dramatic ability in) over someone with a bad overly domineering attitude, and that is a fact.

>Instead try to form some basic mental model of a) how a computer and b) your main tools (such as python) work (have a very simplified model of how the CPU, memory network and SSD work, with numbers correct to 1 or 2 orders of magnitude; read a bit about the python implementation). Try to predict how much time/memory/etc something should take before you implement it. Then implement it in the most straightforward and direct way.

How does this have anything to do with the topic at hand. You are invoking OS and CPU architectural level concepts which are so far away from python it has basically nothing to do with it. An SSD? Are you serious? All of these low level operations are light years in distance away from python. If you want time and space analysis of both algorithms, at the python level/category theory level you just follow the standard model of Big Oh complexity. You want to get nitty gritty? get the bench marks or switch to something like C++ or Rust. Nowhere does anyone have a detailed enough mental model to map out how the python code will execute at the assembly instruction level.

Category theory is a top down approach to programming while computer architecture is more of a bottom up approach. Experts in both fields are unlikely to collide.

FWIW I double majored in Electrical/Computer Engineering and Computer Science 10 years ago from a top university. Not only do I have a general model of computing from semiconductor physics (this is below logic gates and non linear circuits and computer architecture, all said to patronize you.) all the way up to python, but I hate throwing down credentials for no reason... SO why did I do it here? because you decided to assume that I lack experience.


I timed it -- 2 orders of magnitude slower across the board.

    In [19]: short, middle, long = [{2*i+1:2*i+2 for i in range(n)} for n in [2, 2**8, 2**16]]
    In [20]: timeit  [x for kv in d.items() for x in short]
    812 ns ± 6.86 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    In [21]: timeit  [x for kv in d.items() for x in middle]
    30.1 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [22]: timeit  [x for kv in d.items() for x in long]
    9.77 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [24]: timeit  eval(str(short).replace('{','[').replace('}',']').replace(':',', '))
    10.4 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In [25]: timeit  eval(str(middle).replace('{','[').replace('}',']').replace(':',', '))
    592 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [26]: timeit  eval(str(long).replace('{','[').replace('}',']').replace(':',', '))
    264 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Python dictionaries are ordered, and iterate in order (since 3.6, as guaranteed property since 3.7). In older versions of python where they are not ordered, str also does not preserve the order in the literal (except for one of all possible orderings).

   Python 3.7.2 
   Type "help", "copyright", "credits" or "license" for more information.
   >>> str({5:6, 4:5, 3:2})
   '{5: 6, 4: 5, 3: 2}'

   Python 2.7.15
   >>> str({5:6, 4:5, 3:2})
   '{3: 2, 4: 5, 5: 6}'
You might consider what I wrote offensive, but maybe you can still take the above as evidence that there are areas where your mental model is imperfect. I have the impression there is a puzzlingly common class of CS engineering failures stemming from misguided abstraction attempts because people are either unable or just unpredisposed to making extremely back of the envelope task breakdown of the most direct implementation in terms of primitive, costed operations. I also see this happen to rather smart but inexperienced engineers. Since you are not inexperienced my advice is unlikely to be helpful.


>I timed it -- 2 orders of magnitude slower across the board.

Dude, didn't I tell you to use regular expressions? Replace is obviously slower. I used it because it's easier to read then regexp. Rerun your experiment on that. I specifically mentioned that the regexp is the faster implementation. You completely ignored it.

Again the implementation was not my point. I don't even understand why you would spend the effort to record something so trivial.

Also the "magnitude" difference in speed is not exponential meaning for most applications the difference is negligible. You should know if you're "experienced" that a C++ binary is a "magnitude" faster than python just running a for loop counting to 100. Most people still use python because in the end because it Doesn't even matter.

>Python dictionaries are ordered, and iterate in order (since 3.6, as guaranteed property since 3.7). In older versions of python where they are not ordered,

Good to know.

>str also does not preserve the order in the literal (except for one of all possible orderings).

Categorically wrong. You talk about mental model? Examine yours. The very nature of a string is an ordered list of characters. A string literal "abc" has order and preserves it by nature of having order as a property. A data structure that loses order is one that does not explicitly have ordinal properties hence it does not preserve order.

>You might consider what I wrote offensive, but maybe you can still take the above as evidence that there are areas where your mental model is imperfect.

No one has a perfect model. To understand the world and computing, what people do is they make predictions and guesses and abstractions. I'm just guessing that string space might be faster in python. If I really wanted to understand why one benchmark was faster than the other I basically have to understand the source code behind eval and str. That is something that I don't care too much about and is therefore not worth my effort. Overall that wasn't even my main point.

>I have the impression there is a puzzlingly common class of CS engineering failures stemming from misguided abstraction attempts because people are either unable or just unpredisposed to making extremely back of the envelope task breakdown of the most direct implementation in terms of primitive, costed operations

The back envelope calculation is that the upper bound for both algorithms is O(N) where N is the amount of characters in the serialized data structure; and thus benchmark times are negligible in almost all contexts. Think about it. We have back envelope calculations down to a science (complexity theory) and that science doesn't even reference actual timing because humans can't see the difference between 0.001 ms and 0.001 ns. But you're going around all of the science and throwing bench marks at me. You want to squeeze the maximum amount of time out of your python program? Stop using python... start writing your code in cisc.

>I also see this happen to rather smart but inexperienced engineers. Since you are not inexperienced my advice is unlikely to be helpful.

An experienced engineer is only concerned with performance when it actually makes a difference. Your advice is not helpful because it is wrong.

Understanding computing is in itself composed of layers of abstraction. To optimize a high level language I don't have to understand assembly language. I definitely don't need to understand CPU architecture which you seem to imply is required. I have an abstraction that will help me: Big Oh Notation.

The most common time a programmer will need to understand the low level aspects of a computer is if they are writing systems level code, not application level code. These specialists tend not to use python at all. I would imagine category theory is not applicable at this level because they are closer to the turing model of computing which is by nature non-functional. Assembly, C, C++ or Rust is what they usually work in.


> Dude, didn't I tell you to use regular expressions? Replace is obviously slower. I used it because it's easier to read then regexp. Rerun your experiment on that. I specifically mentioned that the regexp is the faster implementation. You completely ignored it.

Well, I ignored it because I thought that a) the actual string manipulation would form a negligible part of the total runtime b) regexp would not be faster (and the actual fast way to do it would be str.translate). Let's try it:

    In [43]: pat = re.compile('[{}:]'); slong = str(long)
    In [44]: timeit slong.replace('\{','[').replace('}',']').replace(':',', ')
    8.06 ms ± 23.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [45]: timeit repl = pat.sub(lambda m: {"{": "[", "}": "]", ":": ", "}[m.group()], slong)
    128 ms ± 5.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    In [54]: timeit eval(repl)
    237 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    In [54]: timeit slong.translate({'{':'[', '}':']', ':': ','})
    753 µs ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
See how having a reasonable mental model can be nice?


Please read the entire post and respond to that rather then the tidbits that help you gain argumentative advantage. I have repeatedly said that the implementation WAS not the point and that all performance metrics are negligible as the are under O(N). Either way here's a response:

Honestly reg expressions in all general contexts is one of the fastest implementation of string parsing available for this specific type of parsing. You would have to understand computing theory to see this. Since you didn't even bring it up the actual fact of the matter is, you don't have a good model of computing in general in your head.

The issue you're seeing here is python specific. So really the only way to see this kind of thing is through benchmarks OR knowing the python source.

> See how having a reasonable mental model can be nice?

The problem is your posts also indicate to me that you don't have a good mental model. From what I can make of the model is this: Abstractions are bad, use less of it for more speed, also know SSD's and CPU architecture that will help you write faster python code.

Then what you do is run benchmarks and rely on that in place of an actual working mental model. Believe it or not you CAN run benchmarks on every permutation of a code path you can forego a model altogether. Evidence is more reliable then a model, yet your genius advice was for me to learn CPU architecture.

>a) the actual string manipulation would form a negligible part of the total runtime b) regexp would not be faster (and the actual fast way to do it would be str.translate)

Two things about this, first... str.translate is python specific. No general mental model would assist you with this. You are using python specific knowledge here.

The second part is similar. How do you know eval would be non-negligible? Theoretically the interpreter interprets python code and eval interprets the same thing. Is eval/str accessing heap space or stack space? What is causing it to be slow or is it the extra parsing itself?

Likely you don't know.

Either way my example served one thing. If you understood it you would know that the theme was basically to say that moving through Another type space could have advantages over moving through the original type space. The string functor was just an example.

I could easily say that the goal was to convert:

   {"1":"2", "3":"4", "5":"6"} to {6:1, 2:3, 4:5}
Or essentially the ordered dict, rotated. Tell me which space is a rotation more intuitive? A list space. [1,2,3,4,5,6] is more readily rotated into [6,1,2,3,4,5] with one operation and converted back into a dict.

If you tried to do the above directly by manipulating the dictionary it would not be as straightforward. Use a functor to lift the dict into a list, do the rotation and lift it back down to a dict.

That is the point and the pattern I am trying to convey. We can argue about benchmarks all day it serves Nothing.


I think it is a similar kind of excitement people have when learning Haskell and functional programming: pleasure to experience the aesthetics of pure forms, distilled mathematical concepts, that contrast with impurity and mess of real world coding.

I see some practical benefits from studying Category Theory and Pure Mathematics in general: mental patterns, some limited applications in general, interesting application in specific problems, but it requires huge investment of time, and, frankly, there are much more practical applied areas in maths that can bring food on the table: optimizations, probability, linear algebra, fluid dynamics, Fourier transforms.

One usually does not ask questions about applicability when decides to study Pure Maths, it is a passion driven adventure: https://en.m.wikipedia.org/wiki/A_Mathematician's_Apology


I didn't read the book yet, but I'll take a crack with a couple of examples:

Unix file systems, and its philosophy of "everything is a file". It wasn't common before Unix to interact with devices through a filesystem. Somebody noticed that simple tasks like "listing with ls" and "hex-dumping with xxd" doesn't need separate utilities for files vs devices, and thus made the abstractions converge. If you ask me, this is the kind of abstraction I wouldn't normally think of. It's useful to know what kind of things to be on the lookout for.

The second example is fuzzing. At least today, it's still kind of painful to have to manually write fuzz-tests for everything we care about. Manually coming up with good test cases is also hard. But remember those Monad laws in Haskell? https://wiki.haskell.org/Monad_laws these are essentially free fuzzing code. If I implement something as a Monad, at least in theory I should be able to reuse existing fuzz-drivers to test my new code as well. I probably would not have had the patience to come up with good tests myself. Similarly for other categories. Again, it's good to know what general interfaces already exist.


Here’s maybe one attempt at answer https://blog.ploeh.dk/2017/10/04/from-design-patterns-to-cat...

As I take it the hope isn’t that all programmers would learn CT but rather that those who work on progressing the craft and our tools get at a proper formalism to work with for certain aspects that until now has been rather “soft”


> the content is interesting but the ROI is low ...

Even mathematicians themselves have had this view, for a long time. The attitude appears to be "hey I'm doing my good calculations here, why should I be messing around with arrows?" Theoretical physicists are now getting involved in this debate aswell. So it's a much bigger complaint than "how is this going to give me better programs?"

I've thought so much about this, I don't really know where to start. The functional programming stuff is nice, but it is only one manifestation of CT ideas. Probably, if you are any good at programming you are already "doing" CT. Anytime you have a class invariant, you are "doing" category theory. The methods are preserving some kind of structure (the class invariant.) And that is the key idea behind OOP. That is what it means to have an object and not just a bucket of data. FP does this too but in a strict way; nothing is mutable.

The revelation of CT is how deep (or wide?) these ideas are, but maybe the only thing you get from this is the satori.


It's a really interesting way to think about things, that's pretty unlike other ways to think about things, and yet you can use it as a viewpoint for so many other things. It's intellectually really neat and marginally useful. More useful in certain programming communities.


Watched that guy's lectures and conference talks. While it's quite interesting and educative - I still don't get the "for programmers" part of it. I don't quite get how I would jump from understanding categories, morphisms, monoids etc. to building actually better systems. There are zero practical examples in his talks. Is it because i'm not using functional languages or what am I missing here?


Being into functional programming helps quite a bit I think. In particular, category theory seems to nest well and illuminate some complicated uses of polymorphism and continuation passing style transformations.

Categories are also a useful abstraction/pattern for designing interfaces to libraries. It's sort of a more mathematically flavored version of dataflow style libraries like simulink or gnuradio diagrams and stuff. Gluing together lego blocks and building up pipelines. I have been fiddling around with a categorical interface for solve linear equations which I think is pretty neat. http://www.philipzucker.com/linear-relation-algebra-of-circu...


Yeah, +100 I have this same problem in general. FWIW, the only real practical example I've seen (which has been rather serendipitous for me) is the "Seven Sketches in Compositionality" linked in the comments below applying category theory to databases. You can even play with the actual an implementation (written Java) to see some real world code here: https://github.com/CategoricalData/CQL Dunno if that helps your needs much but it was at least some kind of concrete starting point for me.


The practical takeaway for me is being able to see patterns and systems I never would have seen otherwise.

I avoid a lot more pointless building; sometimes entire projects. The things I do end up creating come from a much deeper place of true need.


That is really interesting point. Could you give an example when you avoided project because of understanding of categories?


I was able to see how the automation software I was designing would ultimately be an equivalent experience to its manual counterpart and thus be pointless to build.

This reads like common sense but it wasn't; there were thousands of lower level factors. Studying the mechanism of analogy and "sameness" seems to help the mind with abstraction. Perhaps because it is the differences in data which reveal what is most useful to know when creating.

In other words, I think a perspective that can see more of the similarities can "diff" and solve problems more quickly.


So here's a "folk theorem" that explains why you should care about such abstract structures.

Take a 2x2 matrix with real elements (a b)(c d). If I take the space of all such matrices and put a uniform probability distribution over it, what is the probability of getting a matrix that is not invertible? Not invertible is det M = 0, or ac - bd = 0. I can solve for a in terms of b, c, and d, which shows that the non-invertible matrices are a 3 dimensional subspace of the 4 dimensional space, which has zero volume. So a matrix chosen at random is invertible.

The folk theorem is in analogy with this: say I have a set of elements {a, b, c, ...}, and I describe mappings of various kinds that take some subset of tuples of the elements into other subsets. You can draw it as a directed multigraph. If you consider the space of all such graphs, how many of them generate "rich" or "regular" structure? For example, having a system where my mapping is defined over all elements is actually pretty restrictive. For a set of N elements, there are N-1 + N-2 + ... + 1 sets that are not defined over all elements. As N gets large, this dwarfs my one regular version. As I put in more operations and go to infinite sets of elements and add more regularity properties, this imbalance grows. So systems that have rich, regular structure are a zero volume subspace of the set of all such systems.

Given this, suddenly the interest in the few dozen algebraic structures that algebraists of various kinds explore makes a lot more sense. They're following infinitely thin paths of structure through space. You start from a raw set with no operations. In one direction you trace through monoids, groupoids, semigroups, groups, rings, rigs, tropical rings, fields, vector spaces, modules, etc. In another you go through pre-orders, partial orders, chain complete partial orders, semilattices, lattices, etc. In another you go through categories, topological spaces, natural transformations, monads, arrows, topoi...

Roughly, the groups, rings, fields path takes you through values that have regularity that looks vaguely like numbers. Orders and lattices take you through things that look like decomposition into pieces. Categories to topoi take you through things that look like sequences of operations and transformation. That the latter might be of interest to a programmer is fairly obvious from this point of view. So when someone says a monad is interesting, what they are trying to tell you is that it is the relevant data structure for describing an imperative program the way an array or list is the relevant data structure for describing an ordered sequence of numbers.

The reason you care, then, is that once you have traced out these paths, when you are looking at a problem your brain will automatically try to draw you back towards the thread of regular structures. Sometimes you don't go to familiar ones. I ended up building an odd variation of lattices to describe genome annotations because of this, and it was an exercise in finding what about the domain drew me out of strict regularity rather than trying to find my way into it, which is a lot easier.

Similarly, the entirety of the literature on eventual consistency in databases can be summarized as "make your merge operation the meet of a semi-lattice." If you've traced through that particular thread, then you can immediately think in a deep way about what kind of structure eventual consistency has and what the variations on it are.


Thank you. A lot. This is one of the most well-founded, didactic and insightful comment I ever read on HN.

I happen to have a decent background in mathematics. Some people say it's useless for most contexts that a programmer can meet (exceptions being vectors and matrix for 3D and quaternions for some 3D APIs for example), but I disagree. You just illustrated how abstract mathematics can be directly relevant to programming.

Even when the mathematics one knows are not directly applicable, I think having studied mathematics helps shape the brain activity at all tines into thinking things orderly and more efficiently. Having correct intuitions about what will be a productive approach and what will not. Loving and embracing simplicity

That said, perhaps we should use different words for the different activities that look like programming. To be caricatural, one is just piling frameworks and libraries on a language where you don't even master the basics, filling in the blank from examples and snippets from the net. The other activity is writing well thought out programs on top of highly generic patterns like Reactive Programming or even designing your own patterns and frameworks to be used by others.

The first activity does not need much mathematics, the second greatly benefits from it.

What do you think? Should we use different words?


> What do you think? Should we use different words?

Maybe it's because I was up until midnight debugging a production problem on Christmas eve, but I can't make myself care much either way. My kneejerk reaction is to say no, because I hesitate to introduce categories without being quite certain of their operational reality.


One guy Eugenio Moggi saw that the category theoretical "monad" could formalize "imperative programming" in pure functional languages by carrying a "world" parameter that represents the state being modified. Since then people have been trying hard to jam the rest of category theory into programming hoping to uncover similarly striking results, to no avail.


> by carrying a "world" parameter that represents the state being modified

From this, it's clear you've never read and understood Moggi's seminal paper. Monads are functors with some extra monoidal structure. The concept, and even Moggi's use of it in categorical semantics, has nothing to do with "worlds".

The important realization is that there are many more monads than just the one hardcoded into one's programming language of choice. State, error handling, parsing, reading from an environment, backtracking, nondeterminism, mutable state, logging, probability, continuations, async/await, I/O, ...these are all just specific instantiations of the general interface of monads. Recognizing that allows you to build abstractions that work for any monad, rather than re-discovering and re-implementing the same idea for each one separately. It's been a remarkably fruitful area of research.


Sorry for saying "world" and for underrepresenting Moggi's paper, but all those things you're mentioning are side effects and are captured by the same concept, of representing effects by carrying a parameter through a chain of function calls.

I'm not backing down from the core claim that Moggi realised that CT monads are a nice formalization of side effects in pure functional programming, and that this caused a flurry of interest in "what else" of high impact/interest CT can explain of programming, and that there isn't much else.


> all those things you're mentioning are side effects and are captured by the same concept, of representing effects by carrying a parameter through a chain of function calls

No. The state and environment monads fall under that rubric, but many of the others are far removed from what you're describing. You can't implement backtracking in a purely functional way simply by passing around an extra parameter, because backtracking affects the control flow. Continuations are even stranger—they're not even algebraic!

> I'm not backing down from the core claim that Moggi realised that CT monads are a nice formalization of side effects in pure functional programming

No, Moggi's paper was about formalizing the semantics of impure programming languages. It wasn't until Wadler that we started using monads to model side effects in pure functional programming. Please read the actual research before citing it.


What about optics?


Monads are still programmable semi-colons, though. Yes, there are a lot of things you can program into a line-end symbol. Yes, some of those things are general and work for any already modded semi-colon. But it's still just programmable semi-colons: implicit, uncomposable, and frankly better left alone.


>implicit

You'll have to elucidate, I don't follow.

>uncomposable

Demonstrably false, see monad transformers.

>better left alone.

"Yeah, well, that's just, like, your opinion."


Monad transformers are really not the great counterexample you seem to think they are.


Monads were described as uncomposable. Monad transformers are one way they’re composed. I don’t understand your objection.


A lot of people consider Monad transformers as an ugly hack to partially paper of the lack of compositionality of Monads (hence alternative attempts like extensible effects). Something that composes nicely has low overhead, does not introduce additional boilerplate or require making arbitrary additional choices (such as ordering). But Monad transformers do often involve non-trivial overhead, additional boilerplate (not sure how much things have progressed since mtl) and extraneous complexity.


> Monads are still programmable semi-colons, though

No, they’re not. That’s just an analogy that’s sometimes useful when describing them to imperative programmers to aid intuition.


Unix pipeline a category, with text be object, and executable be morphisim converting text to text(monoid). Forth is a category, with stack as object, functions as morphisim. There are many examples. Best learn from haskell.


I could never learn what pipeline is this way.


I watched the first part of his lectures on category theory. Keep in mind I'm the furthest thing from a mathematician you can find.

The feeling I'm getting when learning category theory is that if there was a formal theory for how to design programs. Category theory is it. Application is therefore not straightforward... You have to get really creative and think really hard to see the insights that category theory has to offer.

The notion of a isomorphism and a functor and lifting was directly applicable to programming after I learned about it. Here's what happened:

In postgresql there are specific library functions for dealing with specific types of json structures in the postgresql library. If your json blob doesn't fit the structured type parameter that the library function accepts you have to do some very awkward joins to convert the json into the right format which can cause massive slow downs.

I use the notion of two isomorphic objects and the opposing functors between them to convert the json into a string functor. Then I did string manipulations to convert the serialized json into a different form of serialized json then re-lifted the serialized json back into the json functor. The converted json type fit the parameter of a library json function and thus the result was a query that executed 10x faster then the other implimentation.

If I didn't know category theory the code would look absolutely crazy. I cast json to a string, replace a bunch of characters in the string then cast it back to json. The notion that a string can contain another type as a functor and the notion that string manipulations operations on serialized json have an isomorphic equivalent in "json space" was what allowed me to creatively come up with this optimization.

Note that the json equivalent morphisms I needed do not actually exist in the postgresql implimentation but because of the isomorphism that exists between stringified json and actual json I can build the json morphisms I need by composing two opposing functors and a string manipulation operation.

The thing about this though is that it's debatable whether or not you would actually need such "creativity" in a language that wasn't as terrible as SQL.


Check out Program Design by Calculation and The Algebra of Programming. Category theory and related formalisms do have a strong case to being a formal theory for designing/calculating programs

http://www4.di.uminho.pt/~jno/ps/pdbc.pdf https://themattchan.com/docs/algprog.pdf


Super interested in this and thanks for posting. Your first link is dead though. Do you have an alternative?



Yes, thanks for this. It appears that maybe the entire university's website has changed location? So maybe a more official link will come back up soon


The clearest and most concise treatment I've come across on how Category Theory is used is Tai-Danae Bradley's "What is Applied Category Theory": https://arxiv.org/abs/1809.05923 (It's a 50 page PDF)

Unfortunately the examples aren't within CS, but at least for me personally, seeing it clearly applied in any context was the most important thing. It may be a good pre-req for looking at (or conjuring up your own) more detailed applications within CS specifically.

Also, if you like that, Tai-Danae Bradley has more writing at: https://www.math3ma.com/


I'm familiar with the applications here, and I've never found them compelling. They amount to taking things we already understand perfectly well (chemical reactions, written in terms of differential equations or visualized in terms of graphs) and draping huge amounts of formalism around them. The "theorems" they then prove do not provide any new insight -- they're only called theorems because the excess formalism makes completely trivial things hard to see!

For example, the main theorem of one of Baez's very long papers is:

> There is a symmetric monoidal functor from the category Petri to the category Dynam.

When you plug in the hundred relevant definitions, you find that this is literally just the statement that you can specify a chemical reaction network in terms of differential equations, or pictorially in terms of graphs where the edges represent reactions, which is a standard thing taught in ordinary chemistry classes. The language of categories is not achieving anything here but obfuscation.

Category theory is only good when there's a very rich preexisting mathematical structure, so that introducing it simplifies things. That usually isn't true in science.


I think you may be missing something here.

> The "theorems" they then prove do not provide any new insight

Knowing that you can model reaction networks using categorical concepts may not provide new insights IF you have no deep familiarity with those concepts—but if you do, then knowing there is a "symmetric monoidal functor from the category Petri to the category Dynam" is a richly useful piece of information. It's probably more accurate to say that the amount of new insight conferred is proportional to your familiarity with the categorical concepts.

And yes, it's true the concepts are 'just' being translated into a new language—but if this new language is extremely general, applicable to a wide range of domains, then:

1) You can build a set of skills and knowledge related to concepts in this general language, which become automatically applicable to a wide range of domains (when their concepts are imported into the language).

2) It's possible to build 'machinery' (i.e. theorems or software) which deals with / operates on (representations of) concepts from this very general language—so once again, if you can translate some concept from domain X into e.g a symmetric monoidal category, now your general machinery can operate on things from domain X (which it was built with no awareness of).

Additionally, at least in Bradley's paper, the point wasn't to prove new things about e.g. reaction networks; instead it was to illustrate a method. To be more specific she is showing how "functorial semantics" can be applied in this context, using Petri nets as the syntax and dynamical systems as the semantics. From there the categorical classification of these objects gives us guarantees on their behavior under composition (the utility of which I assume is clear).

I'm not sure in the context of reaction networks whether modeling the system with compositionality guarantees and a clean mapping between graph representation and differential equations gives scientists something new or not—but the illustration of this method is done clearly, and the generality of the method is enormous.

And maybe I'm also missing something. I'm not a mathematician, but that's how the situation looks to me.


> Knowing that you can model reaction networks using categorical concepts may not provide new insights IF you have no deep familiarity with those concepts—but if you do, then knowing there is a "symmetric monoidal functor from the category Petri to the category Dynam" is a richly useful piece of information.

Look, I completely agree with the general point. I love new ways of thinking about things. But blindly applying extremely complicated tools to domains where their power isn't helpful is not good.

What is happening here is like taking a 2-line program that adds together two numbers and dressing it up into thousands of lines of OOP, until you have AdditionObjectFactoryGetterSetter objects being created by AdditionObjectFactoryGetterSetterFactory objects. Then you can bring in powerful OOP tools, but the only things they help you with are understanding the excess complication you added in. Maybe the sheer scale of that structure is beautiful to someone, but we're discussing whether scientists should bother learning it.

> I'm not sure in the context of reaction networks whether modeling the system with compositionality guarantees and a clean mapping between graph representation and differential equations gives scientists something new or not

And I'm telling you, it really doesn't! There simply is not a rich enough formal structure present in chemical reaction networks for category theory to yield anything useful, like it did in parts of pure math. Adding a complicated structure on top of something simple does nobody any good. All of the "guarantees on compositionality" were not useful, in the sense that they went without saying in freshman-level courses, or worse, were problems solely created by the introduction of category theory.

> the illustration of this method is done clearly, and the generality of the method is enormous.

The fact that the technique is extremely general does not improve things. If a technique makes science harder to do, you can't make it up on volume.


> What is happening here is like taking a 2-line program that adds together two numbers and dressing it up into thousands of lines of OOP, until you have AdditionObjectFactoryGetterSetter objects being created by AdditionObjectFactoryGetterSetterFactory objects. Then you can bring in powerful OOP tools, but the only things they help you with are understanding the excess complication you added in. Maybe the sheer scale of that structure is beautiful to someone, but we're discussing whether scientists should bother learning it.

In my opinion, this severely misrepresents what's happening. Your example is simply adding extrinsic, incidental complexity and this is not what the categorical treatment of this is doing, in my opinion. The additional complexity introduced by the CT machinery is not a lot and most of the complexity is intrinsic to the problem.


Argh, I wrote this in an editor and accidentally posted only a partial response. I cannot edit anymore so I'm responding with the full comment:

> What is happening here is like taking a 2-line program that adds together two numbers and dressing it up into thousands of lines of OOP, until you have AdditionObjectFactoryGetterSetter objects being created by AdditionObjectFactoryGetterSetterFactory objects. Then you can bring in powerful OOP tools, but the only things they help you with are understanding the excess complication you added in. Maybe the sheer scale of that structure is beautiful to someone, but we're discussing whether scientists should bother learning it.

In my opinion, this severely misrepresents what's happening. Your example is simply adding extrinsic, incidental complexity and this is not what the categorical treatment of this is doing, in my opinion. The additional complexity introduced by the CT machinery is not a lot and most of the complexity is intrinsic to the problem.

What this does is to add rigor to something that you (or a chemist) may understand intuitively. To a person with a working, intuitive knowledge about a subject, that may seem a bit pointless, but it is nevertheless useful since an intuitive understanding might overlook reasoning holes or edge cases. Building upon this result may yield further useful insights.

Granted, this really may not be useful to a working chemist. The cost/benefit ratio of a chemist doing chemistry is more favourable than investing a lot of time in CT, hoping for a large breakthrough.


CT provides a language that allows us to talk about things without description of their inner workings. That allows us to see connections between seemingly different topics, which we wouldn't likely find within the domain specific language (DSL). DSLs have their purpose, but so has a meta language connecting DSLs.


General relativity doesn’t help anyone throw a basketball better, but that’s not what it’s for.

Category theory is rarely going to be the best way to do any kind of science, but it does nonetheless allow one to make statements which are true across many domains, without getting bogged down in the details of any particular subject.


> General relativity doesn’t help anyone throw a basketball better, but that’s not what it’s for.

Yes, of course! And as a physicist, I don't ask basketball players to use general relativity.

But I'm responding to a post claiming the category theory really is useful, directly, for science. And just like the basketball player, I'm saying that it really isn't.


I liked “An Invitation to Applied Category Theory“, pdf is available here: [1]

It is written in a more terse, mathematical language, but it is very clear and friendly for non-mathematicians. The authors provide interesting examples of applicability. Hardcover is quite expensive, but the quality is great [2].

[1] http://math.mit.edu/~dspivak/teaching/sp18/7Sketches.pdf

[2] https://www.amazon.com/gp/aw/d/1108482295


If you're interested in going through "Seven Sketches", John Baez taught an online category theory course in 2018, with "Seven Sketches" as the text. Check out past lectures (and student discussion) here:

https://forum.azimuthproject.org/categories/applied-category...


This is awesome, holy crap, thank you so much.

For anyone else wanting some video, here's a link to youtube channel for some of the lectures: https://www.youtube.com/channel/UCUQlS-R4O094jP0sGHDnrjA


I've quite literally spent the last month trying to read, digest, and apply the above to a record layer I'm writing for FoundationDB. Dr. Spivak's work has been absolutely beyond helpful and insightful. There's no way I'd even be able to approach the subject with his efforts.

I'd also recommend these as they're associated with the 7 Sketches and help fill in some blanks:

Functorial Data Migration - https://arxiv.org/pdf/1009.1166.pdf

Category Theory as a Unifying Database Formalism - https://arxiv.org/pdf/1009.1166.pdf

Database Queries and Constraints via Lifting Problems - http://math.mit.edu/~dspivak/informatics/LiftingProblems.pdf

Additionally, there's also https://www.categoricaldata.net/ which helps provide more actualized implementation details of "FQL" (Functional Query Language) which he mentions in his papers.

For anyone else looking to get started with Category Theory, I'd also highly recommend choosing a functional library (i.e. something with Applicative, Functor, Category, Monad, Semigroup, Semgroupoid, Comonad, etc) in whatever language you're most familiar with and then _re-implement_ the functions to learn what it's doing (and what properties each class of item must conform to). In Elixir, I've been using Witchcraft... github.com/witchcrafters/witchcraft

For me, it has helped a great deal by forcing me to think about things like associativity, identity, and composition which I normally... didn't really do. It lends some nice patterns and naming conventions, even if you don't adhere to them perfectly. After a month of category theory, and only a crude/rudimentary understanding of it, has been enormously beneficial. My refactor of my FDB record layer after category theory has less code, works more correctly, and has more features. It's crazy...

Category theory gets 10/10 from me.

P.S. You'll probably, at some point, want to learn a little Haskell to help elucidate things in a "pure" context. It also helps as most every functional library is modeled after Haskell's standard library (? I'm still new, heh). To get started, this will save you time, as it seems most things in the Haskell world now-a-days be using this to setup/manage applications: https://docs.haskellstack.org/en/stable/README/


Since you sound like you're from the category theory community you may already know this, but my database library Opaleye is directly based on David Spivak's ideas.

https://hackage.haskell.org/package/opaleye


I'm totally new to the community! So I have no idea about almost anything I didn't link to or isn't linked here, ha.

That is a freakin' awesome link, thank you so much! Though, it'll probably mean yet another rewrite of my record layer :) (at learning is fun)


Feel free to email me if you have any questions around Haskell/category theory/databases. My email address is linked in the Opaleye README.

(BTW I meant "sounds like you're from the Haskell community" ... thinko)


Thank you for sharing your experience!

I think of Category theory and Abstract algebra as great upgrade kits for brain. If, say, a software engineer manages to learn, understand, and internalise those fields up to some level, she will be able to see her daily work - algorithms, architecture, approaches to solve problems - through the mental models those areas of mathematics imprinted into her brain, think and express solutions in those categories. May feel a bit like a superpower :)


You linked to the 1166.pdf twice, did you mean to link to http://math.mit.edu/~dspivak/informatics/notes/unorganized/P... for "Category Theory as Unifying Database Formalism"?


Totally! Thank you for providing the correct link, I had to step out for dinner and didnt see the screw up until now.


Thanks for this, super interesting. Your second link is repeated though, I assume it should be this: https://math.mit.edu/~dspivak/informatics/notes/unorganized/...


Another interesting one about databases and category theory: Relational algebra by way of adjunctions: https://dl.acm.org/citation.cfm?id=3236781


is the source code of the record layer available somewhere?


It will be in the next few days, I've been in a long sprint to get everything I've been working on done (and my personal website online); so, I can have an online portfolio and apply for jobs again (yay, ~f~unemployment).

It's totally not compatible with Apple's record layer I don't use protobufs at all (though there's a good case to be made for using them even despite the article from today's frontpage, heh).


This is probably one of the best introduction to category theory text out there now, but it isn’t exactly as friendly to the non-mathematicians as advertised. The book assumes a level of mathematical sophistication beyond what most non-math majors know and ramps up fast. The average programmer would probably get lost pretty quickly with this book.


>but it isn’t exactly as friendly to the non-mathematicians as advertised.

i really chafe at criticisms like this given how much effort was clearly invested by those authors in that book being readable - there are ample definitions (and clearly offset), tons of diagrams, instructive examples, solutions to many exercises, pointers to further readings, motivation and discussion.

is the thing that bothers you that there's formal mathematics? it's a math book. i feel like your same criticism translated into a criticism about a programming book would be something like "this book is too unfriendly to people that don't program because it has lots of code and expects people to be able to read code".

i don't understand what exactly would qualify for you as "friendly"?

>category theory for the average programmer.

that's because despite what people would have you believe - category theory isn't for programmers but mathematicians. so a book like that would be like a book about fluid dynamics for plumbers (read: impossible).


>is the thing that bothers you that there's formal mathematics

No. I like formal mathematics.

The audience of this book includes “motivated high school student who hasn’t seen calculus yet“.

IMO the book fails here. The average high schooler won’t understand this book on their own. This book is excellent if you already have an understanding of abstract algebra or higher level mathematics though, which is why I said it’s probably one of the best introduction to Category theory text out there. The topics covered in this book are great. This is a really outstanding book.


compare

>The average high schooler

with

>motivated high school student

certainly we all underestimate the difficulty of material (obverse of the dunning-krueger effect) but it says it does at least hint at it right there.

but honestly i'm curious where in the book you see an example of a term or notion that isn't defined in situ? as usual (with all of these things) mathematical maturity is what's required rather than familiarity with the material. what this means is a sense for why definitions are written in the way that they are, which hypotheses in theorems capture the phenomenon and which are technical, which lemmata can be ignored on first reading, etc. this is a very hard thing to communicate, especially in text, but this book does do its best to make that not-overwhelming as possible with ample restatements of definitions and assumptions and diagrams.


> "this book is too unfriendly to people that don't program because it has lots of code and expects people to be able to read code"

That would be a perfectly valid response to the claim that the book "is very clear and friendly for non-coders". I don't understand why you are chafed.


what kind of book that teaches programming could possibly have no code in it? that's not just "friendly" that's completely completely lacking substance.


I imagine everyone agrees with you on that. You responded to an objection to a characterization of the book, not a criticism of the book for failing to match that characterization nor a claim that a book on the topic should be able to match it.


you're not getting me

>That would be a perfectly valid response to the claim that the book "is very clear and friendly for non-coders".

no it wouldn't because it would be vacuously misplaced because there is no such book that aims to teach programming (friendly or not) and doesn't have code.


I am absolutely getting you. The person you responded to did not make the claim that such a book exists. He challenged the idea that the book in question is an example of such a book.


Is it possible to write such a book? I've served as the (non-mathematician) outside chair on committees for category theory dissertations, and the mathematicians themselves talk about whether category theory is too abstract. I can't really tell, of course, because my main role is to not fall asleep during the defense of any math dissertation.


Thanks for the feedback.

I’d like to elaborate on “friendly to non-mathematician”. The book is written for non-mathematicians with the focus on applications, and, like throwlaplace noticed in their comment, they put a lot of effort to explain the subject for the target auditory: diagrams, pictures, explanations, few proofs.

But, after all, it is a maths book on a highly abstract topic, therefore the reader should have some previous exposure to maths and mathematical texts, and expect to struggle, from the very first pages.

I think it is impossible to teach category theory to a person who does not have some basic knowledge in undergraduate maths, better in abstract algebra.


>The audience for this book is quite diverse: anyone who finds the above description intriguing. This could include a motivated high school student who hasn’t seen calculus yet but has loved reading a weird book on mathematical logic they found at the library. Or a machine-learning researcher who wants to understand what vector spaces, design theory, and dynamical systems could possibly have in common. Or a pure mathematician who wants to imagine what sorts of applications their work might have. Or a recently-retired programmer who’s always had an eerie feeling that category theory is what they’ve been looking for to tie it all together, but who’s found the usual books on the subject impenetrable.

Gives the idea this text should be accessible to those that haven’t seen calculus yet, presumably that would also mean those that haven’t had abstract algebra either. But in reality this book is more advanced than what the average retired programmer or average high schooler could understand on their own. It’s better suited for those that already are familiar with more advanced mathematics. I am not knocking the book. As mentioned, it is probably one of the best introductory books on Category theory currently out there.



Also released today, "Notes on Category Theory with examples from basic mathematics" by Paolo Perrone. (preprint state). arXiv:1912.10642v1


Nice find!

Abstract:

> These notes were originally developed as lecture notes for a category theory course. They should be well-suited to anyone that wants to learn category theory from scratch and has a scientific mind. There is no need to know advanced mathematics, nor any of the disciplines where category theory is traditionally applied, such as algebraic geometry or theoretical computer science. The only knowledge that is assumed from the reader is linear algebra. All concepts are explained by giving concrete examples from different, non-specialized areas of mathematics (such as basic group theory, graph theory, and probability). Not every example is helpful for every reader, but hopefully every reader can find at least one helpful example per concept. The reader is encouraged to read all the examples, this way they may even learn something new about a different field.

> Particular emphasis is given to the Yoneda lemma and its significance, with both intuitive explanations, detailed proofs, and specific examples. Another common theme in these notes is the relationship between categories and directed multigraphs, which is treated in detail. From the applied point of view, this shows why categorical thinking can help whenever some process is taking place on a graph. Form the pure math point of view, this can be seen as the 1-dimensional first step into the theory of simplicial sets. Finally, monads and comonads are treated on an equal footing, differently to most literature in which comonads are often overlooked as "just the dual to monads". Theorems, interpretations and concrete examples are given for monads as well as for comonads.

pdf: https://arxiv.org/pdf/1912.10642


Is there any point in learning category theory without first learning the things that it generalizes? I didn't see any value in learning about geometric algebra until I had learned enough about complex analysis to appreciate what it would mean to connect it with vector calculus.


I think category theory generalizes pretty much everything, so that may be tough.


The only thing that can generalize everything is something with no structure of its own.


Started reading this and was disappointed to find the same disingenuous claims about strong static type systems.

I say disingenuous because the people that espouse these beliefs seem to have a minimum modicum of intelligence and do or would certainly (after a moment of reflection) certainly realize their fallacy:

> The only serious argument I hear against strong static type check- ing is that it might eliminate some programs that are semantically cor- rect. In practice, this happens extremely rarely and, in any case, every language provides some kind of a backdoor to bypass the type sys- tem when that’s really necessary.

This is patently false. Port a correct dynamically typed program to Haskell and it will certainly fail until you satisfy the type checker through no small extraneous effort. And this is not a theoretical matter; it is an enormous cost to the programmer that is not there in weaker or non- typed systems.

It is a certainty that type systems ask for far more than is needed to produce correctness. Anyone who tries to dodge this obvious and fundamental truth is operating off of a religious not rational commitment to defending their type system. Note that I am not saying the type system doesn't give you something back but it certainly comes with a inherent cost over alternatives.

> Another argument I hear a lot is that dealing with types imposes too much burden on the programmer. I could sympathize with this sen- timent after having to write a few declarations of iterators in C++ my- self, except that there is a technology called type inference that lets the compiler deduce most of the types

This is the most heinous and sad lie. Type inference changes nothing. The types are there even if you don't have to type them in and they are significantly constraining your set of accepted programs.

I've heard this response before from seemingly smart static typers. I just can't believe a smart person would not see the obvious emptiness of this claim. There is no way anyone would think that having to type characters is the exclusive cost of static typing that inference suddenly whisks away.

I want to buy into the strong static typist's promise of utopia but I have yet to meet a static typist enthusiast that doesn't parlay in these lies, wittingly or unwittingly.

Because of this I have a serious suspicion that there is nothing else to stand on and that strong static typing is simply a case of the Emperor's New Clothes. Prove me wrong.


> > The only serious argument I hear against strong static type check- ing is that it might eliminate some programs that are semantically cor- rect. In practice, this happens extremely rarely and, in any case, every language provides some kind of a backdoor to bypass the type sys- tem when that’s really necessary.

> This is patently false. Port a correct dynamically typed program to Haskell and it will certainly fail until you satisfy the type checker through no small extraneous effort. And this is not a theoretical matter; it is an enormous cost to the programmer that is not there in weaker or non- typed systems.

It seems like this disagreement should be easy to resolve. You are saying that it's hard to satisfy the type checker when porting a dynamically-typed program; OP says this happens very rarely. Therefore you can easily disprove OP's claim by providing some examples that happen commonly in practice. Provide them, please!

> There is no way anyone would think that having to type characters is the exclusive cost of static typing that inference suddenly whisks away.

OP is not claiming that the benefit of type inference is that it automatically "types characters".

> I have yet to meet a static typist enthusiast that doesn't parlay in these lies

I don't think it's necessary to bring a moral dimension into this discussion. Let's keep it technical please.


> It seems like this disagreement should be easy to resolve. You are saying that it's hard to satisfy the type checker when porting a dynamically-typed program; OP says this happens very rarely. Therefore you can easily disprove OP's claim by providing some examples that happen commonly in practice. Provide them, please!

Great, let's stay in a weakly typed language -- and give the smallest of examples. This is so easy to come by it should be obvious that extrapolating this up to real business systems greatly magnifies cost. Here's weakly typed Java:

   Function<String, Integer> countChars = str -> str.length();
That's a function that counts characters in a string. It compiles. Now let's drop the types:

   Function countChars = str -> str.length();
That is a correct definition of the same function but it doesn't compile. The compiler needs information to carry out its proof.

This extrapolates to infinite more examples and the cost in building actual business systems is profound.

Now carry this to a strongly typed system.

The general claim is that type inference will allow the second form to somehow magically pass the compiler.

Not true.

The type inference requires `length` to be formally declared before the compiler can pass the program. I.e., the compiler must be told about the types somewhere.

In a dynamically typed language I can late bind `length`; ie the compiler doesn't need to be "told" something that I, the programmer, have got fully covered by my own measures.

But the compiler doesn't know this. I've already done my proof of correctness (as I should). But because the compiler does not have context, I have to tell it something that has no innate value to me and what I'm delivering -- it only has value to the closed-room of the compiler. This is hostile to productivity and serves only my early commitment to having a type prover guard my every move.


I think your point about disingenuous arguments made by static typing proponents is mostly reasonable, but I don’t think those are the only arguments to be made for strong static typing.

Other benefits of strong static type systems are compiler-enforced documentation (through types) and ease of refactoring. Of course, working in a typed system does constrain you, but I would argue that the cost/benefit ratio is usually quite good. I don’t typically feel very constrained when writing a program in Haskell, compared to Python. I do generally stick to “boring” Python code that would have passed a type checker if Python had one. It does depend a little on the type of application. For writing a compiler, I prefer Haskell, but for doing data-munging and one-off scripts, I prefer Python.

However, arguing about the cost/benefit ratio of programming language features is often difficult, since the benefits and costs are hard to quantify. I do think that it isn’t reasonable to measure the cost of a type system by porting a program from a dynamic language to it, since doing large scale rewrites from a different language is not a typical task. It’s also hard to tell which parts of the porting process are “extraneous” and which parts aren’t. If I port a program from Python to Lisp, I will also have to do some significant extra thinking to make it work, but it isn’t because of static typing of the target language. Porting between different languages is just inherently difficult, and will always involve non-trivial “extraneous” effort. Maybe it’s more work when going to statically typed languages, but it’s not a reasonable measure in any case.


> Other benefits of strong static type systems are compiler-enforced documentation (through types) and ease of refactoring.

I do agree if "compiler-enforced" is your target then by definition you'll choose compiler enforcement.

For me the cost is too high. I'd rather solve my documentation needs through other far less costly means.

> ...ease of refactoring.

This is one of the lies espoused by the strong static typer. The claim here usually is that the compiler will tell you where you're broken through each step of the refactor.

The irony is that the effect is the opposite of what is claimed: requiring a refactor to be formally/machine-verified correct through each iteration is a huge drag on refactoring.

In a non-strongly typed language I can iteratively refactor and execute parts of my program even if a formal verified/strong-type prover would reject the program.

This equates to way better throughput/speed over the formal static verifier/typer.

Formal static verification may give you a host of benefits but in what most of us are doing Time is the most important resource by far. And formal verification/strong-type systems are downright hostile to Time. Which is especially offensive because the claims of stability can be readily achieved through much more efficient and agile techniques than formal verification.


Things are of course not so black-and-white. It is hard to discuss these things at a general or abstract level, since things are often different in practice than they are in theory. But my point was to show that there are actual reasonable arguments that you can make in favor of and against static typing, without being disingenuous. For example, you can argue that static types hurt more than they help during refactoring, but the claim is not so obvious that anyone who says otherwise is simply a static typing zealot.


People who like this may also be interested in the awesome applied category theory list https://github.com/statebox/awesome-applied-ct


And Conal Elliott's Compiling to Categories, which I've seen discussed here before http://conal.net/papers/compiling-to-categories/



Highly recommend the lecture series. He makes it accessible and fun


Seriously, second this. It was extremely helpful. I had both the background in programming and abstract algebra but it wasn't until this course that I was really on the path to understand what was going on with category theory.


Or direct[-ish] link to TFA's PDF: https://github.com/hmemcpy/milewski-ctfp-pdf/releases/downlo...

-ish: since it's doing something untoward involving AWS, but the link does resolve to a actual PDF file, unlike the original submission.


Worth it just for the illustrations, but this book is excellent, especially if you already know Haskell.


The book itself is constantly updated by the author and community, people add particular language flavors. Scala flavor[1] (along with the original Haskell one) is already part of the book, OCaml flavor[2] is being worked on and mostly finished.

[1] https://github.com/hmemcpy/milewski-ctfp-pdf/releases/tag/v1...

[2] https://github.com/hmemcpy/milewski-ctfp-pdf/pull/201


Question to all CS majors, that invested in learning Category Theory:

Did you have any concrete take aways from learning it?

( I have a math background, and struggle to find something concrete. Maybe I am just blind. )


The chain goes like this: Everyone's favorite programming language (Python, JS, ...) gets (much of) their concepts from Haskell and Haskell gets its concepts from category theory. Thus, by studying CT you enable yourself to see these concepts in the programming languages you use at work. This can make it easier for you to read, write and reason about programs, since you are provided with new (previously unknown) ways to think about it.


I believe my real analysis class made me a significantly better programmer and thinker in general, but I don't use anything specific from it (not yet, anyway).


Genuinely curious how real analysis made you a better programmer...? I've taken it and loved it, but my interest was more due to physics/engineering applications. I can't see a connection to programming.


I suspect it could be that real analysis truly makes you challenge your assumptions and forces you to build a habit of thinking through corner cases.

Other maths courses do this as well, but real analysis is particularly vivid for some people because of all the fun counter-examples you get to see in a well-taught course, such as a function that is everywhere continuous but nowhere differentiable.

This habit of thinking through corner cases is something I miss from a lot of (junior) programmers.


What does real analysis have to do with category theory?


Yeah. I couldn't have attempted learning about certain sorts of objects like Chu spaces, posets, or topoi without category theory. After having gone through the change and categorifying my thinking, I now find category-theoretic constructions everywhere in mathematics, and I find the mathematicians who refuse to examine category theory quite insufferable.

Edit: More: The connection between type theory, logic, and computation is not specious, but is part of the fabric of reality. I now can understand why chemistry -- being a linear logic -- is time-reversible, or why every DAG is a poset is a system of logic is a language of computation, or why a ring has two interlocking groups, or why social groups tend to include one big conversation and many side chats in a power-law distribution of sizes.


> I now find category-theoretic constructions everywhere in mathematics,

i think gp might have been asking about practical use in CS ( CS majors).


Yeah exactly. I am interested in practical applications for programmers.

I know a thing or two about the prevalance of categories in Mathematics (https://arxiv.org/abs/1012.3121).

I heared many people talk about Haskell, Monads and Types <=> Proof duality, but I never heard anyone saying: "I applied the Yoneda Lemma to build this API.", or "I read about Category Theory and now $CONCEPT_XYZ started making sense to me."


And CS is merely a subfield of maths. What should I say? Scott theory, domain theory, CPOs, Galois connections? The problem here is that, as usual, folks are trying to figure out what category theory is good for, but ignoring the entire point: The definitions are the important part. Once the definitions make sense, then the rest of the system makes sense. Category theory is all about the definitions, and moreover, all about coming to have the definitions at hand so that one can recognize categorical structures in practice.

Or, I guess, I could take a lazy route and just say "theorems for free, those are the practical use" and be done. But it's frustrating to hear folks over and over wonder what something is good for, when they clearly aren't reading and trying to find the goodness for themselves.


The problem for someone like myself is that I have taken some classes on basic abstract algebra in high school (groups, monoid, rings etc) and came out of it with some definitions and... Nothing else. I can probably still tell if some operation on some set forms a group or not, but that has never been a directly useful tool in what I learned further through high-school and university. It didn't help in linear algebra. It didn't help with calculus. It didn't help with control theory. It didn't help with numerical methods. Sure, I could go around and see if new operations I learned about formed, say, rings, and I'm sure a lot of them do, but that was never used in theorems or demonstrations in those fields (the way I was taught them, anyway).

It was only ever a set of definitions. And almost everything I've seen browsing on category theory seemed the same - like it's just definitions, new names for old concepts, with few theorems shown that elucidate the need for these new names.

For example, lists and optionals and Either are monads. IO is also a monad. The most I've seen make use of knowing this fact is some nice-ish syntax sugar that works for all 3 of these seemingly disparate concepts. But even then, the 'work' for enabling that sugar seems to be done by the simple properties of the monad, more than some theorem that had been proven on monads in general.

Are there some examples of common CS abstractions that are almost but not quite category theory constructs? Are there any simple examples where we get new proofs by recognizing that some old CS concept is, say, a monad?

I understand your point about theorems for free in principle, but I have never seen one in these kinds of discussions (at least, not one that wasn't also a definition, e.g. if X is a monad than Y must be a functor).


I very much like: "Sets for Mathematics", which I currently read and which finally clicks with me. I always found expositions of category theory tiring in that it bombarded me with trivialities and examples that were kind of obvious. Instead, "Sets for Mathematics" describes one particular and probably the most familiar and important category, the category of sets, and I very much like this fresh look at sets.


Here's a really great introduction to CT for the non-mathematically inclined:

Lawvere and Schanuel. "Conceptual mathematics: a first introduction to categories." Cambridge University Press, 2009.

The authors are two of the "founders" of the subject, so don't be fooled by the elementary appearance of this book.


There's a lecture series he did if anyone's interested: https://www.youtube.com/watch?v=I8LbkfSSR58

I must say that there is a slight Tommy Wiseau energy about the guy, but that's no bad thing :-)


I wish some subjects of mathematics could be summarized in a TLDR fashion. It would make it much more interesting as an introduction.

I remember that we used to write very small sheets of formulas and other memos to condense a class into a much shorter thing.

Simple.wikipedia.org seems to somewhat be that.

I have always liked math, but I can see why people don't like math or get interested in math. The literature is exhaustive, unnecessary long, and just too arcane.

I will always be frustrated with the image of mathematics and how laymen often say "math is not for me", but sometimes I can understand why.


Also by the same author: Applied Category Theory 2020

https://johncarlosbaez.wordpress.com/2019/12/23/applied-cate...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: