Convolutional-KANs (github.com/antoniotepsich)
76 points by AntonioTepsich 48 days ago | 6 comments



Hot on the heels of recent advances in Kolmogorov-Arnold Networks (KANs), we introduce our Convolutional Kolmogorov-Arnold Networks (Convolutional-KANs)!

This breakthrough extends the KAN architecture to convolutional layers, replacing the classic linear transformation of the convolution with a learnable non-linear activation at each pixel.
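For readers who want a concrete picture, here is a minimal sketch of that idea in PyTorch (the repo's framework). It is not the repository's implementation: each kernel position gets its own learnable univariate function, modelled here as a SiLU term plus a small Gaussian radial-basis expansion standing in for the B-splines of the KAN paper, and every class and parameter name below is my own.

    # Hedged sketch, not the repo's code: a single-channel KAN-style convolution
    # where each of the k*k kernel positions applies a learnable non-linearity
    # phi(x) = w * silu(x) + sum_k c_k * exp(-(x - t_k)^2) to its pixel, and the
    # responses are summed instead of being linearly weighted.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class KANConv2dSketch(nn.Module):
        def __init__(self, kernel_size=3, num_basis=8, grid_range=(-2.0, 2.0)):
            super().__init__()
            self.k = kernel_size
            # Fixed RBF grid shared by all kernel positions.
            self.register_buffer("centers", torch.linspace(*grid_range, num_basis))
            # One set of basis coefficients and one SiLU weight per kernel position.
            self.coeffs = nn.Parameter(0.1 * torch.randn(kernel_size ** 2, num_basis))
            self.silu_w = nn.Parameter(torch.ones(kernel_size ** 2))

        def forward(self, x):  # x: (B, 1, H, W); single channel for simplicity
            patches = F.unfold(x, self.k, padding=self.k // 2)            # (B, k*k, H*W)
            rbf = torch.exp(-(patches.unsqueeze(-1) - self.centers) ** 2) # (B, k*k, H*W, num_basis)
            phi = torch.einsum("bpln,pn->bpl", rbf, self.coeffs)
            phi = phi + self.silu_w.view(1, -1, 1) * F.silu(patches)
            out = phi.sum(dim=1, keepdim=True)                            # (B, 1, H*W)
            return out.view(x.shape[0], 1, x.shape[2], x.shape[3])

    if __name__ == "__main__":
        layer = KANConv2dSketch()
        print(layer(torch.randn(2, 1, 28, 28)).shape)  # torch.Size([2, 1, 28, 28])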

Our team has worked diligently to explore the potential of this novel architecture and has obtained promising preliminary results: Convolutional-KANs lose only 0.04 accuracy while using almost half the parameters of a common 2-layer CNN, and 7 times fewer parameters than a 4-layer CNN.

We invite the research and development community to explore our repository, experiment with Convolutional-KANs, and contribute to its evolution.

Explore the GitHub repository! https://github.com/AntonioTepsich/Convolutional-KANs


I appreciate your work, but could you also include the size of the model in bytes and information about inference speed in your tables? Seeing a bunch of very similar numbers just tells us that MNIST is not a very challenging benchmark for Convolutional KANs. One of the key reasons one would want to reduce a model's size is so that the model can fit on very small low-power accelerators such as Coral TPUs. SRAM is a precious resource and needs to be conserved.
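For what it's worth, those two columns are cheap to produce if the models are ordinary PyTorch nn.Modules (the repo is PyTorch-based). A rough sketch, with helper names that are mine rather than the repository's:

    # Model size in bytes (parameters plus buffers, at stored precision) and a
    # crude CPU latency estimate. Assumes a plain nn.Module; not repo code.
    import time
    import torch

    def size_in_bytes(model: torch.nn.Module) -> int:
        return (sum(p.numel() * p.element_size() for p in model.parameters())
                + sum(b.numel() * b.element_size() for b in model.buffers()))

    @torch.no_grad()
    def cpu_latency_ms(model: torch.nn.Module, example: torch.Tensor, iters: int = 100) -> float:
        model.eval()
        for _ in range(10):              # warm-up
            model(example)
        start = time.perf_counter()
        for _ in range(iters):
            model(example)
        return (time.perf_counter() - start) / iters * 1e3

Dividing size_in_bytes(model) by 1024 gives the KB figure that matters when budgeting SRAM on something like a Coral TPU.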


From the doc:

> At the moment we aren't seeing a significant improvement in the performance of the KAN Convolutional Networks compared to the traditional Convolutional Networks.

So, inference speed probably not improved.

As for model size in bytes, is there any reason to assume it's not directly related to parameter count? I didn't see any mention of pruning/quantization/other optimizations, so I'll naively go with KKAN being about 400 KB (for inference purposes).
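The back-of-the-envelope arithmetic behind that guess, assuming plain float32 weights and no quantization (the parameter count below is a round placeholder, not a number from the repo's tables):

    # ~100k float32 parameters at 4 bytes each is roughly 390 KB.
    params = 100_000               # placeholder, not from the repo
    bytes_per_param = 4            # float32
    print(params * bytes_per_param / 1024, "KB")   # ~390.6 KB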

Either way, probably still a bit too early to think about productionizing KANs - there are still a ton of unanswered questions. The biggest one for now is probably the fact that they are sloo-ooo-ooow to train (10x slower than MLPs).

On the upside, you can probably get quite a bit of traction if you publicly explore KANs on low-power accelerators; it seems like a very open topic :)


This was done quite quickly after the KAN paper. Nice work. It's especially exciting to think that there are novel and useful NN architectures.

It's interesting to think about how much faster KANs might develop into mature systems compared with the original perceptrons.


I wonder how well dataset distillation would work between NN <-> KAN. It could significantly reduce training time by seeding the initial KAN model, making more of these experiments feasible to conduct and enabling experiments on larger models.
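A rough sketch of what that seeding step could look like, assuming a small distilled dataset has already been produced by some dataset-distillation method run against a standard NN; kan_model, distilled_x and distilled_y are placeholders, not objects from the repository:

    # Pre-train ("seed") a KAN model on a tiny distilled dataset before
    # fine-tuning on the real data. Hypothetical sketch, not repo code.
    import torch
    import torch.nn.functional as F

    def seed_with_distilled_data(kan_model, distilled_x, distilled_y, steps=200, lr=1e-2):
        opt = torch.optim.Adam(kan_model.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.cross_entropy(kan_model(distilled_x), distilled_y)
            loss.backward()
            opt.step()
        return kan_model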


Nice! Should be called a "kanvolution", my 2¢.



