
I thought those pans that are marketed as "granite" with a speckled pattern were just PTFE-coated aluminium. Is this something different?


Here's the same benchmark with np.matmul instead of native Python (on an M2 MBP):

    Python             4.216 GFLOPS
    Naive:             6.400 GFLOPS            1.52x faster than Python
    Vectorized:       22.232 GFLOPS            5.27x faster than Python
    Parallelized:     52.591 GFLOPS           12.47x faster than Python
    Tiled:            60.888 GFLOPS           14.44x faster than Python
    Unrolled:         62.514 GFLOPS           14.83x faster than Python
    Accumulated:     506.209 GFLOPS          120.07x faster than Python
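
For anyone curious, here's a minimal sketch of how the numpy side of such a measurement can be done. The matrix size and repeat count are my assumptions, not the article's actual harness:

    import time
    import numpy as np

    n = 128  # assumed matrix size, matching the article's benchmark
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    # warm up once, then time a batch of multiplications
    np.matmul(a, b)
    reps = 1000
    start = time.perf_counter()
    for _ in range(reps):
        np.matmul(a, b)
    elapsed = time.perf_counter() - start

    # multiplying two n x n matrices takes ~2*n^3 floating point ops
    gflops = 2 * n**3 * reps / elapsed / 1e9
    print(f"{gflops:.3f} GFLOPS")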


Does that use Apple Accelerate? Depending on the matrix size, that seems a bit low; even the M1 Pro can easily reach 2.2 TFLOPS.


What is your BLAS backend?
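
For anyone who wants to check their own install, numpy can report how it was built:

    import numpy as np

    # prints the BLAS/LAPACK configuration numpy was built with
    np.show_config()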


Yeah, this is confusing for me: I'm not an expert in numpy* but I had assumed that it would do most of those things - vectorize, unroll, etc. - either when compiled or through whatever backend it's using. I understand that numpy's routines are fixed and that Mojo might have more flexibility, but for straight-up matrix multiplication I'd be very surprised if it's really leaving that much performance on the table. Although I can appreciate that if it depends on which BLAS backend has been installed, that is a barrier to getting fast performance by default.

* For context, I do have some experience experimenting with the gcc/intel compiler options available for linear algebra, and even outside of BLAS, compiling with -O3 -ffast-math -funroll-loops etc. does a lot of that; for simple loops, as in matrix-vector multiplication, compilers can easily vectorize. I'm very curious if there is something I don't know about that would result in a speedup. See e.g. https://gist.github.com/rbitr/3b86154f78a0f0832e8bd171615236... for some basic playing around


Even OpenBLAS (the default iiuc) does all of that and more to optimize for different levels of the cache hierarchy: https://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf
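
For a flavor of the blocking idea, a toy sketch (the block size is an arbitrary assumption, and this only illustrates the loop structure; a real BLAS tunes blocks per cache level and adds packing and hand-written SIMD kernels):

    import numpy as np

    def tiled_matmul(a, b, block=64):
        # multiply in block x block tiles so each tile stays cache-resident;
        # the inner product here still calls BLAS, so this is structure only
        n = a.shape[0]  # assumes square matrices of matching size
        c = np.zeros((n, n), dtype=a.dtype)
        for i in range(0, n, block):
            for j in range(0, n, block):
                for k in range(0, n, block):
                    c[i:i+block, j:j+block] += (
                        a[i:i+block, k:k+block] @ b[k:k+block, j:j+block]
                    )
        return c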

I'm not sure where/how they'd be squeezing out more performance unless it's better compilation/compatibility with Apple Silicon intrinsics.

Edit: Is Mojo using more than one core? I'm not sure I understand their syntax and whether those are parallel constructs.

Edit2: Yeah, Mojo seems to be parallelizing, so the comparison really isn't fair. The np.__config__ posted elsewhere shows that OpenBLAS is only compiled with MAX_THREADS=3 support, and it's not clear what their OPENBLAS_NUM_THREADS/OMP_NUM_THREADS was set to at runtime.
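
One way to pin that down at runtime; threadpoolctl is a third-party package (pip install threadpoolctl), so consider this just a sketch:

    import os

    # must be set before numpy is imported to take effect
    os.environ["OPENBLAS_NUM_THREADS"] = "1"

    import numpy as np
    from threadpoolctl import threadpool_info

    # reports the native thread pools (BLAS, OpenMP) numpy is actually using
    print(threadpool_info())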


I'm not super familiar with Mac but I also notice that numpy here is using openblas64. I had thought the go-to was the Accelerate framework? Or is that part of it somehow? If so it would be interesting to see how that impacts performance. Of course it's all kind of an argument for something like Mojo that gives better performance out of the box. Also an argument for why Mojo would be way more interesting if it was open source.


Just whatever you get by default with pip install numpy... Changing the benchmark to run a 1024x1024x1024 matmul instead of a 128x128x128 one does speed up numpy significantly, though:

    Python           119.189 GFLOPS
    Naive:             6.275 GFLOPS            0.05x faster than Python
    Vectorized:       22.259 GFLOPS            0.19x faster than Python
    Parallelized:     50.258 GFLOPS            0.42x faster than Python
    Tiled:            59.692 GFLOPS            0.50x faster than Python
    Unrolled:         62.165 GFLOPS            0.52x faster than Python
    Accumulated:     565.240 GFLOPS            4.74x faster than Python
np.__config__:

    Build Dependencies:
      blas:
        detection method: pkgconfig
        found: true
        include directory: /opt/arm64-builds/include
        lib directory: /opt/arm64-builds/lib
        name: openblas64
        openblas configuration: USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
          NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3
        pc file directory: /usr/local/lib/pkgconfig
        version: 0.3.23.dev
      lapack:
        detection method: internal
        found: true
        include directory: unknown
        lib directory: unknown
        name: dep4364960240
        openblas configuration: unknown
        pc file directory: unknown
        version: 1.26.1
    Compilers:
      c:
        commands: cc
        linker: ld64
        name: clang
        version: 14.0.0
      c++:
        commands: c++
        linker: ld64
        name: clang
        version: 14.0.0
      cython:
        commands: cython
        linker: cython
        name: cython
        version: 3.0.3
    Machine Information:
      build:
        cpu: aarch64
        endian: little
        family: aarch64
        system: darwin
      host:
        cpu: aarch64
        endian: little
        family: aarch64
        system: darwin
    Python Information:
      path: /private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-27utctq_/cp310-macosx_arm64/build/venv/bin/python
      version: '3.10'
    SIMD Extensions:
      baseline:
      - NEON
      - NEON_FP16
      - NEON_VFPV4
      - ASIMD
      found:
      - ASIMDHP
      not found:
      - ASIMDFHM


If you are looking for improved performance, you will always go with NumPy + vectorization. That's what's important. So I don't know what the argument is here; am I missing something?


I did just that, it's 120x faster than numpy. See my comment to OP.




Haha! This is great! Maybe it could be less jarring if you put a low-pass filter on the angle changes.
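
Something like a one-pole low-pass (exponential smoothing) on the angle, sketched here with assumed names:

    import math

    def smooth_angle(current, target, alpha=0.1):
        # one-pole low-pass: move a fraction alpha toward the target each frame;
        # wrap the difference to (-pi, pi] so we always rotate the short way
        diff = math.atan2(math.sin(target - current), math.cos(target - current))
        return current + alpha * diff

Called once per frame: a larger alpha tracks faster, a smaller alpha smooths more.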


This is great, I'm going to switch to this from black. Being used to working in other languages I feel like I'm swimming in molasses when using Python.

It's funny that all good things for Python are not written in Python. Says a lot about the language.


It doesn't replace black, at least not yet. It sounds like autoformatting is planned.


Yeah, just realized that and was about to edit my comment. Found an issue tracking the black replacement bit: https://github.com/charliermarsh/ruff/issues/1904


SvelteKit's routing pattern is anything but clean. It uses filesystem-based routes where every file is called "+page.svelte" (or +page.server.js/ts for API-only routes).

For anything but a demo app with just a couple of routes it's a headache to navigate.
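
To illustrate, a hypothetical but typical SvelteKit tree ends up looking something like:

    src/routes/+page.svelte
    src/routes/+layout.svelte
    src/routes/about/+page.svelte
    src/routes/blog/+page.svelte
    src/routes/blog/[slug]/+page.svelte
    src/routes/blog/[slug]/+page.server.ts

Six files, four of them named +page.svelte.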

Edit: Another smell of the routing system, to me, is that you need to resort to complexity like this [1] to get anything but top-down inheritance. I love Svelte btw and have used it in many projects.

[1] https://kit.svelte.dev/docs/advanced-routing#advanced-layout...


Having built a rather large app using SvelteKit, I found the routing scheme to make lots of sense actually. You always know what code is located where, concepts transfer everywhere consistently, and concerns are nicely separated.


In theory it seems neat but in practice I'm here with 10 tabs all called "+page.svelte" or "+page.server.ts" and it completely breaks my workflow since I can't tell them apart or navigate with fuzzy name matching. How do you deal with that?


Generally you should try to get away from clicking around on file tabs and use your command palette, but one thing you can do, if you're using VS Code, is change the "Label format" to "short" in the settings.
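
If I remember right, that maps to this in settings.json:

    {
        // shows the parent folder name next to the file name on tabs
        "workbench.editor.labelFormat": "short"
    }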


As I've grown older, I've seen lots of improvements to software usability - it warms my heart to see the attitude of yesteryear's "you're not using it right, consider changing your lifestyle" still alive and well.


Why should you get away from clicking tabs, apart from having to deal with this issue?


Because it's much faster; that goes for most navigation in code editors. Try it! :)


I'm offended! I don't click around like some lowly peasant, I navigate by keyboard like God intended.


For anyone wishing for a VS Code fix, comment on this issue: https://github.com/microsoft/vscode/issues/41909#issuecommen...


It would be great if VSCode provided an API for the tab labels but the Svelte team really created the problem in the first place.

This whole file-based routing wasn't a great idea to begin with, as a general solution to routing. It works (up to a point) for a static site generator, but most SSGs provide a permalink setting so there's an escape hatch.

The whole thing gets even worse when you start introducing issues like having dozens if not hundreds of files with the same filename, weird characters in folder and file names, etc.


Try a JetBrains IDE - if two tabs with the same filename are open, it will automatically prepend the name of the containing folder (recursing upward until the names differ). Otherwise I wouldn't know how to stay sane, for the same reason.


This.

It might look complicated at first sight, but once you actually use it you really appreciate the way it's structured.


Well yeah you can get used to anything. Doesn't mean it's a good idea to begin with.


I completely agree.

I've been using Svelte happily for years, but I won't be using SvelteKit. In part because of the routing but also because it doesn't really solve much in the backend.

It's amazing that all the full-stack frameworks (Next, Nuxt, SvelteKit, Remix, Astro, etc.) are investing so much effort into reinventing the backend, and after years they still don't provide even basic backend functionality. For example, out of the box, Fastify gives you validation, sessions, CORS, cache headers, etc. Features that you need in probably every backend project.

I started this repo to figure out how to integrate Svelte with Fastify using Vite. It has hot reload, partial hydration, etc. It's very quick and dirty code, but it works.

https://github.com/PierBover/fastify-vite-svelte-template


It's only a headache if you make it one. With an open mind, it looks pretty neat to be honest. The top down inheritance doesn't look complex, just weird.


I agree with this comment. SvelteKit is unnecessarily complicated. The `+page.svelte` etc are just the start of it.

Plus I can't use it with Go, PHP, Ruby, Rust etc when it comes to SSR (without running multiple servers and handling deployment nightmares).

Something about this whole Node + SSR front-end (Next, Nuxt, SolidStart) is smelly. I love Svelte as a framework and a way of writing UI, but SvelteKit? Eh! Not so much.

SvelteKit is too much complexity for no reason. Goes opposite of what Svelte was meant to be: Simple and intuitive.


Curious to hear more, what specifically is complicated about it?

SvelteKit is a JavaScript framework, it makes sense that you can't use it with other languages. You can pair it with a backend of your choice of course, but to get the SSR benefits you do need to work within the framework.

There are other ways of using Svelte with other languages, I would take a look at something like Inertia.js [0].

[0] https://inertiajs.com


Hey Kevin! First of all, I am a regular listener to your podcast! Love it!

Now, to be fair, I am not specifically targeting SvelteKit, but the whole host of meta-frameworks like NextJS, NuxtJS etc. which Vercel is pushing.

These frameworks in general add too much complexity. SSR is hard, and the best way to handle all of this is to not have a server layer at all. That is, just abstract the rendering/routing part and leave the rest of the server stuff to the user.

I want to simply write my Fastify/Express/Go-gin/Django app. Then add SvelteKit as my front-end with SSR support.

Right now, I first write SvelteKit and then think about how I'm going to integrate Express or Fastify with it (for a moment let's leave out non-Node solutions).

Trust me, if you simply left the server out of SK and provided a simple API abstraction like `res.send(renderSKPath('/users/:id', { serverData }))`, it would have done the job.

I think it's difficult to express what I want to say, but in short: remove the server and keep SK as a rendering layer only.

P.S. I know that there is an express adapter for SK. But that is not the point of this comment at all.


You're asking for two things that seem largely incompatible. How do you expect to do SSR in a Go-gin or Django app? Svelte components get compiled to JavaScript and SvelteKit is written in JavaScript. Doing SSR in those frameworks would necessitate calling JavaScript from Go or Python and introduce far more complexity, if you could get it to work at all. The simplest options are either to run a Node server or turn off SSR, which you can do with one line in SvelteKit.
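
For reference, that one line is a page option, e.g. in +page.js:

    export const ssr = false;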


“SvelteKit is too complicated, it doesn’t even have this thing that would certainly make it more complicated!”


Looking forward to trying this. VSCode is great but I really miss the performance of Sublime Text. I hope they get the plugin system right; a killer feature would be if it could load VSCode plugins (incredibly hard to pull off, yes).


Thanks, almostdigital!

After our past experience with Atom, getting the plugin system right is a top priority for the editor.

The thought of cross-compatibility with VSCode plugins has definitely crossed our minds and it's not out of the question, although our current plan is to initially support plugins using WASM.



Automatic dark/light mode tracking sunset/sunrise is the best of both worlds IMHO.

