More

daakus · on Sept 27, 2023

It can! TheBloke is to thank for the incredibly quick turnaround.

https://github.com/ggerganov/llama.cpp/pull/3362

https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/tree/ma...

moffkalast · on Sept 27, 2023

Birds fly, sun shines, and TheBloke always delivers.

Though I can't figure out that prompt and with LLama2's template it's... weird. Responds half in Korean and does unnecessary numbering of paragraphs.

Just one big sigh towards those supposed efforts on prompt template standardization. Every single model just has to do something unique that breaks all compatibility but has never resulted in any performance gain.

daakus · on Sept 28, 2023

I used the prompt included in llama.cpp and it worked for me in English (for fun GK type questions):

MODEL=./models/mistral-7b-v0.1.Q5_K_M.gguf N_THREAD=16 ./examples/chat-13B.sh

aidenn0 · on Sept 27, 2023

I have yet to get any useful output out of the Q5_K_S version; haven't tried any others yet.

poser-boy · on Sept 28, 2023

Linked is the base model. What you want is the instruct model (also on TheBloke's profile), which has been trained on following instructions.

daakus · on Sept 28, 2023

I used mistral-7b-v0.1.Q5_K_M.gguf and it responded to basic questions.

turnsout · on Sept 27, 2023

Wow, awesome!

daakus · on March 1, 2023

Ah, I remember seeing this a long time ago. It seemed like something CL fans would enjoy, but to me it felt verbose. I guess this is my Clojure preference showing.

Regarding runtime macro-expansion - since Dak is written in JavaScript, it comes for free.

daakus · on March 1, 2023

Nice! My goal in Dak is to reach a point where macros can allow transforming hiccup like syntax to hyperapp or React like function calls, or original hiccup style optimized string concat, or lit-html style template string generation. I know I could use all these for different use cases.

daakus · on March 1, 2023

Thanks!

My fear taking this path is around performance. I've not done any profiling yet, and I'm hoping I don't regret taking this path when I get around to it.

Vanit · on March 3, 2023

Well you know what they call un-optimized code? Shipped :)

daakus · on March 1, 2023

I'm a huge fan of Clojure and have had a lot of fun building things with it. CLJS on the other hand has felt heavy to me, from a browser performance and dev tooling perspective. Clojure startup time always affected me more so with CLJS projects. I hope these two projects alter the landscape for the better!

Besides those aspects, Dak is different than these two specifically in that it tries to provide something closer to a minimal 1-to-1 language feature mapping to JavaScript as the base, with a goal of having essentially no runtime.

The clean room implementation has downsides - Squint and Cherry can reuse Clojure tooling like clj-kondo etc, which Dak cannot. On the other hand Dak is small, the transpiler is under 2k lines as I write this. It can run on virtually any modern JavaScript runtime (all browsers, node, deno, bun etc).

daakus · on Feb 28, 2023

Thanks for the feedback. I'll add some notes comparing it with other attempts like you suggested.

daakus · on March 22, 2021

Safety for me is confidence to use the thing. For me in my own code, but also others on my team that may work on this code.

I mostly have experience building things in GC languages. But with Rust I managed to safely use [1]:

- stack references in threads

- kept mmap references alive until threads finish work

- zero copy xml parsing (from mmaped data!)

- SSE/AVX enabled searching

The Rust language empowered me to do these things with a high degree of confidence. Not one segfault or core dump, just lots of compiler errors.

I played with Zig. Admittedly, the small ecosystem aspect is something all languages go thru, and it would be a better experience with a Zig specific libraries. But Zig doesn't empower library authors to make a large category of bugs impossible, and leaves it to documentation. This is like C, I don't have enough confidence in myself to use it.

Brilliant people are building powerful, safe-ish, reusable libraries in Rust. For mere mortals like me, this is Awesome.

[1]: https://gist.github.com/daaku/58557e2545612df8f40b13b66b7d3b...

burntsushi · on March 22, 2021

Hi, author of the aho-corasick crate here. Your use of it piqued my interest and caused me to take a closer look.

I believe your use of `unsafe` on this line is unsound: https://gist.github.com/daaku/58557e2545612df8f40b13b66b7d3b...

Namely, there is no guarantee that the bytes between `<page>` and `</page>` will be valid UTF-8. It may be the case that you only run this program with UTF-8 input, in which case, UB is never triggered. But it's worth pointing out here since there is nothing actually stopping your program from hitting UB.

Also, as long as you're bringing in the twoway crate, you might as well use it on lines 43 and 48 since you're just searching for a single needle.

daakus · on March 22, 2021

The bytes are assumed to be utf8 (I was using the safer `from_utf8` prior to confirming the data was utf8).

I brought in `twoway` when I couldn't find a way to `rfind` using `aho-corasick`. I'll switch the use over for consistency.

Thanks for the quick code review!

PS: Thanks for ripgrep too!

burntsushi · on March 22, 2021

Ah gotya. Yeah, I haven't added reverse searching to aho-corasick yet. Ran out of steam.

Either way, my point here is to be a counter-balance. To be fair, you did say, "But with Rust I managed to safely use." But the code you posted is technically unsound. It's not a huge deal if you know you'll always be feeding the program valid UTF-8. But it is worth mentioning here in this HN thread that is specifically comparing the safety properties of competing programming languages. :-)

daakus · on March 22, 2021

Correct and fair. Updated the code to remove the safety issue.

burntsushi · on March 22, 2021

Thank you. :-)

daakus · on June 7, 2020

Would be great to understand what communication with Google servers can be turned off via setting changes rather than code changes, and what cannot.

0xy · on June 7, 2020

Chrome sends X-Client-Data headers to DoubleClick and other Google-owned properties, which can be used for tracking purposes. There's no way to disable this behavior.

The header contains a "low entropy" random ID generated by Chrome upon installation. Coupled with other data, this can be used to track users even after clearing cookies and in private mode.

rossjudson · on June 7, 2020

There's a rather precise description of X-Client-Data at https://www.google.com/chrome/privacy/whitepaper.html#variat...

Note that you can reset at any time with the “--reset-variation-state” command line flag.

"Coupled with other data", anybody can track anything.

aspenmayer · on June 7, 2020

I use the GDPR definition for what “other data” means in an online data collection context. Even then, legal hoop-jumping causes those definitions to be gamed, to the detriment of user privacy, and to the boon of site operators and advertisers.

Damian George, Kento Reutimann, Aurelia Tamò-Larrieux, GDPR bypass by design? Transient processing of data under the GDPR, International Data Privacy Law, Volume 9, Issue 4, November 2019, Pages 285–298, https://doi.org/10.1093/idpl/ipz017

Michael Veale, Reuben Binns, Jef Ausloos, When data protection by design and data subject rights clash, International Data Privacy Law, Volume 8, Issue 2, May 2018, Pages 105–123, https://doi.org/10.1093/idpl/ipy002

Frederik J. Zuiderveen Borgesius, Singling out people without knowing their names – Behavioural targeting, pseudonymous data, and the new Data Protection Regulation, Computer Law & Security Review, Volume 32, Issue 2, 2016, Pages 256-271, https://doi.org/10.1016/j.clsr.2015.12.013

NiekvdMaas · on June 7, 2020

You can disable most of the telemetry with command line switches like --disable-background-networking and --disable-sync, but some things like field trials and doubleclick fingerprinting cannot be excluded in regular Chrome/Chromium AFAIK.

frank2 · on June 7, 2020

The flag --disable-background-networking might break some sites:

https://github.com/cypress-io/cypress/issues/1320

est31 · on June 7, 2020

Even basic things like auto-suggestions in the URL bar can't be turned off any more. A while ago there used to be an option for it but it was removed. So when you enter an URL it's automatically sent to Google as you type.

pvg · on June 7, 2020

The setting is 'Autocomplete searches and URLs', just type it in the settings search box. It's still there.

est31 · on June 7, 2020

Indeed it's still present. Thanks for pointing it out!

exikyut · on June 7, 2020

!!!

Scrambles to check settings

It's over in chrome://settings/syncSetup now.

daakus · on June 7, 2020

If the default is set to something else, say DuckDuckGo, it'll go there instead right?

cma · on June 7, 2020

At some point the new tab page stopped being replaceable something less distracting/compulsive like your own custom url. Your homepage can only apply at startup. It isn't something that should have to be an extension.

daakus · on Dec 20, 2018

Anyone have experience with making this work? I tried starting it on the server I have WireGuard running on and it fails to start because it also wants to bind to the UDP port WireGuard uses (even in server mode).

Additionally http://www.cs.columbia.edu/~lennox/udptunnel/ has a note saying:

UDPTunnel is designed to tunnel RTP-style traffic, in which applications send and receive UDP packets to and from the same port (or pair of ports). It does not support request/response-style traffic, in which a client request is sent from a transient port X to a well-known port Y, and the server's response is returned from port Y to port X.

Which from what I understand is exactly what WireGuard does.

daakus · on Feb 3, 2016

One aspect is that upgrading Go only requires upgrading it on the build infrastructure rather than a deployment of a new JVM. The "next build" will simply be a binary built with a new compiler version.