It's everywhere and it's the worst. I sometimes ponder whether the volume of protobuf bytes represented as base64-encoded protobuf in JSON exceeds that of actual protobuf bytes sent over the wires of the internet, and then I pour one out for myself.
Discord is very popular with skiddies and real criminal organizations alike. It's got pretty basic KYC controls in place, meaning essentially anyone with just an email can sign up. It can be accessed from behind VPNs without any issues, so effectively it doesn't matter that it's not e2e encrypted.
I feel that Discord the company probably lets it slide because:
1. Moderation at scale is incredibly difficult.
2. They work with law enforcement agencies to execute warrants and subpoenas.
I've been mistakenly banned from Discord before, and I know from experience that pretty much any low-level mod has a complete and readily accessible history of all of my posts across all servers, complete with timestamps and IP addresses.
I'm also pretty sure a phone number is required for sign-up.
I think your second point is the more likely explanation. Any other platform hosting this many communities dedicated to drugs, cybercrime, etc. would definitely have faced serious legal challenges. It seems much more likely that the feds find it a useful platform to keep around.
A mobile phone number is required for certain Discord servers (a setting available to the admins) but not for sign-up (maybe if you are using an IP in a suspicious/VPN range they force it now?). Otherwise they only require a valid email.
For Telegram, though, there isn't really a way around it; a phone number is required. There is/was a way to buy some TON crypto token instead to avoid this verification, but it became prohibitively expensive.
I still don't get how Discord can be secure; I suspect it can't. The fact that the forums are persistent, controlled by a third party, and accessed through a closed-source client means people on there can be compromised at any point incredibly easily, VPN or not.
Just something as simple as using a cookie or local storage can leave permanent traces behind, so all the access can be easily correlated.
I'm not even sure serious infosec measures exist to stop this, and if they do, someone is bound to slip up eventually; it only takes one slip to expose the whole chatroom.
I'm not a hacker but this sounds like failing Opsec 101, and people getting by just with sheer luck.
> It can be accessed from behind VPNs without any issues, so effectively it doesn't matter that it's not e2e encrypted.
How are these two things related? I thought the benefit of E2E encryption is that no one can decrypt your messages except the participants in the conversation; there are no keys anywhere on a server that an admin could use to decrypt the conversation. How would being behind a VPN negate that? Your traffic still has to go through Discord's servers, where a key is presumably stored if the information is encrypted at all.
This info seems very outdated. In my experience, creating a Discord account without SMS KYC is basically impossible even from a residential IP; they even block most (all?) SMS VoIP services.
The crux of the challenges AGI presents is barely mentioned in this blog, and only as a footnote:
> In particular, it does seem like the balance of power between capital and labor could easily get messed up, and this may require early intervention. We are open to strange-sounding ideas like giving some “compute budget” to enable everyone on Earth to use a lot of AI, but we can also see a lot of ways where just relentlessly driving the cost of intelligence as low as possible has the desired effect.
The primary challenge, in my opinion, is that access to AGI will dramatically accelerate wealth inequality. Driving costs lower will not magically make the less educated better able to educate themselves using AGI, particularly if they're already at risk or on the edge of economic uncertainty.
I want to know how people like sama are thinking about the economics of access to AGI in broader terms, not just as a footnote in a utopian fluff piece.
edit: I am an optimist when it comes to the applications of AI, but I have no doubt that we're in for a rough time as the world copes with the economic implications of those applications. Globally, the highest-paying jobs are in knowledge work, and we're on the verge (relatively speaking) of making that work go the way that blue-collar work did in the post-war United States. There are a lot of hard problems ahead, and it bothers me when people sweep them under the rug in the name of progress.
- Dev infra, observability database (OpenTelemetry spans)
- Logs of course contain chat data, because that's what inevitably happens with logging
The startling rocket-building prompt screenshot that was shared is meant to be shocking, of course, but it most probably was training data to prevent DeepSeek from completing such prompts, evidenced by the `"finish_reason":"stop"` included in the span attributes.
Still pretty bad obviously, and it could have easily led to further compromise, but I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it. Glad to see it was disclosed and patched quickly.
> but it most probably was training data to prevent DeepSeek from completing such prompts, evidenced by the `"finish_reason":"stop"` included in the span attributes
As I understand it, a finish reason of “stop” in API responses usually means the model ended its output normally. In any case, I don't see how training data could end up in production logs, nor why they'd want to prevent such data (a prompt you'd expect a normal user to write) from being responded to.
> [...] I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it.
Security researchers are often asked to not pursue findings further than confirming their existence. It can be unhelpful or mess things up accidentally. Since these researchers probably weren't invited to deeply test their systems, I think it's the polite way to go about it.
This mistake was totally amateur hour by DeepSeek, though. I'm not too into security stuff but if I were looking for something, the first thing I'd think to do is nmap the servers and see what's up with any interesting open ports. Wouldn't be surprised at all if others had found this too.
Seems that you're right! Also, not that I doubted they were using OpenAI, but web searches for `"finish_reason"` all point to the OpenAI docs. Personally, I wouldn't say it's a very common attribute to see in logs generally.
> Now that you've generated your first chat completion, let's break down the response object. We can see the finish_reason is stop which means the API returned the full chat completion generated by the model without running into any limits.
Regarding how training data ends up in logs: it's not that far-fetched to create a trace span to see how long prompts + replies take, and as such it makes sense to record attributes like the finish_reason for observability purposes. Including the message content itself is just amateurish, though, but common nonetheless.
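For illustration, here's a minimal sketch of what such a span might look like using the Go OpenTelemetry SDK. The attribute names and the `callModel` helper are made up for the example, not anything DeepSeek actually runs:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// callModel is a hypothetical stand-in for whatever client actually calls the LLM API.
func callModel(ctx context.Context, prompt string) (reply, finishReason string) {
	return "...", "stop"
}

func handlePrompt(ctx context.Context, prompt string) string {
	// One span per completion, so you can see how long prompt + reply takes.
	ctx, span := otel.Tracer("chat").Start(ctx, "chat.completion")
	defer span.End()

	reply, finishReason := callModel(ctx, prompt)

	// Recording finish_reason is reasonable observability...
	span.SetAttributes(attribute.String("gen_ai.finish_reason", finishReason))

	// ...recording the raw prompt and reply is the amateur part: it puts chat data in your logs.
	span.SetAttributes(
		attribute.String("gen_ai.prompt", prompt),
		attribute.String("gen_ai.completion", reply),
	)
	return reply
}

func main() {
	handlePrompt(context.Background(), "hello")
}
```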
The OpenAI API is basically the gold standard for all kinds of LLM companies and tools, both closed and open source, regardless of whether the underlying model is trained on OpenAI output or not.
Not just the gold standard, but also a de facto standard: most of the proprietary and OSS tools I've seen that let you configure LLMs only implement support for OpenAI-compatible endpoints.
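To make that concrete, here's a minimal sketch of what "OpenAI-compatible" means in practice: clients POST the same JSON shape, just to a different base URL. The localhost URL, model name, and token below are placeholders for whatever server (vLLM, llama.cpp, Ollama, etc.) you actually point it at:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The same request shape the OpenAI API uses; only the base URL and model name change.
	body := []byte(`{
		"model": "my-local-model",
		"messages": [{"role": "user", "content": "Say hello"}]
	}`)

	req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer not-a-real-key")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A compatible server answers with the familiar choices[0].message / finish_reason structure.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```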
Blame society. Businesses won't value security unless the fear of getting attacked is sufficiently strong and the losses significant. Otherwise why invest in it at all?
Definitely not just hardware exploits, though. Look at Heartbleed, for example; this has been going on a long time. Hardware exploits are just so much more widely applicable, hence the interest from researchers.
It also feels like people who are highly determined to build high-quality, secure software are not valued that much.
It is difficult to prove their effort. One security-related bug wipes out everything, even if it happens only once in 10 years in a million-line code base.
Having not looked at it deeply yet: why require building every time it's invoked? Is the idea to get it working and then add build caching later? Seems like a pretty big drawback (bigger than the go.mod pollution, for me). GitHub runners are slooooow, so build times matter to me.
`go tool` doesn't require a rebuild, but it does check that the tool is up to date (which requires doing at least a bit of work).
This is one of the main advantages of using `go tool` over the "hope that contributors have the right version installed" approach. As the version of the tool required by the project evolves, it continues to work.
Interestingly, when I was first working on the proposal, `go run` deliberately did not cache the built binary. That meant `go tool` was much faster, because it only had to do the check instead of re-running the `link` step. In Go 1.24 that was changed (both to support `go tool` and for some other work they are planning), so this particular advantage of `go tool` no longer applies.
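For anyone who hasn't tried it yet, the workflow looks roughly like this (Go 1.24+, using stringer purely as an example tool; any module path works):

```
$ go get -tool golang.org/x/tools/cmd/stringer
# go.mod now contains a pinned "tool golang.org/x/tools/cmd/stringer" directive

$ go tool stringer -type=Color
# builds (or reuses the cached build of) the pinned version, then runs it
```

Because the version lives in go.mod, every contributor and CI runner gets the same tool build without a separate install step.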
In my experience, my pain hasn't been from managing the migration files themselves.
My pain comes from needing to modify the schema in such a way that it requires a data migration: reprocessing data to some degree and managing the migration of that data in a sane way.
You can't just run a simple `INSERT INTO table SELECT * FROM old_table`, or anything like that because if the data is large, it takes forever and a failure in the middle could be fairly difficult to recover from.
So what I do is split the migration into time-based chunks, because nearly every table has a time component that is immutable after writing. What I really want is a migration tool that can figure out what that column is and what those chunks are, and then incrementally apply a data migration in batches, so that if one batch fails I can go in to investigate and know exactly how much progress the data migration has made.
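As a rough sketch of what I mean (assuming an immutable `created_at` column and day-sized chunks; the table names and driver are placeholders, and in practice you'd also record progress somewhere durable):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // placeholder driver; any database/sql driver works the same way
)

// migrate copies rows from old_table to new_table in day-sized, retryable chunks.
func migrate(db *sql.DB, from, to time.Time) error {
	const chunk = 24 * time.Hour
	for start := from; start.Before(to); start = start.Add(chunk) {
		end := start.Add(chunk)
		// Each chunk is a small unit of work: if it fails, the log tells you exactly
		// which time range to investigate and where to resume from.
		_, err := db.Exec(
			`INSERT INTO new_table SELECT * FROM old_table
			 WHERE created_at >= $1 AND created_at < $2`,
			start, end,
		)
		if err != nil {
			return err
		}
		log.Printf("migrated chunk %s .. %s", start.Format(time.DateOnly), end.Format(time.DateOnly))
	}
	return nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/mydb?sslmode=disable") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := migrate(db, time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC), time.Now().UTC()); err != nil {
		log.Fatal(err)
	}
}
```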
This makes a ton of sense! We've faced the same exact problem because whenever we need to add a new materialized table or column, we need to backfill data into it.
For materialized columns it's "easy" (not really, because you still need to monitor the backfill), since we can run something like `ALTER TABLE events UPDATE materialized_column = materialized_column WHERE 1`. Depending on the materialization, that can drive the load up on ClickHouse, because it still creates a lot of background mutations that we need to monitor; they can fail for all sorts of reasons (memory limit or disk space errors), in which case we need to jump in and fix things by hand.
For materialized tables it's a bit harder, because we need to write custom scripts to load data in day-wise (or month-wise, depending on the data size) chunks like you mentioned; a plain `INSERT INTO table SELECT * FROM another_table` will for sure run into memory limit errors once the data gets large enough.
I would love to think more about this to see if it's something that would make sense to handle within Houseplant.
Adam Langley is probably one of the most gifted teachers when it comes to explaining cryptography concepts. Very clear, concise, precise, and makes it simple enough for me to follow without getting my neurons all knotted up.
This is great! The best teams I've worked on have worked towards the following:
Pizza teams that own the whole stack, and, for the roles that don't need a full-time individual, specialists who come in and advise but also make it possible for the team to DIY the things they do.
The best examples of specialists are designers and security teams, as this talk highlights. They can build the tools and the means for other teams to self-service those needs, for example security teams implementing CI tools and designers building design frameworks that are easy to apply. Conversely, they are free to make changes themselves and, at the best organizations, are empowered to do so.
Everyone else in product development is a generalist, including the managers, and everyone is on-call. When everyone is on-call, far fewer alerts end up going off, because when there is an issue it's taken very seriously and remediated quickly in the following days and weeks.
I think GTM teams could also benefit from this same kind of process, but instead melding Marketing, Sales and Support roles and responsibilities.
My theory on why this wasn't more common in the past is that the work was too complex and specialized, and the tools and knowledge to do the job weren't as easy to acquire as they are today. LLMs have certainly leveled the playing field immensely in this area, and I'm truly excited to see the future of work myself.