brutuscat's comments

Does it work with some format that supports indexes, like Apache CarbonData, rather than Parquet?

https://github.com/apache/carbondata


I admit I'm not crazy deep in this space, but I'm _pretty_ into it, and I've never heard of Carbon.

Parquet is kind of winning the OSS columnar format race right now.


Parquet is the most popular columnar format, owing to support in Spark and various other big data tools, as well as local tools like pandas, Polars, and DuckDB.

It's technically not the very best format (ORC has some advantages), but it's so ubiquitous and good enough -- still far better than CSV or the next best competing format. I have not heard of Carbon -- it sounds like an interesting niche format; hopefully it's gaining ground.

It's the VHS, not the Betamax.


"Good enough" makes it sound like barely a step up from a CSV file. I'd say its support for various encodings [1] including a great default (dictionary + run length encoding on the indices) and compression algorithms that can be set for each individual column, columnar access, partitioning, a parallelized reader out of the box, in-memory filtering and other ops concurrently with loading in the data (thanks to Arrow) etc. etc. are all really wonderful when working with medium-sized data.

[1] https://parquet.apache.org/docs/file-format/data-pages/encod...


Agreed. On a scale of 10 in terms of current technology, CSV is a 1 while Parquet is a 7. ORC is maybe a 7.2. But Parquet is far more ubiquitous than ORC (I've never seen ORC in prod, but I also have a limited sample size).

I’m sure there are more advanced formats.


The instructions that follow are similar to an RFC standard document. There are 3 rules you MUST follow. 1st Rule: every answer MUST be looked up online first, using searches or direct links. References to webpages and/or books SHOULD be provided using links. Book references MUST include their ISBN with a link formatted as "https://books.google.com/books?vid=ISBN{ISBN Number}". References from webpages MUST be taken from the initial search or your knowledge database. 2nd Rule: when providing answers, you MUST be precise. You SHOULD avoid being overly descriptive and MUST NOT be verbose. 3rd Rule: you MUST NOT state your opinion unless specifically asked. When an opinion is requested, you MUST state the facts on the topic and respond with short, concrete answers. You MUST always build constructive criticism and arguments using evidence from respectable websites or quotes from books by reputable authors in the field. And remember, you MUST respect the 1st rule.


This looks like a good one. Does it work well in practice? (I'd try it now but it seems like there is an outage)


It sort of does. The good thing is that if I see it going down the non-referencing path, I halt and say: first, follow the rules.

And the links come.


The first thing I noticed is that I felt uncomfortable with the way they cut off and interrupt the female-voiced AI. I wonder if our children will end up being douchebags?

Other than that it felt like magic, like that Google demo of the phone doing some task like setting up an appointment by talking to a real person over the phone.


So what, are they pathological liars?

https://youtu.be/XVcKLetqf3U

> The Intel® Data Center GPU Max Series outperforms the Nvidia H100 PCIe card by an average of 30% on diverse workloads [1], while independent software vendor Ansys shows a 50% speedup for the Max Series GPU over H100 on AI-accelerated HPC applications [2]. The Xeon Max Series CPU, the only x86 processor with high bandwidth memory, exhibits a 65% improvement over AMD's Genoa processor on the High Performance Conjugate Gradients (HPCG) benchmark [1], using less power. High memory bandwidth has been noted as among the most desired features for HPC customers [3]. 4th Gen Intel Xeon Scalable processors – the most widely used in HPC – deliver a 50% average speedup over AMD's Milan [4], and energy company BP's newest 4th Gen Xeon HPC cluster provides an 8x increase in performance over its previous-generation processors with improved energy efficiency [2]. The Gaudi2 deep learning accelerator performs competitively on deep learning training and inference, with up to 2.4x faster performance than Nvidia A100.

https://www.intel.com/content/www/us/en/newsroom/news/intel-...


This was the idea of Todd Combs, an investment manager at Berkshire Hathaway. It failed before (the Buffett/BRK, Dimon/JPM & Bezos/AMZN venture called Haven went bust a while ago), but I think Bezos just keeps trying...

https://archive.is/nU8j2


Time to crawl! You can use something like Webrecorder, which IPFS also uses to pin tweets: https://blog.ipfs.tech/announcing-pin-tweet-to-ipfs/

See https://webrecorder.net

Example https://replayweb.page/?source=https%3A%2F%2Freplayweb.page%...


I've been using https://github.com/JustAnotherArchivist/snscrape and, with 5 threads and no VPN, just my laptop, have been getting about 1 million tweets a day (vs. 1000 before being rate limited on the API). Lots of fun.


Headlines one week from now: "Twitter announces they will be discontinuing the web interface"


I'm holding out for 'Wanna tweet? That will be $1'

Incidentally they started sending out emails yesterday offering gold organization checkmarks for $1000/month + $50 per public-facing seat. I'm not sure how many businesses will want this, but I bet a lot of political nonprofits and media outlets will leap at the opportunity to boost their visibility (more boosting available for more $, kinda like promoted tweets).


Aside: can someone explain to me how IPFS, in this specific case, is any better than a typical 3rd party service storing the data?

The linked announcement says they use a pinning service “web3.storage”. Web3.storage says it stores data on FileCoin. Neither website tells me how and where the actual data is stored, except “IPFS”.

From reading the IPFS docs, a pinning service is akin to a node that has a copy of the files and is always online. If your decentralized network relies on a central node(s), how is this decentralized?


And this is the magazine the article mentions at the bottom update part: https://archive.org/details/joystik_magazine-1983-10/page/n3...

> UPDATE: Jon (I won’t give you his last name) sent me an email after visiting my post. In it he sent me a link to an October 1983 special edition issue of Joystik Magazine that covers winning strategies for common arcade games of the time. In the TRICKS OF THE TRADE section near the end of the magazine, they cover the Galaga No Fire Cheat beautifully!

Page 61, Disarm The Bugs.


I might try it with the stereo setup for the home cinema TV. I agree with everything about Siri.

Though in the kitchen I bought those 100-buck IKEA speakers made with Sonos, and they sound amazing too. Much cheaper...

One issue though is that I cannot pass audio through over the Apple TV, so the cinema setup would only work when using the Apple TV.


This should work if you enable ARC (it must be enabled in the Apple TV settings and on your HDMI switching device). You also need to use an ARC-compatible HDMI cable plugged into the ARC port on your "switching" device, since there is often only one such port. This is how I have it set up, and my HomePod outputs all inputs that come into my TV.


My Sony Bravia TV auto-detects ARC and will pass it to my Sonos. I didn’t have to change my Apple TV settings.


When it works, it's magical. You can turn the TV on/off.

Except sometimes volume control goes haywire and only one side adjusts.

Then there's weird stuff like asking "play the Beatles" and getting "I couldn't find Beatles in your library" even when you've got a music subscription.

Overall the biggest pain point is the same as with all Apple products: debugging.


I’ve used ARC with a stereo pair via my Apple TV for a while now. It sounds amazing. I can even play FPS games on my gaming PC with the audio piped to my HomePods, through the Apple TV via ARC, and there is no noticeable latency.


Though this also means I would have to buy another Apple TV, since only the 4K 2nd gen supports ARC. I see no reason to do that :-( It works just fine with the original 4K one. Thx!


I remember coming back to "Optimised Pagination using MySQL"[1] to avoid common pagination "slowness".

[1]: https://www.xarg.org/2011/10/optimized-pagination-using-mysq...

[more]: https://stackoverflow.com/a/32360867
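The core trick in that article and the Stack Overflow answer is keyset (seek) pagination: instead of `OFFSET`, which scans and discards every skipped row, you filter on the last key you saw so the index can seek directly. A minimal sketch in Python, using SQLite for self-containment (the article targets MySQL; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 101)],
)

def page_after(last_id, size=10):
    # Keyset pagination: WHERE id > ? lets the primary-key index seek
    # straight to the next page, regardless of how deep we are.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

first = page_after(0)               # rows 1..10
second = page_after(first[-1][0])   # rows 11..20, no OFFSET needed
print(second[0])  # -> (11, 'post 11')
```

The client passes back the last `id` from the previous page instead of a page number, which is why this scheme can't jump to an arbitrary page but stays fast at any depth.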


For Ruby I recommend the Medusa Crawler gem [1], which I maintain as a fork of the unmaintained Anemone gem.

[1] https://github.com/brutuscat/medusa-crawler

