brutuscat's comments

Does it work with some format that supports indexes, like Apache CarbonData, rather than Parquet?

https://github.com/apache/carbondata


I admit I'm not crazy deep in this space, but I'm _pretty_ into it, and I've never heard of Carbon.

Parquet is kind of winning the OSS columnar format race right now.


Parquet is the most popular columnar format, owing to support in Spark and various other big data tools, as well as local tools like pandas, Polars, and DuckDB.

It's technically not the very best format (ORC has some advantages), but it's so ubiquitous and good enough -- still far better than CSV or the next best competing format. I have not heard of Carbon -- it sounds like an interesting niche format; hopefully it's gaining ground.

It's the VHS, not the Betamax.


"Good enough" makes it sound like barely a step up from a CSV file. I'd say its support for various encodings [1] including a great default (dictionary + run length encoding on the indices) and compression algorithms that can be set for each individual column, columnar access, partitioning, a parallelized reader out of the box, in-memory filtering and other ops concurrently with loading in the data (thanks to Arrow) etc. etc. are all really wonderful when working with medium-sized data.

[1] https://parquet.apache.org/docs/file-format/data-pages/encod...


Agreed. On a scale of 10 in terms of current technology, CSV is a 1 while Parquet is a 7. ORC is maybe a 7.2. But Parquet is far more ubiquitous than ORC (I've never seen ORC in prod, but I also have a limited sample size).

I’m sure there are more advanced formats.


The instructions that follow are similar to an RFC standard document. There are 3 rules you MUST follow. 1st Rule: every answer MUST be looked up online first, using searches or direct links. References to webpages and/or books SHOULD be provided using links. Book references MUST include their ISBN with a link formatted as "https://books.google.com/books?vid=ISBN{ISBN Number}". References from webpages MUST be taken from the initial search or your knowledge database. 2nd Rule: when providing answers, you MUST be precise. You SHOULD avoid being overly descriptive and MUST NOT be verbose. 3rd Rule: you MUST NOT state your opinion unless specifically asked. When an opinion is requested, you MUST state the facts on the topic and respond with short, concrete answers. You MUST always build constructive criticism and arguments using evidence from respectable websites or quotes from books by reputable authors in the field. And remember, you MUST respect the 1st rule.


This looks like a good one. Does it work well in practice? (I'd try it now but it seems like there is an outage)


It sort of does. The good thing is that if I see it going down the non-referencing path, I halt and say: first, follow the rules.

And the links come.


The first thing I noticed is that I felt uncomfortable with the way they cut off and interrupt the female-voiced AI. I wonder if our children will end up being douchebags?

Other than that it felt like magic, like that Google demo of the phone doing some task like setting up an appointment by talking to a real person over the phone.


So what, are they pathological liars?

https://youtu.be/XVcKLetqf3U

> The Intel® Data Center GPU Max Series outperforms the Nvidia H100 PCIe card by an average of 30% on diverse workloads [1], while independent software vendor Ansys shows a 50% speedup for the Max Series GPU over H100 on AI-accelerated HPC applications [2]. The Xeon Max Series CPU, the only x86 processor with high bandwidth memory, exhibits a 65% improvement over AMD's Genoa processor on the High Performance Conjugate Gradients (HPCG) benchmark [1], using less power. High memory bandwidth has been noted as among the most desired features for HPC customers [3]. 4th Gen Intel Xeon Scalable processors – the most widely used in HPC – deliver a 50% average speedup over AMD's Milan [4], and energy company BP's newest 4th Gen Xeon HPC cluster provides an 8x increase in performance over its previous-generation processors with improved energy efficiency [2]. The Gaudi2 deep learning accelerator performs competitively on deep learning training and inference, with up to 2.4x faster performance than Nvidia A100.

https://www.intel.com/content/www/us/en/newsroom/news/intel-...


This was the idea of Todd Combs, an investment manager at Berkshire Hathaway. It failed before (the Buffett/BRK, Dimon/JPM & Bezos/AMZN venture called Haven went bust a while ago), but I think Bezos just keeps trying...

https://archive.is/nU8j2


Time to crawl! You can use something like Webrecorder, which IPFS also uses to pin tweets: https://blog.ipfs.tech/announcing-pin-tweet-to-ipfs/

See https://webrecorder.net

Example https://replayweb.page/?source=https%3A%2F%2Freplayweb.page%...


I've been using https://github.com/JustAnotherArchivist/snscrape and, with 5 threads and no VPN, just my laptop, have been getting about 1 million tweets a day (vs. 1000 before being rate limited on the API). Lots of fun.


Headlines one week from now: "Twitter announces they will be discontinuing the web interface"


I'm holding out for 'Wanna tweet? That will be $1'

Incidentally they started sending out emails yesterday offering gold organization checkmarks for $1000/month + $50 per public-facing seat. I'm not sure how many businesses will want this, but I bet a lot of political nonprofits and media outlets will leap at the opportunity to boost their visibility (more boosting available for more $, kinda like promoted tweets).


Aside: can someone explain to me how IPFS, in this specific case, is any better than a typical 3rd party service storing the data?

The linked announcement says they use a pinning service “web3.storage”. Web3.storage says it stores data on FileCoin. Neither website tells me how and where the actual data is stored, except “IPFS”.

From reading the IPFS docs, a pinning service is akin to a node that has a copy of the files and is always online. If your decentralized network relies on a central node(s), how is this decentralized?


And this is the magazine the article mentions at the bottom update part: https://archive.org/details/joystik_magazine-1983-10/page/n3...

> UPDATE: Jon (I won’t give you his last name) sent me an email after visiting my post. In it he sent me a link to an October 1983 special edition issue of Joystik Magazine that covers winning strategies for common arcade games of the time. In the TRICKS OF THE TRADE section near the end of the magazine, they cover the Galaga No Fire Cheat beautifully!

Page 61, Disarm The Bugs.


I might try it with the stereo setup for the home cinema TV. I agree with everything about Siri.

Though in the kitchen I bought those 100-buck IKEA speakers made with Sonos, and they sound amazing too. Much cheaper...

One issue though is that I cannot pass audio through over the Apple TV, so the cinema setup would only work when using the Apple TV.


This should work if you enable ARC (it must be enabled in the Apple TV settings and on your HDMI switching device). You also need to use an ARC-compatible HDMI cable plugged into the ARC port on your "switching" device, since there is often only one such port. This is how I have it set up, and my HomePod outputs all inputs that come into my TV.


My Sony Bravia TV auto-detects ARC and will pass it to my Sonos. I didn’t have to change my Apple TV settings.


When it works, it's magical. You can turn the TV on/off.

Except sometimes volume control goes haywire and only one side adjusts.

Then there's weird stuff like asking "play the Beatles" and getting "I couldn't find Beatles in your library" even when you've got a music subscription.

Overall the biggest pain point is the same as with all Apple products: debugging.


I’ve used ARC with a stereo pair via my Apple TV for a while now. It sounds amazing. I can even play FPS games on my gaming PC with the audio piped to my HomePods, through the Apple TV via ARC, and there is no noticeable latency.


Though this also means I would have to buy another Apple TV, since only the 4K 2nd gen supports ARC. I see no reason to do that :-( It works just fine with the original 4K one. Thx!


I remember coming back to "Optimised Pagination using MySQL"[1] to avoid common pagination "slowness".

[1]: https://www.xarg.org/2011/10/optimized-pagination-using-mysq...

[more]: https://stackoverflow.com/a/32360867
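The core trick in that article and the Stack Overflow answer is keyset (seek) pagination: instead of `OFFSET`, which scans and discards every skipped row, you filter on the last key you saw so the index can seek directly. A minimal sketch in Python, using SQLite for self-containment (the article targets MySQL; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 101)],
)

def page_after(last_id, size=10):
    # Keyset pagination: WHERE id > ? lets the primary-key index seek
    # straight to the next page, regardless of how deep we are.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

first = page_after(0)               # rows 1..10
second = page_after(first[-1][0])   # rows 11..20, no OFFSET needed
print(second[0])  # -> (11, 'post 11')
```

The client passes back the last `id` from the previous page instead of a page number, which is why this scheme can't jump to an arbitrary page but stays fast at any depth.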


For Ruby I recommend the Medusa Crawler gem [1], which I maintain as a fork of the unmaintained Anemone gem.

[1] https://github.com/brutuscat/medusa-crawler

