
  SEEKING WORK
  Location: San Francisco
  Remote: remote/hybrid/in-office
  Willing to relocate: no
  Technologies: full stack engineer, JavaScript, node.js, Python, html/css, React and other modern web frameworks
  Résumé/CV: on request 
  Email: charles@geuis.com
  Github: https://github.com/geuis
  LinkedIn: https://linkedin.com/in/geuis


Hey Google. Here's a really stupid idea.

Knock it off.

Your core search result product has gotten increasingly worse and less reliable over at least the last 5 years. YouTube's search results are nearly unusable.

I can't imagine that many external customers are asking for the AI bullshit that's just being shovelwared into every Alphabet product now.

I just noticed a couple of days ago that the Gmail iOS app now does the same predictive completion that Copilot tries to do when I'm working. It's annoying as hell, and I can't find whether or how to turn it off.

Stop bullshitting around with ruining your products and get back to making money by making accessing information easier and more accurate.


Google: Hey geuis, our revenue is at a record, our stock value is at a record, our metrics are all at records. The execs making the decisions have just been paid millions in stock [1], making them staggeringly rich no matter what happens in the future. We can't hear you over the sound of green bills going BRRRRR.

[1]: https://www.businessinsider.com/alphabet-google-executive-pa...


Most accurate description of Google I have seen. YT search is so, so bad. Three relevant results followed by twelve "people also watched" results then back to the good results.

Although ChatGPT is a great product, I rely on it more and more not because it's improving, but because Google results are getting worse.

Yeah, I would still fact-check for complex, in-depth things... but for quick things where I'm knowledgeable enough that I can smell the hallucinations from a mile away, ChatGPT 100%.


Why? What's the point?

All you're doing is making it slightly more difficult for the people who want to contact you to do so.

OCR has been a thing for years.

Just put your email out there. That's what spam filters are for.

charles@geuis.com. There. Scrape it. Spam it. I don't care.

Edit:

Yes, thank you for signing me up for the DNC (already a member), some random Trump org, something about Scientology, and another random Christian-based website. Honestly, I'm kind of sad at the lack of originality, given the otherwise extremely ingenious community we have here.


But you just proved the point. You might not care about being signed up for some random Trump org, Scientology, or whatever, but other people do care. If you want to author a website that responsibly uses people's emails without subjecting them to unnecessary spam, it's worth taking these techniques (not necessarily this specific one) into consideration.

While OCR does exist, it's incredibly expensive compared to text scraping. The main way to combat spam is to make the cost of spamming more expensive than the benefit.


I love the concept of ED and spent many an hour exploring. But when you have a full-size galaxy, everything gets fairly repetitive after a while.

I had a lot of hope for KSP2 and everything that was on the roadmap. Nowhere near a galaxy scale, but much more realistic combined with having to build all of the infrastructure to reach the nearest star. Maybe one day we'll get a game that delivers without being shut down.


For those like me who were wondering, it seems that the game studio in charge of Kerbal Space Program 2 (KSP2 above) has announced it will be shut down [0]. This is a few days old.

[0] https://www.ign.com/articles/take-two-shutters-kerbal-space-...



A lot of larger LLMs have been trained on millions of pages of HTML. They can understand raw HTML structure and extract content from it. I've been having some success with this using Mixtral 8x7B.
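
Roughly the shape of what I'm doing (a minimal sketch; call_llm is a hypothetical stand-in for however you reach Mixtral or another model, and the prompt wording is just an assumption):

  import requests

  def call_llm(prompt: str) -> str:
      # Hypothetical helper: send the prompt to whatever model you host
      # (e.g. Mixtral 8x7B behind an OpenAI-compatible endpoint) and
      # return the completion text.
      raise NotImplementedError

  def extract_main_text(url: str) -> str:
      # Fetch the raw HTML and let the model do the "parsing".
      # (In practice you'd truncate or shrink the HTML to fit the context window.)
      html = requests.get(url, timeout=30).text
      prompt = (
          "Below is the raw HTML of a web page. Return only the main "
          "article text, with no markup and no commentary.\n\n" + html
      )
      return call_llm(prompt)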


It is potentially expensive, but here's a different take.

Instead of writing a bunch of selectors that break often, imagine just being able to write a paragraph telling the LLM to fetch the top 10 headlines and their links on a news site. Or to fetch the images, titles, and prices off a storefront?

It abstracts away a lot of manual fragile work.
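
Concretely, something like this (a sketch only; the model name and JSON-only prompt are assumptions, not any particular product's API):

  import json
  from openai import OpenAI

  client = OpenAI()  # any chat-completions-style client would do

  def scrape_with_instruction(html: str, instruction: str) -> list:
      # The "selector" is just a sentence; the model does the locating.
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # assumption: use whatever model you have access to
          messages=[{
              "role": "user",
              "content": f"{instruction}\n"
                         "Respond with a JSON array of objects only.\n\n"
                         f"HTML:\n{html}",
          }],
      )
      return json.loads(resp.choices[0].message.content)

  # e.g. scrape_with_instruction(front_page_html,
  #          "Fetch the top 10 headlines and their links on this page.")
  # or   scrape_with_instruction(store_html,
  #          "List the image URLs, titles, and prices of the products.")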


I get that and LLMs are expected to get better.

Today, would you build a scraper with current LLMs that randomly hallucinate? I wouldn't.

The idea of an LLM-powered scraper adapting the selectors every time the website owner updates the site is pretty cool.
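
A sketch of how that self-healing could work (the ask_llm_for_selector call is hypothetical; the point is the try-cached-selector-then-repair loop):

  from bs4 import BeautifulSoup

  SELECTOR_CACHE = {"headline": "h2.story-title a"}  # assumed starting selector

  def ask_llm_for_selector(html: str, description: str) -> str:
      # Hypothetical: ask the model to propose a CSS selector that matches
      # the described elements in this HTML, returned as a plain string.
      raise NotImplementedError

  def get_headlines(html: str) -> list[str]:
      soup = BeautifulSoup(html, "html.parser")
      nodes = soup.select(SELECTOR_CACHE["headline"])
      if not nodes:  # site changed; have the LLM repair the selector once
          SELECTOR_CACHE["headline"] = ask_llm_for_selector(
              html, "links to the top news headlines")
          nodes = soup.select(SELECTOR_CACHE["headline"])
      return [n.get_text(strip=True) for n in nodes]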


At my job we are scraping using LLMs, for a 10M sector of the company. GPT-4 Turbo has not hallucinated once in 1.5 million API requests. However, we use it to parse and interpret data from webpages, which is something you wouldn't be able to do with a regular scraper. Not well, at least.


Bold claim. Did you review all 1.5 million requests?


I guess the claim is based on statistical sampling at a reasonably high level, to be sure that if there were hallucinations you would catch them? Or is there something else you're doing?

Do you have any workflow tools etc. to find hallucinations? I've got a project in the backlog to build that kind of thing and would be interested in how you sort through good and bad results.


In this case we had 1.5 million ground truths for testing purposes. We have now run it over 10 million, but I didn't want to claim it had 0 hallucinations on those, as technically we can't say for sure. Considering the hallucination rate was 0% for the 1.5 million compared against ground truths, though, I'm fairly confident.
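
For what it's worth, the comparison itself can be as simple as this (a sketch; the field names and exact-match criterion are assumptions about what counts as a hallucination):

  def hallucination_rate(predictions: list[dict], ground_truths: list[dict],
                         fields=("title", "price")) -> float:
      # A row counts as hallucinated if the model's value for any checked
      # field disagrees with the known-correct record for the same page.
      bad = sum(
          any(pred.get(f) != truth.get(f) for f in fields)
          for pred, truth in zip(predictions, ground_truths)
      )
      return bad / len(ground_truths)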


How do you know that's true?


The 1.5 million was our test set. We had 1.5 million ground truths, and it didn't make up fake data for a single one.


That's not what I asked. I asked "How did you determine that it didn't make up/get information wrong for all 1.5m?"


I've written thousands of scrapers and trust me, they don't break often.


Me too, but also for adversaries that obfuscate and change their sites often to prevent scraping. It can happen, depending on what you are looking at.


Well-written scrapers should be able to cope with site changes.


https://github.com/Skyvern-AI/skyvern

This is pretty much what we're building at Skyvern. The only problem is that inference cost is still a little too high for scraping, but we expect that to change in the next year.


That's an interesting take. I've been experimenting with reducing the overall rendered HTML down to just structure and content, then using the LLM to extract content from that. It works quite well. But I think your approach might be more efficient and faster.
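
The reduction step can be pretty mechanical, e.g. (a sketch using BeautifulSoup; which tags and attributes are worth keeping is an assumption that depends on the sites you target):

  from bs4 import BeautifulSoup

  KEEP_ATTRS = {"href", "src", "alt"}  # assumption: enough for most extraction

  def shrink_html(html: str) -> str:
      soup = BeautifulSoup(html, "html.parser")
      for tag in soup(["script", "style", "svg", "noscript", "iframe"]):
          tag.decompose()  # drop non-content subtrees entirely
      for tag in soup.find_all(True):
          tag.attrs = {k: v for k, v in tag.attrs.items() if k in KEEP_ATTRS}
      return str(soup)  # much smaller prompt, same structure and text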


One fun mechanism I've been using for reducing HTML size is diffing (with some leniency) pages from the same domain to exclude common parts (i.e., headers/footers). That preprocessing can be useful for any parsing mechanism.
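
Something in that spirit (a sketch; treating any text block that shows up on most sampled pages as boilerplate is the "leniency" being assumed):

  from collections import Counter
  from bs4 import BeautifulSoup

  def text_blocks(html: str) -> list[str]:
      soup = BeautifulSoup(html, "html.parser")
      return [s.strip() for s in soup.stripped_strings]

  def strip_common_parts(pages: list[str], threshold: float = 0.8) -> list[list[str]]:
      # Blocks that appear on most pages of the domain are treated as
      # headers/footers/nav and dropped from every page.
      per_page = [text_blocks(p) for p in pages]
      counts = Counter(block for blocks in per_page for block in set(blocks))
      common = {b for b, c in counts.items() if c >= threshold * len(pages)}
      return [[b for b in blocks if b not in common] for blocks in per_page]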


@dang can we please update this title to actually reflect the source title: "Controversial paper claims satellite 'megaconstellations' like SpaceX's could weaken Earth's magnetic field and cause 'atmospheric stripping.' Should we be worried?"

Generally, no, this isn't possible. The planet's magnetic field is generated naturally by the movement of so much hot iron and other elements that it eclipses the total of most of the other freely available metals in the solar system.

The Earth isn't the biggest planet, but we have the biggest single conglomeration of metals outside of the gas giants. Combine Luna, Mars, and all of the known asteroids, and it still doesn't quite add up to a similar mass.

A few dozen tons of launched material isn't going to have any noticeable impact on our magnetic field.
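
Back-of-the-envelope, even being very generous about the total launched mass (the constellation figure below is a rough assumption of ~10,000 one-ton satellites; Earth's mass is the standard value):

  EARTH_MASS_KG = 5.97e24        # standard value
  CONSTELLATION_KG = 1.0e7       # assumption: ~10,000 satellites at ~1 ton each

  # Fraction of Earth's mass represented by the constellation: ~1.7e-18
  print(CONSTELLATION_KG / EARTH_MASS_KG)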


> @dang can we please update this title to actually reflect the source title: "Controversial paper claims satellite 'megaconstellations' like SpaceX's could weaken Earth's magnetic field and cause 'atmospheric stripping.' Should we be worried?"

I tried to keep the same name as the article, but had to shorten it due to HN's character restrictions. The shortening could be interpreted as sensationalizing, but it was purely intended to include all the relevant elements of the title while staying within the character limit. I tried, and this was the best I could do.

I was hoping that the word *could* would imply the speculative nature of the article.

