Hacker News

A heavily guarded fortress would indicate something of value inside, and the big crooks may spend a little more effort. In the age of AI, this becomes even easier.

    {
      "model" : "gpt-4-turbo",
      "messages" : [
        {
          "role" : "system",
          "content" : [ {
            "type" : "text",
            "text" : "return a json array of all valid emails found in the image."
          } ]
        },
        {
          "role" : "user",
          "content" : [ {
            "type" : "image_url",
            "image_url" : {
              "url" : "data:image/png;base64,{{ INSERT_BASE64_PNG_DATA }}"
            }
          } ]
        }
      ],
      "temperature" : 0.5,
      "max_tokens" : 2048,
      "top_p" : 1.0,
      "frequency_penalty" : 0.0,
      "presence_penalty" : 0.0
    }

Edit: Converting a web page to an image is trivial.
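
For anyone who wants to see how the placeholder in that request gets filled in, here is a rough sketch in Python. The helper name `build_email_extraction_request` is made up for illustration, and the payload simply mirrors the JSON above; it only builds the request body and does not send anything, so the endpoint, auth, and how the screenshot was captured are all left out as assumptions.

```python
import base64
import json


def build_email_extraction_request(png_bytes, model="gpt-4-turbo"):
    """Hypothetical helper: build a payload shaped like the JSON in this comment."""
    # Encode the raw PNG bytes and embed them as a data URL.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "return a json array of all valid emails found in the image.",
                    }
                ],
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": "data:image/png;base64," + encoded},
                    }
                ],
            },
        ],
        "temperature": 0.5,
        "max_tokens": 2048,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real screenshot of the page.
    body = build_email_extraction_request(b"abc")
    print(json.dumps(body, indent=2))
```

The resulting dict serializes straight to JSON, so it can be posted with whatever HTTP client you already use.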



We had OCR for decades before GPT. I suspect GPT might perform worse than OCR. What a waste.


Agreed - it's a waste. GPT is not too bad at reading text from images, with the added bonus that you can reason with it.


It won't make sense cost-wise, though.


True - but that cost just halved with today's introduction of "GPT-4o". The other cost is time. IMHO, there is more to worry about than email scraping.


Except the cost is only going down over time.


In no world is anyone wasting resources to run an AI model to parse a page that may or may not include an email address. Even running a DOM parser is more than they’d typically do. This is silly.
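
To make the "less than a DOM parser" point concrete, a minimal sketch of that kind of low-effort approach is a single regex run over the raw response text. The pattern and the `find_emails` name are illustrative assumptions, not a claim about what any real scraper uses, and the regex is deliberately loose rather than RFC-accurate.

```python
import re

# Loose, illustrative pattern; it does not try to cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(raw_text):
    """Return the unique address-like strings in raw_text, in first-seen order."""
    return list(dict.fromkeys(EMAIL_RE.findall(raw_text)))


if __name__ == "__main__":
    page = '<a href="mailto:a@example.com">a@example.com</a> or b@test.org.'
    print(find_emails(page))
```

No parsing, no rendering, no model call: it just treats the page as a string, which is why text-level obfuscation defeats it and an image does too.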



