IncreasePosts 3 months ago | on: Minifying HTML for GPT-4o: Remove all the HTML tag...
Isn't GPT-4o multimodal? Shouldn't I be able to just feed in an image of the rendered HTML, instead of doing work to strip tags out?
spencerchubb 3 months ago
It is theoretically possible, but the results and bandwidth would be worse. Sending an image that large would take a lot longer than sending text.
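A rough way to see the size gap is to compare the stripped text of a page against a base64-encoded screenshot of the same page, since base64 is what actually goes over the wire to the API. A minimal sketch, assuming a saved page.html and a full-page screenshot page.png (both hypothetical file names) and using BeautifulSoup as one possible tag stripper:

    import base64
    from pathlib import Path

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Hypothetical inputs: the raw page and a full-page screenshot of it.
    html = Path("page.html").read_text(encoding="utf-8")
    screenshot = Path("page.png").read_bytes()

    # Strip the tags, keeping only the text content.
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    text_size = len(text.encode("utf-8"))
    image_size = len(base64.b64encode(screenshot))  # size as sent in the request body

    print(f"stripped text:     {text_size:,} bytes")
    print(f"base64 screenshot: {image_size:,} bytes")

For a typical text-heavy page the encoded screenshot usually comes out far larger than the visible text alone.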
brookst 3 months ago
This. Images passed to LLMs are typically downsampled to something like 512x512, because that's perfectly good for feature extraction. Getting text out would require very large images so that the text stays readable.
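To make the downsampling concrete, here is a sketch of the tile-based token accounting OpenAI has documented for GPT-4o vision input. The constants (85 base tokens, 170 tokens per 512x512 tile, the 2048/768 resize steps) are assumptions taken from their public docs at the time and may have changed:

    import math

    def image_tokens(width: int, height: int, detail: str = "high") -> int:
        # Low-detail mode: the image is treated as a single small thumbnail.
        if detail == "low":
            return 85
        # High detail: fit within 2048x2048, then scale the shortest side to 768px.
        scale = min(1.0, 2048 / max(width, height))
        width, height = width * scale, height * scale
        scale = min(1.0, 768 / min(width, height))
        width, height = width * scale, height * scale
        # Cost is a flat base plus a per-tile charge for each 512x512 tile.
        tiles = math.ceil(width / 512) * math.ceil(height / 512)
        return 85 + 170 * tiles

    # A tall full-page screenshot (1280x10000, hypothetical) gets squeezed to
    # roughly 262x2048 by the resize steps, so body text becomes illegible.
    print(image_tokens(1280, 10000))   # ~765 tokens, but unreadable text
    print(image_tokens(1280, 720))     # ~1105 tokens for one readable viewport

Keeping the text legible means sending the page as many separate viewport-sized crops at roughly a thousand tokens each, which quickly exceeds the token count of the stripped text.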
tedsanders 3 months ago
Images are much less reliable than text, unfortunately.