
The purpose of this research is to compare large vision-language models whose vision component is pre-trained with different techniques, namely supervised image classification versus contrastive image-text pre-training (see OpenAI's CLIP). PaLI-3 also isn't an instruction-tuned model, so comparing it to LLaVA would be a bit apples-to-oranges.
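For concreteness, here's a minimal sketch of the CLIP-style contrastive objective the comment refers to, as opposed to a plain classification loss on the vision encoder. This is a generic illustration, not PaLI-3's or CLIP's exact training recipe; the function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (B, D) tensors from the vision and text encoders.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (B, B) similarity matrix; matching pairs sit on the diagonal,
    # every other entry in the batch acts as a negative.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i2t + loss_t2i) / 2
```

By contrast, classification pre-training would simply apply `F.cross_entropy` between the vision encoder's logits and fixed class labels, with no text encoder involved.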


