Conceptually it is the next step, but it’s not proven you can do it until someone has done it. Other rumors suggest that whatever GPT4 is, it’s not multimodal.
I don’t believe large models are great multimodal demonstrations either, insofar as being large just lets you memorize different modalities side by side without necessarily integrating them.