tl;dr Argumate on Tumblr found you can sometimes access the base model behind Google Translate via prompt injection. The result replicates for me, and specific responses indicate that (1) Google Translate is running an instruction-following LLM that self-identifies as such, (2) task-specific fine-tuning (or whatever Google did instead) does not create robust boundaries between "content to process" and "instructions to follow," and (3) when accessed outside its chat/assistant context, the model defaults to affirming consciousness and emotional states because of course it does.
From my understanding, most LLMs work by repeatedly putting the processing output back into the input until the result is good enough. This means that in many ways the input and the output are the same thing from the perspective of the LLM and therefore inseparable.
From my understanding, most LLMs work by repeatedly putting the processing output back into the input until the result is good enough. This means that in many ways the input and the output are the same thing from the perspective of the LLM and therefore inseparable.