cross-posted from: https://programming.dev/post/37726760
- Guardrails can be bypassed: With prompt injection, ChatGPT agents can be manipulated into breaking built-in policies and solving CAPTCHAs.
- CAPTCHA defenses are weakening: The agent solved not only simple CAPTCHAs but also image-based ones - even adjusting its cursor to mimic human behavior.
- Enterprise risk is real: Attackers could reframe real controls as “fake” to bypass them, underscoring the need for context integrity, memory hygiene, and continuous red teaming.
I posted this elsewhere, but CAPTCHAs have always been used to train models, and have always had to improve themselves even before LLMs blew up. This article was posted from a site with an .ai tld, and seems to be doing the whole Sam Altman “I’m scared of AI, AGI is right around the corner! I certainly don’t have a vested interest in making you think it does more than it actually does”