Can an agent solve an audio CAPTCHA if we give it transcription tools?
Interested in agents and accessibility, I built an audio CAPTCHA generator with 100 letter-counting and math questions, such as "how many r's are in strawberry" or "what is 1+52", asked in audio form instead of shown as text.
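A minimal sketch of the question-generation half, assuming a small hard-coded word list and two-operand addition. The word list, class, and function names are hypothetical, and the text-to-speech step that would turn each prompt into an audio clip is deliberately not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    prompt: str  # text that a TTS step (not shown) would speak aloud
    answer: str  # expected answer as a digit string


# Hypothetical word list; any words work for letter-counting questions.
WORDS = ["strawberry", "banana", "mississippi", "accessibility", "transcription"]


def letter_count_challenge(rng: random.Random) -> Challenge:
    """'How many r's are in strawberry?' style question."""
    word = rng.choice(WORDS)
    # sorted() keeps the choice reproducible; set iteration order can vary between runs.
    letter = rng.choice(sorted(set(word)))
    return Challenge(f"How many {letter}'s are in {word}?", str(word.count(letter)))


def addition_challenge(rng: random.Random) -> Challenge:
    """'What is 1 plus 52?' style question."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return Challenge(f"What is {a} plus {b}?", str(a + b))


def generate_challenges(n: int = 100, seed: int = 0) -> list[Challenge]:
    """Mix both question types; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    makers = [letter_count_challenge, addition_challenge]
    return [rng.choice(makers)(rng) for _ in range(n)]


if __name__ == "__main__":
    for challenge in generate_challenges(5):
        print(challenge.prompt, "->", challenge.answer)
```

Keeping the expected answer next to the prompt means the same list can later score whatever the agent replies, and seeding the generator makes a 100-question run repeatable.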
I then tested whether GPT-4o could solve the CAPTCHA when given a transcription tool.
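One way such a test could be wired up, sketched under assumptions: the transcription capability is exposed to GPT-4o as a tool in the OpenAI Chat Completions function-calling format, and each final reply is scored by taking its last integer. The tool name `transcribe_audio`, the `solve` callable, and the scoring rule are hypothetical stand-ins, not the project's actual harness.

```python
import re
from typing import Callable, Optional

# Tool definition in the OpenAI Chat Completions function-calling format.
# The tool name and its single parameter are hypothetical; the handler that
# actually runs speech-to-text on the file is not shown here.
TRANSCRIBE_TOOL = {
    "type": "function",
    "function": {
        "name": "transcribe_audio",
        "description": "Transcribe a CAPTCHA audio clip and return the spoken text.",
        "parameters": {
            "type": "object",
            "properties": {
                "audio_path": {
                    "type": "string",
                    "description": "Path to the CAPTCHA audio file.",
                }
            },
            "required": ["audio_path"],
        },
    },
}


def extract_answer(reply: str) -> Optional[str]:
    """Assumed scoring heuristic: the last integer in the reply is the answer."""
    numbers = re.findall(r"-?\d+", reply)
    return numbers[-1] if numbers else None


def solve_rate(cases: list[tuple[str, str]], solve: Callable[[str], str]) -> float:
    """Fraction of (audio_path, expected_answer) cases answered correctly.

    `solve` is a hypothetical wrapper around the agent loop: it sends the audio
    path to GPT-4o with TRANSCRIBE_TOOL available, runs any tool calls the model
    makes, and returns the model's final text reply.
    """
    if not cases:
        return 0.0
    correct = sum(extract_answer(solve(path)) == expected for path, expected in cases)
    return correct / len(cases)
```

For example, with a stub solver that returns canned replies, `solve_rate([("q1.wav", "3"), ("q2.wav", "53")], {"q1.wav": "There are 3 r's in strawberry.", "q2.wav": "1 + 52 = 53"}.get)` returns `1.0`. Taking the last integer instead of the first avoids scoring an echoed operand (the `1` in "1 + 52 = 53") as the answer.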
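People with visual impairments often struggle with traditional image-based CAPTCHAs, so audio versions exist as an alternative.
The project explores the intersection of accessibility and AI agent capabilities: if an audio CAPTCHA is built for people who cannot see the image, can a GPT-4o agent with a transcription tool solve it too?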