Comparison of real-world performance across 5 tasks (small n, not rigorous)
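MultiOn was faster on three of the four tasks where both agents were timed, but Claude was more accurate on three of the five tasks (MultiOn was more accurate on one, and both scored 0% on one).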
MultiOn: blocked by Reddit's security checks; got stuck in a loop and failed to include links.
Claude: hit server errors on Reddit; displayed an itinerary, but it is unclear whether the content was grounded.

| Metric | MultiOn | Claude |
|---|---|---|
| Completed | No | Yes |
| Speed | 240s | 200s |
| Accuracy | 20% | 70% |

MultiOn: searched Google, then found and clicked through to the ArtStation fantasy section.
Claude: went to a random .com site first, then navigated to the correct site without explanation.

| Metric | MultiOn | Claude |
|---|---|---|
| Completed | Yes | Yes |
| Speed | 24s | 180s |
| Accuracy | 100% | 90% |

MultiOn: entered the wrong date, couldn't select places correctly, and got stuck in a loop.
Claude: typed URLs into the search box, recovered, but was then rate-limited.

| Metric | MultiOn | Claude |
|---|---|---|
| Completed | No | No |
| Speed | — | — |
| Accuracy | 0% | 0% |

MultiOn: failed to click the PDF link; scrolled repeatedly and couldn't download the file.
Claude: created the folder and wrote the summary, but failed to rename and download the file.

| Metric | MultiOn | Claude |
|---|---|---|
| Completed | No | Yes |
| Speed | 55s | 120s |
| Accuracy | 20% | 80% |

MultiOn: used general photos, didn't search reviews, and needed clarification.
Claude: was rate-limited on Maps; correctly estimated ~1500 calories, but it is unclear how the estimate was grounded.

| Metric | MultiOn | Claude |
|---|---|---|
| Completed | No | Yes |
| Speed | 49s | 360s |
| Accuracy | 50% | 90% |
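
The per-task tables above can be rolled up into rough totals. The following is a minimal sketch, not part of the original evaluation: the `RESULTS` structure and the `summarize` helper are hypothetical names introduced here, and the figures are transcribed from the five tables in order. Times are averaged only over the tasks where a time was recorded.

```python
from statistics import mean

# Figures transcribed from the five tables above, in order.
# Each tuple is (completed, seconds, accuracy_percent);
# seconds is None where no time was recorded.
RESULTS = [
    {"multion": (False, 240, 20), "claude": (True, 200, 70)},
    {"multion": (True, 24, 100), "claude": (True, 180, 90)},
    {"multion": (False, None, 0), "claude": (False, None, 0)},
    {"multion": (False, 55, 20), "claude": (True, 120, 80)},
    {"multion": (False, 49, 50), "claude": (True, 360, 90)},
]


def summarize(agent):
    """Aggregate the per-task tuples for one agent (hypothetical helper)."""
    rows = [task[agent] for task in RESULTS]
    # Skip tasks with no recorded time when averaging seconds.
    times = [secs for _, secs, _ in rows if secs is not None]
    return {
        "completed": sum(1 for done, _, _ in rows if done),
        "mean_secs": mean(times),
        "mean_accuracy": mean([acc for _, _, acc in rows]),
    }


for agent in ("multion", "claude"):
    print(agent, summarize(agent))
```

With only five tasks, these averages are a convenience for reading the tables, not a statistically meaningful comparison.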