MultiOn vs Claude Computer Use

Comparison of real-world performance across 5 tasks (small n, not rigorous)

Summary

MultiOn was faster on three of the four timed tasks, but Claude completed four of the five tasks to MultiOn's one and was more accurate on three of them.

Task 1: Create a Fully Formatted Itinerary

Prompt: Create a single-page itinerary for a trip to Hawaii using recommendations from Reddit and travel blogs.

MultiOn

Blocked on Reddit security; stuck in a loop, failed to include links.

Claude

Server errors with Reddit; displayed itinerary but unclear if grounded.

Metric      MultiOn   Claude
Completed   No        Yes
Speed       240s      200s
Accuracy    20%       70%

Task 2: Suggest a Personalized Website

Prompt: Based on my interest in fantasy art, suggest a website or activity.

MultiOn

Found and clicked on ArtStation fantasy section after searching Google.

Claude

Went to an unrelated .com site first, then to the correct site without explanation.

Metric      MultiOn   Claude
Completed   Yes       Yes
Speed       24s       180s
Accuracy    100%      90%

Task 3: Book a Direct Flight

Prompt: Find the most popular beach destination for US travelers in their twenties and book a direct flight for tomorrow morning.

MultiOn

Picked the wrong date, couldn't select places correctly, and got stuck in a loop.

Claude

Searched URLs in search box, recovered but got rate limited.

Metric      MultiOn   Claude
Completed   No        No
Speed       -         -
Accuracy    0%        0%

Task 4: Download and Summarize ML Paper

Prompt: Download the latest ML paper from arXiv, summarize the abstract, and save it in a folder named 'Research'.

MultiOn

Failed to click PDF link; scrolled repeatedly, couldn't download.

Claude

Created folder and summarized, but failed to rename and download.

Metric      MultiOn   Claude
Completed   No        Yes
Speed       55s       120s
Accuracy    20%       80%

Task 5: Estimate Calorie Counts

Prompt: Estimate the calorie count of the most popular menu item at Geneva Steakhouse in SF.

MultiOn

Used general photos, didn't search reviews, needed clarification.

Claude

Rate-limited on Maps; estimated ~1500 calories correctly but unclear grounding.

Metric      MultiOn   Claude
Completed   No        Yes
Speed       49s       360s
Accuracy    50%       90%
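
The per-task figures above can be tallied to check the summary claim. The following is a minimal sketch, not part of the original comparison: the names `RESULTS`, `summarize`, and `faster_counts` are hypothetical, the numbers are copied from the five tables, and Task 3's speed is stored as `None` because no time is listed for it.

```python
# Per-task figures copied from the tables in this comparison.
# Speed is in seconds; None marks Task 3, where no time is listed.
RESULTS = {
    "MultiOn": [
        {"task": 1, "completed": False, "seconds": 240, "accuracy": 20},
        {"task": 2, "completed": True, "seconds": 24, "accuracy": 100},
        {"task": 3, "completed": False, "seconds": None, "accuracy": 0},
        {"task": 4, "completed": False, "seconds": 55, "accuracy": 20},
        {"task": 5, "completed": False, "seconds": 49, "accuracy": 50},
    ],
    "Claude": [
        {"task": 1, "completed": True, "seconds": 200, "accuracy": 70},
        {"task": 2, "completed": True, "seconds": 180, "accuracy": 90},
        {"task": 3, "completed": False, "seconds": None, "accuracy": 0},
        {"task": 4, "completed": True, "seconds": 120, "accuracy": 80},
        {"task": 5, "completed": True, "seconds": 360, "accuracy": 90},
    ],
}


def summarize(agent):
    """Return completion count, mean time (timed tasks only), and mean accuracy."""
    rows = RESULTS[agent]
    timed = [r["seconds"] for r in rows if r["seconds"] is not None]
    return {
        "completed": sum(r["completed"] for r in rows),
        "mean_seconds": sum(timed) / len(timed),
        "mean_accuracy": sum(r["accuracy"] for r in rows) / len(rows),
    }


def faster_counts():
    """Count, per agent, the timed tasks on which it was strictly faster."""
    wins = {"MultiOn": 0, "Claude": 0}
    for m, c in zip(RESULTS["MultiOn"], RESULTS["Claude"]):
        if m["seconds"] is None or c["seconds"] is None:
            continue  # skip tasks with no recorded time
        if m["seconds"] < c["seconds"]:
            wins["MultiOn"] += 1
        elif c["seconds"] < m["seconds"]:
            wins["Claude"] += 1
    return wins


for agent in RESULTS:
    print(agent, summarize(agent))
print("Faster on:", faster_counts())
```

With these figures, MultiOn completes 1 task at a mean of 92s and 38% accuracy, while Claude completes 4 at a mean of 215s and 66% accuracy. MultiOn is faster on 3 of the 4 timed tasks. The means average over only four or five data points, so they are indicative at best.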