MultiOn vs. Claude Computer Use

A comparison of real-world performance across five use cases. Each task was run twice, but only one run is shown here; with so few trials, the results are not statistically rigorous.

[Figure: Summary of task accuracy and time]

MultiOn was faster on three of the four tasks with recorded times, but Claude was more accurate and completed more tasks (four of five, versus one for MultiOn).

Task 1: Create a Fully Formatted Itinerary

Prompt: Create a single-page itinerary for a trip to Hawaii using recommendations from Reddit and travel blogs.

Goal: A detailed itinerary with links, timings, and costs.

MultiOn

Notes (MultiOn): Blocked by Reddit's security checks; got stuck in a loop, failed to include links in the output, and asked many questions.

Claude

Notes (Claude): Encountered server errors on Reddit; displayed a correctly formatted itinerary, though it is unclear whether the itinerary was grounded in what it had read.

Metrics:

Metric                  MultiOn   Claude
Task Completion (0/1)   0         1
Time (seconds)          240       200
Accuracy (%)            20        70
Ease of Use (1-5)       3         4
Adaptability (1-5)      2         2

Task 2: Suggest a Personalized Website/Activity

Prompt: Based on my interest in fantasy art, suggest a website or activity.

Goal: Provide a relevant suggestion aligned with the user's preferences.

MultiOn

Notes (MultiOn): Found and clicked on ArtStation (fantasy section) after searching Google.

Claude

Notes (Claude): After a long setup, it went to a random .com website, then switched to the correct site without explanation.

Metrics:

Metric                  MultiOn   Claude
Task Completion (0/1)   1         1
Time (seconds)          24        180
Accuracy (%)            100       90
Ease of Use (1-5)       5         4
Adaptability (1-5)      4         3

Task 3: Book a Direct Flight

Prompt: Find the most popular beach destination for US travelers in their twenties and book a direct flight for tomorrow morning.

Goal: Book a flight to a relevant beach destination with proper details.

MultiOn

Notes (MultiOn): Stated that tomorrow was Nov 28 but selected the 27th; was very slow, could not select the correct destinations, and got stuck in a loop.

Claude

Notes (Claude): Typed URLs into the search box but recovered, then asked clarifying questions and was rate-limited.

Metrics:

Metric                  MultiOn   Claude
Task Completion (0/1)   0         0
Time (seconds)          0         0
Accuracy (%)            0         0
Ease of Use (1-5)       3         3
Adaptability (1-5)      2         4

Task 4: Download and Summarize ML Paper

Prompt: Download the latest ML paper from arXiv, summarize the abstract, and save it in a folder named 'Research'.

Goal: Accurately download, summarize, and save the file.

MultiOn

Notes (MultiOn): Failed to click the PDF link and struggled to summarize, repeatedly scrolling up and down. It could not download the file because it does not operate at the OS level.

Claude

Notes (Claude): Largely completed the task: created the folder and wrote the summary, but failed to download and rename the file.

Metrics:

Metric                  MultiOn   Claude
Task Completion (0/1)   0         1
Time (seconds)          55        120
Accuracy (%)            20        80
Ease of Use (1-5)       3         4
Adaptability (1-5)      2         3

Task 5: Estimate Calorie Counts

Prompt: Estimate the calorie count of the most popular menu item at Geneva Steakhouse in SF.

Goal: Accurately estimate calories (~1200) based on images and descriptions.

MultiOn

Notes (MultiOn): Used general photos of the restaurant rather than searching reviews for a specific item, and did not know when to stop browsing photos. It asked me to identify the most popular item even though the prompt already asked it to find that.

Claude

Notes (Claude): Was rate-limited after interacting with Google Maps; partially recovered but clicked incorrect icons. Went to the menu and then Google image search rather than the restaurant's Google Maps listing, and estimated ~1,500 calories, reasonably close to the target, though it is unclear how the estimate was grounded.

Metrics:

Metric                  MultiOn   Claude
Task Completion (0/1)   0         1
Time (seconds)          49        360
Accuracy (%)            50        90
Ease of Use (1-5)       4         5
Adaptability (1-5)      4         4
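The summary claim at the top of this post (MultiOn faster, Claude more accurate and more often complete) can be checked by tallying the five metric tables. The sketch below is one way to do that, assuming simple unweighted means. Accuracy is averaged over all five tasks, failed ones included, and Task 3 is excluded from the time comparison because both agents have 0 seconds recorded there. The names `RESULTS`, `summarize`, and `faster_count` are hypothetical helpers, not part of either product.

```python
# Per-task metrics transcribed from the five tables in this post.
# Each tuple: (task completed 0/1, time in seconds, accuracy %).
RESULTS = {
    "MultiOn": [(0, 240, 20), (1, 24, 100), (0, 0, 0), (0, 55, 20), (0, 49, 50)],
    "Claude": [(1, 200, 70), (1, 180, 90), (0, 0, 0), (1, 120, 80), (1, 360, 90)],
}


def summarize(runs):
    """Return (tasks completed, mean accuracy %, mean time over timed tasks)."""
    completed = sum(done for done, _, _ in runs)
    mean_acc = sum(acc for _, _, acc in runs) / len(runs)
    # Assumption: a recorded time of 0 means "no time recorded", so skip it.
    timed = [t for _, t, _ in runs if t > 0]
    mean_time = sum(timed) / len(timed)
    return completed, mean_acc, mean_time


def faster_count(a, b):
    """Count tasks where agent a took less time than agent b (both timed)."""
    return sum(
        1 for ra, rb in zip(a, b)
        if ra[1] > 0 and rb[1] > 0 and ra[1] < rb[1]
    )


if __name__ == "__main__":
    for name, runs in RESULTS.items():
        done, acc, t = summarize(runs)
        print(f"{name}: {done}/5 completed, mean accuracy {acc:.0f}%, mean time {t:.0f}s")
    wins = faster_count(RESULTS["MultiOn"], RESULTS["Claude"])
    print(f"MultiOn faster on {wins} of 4 timed tasks")
```

Running it prints 1/5 completed, 38% mean accuracy, and 92 s mean time for MultiOn; 4/5, 66%, and 215 s for Claude; and MultiOn faster on 3 of 4 timed tasks. Note that MultiOn's times include runs it did not finish, so "faster" here does not mean "finished sooner".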