Task Accuracy and Time Summary Graph
Across the five tasks below, MultiOn was generally faster, while Claude was more accurate and completed more of the tasks.
Task 1: Create a Fully Formatted Itinerary
Prompt: Create a single-page itinerary for a trip to Hawaii using recommendations from Reddit and travel blogs.
Goal: A detailed itinerary with links, timings, and costs.
MultiOn
Notes (MultiOn): Blocked by Reddit's security checks; got stuck in a loop, failed to include links in the output, and asked a lot of questions.
Claude
Notes (Claude): Encountered server errors with Reddit; correctly displayed an itinerary, though it is unclear whether the content was grounded in what it found.
Metrics:
Metric | MultiOn | Claude |
---|---|---|
Task Completion | 0 | 1 |
Speed (Seconds) | 240 | 200 |
Accuracy (%) | 20% | 70% |
Ease of Use (1-5) | 3 | 4 |
Adaptability (1-5) | 2 | 2 |
Task 2: Suggest a Personalized Website/Activity
Prompt: Based on my interest in fantasy art, suggest a website or activity.
Goal: Provide a relevant suggestion aligned with preferences.
MultiOn
Notes (MultiOn): Found and clicked on ArtStation (fantasy section) after searching Google.
Claude
Notes (Claude): After a long setup time, went to a seemingly random .com website, then navigated to the correct site without explanation.
Metrics:
Metric | MultiOn | Claude |
---|---|---|
Task Completion | 1 | 1 |
Speed (Seconds) | 24 | 180 |
Accuracy (%) | 100% | 90% |
Ease of Use (1-5) | 5 | 4 |
Adaptability (1-5) | 4 | 3 |
Task 3: Book a Direct Flight
Prompt: Find the most popular beach destination for US travelers in their twenties and book a direct flight for tomorrow morning.
Goal: Book a flight to a relevant beach destination with proper details.
MultiOn
Notes (MultiOn): Said tomorrow was Nov 28 but clicked the 27th; very slow, couldn't select the right destinations, and got stuck in a loop.
Claude
Notes (Claude): Searched for URLs in the search box but recovered, then asked clarifying questions and got rate-limited.
Metrics:
Metric | MultiOn | Claude |
---|---|---|
Task Completion | 0 | 0 |
Speed (Seconds) | 0 | 0 |
Accuracy (%) | 0% | 0% |
Ease of Use (1-5) | 3 | 3 |
Adaptability (1-5) | 2 | 4 |
Task 4: Download and Summarize ML Paper
Prompt: Download the latest ML paper from arXiv, summarize the abstract, and save it in a folder named 'Research'.
Goal: Accurately download, summarize, and save the file.
MultiOn
Notes (MultiOn): Failed to click the PDF link; struggled to summarize and scrolled up and down repeatedly. Couldn't download the file because it does not operate at the OS level.
Claude
Notes (Claude): Largely completed the task: created the folder and summarized the abstract, but failed to rename and download the file.
Metrics:
Metric | MultiOn | Claude |
---|---|---|
Task Completion | 0 | 1 |
Speed (Seconds) | 55 | 120 |
Accuracy (%) | 20% | 80% |
Ease of Use (1-5) | 3 | 4 |
Adaptability (1-5) | 2 | 3 |
Task 5: Estimate Calorie Counts
Prompt: Estimate the calorie count of the most popular menu item at Geneva Steakhouse in SF.
Goal: Accurately estimate calories (~1200) based on images and descriptions.
MultiOn
Notes (MultiOn): Used general photos of the restaurant rather than searching the reviews for a specific item. Didn't know when to stop looking at photos. Needed me to clarify which item was the most popular, even though finding it was part of the prompt.
Claude
Notes (Claude): Rate-limited after interacting with Google Maps; partially recovered but clicked on incorrect icons. Went to the menu and then to Google image search rather than to the restaurant's page on Google Maps. Arrived at an estimate of ~1500 calories that was judged correct, though it is unclear how the estimate was grounded.
Metrics:
Metric | MultiOn | Claude |
---|---|---|
Task Completion | 0 | 1 |
Speed (Seconds) | 49 | 360 |
Accuracy (%) | 50% | 90% |
Ease of Use (1-5) | 4 | 5 |
Adaptability (1-5) | 4 | 4 |
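
To make the overall comparison easier to check, here is a minimal Python sketch that aggregates the per-task tables above. The names `RESULTS`, `summarize`, and `SUMMARY` are illustrative, not part of any tool used in the tests. It also assumes that a speed of 0 (Task 3) means no time was recorded, so that task is left out of the speed average; that is an interpretation of the table, not something stated in the notes.

```python
from statistics import mean

# (task completion, speed in seconds, accuracy in %) for Tasks 1-5,
# copied from the metrics tables above.
RESULTS = {
    "MultiOn": [(0, 240, 20), (1, 24, 100), (0, 0, 0), (0, 55, 20), (0, 49, 50)],
    "Claude": [(1, 200, 70), (1, 180, 90), (0, 0, 0), (1, 120, 80), (1, 360, 90)],
}


def summarize(rows):
    """Aggregate one agent's per-task metrics into overall figures."""
    # Assumption: a speed of 0 means "not recorded", so skip it when averaging.
    timed = [speed for _, speed, _ in rows if speed > 0]
    return {
        "tasks_completed": sum(done for done, _, _ in rows),
        "mean_accuracy_pct": mean([acc for _, _, acc in rows]),
        "mean_speed_s": mean(timed),
    }


SUMMARY = {agent: summarize(rows) for agent, rows in RESULTS.items()}

for agent, stats in SUMMARY.items():
    print(agent, stats)
```

With the values in the tables, this gives MultiOn 1 of 5 tasks completed, a mean accuracy of 38%, and a mean time of 92 seconds over the four timed tasks; Claude comes out at 4 of 5 tasks, 66%, and 215 seconds.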