Tao Sun

AI Researcher in AEC

Agentic AI Benchmark: Shop Til You Drop

→ Project Overview

We have developed a benchmark evaluation agent, or “green agent”, that tests how well other AI agents can predict grocery shopping behavior and work with e-commerce APIs.

The green agent evaluates how well white agents (the agents being tested) can predict what a user will buy on their next grocery shopping trip, based on that user’s purchase history. White agents use a real e-commerce API to search for products, build a basket, and check out to complete the task.
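To make the prediction task concrete, here is a minimal sketch of a frequency baseline a white agent might start from. It assumes purchase history arrives as a list of past orders, each a list of product names. The function name and input shape are hypothetical and are not taken from the benchmark’s code. A real white agent would still need to resolve these names through the e-commerce API’s search before adding anything to the cart.

```python
from __future__ import annotations

from collections import Counter


def predict_next_basket(prior_orders: list[list[str]], k: int | None = None) -> list[str]:
    """Hypothetical frequency baseline for next-basket prediction.

    Ranks products by how many of the user's past orders contain them and
    returns the top k. If k is None, k defaults to the median past basket
    size (the upper median when the number of orders is even).
    """
    if not prior_orders:
        return []
    # Count each product at most once per order, so the score reflects how
    # often the user re-buys an item rather than quantities within one trip.
    counts = Counter(p for order in prior_orders for p in set(order))
    if k is None:
        sizes = sorted(len(set(order)) for order in prior_orders)
        k = sizes[len(sizes) // 2]
    # Highest frequency first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return ranked[:k]


history = [
    ["banana", "milk", "bread"],
    ["banana", "milk"],
    ["banana", "eggs", "spinach"],
]
print(predict_next_basket(history))  # ['banana', 'milk', 'bread']
```

A repeat-purchase baseline like this gives a useful floor for comparison. An agent that beats it is doing more than replaying the user’s habits.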

What is a Green Agent?

In the AgentBeats framework:

Green agents are evaluation/benchmark agents that test other agents
White agents are the agents being tested/evaluated
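The split between the two roles can be sketched as a simple loop. The green agent holds back each user’s next order, gives the white agent only the history, and scores what comes back. This is a local sketch, not the AgentBeats or A2A interface. The `WhiteAgent` interface, the `Task` record, `run_benchmark`, and the toy agent are all hypothetical. In the real benchmark the exchange happens over A2A messages and the e-commerce API, not through direct method calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class WhiteAgent(Protocol):
    """Hypothetical interface: anything with a predict() method qualifies."""

    def predict(self, history: list[list[str]]) -> list[str]: ...


@dataclass
class Task:
    user_id: int
    history: list[list[str]]  # prior orders, shown to the white agent
    actual: list[str]  # held-out next order, kept by the green agent


def run_benchmark(
    agent: WhiteAgent,
    tasks: list[Task],
    score: Callable[[list[str], list[str]], float],
) -> float:
    """Green-agent side: send each history, score the reply, return the mean."""
    if not tasks:
        raise ValueError("no tasks to evaluate")
    total = 0.0
    for task in tasks:
        predicted = agent.predict(task.history)  # agent never sees task.actual
        total += score(predicted, task.actual)
    return total / len(tasks)


class RepeatLastOrder:
    """Toy white agent: predicts that the user repeats their latest order."""

    def predict(self, history: list[list[str]]) -> list[str]:
        return list(history[-1]) if history else []


def jaccard(predicted: list[str], actual: list[str]) -> float:
    """Placeholder metric for this sketch: set intersection over set union."""
    p, a = set(predicted), set(actual)
    return len(p & a) / len(p | a) if p | a else 1.0


tasks = [
    Task(1, [["milk", "bread"], ["milk", "eggs"]], ["milk", "eggs"]),
    Task(2, [["apples"]], ["bananas"]),
]
print(run_benchmark(RepeatLastOrder(), tasks, jaccard))  # 0.5
```

Jaccard similarity only stands in for the benchmark’s F1-based scoring to keep the sketch short. The important design choice is that `actual` stays on the green-agent side, so the agent under test never sees the answer.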

Key Features

Real-world dataset: Built on the Instacart Kaggle dataset with 1,500+ unique users and 30,000+ transactions
Production e-commerce API: Hosted at https://green-agent-production.up.railway.app/ with search, cart, and checkout functionality
Multi-level evaluation: F1 scores at the product, aisle, and department levels, combined into blended metrics
AgentBeats/A2A compatible: Implements the Agent2Agent (A2A) protocol for agent-to-agent communication
Multiple evaluation modes: Single-user evaluation, baseline comparison, and multi-user benchmarks
Flexible deployment: Works locally or on cloud platforms (Railway, Google Cloud Run, etc.)
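One way to read “F1 across products, aisles, and departments with blended metrics” is sketched below. It assumes each level is scored with set-based F1 and that the three scores are combined as a weighted average. The weights (0.6, 0.25, 0.15), the function names, and the example IDs are hypothetical placeholders, not values from the benchmark. The product-to-aisle and product-to-department lookups correspond to the `aisle_id` and `department_id` columns of the Instacart `products.csv` file.

```python
from __future__ import annotations


def f1(predicted: set, actual: set) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    if not predicted and not actual:
        return 1.0  # both empty: treat as a perfect match
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def blended_f1(
    predicted: list[int],
    actual: list[int],
    aisle_of: dict[int, int],
    dept_of: dict[int, int],
    weights: tuple[float, float, float] = (0.6, 0.25, 0.15),  # hypothetical
) -> dict[str, float]:
    """Score a basket at product, aisle, and department level, then blend.

    aisle_of and dept_of map product_id -> aisle_id and department_id.
    """
    pred, act = set(predicted), set(actual)
    scores = {
        "product": f1(pred, act),
        # Coarser levels give partial credit for near misses.
        "aisle": f1({aisle_of[p] for p in pred}, {aisle_of[p] for p in act}),
        "department": f1({dept_of[p] for p in pred}, {dept_of[p] for p in act}),
    }
    w_prod, w_aisle, w_dept = weights
    scores["blended"] = (
        w_prod * scores["product"]
        + w_aisle * scores["aisle"]
        + w_dept * scores["department"]
    )
    return scores


# Illustrative IDs only: products 1 and 2 share an aisle.
aisle_of = {1: 24, 2: 24, 3: 83}
dept_of = {1: 4, 2: 4, 3: 4}
scores = blended_f1([1, 3], [2, 3], aisle_of, dept_of)
print({name: round(value, 3) for name, value in scores.items()})
# {'product': 0.5, 'aisle': 1.0, 'department': 1.0, 'blended': 0.7}
```

In the example, predicting product 1 instead of product 2 earns no product-level credit. Both products share an aisle and department, though, so those two scores stay perfect and the blended score of 0.7 rewards the near miss.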

→ Demo

→ Project Team

Arlen Kumar
Henry Michaelson
Tao Sun

→ GitHub

https://github.com/LupoSun/CS194_Ecom_GreenAgent
