→ Project Overview
We have developed a benchmark evaluation agent for testing other AI agents’ ability to predict grocery shopping behavior and use e-commerce APIs.
The green agent evaluates how well white agents (the agents being tested) can predict what a user will purchase on their next grocery shopping trip based on purchase history. White agents use a real e-commerce API to search for products, build a basket, and complete the task.
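To make the prediction task concrete, here is a minimal sketch of a naive repeat-purchase baseline that a white agent could start from. It is only an illustration. The function name `predict_next_basket`, the `min_share` threshold, and the representation of purchase history as a list of past orders (each a list of product names) are assumptions. This is not the project's implementation or its baseline.

```python
from collections import Counter


def predict_next_basket(order_history, min_share=0.5, max_items=20):
    """Hypothetical repeat-purchase baseline (not the project's method).

    Predicts the products that appear in at least `min_share` of the user's
    past orders, most frequent first, capped at `max_items`.
    """
    if not order_history:
        return []
    # Count each product at most once per order so one large order can't dominate.
    counts = Counter(product for order in order_history for product in set(order))
    threshold = min_share * len(order_history)
    frequent = [product for product, count in counts.items() if count >= threshold]
    # Sort by descending frequency; break ties alphabetically for deterministic output.
    frequent.sort(key=lambda product: (-counts[product], product))
    return frequent[:max_items]
```

For example, suppose the history has three orders, Banana appears in all three, and Milk appears in two. In that case `predict_next_basket(history)` returns `["Banana", "Milk"]`. A white agent would still have to find these items through the search API and add them to the cart.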
What is a Green Agent?
In the AgentBeats framework:
- Green agents are evaluation/benchmark agents that test other agents
- White agents are the agents being tested/evaluated
Key Features
- Real-world dataset: Built on the Instacart Kaggle dataset with 1,500+ unique users and 30,000+ transactions
- Production e-commerce API: Hosted at https://green-agent-production.up.railway.app/ with search, cart, and checkout functionality
- Multi-level evaluation: F1 scoring across products, aisles, and departments with blended metrics
- AgentBeats/A2A compatible: Implements the A2A protocol for agent-to-agent communication
- Multiple evaluation modes: single-user evaluation, baseline comparison, and multi-user benchmarks
- Flexible deployment: Works locally or on cloud platforms (Railway, Google Cloud Run, etc.)
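The multi-level scoring above can be sketched as follows. The sketch makes several assumptions. It uses set-based F1 at each level. It uses a catalog that maps each product to its (aisle, department) pair, as in the Instacart data. It blends the three levels with hypothetical weights of 0.5 / 0.25 / 0.25. The project's actual weights and function names are not stated here, so `set_f1`, `multi_level_f1`, and `DEFAULT_WEIGHTS` are illustrative only.

```python
def set_f1(predicted, actual):
    """F1 score between two collections treated as sets.

    Two empty sets count as a perfect match (1.0); no overlap otherwise gives 0.0.
    """
    predicted, actual = set(predicted), set(actual)
    if not predicted and not actual:
        return 1.0
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical blend weights; the project's actual weighting is not documented here.
DEFAULT_WEIGHTS = {"product": 0.5, "aisle": 0.25, "department": 0.25}


def multi_level_f1(predicted, actual, catalog, weights=DEFAULT_WEIGHTS):
    """Score a predicted basket against the actual next order at three levels.

    `catalog` maps product name -> (aisle, department). Products missing from
    the catalog still count at the product level but are skipped when mapping
    to aisles and departments.
    """
    def project(products, index):
        # index 0 -> aisle, index 1 -> department
        return {catalog[p][index] for p in products if p in catalog}

    scores = {
        "product": set_f1(predicted, actual),
        "aisle": set_f1(project(predicted, 0), project(actual, 0)),
        "department": set_f1(project(predicted, 1), project(actual, 1)),
    }
    # Weighted average of the three per-level F1 scores.
    scores["blended"] = sum(
        weights[level] * scores[level] for level in ("product", "aisle", "department")
    )
    return scores
```

As an example, take a catalog where Banana and Strawberries are both in the "fresh fruits" aisle of "produce", Milk is in "milk" / "dairy eggs", and Bread is in "bread" / "bakery". If the agent predicts `["Banana", "Milk"]` and the user actually bought `["Strawberries", "Milk", "Bread"]`, this sketch gives a product F1 of 0.4 and aisle and department F1 of 0.8 each. The blended score is 0.6, up to floating-point rounding. Scoring coarser levels gives partial credit for near misses, such as predicting Banana instead of Strawberries.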
→ Demo
→ Project Team
Arlen Kumar
Henry Michaelson
Tao Sun