{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Welcome to ITBench-Lite-Space!\n", "\n", "This interactive environment lets you run and evaluate AI agents on real-world IT automation tasks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Start Guide" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Duplicate This Space First (If You Haven't Already)\n", "\n", "**Important:** You need your own copy to set up API keys.\n", "\n", "1. Click the **⋮ menu** at the top of the page\n", "2. Select **\"Duplicate this Space\"**\n", "3. Choose a name and wait for it to build" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Set Up Your API Keys (Required)\n", "\n", "Once you have your duplicated Space:\n", "\n", "1. Get your API keys:\n", " - [HuggingFace Token](https://huggingface.co/settings/tokens) - for agent execution\n", " - [OpenRouter Key](https://openrouter.ai) - for Gemini Judge evaluation\n", "2. In **your Space**, go to **Settings → Repository secrets**\n", "3. Add the secrets:\n", " - `HF_TOKEN` = your HuggingFace token\n", " - `OPENROUTER_API_KEY` = your OpenRouter key\n", "4. 
(Optional) Before using llama-3.3-70b, **accept the Llama license**: visit [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) and click \"Agree and access repository\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Choose Your Path\n", "\n", "**New to ITBench?** → Start with `download_run_scenario.ipynb`\n", "- Download a scenario from the ITBench-Lite dataset\n", "- Run an agent interactively to see how it works\n", "- Experiment with different models (HuggingFace: Llama, Qwen, GPT-OSS | OpenRouter: Gemini, Claude, GPT)\n", "\n", "**Ready to Evaluate?** → Jump to `evaluation.ipynb`\n", "- Analyze agent performance across multiple scenarios\n", "- View detailed metrics, trajectories, and visualizations\n", "- Generate comprehensive evaluation reports" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Open a Notebook\n", "\n", "Click one of the notebook files in the left sidebar:\n", "- **download_run_scenario.ipynb** - Download scenarios and run agents\n", "- **evaluation.ipynb** - Comprehensive evaluation and analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's in This Space?\n", "\n", "- `download_run_scenario.ipynb` - Interactive agent execution notebook\n", "- `evaluation.ipynb` - Evaluation and analysis notebook\n", "- `analysis_src/` - Python modules for evaluation metrics\n", "- `ITBench-SRE-Agent/` - Reference agent implementation (pre-installed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": "## Useful Links\n\n- [Why Do Enterprise Agents Fail? 
Insights from IT-Bench using MAST](https://ucb-mast.notion.site/) - Research insights and analysis\n- [ITBench-Lite Dataset](https://huggingface.co/datasets/ibm-research/ITBench-Lite) - 50 scenarios across SRE and FinOps\n- [ITBench-Trajectories](https://huggingface.co/datasets/ibm-research/ITBench-Trajectories) - Complete execution traces\n- [ITBench GitHub](https://github.com/itbench-hub/ITBench) - Main repository\n- [ITBench-SRE-Agent](https://github.com/itbench-hub/ITBench-SRE-Agent) - Agent implementation\n\n**Happy benchmarking!**" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }