sanilahmed2019 committed on
Commit 295bc31 · 1 Parent(s): ce1386b

Fix README for docker SDK
.README.md.swp DELETED
Binary file (1.02 kB)
 
README.md CHANGED
@@ -1,14 +1,32 @@
  ---
- title: Physical AI Book Backend
- emoji: 🤖
- colorFrom: indigo
+ title: Backend Deploy
+ emoji: 🚀
+ colorFrom: blue
  colorTo: purple
  sdk: docker
- sdk_version: "3.10"
- app_file: app/main.py
  pinned: false
  ---

- # Book Content Ingestor & RAG Verification
-
- A system to extract content from Docusaurus-based book websites, chunk and embed it using Cohere, and store embeddings in Qdrant for RAG applications.
+ # RAG Agent and API Layer
+
+ This is a FastAPI application that provides a question-answering API using Gemini agents and Qdrant retrieval for RAG (Retrieval Augmented Generation) functionality.
+
+ ## API Endpoints
+
+ - `GET /` - Root endpoint with API information
+ - `POST /ask` - Main question-answering endpoint
+ - `GET /health` - Health check endpoint
+ - `GET /ready` - Readiness check endpoint
+ - `/docs` - API documentation (Swagger UI)
+ - `/redoc` - API documentation (Redoc)
+
+ ## Configuration
+
+ The application requires the following environment variables:
+ - `GEMINI_API_KEY` - API key for Google Gemini
+ - `QDRANT_URL` - URL for Qdrant vector database
+ - `QDRANT_API_KEY` - API key for Qdrant database
+
+ ## Deployment
+
+ This application is configured for deployment on Hugging Face Spaces using Docker.
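The new README documents the endpoints but not how to call them. A hypothetical client sketch for `POST /ask` is below; the base URL is a placeholder and the request field name (`"question"`) is an assumption — the real schema is documented at the Space's `/docs` (Swagger UI) page.

```python
import json
from urllib.request import Request

# Placeholder Space URL -- substitute the real deployment URL.
BASE_URL = "https://example-space.hf.space"

def build_ask_request(question: str) -> Request:
    """Build a JSON POST request for the /ask endpoint (field name assumed)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{BASE_URL}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_ask_request("..."))` would return the JSON answer; the response shape should likewise be checked against `/docs`.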
book_ingestor.egg-info/PKG-INFO CHANGED
@@ -14,60 +14,35 @@ Requires-Dist: uvicorn>=0.24.0
  Requires-Dist: openai>=1.0.0
  Requires-Dist: pydantic>=2.0.0

- # Book Content Ingestor & RAG Verification
-
- A system to extract content from Docusaurus-based book websites, chunk and embed it using Cohere, store embeddings in Qdrant Cloud for RAG applications, and verify the retrieval pipeline functionality.
-
- ## Setup
-
- 1. Install dependencies using uv:
- ```bash
- cd backend
- uv sync
- ```
-
- 2. Create a `.env` file with your API keys:
- ```bash
- cp .env.example .env
- # Edit .env with your actual API keys
- ```
-
- ## Environment Variables
-
- - `COHERE_API_KEY`: Your Cohere API key
- - `QDRANT_URL`: Your Qdrant Cloud URL
- - `QDRANT_API_KEY`: Your Qdrant API key
- - `QDRANT_COLLECTION_NAME`: Name of the collection to use (default: "rag_embedding")
-
- ## Usage
-
- ### Run the ingestion pipeline:
- ```bash
- cd backend
- uv run python main.py
- ```
-
- This will:
- 1. Collect all URLs from the target book (https://sanilahmed.github.io/hackathon-ai-book/)
- 2. Extract text content from each URL
- 3. Chunk the content into fixed-size segments
- 4. Generate embeddings using Cohere
- 5. Store embeddings with metadata in Qdrant Cloud collection named "rag_embedding"
-
- ### Run the verification pipeline:
- ```bash
- cd backend
- python -m verify_retrieval.main
- ```
-
- Or with specific options:
- ```bash
- python -m verify_retrieval.main --query "transformer architecture in NLP" --top-k 10
- ```
-
- The verification system will:
- 1. Load vectors and metadata stored in Qdrant from the original ingestion
- 2. Implement retrieval functions to query Qdrant using sample keywords or phrases
- 3. Validate that retrieved chunks are accurate and relevant
- 4. Check that metadata (URL, title, chunk_id) matches source content
- 5. Log results and confirm the pipeline executes end-to-end without errors
+ ---
+ title: Backend Deploy
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ ---
+
+ # RAG Agent and API Layer
+
+ This is a FastAPI application that provides a question-answering API using Gemini agents and Qdrant retrieval for RAG (Retrieval Augmented Generation) functionality.
+
+ ## API Endpoints
+
+ - `GET /` - Root endpoint with API information
+ - `POST /ask` - Main question-answering endpoint
+ - `GET /health` - Health check endpoint
+ - `GET /ready` - Readiness check endpoint
+ - `/docs` - API documentation (Swagger UI)
+ - `/redoc` - API documentation (Redoc)
+
+ ## Configuration
+
+ The application requires the following environment variables:
+ - `GEMINI_API_KEY` - API key for Google Gemini
+ - `QDRANT_URL` - URL for Qdrant vector database
+ - `QDRANT_API_KEY` - API key for Qdrant database
+
+ ## Deployment
+
+ This application is configured for deployment on Hugging Face Spaces using Docker.
check_qdrant.py DELETED
@@ -1,59 +0,0 @@
- #!/usr/bin/env python3
- """
- Script to check if Qdrant collection exists and has data.
- """
- import os
- from qdrant_client import QdrantClient
- from dotenv import load_dotenv
-
- # Load environment variables
- load_dotenv()
-
- # Get environment variables
- qdrant_url = os.getenv('QDRANT_URL')
- qdrant_api_key = os.getenv('QDRANT_API_KEY')
-
- if not qdrant_url or not qdrant_api_key:
-     print("Error: QDRANT_URL or QDRANT_API_KEY not found in environment variables")
-     exit(1)
-
- # Initialize Qdrant client
- client = QdrantClient(
-     url=qdrant_url,
-     api_key=qdrant_api_key,
-     timeout=30
- )
-
- try:
-     # List all collections
-     collections = client.get_collections()
-     print("Available collections:")
-     for collection in collections.collections:
-         # For newer Qdrant versions, get the collection info to get point count
-         collection_info = client.get_collection(collection.name)
-         print(f" - {collection.name} (points: {collection_info.points_count})")
-
-     # Check specifically for the rag_embedding collection
-     try:
-         collection_info = client.get_collection("rag_embedding")
-         print(f"\nCollection 'rag_embedding' exists with {collection_info.points_count} points")
-
-         if collection_info.points_count > 0:
-             # Get a sample point to verify data exists
-             points = client.scroll(
-                 collection_name="rag_embedding",
-                 limit=1
-             )
-             if len(points[0]) > 0:
-                 sample_point = points[0][0]
-                 print(f"Sample point ID: {sample_point.id}")
-                 print(f"Sample point payload keys: {list(sample_point.payload.keys())}")
-                 print(f"Sample text preview: {sample_point.payload.get('text', '')[:100]}...")
-             else:
-                 print("Collection 'rag_embedding' exists but is empty")
-
-     except Exception as e:
-         print(f"\nCollection 'rag_embedding' does not exist: {e}")
-
- except Exception as e:
-     print(f"Error connecting to Qdrant: {e}")
rag_agent_api/README.md CHANGED
@@ -1,17 +1,17 @@
  # RAG Agent and API Layer

- A FastAPI-based question-answering system that uses OpenAI Agents and Qdrant retrieval to generate grounded responses based on book content.
+ A FastAPI-based question-answering system that uses OpenRouter Agents and Qdrant retrieval to generate grounded responses based on book content.

  ## Overview

- The RAG Agent and API Layer provides a question-answering API that retrieves relevant content from Qdrant and uses an OpenAI agent to generate accurate, source-grounded responses. The system ensures that all answers are based only on the provided context to prevent hallucinations.
+ The RAG Agent and API Layer provides a question-answering API that retrieves relevant content from Qdrant and uses an OpenRouter agent to generate accurate, source-grounded responses. The system ensures that all answers are based only on the provided context to prevent hallucinations.

  ## Architecture

  The system consists of several key components:

  - **FastAPI Application**: Main entry point for the question-answering API
- - **OpenAI Agent**: Generates responses based on retrieved context
+ - **OpenRouter Agent**: Generates responses based on retrieved context
  - **Qdrant Retriever**: Retrieves relevant content chunks from Qdrant database
  - **Configuration Manager**: Handles environment variables and settings
  - **Data Models**: Pydantic models for API requests/responses
@@ -22,7 +22,7 @@ The system consists of several key components:
  ### Prerequisites

  - Python 3.9+
- - OpenAI API key
+ - OpenRouter API key
  - Qdrant Cloud instance with book content embeddings
  - Cohere API key (for query embeddings)
@@ -42,7 +42,7 @@
  3. Edit `.env` with your API keys and configuration:
  ```env
- OPENAI_API_KEY=your-openai-api-key-here
+ OPENROUTER_API_KEY=your-openrouter-api-key-here
  QDRANT_URL=your-qdrant-instance-url
  QDRANT_API_KEY=your-qdrant-api-key
  QDRANT_COLLECTION_NAME=rag_embedding
@@ -103,7 +103,7 @@ Root endpoint with API information.
  ### Environment Variables

- - `OPENAI_API_KEY`: Your OpenAI API key
+ - `OPENROUTER_API_KEY`: Your OpenRouter API key
  - `QDRANT_URL`: URL of your Qdrant instance
  - `QDRANT_API_KEY`: Your Qdrant API key
  - `QDRANT_COLLECTION_NAME`: Name of the collection with book embeddings (default: `rag_embedding`)
@@ -123,8 +123,8 @@ Pydantic models for API request/response schemas.
  ### Schemas (`schemas.py`)
  Additional schemas for internal data structures.

- ### Agent (`agent.py`)
- OpenAI agent implementation with context injection and response validation.
+ ### Agent (`openrouter_agent.py`)
+ OpenRouter agent implementation with context injection and response validation.

  ### Retrieval (`retrieval.py`)
  Qdrant integration for content retrieval with semantic search.
@@ -160,7 +160,7 @@ pytest
  # Run specific test files
  pytest tests/test_api.py
- pytest tests/test_agent.py
+ pytest tests/test_openrouter_agent.py
  pytest tests/test_retrieval.py
  ```
rag_agent_api/__init__.py CHANGED
@@ -10,7 +10,7 @@ __license__ = "MIT"
  # Import main components for easy access
  from .main import app
  from .config import Config, get_config, validate_config
- from .agent import GeminiAgent
+ from .openrouter_agent import OpenRouterAgent
  from .retrieval import QdrantRetriever

  # Define what gets imported with "from rag_agent_api import *"
@@ -19,6 +19,6 @@ __all__ = [
  "Config",
  "get_config",
  "validate_config",
- "GeminiAgent",
+ "OpenRouterAgent",
  "QdrantRetriever"
  ]
rag_agent_api/agent.py DELETED
@@ -1,363 +0,0 @@
- """
- Google Gemini Agent module for the RAG Agent and API Layer system.
-
- This module provides functionality for creating and managing a Google Gemini agent
- that generates responses based on retrieved context.
- """
- import asyncio
- import logging
- from typing import List, Dict, Any, Optional
- import google.generativeai as genai
- from .config import get_config
- from .schemas import AgentContext, AgentResponse, SourceChunkSchema
- from .utils import format_confidence_score
-
-
- class GeminiAgent:
-     """
-     A class to manage the Google Gemini agent for generating responses based on context.
-     """
-     def __init__(self, model_name: str = "gemini-2.5-flash"):
-         """
-         Initialize the Google Gemini agent with configuration.
-
-         Args:
-             model_name: Name of the Gemini model to use (default: gemini-2.5-flash)
-         """
-         config = get_config()
-         api_key = config.gemini_api_key
-
-         if not api_key:
-             raise ValueError("GEMINI_API_KEY environment variable not set")
-
-         # Configure the Gemini client
-         genai.configure(api_key=api_key)
-
-         # Create the generative model instance
-         self.model = genai.GenerativeModel(model_name)
-         self.model_name = model_name
-         self.default_temperature = config.default_temperature
-
-         logging.info(f"Gemini agent initialized with model: {model_name}")
-
-     async def generate_response(self, context: AgentContext) -> AgentResponse:
-         """
-         Generate a response based on the provided context.
-
-         Args:
-             context: AgentContext containing the query and retrieved context chunks
-
-         Returns:
-             AgentResponse with the generated answer and metadata
-         """
-         # Check if retrieved context is empty (no chunks at all)
-         if not context.retrieved_chunks:
-             return AgentResponse(
-                 raw_response="I could not find this information in the book.",
-                 used_sources=[],
-                 confidence_score=0.0,
-                 is_valid=True,
-                 validation_details="No context chunks retrieved from the database",
-                 unsupported_claims=[]
-             )
-
-         # Check if context is insufficient (very short content)
-         total_context_length = sum(len(chunk.content) for chunk in context.retrieved_chunks)
-         if total_context_length < 10:  # Much lower threshold, but still meaningful
-             return AgentResponse(
-                 raw_response="I could not find this information in the book.",
-                 used_sources=[],
-                 confidence_score=0.0,
-                 is_valid=True,
-                 validation_details="No sufficient context provided to answer the question",
-                 unsupported_claims=[]
-             )
-
-         try:
-             # Prepare the system message with instructions for grounding responses
-             system_message = self._create_system_message(context)
-
-             # Prepare the user message with the query
-             user_message = self._create_user_message(context)
-
-             # For Google Gemini, we need to format the prompt differently
-             # Combine system instructions and user query
-             full_prompt = f"{system_message}\n\n{user_message}"
-
-             # Generate response from Google Gemini
-             # For async generation, we need to use the appropriate async method
-             chat = self.model.start_chat()
-             response = await chat.send_message_async(
-                 full_prompt,
-                 generation_config={
-                     "temperature": context.source_policy if hasattr(context, 'temperature') else self.default_temperature,
-                     "max_output_tokens": 1000
-                 }
-             )
-
-             # Extract the response text
-             raw_response = response.text if response and hasattr(response, 'text') else str(response)
-
-             # If the response indicates no information was found, return the exact message
-             if "I could not find this information in the book" in raw_response:
-                 return AgentResponse(
-                     raw_response="I could not find this information in the book.",
-                     used_sources=[],
-                     confidence_score=0.0,
-                     is_valid=True,
-                     validation_details="No relevant information found in the provided context",
-                     unsupported_claims=[]
-                 )
-
-             # Determine which sources were used (this is a simplified approach)
-             used_sources = self._identify_used_sources(raw_response, context.retrieved_chunks)
-
-             # Calculate confidence score (based on similarity scores of used sources)
-             confidence_score = self._calculate_confidence_score(used_sources, context.retrieved_chunks)
-
-             # Validate that the response is grounded in the provided context
-             grounding_validation = self._validate_response_grounding(
-                 raw_response, context.retrieved_chunks, context.query
-             )
-
-             # Create and return the agent response
-             agent_response = AgentResponse(
-                 raw_response=raw_response,
-                 used_sources=used_sources,
-                 confidence_score=confidence_score,
-                 is_valid=grounding_validation["is_valid"],
-                 validation_details=grounding_validation["details"],
-                 unsupported_claims=grounding_validation["unsupported_claims"]
-             )
-
-             logging.info(f"Agent response generated successfully. Confidence: {confidence_score:.2f}")
-             return agent_response
-
-         except Exception as e:
-             logging.error(f"Error generating response from Google Gemini agent: {e}", exc_info=True)
-             # Return the specific message when there's an error
-             return AgentResponse(
-                 raw_response="I could not find this information in the book.",
-                 used_sources=[],
-                 confidence_score=0.0,
-                 is_valid=False,
-                 validation_details=f"Error generating response: {str(e)}",
-                 unsupported_claims=[]
-             )
-
-     def _create_system_message(self, context: AgentContext) -> str:
-         """
-         Create the system message that instructs the agent on how to behave.
-
-         Args:
-             context: AgentContext containing the query and retrieved context chunks
-
-         Returns:
-             Formatted system message string
-         """
-         system_prompt = """You are a documentation-based assistant.
- Answer ONLY using the provided context from the book
- "Physical AI & Humanoid Robotics".
- If the answer is not found, reply EXACTLY:
- "I could not find this information in the book."""
-         return system_prompt
-
-     def _create_user_message(self, context: AgentContext) -> str:
-         """
-         Create the user message containing the query.
-
-         Args:
-             context: AgentContext containing the query and retrieved context chunks
-
-         Returns:
-             Formatted user message string
-         """
-         return f"""CONTEXT:
- {self._format_context_chunks(context.retrieved_chunks)}
-
- QUESTION:
- {context.query}"""
-
-     def _format_context_chunks(self, chunks: List[SourceChunkSchema]) -> str:
-         """
-         Format the context chunks for the prompt.
-
-         Args:
-             chunks: List of source chunks to format
-
-         Returns:
-             Formatted context string
-         """
-         if not chunks:
-             return ""
-
-         formatted_chunks = []
-         for i, chunk in enumerate(chunks):
-             formatted_chunks.append(f"[Chunk {i+1}]\n{chunk.content}\n[/Chunk {i+1}]")
-
-         return "\n".join(formatted_chunks)
-
-     def _create_context_messages(self, context: AgentContext) -> List[Dict[str, str]]:
-         """
-         Create context messages from the retrieved chunks.
-         With the new format, context is now provided in the user message,
-         so this method returns an empty list to avoid duplication.
-
-         Args:
-             context: AgentContext containing the query and retrieved context chunks
-
-         Returns:
-             Empty list since context is now in user message
-         """
-         return []
-
-     def _identify_used_sources(self, response: str, chunks: List[SourceChunkSchema]) -> List[str]:
-         """
-         Identify which sources were likely used in the response.
-         This is a simplified approach - in a real implementation, you might use
-         more sophisticated techniques like semantic similarity.
-
-         Args:
-             response: The agent's response text
-             chunks: List of source chunks that were provided to the agent
-
-         Returns:
-             List of source IDs that were likely used
-         """
-         used_sources = []
-         response_lower = response.lower()
-
-         for chunk in chunks:
-             # Check if any significant words from the chunk appear in the response
-             content_words = set(chunk.content.lower().split()[:20])  # Check first 20 words
-             response_words = set(response_lower.split())
-
-             # If there's significant overlap, consider this chunk as used
-             overlap = content_words.intersection(response_words)
-             if len(overlap) > 2:  # Arbitrary threshold
-                 used_sources.append(chunk.id)
-
-         # If no sources were identified, return all sources (conservative approach)
-         if not used_sources:
-             used_sources = [chunk.id for chunk in chunks]
-
-         return used_sources
-
-     def _calculate_confidence_score(self, used_sources: List[str], chunks: List[SourceChunkSchema]) -> float:
-         """
-         Calculate a confidence score based on the quality of the used sources.
-
-         Args:
-             used_sources: List of source IDs that were used
-             chunks: List of all source chunks that were provided to the agent
-
-         Returns:
-             Confidence score between 0.0 and 1.0
-         """
-         if not used_sources:
-             return 0.1  # Low confidence if no sources were used
-
-         # Calculate average similarity score of used sources
-         total_similarity = 0.0
-         used_count = 0
-
-         for chunk in chunks:
-             if chunk.id in used_sources:
-                 total_similarity += chunk.similarity_score
-                 used_count += 1
-
-         if used_count == 0:
-             return 0.1  # Low confidence if no matching chunks found
-
-         avg_similarity = total_similarity / used_count
-
-         # If similarity scores are very low (e.g., due to embedding issues),
-         # but we have content, still provide some confidence
-         if avg_similarity < 0.1 and len(used_sources) > 0:
-             # If we have relevant content but low similarity scores,
-             # it might be due to embedding issues, not lack of relevance
-             # So we'll set a minimum confidence if content exists
-             return 0.3  # Low but not zero confidence
-         else:
-             # Normalize the confidence score (adjust based on your requirements)
-             # Higher similarity scores contribute to higher confidence
-             confidence = avg_similarity
-
-         return format_confidence_score(confidence)
-
-     def _validate_response_grounding(self, response: str, chunks: List[SourceChunkSchema], query: str) -> Dict[str, Any]:
-         """
-         Validate that the response is grounded in the provided context.
-
-         Args:
-             response: The agent's response text
-             chunks: List of source chunks that were provided to the agent
-             query: The original query
-
-         Returns:
-             Dictionary with validation results
-         """
-         # Check if the response contains elements from the provided context
-         response_lower = response.lower()
-         context_text = " ".join([chunk.content.lower() for chunk in chunks])
-
-         # Simple heuristic: check if response contains significant terms from context
-         response_words = set(response_lower.split())
-         context_words = set(context_text.split())
-
-         # Calculate overlap between response and context
-         overlap = response_words.intersection(context_words)
-         total_response_words = len(response_words)
-         overlap_count = len(overlap)
-
-         # If less than 30% of response words come from context, flag as potentially ungrounded
-         is_grounded = True
-         unsupported_claims = []
-
-         if total_response_words > 0:
-             grounding_ratio = overlap_count / total_response_words
-             is_grounded = grounding_ratio >= 0.3  # At least 30% of words should come from context
-
-         # For now, we'll just return the basic validation
-         # In a more sophisticated implementation, you'd analyze the response more deeply
-         details = f"Response grounding validation completed. Context overlap ratio: {overlap_count/total_response_words if total_response_words > 0 else 0:.2f}"
-
-         return {
-             "is_valid": is_grounded,
-             "details": details,
-             "unsupported_claims": unsupported_claims
-         }
-
-     async def validate_response_quality(self, response: str, context: AgentContext) -> bool:
-         """
-         Validate the quality of the agent's response.
-
-         Args:
-             response: The agent's response text
-             context: AgentContext containing the query and retrieved context chunks
-
-         Returns:
-             True if response meets quality standards, False otherwise
-         """
-         # Check for common signs of poor quality responses
-         if not response or response.strip() == "":
-             logging.warning("Agent returned an empty response")
-             return False
-
-         # Check if response contains generic fallback phrases
-         lower_response = response.lower()
-         if "i don't know" in lower_response or "i don't have" in lower_response:
-             # This might be a valid response if there's no relevant context
-             if len(context.retrieved_chunks) == 0:
-                 return True  # Valid response if no context was provided
-             else:
-                 # Check if the response is justified given the context
-                 # For now, we'll consider it valid if it acknowledges the lack of relevant information
-                 return True
-
-         # In a more sophisticated implementation, you'd validate against the context more rigorously
-         return True
-
-
- # Global agent instance (if needed)
- # agent_instance = OpenAIAgent()
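The heart of the deleted agent's `_validate_response_grounding` is a word-overlap heuristic with a 30% threshold. A standalone sketch of that check, with the same behavior but none of the Gemini plumbing (the helper names here are ours, not the project's):

```python
def grounding_ratio(response: str, context_chunks: list) -> float:
    """Fraction of distinct response words that also appear in the context."""
    response_words = set(response.lower().split())
    context_words = set(" ".join(context_chunks).lower().split())
    if not response_words:
        return 0.0
    return len(response_words & context_words) / len(response_words)

def is_grounded(response: str, context_chunks: list, threshold: float = 0.3) -> bool:
    """Flag a response as grounded when enough of its words come from context."""
    return grounding_ratio(response, context_chunks) >= threshold
```

As the deleted code's own comments note, this is only a lexical proxy for grounding; paraphrased but faithful answers can score low, and copied but wrong answers can score high.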
rag_agent_api/config.py CHANGED
@@ -19,6 +19,7 @@ class Config:

      def __init__(self):
          """Initialize configuration by loading environment variables."""
+         self.openai_api_key = os.getenv('OPENAI_API_KEY')
          self.cohere_api_key = os.getenv('COHERE_API_KEY')
          self.openrouter_api_key = os.getenv('OPENROUTER_API_KEY')
          self.qdrant_url = os.getenv('QDRANT_URL')
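The Config pattern in this diff boils down to reading keys from the environment and failing fast when required ones are missing. A minimal sketch of that idea — the variable names come from this commit, but the validation helper itself is illustrative, not the project's actual code:

```python
import os

# Required variables per the deployment README in this commit.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")

def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` at startup and raising if the list is non-empty gives a clearer failure than a later `None` API key deep inside a request handler.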
rag_agent_api/main.py CHANGED
@@ -82,22 +82,22 @@ async def health_check() -> HealthResponse:
      HealthResponse with status of services
      """
      # Check if all required components are initialized
-     gemini_status = "up" if agent else "down"
+     openrouter_status = "up" if agent else "down"
      qdrant_status = "up" if retriever else "down"
      agent_status = "up" if agent else "down"

      # Determine overall status
      overall_status = "healthy"
-     if gemini_status == "down" or qdrant_status == "down":
+     if openrouter_status == "down" or qdrant_status == "down":
          overall_status = "unhealthy"
-     elif gemini_status == "degraded" or qdrant_status == "degraded":
+     elif openrouter_status == "degraded" or qdrant_status == "degraded":
          overall_status = "degraded"

      return HealthResponse(
          status=overall_status,
          timestamp=format_timestamp(),
          services={
-             "gemini": gemini_status,
+             "openrouter": openrouter_status,
              "qdrant": qdrant_status,
              "agent": agent_status
          }
@@ -194,7 +194,7 @@ async def root() -> Dict[str, Any]:
      return {
          "message": "RAG Agent and API Layer",
          "version": "1.0.0",
-         "description": "Question-answering API using OpenAI Agents and Qdrant retrieval",
+         "description": "Question-answering API using OpenRouter Agents and Qdrant retrieval",
          "endpoints": {
              "POST /ask": "Main question-answering endpoint",
              "GET /health": "Health check endpoint",
@@ -243,4 +243,9 @@ async def readiness_check() -> Dict[str, str]:
      if retriever and agent:
          return {"status": "ready"}
      else:
-         raise HTTPException(status_code=503, detail="Service not ready")
+         raise HTTPException(status_code=503, detail="Service not ready")
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=8000)
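The renamed `health_check` logic reduces to a pure aggregation over per-service statuses; a sketch using the service names from the new code (the function is extracted for illustration, not part of the module):

```python
def overall_status(openrouter: str, qdrant: str) -> str:
    """Map two service statuses ("up"/"degraded"/"down") to one verdict."""
    if openrouter == "down" or qdrant == "down":
        return "unhealthy"
    if openrouter == "degraded" or qdrant == "degraded":
        return "degraded"
    return "healthy"
```

Keeping this as a pure function (rather than inline branches in the route handler) makes the "down beats degraded beats healthy" precedence trivially unit-testable.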
rag_agent_api/retrieval.py CHANGED
@@ -76,16 +76,6 @@ class QdrantRetriever:
76
  # Embed the query using Cohere
77
  query_embedding = await self._embed_query(query)
78
 
79
- # Check if we got a zero vector fallback (indicating embedding service failure)
80
- is_zero_vector = all(x == 0.0 for x in query_embedding)
81
-
82
- if is_zero_vector:
83
- # If we have a zero vector, try a different approach - keyword search
84
- logging.warning("Zero vector detected, attempting keyword-based fallback search")
85
- retrieved_chunks = await self._keyword_search_fallback(query, top_k)
86
- logging.info(f"Keyword fallback search retrieved {len(retrieved_chunks)} chunks from Qdrant")
87
- return retrieved_chunks
88
-
89
  # Perform semantic search in Qdrant
90
  search_results = await self.client.query_points(
91
  collection_name=self.collection_name,
@@ -126,134 +116,53 @@ class QdrantRetriever:
          # Return empty list instead of raising exception to allow graceful handling
          return []

-     async def _keyword_search_fallback(self, query: str, top_k: int = 5) -> List[SourceChunkSchema]:
-         """
-         Fallback method to search using keyword matching when embedding service is unavailable.
-
-         Args:
-             query: The user's query string
-             top_k: Number of results to return (default: 5)
-
-         Returns:
-             List of SourceChunkSchema objects containing relevant content
-         """
-         try:
-             # Use Qdrant's full-text search capability or filter-based approach
-             # For now, we'll use a scroll + filter approach to find relevant chunks
-             from qdrant_client.http import models
-
-             # Simple approach: get all points and filter based on keyword matching
-             # In a production system, you'd want to use Qdrant's text indexing capabilities
-             all_points = await self.client.scroll(
-                 collection_name=self.collection_name,
-                 limit=10000,  # Get up to 10000 points (or as many as exist)
-                 with_payload=True,
-                 with_vectors=False
-             )
-
-             # Extract points from the result (structure may vary depending on Qdrant client version)
-             points = all_points[0] if isinstance(all_points, tuple) else all_points
-
-             # Score points based on keyword matching
-             scored_chunks = []
-             query_lower = query.lower()
-             query_words = set(query_lower.split())
-
-             for point in points:
-                 payload = point.payload if hasattr(point, 'payload') else point
-                 content = payload.get('text', '') if isinstance(payload, dict) else getattr(payload, 'text', '')
-                 content_lower = content.lower()
-
-                 # Calculate a simple keyword match score
-                 content_words = set(content_lower.split())
-                 overlap = query_words.intersection(content_words)
-                 score = len(overlap) / len(query_words) if query_words else 0  # Jaccard similarity
-
-                 if score > 0 or query_lower in content_lower:  # Only include if there's some match
-                     chunk = SourceChunkSchema(
-                         id=point.id if hasattr(point, 'id') else getattr(point, 'point_id', None),
-                         url=payload.get('url', '') if isinstance(payload, dict) else getattr(payload, 'url', ''),
-                         title=payload.get('title', '') if isinstance(payload, dict) else getattr(payload, 'title', ''),
-                         content=content,
-                         similarity_score=score,
-                         chunk_index=payload.get('chunk_index', 0) if isinstance(payload, dict) else getattr(payload, 'chunk_index', 0)
-                     )
-
-                     if self._validate_chunk(chunk):
-                         scored_chunks.append((chunk, score))
-
-             # Sort by score and return top_k
-             scored_chunks.sort(key=lambda x: x[1], reverse=True)
-             top_chunks = [chunk for chunk, score in scored_chunks[:top_k]]
-
-             return top_chunks
-
-         except Exception as e:
-             logging.error(f"Error in keyword fallback search: {e}", exc_info=True)
-             return []
-
-     async def _embed_query(self, query: str) -> List[float]:
-         """
-         Embed the query using Cohere to prepare for semantic search with retry logic for rate limits.
-
-         Args:
-             query: The query string to embed
-
-         Returns:
-             List of floats representing the query embedding
-         """
-         import time
-         import random
-         from cohere.errors.too_many_requests_error import TooManyRequestsError
-
-         # Try Cohere with retry logic for rate limits
-         for attempt in range(3):  # Try up to 3 times
-             try:
-                 # Use Cohere to embed the query
-                 # The original book content was likely embedded with Cohere embed-english-v3.0
-                 response = await self.cohere_client.embed(
-                     texts=[query],
-                     model="embed-english-v3.0",  # 1024-dimensional embedding model
-                     input_type="search_query"  # Specify this is a search query
-                 )
-
-                 # Extract the embedding from the response
-                 embedding = response.embeddings[0]  # Get the first (and only) embedding
-                 return embedding
-             except TooManyRequestsError as e:
-                 if attempt < 2:  # Don't wait after the last attempt
-                     # Exponential backoff with jitter
-                     wait_time = (2 ** attempt) + random.uniform(0, 1)
-                     logging.warning(f"Cohere rate limited (attempt {attempt + 1}), waiting {wait_time:.2f}s: {e}")
-                     await asyncio.sleep(wait_time)
-                 else:
-                     logging.error(f"Cohere rate limited after {attempt + 1} attempts: {e}")
-             except Exception as e:
-                 logging.error(f"Error embedding query with Cohere: {e}", exc_info=True)
-                 break  # Don't retry for other types of errors
-
-         # If Cohere fails, try using OpenAI embeddings as fallback if available
-         try:
-             from openai import OpenAI
-             from .config import get_config
-             config = get_config()
-
-             if config.openai_api_key:
-                 client = OpenAI(api_key=config.openai_api_key)
-                 response = client.embeddings.create(
-                     input=query,
-                     model="text-embedding-ada-002"
-                 )
-                 embedding = response.data[0].embedding
-                 logging.info("Successfully used OpenAI embedding as fallback")
-                 return embedding
-         except Exception as openai_error:
-             logging.warning(f"OpenAI fallback also failed: {openai_error}")
-
-         # If all fail, return a zero vector of the correct size (1024) as a last resort
-         # This will result in poor semantic matches but won't crash the system
-         logging.warning("Using zero vector as final fallback for query embedding")
-         return [0.0] * 1024
+     async def _embed_query(self, query: str) -> List[float]:
+         """
+         Embed the query using Cohere to prepare for semantic search.
+
+         Args:
+             query: The query string to embed
+
+         Returns:
+             List of floats representing the query embedding
+         """
+         try:
+             # Use Cohere to embed the query
+             # The original book content was likely embedded with Cohere embed-english-v3.0
+             response = await self.cohere_client.embed(
+                 texts=[query],
+                 model="embed-english-v3.0",  # 1024-dimensional embedding model
+                 input_type="search_query"  # Specify this is a search query
+             )
+
+             # Extract the embedding from the response
+             embedding = response.embeddings[0]  # Get the first (and only) embedding
+             return embedding
+         except Exception as e:
+             logging.error(f"Error embedding query with Cohere: {e}", exc_info=True)
+
+         # Try using OpenAI embeddings as fallback if available
+         try:
+             from openai import OpenAI
+             from .config import get_config
+             config = get_config()
+
+             if config.openai_api_key:
+                 client = OpenAI(api_key=config.openai_api_key)
+                 response = client.embeddings.create(
+                     input=query,
+                     model="text-embedding-ada-002"
+                 )
+                 embedding = response.data[0].embedding
+                 logging.info("Successfully used OpenAI embedding as fallback")
+                 return embedding
+         except Exception as openai_error:
+             logging.warning(f"OpenAI fallback also failed: {openai_error}")
+
+         # If both fail, return a zero vector of the correct size (1024) as a last resort
+         # This will result in poor semantic matches but won't crash the system
+         logging.warning("Using zero vector as final fallback for query embedding")
+         return [0.0] * 1024

      def _validate_chunk(self, chunk: SourceChunkSchema) -> bool:
          """
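The rewritten `_embed_query` drops the retry loop but keeps a provider-fallback chain: Cohere first, then OpenAI, then a zero vector so the request path never crashes. A pure-Python sketch of that chain, with hypothetical stub providers standing in for the real SDK calls:

```python
import logging
from typing import Callable, List

DIM = 1024  # embedding size the Qdrant collection expects

def embed_with_fallback(query: str,
                        providers: List[Callable[[str], List[float]]]) -> List[float]:
    """Try each embedding provider in order; fall back to a zero vector.

    A zero vector gives useless similarity scores, but it keeps the
    request path from crashing when every provider is down.
    """
    for provider in providers:
        try:
            return provider(query)
        except Exception as exc:
            logging.warning("embedding provider failed: %s", exc)
    logging.warning("using zero vector as final fallback")
    return [0.0] * DIM

# Hypothetical stand-ins for the Cohere / OpenAI calls:
def cohere_stub(q: str) -> List[float]:
    raise RuntimeError("rate limited")

def openai_stub(q: str) -> List[float]:
    return [0.1] * DIM

vec = embed_with_fallback("what about this book?", [cohere_stub, openai_stub])
```

The ordering of `providers` encodes the same preference as the diff: the primary embedder is tried before the fallback, and the zero vector is only ever the last resort.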
requirements.txt CHANGED
@@ -1,10 +1,12 @@
- fastapi>=0.104.1
- uvicorn[standard]>=0.24.0
- qdrant-client>=1.8.0
- python-dotenv>=1.0.0
- httpx>=0.25.0
+ # Backend Service Dependencies
+ requests>=2.31.0
+ beautifulsoup4>=4.12.0
  cohere>=4.9.0
- google-generativeai>=0.4.0
- openai>=1.6.0
- pydantic>=2.5.0
- typing-extensions>=4.8.0
+ qdrant-client>=1.7.0
+ python-dotenv>=1.0.0
+ fastapi>=0.104.0
+ uvicorn>=0.24.0
+ openai>=1.0.0
+ pydantic>=2.0.0
+ numpy>=1.21.0
+ httpx>=0.27.0
test_retrieval.py DELETED
@@ -1,60 +0,0 @@
- #!/usr/bin/env python3
- """
- Test script to directly test the Qdrant retrieval functionality
- """
- import asyncio
- import os
- from dotenv import load_dotenv
- from rag_agent_api.retrieval import QdrantRetriever
- from rag_agent_api.config import get_config
-
- # Load environment variables
- load_dotenv()
-
- async def test_retrieval():
-     print("Testing Qdrant retrieval functionality...")
-
-     # Create a QdrantRetriever instance
-     retriever = QdrantRetriever()
-
-     print("1. Testing collection existence...")
-     exists = await retriever.validate_collection_exists()
-     print(f"   Collection exists: {exists}")
-
-     if exists:
-         print("2. Getting total points in collection...")
-         total_points = await retriever.get_total_points()
-         print(f"   Total points: {total_points}")
-
-         print("3. Testing query embedding...")
-         try:
-             query = "what about this book?"
-             embedding = await retriever._embed_query(query)
-             print(f"   Query embedding successful, length: {len(embedding)}")
-         except Exception as e:
-             print(f"   Query embedding failed: {e}")
-             return
-
-         print("4. Testing direct search...")
-         try:
-             results = await retriever.retrieve_context(query, top_k=5)
-             print(f"   Retrieved {len(results)} results")
-
-             if results:
-                 print("   Sample results:")
-                 for i, result in enumerate(results[:2]):  # Show first 2 results
-                     print(f"   Result {i+1}:")
-                     print(f"     ID: {result.id}")
-                     print(f"     Title: {result.title}")
-                     print(f"     Content preview: {result.content[:100]}...")
-                     print(f"     Similarity: {result.similarity_score}")
-                     print(f"     URL: {result.url}")
-             else:
-                 print("   No results retrieved - this indicates the main issue")
-         except Exception as e:
-             print(f"   Direct search failed: {e}")
-             import traceback
-             traceback.print_exc()
-
- if __name__ == "__main__":
-     asyncio.run(test_retrieval())
tests/test_integration.py CHANGED
@@ -7,7 +7,7 @@ from fastapi.testclient import TestClient
  from unittest.mock import Mock, patch, AsyncMock
  from rag_agent_api.main import app, retriever, agent
  from rag_agent_api.retrieval import QdrantRetriever
- from rag_agent_api.agent import OpenAIAgent
+ from rag_agent_api.openrouter_agent import OpenRouterAgent
  from rag_agent_api.schemas import SourceChunkSchema, AgentResponse, AgentContext

@@ -17,13 +17,13 @@ def test_full_query_flow_with_mocked_components():
      'QDRANT_URL': 'http://test-qdrant:6333',
      'QDRANT_API_KEY': 'test-api-key',
      'COHERE_API_KEY': 'test-cohere-key',
-     'OPENAI_API_KEY': 'test-openai-key'
+     'OPENROUTER_API_KEY': 'test-openrouter-key'
  }):
      with patch('rag_agent_api.main.QdrantRetriever') as mock_retriever_class:
-         with patch('rag_agent_api.main.OpenAIAgent') as mock_agent_class:
+         with patch('rag_agent_api.main.OpenRouterAgent') as mock_agent_class:
              # Create mock instances
              mock_retriever = Mock(spec=QdrantRetriever)
-             mock_agent = Mock(spec=OpenAIAgent)
+             mock_agent = Mock(spec=OpenRouterAgent)

              # Configure the class mocks to return our instance mocks
              mock_retriever_class.return_value = mock_retriever

@@ -84,11 +84,11 @@ async def test_agent_context_creation():
      'QDRANT_URL': 'http://test-qdrant:6333',
      'QDRANT_API_KEY': 'test-api-key',
      'COHERE_API_KEY': 'test-cohere-key',
-     'OPENAI_API_KEY': 'test-openai-key'
+     'OPENROUTER_API_KEY': 'test-openrouter-key'
  }):
      with patch('rag_agent_api.retrieval.AsyncQdrantClient') as mock_qdrant_client:
          with patch('rag_agent_api.retrieval.cohere.Client') as mock_cohere_client:
-             with patch('rag_agent_api.agent.AsyncOpenAI'):
+             with patch('rag_agent_api.openrouter_agent.httpx.AsyncClient'):
                  # Mock the Qdrant client
                  mock_qdrant_instance = Mock()
                  mock_qdrant_client.return_value = mock_qdrant_instance

@@ -101,7 +101,7 @@

      # Initialize components
      retriever = QdrantRetriever(collection_name="test_collection")
-     agent = OpenAIAgent(model_name="gpt-4-test")
+     agent = OpenRouterAgent(model_name="gpt-4-test")

      # Create test chunks
      test_chunk = SourceChunkSchema(

@@ -145,7 +145,7 @@ def test_health_endpoint_integration():
      assert "services" in data

      # Check that services status is included
-     assert "openai" in data["services"]
+     assert "openrouter" in data["services"]
      assert "qdrant" in data["services"]
      assert "agent" in data["services"]

@@ -157,11 +157,11 @@ async def test_retrieval_and_agent_integration():
      'QDRANT_URL': 'http://test-qdrant:6333',
      'QDRANT_API_KEY': 'test-api-key',
      'COHERE_API_KEY': 'test-cohere-key',
-     'OPENAI_API_KEY': 'test-openai-key'
+     'OPENROUTER_API_KEY': 'test-openrouter-key'
  }):
      with patch('rag_agent_api.retrieval.AsyncQdrantClient') as mock_qdrant_client:
          with patch('rag_agent_api.retrieval.cohere.Client') as mock_cohere_client:
-             with patch('rag_agent_api.agent.AsyncOpenAI') as mock_openai:
+             with patch('rag_agent_api.openrouter_agent.httpx.AsyncClient') as mock_httpx_client:
                  # Mock the Qdrant client
                  mock_qdrant_instance = Mock()
                  mock_qdrant_client.return_value = mock_qdrant_instance

@@ -172,18 +172,21 @@
                  mock_cohere_client.return_value = mock_cohere_instance
                  mock_cohere_instance.embed.return_value = Mock(embeddings=[[0.1, 0.2, 0.3]])

-                 # Mock the OpenAI client
-                 mock_openai_instance = Mock()
-                 mock_openai.return_value = mock_openai_instance
+                 # Mock the httpx client for OpenRouter
+                 mock_httpx_instance = Mock()
+                 mock_httpx_client.return_value.__aenter__.return_value = mock_httpx_instance
                  mock_completion = Mock()
-                 mock_completion.choices = [Mock()]
-                 mock_completion.choices[0].message = Mock()
-                 mock_completion.choices[0].message.content = "This is a test response"
-                 mock_openai_instance.chat.completions.create = AsyncMock(return_value=mock_completion)
+                 mock_completion.json.return_value = {
+                     "choices": [
+                         {"message": {"content": "This is a test response"}}
+                     ]
+                 }
+                 mock_httpx_instance.post = AsyncMock(return_value=mock_completion)
+                 mock_httpx_instance.post.return_value.status_code = 200

                  # Initialize components
                  test_retriever = QdrantRetriever(collection_name="test_collection")
-                 test_agent = OpenAIAgent(model_name="gpt-4-test")
+                 test_agent = OpenRouterAgent(model_name="gpt-4-test")

                  # Mock the retrieval result
                  mock_chunk = SourceChunkSchema(
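The updated integration test swaps the OpenAI client mock for an `httpx.AsyncClient` mock whose response object answers `.json()` with an OpenRouter-style payload. The same wiring can be exercised standalone with only the stdlib; `call_openrouter` here is a hypothetical helper, not the project's actual agent code:

```python
import asyncio
from unittest.mock import AsyncMock, Mock

async def call_openrouter(client, payload: dict) -> str:
    """Hypothetical helper: POST a chat payload and extract the first message."""
    response = await client.post(
        "https://openrouter.ai/api/v1/chat/completions", json=payload
    )
    return response.json()["choices"][0]["message"]["content"]

# Wire up a mock client the same way the integration test does:
mock_response = Mock()
mock_response.status_code = 200
mock_response.json.return_value = {
    "choices": [{"message": {"content": "This is a test response"}}]
}
mock_client = Mock()
mock_client.post = AsyncMock(return_value=mock_response)

answer = asyncio.run(call_openrouter(mock_client, {"model": "gpt-4-test"}))
```

Because `post` is an `AsyncMock`, awaiting it yields the canned response directly, so the test never opens a network connection.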