EAG V1 Assignment 7

A full-stack system based on Retrieval-Augmented Generation (RAG) that supports semantic search and intelligent question answering.

Documentation

# 🎬 YouTube Transcript RAG Agent

**A full-stack, Retrieval-Augmented Generation (RAG) system for YouTube video transcripts, featuring a beautiful Chrome extension, blazing-fast semantic search with FAISS, and an intelligent agent powered by Google Gemini LLM.**

---

## 🌟 Features

- **Chrome Extension**: Index and query YouTube videos directly from your browser.
- **Semantic Search**: Instantly search across indexed video transcripts using vector embeddings and FAISS.
- **PADM Agent**: Modular Perception-Action-Decision-Memory agent architecture for intelligent, context-aware answers.
- **LLM Integration**: Uses Google Gemini for intent extraction, planning, and answer generation.
- **API Server**: Flask backend with endpoints for indexing, querying, and status.
- **Beautiful UI**: Modern, vibrant extension popup with smooth user experience.
- **Open Source**: Easily extensible and hackable.

---

## 🖼️ System Architecture

### 1️⃣ Indexing Flow

```mermaid
sequenceDiagram
    participant User as User (Browser/Extension)
    participant Ext as Chrome Extension
    participant API as Flask API Server
    participant TM as TranscriptManager
    participant YT as YouTube
    participant FAISS as FAISS Index

    User->>Ext: Click "Index Video" on YouTube
    Ext->>API: POST /index_video (YouTube URL)
    API->>TM: index_video(url)
    TM->>YT: Fetch video metadata (yt-dlp)
    TM->>YT: Fetch transcript (youtube-transcript-api)
    TM->>TM: Chunk transcript (by time)
    TM->>TM: Generate embeddings (local model)
    TM->>FAISS: Add embeddings + metadata
    TM->>API: Return operation_id/status
    API->>Ext: Respond with indexing status
    Ext->>User: Show progress/status
```
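
The "Chunk transcript (by time)" step above can be sketched as follows. This is a minimal illustration, not the project's `TranscriptManager` code: the `Chunk` class, the `chunk_by_time` function, the 60-second window, and the segment shape (dicts with `text`, `start`, and `duration` keys) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # start time of the first segment, in seconds
    end: float    # end time of the last segment, in seconds
    text: str     # segment texts joined with spaces


def _make_chunk(segs):
    """Collapse a non-empty list of segments into one Chunk."""
    last = segs[-1]
    return Chunk(
        start=segs[0]["start"],
        end=last["start"] + last["duration"],
        text=" ".join(s["text"].strip() for s in segs),
    )


def chunk_by_time(segments, window_seconds=60.0):
    """Group transcript segments into chunks of roughly window_seconds each.

    Assumed segment shape: {"text": str, "start": float, "duration": float}.
    """
    chunks = []
    current = []        # segments collected for the chunk in progress
    chunk_start = None  # start time of the chunk in progress
    for seg in segments:
        if chunk_start is None:
            chunk_start = seg["start"]
        # Close the chunk once a segment starts a full window after it began.
        if current and seg["start"] - chunk_start >= window_seconds:
            chunks.append(_make_chunk(current))
            current = []
            chunk_start = seg["start"]
        current.append(seg)
    if current:
        chunks.append(_make_chunk(current))
    return chunks
```

Keeping `start` and `end` on each chunk is what lets a later answer point back to a timestamp in the video.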

---

### 2️⃣ Query & PADM Agent Flow

```mermaid
sequenceDiagram
    participant User as User (Browser/Extension)
    participant Ext as Chrome Extension
    participant API as Flask API Server
    participant Agent as PADM Agent
    participant MCP as MCP Tool Server
    participant TM as TranscriptManager
    participant FAISS as FAISS Index
    participant Gemini as Gemini LLM

    User->>Ext: Enter query & submit
    Ext->>API: POST /query (query text)
    API->>Agent: process_query(query)
    Agent->>Gemini: Perception (extract intent/entities)
    Agent->>TM: Memory (retrieve relevant transcript chunks)
    TM->>FAISS: Semantic search (embeddings)
    FAISS-->>TM: Top-k transcript chunks
    TM-->>Agent: Relevant transcript segments
    Agent->>Gemini: Decision (generate plan/tool call)
    alt Tool call needed
        Agent->>MCP: Call tool (e.g., search_transcripts)
        MCP->>TM: search(query)
        TM->>FAISS: Semantic search
        FAISS-->>TM: Results
        TM-->>MCP: Results
        MCP-->>Agent: Tool output
        Agent->>Gemini: Decision (final answer)
    end
    Agent->>API: Return answer + sources
    API->>Ext: Respond with answer/sources
    Ext->>User: Show answer, highlight sources
```
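
To make the "Semantic search" and "Top-k transcript chunks" steps above concrete, here is a small brute-force stand-in written in plain Python. It is not FAISS and not the project's code: it scores every stored vector against the query by cosine similarity, which is the idea a vector index speeds up at scale. The function names and the `(embedding, metadata)` pair layout are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the k best (score, metadata) pairs, highest score first.

    indexed is a list of (embedding, metadata) pairs; the metadata could be
    a transcript chunk with its video ID and timestamps.
    """
    scored = [(cosine_similarity(query_vec, vec), meta) for vec, meta in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The query must be embedded with the same model that produced the stored vectors, otherwise the similarity scores are meaningless.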

---

## 🚀 Quickstart

### 1. Clone & Install

```bash
git clone https://github.com/SXD390/EAG-V1-Assignment-7.git
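cd EAG-V1-Assignment-7  # git names the clone after the repo; if the code sits in a yt_rag subfolder, cd into that as well
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
# Windows note: running FAISS in GPU mode causes complications; use conda with Python 3.10 instead.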
pip install -r requirements.txt
```

### 2. Set Up Environment

- **Google Gemini API**: Get an API key and set `GEMINI_API_KEY` in a `.env` file.
- **Local Embedding Model**: Start a local embedding server (e.g., Ollama with `nomic-embed-text`).

Example `.env`:
```
GEMINI_API_KEY=your-gemini-key
```
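
How `agent.py` actually reads this file is not shown here. As a rough sketch under that caveat, the key could be resolved with only the standard library as below; `parse_env` and `get_gemini_key` are hypothetical helper names, and the parser handles just simple `KEY=VALUE` lines rather than full dotenv syntax.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_path=".env", environ=os.environ):
    """Prefer the process environment, then fall back to the .env file."""
    key = environ.get("GEMINI_API_KEY")
    if not key:
        env_file = Path(env_path)
        if env_file.is_file():
            key = parse_env(env_file.read_text(encoding="utf-8")).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set in the environment or in .env")
    return key
```

Failing early with a clear error is usually friendlier than letting the first Gemini call fail with an authentication error.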

### 3. Run the Backend

```bash
python agent.py
```

### 4. Load the Chrome Extension

- Go to `chrome://extensions`
- Enable "Developer mode"
- Click "Load unpacked" and select the `chrome_extension` folder

---

## 🧩 Project Structure

```
yt_rag/
│
├── agent.py                # Main Flask API & PADM agent
├── mcp_server.py           # MCP tool server for transcript search
├── models.py               # Pydantic models for data interchange
├── memory.py               # Memory component (retrieval)
├── perception.py           # Perception (intent/entity extraction)
├── decision.py             # Decision (planning, LLM)
├── action.py               # Action (tool execution, formatting)
├── utils/
│   ├── transcript_manager.py  # Transcript download, chunk, embed, index/search
│   └── status_tracker.py      # Indexing status tracking
├── data/
│   ├── transcripts/        # Raw transcript JSONs
│   └── faiss_index/        # FAISS index + metadata
├── chrome_extension/
│   ├── popup.html          # Extension UI
│   ├── js/                 # JS logic
│   ├── css/                # Styles
│   └── manifest.json       # Extension manifest
└── requirements.txt
```

---

## 🧠 PADM Agent: How It Works

- **Perception**: Extracts user intent and entities using Gemini LLM.
- **Memory**: Retrieves relevant transcript chunks (semantic search via FAISS).
- **Decision**: Plans next steps (tool call or answer) using Gemini LLM.
- **Action**: Executes tool calls (via MCP) or formats the final answer.

The agent loops through these steps, using retrieved transcript data and LLM reasoning, until a final answer is produced.
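
The loop described above might look roughly like this. It is a sketch under stated assumptions, not the code in `agent.py`: the four steps are passed in as plain callables, and the decision format (a dict with a `type` of `"answer"` or `"tool"`) and the `max_steps` limit are invented for the example.

```python
def run_padm(query, perceive, retrieve, decide, act, max_steps=3):
    """Run Perception -> Memory -> Decision -> Action until an answer appears.

    All four callables are hypothetical stand-ins:
      perceive(query)               -> dict of intent/entities
      retrieve(query)               -> list of transcript chunks
      decide(perception, context)   -> {"type": "answer", "text": ...} or
                                       {"type": "tool", "name": ..., "args": {...}}
      act(decision)                 -> list of extra chunks returned by the tool
    """
    perception = perceive(query)      # Perception
    context = list(retrieve(query))   # Memory
    for _ in range(max_steps):
        decision = decide(perception, context)  # Decision
        if decision["type"] == "answer":
            return {"answer": decision["text"], "sources": context}
        # Action: run the tool and feed its output into the next decision.
        context.extend(act(decision))
    # Give up after max_steps so a confused planner cannot loop forever.
    return {"answer": None, "sources": context}
```

Injecting the steps as callables keeps the loop testable without a Gemini key or a running MCP server.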

---

## 🖥️ Chrome Extension

- **Index**: One-click to index the current YouTube video.
- **Query**: Ask questions about any indexed video.
- **Results**: Answers are shown with direct transcript quotes and timestamps, plus clickable sources.
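
One way to produce the timestamps and clickable sources mentioned above is to turn a chunk's start time into a label and a YouTube watch URL with a `t` parameter. The helper names below are hypothetical, and the extension's own JavaScript may build its links differently.

```python
from urllib.parse import urlencode


def format_timestamp(seconds):
    """Render a time offset as M:SS, or H:MM:SS once it reaches an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def source_link(video_id, start_seconds):
    """Build a watch URL that starts playback at the chunk's start time."""
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"
```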

---

## 🛠️ API Endpoints

- `POST /index_video` — Index a new YouTube video.
- `GET /indexing_status/<operation_id>` — Check indexing progress.
- `POST /query` — Ask a question (RAG agent).
- `GET /list_indexed_videos` — List all indexed videos.
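
A client for these endpoints could be sketched with the standard library as below. The request and response schemas are not documented here, so several details are assumptions: the base URL (Flask's default port 5000), the JSON field names `url` and `query`, and all helper names. The functions only build requests; nothing is sent until one is passed to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # assumption: Flask's default host and port


def build_post(path, payload):
    """Build a JSON POST request for the given endpoint path."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_video_request(video_url):
    # "url" is an assumed field name for the YouTube URL.
    return build_post("/index_video", {"url": video_url})


def query_request(question):
    # "query" is an assumed field name for the question text.
    return build_post("/query", {"query": question})


def status_request(operation_id):
    """Build the GET request that checks indexing progress."""
    return Request(f"{BASE_URL}/indexing_status/{operation_id}", method="GET")
```

With the backend running, `urllib.request.urlopen(status_request(op_id))` would send the status check, where `op_id` is the operation ID returned by the indexing call.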

---

## 🧬 Dependencies

- `flask`, `flask-cors`
- `faiss-cpu`
- `pydantic`
- `requests`
- `google-generativeai`
- `youtube-transcript-api`, `yt-dlp`
- `mcp`
- `numpy`, `tqdm`

---

## 💡 Example Use Case

1. **Index**: On a YouTube video, click the extension and hit "Index Video".
2. **Query**: Ask, "Why are tech companies pulling job postings?"
3. **Result**: The agent returns a synthesized answer, quoting transcript segments and providing clickable sources.

---


## 📄 License

MIT

---

**Enjoy your new YouTube RAG agent! 🚀**

Repository Details

Owner: SXD390
Repo: EAG-V1-Assignment-7
Language: Python
License: -
Last fetched: 8/8/2025
