MCP Server

zrlck

public

DexTest

Agent designed by Dex and configured by us to implement new user-interactive functions and memory efficiency, load, and management.

Repository Info

Stars

Forks

Watchers

Issues

TypeScript

Language

License

View on GitHubGitHub Download DocumentationDocs

About This Server

Agent designed by Dex and configured by us to implement new user-interactive functions and memory efficiency, load, and management.

Model Context Protocol (MCP) - This server can be integrated with AI applications to provide additional context and capabilities, enabling enhanced AI interactions and functionality.

Documentation

# Dex MCP Server

A Model Context Protocol (MCP) server that enables AI models to perform complete browser automation through the [Dex browser extension](https://chromewebstore.google.com/detail/dex/ignaljemchdlmoimaliinmonecgbdnkk?authuser=3&hl=en). You'll have to login to the account you signed up with for your hackathon to download the extension.

Note: you can choose which MCP tools to expose to the agent, and in fact define your own agents as MCP tool calls (eg, @mcp_tool github_agent(instruction: str)), and you can internally chain together the tool calls already built out to interface with the user's browser.

## From author

There might be a couple of bugs so would appreciate feedback / reports so I can put out some hotfixes as we go :)

## Overview

This MCP server provides browser automation capabilities including tab management, navigation, DOM interaction, and visual analysis.

## Features

### 📑 Tab Management
- **`get_tabs`**: Get all open browser tabs with titles and URLs
- **`select_tab(tab_id)`**: Switch to a specific browser tab
- **`new_tab(url?)`**: Create new tab, optionally with a URL
- **`close_tab(tab_id?)`**: Close specific tab or active tab

### 🧭 Navigation
- **`navigate(url, tab_id?)`**: Navigate to URL in active or specific tab
- **`search_google(query, tab_id?)`**: Perform Google search

### 🖱️ DOM Interaction
- **`click_element(element_id, tab_id?)`**: Click DOM elements by ID
- **`input_text(element_id, text, tab_id?)`**: Type text into form fields
- **`send_keys(keys, tab_id?)`**: Send keyboard shortcuts (Ctrl+C, Enter, etc.)

### 📸 Visual Analysis
- **`screenshot()`**: Capture screenshot of active tab
- **`capture_with_highlights(tab_id?)`**: Screenshot with interactive element highlights
- **`grab_dom(tab_id?)`**: Get formatted DOM structure with XPath mappings

## Setup

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Start the MCP server:
```bash
uv run main.py
```

The server will:
- Start an MCP server with SSE transport on the default port
- Start a WebSocket server on `ws://127.0.0.1:8765` for browser extension connections

## Browser Extension Integration

The server connects via WebSocket to `ws://127.0.0.1:8765` and handles these message types:

### Message Types

| Type | Parameters | Description |
|------|------------|-------------|
| `get_tabs` | None | Get all browser tabs |
| `screenshot` | None | Screenshot active tab |
| `navigate` | `url`, `tab_id?` | Navigate to URL |
| `select_tab` | `tab_id` | Switch to tab |
| `new_tab` | `url?` | Create new tab |
| `close_tab` | `tab_id?` | Close tab |
| `search_google` | `query`, `tab_id?` | Google search |
| `click_element` | `element_id`, `tab_id?` | Click DOM element |
| `input_text` | `element_id`, `text`, `tab_id?` | Type into element |
| `send_keys` | `keys`, `tab_id?` | Send keyboard input |
| `grab_dom` | `tab_id?` | Get DOM structure |
| `capture_with_highlights` | `tab_id?` | Screenshot with highlights |

### Extension Response Examples

**get_tabs:**
```json
{
  "result": {
    "action": "get_all_tabs",
    "success": true,
    "data": [
      {"id": 1, "title": "Example", "url": "https://example.com"},
      {"id": 2, "title": "Google", "url": "https://google.com"}
    ]
  }
}
```

**screenshot:**
```json
{
  "result": {
    "action": "screenshot_active_tab",
    "success": true,
    "message": "Screenshot captured",
    "data": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
  }
}
```

**new_tab:**
```json
{
  "result": {
    "action": "new_tab",
    "success": true,
    "message": "New tab opened",
    "data": {"id": 1234}
  }
}
```

**grab_dom:**
```json
{
  "result": {
    "success": true,
    "data": {
      "processedOutput": "Formatted DOM structure...",
      "highlightToXPath": {
        "1": "/html/body/button[1]",
        "2": "/html/body/form/input[1]"
      },
      "html": "<html>...</html>"
    }
  }
}
```

**capture_with_highlights:**
```json
{
  "result": {
    "action": "capture_tab_with_highlights",
    "success": true,
    "message": "Screenshot captured with highlight data",
    "data": {
      "dataUrl": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
      "highlightCount": 15
    }
  }
}
```

## Tool Parameters Reference

### Required Parameters
- `navigate`: `url`
- `select_tab`: `tab_id`  
- `search_google`: `query`
- `click_element`: `element_id`
- `input_text`: `element_id`, `text`
- `send_keys`: `keys`

### Optional Parameters
- `tab_id`: Most tools support targeting specific tabs (defaults to active tab)
- `url`: `new_tab` can optionally specify initial URL

## Example Workflows

**Multi-step Automation:**
1. `get_tabs()` - See all open tabs
2. `new_tab("https://example.com")` - Open new tab
3. `grab_dom()` - Analyze page structure  
4. `click_element("search-button")` - Interact with page
5. `input_text("search-input", "query")` - Fill forms
6. `send_keys("Enter")` - Submit forms
7. `capture_with_highlights()` - Take annotated screenshot

## Architecture

- **main.py**: Entry point and MCP tool definitions
- **context.py**: WebSocket connection management and message handling
- **ws_server.py**: WebSocket server for browser extension connections
- **tools/browser.py**: Complete browser action implementations

## Development

To add new browser tools:

1. Create the tool function in `tools/browser.py` with proper parameter documentation
2. Add the MCP tool wrapper in `main.py` 
3. Add message handler in browser extension's WebSocket bridge
4. Update this documentation

All tools return formatted strings describing action results and extracted data.

## Logging

The server logs all important events:
- WebSocket connections/disconnections
- Message exchanges with browser extension
- Action successes/failures
- Errors and timeouts

Quick Start

Clone the repository

git clone https://github.com/zrlck/DexTest

Install dependencies

cd DexTest
npm install

Follow the documentation

Check the repository's README.md file for specific installation and usage instructions.

Repository Details

Ownerzrlck

RepoDexTest

Language

TypeScript

License-

Last fetched8/8/2025

Quick Links

Issues

Releases

License

Recommended MCP Servers

💬

Discord MCP

Enable AI assistants to seamlessly interact with Discord servers, channels, and messages.

integrationsdiscordchat

🔗

Knit MCP

Connect AI agents to 200+ SaaS applications and automate workflows.

integrationsautomationsaas

🕷️

Apify MCP Server

Deploy and interact with Apify actors for web scraping and data extraction.

apifycrawlerdata

🌐

BrowserStack MCP

BrowserStack MCP Server for automated testing across multiple browsers.

testingqabrowsers

⚡

Zapier MCP

A Zapier server that provides automation capabilities for various apps.

zapierautomation