Video Context MCP is a Model Context Protocol server that gives AI coding assistants — GitHub Copilot, Cursor, and Claude Code — the ability to analyze video content using natural language.
Just ask your assistant a question about a video and it figures out the rest: extracting frames, calling the right AI provider, and returning an answer grounded in the actual video content.
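Under the hood, that exchange happens over the Model Context Protocol. The sketch below uses the official TypeScript MCP SDK to show roughly what a client-side call looks like; the package name (`video-context-mcp`), the tool name (`video_qa`), and its argument shape are placeholders for illustration, not the server's documented interface.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server as a child process and connect over stdio.
// The package name here is a placeholder, not necessarily the published one.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "video-context-mcp"],
});

const client = new Client({ name: "example-client", version: "0.1.0" });
await client.connect(transport);

// List the tools the server advertises (Q&A, summarization, frame
// extraction, timestamp search, transcription, metadata).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Ask a question about a local video. Tool name and arguments are
// hypothetical; check the server's tool listing for the real ones.
const answer = await client.callTool({
  name: "video_qa",
  arguments: {
    video_path: "./talk.mp4",
    question: "What diagram is shown around the five-minute mark?",
  },
});
console.log(answer.content);
```

In practice your coding assistant issues these calls for you; the sketch only makes the protocol traffic visible.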
Video Q&A
Ask any question about video content. The AI analyzes frames and the optional audio transcript to answer based on what’s actually in the video.
Video Summarization
Generate structured summaries with key scenes, topics, and timelines. Ideal for long recordings, lectures, and meetings.
Frame Extraction
Four flexible extraction modes, including specific timestamps, scene changes, and fixed intervals.
Timestamp Search
Describe what you’re looking for and get back the exact timestamp — no manual scrubbing required.
Audio Transcription
Transcribe speech with paragraph-level timestamps. Export as SRT, VTT, or JSON. Includes speaker diarization and speech-to-English translation.
Video Metadata
Retrieve duration, resolution, fps, codec, and other technical details instantly, with no AI API call needed (see the sketch below).
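Metadata like this can be read straight from the container, which is why no AI provider is involved. Here is a minimal sketch of that idea, assuming ffprobe is on the PATH; it illustrates the approach rather than the server's actual implementation.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// ffprobe reports frame rate as a fraction string such as "30000/1001".
function toFps(rate: string): number {
  const [num, den] = rate.split("/").map(Number);
  return den ? num / den : num;
}

// Read duration, resolution, fps, and codec straight from the container.
async function probeVideo(path: string) {
  const { stdout } = await run("ffprobe", [
    "-v", "error",
    "-print_format", "json",
    "-show_format", "-show_streams",
    path,
  ]);
  const info = JSON.parse(stdout);
  const video = info.streams.find((s: any) => s.codec_type === "video");
  return {
    durationSeconds: Number(info.format.duration),
    width: video?.width,
    height: video?.height,
    codec: video?.codec_name,
    fps: video ? toFps(video.r_frame_rate) : undefined,
  };
}

console.log(await probeVideo("./talk.mp4"));
```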
Five AI video providers and four audio providers keep the server running even when one is unavailable:
Video: Gemini (default, free) → GLM-4.6V → Qwen3.6 → Kimi K2.5 → MiMo-V2 Omni
Audio: Deepgram (default) → AssemblyAI → Groq/Whisper → Gemini
The server automatically falls back to the next available provider.
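One way to picture that fallback behaviour: hold the providers in priority order and walk down the chain until one succeeds. A minimal sketch of the pattern follows; the `VideoProvider` interface and the `analyze` signature are placeholders, not the server's real API.

```typescript
interface VideoProvider {
  name: string;
  // Returns an answer grounded in the supplied frames; throws on
  // missing API keys, rate limits, or other failures.
  analyze(frames: Buffer[], question: string): Promise<string>;
}

// Priority order mirrors the chain above: the default provider first,
// then each fallback in turn.
async function askWithFallback(
  providers: VideoProvider[],
  frames: Buffer[],
  question: string,
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.analyze(frames, question);
    } catch (err) {
      lastError = err; // remember why this provider failed, then try the next
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```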