Video Context MCP is a Model Context Protocol server that gives AI coding assistants — GitHub Copilot, Cursor, and Claude Code — the ability to analyze video content using natural language.
Just ask your assistant a question about a video and it figures out the rest: extracting frames, calling the right AI provider, and returning an answer grounded in the actual video content.
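Under the hood, that exchange happens over the Model Context Protocol. The sketch below uses the official TypeScript MCP SDK to show roughly what a client-side call looks like; the package name (`video-context-mcp`), the tool name (`video_qa`), and its argument shape are placeholders for illustration, not the server's documented interface.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server as a child process and connect over stdio.
// The package name here is a placeholder, not necessarily the published one.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "video-context-mcp"],
});

const client = new Client({ name: "example-client", version: "0.1.0" });
await client.connect(transport);

// List the tools the server advertises (Q&A, summarization, frame
// extraction, timestamp search, transcription, metadata).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Ask a question about a local video. Tool name and arguments are
// hypothetical; check the server's tool listing for the real ones.
const answer = await client.callTool({
  name: "video_qa",
  arguments: {
    video_path: "./talk.mp4",
    question: "What diagram is shown around the five-minute mark?",
  },
});
console.log(answer.content);
```

In practice your coding assistant issues these calls for you; the sketch only makes the protocol traffic visible.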
Video Q&A
Ask any question about video content. The AI analyzes frames and the optional audio transcript to answer based on what’s actually in the video.
Video Summarization
Generate structured summaries with key scenes, topics, and timelines. Ideal for long recordings, lectures, and meetings.
Frame Extraction
Four flexible extraction modes, including specific timestamps, scene changes, and fixed intervals.
Timestamp Search
Describe what you’re looking for and get back the exact timestamp — no manual scrubbing required.
Audio Transcription
Transcribe speech with paragraph-level timestamps. Export as SRT, VTT, or JSON. Includes speaker diarization and speech-to-English translation.
Video Metadata
Retrieve duration, resolution, fps, codec, and other technical details instantly, with no AI API call needed (see the sketch below).
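Metadata like this can be read straight from the container, which is why no AI provider is involved. Here is a minimal sketch of that idea, assuming ffprobe is on the PATH; it illustrates the approach rather than the server's actual implementation.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// ffprobe reports frame rate as a fraction string such as "30000/1001".
function toFps(rate: string): number {
  const [num, den] = rate.split("/").map(Number);
  return den ? num / den : num;
}

// Read duration, resolution, fps, and codec straight from the container.
async function probeVideo(path: string) {
  const { stdout } = await run("ffprobe", [
    "-v", "error",
    "-print_format", "json",
    "-show_format", "-show_streams",
    path,
  ]);
  const info = JSON.parse(stdout);
  const video = info.streams.find((s: any) => s.codec_type === "video");
  return {
    durationSeconds: Number(info.format.duration),
    width: video?.width,
    height: video?.height,
    codec: video?.codec_name,
    fps: video ? toFps(video.r_frame_rate) : undefined,
  };
}

console.log(await probeVideo("./talk.mp4"));
```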
Five AI video providers and four audio providers keep the server running even when one is unavailable:
Video: Gemini (default, free) → GLM-4.6V → Qwen3.6 → Kimi K2.5 → MiMo-V2 Omni
Audio: Deepgram (default) → AssemblyAI → Groq/Whisper → Gemini
The server automatically falls back to the next available provider.
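One way to picture that fallback behaviour: hold the providers in priority order and walk down the chain until one succeeds. A minimal sketch of the pattern follows; the `VideoProvider` interface and the `analyze` signature are placeholders, not the server's real API.

```typescript
interface VideoProvider {
  name: string;
  // Returns an answer grounded in the supplied frames; throws on
  // missing API keys, rate limits, or other failures.
  analyze(frames: Buffer[], question: string): Promise<string>;
}

// Priority order mirrors the chain above: the default provider first,
// then each fallback in turn.
async function askWithFallback(
  providers: VideoProvider[],
  frames: Buffer[],
  question: string,
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.analyze(frames, question);
    } catch (err) {
      lastError = err; // remember why this provider failed, then try the next
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```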