
Open-source real-time digital human agent platform

Open-source real-time digital human agent platform

CyberVerse is an open-source digital human agent platform with real-time video calling. It lets users create an AI agent from a single photo and interact face to face through low-latency live video. The platform supports real-time facial animation, natural lip-sync, plugin-based AI components, and configurable LLM, TTS, ASR, and avatar backends.
CyberVerse is an open-source digital human agent platform that enables real-time audio/video interaction with AI-powered characters. Built on WebRTC technology, it allows users to create a lifelike AI agent from a single photo and engage in face-to-face conversations through low-latency live video calls. The platform supports real-time facial animation, natural lip-sync, plugin-based AI components, and configurable backends for LLM, TTS, ASR, and avatar systems.
CyberVerse uses WebRTC for low-latency audio/video streaming, supporting direct P2P connections or LiveKit SFU mode. The agent can receive user camera frames or screen-sharing input, enabling face-to-face interaction where the AI sees and hears you simultaneously.
The platform separates foreground and background processing: PersonaAgent handles real-time conversation, interruptions, and context switches, while SubAgents asynchronously handle long-running tasks like search, research, summarization, and report generation. This keeps voice interactions responsive even during complex operations.
Every part of the agent—brain, voice, hearing, tools, memory, and face—is a replaceable module. You can combine different omni models, LLMs, TTS, ASR, embeddings, RAG, tool calls, and avatar backends through a single configuration file, then switch providers and model combinations per scenario via the web UI.
Conversation history is saved to local disk and automatically loaded when resuming a session. You can import knowledge bases, documents, and biographical material for each character, which the system indexes for retrieval-augmented generation to keep answers aligned with the character's persona.
Just one photo. CyberVerse makes them alive. This captures the platform's core magic: transforming a single reference image into a fully interactive digital human with real-time facial animation, lip-sync, and cached idle video playback. Unlike many AI agents that remain text-only or audio-only, CyberVerse offers a visible, emotionally engaging presence that can see you, hear you, and respond with natural expressions.
You want to build a custom AI agent that goes beyond text chat—one that can hold real-time voice conversations, see your face, and display a lifelike digital avatar. CyberVerse is especially compelling if you value open-source flexibility, modular architecture, and the ability to switch between pure voice mode (no GPU required) and full video avatar mode depending on your hardware and use case.
Other tools you might consider
Loading comments…
Product Keywords