Run vMCP locally with the CLI
Virtual MCP Server (vMCP) is usually deployed on Kubernetes through the
VirtualMCPServer custom resource, but you can also run it locally from the
ToolHive CLI. The thv vmcp subcommands aggregate the MCP servers in a local
ToolHive group behind a single endpoint, without a
cluster or operator.
Use this mode for local development, quick evaluation, or any case where you want vMCP's aggregation, tool routing, and optimizer capabilities without the operational overhead of Kubernetes.
When to use the local CLI
- You are developing or evaluating vMCP on your workstation.
- You run MCP servers locally with thv run and want to expose them through a single endpoint.
- You want to use the vMCP optimizer to reduce token usage across a local group.
- You don't yet need the clustered, operator-managed deployment model covered in the Quickstart.
For production and multi-tenant deployments, use the Kubernetes
VirtualMCPServer resource instead.
Prerequisites
- ToolHive CLI v0.24.0 or later. Check with thv version.
- A container runtime (Docker, Podman, or OrbStack) available to ToolHive.
- A ToolHive group with one or more running MCP servers. To create one:

  ```shell
  thv group create my-group
  thv run --group my-group fetch
  thv run --group my-group github
  ```

  See Manage ToolHive groups for details.
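If you automate setup, a script can refuse to continue on an older CLI. The function below is a minimal sketch, not part of ToolHive: version_at_least is a hypothetical name, you pass it the version number that thv version reports (a leading v is stripped), and it assumes a sort that supports -V, as GNU coreutils does.

```shell
# Hypothetical helper: succeed if version $1 is at least the minimum $2.
# Assumes dotted version numbers such as 0.24.0; a leading "v" is ignored.
version_at_least() {
  have=${1#v}
  need=${2#v}
  # sort -V orders version strings numerically per component; if the minimum
  # sorts first (or both are equal), the installed version is new enough.
  lowest=$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$need" ]
}
```

For example, version_at_least 0.24.3 0.24.0 succeeds, while version_at_least 0.23.9 0.24.0 fails.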
Subcommands at a glance
The thv vmcp command has three subcommands:
| Subcommand | Purpose |
|---|---|
| thv vmcp init | Generate a starter YAML config from a running group |
| thv vmcp validate | Validate a YAML config for syntax and semantic errors |
| thv vmcp serve | Start the aggregated vMCP server |
There are two ways to run the server:
- Quick mode uses thv vmcp serve --group <name> to generate an in-memory config from a group. No YAML file is required.
- Config-file mode uses thv vmcp init → edit → thv vmcp validate → thv vmcp serve --config vmcp.yaml for reproducible or customized setups.
Quick mode
Quick mode is the fastest way to aggregate a local group. Run the server with just a group name:
```shell
thv vmcp serve --group my-group
```
By default, the server binds to 127.0.0.1:4483. Point your MCP client at
http://127.0.0.1:4483 to access all tools from the group through a single
endpoint.
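If you change --host or --port, the client URL changes with them. The helper below is a hypothetical sketch (vmcp_url is not a ToolHive command) that assembles the base URL from the same defaults thv vmcp serve uses and wraps IPv6 literals such as ::1 in brackets, as URLs require.

```shell
# Hypothetical helper: build the base URL for a local vMCP server.
# Defaults mirror thv vmcp serve: host 127.0.0.1, port 4483.
vmcp_url() {
  host=${1:-127.0.0.1}
  port=${2:-4483}
  case $host in
    *:*) host="[$host]" ;;  # IPv6 literals need brackets inside a URL
  esac
  printf 'http://%s:%s\n' "$host" "$port"
}
```

For example, vmcp_url ::1 5000 prints http://[::1]:5000.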
Quick mode always uses anonymous authentication, so thv vmcp serve --group
only accepts loopback bind addresses (127.0.0.1, ::1, localhost, or the
default empty value). Binding to a non-loopback interface is rejected to avoid
exposing an unauthenticated server on the network. To bind to a non-loopback
address, use config-file mode and configure client
authentication.
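A wrapper script can check the bind address before launching quick mode instead of waiting for ToolHive to reject it. This is a minimal sketch that mirrors only the loopback values listed above; is_loopback_bind is a hypothetical name, not ToolHive's own validation code, and it may reject addresses that ToolHive itself would accept.

```shell
# Hypothetical pre-flight check mirroring the documented quick-mode rule.
# Deliberately conservative: only the listed values pass.
is_loopback_bind() {
  case $1 in
    ''|127.0.0.1|::1|localhost) return 0 ;;  # loopback or the default (empty) value
    *) return 1 ;;                           # anything else needs config-file mode
  esac
}
```

For example, a script can run is_loopback_bind "$HOST" and only then call thv vmcp serve --group my-group --host "$HOST", falling back to config-file mode when the check fails.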
Enable the optimizer in quick mode
Add --optimizer or --optimizer-embedding to replace the full tool list with
find_tool and call_tool primitives:
```shell
# Tier 1: FTS5 keyword search (no external container)
thv vmcp serve --group my-group --optimizer

# Tier 2: FTS5 + semantic search using a managed TEI container
thv vmcp serve --group my-group --optimizer-embedding
```
See Optimizer tiers for the full comparison.
Config-file mode
Config-file mode is recommended when you need to customize backend settings, authentication, or aggregation rules, or when you want a reproducible setup checked into version control.
Step 1: Generate a starter config
thv vmcp init discovers running workloads in a group and writes a starter YAML
file with one backend entry per accessible workload:
```shell
thv vmcp init --group my-group --output vmcp.yaml
```
Omit --output to write the generated YAML to standard output instead.
The generated file includes inline comments describing each section. A minimal example looks like this:
```yaml
# Generated by `thv vmcp init`. Review and customize before use.
name: my-group-vmcp
groupRef: my-group
incomingAuth:
  type: anonymous
outgoingAuth:
  source: inline
aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: '{workload}_'
backends:
  - name: fetch
    url: http://127.0.0.1:12345/sse
    transport: sse
  - name: github
    url: http://127.0.0.1:12346/mcp
    transport: streamable-http
```
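With conflictResolution: prefix, tools from different backends stay distinct through a per-workload prefix. Assuming the {workload} placeholder in prefixFormat is replaced with the backend name and the result is prepended to each tool name, you can preview the exposed names with a sketch like this. prefixed_tool_name is a hypothetical helper, and create_issue is an illustrative tool name rather than a guaranteed GitHub server tool.

```shell
# Hypothetical helper: preview a tool name under prefix conflict resolution.
# Assumption: {workload} is replaced by the backend name and the resulting
# prefix is prepended to the tool name the backend reports.
prefixed_tool_name() {
  fmt=$1       # e.g. '{workload}_' from conflictResolutionConfig.prefixFormat
  workload=$2  # backend name, e.g. github
  tool=$3      # tool name as reported by the backend
  # In a basic regular expression, { and } are literal characters.
  prefix=$(printf '%s' "$fmt" | sed "s/{workload}/$workload/g")
  printf '%s%s\n' "$prefix" "$tool"
}

prefixed_tool_name '{workload}_' fetch fetch          # fetch_fetch
prefixed_tool_name '{workload}_' github create_issue  # github_create_issue
```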
Step 2: Review and edit
Customize the generated config. Common edits include:
- Changing incomingAuth from anonymous to oidc to require authenticated clients.
- Adding tool filters, renames, or overrides under each backend.
- Configuring the optimizer under an optimizer section.
See Configure vMCP for the full schema.
Step 3: Validate the config
```shell
thv vmcp validate --config vmcp.yaml
```
Validation checks YAML syntax, required fields, middleware configuration, and
backend settings. It exits 0 on success and non-zero with a descriptive
message otherwise.
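Because validate reports failure through its exit status, a script or CI job can refuse to start a server with a broken config. This is a minimal POSIX shell sketch; validate_and_serve and the THV variable are hypothetical conveniences (THV lets you substitute a stub command when testing the script) and default to the real thv binary.

```shell
# Hypothetical override so the script can be exercised without ToolHive
# installed; defaults to the real thv binary.
: "${THV:=thv}"

validate_and_serve() {
  config=${1:-vmcp.yaml}
  # thv vmcp validate exits 0 on success and non-zero otherwise.
  if ! "$THV" vmcp validate --config "$config"; then
    echo "vmcp: $config failed validation; not starting the server" >&2
    return 1
  fi
  "$THV" vmcp serve --config "$config"
}
```

In CI, running only the validate step is usually enough to catch config errors before deployment.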
Step 4: Start the server
```shell
thv vmcp serve --config vmcp.yaml
```
When both --config and --group are set, --config takes precedence.
Optimizer tiers
thv vmcp serve supports four tiers of tool optimization. Tier 0 is the
default; tiers 1 through 3 replace the full backend tool list with find_tool
and call_tool primitives that search the aggregated tool set. Tier 1 uses FTS5
keyword search only; tiers 2 and 3 add semantic embeddings on top for hybrid
search.
| Tier | Flag or setting | Search | External service |
|---|---|---|---|
| 0 | (none) | None - all tools passed through | None |
| 1 | --optimizer | FTS5 keyword (in-process) | None |
| 2 | --optimizer-embedding | FTS5 + TEI semantic | Managed TEI container |
| 3 | optimizer.embeddingService in config YAML | FTS5 + external embedding | User-managed embedding server |
Tier 2 implies Tier 1: --optimizer-embedding also enables the keyword index.
For Tier 2, ToolHive starts and stops a HuggingFace Text Embeddings Inference
(TEI) container named thv-embedding-<hash> automatically. Customize the model
and image with --embedding-model and --embedding-image.
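When a script picks the tier at run time, the table maps onto flags as follows. This is a hypothetical helper, not a ToolHive command. Tier 3 has no CLI flag because it is configured through optimizer.embeddingService in the config YAML, so the sketch reports an error for it.

```shell
# Hypothetical helper: print the thv vmcp serve flags for an optimizer tier.
optimizer_flags() {
  case $1 in
    0) ;;                                        # no flag: all tools passed through
    1) printf '%s\n' '--optimizer' ;;            # FTS5 keyword search, in-process
    2) printf '%s\n' '--optimizer-embedding' ;;  # implies --optimizer; managed TEI container
    3) echo "tier 3: set optimizer.embeddingService in the config YAML" >&2
       return 1 ;;
    *) echo "unknown optimizer tier: $1" >&2
       return 1 ;;
  esac
}
```

For example, thv vmcp serve --group my-group $(optimizer_flags 2) starts Tier 2. Leave the substitution unquoted so that tier 0 expands to no argument at all, and check the helper's exit status first if tier 3 is possible, since a failed substitution also expands to nothing.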
For the conceptual background and tuning parameters, see Optimize tool discovery and Tool optimization.
Enable audit logging
Add --enable-audit to thv vmcp serve to turn on audit logging with default
settings when the loaded config doesn't already define an audit section:
```shell
thv vmcp serve --group my-group --enable-audit
```
For audit configuration options, see Audit logging.
Command reference
All thv vmcp flags, with their defaults:
thv vmcp serve
| Flag | Default | Description |
|---|---|---|
| --config, -c | (empty) | Path to a vMCP configuration file |
| --group | (empty) | ToolHive group name for quick mode (used when --config is not set) |
| --host | 127.0.0.1 | Bind address (quick mode requires a loopback address) |
| --port | 4483 | TCP port to listen on |
| --enable-audit | false | Enable audit logging with default configuration |
| --optimizer | false | Enable Tier 1 FTS5 keyword optimizer |
| --optimizer-embedding | false | Enable Tier 2 semantic optimizer (implies --optimizer) |
| --embedding-model | BAAI/bge-small-en-v1.5 | HuggingFace model name for the managed TEI container |
| --embedding-image | ghcr.io/huggingface/text-embeddings-inference:cpu-latest | TEI container image |
thv vmcp init
| Flag | Default | Description |
|---|---|---|
| --group, -g | (required) | ToolHive group name whose workloads are discovered |
| --output, -o | stdout | Output file path for the generated config |
| --config, -c | stdout | Alias for --output |
thv vmcp validate
| Flag | Default | Description |
|---|---|---|
| --config, -c | (required) | Path to the vMCP configuration file to validate |
For full CLI help, run thv vmcp --help or see
thv vmcp in the reference.
Compared to the Kubernetes deployment
| Aspect | Local CLI (thv vmcp) | Kubernetes (VirtualMCPServer CRD) |
|---|---|---|
| Runtime | Foreground process | Pod managed by the operator |
| Configuration | CLI flags or local YAML file | VirtualMCPServer custom resource |
| Backend discovery | Reads ToolHive groups on the local machine | Reads MCPGroup resources in the cluster |
| Authentication | Anonymous in quick mode; configurable in files | Full OIDC integration via CRD fields |
| Lifecycle | Tied to the terminal session | Managed declaratively, survives restarts |
| Embedding server | Managed TEI container (Tier 2) | EmbeddingServer custom resource |
The underlying aggregation, tool routing, and optimizer logic are the same. Use the local CLI for development and single-user workflows; use the Kubernetes deployment for shared, production, or multi-user environments.
Next steps
- Configure vMCP to customize backends, authentication, and aggregation rules.
- Optimize tool discovery to tune find_tool and call_tool for large toolsets.
- Deploy vMCP on Kubernetes when you're ready to move to a production-grade deployment.