Run vMCP locally with the CLI

Virtual MCP Server (vMCP) is usually deployed on Kubernetes through the VirtualMCPServer custom resource, but you can also run it locally from the ToolHive CLI. The thv vmcp subcommands aggregate the MCP servers in a local ToolHive group behind a single endpoint, without a cluster or operator.

Use this mode for local development, quick evaluation, or any case where you want vMCP's aggregation, tool routing, and optimizer capabilities without the operational overhead of Kubernetes.

When to use the local CLI

- You are developing or evaluating vMCP on your workstation.
- You run MCP servers locally with thv run and want to expose them through a single endpoint.
- You want to use the vMCP optimizer to reduce token usage across a local group.
- You don't yet need the clustered, operator-managed deployment model covered in the Quickstart.

For production and multi-tenant deployments, use the Kubernetes VirtualMCPServer resource instead.

Prerequisites

- ToolHive CLI v0.24.0 or later. Check your version with thv version.
- A container runtime (Docker, Podman, or OrbStack) available to ToolHive.
- A ToolHive group with one or more running MCP servers. To create one:
```
thv group create my-group
thv run --group my-group fetch
thv run --group my-group github
```
See Manage ToolHive groups for details.

Subcommands at a glance

The thv vmcp command has three subcommands:

| Subcommand | Purpose |
| --- | --- |
| `thv vmcp init` | Generate a starter YAML config from a running group |
| `thv vmcp validate` | Validate a YAML config for syntax and semantic errors |
| `thv vmcp serve` | Start the aggregated vMCP server |

There are two ways to run the server:

- Quick mode uses thv vmcp serve --group <name> to generate an in-memory config from a group. No YAML file is required.
- Config-file mode uses thv vmcp init → edit → thv vmcp validate → thv vmcp serve --config vmcp.yaml for reproducible or customized setups.

Quick mode

Quick mode is the fastest way to aggregate a local group. Run the server with just a group name:

```
thv vmcp serve --group my-group
```

By default, the server binds to 127.0.0.1:4483 using the Streamable HTTP transport protocol. Point your MCP client at http://127.0.0.1:4483/mcp to access all tools from the group through a single endpoint.

Loopback-only

Quick mode always uses anonymous authentication, so thv vmcp serve --group only accepts loopback bind addresses (127.0.0.1, ::1, localhost, or the default empty value). Binding to a non-loopback interface is rejected to avoid exposing an unauthenticated server on the network. To bind to a non-loopback address, use config-file mode and configure client authentication.
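The accepted values can be expressed as a small check. This Go sketch encodes only the list above; it is not ToolHive's implementation, and the function names are hypothetical.

```go
package main

import "fmt"

// isQuickModeHost reports whether host is one of the loopback bind values
// that quick mode accepts, as documented: 127.0.0.1, ::1, localhost, or empty.
func isQuickModeHost(host string) bool {
	switch host {
	case "", "127.0.0.1", "::1", "localhost":
		return true
	default:
		return false
	}
}

// checkQuickModeHost rejects any non-loopback bind address, because quick
// mode always serves with anonymous authentication.
func checkQuickModeHost(host string) error {
	if !isQuickModeHost(host) {
		return fmt.Errorf("quick mode refuses non-loopback host %q; use --config with client authentication", host)
	}
	return nil
}

func main() {
	for _, h := range []string{"127.0.0.1", "::1", "0.0.0.0", "192.168.1.20"} {
		fmt.Printf("%-13s -> %v\n", h, checkQuickModeHost(h))
	}
}
```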

Enable the optimizer in quick mode

Add --optimizer or --optimizer-embedding to replace the full tool list with find_tool and call_tool primitives:

```
# Tier 1: FTS5 keyword search (no external container)
thv vmcp serve --group my-group --optimizer

# Tier 2: FTS5 + semantic search using a managed TEI container
thv vmcp serve --group my-group --optimizer-embedding
```

See Optimizer tiers for the full comparison.

Config-file mode

Config-file mode is recommended when you need to customize backend settings, authentication, or aggregation rules, or when you want a reproducible setup checked into version control.

Step 1: Generate a starter config

thv vmcp init discovers running workloads in a group and writes a starter YAML file with one backend entry per accessible workload:

```
thv vmcp init --group my-group --output vmcp.yaml
```

Omit --output to write the generated YAML to standard output instead.

The generated file includes inline comments describing each section. A minimal example looks like this:

vmcp.yaml
```
# Generated by `thv vmcp init`. Review and customize before use.

name: my-group-vmcp
groupRef: my-group

incomingAuth:
  type: anonymous

outgoingAuth:
  source: inline

aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: '{workload}_'

backends:
  - name: fetch
    url: http://127.0.0.1:12345/sse
    transport: sse
  - name: github
    url: http://127.0.0.1:12346/mcp
    transport: streamable-http
```
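The prefix strategy is what keeps tool names unique across backends. As a rough sketch, assuming that only the {workload} placeholder appears in prefixFormat and using made-up tool inventories, prefixing works like this in Go:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixedName expands a prefixFormat such as '{workload}_' for one tool.
// Only the {workload} placeholder is handled in this sketch.
func prefixedName(prefixFormat, workload, tool string) string {
	return strings.ReplaceAll(prefixFormat, "{workload}", workload) + tool
}

// aggregate prefixes every backend tool and returns the sorted names a
// client would see, so identical tool names from different backends no
// longer collide.
func aggregate(prefixFormat string, backends map[string][]string) []string {
	var names []string
	for workload, tools := range backends {
		for _, tool := range tools {
			names = append(names, prefixedName(prefixFormat, workload, tool))
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical inventories: both backends expose a tool named "search".
	backends := map[string][]string{
		"fetch":  {"fetch", "search"},
		"github": {"search"},
	}
	fmt.Println(aggregate("{workload}_", backends))
	// prints [fetch_fetch fetch_search github_search]
}
```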

Step 2: Review and edit

Customize the generated config. Common edits include:

- Changing incomingAuth from anonymous to oidc to require authenticated clients.
- Adding tool filters, renames, or overrides under aggregation.tools.
- Configuring the optimizer under an optimizer section.

Filter tools per workload

Use aggregation.tools to expose only a curated subset of tools from each backend. Tools not listed in filter are hidden from tools/list responses.

vmcp.yaml
```
aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: '{workload}_'
  tools:
    - workload: fetch
      filter:
        - fetch
    - workload: filesystem
      filter:
        - read_file
        - write_file
        - list_directory
```

With this config, a client calling tools/list sees four tools: the three filesystem tools (filesystem_read_file, filesystem_write_file, filesystem_list_directory) plus the single fetch_fetch tool, instead of every tool exposed by both backends.
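The filtering rule can be modeled in a few lines. This Go sketch applies each workload's filter and then the '{workload}_' prefix. The backend tool inventories, including the extra move_file tool, are hypothetical, and treating a workload with no filter entry as unfiltered is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// visibleTools applies per-workload filters, then the '{workload}_' prefix.
// Assumption: a workload with no filter entry keeps all of its tools.
func visibleTools(backends, filters map[string][]string) []string {
	var out []string
	for workload, tools := range backends {
		allowed, hasFilter := filters[workload]
		keep := make(map[string]bool, len(allowed))
		for _, name := range allowed {
			keep[name] = true
		}
		for _, tool := range tools {
			if hasFilter && !keep[tool] {
				continue // hidden from tools/list
			}
			out = append(out, workload+"_"+tool)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical tool inventories for the two backends in the example.
	backends := map[string][]string{
		"fetch":      {"fetch"},
		"filesystem": {"read_file", "write_file", "list_directory", "move_file"},
	}
	filters := map[string][]string{
		"fetch":      {"fetch"},
		"filesystem": {"read_file", "write_file", "list_directory"},
	}
	for _, name := range visibleTools(backends, filters) {
		fmt.Println(name)
	}
}
```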

You can also rename tools or override descriptions without modifying the backends:

vmcp.yaml
```
aggregation:
  tools:
    - workload: fetch
      overrides:
        fetch:
          description: 'Retrieve any URL and return its content as text'
```

To hide all backend tools globally (or per workload) and expose only composite tools to clients, use aggregation.excludeAllTools or aggregation.tools[].excludeAll. Hidden tools are removed from tools/list but remain routable internally. See Excluding all tools for examples.
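The distinction between listed and routable tools can be pictured as two lookups against the same routing table. The registry type, the research_topic composite tool, and the method names in this Go sketch are all hypothetical; they illustrate the behavior described above, not vMCP's internals.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical model of a routing table, used only to show
// the difference between "listed" and "routable".
type registry struct {
	routes map[string]string // tool name -> backend workload
	hidden map[string]bool   // tools excluded from tools/list
}

// listTools returns what a client sees in a tools/list response.
func (r registry) listTools() []string {
	var out []string
	for name := range r.routes {
		if !r.hidden[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// route resolves a tool to its backend whether or not it is hidden, which is
// how an internal caller such as a composite tool can still reach it.
func (r registry) route(name string) (string, bool) {
	backend, ok := r.routes[name]
	return backend, ok
}

func main() {
	r := registry{
		routes: map[string]string{
			"fetch_fetch":    "fetch",
			"research_topic": "composite", // hypothetical composite tool
		},
		hidden: map[string]bool{"fetch_fetch": true}, // as with excludeAll
	}
	fmt.Println(r.listTools())
	fmt.Println(r.route("fetch_fetch"))
}
```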

For the full filter and override reference, see Tool aggregation.

See Configure vMCP for the full schema.

Step 3: Validate the config

```
thv vmcp validate --config vmcp.yaml
```

Validation checks YAML syntax, required fields, middleware configuration, and backend settings. It exits 0 on success and non-zero with a descriptive message otherwise.

Step 4: Start the server

```
thv vmcp serve --config vmcp.yaml
```

When both --config and --group are set, --config takes precedence.
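Expressed as code, the precedence rule is a simple ordered check. In this Go sketch, serveMode is a hypothetical helper, and returning an error when neither flag is set is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// serveMode applies the documented rule that --config wins over --group.
// Treating "neither flag set" as an error is an assumption of this sketch.
func serveMode(configPath, group string) (string, error) {
	switch {
	case configPath != "":
		return "config-file mode (" + configPath + ")", nil
	case group != "":
		return "quick mode (group " + group + ")", nil
	default:
		return "", errors.New("either --config or --group is required")
	}
}

func main() {
	mode, err := serveMode("vmcp.yaml", "my-group")
	fmt.Println(mode, err)
}
```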

Optimizer tiers

thv vmcp serve supports four tiers of tool optimization. Tier 0 is the default; tiers 1 through 3 replace the full backend tool list with find_tool and call_tool primitives that search the aggregated tool set. Tier 1 uses FTS5 keyword search only; tiers 2 and 3 add semantic embeddings on top for hybrid search.

| Tier | Flag or setting | Search | External service |
| --- | --- | --- | --- |
| 0 | (none) | None (all tools passed through) | None |
| 1 | `--optimizer` | FTS5 keyword (in-process) | None |
| 2 | `--optimizer-embedding` | FTS5 + TEI semantic | Managed TEI container |
| 3 | `optimizer.embeddingService` in config YAML | FTS5 + external embedding | User-managed embedding server |

Tier 2 implies Tier 1: --optimizer-embedding also enables the keyword index. For Tier 2, ToolHive starts and stops a HuggingFace Text Embeddings Inference (TEI) container named thv-embedding-<hash> automatically. Customize the model and image with --embedding-model and --embedding-image.
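The relationship between flags and tiers, including the rule that Tier 2 implies Tier 1, can be summarized in a small Go sketch. The optimizerSettings type is hypothetical, and giving optimizer.embeddingService precedence over --optimizer-embedding when both are present is an assumption; the table above does not say which wins.

```go
package main

import "fmt"

// optimizerSettings is a hypothetical summary of what each tier turns on.
type optimizerSettings struct {
	KeywordIndex bool   // in-process FTS5 index (tiers 1-3)
	Embeddings   string // "", "managed-tei" (tier 2), or "external" (tier 3)
}

// settingsFor maps serve flags plus the config's optimizer.embeddingService
// onto the tiers in the table above. Preferring embeddingService over
// --optimizer-embedding is an assumption of this sketch.
func settingsFor(optimizer, optimizerEmbedding bool, embeddingService string) optimizerSettings {
	var s optimizerSettings
	switch {
	case embeddingService != "":
		s.Embeddings = "external"
	case optimizerEmbedding:
		s.Embeddings = "managed-tei"
	}
	// Every tier above 0 builds the keyword index, so any embedding tier
	// implies the Tier 1 behavior as well.
	s.KeywordIndex = optimizer || s.Embeddings != ""
	return s
}

// tier reports the tier number for a settings value.
func (s optimizerSettings) tier() int {
	switch {
	case s.Embeddings == "external":
		return 3
	case s.Embeddings == "managed-tei":
		return 2
	case s.KeywordIndex:
		return 1
	default:
		return 0
	}
}

func main() {
	s := settingsFor(false, true, "")
	fmt.Printf("tier %d, keyword index %v\n", s.tier(), s.KeywordIndex)
}
```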

For Tier 3, you can point at any HuggingFace TEI server or at an OpenAI-compatible /embeddings endpoint (OpenAI, Azure OpenAI, or another compatible gateway). For an OpenAI-compatible endpoint, set embeddingProvider: openai and embeddingModel alongside embeddingService, and supply the API key through the OPENAI_API_KEY environment variable (omit it for keyless gateways). The default embeddingProvider is tei, so existing Tier 3 configs continue to work unchanged.
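For reference when choosing a gateway, an OpenAI-compatible /embeddings endpoint accepts a JSON body with model and input fields and, for keyed services, a bearer token. This Go sketch builds such a request. It shows the standard OpenAI request shape rather than the exact payload vMCP sends, and the model name and base URLs are placeholders.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// newEmbeddingsRequest builds a request for an OpenAI-compatible
// /embeddings endpoint. baseURL and model stand in for the embeddingService
// and embeddingModel values in your config.
func newEmbeddingsRequest(baseURL, model, apiKey string, inputs []string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"model": model,
		"input": inputs,
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimSuffix(baseURL, "/") + "/embeddings"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" { // keyless gateways get no Authorization header
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := newEmbeddingsRequest(
		"https://api.openai.com/v1",
		"text-embedding-3-small", // placeholder model name
		os.Getenv("OPENAI_API_KEY"),
		[]string{"read a file from disk"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```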

For the conceptual background and tuning parameters, see Optimize tool discovery and Tool optimization.

Enable audit logging

Add --enable-audit to thv vmcp serve to turn on audit logging with default settings. The flag supplies these defaults only when the loaded config doesn't already define an audit section:

```
thv vmcp serve --group my-group --enable-audit
```

For audit configuration options, see Audit logging.
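The flag's precedence rule can be sketched as follows. The vmcpConfig and auditConfig types are hypothetical, trimmed stand-ins for the real schema, and Enabled: true stands in for whatever the actual default audit settings are.

```go
package main

import "fmt"

// auditConfig is a hypothetical stand-in for the audit section of vmcp.yaml.
type auditConfig struct {
	Enabled bool
}

// vmcpConfig is a hypothetical, heavily trimmed config model.
type vmcpConfig struct {
	Audit *auditConfig // nil when the YAML has no audit section
}

// applyEnableAudit mirrors the documented behavior of --enable-audit: it
// fills in default audit settings only when the config has no audit section.
// An existing audit section is left exactly as written.
func applyEnableAudit(cfg *vmcpConfig, enableAudit bool) {
	if enableAudit && cfg.Audit == nil {
		cfg.Audit = &auditConfig{Enabled: true} // placeholder for the real defaults
	}
}

func main() {
	cfg := &vmcpConfig{}
	applyEnableAudit(cfg, true)
	fmt.Println(cfg.Audit != nil && cfg.Audit.Enabled)
}
```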

Command reference

All thv vmcp flags, with their defaults:

`thv vmcp serve`

| Flag | Default | Description |
| --- | --- | --- |
| `--config`, `-c` | (empty) | Path to a vMCP configuration file |
| `--group` | (empty) | ToolHive group name for quick mode (used when `--config` is not set) |
| `--host` | `127.0.0.1` | Bind address (quick mode requires a loopback address) |
| `--port` | `4483` | TCP port to listen on |
| `--enable-audit` | `false` | Enable audit logging with default configuration |
| `--optimizer` | `false` | Enable Tier 1 FTS5 keyword optimizer |
| `--optimizer-embedding` | `false` | Enable Tier 2 semantic optimizer (implies `--optimizer`) |
| `--embedding-model` | `BAAI/bge-small-en-v1.5` | HuggingFace model name for the managed TEI container |
| `--embedding-image` | `ghcr.io/huggingface/text-embeddings-inference:cpu-latest` | TEI container image |
| `--session-ttl` | `30m` | Session inactivity timeout as a Go duration (`30m`, `2h`, `168h`) |

`thv vmcp init`

| Flag | Default | Description |
| --- | --- | --- |
| `--group`, `-g` | (required) | ToolHive group name whose workloads are discovered |
| `--output`, `-o` | stdout | Output file path for the generated config |
| `--config`, `-c` | stdout | Alias for `--output` |

`thv vmcp validate`

| Flag | Default | Description |
| --- | --- | --- |
| `--config`, `-c` | (required) | Path to the vMCP configuration file to validate |

For full CLI help, run thv vmcp --help or see thv vmcp in the reference.

Compared to the Kubernetes deployment

| Aspect | Local CLI (`thv vmcp`) | Kubernetes (`VirtualMCPServer` CRD) |
| --- | --- | --- |
| Runtime | Foreground process | Pod managed by the operator |
| Configuration | CLI flags or local YAML file | `VirtualMCPServer` custom resource |
| Backend discovery | Reads ToolHive groups on the local machine | Reads `MCPGroup` resources in the cluster |
| Authentication | Anonymous in quick mode; configurable in files | Full OIDC integration via CRD fields |
| Lifecycle | Tied to the terminal session | Managed declaratively, survives restarts |
| Embedding server | Managed TEI container (Tier 2) | `EmbeddingServer` custom resource |

The underlying aggregation, tool routing, and optimizer logic are the same. Use the local CLI for development and single-user workflows; use the Kubernetes deployment for shared, production, or multi-user environments.

Next steps

- Configure vMCP to customize backends, authentication, and aggregation rules.
- Optimize tool discovery to tune find_tool and call_tool for large toolsets.
- Deploy vMCP on Kubernetes when you're ready to move to a production-grade deployment.
