Run vMCP locally with the CLI

Virtual MCP Server (vMCP) is usually deployed on Kubernetes through the VirtualMCPServer custom resource, but you can also run it locally from the ToolHive CLI. The thv vmcp subcommands aggregate the MCP servers in a local ToolHive group behind a single endpoint, without a cluster or operator.

Use this mode for local development, quick evaluation, or any case where you want vMCP's aggregation, tool routing, and optimizer capabilities without the operational overhead of Kubernetes.

When to use the local CLI

  • You are developing or evaluating vMCP on your workstation.
  • You run MCP servers locally with thv run and want to expose them through a single endpoint.
  • You want to use the vMCP optimizer to reduce token usage across a local group.
  • You don't yet need the clustered, operator-managed deployment model covered in the Quickstart.

For production and multi-tenant deployments, use the Kubernetes VirtualMCPServer resource instead.

Prerequisites

  • ToolHive CLI v0.24.0 or later. Check with thv version.

  • A container runtime (Docker, Podman, or OrbStack) available to ToolHive.

  • A ToolHive group with one or more running MCP servers. To create one:

    thv group create my-group
    thv run --group my-group fetch
    thv run --group my-group github

    See Manage ToolHive groups for details.
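The version requirement above can be checked in a script. The following is a minimal sketch, not part of ToolHive: the helper name version_at_least is hypothetical, it only handles plain MAJOR.MINOR.PATCH strings (an optional leading "v" is stripped, pre-release suffixes are not supported), and it makes no assumption about the exact output format of thv version, so you pass the version string in yourself.

```shell
# version_at_least CURRENT MINIMUM
# Succeeds (exit 0) when CURRENT >= MINIMUM, comparing MAJOR.MINOR.PATCH numerically.
# Hypothetical helper for illustration; not a ToolHive command.
version_at_least() {
  cur=${1#v}
  min=${2#v}
  old_ifs=$IFS
  IFS=.
  # Split both versions on "." into $1..$3 (current) and $4..$6 (minimum).
  set -- $cur $min
  IFS=$old_ifs
  for pair in "$1:$4" "$2:$5" "$3:$6"; do
    a=${pair%%:*}
    b=${pair#*:}
    if [ "$a" -gt "$b" ]; then return 0; fi
    if [ "$a" -lt "$b" ]; then return 1; fi
  done
  return 0
}

# Hypothetical value; substitute the version that `thv version` reports for you.
installed="v0.25.1"
if version_at_least "$installed" "v0.24.0"; then
  echo "ToolHive CLI is new enough for thv vmcp"
else
  echo "Upgrade the ToolHive CLI to v0.24.0 or later" >&2
fi
```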

Subcommands at a glance

The thv vmcp command has three subcommands:

| Subcommand | Purpose |
| --- | --- |
| thv vmcp init | Generate a starter YAML config from a running group |
| thv vmcp validate | Validate a YAML config for syntax and semantic errors |
| thv vmcp serve | Start the aggregated vMCP server |

There are two ways to run the server:

  • Quick mode uses thv vmcp serve --group <name> to generate an in-memory config from a group. No YAML file is required.
  • Config-file mode uses thv vmcp init → edit → thv vmcp validate → thv vmcp serve --config vmcp.yaml for reproducible or customized setups.

Quick mode

Quick mode is the fastest way to aggregate a local group. Run the server with just a group name:

thv vmcp serve --group my-group

By default, the server binds to 127.0.0.1:4483. Point your MCP client at http://127.0.0.1:4483 to access all tools from the group through a single endpoint.

Loopback-only

Quick mode always uses anonymous authentication, so thv vmcp serve --group only accepts loopback bind addresses (127.0.0.1, ::1, localhost, or the default empty value). Binding to a non-loopback interface is rejected to avoid exposing an unauthenticated server on the network. To bind to a non-loopback address, use config-file mode and configure client authentication.
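If you wrap quick mode in a script, you can apply the same loopback rule before starting the server. This is a rough sketch that only mirrors the addresses listed above; the function name is_loopback_host and the VMCP_HOST variable are assumptions for illustration, and the check that ToolHive itself performs may differ.

```shell
# Hypothetical pre-flight check mirroring the loopback rule described above.
# Accepts only the values named in the text: 127.0.0.1, ::1, localhost, or empty.
is_loopback_host() {
  case "$1" in
    127.0.0.1|::1|localhost|'') return 0 ;;
    *) return 1 ;;
  esac
}

# VMCP_HOST is an illustrative variable, not something ToolHive reads.
host=${VMCP_HOST:-127.0.0.1}
if is_loopback_host "$host"; then
  echo "quick mode is allowed for host: $host"
else
  echo "use config-file mode with client authentication for host: $host" >&2
fi
```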

Enable the optimizer in quick mode

Add --optimizer or --optimizer-embedding to replace the full tool list with find_tool and call_tool primitives:

# Tier 1: FTS5 keyword search (no external container)
thv vmcp serve --group my-group --optimizer

# Tier 2: FTS5 + semantic search using a managed TEI container
thv vmcp serve --group my-group --optimizer-embedding

See Optimizer tiers for the full comparison.

Config-file mode

Config-file mode is recommended when you need to customize backend settings, authentication, or aggregation rules, or when you want a reproducible setup checked into version control.

Step 1: Generate a starter config

thv vmcp init discovers running workloads in a group and writes a starter YAML file with one backend entry per accessible workload:

thv vmcp init --group my-group --output vmcp.yaml

Omit --output to write the generated YAML to standard output instead.

The generated file includes inline comments describing each section. A minimal example looks like this:

vmcp.yaml
# Generated by `thv vmcp init`. Review and customize before use.

name: my-group-vmcp
groupRef: my-group

incomingAuth:
  type: anonymous

outgoingAuth:
  source: inline

aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: '{workload}_'

backends:
  - name: fetch
    url: http://127.0.0.1:12345/sse
    transport: sse
  - name: github
    url: http://127.0.0.1:12346/mcp
    transport: streamable-http
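To make the prefix conflict resolution in this config concrete, the sketch below shows how a prefixFormat of '{workload}_' could turn two identically named tools into distinct names. It is an illustration only: the function prefixed_tool_name and the tool name search are hypothetical, and the code is not how vMCP implements the rename.

```shell
# Illustrative only: expands a prefixFormat such as '{workload}_' for one tool.
# prefixed_tool_name FORMAT WORKLOAD TOOL
prefixed_tool_name() {
  format=$1
  workload=$2
  tool=$3
  placeholder='{workload}'
  case "$format" in
    *"$placeholder"*)
      # Text before and after the first {workload} placeholder.
      head=${format%%"$placeholder"*}
      tail=${format#*"$placeholder"}
      printf '%s%s%s%s\n' "$head" "$workload" "$tail" "$tool"
      ;;
    *)
      # No placeholder: treat the format as a fixed prefix.
      printf '%s%s\n' "$format" "$tool"
      ;;
  esac
}

# "search" is a made-up tool name that both backends are assumed to expose.
prefixed_tool_name '{workload}_' fetch search
prefixed_tool_name '{workload}_' github search
```

Under these assumptions the two calls print fetch_search and github_search, so a client can tell the two backends' tools apart.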

Step 2: Review and edit

Customize the generated config. Common edits include:

  • Changing incomingAuth from anonymous to oidc to require authenticated clients.
  • Adding tool filters, renames, or overrides under each backend.
  • Configuring the optimizer under an optimizer section.

See Configure vMCP for the full schema.

Step 3: Validate the config

thv vmcp validate --config vmcp.yaml

Validation checks YAML syntax, required fields, middleware configuration, and backend settings. It exits 0 on success and non-zero with a descriptive message otherwise.
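Because validation signals failure through its exit code, you can gate the server start on it in a wrapper script. The sketch below uses only the two commands shown in this guide; the function name validate_then_serve and the THV variable (which lets you point at a different binary) are assumptions for illustration.

```shell
# THV is an illustrative override for the binary name; it defaults to thv.
THV=${THV:-thv}

# validate_then_serve CONFIG
# Starts the server only when validation exits 0; otherwise returns the
# validator's exit code without starting anything.
validate_then_serve() {
  config=$1
  if "$THV" vmcp validate --config "$config"; then
    "$THV" vmcp serve --config "$config"
  else
    status=$?
    echo "validation failed for $config (exit $status); server not started" >&2
    return "$status"
  fi
}
```

Call it as validate_then_serve vmcp.yaml from a script or CI job that sources this function.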

Step 4: Start the server

thv vmcp serve --config vmcp.yaml

When both --config and --group are set, --config takes precedence.
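The precedence rule can be summarized as a small decision function. This is a sketch of the documented behavior, not ToolHive source code: the name serve_mode is hypothetical, and treating "neither flag set" as an error is an assumption, since this guide does not describe that case.

```shell
# serve_mode CONFIG GROUP
# Prints which mode the documented precedence rule would select.
serve_mode() {
  config=$1
  group=$2
  if [ -n "$config" ]; then
    # --config wins even when --group is also set.
    echo "config-file"
  elif [ -n "$group" ]; then
    echo "quick"
  else
    # Assumption: the guide does not say what happens with neither flag.
    echo "error: set --config or --group" >&2
    return 1
  fi
}

serve_mode vmcp.yaml my-group
serve_mode "" my-group
```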

Optimizer tiers

thv vmcp serve supports four tiers of tool optimization. Tier 0 is the default; tiers 1 through 3 replace the full backend tool list with find_tool and call_tool primitives that search the aggregated tool set. Tier 1 uses FTS5 keyword search only; tiers 2 and 3 add semantic embeddings on top for hybrid search.

| Tier | Flag or setting | Search | External service |
| --- | --- | --- | --- |
| 0 | (none) | None - all tools passed through | None |
| 1 | --optimizer | FTS5 keyword (in-process) | None |
| 2 | --optimizer-embedding | FTS5 + TEI semantic | Managed TEI container |
| 3 | optimizer.embeddingService in config YAML | FTS5 + external embedding | User-managed embedding server |

Tier 2 implies Tier 1: --optimizer-embedding also enables the keyword index. For Tier 2, ToolHive starts and stops a HuggingFace Text Embeddings Inference (TEI) container named thv-embedding-<hash> automatically. Customize the model and image with --embedding-model and --embedding-image.
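As a quick way to reason about which tier a given command line selects, the following sketch maps the two optimizer flags to a tier number. It is illustrative only: the function optimizer_tier is hypothetical, and Tier 3 is deliberately not covered because it is set in the config YAML rather than by a flag.

```shell
# optimizer_tier ARGS...
# Prints 0, 1, or 2 based on the optimizer flags present in ARGS.
# Tier 3 is configured in YAML (optimizer.embeddingService) and is not detected here.
optimizer_tier() {
  tier=0
  for arg in "$@"; do
    case "$arg" in
      --optimizer)
        if [ "$tier" -lt 1 ]; then tier=1; fi
        ;;
      --optimizer-embedding)
        # Tier 2 implies Tier 1, so it always wins over --optimizer.
        tier=2
        ;;
    esac
  done
  echo "$tier"
}

optimizer_tier --group my-group --optimizer
optimizer_tier --group my-group --optimizer --optimizer-embedding
```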

For the conceptual background and tuning parameters, see Optimize tool discovery and Tool optimization.

Enable audit logging

Add --enable-audit to thv vmcp serve to turn on audit logging with default settings when the loaded config doesn't already define an audit section:

thv vmcp serve --group my-group --enable-audit

For audit configuration options, see Audit logging.

Command reference

All thv vmcp flags, with their defaults:

thv vmcp serve

| Flag | Default | Description |
| --- | --- | --- |
| --config, -c | (empty) | Path to a vMCP configuration file |
| --group | (empty) | ToolHive group name for quick mode (used when --config is not set) |
| --host | 127.0.0.1 | Bind address (quick mode requires a loopback address) |
| --port | 4483 | TCP port to listen on |
| --enable-audit | false | Enable audit logging with default configuration |
| --optimizer | false | Enable Tier 1 FTS5 keyword optimizer |
| --optimizer-embedding | false | Enable Tier 2 semantic optimizer (implies --optimizer) |
| --embedding-model | BAAI/bge-small-en-v1.5 | HuggingFace model name for the managed TEI container |
| --embedding-image | ghcr.io/huggingface/text-embeddings-inference:cpu-latest | TEI container image |

thv vmcp init

| Flag | Default | Description |
| --- | --- | --- |
| --group, -g | (required) | ToolHive group name whose workloads are discovered |
| --output, -o | stdout | Output file path for the generated config |
| --config, -c | stdout | Alias for --output |

thv vmcp validate

| Flag | Default | Description |
| --- | --- | --- |
| --config, -c | (required) | Path to the vMCP configuration file to validate |

For full CLI help, run thv vmcp --help or see thv vmcp in the reference.

Compared to the Kubernetes deployment

| Aspect | Local CLI (thv vmcp) | Kubernetes (VirtualMCPServer CRD) |
| --- | --- | --- |
| Runtime | Foreground process | Pod managed by the operator |
| Configuration | CLI flags or local YAML file | VirtualMCPServer custom resource |
| Backend discovery | Reads ToolHive groups on the local machine | Reads MCPGroup resources in the cluster |
| Authentication | Anonymous in quick mode; configurable in files | Full OIDC integration via CRD fields |
| Lifecycle | Tied to the terminal session | Managed declaratively, survives restarts |
| Embedding server | Managed TEI container (Tier 2) | EmbeddingServer custom resource |

The underlying aggregation, tool routing, and optimizer logic are the same. Use the local CLI for development and single-user workflows; use the Kubernetes deployment for shared, production, or multi-user environments.

Next steps