The key metrics to watch are benchmark task success rate (did your MCP server actually help the agent solve more problems?), tool adoption rate (how often the agent chooses to call your tool), tool failure rate (how often those calls error out or return unusable results), and token efficiency (whether your tool saves context window or wastes it).
Building mcpbr taught me that MCP servers should be tested like APIs, not like plugins. APIs have contracts they are expected to honor, while plugins mostly need to avoid crashing. An MCP server not only needs to return data of the right size and shape, it also needs to fulfill the implicit promise made in its tool description, or agents won't keep reaching for it. These metrics go beyond simple pass/fail and capture whether your server is actually pulling its weight in an agent workflow.
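To make those four numbers concrete, here is a minimal sketch of how you might roll them up from a batch of benchmark runs. The `TaskResult` fields and the `summarize` helper are hypothetical, not mcpbr's actual output format; they just show the arithmetic behind each metric.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """One benchmark task run with the MCP server attached (hypothetical shape)."""
    solved: bool            # did the agent complete the task?
    tool_calls: int         # calls the agent made to your MCP tool
    tool_errors: int        # calls that errored or returned unusable results
    tokens_with_tool: int   # tokens consumed by the run with the server attached
    tokens_baseline: int    # tokens for the same task without the server

def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Roll the four headline metrics up across a benchmark run."""
    total = len(results)
    total_calls = sum(r.tool_calls for r in results)
    return {
        # Fraction of tasks solved with the server attached.
        "task_success_rate": sum(r.solved for r in results) / total,
        # Fraction of tasks where the agent chose to call the tool at all.
        "tool_adoption_rate": sum(r.tool_calls > 0 for r in results) / total,
        # Fraction of tool calls that failed.
        "tool_failure_rate": (sum(r.tool_errors for r in results) / total_calls
                              if total_calls else 0.0),
        # Ratio above 1.0 means the tool-assisted runs burned more tokens
        # than the baseline runs; below 1.0 means the tool saved context.
        "token_efficiency": (sum(r.tokens_with_tool for r in results)
                             / sum(r.tokens_baseline for r in results)),
    }
```

Comparing against a baseline run without the server is what makes success rate and token efficiency meaningful: a tool that is called often but doesn't move either number is just noise in the context window.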