# semantic-code-search-mcp-server

**Repository Path**: mirrors_elastic/semantic-code-search-mcp-server

## Basic Information

- **Project Name**: semantic-code-search-mcp-server
- **Description**: This project includes a Model Context Protocol (MCP) server that exposes the indexed data through a standardized set of tools. This allows AI coding agents to interact with the indexed codebase in a structured way.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-05
- **Last Updated**: 2026-04-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Semantic Code Search MCP Server

This project includes a Model Context Protocol (MCP) server that exposes the indexed data through a standardized set of tools. This allows AI coding agents to interact with the indexed codebase in a structured way.

## Prerequisites

You must index your code base with the Semantic Code Search Indexer found here: https://github.com/elastic/semantic-code-search-indexer

### Index model expected by this MCP server

This MCP server expects the **locations-first** index model from indexer PR `elastic/semantic-code-search-indexer#135`:

- `` stores **content-deduplicated chunk documents** (semantic search + metadata).
- `_locations` stores **one document per chunk occurrence** (file path + line ranges + directory/git metadata) and references chunks by `chunk_id`.

Several tools query `_locations` and join back to `` via `chunk_id` (typically using `mget`).

## Running with Docker

The easiest way to run the MCP server is with Docker. The server is available on Docker Hub as `simianhacker/semantic-code-search-mcp-server`.
To ensure you have the latest version of the image, run the following command before running the server:

```bash
docker pull simianhacker/semantic-code-search-mcp-server
```

### HTTP Mode

This mode is useful for running the server in a containerized environment where it needs to be accessible over the network.

```bash
docker run --rm -p 3000:3000 \
  -e ELASTICSEARCH_ENDPOINT= \
  simianhacker/semantic-code-search-mcp-server
```

Replace `` with the actual endpoint of your Elasticsearch instance.

### STDIO Mode

This mode is useful for running the server as a local process that an agent can communicate with over `stdin` and `stdout`.

**With Elasticsearch Endpoint:**

```bash
docker run -i --rm \
  -e ELASTICSEARCH_ENDPOINT= \
  simianhacker/semantic-code-search-mcp-server \
  node dist/src/mcp_server/bin.js stdio
```

**With Elastic Cloud ID:**

```bash
docker run -i --rm \
  -e ELASTICSEARCH_CLOUD_ID= \
  -e ELASTICSEARCH_API_KEY= \
  simianhacker/semantic-code-search-mcp-server \
  node dist/src/mcp_server/bin.js stdio
```

The `-i` flag is important: it tells Docker to run the container in interactive mode (keeping `stdin` open), which is necessary for the server to receive input from `stdin`.

### Connecting a Coding Agent

You can connect a coding agent to the server in either HTTP or STDIO mode.

**HTTP Mode:**

For agents that connect over HTTP, like the Gemini CLI, you can add the following to your `~/.gemini/settings.json` file:

```json
{
  "mcpServers": {
    "Semantic Code Search": {
      "trust": true,
      "httpUrl": "http://localhost:3000/mcp/"
    }
  }
}
```

**STDIO Mode:**

For agents that connect over STDIO, you need to configure them to run the Docker command directly.
Here's an example for the Gemini CLI in your `~/.gemini/settings.json` file:

```json
{
  "mcpServers": {
    "SemanticCodeSearch": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-e", "ELASTICSEARCH_CLOUD_ID=",
        "-e", "ELASTICSEARCH_API_KEY=",
        "-e", "ELASTICSEARCH_INDEX=",
        "simianhacker/semantic-code-search-mcp-server",
        "node", "dist/src/mcp_server/bin.js", "stdio"
      ]
    }
  }
}
```

Remember to replace the placeholder values for your Cloud ID, API key, and index name.

## Setup and Installation

### 1. Prerequisites

- Node.js (v20 or later)
- npm
- A running Elasticsearch instance (v8.0 or later) with the **ELSER model downloaded and deployed**.

### 2. Clone the Repository and Install Dependencies

```bash
git clone
cd semantic-code-search-mcp-server
npm install
```

### 3. Configure Environment Variables

Copy the `.env.example` file and update it with your Elasticsearch credentials.

```bash
cp .env.example .env
```

### 4. Compile the Code

The multi-threaded worker requires the project to be compiled to JavaScript.

```bash
npm run build
```

### Running the Server

The MCP server can be run in two modes:

**1. Stdio Mode:** This is the default mode. The server communicates over `stdin` and `stdout`.

```bash
npm run mcp-server
```

**2. HTTP Mode:** This mode is useful for running the server in a containerized environment like Docker.

```bash
npm run mcp-server:http
```

The server will listen on port 3000 by default. You can change the port by setting the `PORT` environment variable.

### Usage with NPX

You can also run the MCP server directly from the git repository using `npx`. This is a convenient way to run the server without having to clone the repository.

**Stdio Mode:**

```bash
ELASTICSEARCH_ENDPOINT=http://localhost:9200 npx github:elastic/semantic-code-search-mcp-server
```

**HTTP Mode:**

```bash
PORT=8080 ELASTICSEARCH_ENDPOINT=http://localhost:9200 npx github:elastic/semantic-code-search-mcp-server http
```

### Available Prompts

| Prompt | Description |
| --- | --- |
| `StartInvestigation` | This prompt helps you start a "chain of investigation" to understand a codebase and accomplish a task. It follows a structured workflow that leverages the available tools to explore the code, analyze its components, and formulate a plan. |

**Example:**

```
/StartInvestigation --task="add a new route to the kibana server"
```

### Available Tools

The MCP server provides the following tools:

| Tool | Description |
| --- | --- |
| `semantic_code_search` | Performs a semantic search on the code chunks in the index. This tool can combine a semantic query with a KQL filter to provide flexible and powerful search capabilities. |
| `map_symbols_by_query` | Query for a structured map of files containing specific symbols, grouped by file path. This is useful for finding all the symbols in a specific file or directory. Accepts an optional `size` parameter to control the number of files returned. |
| `symbol_analysis` | Analyzes a symbol and returns a report of its definitions, call sites, and references. This is useful for understanding the role of a symbol in the codebase. |
| `read_file_from_chunks` | Reads the content of a file from the index, providing a reconstructed view based on the most important indexed chunks. |
| `document_symbols` | Analyzes a file to identify the key symbols that would most benefit from documentation. This is useful for automating the process of improving the semantic quality of a codebase. |
| `auth_status` | Returns your current OAuth authentication status: client ID, granted scopes, and token expiry. Only available when `SCS_MCP_OAUTH_ENABLED=true`. Never includes the token value. |

**Note:** All of the tools accept an optional `index` parameter that allows you to override the `ELASTICSEARCH_INDEX` for a single query.

---

## OAuth 2.0 Authentication (HTTP mode)

The HTTP server supports OAuth 2.0 bearer token authentication. When enabled, MCP clients (Claude Code, VS Code, Cursor) automatically discover the authorization server, obtain a token, and present it on every request. The server only validates tokens; it never issues them.

### Prerequisites

- An OIDC-compliant authorization server (Okta, Auth0, Keycloak, etc.)
- **The server must be reachable at its own dedicated (sub)domain.** MCP clients fetch `/.well-known/oauth-protected-resource` from the root of the server's domain to discover the authorization server. This well-known URI must resolve at the domain root per [RFC 8615](https://www.rfc-editor.org/rfc/rfc8615) section 3 and [RFC 9728](https://www.rfc-editor.org/rfc/rfc9728) section 3. A subpath deployment (e.g. `https://shared.example.com/my-mcp`) will not work.
- **Okta app type must be SPA (not Web).** MCP clients use the Authorization Code + PKCE flow ([RFC 7636](https://www.rfc-editor.org/rfc/rfc7636)) without a client secret. Web app type requires a client secret for code exchange and will fail.

### JWKS validation (default, no secrets required)

The server validates JWTs locally using the provider's public JWKS endpoint discovered from the issuer's OIDC configuration.

```bash
SCS_MCP_OAUTH_ENABLED=true
SCS_MCP_OAUTH_ISSUER=https://your-okta.okta.com/oauth2/default
SCS_MCP_SERVER_URL=https://your-server.example.com   # must be the server's public URL

# Optional:
SCS_MCP_OAUTH_AUDIENCE=api://default       # for Okta non-URL audience strings
SCS_MCP_OAUTH_REQUIRED_SCOPES=openid       # space-separated; minimum "openid" for Okta
```

### Token introspection (opt-in, requires client credentials)

Activated when both `SCS_MCP_OAUTH_CLIENT_ID` and `SCS_MCP_OAUTH_CLIENT_SECRET` are set.
The server calls the provider's [RFC 7662](https://www.rfc-editor.org/rfc/rfc7662) introspection endpoint on every request. Use this when the provider issues opaque (non-JWT) tokens, or when real-time revocation checking is required.

```bash
SCS_MCP_OAUTH_ENABLED=true
SCS_MCP_OAUTH_ISSUER=https://your-keycloak.com/realms/myrealm
SCS_MCP_OAUTH_CLIENT_ID=my-resource-server
SCS_MCP_OAUTH_CLIENT_SECRET=super-secret
SCS_MCP_SERVER_URL=https://your-server.example.com
```

### Docker (HTTP mode with OAuth)

```bash
docker run --rm -p 3000:3000 \
  -e ELASTICSEARCH_ENDPOINT=https://... \
  -e SCS_MCP_OAUTH_ENABLED=true \
  -e SCS_MCP_OAUTH_ISSUER=https://your-okta.okta.com/oauth2/default \
  -e SCS_MCP_SERVER_URL=https://your-server.example.com \
  -e SCS_MCP_OAUTH_REQUIRED_SCOPES=openid \
  simianhacker/semantic-code-search-mcp-server
```

### Required scopes

`SCS_MCP_OAUTH_REQUIRED_SCOPES` controls which scopes the server advertises and requires on every token. The minimum recommended value for Okta is `openid`. Setting it to an empty string causes Okta to reject the authorization request with a "no scopes configured" error.

Note: scopes like `offline_access` and `email` work with Okta and the major IDEs but are not guaranteed by any standard. Avoid them if you need M2M (client credentials) access.

### Local development without OAuth

When `SCS_MCP_SERVER_URL` is not set (or points to localhost), the server binds to `127.0.0.1` only. Set `SCS_MCP_SERVER_URL` to a non-localhost URL to bind to all interfaces (required for Docker containers and reverse proxy deployments).

### Restricting access to specific OAuth clients

By default, any token issued by the configured authorization server is accepted. To restrict access to a specific app:

```bash
SCS_MCP_OAUTH_ALLOWED_CLIENT_IDS=0oa1abc123def456gh78   # space-separated for multiple IDs
```

This is recommended when multiple OAuth apps share the same authorization server (common in Okta).
Without it, tokens from any app in the tenant that targets the same audience will be accepted. The server checks the `client_id`, `azp`, or `cid` (Okta-specific) claim in the JWT, whichever is present.

### Checking your auth status

When OAuth is enabled, an `auth_status` tool is available in all MCP clients. Ask the AI assistant to call it:

> "Call the auth_status tool"

It returns your client ID, granted scopes, and token expiry. Nothing sensitive is returned (the token itself is never included).

### Token lifetime

Clients re-authenticate when their access token expires. To reduce auth prompts, increase the access token lifetime in your authorization server. For Okta: Admin → Security → API → Authorization Servers → default → Access Policies.

---

## Configuration

Configuration is managed via environment variables in a `.env` file.

| Variable | Description | Default |
| --- | --- | --- |
| `ELASTICSEARCH_CLOUD_ID` | The Cloud ID for your Elastic Cloud instance. | |
| `ELASTICSEARCH_API_KEY` | An API key for Elasticsearch authentication. | |
| `ELASTICSEARCH_INDEX` | The name of the Elasticsearch index to use. | `semantic-code-search` |
| `SCS_MCP_OAUTH_ENABLED` | Enable OAuth 2.0 bearer token authentication (HTTP mode only). | `false` |
| `SCS_MCP_OAUTH_ISSUER` | OIDC issuer URL. Required when `SCS_MCP_OAUTH_ENABLED=true`. | |
| `SCS_MCP_OAUTH_AUDIENCE` | Expected `aud` claim override. Use for Okta non-URL audiences (e.g. `api://default`). | |
| `SCS_MCP_OAUTH_REQUIRED_SCOPES` | Space-separated scopes the server requires on every token. Minimum `openid` for Okta. | |
| `SCS_MCP_OAUTH_ALLOWED_CLIENT_IDS` | Space-separated allowlist of OAuth client IDs. Empty = any client from the issuer. | |
| `SCS_MCP_OAUTH_CLIENT_ID` | Client ID for token introspection (opt-in). Requires `SCS_MCP_OAUTH_CLIENT_SECRET`. | |
| `SCS_MCP_OAUTH_CLIENT_SECRET` | Client secret for token introspection (opt-in). | |
| `SCS_MCP_SERVER_URL` | Public URL of the server. Required for OAuth and non-localhost deployments. | |
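
The way these variables interact (JWKS versus introspection, the bind address, and the client allowlist) can be condensed into a small TypeScript sketch. It is illustrative only and is not the server's actual code: `resolveAuthSettings`, `isClientAllowed`, and the `AuthSettings` shape are hypothetical names. It also assumes that "points to localhost" means a host of `localhost`, `127.0.0.1`, or `[::1]`, that "all interfaces" can be written as `0.0.0.0`, and that the claims are checked in the order `client_id`, `azp`, `cid`.

```typescript
// Hypothetical sketch of the documented rules; names are illustrative only.
type Env = Record<string, string | undefined>;

interface AuthSettings {
  oauthEnabled: boolean;
  validation: "none" | "jwks" | "introspection";
  bindHost: "127.0.0.1" | "0.0.0.0";
  requiredScopes: string[];
  allowedClientIds: string[];
}

// Split a space-separated variable into a list, ignoring empty entries.
function splitList(value: string | undefined): string[] {
  return (value ?? "").split(/\s+/).filter((item) => item.length > 0);
}

// Extract the host part of a URL with a plain regex (no runtime-specific globals).
function hostOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/(\[[^\]]*\]|[^\/:?#]*)/i.exec(url);
  return match?.[1]?.toLowerCase() ?? "";
}

// Assumption: an unset URL or one of these three hosts counts as "localhost".
function isLocalhost(url: string | undefined): boolean {
  if (!url) return true;
  return ["localhost", "127.0.0.1", "[::1]"].includes(hostOf(url));
}

function resolveAuthSettings(env: Env): AuthSettings {
  const oauthEnabled = env.SCS_MCP_OAUTH_ENABLED === "true";
  // Introspection is opt-in: both the client ID and the secret must be set.
  const hasCredentials =
    Boolean(env.SCS_MCP_OAUTH_CLIENT_ID) && Boolean(env.SCS_MCP_OAUTH_CLIENT_SECRET);
  return {
    oauthEnabled,
    validation: !oauthEnabled ? "none" : hasCredentials ? "introspection" : "jwks",
    // Loopback only unless the public URL is a non-localhost address.
    bindHost: isLocalhost(env.SCS_MCP_SERVER_URL) ? "127.0.0.1" : "0.0.0.0",
    requiredScopes: splitList(env.SCS_MCP_OAUTH_REQUIRED_SCOPES),
    allowedClientIds: splitList(env.SCS_MCP_OAUTH_ALLOWED_CLIENT_IDS),
  };
}

// Allowlist check against the client_id, azp, or cid claim, whichever is present.
function isClientAllowed(claims: Record<string, unknown>, allowed: string[]): boolean {
  if (allowed.length === 0) return true; // empty allowlist = any client from the issuer
  const id = [claims.client_id, claims.azp, claims.cid].find((v) => typeof v === "string");
  return typeof id === "string" && allowed.includes(id);
}
```

For instance, calling `resolveAuthSettings` with `SCS_MCP_OAUTH_ENABLED=true` and a public `SCS_MCP_SERVER_URL` but no client credentials selects JWKS validation and the all-interfaces bind address in this sketch, while adding both `SCS_MCP_OAUTH_CLIENT_ID` and `SCS_MCP_OAUTH_CLIENT_SECRET` switches it to introspection.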