# nlpm

Natural-Language Programming Manager -- discover, score, check, and fix NL artifacts with Claude-native intelligence.

Part of the [xiaolai Claude plugin marketplace](https://github.com/xiaolai/claude-plugin-marketplace).

## What it does

NLPM treats natural language artifacts as **programs that can be linted**. Just as ESLint scores JavaScript and ruff scores Python, NLPM scores the markdown files that drive AI behavior: skills, agents, commands, rules, hooks, prompts, CLAUDE.md, and memory files.

Eight commands, each doing one thing:

| Command | What it does |
|---------|--------------|
| `/nlpm:ls` | Discover and inventory all NL artifacts in a repo |
| `/nlpm:score` | Score artifact quality (100-point scale) |
| `/nlpm:check` | Cross-component consistency checks |
| `/nlpm:fix` | Auto-fix fixable issues |
| `/nlpm:trend` | Track quality score trends over time |
| `/nlpm:test` | Run NL artifact tests against spec files (TDD) |
| `/nlpm:init` | Initialize NLPM for a project |
| `/nlpm:security-scan` | Scan plugins for security risks in executable artifacts |

Claude-native -- no Codex, no external models, no API keys, no runtime dependencies.

## Installation

```bash
# Project scope (recommended)
claude plugin install nlpm@xiaolai --scope project

# Global (all projects)
claude plugin install nlpm@xiaolai --scope user
```

> **Install fails with "Plugin not found in marketplace 'xiaolai'"?** Your local marketplace clone is stale. Run `claude plugin marketplace update xiaolai` and retry -- `plugin install` does not auto-refresh.

## Quick Start

```
/nlpm:ls               # see what NL artifacts you have
/nlpm:score            # score them all
/nlpm:score agents/    # score just agents
/nlpm:score --changed  # score only git-changed files
/nlpm:check            # check cross-component consistency
/nlpm:fix              # auto-fix what's fixable
/nlpm:trend            # track score history over time
/nlpm:test             # run NL-TDD specs
```

## Scoring System

Scores start at 100 and go down. Every issue has a fixed penalty. The score is deterministic: same artifact, same penalties, same number.

| Score | Band | Meaning |
|-------|------|---------|
| 90-100 | Excellent | Production-ready |
| 80-89 | Good | Minor gaps |
| 70-79 | Adequate | Meets threshold, should improve |
| 60-69 | Weak | Below threshold |
| <60 | Rewrite | Fundamental problems |

Default pass threshold: 70. Configure in `.claude/nlpm.local.md`.

See `skills/nlpm/scoring/SKILL.md` for the full penalty tables.
See `skills/nlpm/rules/SKILL.md` for the 50 Rules of Natural Language Programming.
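To make the model concrete, here is a minimal sketch of the deterministic scoring loop in Python. The issue names and penalty values are hypothetical placeholders, not NLPM's actual tables (those live in `skills/nlpm/scoring/SKILL.md`):

```python
# Minimal sketch of NLPM's deterministic scoring model.
# Issue names and penalty values are hypothetical examples only;
# the real tables live in skills/nlpm/scoring/SKILL.md.

PENALTIES = {
    "vague_quantifier": 2,   # e.g. "appropriate", "as needed" (R01)
    "missing_example": 5,    # hypothetical value
    "oversized_skill": 10,   # hypothetical value
}

def score(issues: list[str], threshold: int = 70) -> tuple[int, bool]:
    """Start at 100, subtract a fixed penalty per issue, floor at 0."""
    total = max(0, 100 - sum(PENALTIES[i] for i in issues))
    return total, total >= threshold

# Same artifact -> same issue list -> same score, every run.
print(score(["vague_quantifier", "vague_quantifier", "missing_example"]))
# (91, True)
```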
## What it scores

13 artifact types across 3 categories:

| Category | Artifacts |
|----------|-----------|
| A: Plugin | commands, shared partials, agents, skills, hooks, plugin.json, .mcp.json |
| B: Project | CLAUDE.md, .claude/rules/, settings files |
| F: Memory | ~/.claude/projects/*/memory/*.md |

## NL-TDD

Write test specs BEFORE writing artifacts:

```
1. Write spec:     .nlpm-test/my-agent.spec.md
2. /nlpm:test      -> RED (artifact doesn't exist)
3. Write artifact: agents/my-agent.md
4. /nlpm:test      -> check trigger accuracy, output format, score
5. /nlpm:score     -> verify quality score
6. Iterate         -> fix until GREEN
```

See `skills/nlpm/testing/SKILL.md` for the full spec format.

## Configuration

Create `.claude/nlpm.local.md` (or run `/nlpm:init`):

```yaml
---
strictness: standard
score_threshold: 70
rule_overrides:
  R09: { min_examples: 1 }  # require only 1 example block
  R05: { threshold: 600 }   # allow skills up to 600 lines
  R23: { budget: 800 }      # increase rules budget
---
```

| Level | Threshold | Effect |
|-------|-----------|--------|
| Relaxed | 60 | Only flag seriously broken artifacts |
| Standard | 70 | Flag artifacts that need improvement |
| Strict | 80 | Flag anything below good quality |

## Continuous Enforcement

NLPM ships a `PostToolUse` hook that fires when you write or edit files. A shell script (`scripts/check-artifact.sh`) classifies the file -- if it's an NL artifact, Claude reminds you to run `/nlpm:score`. Non-NL files produce no output.

This is advisory -- it does not block writes. For blocking enforcement, use a `PreToolUse` hook (see tdd-guardian for an example).
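As a rough illustration of what that classification step does -- the real classifier is a shell script, and the path patterns below are assumptions rather than its actual rules -- a Python equivalent might look like this:

```python
# Rough Python illustration of an advisory artifact classifier in the
# spirit of scripts/check-artifact.sh. The actual script is shell, and
# these path patterns are assumptions, not its real rules.
from pathlib import PurePath

def is_nl_artifact(path: str) -> bool:
    p = PurePath(path)
    if p.name == "CLAUDE.md":
        return True
    if p.suffix != ".md":
        return False
    # Assumed NL directories; the real script's patterns may differ.
    return any(d in p.parts for d in ("commands", "agents", "skills", "rules"))

def check(path: str) -> None:
    # Advisory only: print a reminder for NL artifacts, stay silent otherwise.
    if is_nl_artifact(path):
        print(f"{path}: NL artifact changed -- consider running /nlpm:score")

check("agents/my-agent.md")  # prints a reminder
check("src/main.py")         # silent
```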
## Architecture

```
commands/                User-facing commands (8 + 2 shared partials)
  ls.md                  Discover artifacts -> dispatches scanner
  score.md               Score quality -> dispatches scorer + vague-scanner in parallel
  check.md               Cross-component checks -> dispatches checker
  fix.md                 Auto-fix issues -> dispatches scorer
  trend.md               Track score history -> dispatches scorer + vague-scanner
  test.md                Run NL-TDD specs -> dispatches tester
  init.md                Configure project
  security-scan.md       Scan plugins for security risks -> dispatches security-scanner
  shared/
    discover.md          Artifact path patterns (not user-invocable)
    classify.md          Type classification rules (not user-invocable)

agents/                  Dispatched by commands (6 agents)
  scanner.md             haiku  -- fast artifact discovery
  scorer.md              sonnet -- 100-point quality scoring
  checker.md             sonnet -- cross-component consistency
  vague-scanner.md       haiku  -- mechanical vague-word counting
  tester.md              sonnet -- evaluates artifacts against test specs
  security-scanner.md    sonnet -- security risk detection in executable artifacts

skills/nlpm/             Knowledge base (13 skills)
  Core (loaded by agents):
    conventions/         Claude Code schemas, hook events, naming patterns
    patterns/            NL programming best practices + anti-patterns
    scoring/             Penalty tables with rule number cross-references
    rules/               The 50 Rules of Natural Language Programming (R01-R50)
    testing/             NL-TDD spec format, test patterns
    security/            Security pattern database for executable artifact scanning
  Writing Reference (loaded on demand):
    writing-skills/      How to write SKILL.md files
    writing-agents/      How to write agent definitions
    writing-rules/       How to write .claude/rules/ files
    writing-prompts/     Universal prompt engineering guide
    writing-hooks/       How to write Claude Code hooks
    writing-plugins/     How to design and build plugins
    orchestration/       Multi-agent workflow patterns

hooks/
  hooks.json             PostToolUse advisory (command type + check-artifact.sh)

scripts/
  check-artifact.sh      NL artifact classifier for the PostToolUse hook

.nlpm-test/              Self-test specs (dogfooding NL-TDD)
```

## Tips

- **Score early, score often.** Run `/nlpm:score` after writing any new artifact.
- **Use `--changed` for speed.** `score --changed` only scores git-modified files.
- **Use `/nlpm:trend` before releases.** Catches regressions that individual scoring misses.
- **Do not chase 100.** 85+ is excellent. The last 5-10 points are diminishing returns.
- **R01 is the most common penalty.** "appropriate", "relevant", "as needed" each cost -2. Replace with measurable criteria.
- **Auto-fix handles the mechanical stuff.** Focus your energy on descriptions, examples, and scope notes.

## Troubleshooting

**"Score seems too low"** -- Check which penalties hit. Scoring is deterministic. Vague quantifiers stack up fast.

**"Writing skill didn't load"** -- Use keywords from the skill's description: "write an agent definition", "create a new agent".

**"Check found orphans that aren't really orphans"** -- Writing skills are on-demand (loaded by Claude, not referenced by agents). This is expected.

**"Trend shows no history"** -- Run `/nlpm:score` first to create the baseline snapshot.

## Case Studies

- [When the Linter Met Its Match](case-studies/2026-04-06-how-we-helped-gsd.md) -- Auditing the 48k-star `gsd-build/get-shit-done` project: 80 files scored, 5 PRs accepted, and the false positive that improved NLPM itself.
- [Four bytes of quoting, approved by two OpenAI engineers](case-studies/2026-04-07-openai-codex-plugin-cc.md) -- Auditing `openai/codex-plugin-cc`: 13 artifacts, 93/100 Gold tier, two shell-injection fixes reviewed and merged by OpenAI contributors in 39 hours.
- [The frontmatter tax: 19 silent registration failures in a 33,000-star plugin collection](case-studies/2026-04-18-wshobson-agents.md) -- Auditing `wshobson/agents`: 100 of 509 artifacts sampled, 5 PRs batched and agentically merged in 13 seconds. (A companion [learnings debrief](case-studies/2026-04-18-wshobson-agents-learnings.md) covers the sampling blind spot and pipeline race bugs surfaced by this run.)

## Auditor -- Self-Evolution Pipeline

The `auditor/` directory contains a GitHub Actions pipeline that systematically discovers, audits, and contributes to Claude Code repos across GitHub. Learnings feed back into NLPM's rules.

```
discover (weekly) → audit → contribute PRs → track merges → write case study
                                                   ↓
                                           feedback/log.json
                                                   ↓
                                update NLPM rules → audit better
```

8 workflows: `auditor-discover`, `auditor-batch-processor`, `auditor-audit`, `auditor-contribute`, `auditor-track`, `auditor-case-study`, `auditor-daily-report`, `auditor-integration-test`. Human-in-the-loop via issue labels at 3 decision points.

See [auditor/README.md](auditor/README.md) for full documentation.

## Prerequisites

None. Pure markdown plugin -- no Python, no Node.js, no compiled dependencies.

The auditor workflows require `CLAUDE_CODE_OAUTH_TOKEN`, `PAT_TOKEN`, and `OPENAI_API_KEY` secrets on the GitHub repo.

## License

ISC