Source: docs/HELP_AGENT_BENCHMARK.md

Help Agent Benchmark

This benchmark compares the current help-agent retriever against a Moss-backed candidate on the same troubleshooting prompts.

What it measures

  • Retrieval quality: whether the top returned context includes the expected troubleshooting signals.
  • Latency: time spent retrieving context for each prompt.
  • Practical fit: how many documents or chunks are returned, and how easily the candidate plugs into the existing help agent.
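The first two measurements can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the `RetrievedDoc` shape, the `expectedSignals` list, the `topK` cutoff, and the function names `signalHitRate` and `timeRetrieval` are all assumptions made for the example.

```typescript
// Hypothetical document shape; the field names mirror the candidate response example.
interface RetrievedDoc {
  source: string;
  section: string;
  content: string;
  score?: number;
}

// Fraction of expected signals found (case-insensitively) in the top-k docs.
// "Signals" are assumed here to be plain substrings; the real runner may differ.
function signalHitRate(
  docs: RetrievedDoc[],
  expectedSignals: string[],
  topK = 3,
): number {
  if (expectedSignals.length === 0) return 1;
  const haystack = docs
    .slice(0, topK)
    .map((d) => `${d.section}\n${d.content}`.toLowerCase())
    .join("\n");
  const hits = expectedSignals.filter((s) => haystack.includes(s.toLowerCase()));
  return hits.length / expectedSignals.length;
}

// Wall-clock time for one retrieval call, in milliseconds (Date.now resolution).
async function timeRetrieval<T>(
  retrieve: () => Promise<T>,
): Promise<{ result: T; ms: number }> {
  const start = Date.now();
  const result = await retrieve();
  return { result, ms: Date.now() - start };
}
```

Averaging `signalHitRate` over all prompts gives one quality number per retriever, and collecting `ms` per prompt gives a latency distribution to compare.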

Baseline

The baseline uses the current help-agent retrieval path in src/lib/helpAgentDocs.ts.

Candidate

The candidate is a Moss-backed retrieval endpoint or adapter.

The runner expects the candidate endpoint to accept a JSON body of this shape:

{
  "question": "Why did my recent deployment fail?",
  "history": []
}

and return either:

{
  "docs": [{ "source": "docs/TROUBLESHOOTING.md", "section": "Deploy fails", "content": "...", "score": 0.98 }]
}

or a compatible top-level results or items array whose entries carry the same fields (source, section, content, score).
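Accepting several response shapes usually means normalizing them into one list before scoring. The sketch below shows one way to do that; the function name `normalizeCandidateResponse`, the `docs`, then `results`, then `items` lookup order, and the choice to skip entries lacking a string `source` or `content` are assumptions, not the runner's confirmed behavior.

```typescript
interface RetrievedDoc {
  source: string;
  section: string;
  content: string;
  score?: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accepts { docs: [...] }, { results: [...] }, or { items: [...] } and returns
// a uniform RetrievedDoc[]; malformed entries are dropped rather than thrown on.
function normalizeCandidateResponse(body: unknown): RetrievedDoc[] {
  if (!isRecord(body)) return [];

  // Assumed precedence: the first key that holds an array wins.
  let raw: unknown[] = [];
  for (const key of ["docs", "results", "items"]) {
    const value = body[key];
    if (Array.isArray(value)) {
      raw = value;
      break;
    }
  }

  const docs: RetrievedDoc[] = [];
  for (const entry of raw) {
    if (!isRecord(entry)) continue;
    const { source, section, content, score } = entry;
    if (typeof source !== "string" || typeof content !== "string") continue;
    docs.push({
      source,
      section: typeof section === "string" ? section : "",
      content,
      ...(typeof score === "number" ? { score } : {}),
    });
  }
  return docs;
}
```

Dropping bad entries instead of throwing keeps a single malformed chunk from aborting a whole benchmark run, at the cost of hiding candidate bugs; a stricter variant could log or count the skipped entries.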

Prompts

The prompt set lives in benchmarks/help-agent-benchmark.prompts.json and includes:

  • deployment failures
  • startup issues
  • auth and approval flow issues
  • GitHub connection problems
  • Supabase and schema issues
  • WebSocket connectivity problems
  • AWS permission issues
  • custom domain / self-hosting issues

Run it

npm run benchmark:help-agent

Optional candidate comparison against an HTTP endpoint:

MOSS_BENCHMARK_URL=http://localhost:8080/api/retrieve npm run benchmark:help-agent

Direct Moss mode (SDK):

MOSS_PROJECT_ID=<your_project_id> MOSS_PROJECT_KEY=<your_project_key> npm run benchmark:help-agent

If MOSS_BENCHMARK_URL is not set, the runner tries direct Moss SDK mode automatically.
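The precedence between the two candidate modes can be pictured as a small pure function over the environment. This is only a sketch: `resolveCandidateMode` and the `CandidateMode` type are hypothetical names, and the baseline-only fallback when no Moss variables are set is an assumption rather than documented behavior.

```typescript
type CandidateMode =
  | { kind: "endpoint"; url: string }
  | { kind: "sdk"; projectId: string; projectKey: string }
  | { kind: "none" };

// Endpoint mode wins when MOSS_BENCHMARK_URL is set; otherwise try SDK mode.
function resolveCandidateMode(
  env: Record<string, string | undefined>,
): CandidateMode {
  const url = env.MOSS_BENCHMARK_URL?.trim();
  if (url) return { kind: "endpoint", url };

  const projectId = env.MOSS_PROJECT_ID?.trim();
  const projectKey = env.MOSS_PROJECT_KEY?.trim();
  if (projectId && projectKey) return { kind: "sdk", projectId, projectKey };

  // Assumption: with no candidate configured, only the baseline is benchmarked.
  return { kind: "none" };
}
```

In a Node script this would be called as `resolveCandidateMode(process.env)`; taking the environment as a parameter keeps the function easy to test.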

Debug mode:

HELP_AGENT_BENCHMARK_DEBUG=1 npm run benchmark:help-agent

Keep the temporary Moss index (for dashboard inspection):

KEEP_MOSS_INDEX=1 HELP_AGENT_BENCHMARK_DEBUG=1 npm run benchmark:help-agent

Optional JSON output:

HELP_AGENT_BENCHMARK_OUT=benchmarks/help-agent-benchmark-results.json npm run benchmark:help-agent

Decision rule

Adopt Moss only if it clearly improves at least one of the following without adding too much complexity:

  • retrieval quality on real troubleshooting prompts
  • latency at the retrieval layer
  • scalability or maintainability of the retrieval stack

If Moss performs only roughly on par with the baseline, keep the current implementation.
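The two measurable parts of this rule could be encoded as a simple check over summary metrics. Everything here is illustrative: the `Summary` fields, the function name `shouldAdoptCandidate`, and the threshold values are hypothetical, and "clearly improves" is interpreted as "beats the baseline by a fixed margin". Scalability, maintainability, and added complexity remain judgment calls that this sketch does not capture.

```typescript
// Hypothetical per-retriever summary produced by a benchmark run.
interface Summary {
  meanHitRate: number;   // 0..1, higher is better
  p50LatencyMs: number;  // median retrieval latency, lower is better
}

// Hypothetical margins for what counts as a "clear" improvement.
interface Thresholds {
  minQualityGain: number;  // absolute gain in mean hit rate
  minLatencyGain: number;  // relative latency reduction, e.g. 0.2 = 20% faster
}

function shouldAdoptCandidate(
  baseline: Summary,
  candidate: Summary,
  thresholds: Thresholds = { minQualityGain: 0.1, minLatencyGain: 0.2 },
): boolean {
  const qualityGain = candidate.meanHitRate - baseline.meanHitRate;
  const latencyGain =
    baseline.p50LatencyMs > 0
      ? (baseline.p50LatencyMs - candidate.p50LatencyMs) / baseline.p50LatencyMs
      : 0;
  // Roughly equal results fall below both margins and keep the baseline.
  return (
    qualityGain >= thresholds.minQualityGain ||
    latencyGain >= thresholds.minLatencyGain
  );
}
```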