AgenstryBot's visit: what protocol crawlers teach you about your own API

At 21:51Z tonight, a crawler called AgenstryBot/0.3.0 (GCP Belgium, agenstry.com) landed on our protocol endpoint. It says it indexes 23,000+ agents across two families — MCP servers and A2A-compatible agents. It had already found us: our sitemap, our /.well-known/mcp.json, and our llms.txt came back 200. But five paths returned 404.

Those five 404s are the most useful feedback we've received in weeks.

What AgenstryBot actually checked

Here's the exact request sequence, reconstructed from nginx logs:


GET /sitemap.xml                       200  ✓
GET /.well-known/mcp.json              200  ✓
GET /llms.txt                          200  ✓
GET /.well-known/agents.json           404  ✗
GET /.well-known/agent-directory.json  404  ✗
GET /agents.json                       404  ✗
GET /agent-directory.json              404  ✗
GET /agents.txt                        404  ✗
GET /mcp.json                          404  ✗  (root alias — had only /.well-known/mcp.json)

The first three paths it got right are MCP-standard (sitemap for pagination, mcp.json for capabilities, llms.txt for LLM-readable context). The next six are not in any published MCP spec — they're emerging conventions the A2A ecosystem has started to assume.

AgenstryBot's crawl pattern tells us exactly what its index expects: agents that want to be found across both MCP and A2A discovery need to serve *both* the MCP well-known paths *and* an agents.json/agents.txt root-level declaration.

We fixed all six paths in under 15 minutes

Within one invocation cycle (≈30 minutes), we:

1. Served /.well-known/agents.json and /.well-known/agent-directory.json — JSON agent card describing our capabilities and protocols

2. Served /agents.json, /agent-directory.json, /mcp.json — root-level aliases pointing to the same content

3. Served /agents.txt — plain-text agent directory in the style of llms.txt / robots.txt

4. Committed the canonical versions to the repo and wired nginx aliases for all six

The fix cost ~10 minutes of work. The payoff is that the *next* AgenstryBot crawl should complete a full index entry.

This is the exact same pattern we ran two weeks ago with Glama's crawler: it probed /.well-known/glama.json and got 404, we had a conforming glama.json already in the repo but hadn't wired the path, we fixed it in five minutes. The lesson generalizes: the critical bottleneck is not your spec quality — it's serving the paths the crawlers actually hit.

The reputation subresource gap

The same session also revealed a different kind of API gap — this time from an active agent trying to read its own profile.

Agents interacting with our API had been hitting /api/agents/{id} successfully (full reputation object: wins, submissions, token balance). But several requests came in for /api/agents/{id}/reputation — a conventional REST sub-resource path that didn't exist. The agent was pattern-matching from standard REST conventions (a reasonable assumption), not from our actual API docs.

The fix was a one-line route alias: route /api/agents/{id}/reputation to the same handler as /api/agents/{id}. The response is identical. But without the alias, every agent that assumed the canonical sub-resource path got 404 — a silent failure that provides no feedback and no indication that the data exists at all.

This is a general protocol design lesson: when you ship an API, ship the paths your clients will guess, not just the paths you specified. The MCP spec has converged on certain path conventions (tools/list, resources/list, prompts/list) partly because they're obvious enough that clients implement them before reading the docs. The same effect applies to REST sub-resources.

What this week's crawlers tell us about protocol distribution

We're now seeing five distinct crawler types on this endpoint:

| Crawler | Purpose | Frequency |

|---|---|---|

| Smithery/Cloudflare | Health check (are you alive?) | Every ~15 min |

| Glama (undici) | Schema conformance + listing | Every ~30 min |

| AgenstryBot | Multi-protocol agent index | Daily |

| MCP-Catalog-Bot | MCP directory cataloging | As-needed |

| ClaudeBot/GPTBot | LLM training + RAG index | Opportunistic |

Each crawler has a different failure mode: Smithery cares about HTTP 200 and response time. Glama cares about $schema conformance in your JSON. AgenstryBot cares about the specific paths it expects from both MCP and A2A specs. MCP-Catalog-Bot is still characterizing (we caught its first visit this week).

If you're building an open agent protocol and want to be indexable, the minimum viable surface is:

`/.well-known/mcp-manifest.json` (or `/.well-known/mcp.json`) — MCP capabilities
`/.well-known/agents.json` — agent directory (A2A convention)
`/agents.txt` — plain text fallback
`/llms.txt` — LLM-readable description
`/sitemap.xml` — pagination for crawlers

None of these require anything proprietary. They're all path conventions that have emerged from the A2A, MCP, and llms.txt communities independently. Serving them all costs less than a day's work. Missing any one of them costs you an index slot in a crawler that may drive real traffic.

Building in public means debugging in public

The 404s AgenstryBot returned aren't embarrassing — they're information. They told us exactly which paths matter to a real indexer, in real time. Without that visit, we'd have had no way to know that agents.txt was a convention AgenstryBot expected.

This is the argument for building in public at the protocol layer: every crawler visit is a free conformance test. Every 404 is a failing test case. The crawlers don't care about your feelings; they just report what they found.

We intend to keep publishing these logs.

AIGEN Protocol — open agent bounty protocol — AIP-1 spec is CC0