veda.ng
Open-Source Project

AI Discovery Standards

Every file, protocol, and technique used to make websites discoverable by AI systems, search engines, and autonomous agents. One command to set up everything.

npx ai-discovery-standards

What it does

Run one command and generate 13 AI discovery files for any web project. The CLI tool auto-detects your public/ or static/ directory, asks for your site details, and creates every file you need. Existing files are never overwritten.
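The "never overwritten" guarantee can be illustrated with a minimal Python sketch. This is not the CLI's actual implementation: the function names `detect_web_root` and `write_if_missing` are hypothetical, and falling back to the project root when neither directory exists is an assumption.

```python
import tempfile
from pathlib import Path

# Candidate directories, mirroring the two named in the text above.
CANDIDATE_DIRS = ("public", "static")


def detect_web_root(project: Path) -> Path:
    """Return the first existing candidate directory.

    Falling back to the project root is an assumption of this sketch.
    """
    for name in CANDIDATE_DIRS:
        candidate = project / name
        if candidate.is_dir():
            return candidate
    return project


def write_if_missing(path: Path, content: str) -> bool:
    """Create the file only if it does not exist; return True when written."""
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file.
        with open(path, "x", encoding="utf-8") as fh:
            fh.write(content)
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "public").mkdir()
        root = detect_web_root(project)
        print(write_if_missing(root / "humans.txt", "first"))   # file created
        print(write_if_missing(root / "humans.txt", "second"))  # left untouched
```

Opening with exclusive-create mode makes the existence check and the write a single step, so a file that appears between the two cannot be clobbered.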

One-command setup

npx ai-discovery-standards generates all 13 files

25+ AI crawlers

Complete robots.txt with every known AI bot

AEO & GEO guides

Answer Engine and Generative Engine optimization

Claude Code skill

Slash command for AI-assisted setup

Discovery Files

Static files you place on your web server to communicate with AI crawlers and agents.

File                Purpose
robots.txt          Crawler access policies for 25+ AI bots
llms.txt            Curated content summary for LLMs
llms-full.txt       Full-text content for AI ingestion
ai.txt              AI usage permissions (training, citation, indexing)
ai.json             Structured content map for AI agents
brand.txt           Brand governance rules for AI systems
ai-plugin.json      ChatGPT plugin manifest
agents.json         A2A agent capability advertisement
security.txt        Vulnerability reporting (RFC 9116)
humans.txt          Team credits and technologies
sitemap.xml         URL index with metadata
manifest.json       PWA metadata and icons
browserconfig.xml   Windows tile configuration
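As one concrete case from the list above, RFC 9116 requires a security.txt file to carry at least a Contact field and an Expires field, and places the file at /.well-known/security.txt. The following Python sketch builds such a minimal file. The function name `build_security_txt`, the 180-day default, and the example contact address are assumptions for illustration, not output of the CLI.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_security_txt(
    contact: str,
    days_valid: int = 180,
    now: Optional[datetime] = None,
) -> str:
    """Return a minimal security.txt body with the two required fields.

    `contact` should be a URI such as "mailto:security@example.com".
    """
    now = now or datetime.now(timezone.utc)
    # Expires is written as a UTC timestamp with a trailing "Z".
    expires = (now + timedelta(days=days_valid)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"Contact: {contact}\nExpires: {expires}\n"


if __name__ == "__main__":
    print(build_security_txt("mailto:security@example.com"), end="")
```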

AI Crawler Registry

All known AI crawler user-agent strings as of April 2026, organized by company.

OpenAI

GPTBot, OAI-SearchBot, ChatGPT-User

Anthropic

ClaudeBot, Claude-SearchBot, Claude-User

Google

Googlebot, Google-Extended, GoogleOther

Perplexity

PerplexityBot, Perplexity-User

Meta

meta-externalagent, meta-externalfetcher

Apple

Applebot, Applebot-Extended

Amazon

Amazonbot

ByteDance

Bytespider, TikTokSpider

Others

CCBot, cohere-ai, CopilotBot, YouBot, Diffbot
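A registry like this can be used to label incoming requests by crawler. The Python sketch below is illustrative only: the `AI_CRAWLERS` dictionary copies the tokens listed above, and `identify_crawler` is a hypothetical helper that does case-insensitive substring matching and prefers the longest token, so that `Applebot-Extended` is not reported as `Applebot`. Some tokens, such as Google-Extended, are robots.txt control tokens and may never appear in a request header.

```python
from typing import Optional

# Tokens copied from the registry above, grouped by company.
AI_CRAWLERS = {
    "OpenAI": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Anthropic": ["ClaudeBot", "Claude-SearchBot", "Claude-User"],
    "Google": ["Googlebot", "Google-Extended", "GoogleOther"],
    "Perplexity": ["PerplexityBot", "Perplexity-User"],
    "Meta": ["meta-externalagent", "meta-externalfetcher"],
    "Apple": ["Applebot", "Applebot-Extended"],
    "Amazon": ["Amazonbot"],
    "ByteDance": ["Bytespider", "TikTokSpider"],
    "Others": ["CCBot", "cohere-ai", "CopilotBot", "YouBot", "Diffbot"],
}


def identify_crawler(user_agent: str) -> Optional[tuple[str, str]]:
    """Return (company, token) for the longest matching token, or None."""
    ua = user_agent.lower()
    best: Optional[tuple[str, str]] = None
    for company, tokens in AI_CRAWLERS.items():
        for token in tokens:
            if token.lower() in ua:
                if best is None or len(token) > len(best[1]):
                    best = (company, token)
    return best


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
    print(identify_crawler(sample))
```

User-Agent headers are trivially spoofed, so a match here is a hint for logging or analytics rather than proof of identity.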

FAQ

What is llms.txt?
A Markdown file at /llms.txt that gives LLMs a curated summary of your site. It includes a title, a one-paragraph description, and organized links to your key pages. It was proposed by Jeremy Howard (Answer.AI) in 2024 and has been adopted by Anthropic, Stripe, Vercel, and Cloudflare.
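The structure described in this answer (title, short description, grouped links) can be sketched in Python as follows. The function name `build_llms_txt` and the example site data are hypothetical. The layout follows the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections of Markdown links.

```python
def build_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body.

    `sections` maps a heading to a list of (link text, URL, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(
        build_llms_txt(
            "Example Site",
            "A demo site used to illustrate the llms.txt layout.",
            {"Docs": [("Guide", "https://example.com/guide.md", "Setup steps")]},
        ),
        end="",
    )
```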
What is the difference between AEO and GEO?
AEO (Answer Engine Optimization) targets question-answer extraction by AI systems like ChatGPT and Perplexity. GEO (Generative Engine Optimization) targets citation rate and "Share of AI Voice" across all AI platforms. AEO is about being the answer. GEO is about being the cited source.
Which AI crawlers should I allow?
Treat training bots (GPTBot, ClaudeBot, Google-Extended) separately from search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot). Blocking training bots keeps your content from being absorbed into model weights. Blocking search bots keeps your pages out of the AI-generated answers those bots feed.
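The training/search split in this answer can be expressed as a robots.txt policy and checked with Python's standard `urllib.robotparser`. This is a minimal sketch: the helper names `build_robots_txt` and `is_allowed` are hypothetical, the bot lists contain only the six tokens named in this answer, and the catch-all `User-agent: *` group is an assumption.

```python
from urllib.robotparser import RobotFileParser

# The two groups named in the answer above.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]
SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def build_robots_txt(block_training: bool = True) -> str:
    """Allow search bots; allow or block training bots depending on the flag."""
    groups = [f"User-agent: {bot}\nAllow: /\n" for bot in SEARCH_BOTS]
    rule = "Disallow: /" if block_training else "Allow: /"
    groups += [f"User-agent: {bot}\n{rule}\n" for bot in TRAINING_BOTS]
    # Catch-all group for every other crawler (an assumption of this sketch).
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


def is_allowed(robots_txt: str, bot: str, url: str = "https://example.com/") -> bool:
    """Check a generated policy the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


if __name__ == "__main__":
    policy = build_robots_txt()
    for bot in TRAINING_BOTS + SEARCH_BOTS:
        print(bot, is_allowed(policy, bot))
```

robots.txt is advisory: it only affects crawlers that choose to honor it.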
What is brand.txt?
A plain-text file that tells AI systems how to represent your brand: correct name capitalization, preferred terminology, prohibited terms, tone guidance, and competitor disambiguation. It is intended to reduce hallucinations about your brand identity.
What is ai.txt?
A plain-text file declaring what AI systems may do with your content: training, indexing, citation, or summarization. It works alongside robots.txt but adds AI-specific granularity. It is not yet standardized but is gaining adoption.

Get started

$ npx ai-discovery-standards

Generates all 13 discovery files interactively. Auto-detects your project structure.

Full documentation on GitHub