Robots.txt & AIPREF
Virtual robots.txt management with AIPREF Content-Usage directives for granular AI content control.
Last updated Feb 21, 2026
How virtual serving works
CitedPro manages your robots.txt entirely through WordPress, without writing a physical file to disk. It hooks into WordPress's robots_txt filter at PHP_INT_MAX priority (the highest possible), so its callback runs after any other plugin that modifies robots.txt and its output is what gets served.
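WordPress filters run in ascending priority order, so a PHP_INT_MAX callback runs last and determines the final output. A minimal Python sketch of that mechanism (the function names mirror WordPress's filter API but are illustrative, not real WordPress code):

```python
# Sketch of a WordPress-style filter chain: callbacks run in ascending
# priority order, each receiving the previous callback's return value.
filters = []

def add_filter(callback, priority):
    filters.append((priority, callback))

def apply_filters(value):
    for _, callback in sorted(filters, key=lambda f: f[0]):
        value = callback(value)
    return value

# Another plugin registers at the common default priority of 10.
add_filter(lambda txt: txt + "User-agent: OtherBot\nDisallow: /\n", 10)

# CitedPro registers at the maximum priority, so it runs last and can
# replace everything produced by earlier callbacks.
add_filter(lambda txt: "# CitedPro-managed robots.txt\n", 2**63 - 1)

result = apply_filters("")  # the highest-priority callback's output wins
```

The ordering, not the registration sequence, decides who wins: even if the other plugin registered later, the PHP_INT_MAX callback still runs last.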
This virtual approach eliminates common problems with physical robots.txt files: permission errors, conflicts between plugins, and stale content that does not match your current settings.
Physical file migration
If a physical robots.txt file already exists in your WordPress root directory, CitedPro handles it gracefully. On first load, it reads the existing file, preserves any non-CitedPro rules as custom content, and begins serving robots.txt virtually. The physical file is no longer needed once CitedPro is managing the output.
Tip
After confirming CitedPro is serving your robots.txt correctly, you can safely delete the physical robots.txt file from your server. CitedPro will continue serving the content virtually.
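The migration step can be sketched as reading any existing physical file and carrying its rules forward as custom content. This is a simplified illustration, not CitedPro's actual implementation (which also filters out its own previously written rules):

```python
from pathlib import Path

def migrate_physical_robots(root="."):
    """Read an existing physical robots.txt, if any, and return its
    rules so they can be preserved as custom content in the virtual
    output. Returns an empty string when no file exists."""
    path = Path(root) / "robots.txt"
    if not path.exists():
        return ""
    # Keep every non-empty line; these become preserved custom rules.
    lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
    return "\n".join(lines)
```

Once the preserved rules are merged into the virtual output, the physical file is redundant, which is why it can be deleted afterward.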
Robots.txt output structure
CitedPro builds your robots.txt with several distinct sections, each clearly marked with comments:
- CitedPro banner: An ASCII art header identifying the output as CitedPro-managed
- AI discovery file pointers: Comment-based references to your llms.txt and llms-full.txt files
- AIPREF directives: Content-Usage preferences per the IETF standard
- Blocked bot rules: User-agent and Disallow directives for bots you have chosen to block
- Custom rules: Any additional directives you have added via the custom rules editor
- Sitemap reference: A pointer to your XML sitemap (when enabled)
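The assembly order above can be sketched as joining whichever sections are present, in the documented sequence (the section contents here are placeholders, not CitedPro's actual output):

```python
def build_robots_txt(banner, discovery, aipref, blocked, custom, sitemap):
    """Join the non-empty sections in CitedPro's documented order."""
    sections = [banner, discovery, aipref, blocked, custom, sitemap]
    return "\n\n".join(s for s in sections if s)

output = build_robots_txt(
    banner="# CitedPro",
    discovery="# llms.txt: https://example.com/llms.txt",
    aipref="Content-Usage: train-genai=disallow",
    blocked="User-agent: SomeBot\nDisallow: /",
    custom="",  # custom rules editor disabled, so this section is skipped
    sitemap="Sitemap: https://example.com/sitemap.xml",
)
```

Skipping empty sections keeps the output free of stray blank blocks when, for example, the custom rules editor is disabled.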
AIPREF Content-Usage directives
AIPREF is an IETF-proposed standard that gives website owners granular control over how AI systems use their content. Unlike simple User-agent/Disallow rules that either block or allow a bot entirely, AIPREF lets you specify exactly what types of AI usage you permit.
Navigate to CitedPro → Robots to configure these directives. Toggle AIPREF on, then set each directive to your preference.
| Directive | What it controls | Options |
|---|---|---|
| processing | Whether AI systems may process (read and analyze) your content at all | Allow / Disallow / Unstated |
| train-ai | Whether your content may be used to train any AI or machine learning models | Allow / Disallow / Unstated |
| train-genai | Whether your content may be used specifically to train generative AI models (LLMs, image generators, etc.) | Allow / Disallow / Unstated |
| search | Whether your content may be used in AI-powered search results and answer engines | Allow / Disallow / Unstated |
When set to Unstated, the directive is omitted from robots.txt, leaving the decision to each AI system's default behavior.
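The omit-when-Unstated behavior can be sketched as follows (the directive names match the table above; the function itself is illustrative):

```python
def aipref_lines(settings):
    """Emit one Content-Usage line per configured directive, skipping
    any directive left as 'unstated' so AI systems fall back to their
    own default behavior."""
    lines = []
    for directive in ("processing", "train-ai", "train-genai", "search"):
        value = settings.get(directive, "unstated")
        if value != "unstated":
            lines.append(f"Content-Usage: {directive}={value}")
    return lines

# Only explicitly set directives appear in the output.
aipref_lines({"processing": "allow", "train-genai": "disallow"})
```

With an empty settings dict, nothing is emitted at all, which matches disabling AIPREF entirely.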
Example AIPREF output
# AIPREF Content-Usage Directives
Content-Usage: processing=allow
Content-Usage: train-ai=disallow
Content-Usage: train-genai=disallow
Content-Usage: search=allow

This example allows AI systems to process your content and include it in search results, but prohibits using it for AI training. This is a common configuration for businesses that want AI visibility without contributing to training datasets.
Tip
AIPREF gives you finer control than simply blocking bots. You can allow AI-powered search engines to cite your content while blocking your content from being used to train AI models. This is the best of both worlds for most businesses.
Blocked bot rules
When you block a bot via CitedPro → Bots or the Robots tab, CitedPro adds a User-agent + Disallow: / rule for that bot. These rules tell well-behaved crawlers to stay away from your entire site.
# Blocked Bots
User-agent: SomeBot
Disallow: /
User-agent: AnotherBot
Disallow: /

Keep in mind that robots.txt is a suggestion, not an enforcement mechanism. Polite bots respect it; malicious scrapers may ignore it. For enforcement-level blocking, CitedPro also blocks matched user agents at the PHP level with a 403 response.
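You can check how a compliant crawler would interpret these rules with Python's standard urllib.robotparser (the bot names match the example above):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: SomeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

parser.can_fetch("SomeBot", "/any-page")  # False: blocked site-wide
parser.can_fetch("GoodBot", "/any-page")  # True: falls under the * group
```

This only models a well-behaved crawler; it says nothing about bots that never read robots.txt in the first place, which is why the PHP-level 403 blocking exists.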
Custom rules editor
For additional directives beyond what CitedPro manages automatically, enable the custom rules editor:
- Go to CitedPro → Robots
- Toggle Custom Rules Editor to enabled
- Enter your custom directives in the text area
- Custom rules are appended after the CitedPro-managed sections
This is useful for adding rules for specific bots or directories that are outside CitedPro's scope, such as blocking crawlers from /wp-admin/ or /private/ paths.
Live preview
The Robots tab includes a live preview panel that shows you exactly what your robots.txt output will look like before saving. As you toggle AIPREF directives, block bots, or add custom rules, the preview updates in real time.
This lets you verify the output is correct before it goes live. The preview shows the complete robots.txt content, including the CitedPro banner, AIPREF directives, blocked bot rules, and custom rules.
Cache and refresh schedule
CitedPro caches the robots.txt output and refreshes it hourly via a WordPress cron job. This keeps the output up to date without regenerating on every request.
Changes to your settings (AIPREF toggles, blocked bots, custom rules) clear the cache immediately, so updates take effect right away. The hourly cron is a safety net to ensure the cache never gets stale.
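The caching behavior can be sketched as a timestamped value with an hourly TTL plus explicit invalidation on settings changes. This is a generic pattern, not CitedPro's actual implementation (which uses WordPress transients and cron):

```python
import time

class RobotsCache:
    TTL = 3600  # refresh hourly, mirroring the cron interval

    def __init__(self, generate):
        self.generate = generate   # callable that rebuilds robots.txt
        self._value = None
        self._stored_at = 0.0

    def get(self):
        # Regenerate when empty or older than the TTL; otherwise serve
        # the cached copy without rebuilding on every request.
        if self._value is None or time.time() - self._stored_at > self.TTL:
            self._value = self.generate()
            self._stored_at = time.time()
        return self._value

    def invalidate(self):
        """Called on settings changes so the next read regenerates."""
        self._value = None

builds = []
cache = RobotsCache(lambda: builds.append(1) or f"# build {len(builds)}")
cache.get(); cache.get()   # second read is served from cache
cache.invalidate()
cache.get()                # regenerated immediately after invalidation
```

Immediate invalidation is what makes settings changes take effect right away; the hourly TTL only covers cases where invalidation was somehow missed.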
Configuration walkthrough
Here is the recommended setup for most businesses:
- Go to CitedPro → Robots
- Toggle AIPREF Directives to enabled
- Set processing to Allow (lets AI systems read your content)
- Set search to Allow (lets AI search engines cite you)
- Set train-ai and train-genai to your preference (Allow or Disallow)
- Review the live preview to confirm the output
- Optionally enable the custom rules editor for additional directives
Verifying your robots.txt
- Visit yoursite.com/robots.txt in your browser
- Confirm the CitedPro banner is present at the top
- Check that AIPREF directives match your settings
- Verify AI discovery file pointers (llms.txt and llms-full.txt) are listed
- Confirm blocked bots show their Disallow: / rules
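The checklist above can be partially automated with a small script. The markers and sample body here are illustrative; adjust them to your own settings, and fetch the live file (for example with urllib.request.urlopen) instead of using an inline string:

```python
def check_robots(body, expected_markers):
    """Return the checklist markers that are missing from the body."""
    return [m for m in expected_markers if m not in body]

# Stand-in for the fetched yoursite.com/robots.txt content.
body = """\
# CitedPro
# llms.txt: https://yoursite.com/llms.txt
Content-Usage: train-genai=disallow
User-agent: SomeBot
Disallow: /
"""

missing = check_robots(body, [
    "# CitedPro",      # banner present at the top
    "llms.txt",        # AI discovery file pointer
    "Content-Usage:",  # AIPREF directives emitted
    "Disallow: /",     # blocked bots have their rules
])
# An empty list means every expected marker was found.
```

A substring check like this is deliberately loose; it verifies presence, not ordering or exact directive values.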
Important
If another plugin or your hosting environment is also generating robots.txt, you may see conflicts. CitedPro's PHP_INT_MAX priority means it runs last, but check your output to make sure it contains everything you expect.