Robots.txt & AIPREF

Virtual robots.txt management with AIPREF Content-Usage directives for granular AI content control.

Last updated Feb 21, 2026

How virtual serving works

CitedPro manages your robots.txt entirely through WordPress, without writing a physical file to disk. It hooks into WordPress's robots_txt filter at PHP_INT_MAX priority (the highest possible), ensuring CitedPro's output takes precedence over other plugins that modify robots.txt.

This virtual approach eliminates common problems with physical robots.txt files: permission errors, conflicts between plugins, and stale content that does not match your current settings.

Physical file migration

If a physical robots.txt file already exists in your WordPress root directory, CitedPro handles it gracefully. On first load, it reads the existing file, preserves any non-CitedPro rules as custom content, and begins serving robots.txt virtually. The physical file is no longer needed once CitedPro is managing the output.

Tip

After confirming CitedPro is serving your robots.txt correctly, you can safely delete the physical robots.txt file from your server. CitedPro will continue serving the content virtually.

Robots.txt output structure

CitedPro builds your robots.txt with several distinct sections, each clearly marked with comments:

  1. CitedPro banner: An ASCII art header identifying the output as CitedPro-managed
  2. AI discovery file pointers: Comment-based references to your llms.txt and llms-full.txt files
  3. AIPREF directives: Content-Usage preferences per the IETF standard
  4. Blocked bot rules: User-agent and Disallow directives for bots you have chosen to block
  5. Custom rules: Any additional directives you have added via the custom rules editor
  6. Sitemap reference: A pointer to your XML sitemap (when enabled)
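
The section ordering above can be sketched as a simple ordered assembly. This is an illustrative sketch, not CitedPro's actual code; the section names, URLs, and bot names are placeholders:

```python
# Assemble a robots.txt from ordered, comment-marked sections.
# All section contents below are placeholder examples.
SECTION_ORDER = [
    ("banner", "# === CitedPro ==="),
    ("discovery", "# llms.txt: https://example.com/llms.txt"),
    ("aipref", "# AIPREF Content-Usage Directives\nContent-Usage: train-ai=disallow"),
    ("blocked", "# Blocked Bots\nUser-agent: SomeBot\nDisallow: /"),
    ("custom", "# Custom Rules\nDisallow: /private/"),
    ("sitemap", "Sitemap: https://example.com/sitemap.xml"),
]

def build_robots_txt(sections):
    """Join non-empty sections with blank lines, preserving order."""
    return "\n\n".join(body for _, body in sections if body)

print(build_robots_txt(SECTION_ORDER))
```

Empty sections (for example, no blocked bots) would simply drop out without disturbing the order of the rest.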

AIPREF Content-Usage directives

AIPREF is an IETF-proposed standard that gives website owners granular control over how AI systems use their content. Unlike simple User-agent/Disallow rules that either block or allow a bot entirely, AIPREF lets you specify exactly what types of AI usage you permit.

Navigate to CitedPro → Robots to configure these directives. Toggle AIPREF on, then set each directive to your preference.

| Directive | What it controls | Options |
| --- | --- | --- |
| processing | Whether AI systems may process (read and analyze) your content at all | Allow / Disallow / Unstated |
| train-ai | Whether your content may be used to train any AI or machine learning models | Allow / Disallow / Unstated |
| train-genai | Whether your content may be used specifically to train generative AI models (LLMs, image generators, etc.) | Allow / Disallow / Unstated |
| search | Whether your content may be used in AI-powered search results and answer engines | Allow / Disallow / Unstated |

When set to Unstated, the directive is omitted from robots.txt, leaving the decision to each AI system's default behavior.
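
The emit-or-omit behavior can be sketched as follows. The settings dictionary shape is an assumption for illustration, not CitedPro's internal format:

```python
# Emit one Content-Usage line per directive, skipping any set to
# "unstated" so the decision is left to each AI system's defaults.
AIPREF_DIRECTIVES = ("processing", "train-ai", "train-genai", "search")

def aipref_lines(settings):
    lines = []
    for directive in AIPREF_DIRECTIVES:
        value = settings.get(directive, "unstated")
        if value in ("allow", "disallow"):   # "unstated" is omitted entirely
            lines.append(f"Content-Usage: {directive}={value}")
    return lines

print("\n".join(aipref_lines(
    {"processing": "allow", "train-ai": "disallow", "search": "unstated"}
)))
```

Note that a directive missing from the settings behaves the same as one explicitly set to Unstated: no line is written.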

Example AIPREF output

# AIPREF Content-Usage Directives
Content-Usage: processing=allow
Content-Usage: train-ai=disallow
Content-Usage: train-genai=disallow
Content-Usage: search=allow

This example allows AI systems to process your content and include it in search results, but prohibits using it for AI training purposes. This is a common configuration for businesses that want AI visibility without contributing to training datasets.

Tip

AIPREF gives you finer control than simply blocking bots. You can allow AI-powered search engines to cite your content while blocking your content from being used to train AI models. This is the best of both worlds for most businesses.

Blocked bot rules

When you block a bot via CitedPro → Bots or the Robots tab, CitedPro adds a User-agent + Disallow: / rule for that bot. These rules tell well-behaved crawlers to stay away from your entire site.

# Blocked Bots
User-agent: SomeBot
Disallow: /

User-agent: AnotherBot
Disallow: /

Keep in mind that robots.txt is a suggestion, not an enforcement mechanism. Polite bots respect it; malicious scrapers may ignore it. For enforcement-level blocking, CitedPro also blocks matched user agents at the PHP level with a 403 response.
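
The difference between the two layers can be sketched like this. The matching logic below is an assumed simplification, not CitedPro's actual matcher:

```python
# robots.txt only asks bots to stay away; a server-side check on the
# User-Agent header is what actually denies the request with a 403.
BLOCKED_BOTS = ("SomeBot", "AnotherBot")   # placeholder bot names

def response_status(user_agent):
    """Return 403 for blocked user agents, 200 otherwise (case-insensitive)."""
    ua = user_agent.lower()
    if any(bot.lower() in ua for bot in BLOCKED_BOTS):
        return 403
    return 200
```

A polite bot never triggers this path because it reads robots.txt first; the 403 only matters for crawlers that ignore the rules.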

Custom rules editor

For additional directives beyond what CitedPro manages automatically, enable the custom rules editor:

  1. Go to CitedPro → Robots
  2. Toggle Custom Rules Editor to enabled
  3. Enter your custom directives in the text area
  4. Custom rules are appended after the CitedPro-managed sections

This is useful for adding rules for specific bots or directories that are outside CitedPro's scope, such as blocking crawlers from /wp-admin/ or /private/ paths.
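
For example, custom rules blocking the paths mentioned above might look like this (the paths are illustrative; adjust them to your site):

```
# Custom Rules
User-agent: *
Disallow: /wp-admin/
Disallow: /private/
```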

Live preview

The Robots tab includes a live preview panel that shows you exactly what your robots.txt output will look like before saving. As you toggle AIPREF directives, block bots, or add custom rules, the preview updates in real time.

This lets you verify the output is correct before it goes live. The preview shows the complete robots.txt content, including the CitedPro banner, AIPREF directives, blocked bot rules, and custom rules.

Cache and refresh schedule

CitedPro caches the robots.txt output and refreshes it hourly via a WordPress cron job. This keeps the output up to date without regenerating on every request.

Changes to your settings (AIPREF toggles, blocked bots, custom rules) clear the cache immediately, so updates take effect right away. The hourly cron is a safety net to ensure the cache never gets stale.
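
The cache-plus-invalidation pattern described above can be sketched as follows. This is an assumed shape, not CitedPro's implementation:

```python
# Serve a cached copy until it expires or a settings change clears it.
import time

class RobotsCache:
    TTL = 3600  # seconds; matches the hourly refresh

    def __init__(self, builder):
        self._builder = builder      # callable that regenerates robots.txt
        self._value = None
        self._expires_at = 0.0

    def get(self):
        if self._value is None or time.monotonic() >= self._expires_at:
            self._value = self._builder()
            self._expires_at = time.monotonic() + self.TTL
        return self._value

    def invalidate(self):
        """Called on any settings change so the next get() regenerates."""
        self._value = None
```

The key point is that `invalidate()` makes settings changes visible immediately, while the TTL bounds how stale the output can ever get even if an invalidation is somehow missed.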

Configuration walkthrough

Here is the recommended setup for most businesses:

  1. Go to CitedPro → Robots
  2. Toggle AIPREF Directives to enabled
  3. Set processing to Allow (lets AI systems read your content)
  4. Set search to Allow (lets AI search engines cite you)
  5. Set train-ai and train-genai to your preference (Allow or Disallow)
  6. Review the live preview to confirm the output
  7. Optionally enable the custom rules editor for additional directives

Verifying your robots.txt

  1. Visit yoursite.com/robots.txt in your browser
  2. Confirm the CitedPro banner is present at the top
  3. Check that AIPREF directives match your settings
  4. Verify AI discovery file pointers (llms.txt and llms-full.txt) are listed
  5. Confirm blocked bots show their Disallow: / rules
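
If you want to automate these checks, a sketch like the following works on the fetched robots.txt text (the expected values passed in are assumptions; supply your own settings):

```python
# Given the fetched robots.txt text, confirm each expected element is present.
def verify_robots(text, expected_directives, blocked_bots):
    problems = []
    if "CitedPro" not in text:
        problems.append("missing CitedPro banner")
    for line in expected_directives:
        if line not in text:
            problems.append(f"missing directive: {line}")
    for bot in blocked_bots:
        if f"User-agent: {bot}" not in text:
            problems.append(f"missing block rule for {bot}")
    return problems
```

An empty result means every check passed; otherwise each entry names what to fix.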

Important

If another plugin or your hosting environment is also generating robots.txt, you may see conflicts. CitedPro's PHP_INT_MAX priority means it runs last, but check your output to make sure it contains everything you expect.