How to Get Your Business Cited by ChatGPT, Perplexity, and Claude

You asked ChatGPT who the best person is for your kind of work. A few competitors came up. But you didn't.

It's a quietly destabilising moment. Your site is live. You've done the work. You've had clients. And yet, when the question gets asked of an AI engine, you simply don't exist.

The question most people ask next is: what should I write? But getting cited by AI engines — and getting your brand cited by ChatGPT — isn't primarily a content problem. It's an infrastructure problem. The content marketers writing about this topic will tell you to publish more, use the right keywords, and produce "thought leadership". What they don't tell you (because most of them have never touched the infrastructure layer) is that none of that content will be cited if the foundations underneath aren't right.

This is what being invisible to AI search actually looks like, and what it takes to fix it.

Why AI engines cite some sources and not others

AI engines don't work like Google. Google ranks pages. AI engines extract answers. The distinction matters because a page can rank reasonably well on Google while remaining completely unciteable by an AI—if that page isn't structured for extraction.

When ChatGPT, Perplexity, Gemini, or Claude generates a response to a question, it looks for sources that meet a specific set of conditions:

the content can be crawled,
the entity behind it can be identified,
the answer is direct and well-formed enough to be lifted and attributed, and
the information is corroborated by sources outside of the site itself.

Miss any of those conditions, and you won't be cited, no matter how good your service is or how long you've been in business. A business can be genuinely excellent and completely unknown to an AI because the web hasn't validated it yet.

The businesses that appear in ChatGPT answers consistently aren't necessarily better than you (which is the part that hurts). What they have done is build the infrastructure that makes them citable.

The technical foundation: what AI crawlers actually need

Before any content question matters, there's a prior question: can AI engines actually access your site?

The major AI crawlers (GPTBot for ChatGPT, ClaudeBot for Claude, PerplexityBot, OAI-SearchBot, Google-Extended) all behave differently from a human visitor's browser. None of the major AI crawlers currently render JavaScript. If your site serves content via client-side rendering, where the page loads blank and JavaScript fills it in, those crawlers may be reading an empty page. Every word you've written, every service you offer, and every answer you've provided: invisible.

This is one of the most common and easily overlooked AI citation blockers. It doesn't show up in your analytics. You don't see it on the frontend. There's no error. The crawler just doesn't find what a human visitor would see.
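One way to see roughly what a non-rendering crawler sees is to read the raw HTML your server returns and ignore anything that only appears after scripts run. The sketch below is a minimal illustration using Python's standard library. The helper names and the sample pages are hypothetical, and real crawlers may extract text differently.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects the text present in raw HTML, ignoring script and style bodies."""

    SKIPPED_TAGS = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html):
    """Return the text a client that never runs JavaScript could read."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


def missing_phrases(raw_html, phrases):
    """List the phrases that do not appear in the un-rendered HTML."""
    text = visible_text(raw_html).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]


# Hypothetical pages: one filled in by JavaScript, one rendered on the server.
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    '<script>document.getElementById("root").innerHTML = '
    '"Fractional CFO services";</script></body></html>'
)
SERVER_RENDERED = (
    "<html><body><h1>Fractional CFO services</h1>"
    "<p>Monthly reporting for founders.</p></body></html>"
)

print(missing_phrases(CLIENT_RENDERED, ["Fractional CFO services"]))
print(missing_phrases(SERVER_RENDERED, ["Fractional CFO services"]))
```

To run this against your own site, pass in the HTML saved from "view source" (not the rendered DOM from your browser's inspector) along with a few phrases you expect a visitor to read on that page.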

Beyond rendering, your robots.txt file must not block these crawlers, and ideally it names and permits each one explicitly. Blocking them via a blanket disallow, even accidentally, excludes your content from both their training crawls and their retrieval pipelines. Alongside that, a clean XML sitemap helps crawlers discover every page you want indexed, and an llms.txt file at your root lets you signal how you'd prefer your content to be attributed and used.

Getting the door open is the first requirement. If AI can't reach you, nothing else matters. Your AI tech stack doesn't begin with content — it begins with whether crawlers can reach you at all.

Schema markup and entity signals

Once crawlers can reach your site, the next question is whether they can identify who you are.

AI engines don't just read text. They build a model of entities—people, businesses, services, and topics—and they cite sources they can confidently attribute. If your site doesn't give them the structured signals to understand who you are, what you do, and where you operate, they'll either skip you entirely or use a vague, unattributed paraphrase of something you wrote.

Schema markup is how you provide those signals in a machine-readable format. The minimum viable entity graph for a business that wants AI citation includes:

A Person entity with your name, job title, areas of expertise, and sameAs links to verified external profiles
An Organization or ProfessionalService entity with your business name, URL, services, and area served
Service schema on each individual offering
FAQPage schema on every content page
Article schema on every piece of content, with an author reference back to your Person entity

These aren't decorative. They tell AI engines: this is a real business, this is the person behind it, these are the things they know, this is what they offer, and this is what else they're connected to. Without them, the engine is guessing. And when it's unsure, it uses a source it is sure about instead.
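As a rough illustration of the entity graph described above, the sketch below assembles Person, ProfessionalService, and Service entities as JSON-LD and wraps them in a script tag. The schema.org types and properties (jobTitle, knowsAbout, sameAs, worksFor, areaServed, founder, provider) are real. The function names, the "#person" and "#organization" identifiers, and all of the example data are assumptions made up for this sketch.

```python
import json


def build_entity_graph(person, business, services):
    """Assemble a JSON-LD graph linking a person, their business, and its services."""
    base = business["url"].rstrip("/")
    person_id = base + "/#person"        # hypothetical identifier scheme
    org_id = base + "/#organization"     # hypothetical identifier scheme
    graph = [
        {
            "@type": "Person",
            "@id": person_id,
            "name": person["name"],
            "jobTitle": person["job_title"],
            "knowsAbout": person["expertise"],
            "sameAs": person["profiles"],
            "worksFor": {"@id": org_id},
        },
        {
            "@type": "ProfessionalService",
            "@id": org_id,
            "name": business["name"],
            "url": business["url"],
            "areaServed": business["area_served"],
            "founder": {"@id": person_id},
        },
    ]
    for service in services:
        graph.append(
            {
                "@type": "Service",
                "name": service["name"],
                "description": service["description"],
                "provider": {"@id": org_id},
            }
        )
    return {"@context": "https://schema.org", "@graph": graph}


def to_script_tag(graph):
    """Wrap the graph in the script element that carries JSON-LD in a page head."""
    body = json.dumps(graph, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder data, not a real business.
EXAMPLE_PERSON = {
    "name": "Alex Example",
    "job_title": "Fractional CFO",
    "expertise": ["Cash-flow forecasting", "Startup finance"],
    "profiles": ["https://www.linkedin.com/in/alex-example"],
}
EXAMPLE_BUSINESS = {
    "name": "Example Finance Studio",
    "url": "https://example.com",
    "area_served": "United Kingdom",
}
EXAMPLE_SERVICES = [
    {"name": "Monthly reporting", "description": "Management accounts for founders."},
]

print(to_script_tag(build_entity_graph(EXAMPLE_PERSON, EXAMPLE_BUSINESS, EXAMPLE_SERVICES)))
```

Sharing one "@id" per entity is a design choice here: it lets Article markup elsewhere on the site point its author at the same Person instead of redefining them on every page.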

Entity signals also need to be consistent across the web, not just on your own site. If your name, title, and business description differ across your LinkedIn profile, your About page, and any external mentions, those signals conflict. When they conflict, confidence drops. When confidence drops, you don't get cited by AI.

Answer-first content structure: the format that gets extracted

The next layer is content, but not content in the way most marketing advice describes it. The question isn't how much you should publish. It's whether what you've published is structured in a way that can be extracted.

AI citation strategy begins at the sentence level. AI engines extract answers. That means they're looking for content where the answer comes first, not buried three paragraphs in, hedged with setup, or written to lead the reader gently toward a conclusion. The passage that gets cited is the one that answers the question in the first sentence.

Research into how large language models cite sources consistently shows that citations cluster toward the beginning of content. The introduction does disproportionate citation work, which means an article that opens by rambling gives up its best chance of being cited. Answer-first structure isn't a stylistic preference. It's the difference between being cited and not.

For each page or article you want to be cited for, write a direct 40–60-word answer to the question the page addresses. Lead every section with a direct, citable statement. Use FAQ schema to provide pre-chunked question-and-answer pairs that AI engines can extract cleanly.
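The 40–60-word target and the FAQ markup can both be checked or generated mechanically. The sketch below is a minimal example: the function names are hypothetical, word counting is a simple whitespace split, and the FAQPage, Question, and Answer types are the standard schema.org ones.

```python
import json


def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def is_citable_length(answer, low=40, high=60):
    """Check an answer against the 40-60-word target for a lead answer."""
    return low <= word_count(answer) <= high


def build_faq_schema(pairs):
    """Turn (question, answer) pairs into FAQPage JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Illustrative pair; the answer is deliberately short to show a failing check.
FAQ_PAIRS = [
    (
        "Does schema markup help with AI citations?",
        "Yes. Schema markup helps AI engines identify who you are, "
        "what your business does, and what each page is about.",
    ),
]

for question, answer in FAQ_PAIRS:
    print(question, word_count(answer), is_citable_length(answer))

print(json.dumps(build_faq_schema(FAQ_PAIRS), indent=2))
```

Running the length check over every page's lead answer before publishing is a cheap way to catch openings that bury the answer or pad it out.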

Marketing language is invisible to AI. The moment you're throwing around terms like "industry-leading," "comprehensive solutions," and "we're passionate about" is the moment you're passed over. Specificity is what gets cited. The AI isn't reading for brand feel. It's reading for extractable, attributable answers to real questions.

llms.txt, robots.txt, and AI crawler access

Three files form the access layer of your AI visibility infrastructure: robots.txt, sitemap.xml, and llms.txt.

Your robots.txt file should explicitly name and allow each major AI crawler. A blanket Allow: / permits most bots, but explicitly naming GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Applebot-Extended, Google-Extended, Diffbot, and ChatGPT-User removes ambiguity and ensures overly broad crawl restrictions don't accidentally catch you.
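To check what a given robots.txt actually permits, you can parse it with Python's standard-library urllib.robotparser and ask about each crawler by name. The sketch below is illustrative: the two sample files and the example.com URLs are made up, and Python's parser may resolve conflicting Allow and Disallow rules differently from an individual crawler, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# The crawler names listed in the paragraph above.
AI_CRAWLERS = [
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "OAI-SearchBot",
    "Applebot-Extended",
    "Google-Extended",
    "Diffbot",
    "ChatGPT-User",
]


def crawler_access(robots_txt, url):
    """Report, per AI crawler, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Hypothetical file that names each crawler and keeps other bots out of /admin/.
EXPLICIT_ALLOW = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: OAI-SearchBot
User-agent: Applebot-Extended
User-agent: Google-Extended
User-agent: Diffbot
User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

# Hypothetical file with the blanket disallow that silently excludes everyone.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

for label, robots_txt in [("explicit", EXPLICIT_ALLOW), ("blanket", BLANKET_BLOCK)]:
    print(label, crawler_access(robots_txt, "https://example.com/services"))
```

Paste the contents of your live robots.txt in place of the samples to see which of the named crawlers would be turned away.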

Your sitemap.xml should be dynamically generated, include every page you want crawled, and be referenced in robots.txt. AI crawlers discover content through sitemaps. If a page isn't in your sitemap, it may not be found.
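A dynamically generated sitemap can be as simple as a function that turns your current list of pages into XML each time it is requested or built. The sketch below uses Python's standard library and the sitemaps.org namespace. The page list, dates, and function names are placeholders, and in practice the pages would come from your CMS or router.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # Save the result as UTF-8 so it matches the declaration.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_line(sitemap_url):
    """The line that references the sitemap from robots.txt."""
    return "Sitemap: " + sitemap_url


# Placeholder pages for illustration.
EXAMPLE_PAGES = [
    ("https://example.com/", "2026-01-15"),
    ("https://example.com/services", "2026-01-10"),
    ("https://example.com/articles/ai-citations", "2026-02-01"),
]

print(build_sitemap(EXAMPLE_PAGES))
print(robots_sitemap_line("https://example.com/sitemap.xml"))
```

Generating the file from the same source of truth as your navigation means a newly published page cannot be left out by accident.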

llms.txt is a plain-text markdown file placed at the root of your domain. It provides AI systems with a human-readable index of your key pages, a short description of each, and your preferred attribution. It's an emerging standard, and adoption is still limited, but the cost of adding it is low, and it signals intent clearly. Treat it as a preference statement rather than a ranking lever.
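For a sense of what such a file contains, the sketch below generates one. The layout (a title heading, a one-line summary as a blockquote, then a list of linked pages with descriptions) follows the publicly circulated llms.txt proposal as commonly described. The "Attribution" section is this article's suggestion rather than part of that proposal, and every name and URL is a placeholder.

```python
def build_llms_txt(site_name, summary, pages, attribution=None):
    """Build llms.txt content from (title, url, description) tuples."""
    lines = ["# " + site_name, "", "> " + summary, "", "## Key pages", ""]
    for title, url, description in pages:
        lines.append("- [" + title + "](" + url + "): " + description)
    if attribution:
        # Not part of the proposal; a plain statement of preference.
        lines += ["", "## Attribution", "", attribution]
    return "\n".join(lines) + "\n"


# Placeholder content for illustration.
EXAMPLE_LLMS_TXT = build_llms_txt(
    site_name="Example Finance Studio",
    summary="Fractional CFO services for early-stage founders.",
    pages=[
        ("Services", "https://example.com/services", "What we offer and who it is for."),
        ("About", "https://example.com/about", "Who runs the studio."),
    ],
    attribution="Please cite as Example Finance Studio (example.com).",
)

print(EXAMPLE_LLMS_TXT)
```

The output would be saved as llms.txt at the root of the domain, alongside robots.txt.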

None of these files requires technical heroics. They're fast to implement, and they remove blockers that are otherwise completely silent.

Most access issues are completely invisible: no error, no warning, just absence. The Infrastructure Audit makes the access layer the first thing we check. See the audit →

What your website platform decides for you (and what it doesn't)

Most founders don't write their own robots.txt file. Their platform generates one, and that default decision has an invisible price tag with direct consequences for AI visibility.

Kajabi generates a robots.txt that allows all crawlers by default, with no native editor available. If you're on Kajabi, your access layer is open, but you cannot customise it.

GoHighLevel offers a robots.txt editor at the domain level, accessed via Settings → Domains → Manage. The platform generates a basic file by default, and custom directives can be added manually. The editor exists, but no AI-specific guidance ships with it, so most users never configure it.

Squarespace allows AI crawlers by default, with an opt-in toggle under Settings → Crawlers to block them. The toggle treats all AI bots the same; you can't permit citation crawlers while blocking training crawlers. You also can't edit the file directly.

Wix offers a robots.txt editor in its SEO settings, but no one-click AI crawler control. Adding the right directives requires knowing which crawler names to include and how to format the rules correctly. Most users won't do this.

Webflow allows all search and AI crawlers by default and offers a proper robots.txt editor, plus a Content-Signal HTTP header that lets you specify AI training versus citation access separately.

WordPress (self-hosted) generates a basic open robots.txt by default. AI-specific directives require manual addition, either through an SEO plugin's robots.txt editor or directly in the file.

Managed WordPress hosts vary. Some, including SiteGround, block AI training crawlers by default at the server layer, without necessarily making that visible in your robots.txt. If your managed host routes traffic through Cloudflare, there's an additional layer to check. In July 2025, Cloudflare switched to blocking AI bots by default for all new domains added from that point. You can configure your robots.txt correctly, and AI crawlers will still be stopped at the Cloudflare layer, before your server even sees the request.
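A rough first test for this kind of edge-level blocking is to request the same page with a browser-style User-Agent and with crawler-style User-Agents, then compare the HTTP status codes. The sketch below makes several assumptions: the User-Agent strings are simplified stand-ins rather than the crawlers' exact strings, and a CDN may also verify crawler IP addresses, so a spoofed request is not treated identically to the real bot. A difference in status codes is a hint to investigate, not proof.

```python
import urllib.error
import urllib.request

# Simplified, illustrative User-Agent strings (not the crawlers' exact ones).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36"
CRAWLER_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def http_status(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return error.code


def find_status_differences(url, fetch_status=http_status):
    """Return {crawler: status} for crawlers answered differently from a browser."""
    baseline = fetch_status(url, BROWSER_UA)
    differences = {}
    for name, user_agent in CRAWLER_UAS.items():
        status = fetch_status(url, user_agent)
        if status != baseline:
            differences[name] = status
    return differences


# Example call (performs real network requests, so it is left commented out):
# print(find_status_differences("https://example.com/"))
```

The fetch_status parameter exists so the comparison logic can be exercised without touching the network, for instance by passing a stub that returns fixed codes.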

The pattern here is consistent: most all-in-one web builders and no-code or low-code platforms make the access decision for you, give you limited visibility into what they've decided, and provide little or no control over the result. For non-technical founders, that's the infrastructure blind spot most businesses don't discover until they're already invisible.

The content that gets cited vs the content that gets ignored

The gap between content that gets cited by AI and content that gets ignored usually comes down to three things: structure, specificity, and corroboration.

Structure means answer-first paragraphs, FAQ schema, clean HTML that doesn't rely on JavaScript to render, and clear topical focus on each page. Thin pages that vaguely reference a topic without directly addressing it provide nothing for an AI to extract.

Specificity means writing to the actual questions your buyers ask. Not the polished version of those questions, but the real ones. "Why doesn't my website come up on ChatGPT?" is a real question. "Why is AI search important for modern businesses?" is not what anyone types into a search engine or asks an AI. Write to the real question. Use the real language.

Corroboration means your name and business appear in sources the AI already treats as trustworthy. Think third-party publications, podcast transcripts, directory listings, and industry platforms. If your brand is only mentioned on your own website, that's a weak signal. AI engines weigh multi-source corroboration heavily. A business cited by one high-authority external source is more likely to appear in AI search results than a business with ten pages of well-structured content and no external validation.

These three things work together. Structure makes your content extractable. Specificity makes it relevant. Corroboration makes it trustworthy. All three need to be present.

Structure, specificity, and corroboration all need to be in place. The Infrastructure Audit identifies which one is letting you down, and what to fix first. See the audit →

How do I get my website cited by ChatGPT?

To get your website cited by ChatGPT, you need to address the infrastructure layer first: ensure AI crawlers can access your site (no JavaScript rendering blocking the crawl, robots.txt explicitly allowing GPTBot and OAI-SearchBot), implement schema markup that identifies your business as a named entity, structure your content with direct answer-first paragraphs, and build a presence in external sources the model already trusts. Content quality matters, but it only becomes citable once the foundations are in place.

Why isn't my business showing up in AI search results?

The most common reasons a business is invisible in AI search results are: AI crawlers are blocked or can't read the site (JavaScript rendering issues or robots.txt misconfigurations), no schema markup to identify the business as a recognised entity, content that isn't structured for extraction (answers buried in narrative prose rather than leading sections), and no external corroboration (the business is only mentioned on its own website). Most of these are infrastructure problems, not content problems.

What is the difference between ranking on Google and being cited by AI?

Google ranks pages based on relevance, authority, and technical signals and returns a list of links for the user to click. AI engines like ChatGPT and Perplexity extract answers and synthesise a response directly, with no click required. To rank on Google, you need good technical SEO and strong content. To be cited by AI, you additionally need schema markup, answer-first content structure, explicit AI crawler access, and multi-source entity signals that let the model confidently attribute the information to you.

Does schema markup help with AI citations?

Yes. Schema markup—specifically JSON-LD structured data—helps AI engines identify who you are, what your business does, and what each piece of content is about. A Person entity, a ProfessionalService entity, FAQPage schema, and Article schema with author attribution all give AI engines machine-readable signals they can use to identify and cite your content with confidence. Without schema, the engine must infer these things from prose alone, which reduces the likelihood of accurate attribution.

How do I know if AI engines can crawl my website?

Check three things. First, review your robots.txt file to confirm it explicitly allows GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Google-Extended. Second, check how your site renders: if your content is loaded by JavaScript after the initial page load, AI crawlers may be reading a blank page. Disable JavaScript in your browser and see what remains visible. Third, confirm you have a clean XML sitemap referenced in robots.txt, so crawlers can discover every page you want indexed.


Sources

Overview of OpenAI crawlers — GPTBot, OAI-SearchBot, ChatGPT-User, OpenAI Developer Documentation
Publishers and Developers FAQ, OpenAI Help Centre
Anthropic crawler documentation — ClaudeBot, Claude-User, Claude-SearchBot, Anthropic Support
Cloudflare changes how AI crawlers scrape the internet — permission-based approach and AI Crawl Control, Cloudflare Press Release, July 2025
Anthropic's Claude bots make robots.txt decisions more granular, Search Engine Journal, February 2026
Anthropic clarifies how Claude bots crawl sites and how to block them, Search Engine Land, February 2026
Set robots.txt rules, Webflow Help Centre
Request that AI models exclude your site, Squarespace Help Centre
A practical guide to technical SEO for GoHighLevel websites, GoHighLevel, March 2026