At Writesonic, we track how AI models talk about brands across 500M+ conversations. We see which companies get cited, which get ignored, and what gets said about them.

But the more data we analyzed, the more one question kept nagging at us: everyone’s racing to optimize for AI search, yet nobody’s actually tested whether AI can even read what’s on their website.

So we tested it.

Our team built a page with 62 unique hidden codes, each planted in a different location in the HTML. Then we asked ChatGPT, Claude, Gemini, DeepSeek, Grok, and Copilot to visit the page and report what they found.

Some of what we found was expected. Most of it wasn’t.

[Chart: Three tiers of AI crawler capability. Tier 1, full browser rendering: Copilot, 21 out of 62. Tier 2, headless JS browsers: DeepSeek 17 and Grok 22 out of 62. Tier 3, HTML-only parsers: ChatGPT 13, Gemini 16, and Claude 18 out of 62.]

Key Takeaways

  • Your metadata is invisible to AI. JSON-LD, meta descriptions, OG tags scored 0/6. The <title> tag is the only metadata that reliably reaches AI assistants (5/6). Google recommends JSON-LD as the preferred structured data format. Google’s own Gemini can’t read it.

  • Half of AI assistants don’t execute JavaScript. ChatGPT, Claude, and Gemini fetch raw HTML and parse it as text. The three that do run JS (Copilot, DeepSeek, Grok) give you between 500ms and 3 seconds before moving on. Nobody waits 5 seconds.

  • CSS-hidden content is fully visible to AI. CSS-generated content is invisible. Every accordion, tab, and collapsed FAQ you’ve hidden? AI reads all of it. But ::before and ::after pseudo-elements? 0/6.

  • The crawlers fall into three distinct capability tiers. Full browser rendering (Copilot), headless JS browsers (DeepSeek, Grok), and HTML-only parsers (ChatGPT, Claude, Gemini). The tier determines almost everything about what each AI can see.

Here’s what we found.

Three tiers of AI crawler capability

We grouped the six AI crawlers by how they fetch and process pages. The grouping explained almost every result.

Tier | Who | What they do | JS patience | Codes found
Full browser | Copilot (Diffbot) | Runs a complete rendering engine. JS, Shadow DOM, iframes, CSS functions. | ~500ms | 21 out of 62
Headless browsers | DeepSeek, Grok | Real browsers without a screen. JS, network requests, Web Workers. | ~2s, ~3s | 17 and 22 out of 62
HTML-only | ChatGPT, Gemini, Claude | Fetch raw HTML and convert it to Markdown. No JavaScript at all. | None | 13, 16, 18 out of 62

Individual crawler profiles

ChatGPT — Most popular, most basic. Fetches raw HTML, converts to Markdown. Misses alt text, URL paths, and og:title that Claude catches. But it’s the lowest common denominator: if your page works for ChatGPT, it works for everything.

Claude — Best HTML parser in the group. Reads og:title (one of only two crawlers that do), preserves alt text and full URLs. Its converter treats more attributes as content, which rewards clean HTML practices.

Gemini — Google crawls pages with a full rendering engine, but when Gemini reads a URL in conversation, it gets the same stripped HTML as ChatGPT and Claude. The search index and the AI assistant are two separate pipelines, even at Google. JSON-LD still helps Google Search discover your content, but Gemini ignores it when generating answers.

DeepSeek — Strong JS execution (2s window), full network capabilities (fetch, XHR, POST), Web Workers, dynamic imports. But its converter strips almost everything else: it misses <title> (the only AI that does), alt text, URL paths, and og:title. It can handle dynamic content; it just loses much of the static detail in conversion.

Grok — Highest score (22/62) and the most patient (3s JS window). It occasionally reads <noscript>, and it can tell CSS-hidden content apart from the visible UI. Its behavior suggests two different crawlers at work, one client-side and one server-side.

Copilot — Second-highest score (21/62), but only ~500ms of JS patience. The only crawler that reads Shadow DOM, inline iframes (srcdoc), and CSS content:attr(). Built for structurally complex static sites. If your site loads content from APIs after the initial render, Copilot is one of the worst at seeing it.

Keep these tiers in mind. They explain every finding below.

Your metadata is invisible to AI

I need to say this clearly because a lot of current GEO advice gets it wrong: the metadata on your website does not reach AI assistants. Almost none of it.

To be clear, metadata is still valuable for traditional SEO. Meta descriptions help your click-through rate in Google Search. JSON-LD helps Google understand your content. But when a user asks ChatGPT or Gemini to read your page, none of that metadata reaches the language model.

[Chart: Metadata visibility scores across 6 AI crawlers. 9 of 11 metadata elements score zero; only the <title> tag reliably survives.]

11 metadata elements. 9 of them scored zero. The <title> tag is the only one that reliably works. og:title is a distant second at just 2 out of 6.

The reason: every AI crawler runs an HTML-to-Markdown conversion before content reaches the language model. That converter strips the entire <head> section. The <title> survives because most converters promote it into the Markdown as a heading. Everything else is gone.
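To make this concrete, here's a minimal sketch of a typical <head> (the values are illustrative, not from our test page). Per our results, only the <title> line below reliably reaches an AI assistant; the rest is stripped during the HTML-to-Markdown conversion.

```html
<head>
  <!-- Stripped before the LLM sees the page (scored 0/6 in our test) -->
  <meta name="description" content="Compare widget plans, pricing, and features.">
  <meta property="og:description" content="Compare widget plans, pricing, and features.">
  <script type="application/ld+json">
    { "@context": "https://schema.org", "@type": "Article", "headline": "Widget Buying Guide" }
  </script>

  <!-- og:title did slightly better, but still only 2/6 -->
  <meta property="og:title" content="Widget Buying Guide">

  <!-- The one reliable survivor (5/6): most converters promote it into the Markdown as a heading -->
  <title>Widget Buying Guide: Plans, Pricing, and Features</title>
</head>
```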

AI gives your JavaScript between 500ms and 3 seconds

We loaded content via setTimeout at six intervals.

Delay | ChatGPT | Claude | Gemini | DeepSeek | Grok | Copilot
0ms | ✗ | ✗ | ✗ | ✓ | ✓ | ✓
500ms | ✗ | ✗ | ✗ | ✓ | ✓ | ✓
1 second | ✗ | ✗ | ✗ | ✓ | ✓ | ✗
2 seconds | ✗ | ✗ | ✗ | ✓ | ✓ | ✗
3 seconds | ✗ | ✗ | ✗ | ✗ | ✓ | ✗
5 seconds | ✗ | ✗ | ✗ | ✗ | ✗ | ✗

ChatGPT, Claude, and Gemini don’t run JavaScript when they fetch pages live. If your React app renders everything client-side, these three AI crawlers see <div id="root"></div> and nothing else.

Copilot runs the most advanced engine but waits about 500 milliseconds. DeepSeek gives you 2 seconds. Grok tops out at 3. Nobody catches 5.

And no AI scrolls. IntersectionObserver at 2,000px below the fold: 0 out of 6. Lazy-loaded content is invisible to every AI.
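To make the test concrete, here's a minimal sketch of the kind of timed and lazy-load injections we're describing (marker names and element IDs are illustrative, not the actual test page code):

```html
<div id="timed"></div>
<div id="lazy" style="margin-top: 2000px;"></div>

<script>
  // Timed injection: only a crawler still rendering after `delay` ms sees the marker.
  function inject(delay, marker) {
    setTimeout(() => {
      document.getElementById('timed').textContent += ' ' + marker;
    }, delay);
  }
  inject(0, 'MARKER_0MS');     // visible only to the JS-capable crawlers (Copilot, DeepSeek, Grok)
  inject(2000, 'MARKER_2S');   // only DeepSeek (2s window) and Grok (3s) are still around
  inject(5000, 'MARKER_5S');   // nobody waited this long in our test

  // Lazy-load injection: the marker is only written when the element scrolls into view.
  // No AI scrolled, so this kind of content scored 0/6.
  const lazy = document.getElementById('lazy');
  new IntersectionObserver((entries, observer) => {
    if (entries[0].isIntersecting) {
      lazy.textContent = 'MARKER_LAZY';
      observer.disconnect();
    }
  }).observe(lazy);
</script>
```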

AI reads your hidden content but not your CSS-generated content

[Chart: CSS visibility comparison. display:none and visibility:hidden score 5-6 out of 6, while the ::before and ::after pseudo-elements score 0 out of 6 across all AI crawlers.]

Content hidden with CSS? AI reads all of it. Content generated by CSS? AI reads none of it.

CSS hiding methods | Score
display:none | 5/6
visibility:hidden | 6/6
opacity:0 | 6/6
JS toggle (hamburger menu) | 5/6

CSS generation methods | Score
::after pseudo-element | 0/6
::before pseudo-element | 0/6
content:attr() | 1/6 (Copilot only)
Class names, IDs, style values | 0/6 each

HTML-only crawlers read source code. They don’t render CSS. display:none is a styling instruction they never execute. The text is in the HTML, so they see it.

::before and ::after content only exists after the browser computes styles. It’s never in the HTML. Crawlers don’t get there.
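Here's the difference in markup form (an illustrative sketch, not our actual test page):

```html
<!-- Hidden with CSS: the text is still in the HTML, so AI assistants read it (5/6 in our test) -->
<div class="faq-answer" style="display: none;">
  Yes, the Pro plan includes API access.
</div>

<!-- Generated by CSS: the text only exists after styles are computed, so it scored 0/6 -->
<style>
  .price-badge::after { content: "From $49/month"; }
</style>
<span class="price-badge"></span>
```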

Half of AI crawlers disguise their identity

AI | What it sends | Can you block it?
ChatGPT | `ChatGPT-User/1.0` | Yes
Claude | `Claude-User/1.0` | Yes
Gemini | Generic "Google" | Risky
DeepSeek | Looks like Firefox (zh-CN) | No
Grok | 100+ rotating proxy IPs | No
Copilot | Looks like regular Chrome | No
[Chart: AI crawler identification methods. ChatGPT and Claude identify themselves transparently; DeepSeek, Grok, and Copilot disguise themselves as regular browsers.]

ChatGPT and Claude are transparent. They identify themselves, publish IP ranges, respect robots.txt.

DeepSeek disguises itself as Firefox. Grok routes through 100+ proxy IPs. Copilot mimics Chrome with no bot identifier.

If you block ChatGPT-User and Claude-User in robots.txt, you’ve blocked two out of six.
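For reference, blocking those two in robots.txt looks like this (a minimal sketch using the user-agent tokens from the table above; the stealth crawlers simply won't honor it):

```
# Blocks the two crawlers that identify themselves.
User-agent: ChatGPT-User
Disallow: /

User-agent: Claude-User
Disallow: /
```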

<style> and <script> blocks are deleted entirely

0 out of 6 across every test. CSS comments, CSS variables, JS comments, JS variables — the converter strips these blocks before the LLM sees them.

Any data in JavaScript variables, config objects, or analytics tags is invisible unless it’s also rendered into the DOM.
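A small sketch of the pattern (variable and element names are illustrative):

```html
<span id="price"></span>
<script>
  // Invisible to every AI we tested: <script> block contents are stripped entirely.
  const productConfig = { name: "Widget Pro", price: 49, inStock: true };

  // Writing the value into the DOM makes it visible to the JS-capable crawlers
  // (Copilot, DeepSeek, Grok); the HTML-only tier still never sees it.
  document.getElementById('price').textContent = '$' + productConfig.price + '/month';
</script>
```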

URL parsing and alt text vary wildly

Alt text on images: only Claude, Gemini, and Copilot read it. 3 out of 6.

Full URL preservation (paths + query params): Claude, Gemini, and Copilot. ChatGPT and DeepSeek only see anchor text.
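In markup terms (illustrative values):

```html
<!-- Alt text: read by Claude, Gemini, and Copilot; dropped by the other three -->
<img src="/images/pricing-comparison.png"
     alt="Pricing comparison chart: Starter $19, Pro $49, Enterprise custom">

<!-- Full URL: Claude, Gemini, and Copilot keep the path and query string; -->
<!-- ChatGPT and DeepSeek keep only the anchor text -->
<a href="/pricing?plan=pro&amp;billing=annual">See the full pricing comparison</a>
```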

The full scorecard

Everything we tested. All 62 elements. All 6 AIs.

Visible text

Static body text is the one thing every crawler agrees on.

• Static HTML text
• Schema.org microdata
• Inline SVG text
• Nav link text
• Web component slot

Hidden HTML

Alt text only reaches 3 out of 6 crawlers. Comments, data attributes, and input values reach none.

• Noscript fallback
• Template element
• HTML comment
• Data attribute
• Hidden input
• Visible text input value
• Image alt text
• onclick handler

Head parsing

9 out of 11 elements scored zero. Only <title> reliably survives the HTML-to-Markdown conversion.

• Custom meta tag
• og:title
• og:description
• og:image
• og:url
• og:type
• Title tag
• Meta description
• Meta keywords
• JSON-LD
• Link preload

CSS and DOM hidden

• display:none
• visibility:hidden
• opacity:0
• JS toggle nav

CSS rendering

Only Copilot scored here, and only on one test.

• CSS ::after
• CSS ::before
• CSS content:attr()
• Inline style value
• Class name
• ID value

Style and script text

The converter deletes <style> and <script> blocks before the LLM sees them. 0/6 across every test.

• CSS comment
• CSS rule value
• CSS variable
• JS comment
• JS IIFE variable
• JS variable (no DOM)

JavaScript execution

The cutoff is sharp. After 3 seconds, every crawler has moved on.

• JS immediate (0ms)
• JS 500ms
• JS 1s
• JS 2s
• JS 3s
• JS 5s

JS + Network

• Fetch API
• XMLHttpRequest
• POST request
• Redirect chain

JS + Advanced DOM

Shadow DOM, iframes, and IntersectionObserver are nearly invisible. Only Copilot and the headless browsers pick up fragments.

• Shadow DOM (open)
• iframe src
• iframe srcdoc
• Dynamic ES import
• Web Worker
• IntersectionObserver

URL parsing

Claude, Gemini, and Copilot preserve full URLs. ChatGPT and DeepSeek only keep anchor text.

• Image src URL
• Anchor href URL
• Query parameter

What to do about it

You can’t optimize for all six equally. Work the tiers in order:

Optimize for Tier 3 (ChatGPT, Claude, Gemini).

These are the most popular AI assistants. Optimizing for HTML-only parsers means body text, heading structure, descriptive <title> tags, server-side rendering. Anything that works for Tier 3 works for everyone.

Don’t break Tier 2 (DeepSeek, Grok).

If your JS-rendered content takes longer than 2 seconds to appear in the DOM, DeepSeek misses it. Longer than 3 seconds, and Grok misses it too. Test with a setTimeout; that's literally what we did.

Ignore Tier 1 edge cases.

Copilot reading Shadow DOM and CSS content:attr() is interesting data. It’s not worth restructuring your site for one crawler.

The universal move: server-side rendering.

SSR is the only approach that gets full coverage across all three tiers. No timing gambles, no dependency on crawler JS engines.

If you’re an engineering team

Ship server-side rendered content. If you must use client-side rendering, keep Time to Interactive under 2 seconds. Don't lazy-load content that matters for AI visibility. Don't fetch content in the browser that you need AI to see; render it on the server before the page is served. Test with JavaScript disabled; that's what ChatGPT and Claude see.
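A sketch of the difference (illustrative markup; the exact approach depends on your stack):

```html
<!-- Client-side pattern: HTML-only crawlers (ChatGPT, Claude, Gemini) see an empty div -->
<div id="features"></div>
<script>
  fetch('/api/features')
    .then((res) => res.json())
    .then((features) => {
      document.getElementById('features').innerHTML =
        features.map((f) => '<p>' + f.name + '</p>').join('');
    });
</script>

<!-- Server-rendered pattern: the same content is already in the HTML every crawler fetches -->
<div>
  <p>AI visibility tracking</p>
  <p>Citation monitoring</p>
</div>
```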

Note: Gemini can also draw on Google's search index when generating an answer. In that case, some client-side rendered content may still make it into the response.

If you’re running content or SEO

Your <title> tag is now your single most valuable metadata field for AI. Meta descriptions and JSON-LD still help Google Search, but they do nothing for AI assistants. Important information needs to live in the <body>. Accordions and tabs are fine; there is no need to restructure your UX.

For a full on-page GEO checklist, see our on-page GEO guide.

If you’re managing AI access

robots.txt blocks two out of six. If that’s not enough, you’ll need WAF-level behavioral detection for the stealth crawlers. It’s an ongoing effort, not a one-time config.

You know what AI can see. Now track what it says.

This study shows what AI crawlers can technically read on your website. But can read ≠ does read.

Writesonic’s AI Bot Analytics shows you which crawlers are actually hitting your pages, which ones are getting errors, and whether those visits convert to citations.

The gap between what AI can read and what it actually reads on your site is where the opportunity is.

Track your AI visibility with Writesonic →

Methodology

Test design: We deployed a single test page containing 62 distinct content injection methods. Each method was tagged with a unique marker, an invented word like MAPLE_01 or STORM_10 that doesn’t appear anywhere else on the internet. These markers can’t be hallucinated. If an AI reports finding one, we know exactly which injection method it read.
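For example, one injection method might look like this (the marker and attribute shown are illustrative, not one of the actual 62):

```html
<!-- Marker planted in a data attribute: if an AI reports it, it parsed this attribute -->
<div data-test-code="RIVER_23">Ordinary visible paragraph text.</div>
```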

Content categories tested: Static HTML text, hidden HTML elements (<noscript>, <template>, comments, data attributes, input values, alt text), <head> metadata (meta tags, OG tags, JSON-LD, <title>), CSS-hidden content (display:none, visibility:hidden, opacity:0), CSS-generated content (::before, ::after, content:attr()), <style> and <script> text, JavaScript execution at timed intervals (0ms to 5s), JS network requests (fetch, XHR, POST), advanced DOM (Shadow DOM, iframes, Web Workers, IntersectionObserver), and URL parsing.

AI assistants tested: ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), DeepSeek, Grok (xAI), and Copilot (Microsoft/Diffbot).

Testing period: March 2026.

Exclusions: Perplexity was not included. Its crawl behavior differs significantly from the other six and requires a separate testing methodology.

TLDR

We tested 62 webpage elements across six AI crawlers: ChatGPT, Claude, Gemini, DeepSeek, Grok, and Copilot. The best crawler found 35%. The worst found 21%. Most of your website doesn’t reach AI.

The crawlers split into three tiers: full browser (Copilot), headless JS (DeepSeek, Grok), and HTML-only (ChatGPT, Claude, Gemini). 9 of 11 metadata elements score 0/6 — including JSON-LD, which Google recommends but Gemini can’t read. 3 of 6 crawlers disguise themselves as human visitors. No AI scrolls your page.

The fix: server-side render, put everything important in body text, write descriptive <title> tags. If it works for ChatGPT (the worst crawler), it works for all of them.

Ping me on LinkedIn or X if you have questions.

We built Writesonic to track what happens after crawlers visit: which AI platforms cite your brand, what they say about you, and where you're invisible. See it in action →

Samanyou Garg
Founder @ Writesonic
Samanyou is the founder of Writesonic and is passionate about using AI to solve real-world problems, especially in marketing. Two years before the launch of ChatGPT, Writesonic was already at the forefront, helping brands, agencies, and individuals create and optimize all types of content and outsmart competitors by decoding their SEO trends and keywords to boost traffic. Samanyou is a Forbes 30 Under 30 awardee and a winner of the 2019 Global Undergraduate Awards, often referred to as the junior Nobel Prize.