Large Language Models and JavaScript Consumption

Modified on Thu, 15 Jan at 9:48 AM

Schema App has two primary ways of inserting markup onto a webpage: client-side rendering and server-side rendering. The intent of this article is to provide comprehensive information about how LLMs make use of markup so users can make informed decisions about how they want to render and deploy their JSON-LD.

Alongside an analysis of how LLM bots crawl and index JSON-LD, this article explains client-side and server-side rendering as they relate to Schema App deployment.

Key Takeaways and Recommendations

Although Gemini and Copilot rely on crawlers that can render JavaScript, other major LLM platforms use crawlers that do not have this capability. This presents a visibility risk for users who want to ensure their structured data can be accessed and used by all LLMs. This section outlines how LLMs use schema markup and offers suggestions for mitigating this "indexing gap".

Strategies to mitigate the "indexing gap"

The most direct way to mitigate the impact of this indexing gap is to explore server-side rendering solutions. In the context of Schema App, this will typically involve integration via a specific content management system (CMS) such as WordPress, AEM, or Drupal.

If this is not an option (e.g. your website is not hosted on one of these platforms), explore solutions that let you include JSON-LD in the initial HTML using pre-rendering or edge-side rendering.
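
As a rough illustration of what "including JSON-LD in the initial HTML" means, the sketch below inserts a JSON-LD script element into a page before it is sent to the client. It is a minimal sketch under stated assumptions, not Schema App's integration: the function name inject_json_ld, the sample page, and the sample markup are all hypothetical, and a real pre-rendering or edge-side setup would run equivalent logic inside whatever platform serves the page.

```python
import json
import re


def inject_json_ld(html: str, data: dict) -> str:
    """Return html with a JSON-LD script element placed just before </head>."""
    # Escape "<" so a value containing "</script>" cannot close the element early.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    tag = f'<script type="application/ld+json">{payload}</script>'
    head_close = re.search(r"</head\s*>", html, re.IGNORECASE)
    if head_close is None:
        # No closing head tag found: prepend rather than guess at the structure.
        return tag + html
    return html[: head_close.start()] + tag + html[head_close.start():]


# Hypothetical page and markup, for illustration only.
page = "<html><head><title>Example</title></head><body>Hello</body></html>"
markup = {"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}
rendered = inject_json_ld(page, markup)
```

Because the markup is already part of the HTML response, a crawler that never executes JavaScript still receives it.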

How Schema Influences LLM Outputs

Schema markup is typically used in the retrieval and grounding stage of LLM responses. LLMs that perform web grounding and/or RAG (Retrieval-Augmented Generation) are likely to use structured data to produce cited results and more accurate outputs.

LLMs and their Rendering Capabilities

Some crawlers (Googlebot and Bingbot) can render JavaScript client-side. Others, such as those used by Perplexity and possibly ChatGPT, cannot render JavaScript when they crawl a page themselves, but may draw on indices and sources that do render client-side JSON-LD.

Google and Bing can render JavaScript

Googlebot and Bingbot can reliably render JavaScript. This means that Gemini and Copilot can ground their AI responses in client-side rendered markup, because they have access to pages rendered by Googlebot and Bingbot respectively.

Perplexity and LLM Aggregators

Perplexity and other LLM aggregators use the data provided by their various web sources (scrapers, indices, etc.). If these upstream sources can "see" and index client-side rendered JSON-LD, then Perplexity will likely have access to the rendered JSON-LD.

ChatGPT and Claude

ChatGPT and Claude currently both use web crawlers that do not typically render JavaScript when they crawl a page. Although ChatGPT and Claude may use other indices and sources that include structured data, it is likely that their own crawlers are unable to render and index JSON-LD that is inserted client-side.
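
To make the gap concrete, the sketch below approximates what a crawler that does not execute JavaScript can find: it parses the raw HTML only and collects any JSON-LD that is already present. This is an illustrative approximation using Python's standard library, not the actual behaviour of any named crawler. The class name JsonLdExtractor, the helper visible_json_ld, and both sample pages are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD blocks found in raw HTML, without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON.


def visible_json_ld(html):
    """Return the JSON-LD blocks present in the HTML as delivered."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical client-side page: the JSON-LD only exists after the script runs.
csr_shell = """<html><head><script>
  const el = document.createElement("script");
  el.type = "application/ld+json";
  el.textContent = JSON.stringify({"@type": "Organization"});
  document.head.appendChild(el);
</script></head><body></body></html>"""

# Hypothetical server-side page: the JSON-LD is in the initial HTML.
ssr_page = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}</script>
</head><body></body></html>"""

csr_blocks = visible_json_ld(csr_shell)
ssr_blocks = visible_json_ld(ssr_page)
```

Here csr_blocks is empty, because the loader script is never executed, while ssr_blocks contains the Organization markup. Running a check like this against the HTML your site actually returns is one way to see whether your markup depends on JavaScript rendering.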

Client-side and Server-side Rendering of JSON-LD

A key component of this discussion is the distinction between client-side and server-side rendering.

Client Side Rendering

In client-side rendering, the browser downloads an initial HTML shell and JavaScript code. The JavaScript then runs in the browser to fetch data (e.g. Schema App fetches and generates JSON-LD as the page loads), so the JSON-LD is not part of the initial HTML response.

Server Side Rendering

In server-side rendering, the server generates the full page for each request and sends it to the browser. Schema App JSON-LD will have already been generated and sent to the client’s server to be included in the full page. Configuring server-side rendering requires additional communication between Schema App’s CDN and the servers that host the client’s website. Typically, it is limited to CMS-specific integrations (WordPress, AEM, and Drupal).

AI Crawler Technical Specifications & Client Side Rendering Compatibility

The following table describes various AI crawlers and their compatibility with client-side rendering.

Crawler Identity	Organization	JavaScript Execution	Schema Support	Primary Constraint	CSR Compatibility
OAI-SearchBot	OpenAI (Search)	No	JSON-LD (Raw)	Latency / Speed	Zero
GPTBot	OpenAI (Training)	No	JSON-LD (Raw)	Packet Volume	Zero
PerplexityBot	Perplexity AI	No	JSON-LD (Raw)	Real-time Fetch	Zero
ClaudeBot	Anthropic	No	Text Extraction	Safety / Scale	Zero
Googlebot	Google	Yes (WRS)	Extensive	Queue Time	High
Bingbot	Microsoft	Yes (Limited)	Extensive	Consistency	Moderate
Gemini (Agent)	Google	Variable	Native Grounding	Context Window	Low (unless grounded)
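
For log analysis, the JavaScript Execution column of the table above can be turned into a small lookup, sketched below. The values are copied from the table, with Bingbot's "Yes (Limited)" treated as True. The function name executes_javascript and the sample user-agent strings are hypothetical, and the sketch assumes each crawler identifies itself with the token shown in the Crawler Identity column. Gemini (Agent) is left out because the table lists its behaviour as variable.

```python
# JavaScript execution per crawler token, copied from the table above.
CRAWLER_EXECUTES_JS = {
    "oai-searchbot": False,
    "gptbot": False,
    "perplexitybot": False,
    "claudebot": False,
    "googlebot": True,
    "bingbot": True,  # Listed as "Yes (Limited)" in the table.
}


def executes_javascript(user_agent):
    """Return True or False for a listed crawler, or None if it is not listed."""
    ua = user_agent.lower()
    for token, executes in CRAWLER_EXECUTES_JS.items():
        if token in ua:
            return executes
    return None


# Simplified, illustrative user-agent strings (not exact real-world values).
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ExampleBrowser/1.0)",
]
results = [executes_javascript(ua) for ua in samples]
```

A result of False marks a request from a crawler that, per the table, only sees JSON-LD present in the initial HTML. None means the crawler is not covered by the table.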