SchemaBot Crawler and the Analyzer Console

Modified on Wed, 6 Jul, 2022 at 5:28 PM

The Schema App Analyzer is a specialized website crawler that extracts JSON-LD, Microdata, and RDFa from website pages and catalogs all data items. It can be used to monitor the health of your markup and to identify further markup opportunities.
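For illustration, here is a minimal sketch of how JSON-LD might be pulled out of a page's HTML. This is not Schema App's implementation; the `JsonLdExtractor` class and the sample HTML are hypothetical, using only Python's standard-library HTML parser.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.items.append(json.loads(data))

# Hypothetical page containing one JSON-LD block
html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
</script>
</head><body></body></html>
"""

parser = JsonLdExtractor()
parser.feed(html)
print([item["@type"] for item in parser.items])  # ['Article']
```

Microdata and RDFa are embedded in element attributes rather than script blocks, so extracting them requires walking the full DOM instead.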


Starting a Crawl 

Any website may be sampled by choosing the Analyzer option on the left sidebar and entering the URL of the homepage. The submitted website is placed on a queue and crawled for up to 10,000 links, producing a detailed report that shows the count of each schema type along with any errors or warnings found during validation.
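As a rough illustration of the aggregation behind that report, counts per schema type could be tallied like this. The `extracted` sample data is hypothetical, not real Analyzer output.

```python
from collections import Counter

# Hypothetical crawl results: one record per data item found on a page
extracted = [
    {"url": "/", "type": "Organization"},
    {"url": "/blog/post-1", "type": "Article"},
    {"url": "/blog/post-2", "type": "Article"},
]

# Count how many data items of each schema type were found
counts = Counter(item["type"] for item in extracted)
print(counts.most_common())  # [('Article', 2), ('Organization', 1)]
```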

Websites submitted for analysis are placed on a queue and run according to available resources, generally within 20 minutes. The crawler, where permitted by the website, will start with the given URL and process approximately 500 pages per hour.

Reviewing a Crawl in the Analyzer Console 

Once a crawl is complete you can choose "View Results". In the console, crawl results are grouped by Type, with the tool flagging markup for errors and warnings according to Google's documentation. Please note that the Analyzer is in Beta and may not always provide the most up-to-date information about errors. We recommend testing individual URLs with the Rich Results Test and the Schema Markup Validator to troubleshoot issues. If you aren't sure how to proceed, you can always email us and one of our CSMs will help you.

To review results in further detail you can select "Show Details" to see the list of URLs that have a certain Type of markup on them.

About the SchemaBot Crawler

Our crawler bot visits each page in the website, records and validates any schema markup found as JSON-LD, Microdata, or RDFa, and then extracts all links found on the page and queues them for processing.

The crawl process begins with the given URL, first looking for sitemap.xml and adding the links found there to a start list. Pages are then processed starting with the given URL and stepping through the sitemap, fanning out from the links found on each page using multiple crawl processes running in parallel. The schema crawler can process approximately 500 pages per hour.
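The queueing behaviour described above can be sketched as a simple breadth-first crawl. This is an illustrative sketch, not the actual SchemaBot code; the `get_links` callback and the toy link graph are hypothetical, and the real crawler runs many such workers in parallel.

```python
from collections import deque

def crawl_order(start_url, sitemap_links, get_links, max_pages=10_000):
    """Return pages in the order a breadth-first crawler would visit them:
    the start URL first, then sitemap entries, fanning out via page links."""
    queue = deque([start_url] + sitemap_links)
    seen, visited = set(), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in seen:          # skip pages already processed
            continue
        seen.add(url)
        visited.append(url)
        queue.extend(get_links(url))  # links found on the page join the queue
    return visited

# Toy link graph standing in for real pages
graph = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/a"],
    "/c": [],
}
print(crawl_order("/", ["/b"], lambda u: graph.get(u, [])))
# ['/', '/b', '/a', '/c']
```

The `max_pages` cap mirrors the 10,000-link limit mentioned earlier: the crawl simply stops queueing new work once the budget is spent.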
