SchemaBot Crawler and the Analyzer Console

Modified on Wed, 6 Jul, 2022 at 5:28 PM

The Schema App Analyzer is a specialized website crawler that extracts JSON-LD, Microdata, and RDFa from website pages and catalogs all data items. It can be used to monitor the health of your markup and to identify further markup opportunities.
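For illustration, here is a minimal sketch of how JSON-LD might be pulled out of a page's HTML. This is not Schema App's implementation; the `JsonLdExtractor` class and the sample HTML are hypothetical, using only Python's standard-library HTML parser.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.items.append(json.loads(data))

# Hypothetical page containing one JSON-LD block
html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
</script>
</head><body></body></html>
"""

parser = JsonLdExtractor()
parser.feed(html)
print([item["@type"] for item in parser.items])  # ['Article']
```

Microdata and RDFa are embedded in element attributes rather than script blocks, so extracting them requires walking the full DOM instead.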


Starting a Crawl 

Any website may be sampled by choosing the Analyzer option on the left sidebar and entering the URL of the homepage. The submitted website is placed on a queue and crawled for up to 10,000 links, producing a detailed report that shows the count of each schema type along with any errors or warnings found during validation.
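As a rough illustration of the aggregation behind that report, counts per schema type could be tallied like this. The `extracted` sample data is hypothetical, not real Analyzer output.

```python
from collections import Counter

# Hypothetical crawl results: one record per data item found on a page
extracted = [
    {"url": "/", "type": "Organization"},
    {"url": "/blog/post-1", "type": "Article"},
    {"url": "/blog/post-2", "type": "Article"},
]

# Count how many data items of each schema type were found
counts = Counter(item["type"] for item in extracted)
print(counts.most_common())  # [('Article', 2), ('Organization', 1)]
```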

Websites submitted for analysis are placed on a queue and run according to available resources, generally within 20 minutes. The crawler, where permitted by the website, will start with the given URL and process approximately 500 pages per hour.

Reviewing a Crawl in the Analyzer Console 

Once a crawl is complete you can choose "View Results". In the console, crawl results are grouped by Type, with the tool flagging markup for errors and warnings according to Google's documentation. Please note that the Analyzer is in Beta and may not always provide the most up-to-date information about errors. We recommend testing individual URLs with the Rich Results Test and the Schema Markup Validator to troubleshoot issues. If you aren't sure how to proceed, you can always email us and one of our CSMs will help you.

To review results in further detail you can select "Show Details" to see the list of URLs that have a certain Type of markup on them.

About the SchemaBot Crawler

Our crawler bot visits each page in the website, records and validates any schema markup found as JSON-LD, Microdata, or RDFa, and then extracts all links found on the page and queues them for processing.

The crawl process begins with the given URL, first looking for sitemap.xml and adding the links found there to a start list. Pages are then processed starting with the given URL and stepping through the sitemap, fanning out from the links found on each page using multiple crawl processes running in parallel. The schema crawler can process approximately 500 pages per hour.
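The queueing behaviour described above can be sketched as a simple breadth-first crawl. This is an illustrative sketch, not the actual SchemaBot code; the `get_links` callback and the toy link graph are hypothetical, and the real crawler runs many such workers in parallel.

```python
from collections import deque

def crawl_order(start_url, sitemap_links, get_links, max_pages=10_000):
    """Return pages in the order a breadth-first crawler would visit them:
    the start URL first, then sitemap entries, fanning out via page links."""
    queue = deque([start_url] + sitemap_links)
    seen, visited = set(), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in seen:          # skip pages already processed
            continue
        seen.add(url)
        visited.append(url)
        queue.extend(get_links(url))  # links found on the page join the queue
    return visited

# Toy link graph standing in for real pages
graph = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/a"],
    "/c": [],
}
print(crawl_order("/", ["/b"], lambda u: graph.get(u, [])))
# ['/', '/b', '/a', '/c']
```

The `max_pages` cap mirrors the 10,000-link limit mentioned earlier: the crawl simply stops queueing new work once the budget is spent.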
