The SchemaApp Analyzer is a specialized website crawler that extracts JSON-LD, Microdata and RDFa markup from a website's pages and catalogs every schema.org data item it finds.


Analyzer Console


To sample any website, select the Analyzer option in the left sidebar and enter the URL of its homepage. The submitted website is placed on a queue and crawled, following up to 10,000 links, to produce a detailed report showing the count of each schema type found and any errors or warnings raised during validation.
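A report like this comes down to tallying the "@type" of every extracted item. The Python sketch below shows one way to do that, assuming items have already been parsed from JSON-LD into dictionaries. The function name `summarize_items` and its warning rule are hypothetical illustrations, not SchemaApp's actual validation logic.

```python
from collections import Counter

def summarize_items(items):
    """Count schema.org types across extracted data items.

    Each item is assumed to be a dict parsed from JSON-LD. Hypothetical
    rule for illustration: an item with no "@type" is reported as a
    warning instead of being counted.
    """
    counts = Counter()
    warnings = []
    for index, item in enumerate(items):
        types = item.get("@type")
        if not types:
            warnings.append(f"item {index}: missing @type")
            continue
        # "@type" may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts, warnings

# Made-up sample items, as they might be parsed from a few pages.
sample = [
    {"@type": "Organization", "name": "Example Co"},
    {"@type": ["Product", "Offer"]},
    {"@type": "Product"},
    {"name": "untyped item"},
]
counts, warnings = summarize_items(sample)
print(counts.most_common())
print(warnings)
```

Items carrying several types are counted once under each type, so per-type totals can add up to more than the number of items.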


Websites submitted for analysis are queued and run as resources become available, typically starting within 20 minutes. Where the website permits crawling, the crawler starts at the given URL and processes approximately 500 pages per hour.
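As a rough planning aid, the figures above can be combined into a back-of-the-envelope estimate. This sketch assumes the approximate rate of 500 pages per hour holds for the whole crawl and treats the typical 20-minute queue wait as fixed; `estimate_crawl_hours` is a hypothetical helper, and real crawl times will vary with the site and the resources available.

```python
def estimate_crawl_hours(page_count, pages_per_hour=500, queue_wait_minutes=20):
    """Rough wall-clock estimate in hours: queue wait plus crawl time.

    Assumes a constant crawl rate; both defaults are the approximate
    figures quoted in this article, not guarantees.
    """
    return queue_wait_minutes / 60 + page_count / pages_per_hour

# A full 10,000-page crawl: about 20 minutes queued plus about 20 hours crawling.
print(round(estimate_crawl_hours(10_000), 2))  # 20.33
```

At this rate, a site that reaches the 10,000-link limit takes close to a full day, while a 500-page site finishes in roughly an hour and twenty minutes.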


Trend Reports and Scheduled Crawls


When you begin a new project, the URL you enter is automatically scheduled for a routine schema analysis once per month. Each crawl's results are added to the Analyzer listings and also to the Trend Report, which shows how your schema markup evolves over time.
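Conceptually, a trend report lines up the per-type counts from each scheduled crawl in date order. The sketch below assumes each monthly crawl has already been reduced to a `Counter` mapping schema type to item count; the function `type_trend` and the sample history are hypothetical, made-up illustrations rather than the Trend Report's real data model.

```python
from collections import Counter

def type_trend(monthly_counts):
    """Line up per-type counts across crawls, oldest crawl first.

    monthly_counts: list of (label, Counter) pairs in date order.
    Returns {schema_type: [count_in_crawl_1, count_in_crawl_2, ...]},
    with 0 wherever a type was absent from a crawl.
    """
    all_types = set()
    for _, counts in monthly_counts:
        all_types.update(counts)
    return {
        schema_type: [counts.get(schema_type, 0) for _, counts in monthly_counts]
        for schema_type in sorted(all_types)
    }

# Made-up history of two monthly crawls.
history = [
    ("2024-01", Counter({"Organization": 1, "Product": 40})),
    ("2024-02", Counter({"Organization": 1, "Product": 55, "FAQPage": 3})),
]
for schema_type, series in type_trend(history).items():
    print(schema_type, series)
```

A type that disappears between crawls shows up as a trailing zero, which is the kind of regression a month-over-month view makes easy to spot.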


Schema Crawler


The crawling process visits each page on the website, records and validates any schema markup found as JSON-LD, Microdata or RDFa, and then extracts all links on the page and queues them for processing.
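The per-page step can be sketched with Python's standard-library `html.parser`. This minimal version, a hypothetical `PageScanner` class rather than the Analyzer's own code, collects JSON-LD blocks and resolves outgoing links to absolute URLs. It does not handle Microdata or RDFa, and its "validation" only records JSON-LD that fails to parse.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageScanner(HTMLParser):
    """Collect JSON-LD items and outgoing links from one HTML page.

    Microdata and RDFa are not handled in this sketch.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # absolute URLs to queue for crawling
        self.json_ld = []    # parsed JSON-LD blocks
        self.errors = []     # JSON-LD blocks that failed to parse
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Buffer script text; it may arrive in more than one chunk.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            text = "".join(self._buffer)
            try:
                self.json_ld.append(json.loads(text))
            except json.JSONDecodeError as exc:
                self.errors.append(f"invalid JSON-LD: {exc}")

page = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization"}</script>
</head><body><a href="/about">About</a></body></html>"""
scanner = PageScanner("https://example.com/")
scanner.feed(page)
scanner.close()
print(scanner.json_ld)
print(scanner.links)
```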


The crawl begins with the given URL and first looks for sitemap.xml, adding any links it finds there to a start list. Pages are processed starting with the given URL and stepping through the sitemap, then fanning out through the links found on each page, with multiple crawl processes running in parallel. The schema crawler can process approximately 500 pages per hour.
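The seeding and fan-out order described above amounts to a breadth-first traversal. The sketch below parses `<loc>` entries from a sitemap.xml string with `xml.etree.ElementTree`, then visits pages from a first-in, first-out queue, skipping duplicates and off-site links. Several details are assumptions rather than documented Analyzer behavior: the same-host restriction, the caller-supplied `get_links` function (hypothetical; it could wrap a page scanner), and reusing the 10,000-link limit as `max_pages`. It also runs sequentially, whereas the Analyzer uses multiple crawl processes in parallel.

```python
import xml.etree.ElementTree as ET
from collections import deque
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_xml):
    """Return the <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]

def crawl_order(start_url, sitemap_xml, get_links, max_pages=10_000):
    """Return pages in the order a breadth-first crawl would visit them.

    get_links(url) is a hypothetical caller-supplied hook that returns the
    absolute links found on a page.
    """
    site = urlparse(start_url).netloc
    # Start list: the given URL first, then sitemap entries, without duplicates.
    seeds = [start_url] + (sitemap_urls(sitemap_xml) if sitemap_xml else [])
    start_list = list(dict.fromkeys(seeds))
    queue = deque(start_list)
    seen = set(start_list)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Assumption: stay on the same host; skip anything already queued.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Made-up site structure for illustration.
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products</loc></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>"""
site_links = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/contact"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://other.org/"],
}
print(crawl_order("https://example.com/", sitemap, lambda u: site_links.get(u, [])))
```

A parallel version could hand URLs from the queue to a `concurrent.futures` pool, but the `seen` set would then need to be updated from a single coordinating thread to avoid queuing the same page twice.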