Schema App Crawler Deployment Overview

Modified on Thu, 10 Aug 2023 at 01:37 PM

Introduction 


If your website has issues with page speed and none of the JavaScript integrations suit your needs, you may want to use Schema App's Crawler to deploy your markup. 


With the Crawler, Schema Markup is pulled from the Schema App cache and rendered server-side. As a result, overhead on the page is very low. This is a good option if the content on your site doesn't change frequently.
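To show what "rendered server-side" from a cache involves, here is a minimal sketch. It looks up cached JSON-LD for a URL and writes it into the page's <head> before the HTML is sent, so the browser does no extra work. The in-memory `cache` dict and the helper names are hypothetical stand-ins for illustration. They are not Schema App's integration code.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def render_jsonld_tag(url, cache):
    """Build a JSON-LD <script> tag for `url` from a cache, or "" if nothing is cached.

    cache: hypothetical {url: JSON-LD object} mapping standing in for the Schema App cache.
    """
    data = cache.get(url)
    if data is None:
        return ""
    # Escape "</" so a string value can never close the <script> element early.
    payload = json.dumps(data, separators=(",", ":")).replace("</", "<\\/")
    return SCRIPT_OPEN + payload + SCRIPT_CLOSE


def inject_into_head(html, tag):
    """Insert the tag just before the first </head>; return the page unchanged if there is no tag."""
    if not tag:
        return html
    return html.replace("</head>", tag + "</head>", 1)
```

Because the markup is already in the HTML response, no script has to run on page load to produce it. This is the low-overhead property described above.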





Technical Design

Schema App's Crawler (Schema Bot) starts from a website's sitemap and follows the links listed, up to 10 levels deep, until it finds a page that matches a Highlighter Template's page set. For each matching page, it checks whether JSON-LD already exists and whether it matches the currently stored version. If needed, it sends out an update.
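The crawl described above can be sketched as a breadth-first traversal with a depth limit. This is a minimal illustration, not Schema Bot's actual code. The in-memory `links` graph stands in for fetching pages, and the regexes in `page_set_patterns` stand in for a Highlighter Template's page set. The SHA-256 comparison in `needs_update` is one assumed way to decide whether stored JSON-LD has changed. The article does not say whether the bot stops at the first match, so this sketch collects every matching page.

```python
import hashlib
import json
import re
from collections import deque

MAX_DEPTH = 10  # the article states links are followed for up to 10 levels


def crawl(sitemap_urls, links, page_set_patterns, max_depth=MAX_DEPTH):
    """Breadth-first crawl from sitemap URLs, returning URLs that match the page set.

    links: hypothetical {url: [linked urls]} graph standing in for real HTTP fetches.
    page_set_patterns: hypothetical regexes standing in for a Highlighter page set.
    """
    start = list(dict.fromkeys(sitemap_urls))  # drop duplicate sitemap entries, keep order
    matched = []
    seen = set(start)
    queue = deque((url, 0) for url in start)  # sitemap URLs are level 0
    while queue:
        url, depth = queue.popleft()
        if any(re.search(pattern, url) for pattern in page_set_patterns):
            matched.append(url)
        if depth >= max_depth:
            continue  # stop following links past the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return matched


def markup_digest(jsonld):
    """Stable hash of a JSON-LD object (key order does not matter)."""
    canonical = json.dumps(jsonld, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_update(jsonld, stored_digest):
    """True when no markup is stored yet, or the new markup differs from the stored version."""
    return stored_digest is None or markup_digest(jsonld) != stored_digest
```

Breadth-first order visits pages closest to the sitemap first. The `seen` set keeps link cycles from being crawled more than once.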


After the initial crawl, the Crawler checks the website's sitemap every 4 hours for URLs with a "lastmod" value. If a URL's lastmod indicates that the page has been modified, the page is recrawled and its markup is updated.
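A lastmod check like this can be sketched with the standard library. The function below reads a sitemap in the sitemaps.org format and returns the URLs whose `<lastmod>` falls after a given time. The helper names are hypothetical. Treating date-only or offset-less lastmod values as UTC is an assumption of this sketch, not documented Schema Bot behavior.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
POLL_INTERVAL = timedelta(hours=4)  # polling interval stated in the article


def parse_lastmod(value):
    """Parse a W3C-datetime lastmod value; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def modified_since(sitemap_xml, since):
    """Return <loc> values whose <lastmod> is later than `since`; URLs without lastmod are skipped."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and lastmod and parse_lastmod(lastmod) > since:
            changed.append(loc)
    return changed


def urls_to_recrawl(sitemap_xml, now):
    """Hypothetical helper: URLs modified within the last polling window."""
    return modified_since(sitemap_xml, now - POLL_INTERVAL)
```

A URL with no lastmod is never picked up by this check. Such a page is only refreshed during a full recrawl.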


A full recrawl of the entire website takes place:

  • Every 7 days by default
  • After a user clicks "Start Crawl" on the Highlighter page
    • Note: If you make modifications to any Highlighter template (highlight, tag, page set, etc.), you must click "Start Crawl" in order for those changes to take immediate effect.
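The two full-recrawl triggers above reduce to a simple scheduling rule. The sketch below is hypothetical, and the function name and parameters are illustrative only. A "Start Crawl" click forces a recrawl. Otherwise, one is due once the default 7 days have passed since the last full crawl.

```python
from datetime import datetime, timedelta, timezone

FULL_RECRAWL_INTERVAL = timedelta(days=7)  # default interval stated in the article


def full_recrawl_due(last_full_crawl, now, start_crawl_clicked=False):
    """Hypothetical scheduling check: "Start Crawl" forces a full recrawl;
    otherwise one is due once 7 days have passed since the last full crawl."""
    if start_crawl_clicked:
        return True
    return now - last_full_crawl >= FULL_RECRAWL_INTERVAL


# Example: a week after a full crawl on 1 Aug 2023, the next full recrawl is due.
last = datetime(2023, 8, 1, tzinfo=timezone.utc)
print(full_recrawl_due(last, last + timedelta(days=7)))  # True
```

This rule is why Highlighter template changes appear immediately only after a manual "Start Crawl". Without that click, they wait for the next scheduled full recrawl.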



Technical Design - Notes and Observations

  • Sites that block crawlers may need to whitelist our bot. Its user agent is:
    Mozilla/5.0 (compatible; SchemaBot/1.2; +https://www.schemaapp.com/bot/)
    It crawls from the following IP address: 52.45.62.191

  • If a page is "orphaned" (i.e., internal linking to it is broken), the Crawler will not be able to deploy markup to it. Google may also have difficulty crawling these pages, since its crawler uses sitemaps and a similar strategy.

  • If there are discrepancies between the desktop and mobile versions of your website, you can request that markup be deployed to the mobile version of the site by contacting support@schemaapp.com. Support will open a ticket with our development team to have the domain added manually.
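If your firewall or middleware supports custom rules, the whitelisting note above can be expressed as a check on both the user agent and the source IP. This is a hypothetical sketch; the function name is illustrative. Both values are copied from this article, so confirm them with Schema App support before relying on them. Requiring the IP match as well keeps other clients from getting through just by copying the user agent string.

```python
import ipaddress

# Values taken from this article; confirm with Schema App support before relying on them.
SCHEMABOT_UA_TOKEN = "SchemaBot/"
SCHEMABOT_IPS = {ipaddress.ip_address("52.45.62.191")}


def is_schemabot(user_agent, remote_addr):
    """Treat a request as Schema Bot only if both the user agent and the source IP match."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed or missing address
    return SCHEMABOT_UA_TOKEN in (user_agent or "") and ip in SCHEMABOT_IPS
```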
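To find pages that a link-following crawler would miss, you can compare a full page inventory against the pages reachable by following internal links. This sketch assumes you have such an inventory from a hypothetical source, such as a CMS export, and a link graph for your site. It is a diagnostic aid only, not a Schema App tool.

```python
from collections import deque


def find_orphans(all_pages, links, entry_points):
    """Return pages in `all_pages` that cannot be reached by following links from `entry_points`.

    all_pages: hypothetical full page inventory (e.g. a CMS export).
    links: hypothetical {url: [linked urls]} internal-link graph.
    """
    reachable = set(entry_points)
    queue = deque(entry_points)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return sorted(set(all_pages) - reachable)
```

Any URL this returns is a candidate for fixing internal links or adding to the sitemap.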

Integration Options

In order for the Crawler to deploy markup, it requires a server-side integration. This can be done through:

Note: If you use JavaScript, highlight.js runs to evaluate the page and fetch markup from our CDN. Markup is fetched from a cache of the most recent crawl rather than rendered dynamically on page load.

