Schema App Crawler Deployment Overview

Modified on Mon, 08 May 2023 at 02:27 PM

Introduction 


If your website has issues with page speed and none of the JavaScript integrations suit your needs, you may want to use Schema App's Crawler to deploy your markup. 


With the Crawler, Schema Markup is pulled from the Schema App cache and rendered server-side. As a result, overhead on the page is very low. This is a good option if the content on your site doesn't change frequently.


  



Technical Design

Schema App's Crawler (Schema Bot) starts from a website's sitemap and follows the links listed there for up to 10 levels until it finds a page that matches a Highlighter Template's page set. When it finds a match, it checks whether JSON-LD already exists for the page and whether it matches the current stored version. If the markup is missing or out of date, it sends out an update.
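The crawl described above can be pictured as a depth-limited, breadth-first walk. The sketch below is illustrative only and is not Schema App's implementation: findMatchingPages, linkGraph, and matchesPageSet are hypothetical names, the link graph is an in-memory object rather than real HTTP fetches, and it assumes the crawl stops descending once a page matches.

```javascript
// Hypothetical sketch of a depth-limited crawl; not Schema App's actual code.
const MAX_DEPTH = 10; // "up to 10 levels", as stated in the article

// sitemapUrls: URLs listed in the sitemap (treated as depth 0, an assumption)
// linkGraph: plain object mapping a URL to the URLs it links to (stand-in for fetching pages)
// matchesPageSet: predicate standing in for a Highlighter Template's page set
function findMatchingPages(sitemapUrls, linkGraph, matchesPageSet, maxDepth = MAX_DEPTH) {
  const seen = new Set(sitemapUrls);
  const matches = [];
  let frontier = [...sitemapUrls];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      if (matchesPageSet(url)) {
        matches.push(url);
        continue; // assumption: do not follow links from a matched page
      }
      for (const link of linkGraph[url] ?? []) {
        if (!seen.has(link)) {
          seen.add(link); // avoid visiting the same URL twice
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return matches;
}
```

A page that no other page links to never enters the frontier unless it is listed in the sitemap, which is why internal linking matters for this deployment method.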


After the initial crawl, the Crawler checks the website every 4 hours for URLs that have a "lastmod" value. If lastmod indicates that a URL has been modified, the page is recrawled and its markup is updated. A full recrawl of the entire website takes place:

  • Every 7 days by default
  • Whenever a Highlighter Template is updated
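As a rough illustration of the recrawl rules above, the following sketch selects URLs whose lastmod is newer than the previous crawl and decides when a full recrawl is due. It assumes lastmod is read from sitemap entries as an ISO 8601 timestamp; urlsToRecrawl, needsFullRecrawl, and the entry shape ({ loc, lastmod }) are hypothetical and not part of Schema App's API.

```javascript
// Hypothetical sketch of the recrawl decisions; not Schema App's actual code.
const FULL_RECRAWL_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days (the stated default)

// Return the URLs whose lastmod is later than the last crawl time.
// Entries without a parseable lastmod are skipped (an assumption).
function urlsToRecrawl(sitemapEntries, lastCrawledAt) {
  return sitemapEntries
    .filter((entry) => {
      const modified = Date.parse(entry.lastmod);
      return !Number.isNaN(modified) && modified > lastCrawledAt.getTime();
    })
    .map((entry) => entry.loc);
}

// A full recrawl is due after 7 days, or when a template changed since the last full crawl.
function needsFullRecrawl(lastFullCrawlAt, now, templateUpdatedAt = null) {
  const intervalElapsed = now.getTime() - lastFullCrawlAt.getTime() >= FULL_RECRAWL_INTERVAL_MS;
  const templateChanged =
    templateUpdatedAt !== null && templateUpdatedAt.getTime() > lastFullCrawlAt.getTime();
  return intervalElapsed || templateChanged;
}
```

In practice this means that keeping lastmod accurate in your sitemap is what lets changed pages be picked up between full recrawls.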


To implement, you can place the JavaScript Rendering code for your account on every templated page of your site. E.g.:

<script>window.schema_highlighter={accountId: "Cupcake", outputCache: true}</script>

<script async src="https://cdn.schemaapp.com/javascript/highlight.js"></script>

NOTE: This is not compulsory, as the Crawler deployment can run without JavaScript.



Technical Design - Notes and Observations

  • Some sites that block crawlers may need to whitelist our bot. Its user agent is:
    Mozilla/5.0 (compatible; SchemaBot/1.2; +https://www.schemaapp.com/bot/)
    It crawls from the following IP address: 52.45.62.191

  • If a page is "orphaned" (i.e. internal linking is broken), the Crawler will not be able to deploy markup to it. Google may also have difficulty crawling these pages, since its crawler uses sitemaps and a similar strategy.
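
If your site blocks crawlers in application code, a check along the following lines could be used to let the bot through. This is a minimal sketch under stated assumptions: isSchemaBotRequest is a hypothetical helper, it matches on the "SchemaBot" token from the user agent and the IP address listed above, and how you obtain the request's user agent and IP depends on your server or firewall.

```javascript
// Hypothetical allowlist check; adapt to your own server, CDN, or firewall rules.
const SCHEMA_BOT_IP = "52.45.62.191"; // IP address listed in this article
const SCHEMA_BOT_TOKEN = "SchemaBot/"; // token from the user agent listed in this article

// Returns true only when both the user agent and the source IP look like Schema Bot.
// Matching on the token rather than the full string is an assumption, in case the
// version number in the user agent changes.
function isSchemaBotRequest(userAgent, ipAddress) {
  if (typeof userAgent !== "string" || typeof ipAddress !== "string") {
    return false;
  }
  return userAgent.includes(SCHEMA_BOT_TOKEN) && ipAddress.trim() === SCHEMA_BOT_IP;
}
```

Checking the IP address as well as the user agent is the more cautious choice, because a user agent string on its own can be set by any client.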
