Export Data API - Individual

Modified on Tue, 31 May, 2022 at 9:46 AM

Schema App’s CDN (https://data.schemaapp.com) lets customers pull the markup for any page that has been marked up. For customers who may benefit from custom integrations, such as those using a pre-rendering service, the Individual Data API can be a good option for rendering schema markup server-side.


How the CDN is structured


Markup is divided by the service that produces it. Schema App stores markup separately when it is used to augment other markup or is generally deployed without a call to the cache. The table below shows how sources are divided.


| Path | Markup Source |
| --- | --- |
| [AccountId]/ | Markup produced by the Editor, Crawler, Merchant Center Integration, or BigCommerce Integration |
| [AccountId]/__highlighter_js | Markup produced by highlight.js |
| [AccountId]/__Bazaarvoice | Markup produced by the Bazaarvoice integration |


All markup is stored under the Account ID, which you can find on the Integrations page of the project you want to deploy markup from. When calling the CDN, do not include the full URL of the Account ID; include only the path after db/. The page URL is Base64 encoded, with any trailing ‘=’ padding removed.


For example, with the Account ID http://schemaapp.com/db/ExampleOrg, you would get the markup for the page https://www.example.org/about-us by sending a GET request to:


https://data.schemaapp.com/ExampleOrg/aHR0cHM6Ly93d3cuZXhhbXBsZS5vcmcvYWJvdXQtdXM


If you want to use the highlight.js script but have the markup served from the CDN, choose the JS+Webhook Hybrid integration and apply that script to your site. To retrieve the Highlighter markup for the same account and page as above, the URL would be:

https://data.schemaapp.com/ExampleOrg/__highlighter_js/aHR0cHM6Ly93d3cuZXhhbXBsZS5vcmcvYWJvdXQtdXM
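The two request URLs above can be built with a small helper. This is a minimal sketch rather than Schema App's own code: the names `CDN_BASE`, `encodePageUrl`, and `buildCdnUrl` are hypothetical, and it assumes the page URL contains only ASCII characters, since `btoa` rejects anything outside Latin-1.

```typescript
// Hypothetical helper names; only the URL layout comes from the article.
const CDN_BASE = 'https://data.schemaapp.com';

// Base64 encode the page URL and drop the trailing '=' padding.
function encodePageUrl(pageUrl: string): string {
  return btoa(pageUrl).replace(/=+$/, '');
}

// accountPath is the part of the Account ID after "db/", e.g. "ExampleOrg".
// sourcePath is an optional source folder such as "__highlighter_js".
function buildCdnUrl(accountPath: string, pageUrl: string, sourcePath = ''): string {
  const segments = [CDN_BASE, accountPath];
  if (sourcePath !== '') {
    segments.push(sourcePath);
  }
  segments.push(encodePageUrl(pageUrl));
  return segments.join('/');
}

const defaultUrl = buildCdnUrl('ExampleOrg', 'https://www.example.org/about-us');
const highlighterUrl = buildCdnUrl('ExampleOrg', 'https://www.example.org/about-us', '__highlighter_js');
console.log(defaultUrl);
console.log(highlighterUrl);
```

Passing the source folder as an optional argument keeps one code path for the Editor/Crawler markup and the Highlighter markup.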


Example JavaScript


Here is an example of how Schema App currently calls the CDN to deploy markup as part of our JavaScript deployment method. This is a TypeScript example designed to compile and run client-side in the browser.


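// Build the page URL from protocol, host, and path only, dropping any query string or fragment.
const fullUrl = [location.protocol, '//', location.host, location.pathname].join('');
// Base64 encode the URL and strip the '=' padding, as the CDN expects.
const encodedFullUrl = btoa(fullUrl).replace(/=/g, '');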

interface AccountId {
	account: string,
	subaccount?: string,
	complete: string
}

function applyQueryParams(url: string, params?: string[][]): string {
	const paramString = new URLSearchParams(params);
	return paramString.toString() !== '' ? url + '?' + paramString : url;
}

function applyHeaders(accountId: AccountId|null, additionalHeaders?: Headers): Headers {
	const baseHeaders: Headers = new Headers();
	if (accountId) {
		baseHeaders.set('x-account-id', accountId.complete);
	}
	baseHeaders.set('accept', 'application/json');

	if (additionalHeaders) {
		additionalHeaders.forEach((value: string, key: string) => baseHeaders.set(key, value));
	}

	return baseHeaders;
}

function getRequest(
	url: string,
	accountId: AccountId|null,
	params?: string[][],
	headers?: Headers,
): Promise<Response> {
	return fetch(applyQueryParams(url, params), {headers: applyHeaders(accountId, headers), mode: 'cors', cache: 'no-cache'});
}

function insertIntoPage(item: string, source: string): HTMLScriptElement {
	const script = document.createElement('script');
	script.type = 'application/ld+json';
	if (source !== '') {
		script.setAttribute('data-source', source);
	}
	script.innerText = item;
	document.head.appendChild(script);

	return script;
}

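// Placeholder values for this example: the CDN base URL and the account to fetch markup for.
const endpoints = {DATA: 'https://data.schemaapp.com/'};
const accountId: AccountId = {account: 'ExampleOrg', complete: 'ExampleOrg'};

getRequest(endpoints.DATA + accountId.complete + '/__highlighter_js/' + encodedFullUrl, accountId).then(async (response: Response) => {
	const text = await response.text();
	// Only insert markup from a successful, non-empty response.
	if (response.ok && text !== '') {
		insertIntoPage(text, response.headers.get('x-amz-meta-source') ?? '');
	}
});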

How Pages are Stored


Pages are cached at the path level only: Schema App does not store markup for URLs with search parameters or fragments. Before making a request, remove any search parameters and fragments, then encode the URL. For example, https://www.example.org/about-user?with-location=true must become https://www.example.org/about-user before Base64 encoding.


If you are using the highlight.js script, parameters and fragments are removed automatically, so deploying to pages that commonly use such parameters is not an issue. The example script above does the same by building the URL from only the protocol, host, and path.
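Outside the browser, where `location` is not available, the same clean-up can be done with the standard `URL` class. The sketch below is illustrative only; `stripQueryAndFragment` and `encodeForCdn` are hypothetical names, not part of any Schema App library.

```typescript
// Keep only scheme, host, and path; drop the query string and fragment.
function stripQueryAndFragment(rawUrl: string): string {
  const parsed = new URL(rawUrl);
  return parsed.origin + parsed.pathname;
}

// Clean the URL first, then Base64 encode it and remove the '=' padding.
function encodeForCdn(rawUrl: string): string {
  return btoa(stripQueryAndFragment(rawUrl)).replace(/=+$/, '');
}

const cleaned = stripQueryAndFragment('https://www.example.org/about-user?with-location=true');
console.log(cleaned);
console.log(encodeForCdn('https://www.example.org/about-us?utm_source=newsletter#team'));
```

Cleaning before encoding matters because the CDN key is derived from the encoded string, so a URL that still carries a query string encodes to a different key.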


CDN Headers Explained

In addition to standard headers such as Content-Type and Content-Length, Schema App adds metadata headers that you can use.


| Header Name | Purpose |
| --- | --- |
| last-modified | The last time this page’s JSON-LD was modified. |
| etag | The version of the object. |
| x-amz-server-side-encryption | Always uses AES256 encryption through AWS. |
| x-amz-meta-url | The URL of the page the markup is for. |
| cache-control | By default we cache for one week. |
| x-amz-mainaccount | The base-level project in your Schema App account, e.g. ExampleOrg using the example above. |
| x-amz-meta-source | Which Schema App service produced the markup, e.g. Editor, Crawler, HighlightJS. |
| x-amz-meta-accountid | The full account ID, e.g. ExampleOrg/SubAccount. |
| x-amz-version-id | The internal version ID within Schema App. |
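As an illustration of how these headers might be consumed, the sketch below collects a few of them from a response into a typed object. The `MarkupMetadata` interface and `readMarkupMetadata` function are hypothetical names, and the header values in the usage lines are made-up sample data, not real CDN output.

```typescript
// Hypothetical shape for the metadata a caller might care about.
interface MarkupMetadata {
  pageUrl: string | null;
  source: string | null;
  accountId: string | null;
  lastModified: Date | null;
  etag: string | null;
}

// Read the metadata headers listed in the table; missing headers become null.
function readMarkupMetadata(headers: Headers): MarkupMetadata {
  const lastModified = headers.get('last-modified');
  return {
    pageUrl: headers.get('x-amz-meta-url'),
    source: headers.get('x-amz-meta-source'),
    accountId: headers.get('x-amz-meta-accountid'),
    lastModified: lastModified ? new Date(lastModified) : null,
    etag: headers.get('etag'),
  };
}

// Sample values for illustration only.
const sampleHeaders = new Headers({
  'last-modified': 'Tue, 31 May 2022 09:46:00 GMT',
  'x-amz-meta-url': 'https://www.example.org/about-us',
  'x-amz-meta-source': 'Editor',
  'x-amz-meta-accountid': 'ExampleOrg/SubAccount',
});
const metadata = readMarkupMetadata(sampleHeaders);
console.log(metadata.source, metadata.pageUrl);
```

With a real request, the same function would be called as `readMarkupMetadata(response.headers)`. In a browser, a cross-origin response only reveals the headers the server chooses to expose, so some fields may come back as null there.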



Compression Support

Schema App’s CDN supports gzip and Brotli compression. To receive a compressed response, the request must include an Accept-Encoding header listing gzip or br (the HTTP token for Brotli). By default the response is not compressed: if the request has no Accept-Encoding header, the response is returned uncompressed. When the response is compressed, it includes a Content-Encoding header naming the algorithm used.
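For a server-side caller, the request and response sides of this negotiation could look like the sketch below. The function names are hypothetical. Browsers set Accept-Encoding themselves and ignore a value supplied by script, so this applies mainly to server-side clients such as a pre-rendering service.

```typescript
// Build request headers that ask for a compressed JSON response.
function buildCompressedRequestHeaders(encodings: string[] = ['gzip', 'br']): Headers {
  const headers = new Headers();
  headers.set('accept', 'application/json');
  headers.set('accept-encoding', encodings.join(', '));
  return headers;
}

// Report which compression the response used; 'identity' means uncompressed.
function responseEncoding(responseHeaders: Headers): string {
  return responseHeaders.get('content-encoding') ?? 'identity';
}

const requestHeaders = buildCompressedRequestHeaders();
console.log(requestHeaders.get('accept-encoding'));
console.log(responseEncoding(new Headers({'content-encoding': 'br'})));
```

Many HTTP clients decompress the body transparently, so checking the encoding is usually only needed for logging or debugging.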


