Data Discrepancies: GSC vs SPA

Modified on Wed, 15 Feb, 2023 at 3:21 PM

Google Search Console (GSC) data has known limitations. The main ones are:

To protect user privacy, GSC doesn't show all data. For example, it might not track queries that are made a very small number of times or that contain personal or sensitive information.
Some processing of Google's source data (for example, to eliminate duplicates and visits from robots) might cause these stats to differ from stats listed in other sources. However, these differences should not be significant.
Technical differences between tools. SPA uses the Search Console API, which processes data differently from the Search Console UI.

In the following examples, the “View” column lists the data splits available in the GSC UI. The “Overall” view, presented in the GSC screenshot below, differs from the other views. In both examples, the “Pages” view shows more clicks and impressions than the other breakdowns (overall, queries, countries, devices), which illustrates the extent of the data discrepancies.

The data presented in SPA more closely reflects the metrics of the “Pages” view than those of the “Overall” view.

Example # 1: Bushnell, May 2020


Google Search Console Performance Report
View	Clicks	Impressions	Average CTR	Average Position
Overall	159K	2.22M	7.1%	15.8
Queries	79879	822368	27.11%	2.87
Pages	167849	4495602	6.28%	13.17
Countries	158834	2222789	5.21%	21.9
Devices	52944	740929	8.23%	13.2
Search Appearance	78371	966534	7.41%	9

Schema Performance Analytics (SPA):

Example # 2: Bushnell, June 2020


Google Search Console Performance Report
View	Clicks	Impressions	Average CTR	Average Position
Overall	122K	1.57M	7.8%	17
Queries	66995	551814	24.74%	2.87
Pages	126985	3164315	7.01%	13.42
Countries	122489	1572727	6.20%	24.8
Devices	122489	1572727	8.93%	13.80
Search Appearance	55247	698801	6.17%	12.08

Schema Performance Analytics (SPA):

From the samples above, we can see that GSC metrics show the highest clicks and impressions in the “Pages” view, even though the UI limits the number of pages shown to 1,000. Google Search Console documents these discrepancies and explains their causes here: https://support.google.com/webmasters/answer/6155685?hl=en#groupingdata
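To put a number on the gap, the short sketch below compares the Pages row with the Countries row from the two tables above. The Countries row is used as a stand-in for the Overall view only because the Overall figures are rounded in the UI; that substitution and the names used here are assumptions made for illustration.

```python
# Figures copied from the two GSC tables above: (clicks, impressions).
# Assumption: the Countries row stands in for the rounded Overall row.
examples = {
    "May 2020": {"pages": (167849, 4495602), "countries": (158834, 2222789)},
    "June 2020": {"pages": (126985, 3164315), "countries": (122489, 1572727)},
}


def ratio(pages_value: int, other_value: int) -> float:
    """Return how many times larger the Pages figure is than the other view."""
    return pages_value / other_value


for month, views in examples.items():
    click_ratio = ratio(views["pages"][0], views["countries"][0])
    impression_ratio = ratio(views["pages"][1], views["countries"][1])
    print(f"{month}: clicks x{click_ratio:.2f}, impressions x{impression_ratio:.2f}")
```

In both months the Pages view reports roughly twice the impressions of the Countries view, while clicks differ by only a few percent.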

SPA, in contrast, uses the GSC API to collect data daily. The API can retrieve more than 1,000 rows. However, when metrics such as clicks or impressions are pulled through the API and broken down by a dimension (page, search appearance, or query), the totals are not the same as when no breakdown is applied. For example, brand query results (branded and non-branded) may not add up to the overall, unfiltered results. The overall results aggregate all queries, while brand query results aggregate only the tracked queries and omit the performance (clicks, impressions, CTR, position) of untracked queries: anonymized queries are omitted, and data is truncated because of serving limitations on Google's side. The same reasoning explains data inconsistencies for search appearances, scopes, and other measures in SPA.
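A toy example makes the effect concrete. The rows and numbers below are invented for illustration and do not come from GSC, and the helper name `sum_clicks` is hypothetical. The point is only that summing the query-level rows the API returns cannot recover clicks from queries that were anonymized and never returned.

```python
# Invented figures for illustration only (not real GSC data).
page_total_clicks = 100  # one row per URL: includes clicks from every query

# Query-level rows returned for the same URL.
# Anonymized queries are not returned, so they have no row here.
query_rows = [
    {"query": "brand shoes", "clicks": 40},
    {"query": "buy shoes online", "clicks": 35},
]


def sum_clicks(rows):
    """Add up the clicks of the rows that were actually returned."""
    return sum(row["clicks"] for row in rows)


tracked_clicks = sum_clicks(query_rows)
untracked_clicks = page_total_clicks - tracked_clicks

print(tracked_clicks)    # 75
print(untracked_clicks)  # 25, clicks that no query-level breakdown can show
```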

For this reason, SPA pulls data for the different dimensions in two steps. First, it pulls one row per URL with the total clicks, impressions, and CTR. Then a separate stream pulls the breakdown by query and search appearance. The following scheme is used both when pulling data from the API and when presenting visualizations in the SPA dashboard.

Overall Results
- All Features + All Queries
Specific Feature
- Feature + All Queries
Specific Feature Brand Query (Branded/Non-Branded)
- Feature + set of tracked queries
Overall Brand Query (Branded/Non-Branded)
- All Features + set of tracked queries
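
The four segments above can be read as combinations of two filters. The sketch below is one way to express that; the `Segment` class, its field names, and the `describe` helper are hypothetical and are not SPA's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    specific_feature: bool      # False means "All Features"
    tracked_queries_only: bool  # False means "All Queries"


SEGMENTS = [
    Segment("Overall Results", False, False),
    Segment("Specific Feature", True, False),
    Segment("Specific Feature Brand Query", True, True),
    Segment("Overall Brand Query", False, True),
]


def describe(segment: Segment) -> str:
    """Spell out which feature filter and query filter a segment combines."""
    feature = "Feature" if segment.specific_feature else "All Features"
    queries = "set of tracked queries" if segment.tracked_queries_only else "All Queries"
    return f"{feature} + {queries}"


for segment in SEGMENTS:
    print(f"{segment.name}: {describe(segment)}")
```

Any segment with `tracked_queries_only` set leaves out untracked and anonymized queries, which is why its totals can fall below those of the matching all-queries segment.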

Dashboard users don’t need to memorize or apply this scheme when extracting and filtering data. It is provided only to explain the gaps between different data segments.

Why do I see gaps in reporting for “No rich results”?

You may come across graphs in reporting that appear to show missing data. This is expected behaviour for tracking “No Rich Results”. Google doesn't report on this category, so Schema App computes these metrics with the formula:

All clicks - Sum of Search Appearance Clicks = No Rich Result Clicks

A graph showing data gaps for the "No Rich Results" data.

Ideally, the "No Rich Results" value should be a positive number. However, Google sometimes reports a sum of search appearance clicks that is greater than the overall clicks. The formula then produces a negative value, which shows up as missing data in the visualization.
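
The calculation and its failure case can be sketched as follows. The function name, the sample numbers, and the choice to return `None` for a negative result are assumptions made for illustration; this article only states that negative values appear as missing data.

```python
from typing import Iterable, Optional


def no_rich_result_clicks(all_clicks: int, appearance_clicks: Iterable[int]) -> Optional[int]:
    """All clicks minus the sum of search appearance clicks.

    Returns None when the result is negative, mirroring a gap in the graph.
    """
    result = all_clicks - sum(appearance_clicks)
    return result if result >= 0 else None


print(no_rich_result_clicks(1000, [300, 250]))  # 450
print(no_rich_result_clicks(1000, [700, 400]))  # None, shown as a gap
```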