User Count for GA4 and Bigquery have huge different
Asked Answered
H

2

1

I have export the GA4 data to Bigquery, however when I compare the event count and user count between GA4 data and Bigquery data, there have a huge different on user count between GA4 and Bigquery. However for other event such as session start or custom event will not have similar issues. Anyone have idea about why?

User Count and Event Count from GA4

1

User Count and Event Count from Bigquery][

2

The code I use for BQ is as below:

SELECT event_name, COUNT(*) as event_count, COUNT(DISTINCT(user_pseudo_id)) AS user_id, 
FROM
    mytable
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN '20230201' AND '20230228'
GROUP BY 1

Also I checked, don't have any NULL value in user_pseudo_id

Harlene answered 9/3, 2023 at 15:29 Comment(0)
H
0

If events add up then the good news is its not a data capture issue or a date time issue.

Do you have a privacy policy on your website?, if so it is probably users opting out. Did you check for null user_pseudo_id looking at the data table or was it by count(distinct)? The count(distinct) excludes null values.

Hixon answered 9/3, 2023 at 16:1 Comment(3)
I did check the null for user_pseudo_id, seem like don't have any of null. But if it is because of users opting out issues, why only page_view will have huge different in user? Other event seem less than 5% different only.Harlene
GA4 can be set up for cookie less tracking, this means that it will record the event of a page view but there will not be a ga_session_id or user_pseudo_id a attached to it. I know you have checked but if you try adding user_pseudo_id is not null to your WHERE statement and see if the number of views change, that will confirm if it is a missing ID challenge.Hixon
I use the code below to check again, the user count is the same. SELECT event_name, COUNT(*) as event_count, COUNT(DISTINCT(user_pseudo_id)) AS user_id FROM aigensstoreapp-analytics.analytics_271504477.events_* WHERE event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20230201' AND '20230228' AND (user_pseudo_id IS NOT NULL OR user_pseudo_id != '') GROUP BY 1Harlene
A
0

One of the reasons why the report matrix is inconsistent with bigquery is because of GA4 Data thresholds

If a report or exploration includes demographic information and the reporting identity relies on the device ID, the row containing that data may be withheld if there aren't enough total users.

Angelika answered 10/3, 2023 at 15:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.