If you have started migrating to Google Analytics 4 (GA4) from Universal Analytics, you might have hit some bumps in the road when it comes to using Regular Expressions (RegEx). I know I have, and I thought I’d lost the plot. Turns out, it wasn’t just me.
RegEx provides a great way of filtering data in reports. But what the heck is it?
A short introduction to RegEx
RegEx is based around specific syntax that uses special characters and rules.
There are too many to list and provide explanations for, so here is a nice rundown of the ones most commonly used.
Although it takes a while to learn all of the special characters involved, and what they do, once you get the hang of it, isolating just the data you want in reports, filters, or segments in Universal Analytics is a breeze.
But then GA4 just had to come along and ruin everything, didn’t it?
So, what’s the deal with RegEx in GA4?
For starters, RegEx is not as widely available in GA4. Why? I don’t know. In addition, the way in which RegEx works in GA4 is slightly different.
In Universal Analytics, you could enter partial matches for the data you want to include/exclude using RegEx. In GA4, the pattern must EXACTLY match the whole value, or you need to use more advanced RegEx to achieve the same results.
Here’s a good example of the difference between partial and exact match RegEx from Google:
By default, regular expressions in Universal Analytics properties are treated as a “partial match.” The expression will be true if the pattern you provide is contained anywhere in the data.
For example, if you provide the pattern “India” the regex matches “India”, “Indian”, “Indiana”, “Indianapolis”, and so on. You don’t need to use metacharacters to achieve this partial match.
In a Google Analytics 4 property, the default regex is a “full match.” The data must exactly match the pattern you provide. For example, the pattern “India” only matches “India.” To make this regex act like a partial match, you must use metacharacters: “India.*” will return any value that begins with “India” and ends with anything (or nothing) else.
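To see the difference in action, here is a small Python sketch of Google's "India" example. It is only an illustration: Python's re module stands in for the RegEx engine Google uses, re.search plays the part of UA's partial match, and re.fullmatch plays the part of GA4's full match. The value "South India" is an extra sample value (not in Google's example) added to show that "India.*" only catches values that start with "India".

```python
import re

# Sample dimension values; "South India" is an added illustration
values = ["India", "Indian", "Indiana", "Indianapolis", "South India"]

def partial_match(pattern, value):
    # UA-style: true if the pattern is found anywhere in the value
    return re.search(pattern, value) is not None

def full_match(pattern, value):
    # GA4-style: true only if the pattern covers the entire value
    return re.fullmatch(pattern, value) is not None

ua_hits = [v for v in values if partial_match("India", v)]
ga4_hits = [v for v in values if full_match("India", v)]
ga4_prefix_hits = [v for v in values if full_match("India.*", v)]
ga4_contains_hits = [v for v in values if full_match(".*India.*", v)]

print("UA, India:       ", ua_hits)
print("GA4, India:      ", ga4_hits)
print("GA4, India.*:    ", ga4_prefix_hits)
print("GA4, .*India.*:  ", ga4_contains_hits)
```

In this sketch, only ".*India.*" under full match behaves like the plain "India" pattern did under partial match.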
Oh, and by the way, GA4’s standard reports do not support segments (they are now confined to explorations), and the options for filters are also limited – both of which I use extensively to filter data with RegEx. So have fun with that.
With all that in mind, here’s a quick guide to help you get started with RegEx in GA4. Hopefully, I’ll be able to add to this as more RegEx functionality becomes available.
Excluding internal traffic
The options available for filters sure are limited in GA4, but thankfully one place RegEx does work is the filter for excluding internal traffic.
This is nice and easy to set up.
From Settings, navigate to your Data Stream and select the web data stream. Next, click on Configure Tag Settings.
Here, you won’t see the thing you’re looking for (because why would they make it easy) until you click on the (completely pointless) ‘Show All’ drop-down. Only then will you finally see ‘Define internal traffic’.
Click the Create button, then select ‘IP address matches regular expression’ from the drop-down under ‘Match type’. Then add your IP addresses in the Value field.
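If you have several IP addresses, it can help to build and sanity-check the pattern before pasting it into the Value field. The Python sketch below is one way to do that. The IP addresses are made up (they come from ranges reserved for documentation), and the check assumes the field uses full-match behaviour like the rest of GA4, which is an assumption rather than something confirmed here.

```python
import re

# Hypothetical office IPs from documentation-only ranges; swap in your own
internal_ips = ["192.0.2.10", "192.0.2.11", "203.0.113.5"]

# Escape each dot so it matches a literal "." and join the IPs with "|"
ip_pattern = "|".join(re.escape(ip) for ip in internal_ips)
print(ip_pattern)  # 192\.0\.2\.10|192\.0\.2\.11|203\.0\.113\.5

def is_internal(ip):
    # Assumes full-match semantics: the whole IP must match one alternative
    return re.fullmatch(ip_pattern, ip) is not None

print(is_internal("192.0.2.10"))   # an address from the list
print(is_internal("192.0.2.100"))  # similar-looking address, not in the list
```

Escaping the dots matters: an unescaped "." matches any character, so the pattern could match addresses you never meant to exclude.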
RegEx in events
With less than 3 months to go before Universal Analytics is retired, I was surprised to learn today that RegEx is only now being made available for modifying and creating events.
Again, it’s pretty basic, but at least it has the option to ‘ignore case’ if for whatever reason your event naming conventions aren’t consistent.
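As a rough picture of what ‘ignore case’ buys you, here is a Python sketch using re.IGNORECASE. The event names are hypothetical, and it assumes event conditions use the same full-match behaviour described earlier.

```python
import re

# Hypothetical event names with inconsistent casing
event_names = ["form_submit", "Form_Submit", "FORM_SUBMIT", "form_submit_error"]
pattern = "form_submit"

# Case-sensitive full match: only the exact lowercase name qualifies
case_sensitive = [n for n in event_names if re.fullmatch(pattern, n)]

# With the ignore-case flag: all casing variants qualify
ignore_case = [n for n in event_names if re.fullmatch(pattern, n, re.IGNORECASE)]

print(case_sensitive)
print(ignore_case)
```

Note that "form_submit_error" is left out either way, because the pattern has to cover the whole event name.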
Listing unwanted referrals
From Settings, navigate to your Data Stream and select the web data stream. Next, click on Configure Tag Settings. Click on the same pointless ‘Show All’ drop-down and then select ‘List unwanted referrals’.
Now you can add multiple domains you want to exclude traffic from. An excellent explainer video that takes you through the process step by step can be viewed here.
Exploration reports
If you’ve not checked out the reports in GA4, I’d suggest you start getting familiar with exploration reports ASAP. GA4 lacks a lot of the standardised reports available in Universal Analytics, so you’ll need to rely heavily on exploration reports to find a lot of the data you need.
As stated before, RegEx with GA4 is based on EXACT matches. So whereas previously, you could build a RegEx with partial matches, that’s no longer the case.
Here’s an example of filtering out data in an ‘All pages’ report in UA with simple, partial RegEx just using the Advanced search tool.
This was a nice easy way of removing traffic to URLs containing /blog/ which are not actual posts (i.e. authors, pagination, tags, and categories) from the standard ‘All pages’ report. You could build the same thing as a Segment just as easily, too.
There is a ‘Pages and screens’ report as standard in GA4. But the customisation options are limited.
Even when adding ‘Page path and screen class’ as a secondary dimension, you have to then exclude the pages you don’t want as Conditions. Not only is it a pain in the arse to scroll through to find the pages you want to exclude, you can only build up to 5 conditions.
And if I wanted to exclude all of those above (as I did with the UA report), there appears to be no way of doing this as I can’t add another Condition to exclude a different ‘Page path and screen class’ dimension.
So that leaves us with building an exploration report, which is a bit more complex.
To build the same report in GA4 as the one used in the example earlier (first isolating blog traffic, excluding the root, then removing authors, pagination, tags, and categories), click on ‘Explorations’ from the dashboard and then select the Blank template.
Next, add the Dimensions and Metrics you want to include in your report (which is a table report by default).
Then, I added:
Filter: Contains
/blog/
And good news, I can now see just the blog URLs.
You’d think that now you could use the same RegEx we did in UA to exclude the author, page, tag, and category URLs we don’t want. But you’d be wrong.
There is a ‘does not match regex’ option in the Filter drop-down, but because of the exact-match behaviour it doesn’t work the way the UA exclude did with a simple partial pattern. There is also no Exclude function on the Filter.
So we can’t just use EXCLUDE and input author|page|tag|category as we did before in UA. Hmm.
When it comes to using RegEx to exclude or filter out data, I have not come across a solution.
However, one exclusion method is to use the ‘does not contain’ filter option to add exclusions individually. Not a huge problem when it’s a handful, but still a pain when you have LOADS to add.
The other option is to right-click on each row of data you want to be removed from the report, and then select ‘Exclude selection’. It will then add this to the filters.
And yes, you do have to do that for every page you want to exclude. And no, there isn’t a search function to find those pages easily. And yes, it does refresh the report and take you back to the top of the page each time.
*** I would LOVE to find out if anyone has worked out an easy way to perform RegEx exclusions in GA4 – if you are that person please let me know! ***
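One idea that may be worth trying is to pair the ‘does not match regex’ option with a pattern wrapped in .* on both sides, so that it covers the whole page path. The Python sketch below only simulates that logic under full-match rules. It has not been verified inside GA4 itself, the page paths are made up, and does_not_match_regex is a hypothetical helper standing in for the filter option.

```python
import re

# Made-up page paths for illustration
paths = [
    "/blog/",
    "/blog/my-first-post/",
    "/blog/author/jane/",
    "/blog/page/2/",
    "/blog/tag/analytics/",
    "/blog/category/news/",
]

def does_not_match_regex(pattern, value):
    # Hypothetical stand-in for the filter: keep rows the pattern
    # does NOT fully match
    return re.fullmatch(pattern, value) is None

bare = "author|page|tag|category"                # the old UA-style pattern
wrapped = r".*/(author|page|tag|category)/.*"    # covers the whole path

kept_bare = [p for p in paths if does_not_match_regex(bare, p)]
kept_wrapped = [p for p in paths if does_not_match_regex(wrapped, p)]

print(kept_bare)     # nothing is excluded: no path IS exactly "author", etc.
print(kept_wrapped)  # only the blog root and the real post remain
```

In this simulation the bare pattern excludes nothing, because no full page path is exactly "author" or "tag", while the wrapped pattern removes the author, pagination, tag, and category URLs in a single filter.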
Including data using RegEx is a bit less complicated. But remember, you have to EXACTLY match the RegEx, or it won’t work.
For example, to include only the tag, category, author, and pagination URLs, you can use this RegEx:
Filter: Matches regex
^(.*\/(tag|category|author|page)\/.*)$
If you just use tag|category|author|page it won’t work, because that pattern doesn’t match the whole page path. Alternatively, you can use the ‘contains’ filter option and add each one individually.
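You can check patterns like these outside GA4 before building the filter. Here is a short Python sketch that runs both the full pattern and the bare alternation against some made-up page paths, using re.fullmatch as a stand-in for GA4's exact-match behaviour.

```python
import re

include_pattern = r"^(.*\/(tag|category|author|page)\/.*)$"
bare_pattern = "tag|category|author|page"

# Made-up page paths for illustration
paths = [
    "/blog/tag/analytics/",
    "/blog/category/news/",
    "/blog/author/jane/",
    "/blog/page/3/",
    "/blog/tag-manager-tips/",
    "/blog/my-first-post/",
]

# Keep only the paths that the pattern matches from start to finish
included = [p for p in paths if re.fullmatch(include_pattern, p)]
included_bare = [p for p in paths if re.fullmatch(bare_pattern, p)]

print(included)
print(included_bare)
```

The slashes around the group also stop a post slug such as "/blog/tag-manager-tips/" from being caught just because it contains the word "tag".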
Before I go, I’d like to give an extra special shout-out to Analytics Mania, who are producing some awesome GA4 content. They’ve really helped to bring me up to speed as it’s definitely been a steep learning curve. I’d highly recommend you check out their eBook here.