Anamap Blog

Don't Make These Common Analytics Mistakes

Business Insight

6/5/2024

Alex Schlee

Founder

It’s only natural to fall into the occasional pitfall or two. Hopefully, this list helps your company avoid some of the common ones related to analytics.

1. Thinking that switching to a new analytics platform can solve all your organization’s problems.

This seems to be the single most common mistake; I've seen nearly every business make it at some point. I'm sure you've experienced this a time or two, or maybe you've even been guilty of perpetuating it yourself. It typically takes the form of a business decision maker saying they aren't getting enough out of their current analytics system, whether that's Google Analytics, Adobe Analytics, Mixpanel, or something else. The line of thinking goes something like this: "if we only switch to this new vendor, it'll solve all our issues with data." While I'll admit there is the odd occasion where this is true, it isn't true for 95% of companies out there.

Inevitably, companies switch to a new vendor, bear the burden of the migration and all the issues that come with it, and still end up unhappy with the outcome. The problem is that people often forget the tool is just one part of the equation. Having the most powerful tool on the market means almost nothing if no one at your company knows how to use it. Even with a suboptimal tool, having the right people with the right knowledge can produce great insights. Human capital is far and away the most important factor in your organization getting the most out of its data. Invest in your existing people with the right training, or find and hire the right outside people; it will make a huge difference.

2. Trying to track everything.

This mistake is most commonly made when the analysts in your organization control the analytics implementation or data collection strategy. Having sat in the hot seat myself in front of a table of bristling executives, I can tell you it doesn't feel great when one of them asks about some element of the user experience where there is no tracking. Instead of being able to say "yes, let me look that up and get back to you," you have to concede that those interactions aren't tracked, which means there's no historical data to use for trends, and the turnaround time for insights is much longer because of the implementation lag. Most analysts will try to avoid this situation at all costs by tracking everything "just in case."

Having too many events with too many different attributes makes it harder to figure out which event you should really be looking at, and it makes maintaining documentation considerably more difficult. If your company hopes to democratize its data and turn business stakeholders into data-driven decision-makers, you're going to struggle if your data structure is too complicated.

The single most important thing to do when considering a new event is to ask, "What meaningful business decision could be made if I had this data?" If you, or the person requesting the data, can't answer this question, your company probably doesn't need that data. Don't be afraid to ask executives this question too. I've seen many flight-of-fancy requests made just because an executive was curious, not because they expected to make a decision with the information.

3. Using too many events rather than count attributes.

Most analytics vendors charge by event volume; the more volume you want, the more your annual contract is going to cost (assuming you have one). In many cases, going over your contracted volume and into overage territory will cost your company dearly, and those overages can be a very expensive lesson in being conservative with your event collection. Sometimes you need the full event with all its context and attributes, but in many cases you just need a count of how many times an event happened, without all the additional attribute information.

Instead of firing an event every single time a user interaction happens, you can keep a running count and send it as the user is leaving the page. There are several ways to do this using sendBeacon and watching for page visibility events, as well as the document unload event. Because browsers are maintained by separate groups of developers and different users have different connection quality, there will be times when this counts event does not get sent. The counts are still directionally accurate, which lets your business track the metric over time and understand whether it is being positively or negatively impacted by the changes you’re making.
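
Here's a rough sketch of what that can look like in plain JavaScript, assuming a generic collection endpoint at /collect (the endpoint, event name, and attribute names are placeholders for illustration, not any particular vendor's API):

    // Keep running counts of interactions instead of firing an event per interaction.
    const interactionCounts = {};

    function recordInteraction(name) {
      interactionCounts[name] = (interactionCounts[name] || 0) + 1;
    }

    // Example: count clicks on "like" buttons without sending an event for each one.
    document.querySelectorAll('[data-like-button]').forEach((btn) => {
      btn.addEventListener('click', () => recordInteraction('like_click'));
    });

    // Flush the counts once as the user is leaving the page. sendBeacon queues
    // the request even while the page is unloading, but delivery isn't
    // guaranteed, so treat the resulting counts as directional.
    function flushCounts() {
      if (Object.keys(interactionCounts).length === 0) return;
      navigator.sendBeacon('/collect', JSON.stringify({
        event: 'interaction_counts',
        counts: interactionCounts,
      }));
      // Reset so a later visibilitychange or pagehide doesn't double-count.
      Object.keys(interactionCounts).forEach((key) => delete interactionCounts[key]);
    }

    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') flushCounts();
    });
    window.addEventListener('pagehide', flushCounts);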

4. Not using user-level vs event-level attributes properly.

A common mistake is overloading event-level attributes with user-level information, and it’s one that can significantly hamper your data management and analysis efforts. Imagine sifting through mountains of data only to find the same user information repeated across every single event. Not only does this create redundant data, it also makes managing and analyzing that data more cumbersome and less efficient. User-level attributes, like age, location, and subscription type, are meant to provide a snapshot of the user’s profile. These details should describe the user as a whole and not be repeated in every event they trigger. Session-level attributes, like marketing campaign tracking, can also be set at the user level so that all events within that session are attributed correctly to that campaign.
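
As a rough illustration of the split, here's how it might look with a generic SDK (the analytics object and its identify/track methods are hypothetical stand-ins for whatever your vendor actually provides):

    // User-level attributes: set once, or whenever they actually change.
    analytics.identify('user_123', {
      age_bracket: '25-34',
      country: 'US',
      subscription_type: 'pro',
      campaign: 'spring_sale', // session-level attribution set at the user level
    });

    // Event-level attributes: only what describes this specific interaction.
    analytics.track('video_played', {
      video_id: 'abc-42',
      duration_seconds: 184,
    });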

Regular updates to user-level attributes, and timing those updates correctly, are crucial for maintaining data accuracy and relevance. In the case of campaigns, you typically want to set them at the very beginning of a new session (or whenever the user clicks through from another campaign within the same session), but you also want to make sure they "expire" at the start of the next session, which means setting them back to an empty state. The timing of user-level data updates is also important to consider. Take the case of updating a user's subscription type before versus after the subscription update event. If you update the user-level attribute before the event, you may not be able to tell as easily what type of subscription the user was upgrading from. Conversely, if the user-level attribute is what you'll use to figure out which subscription type they upgraded to, it makes sense to update the attribute before the event.
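
Continuing with the same hypothetical identify/track calls, the timing rules might look something like this:

    // At the start of a new session, expire any leftover campaign attribution
    // before (possibly) setting a new one from the landing URL.
    function onSessionStart(campaignFromUrl) {
      analytics.identify('user_123', { campaign: campaignFromUrl || '' });
    }

    // To preserve what plan the user upgraded FROM, fire the event while the
    // user-level attribute still holds the old plan, then update it.
    function onSubscriptionUpgrade(newPlan) {
      analytics.track('subscription_upgraded', { new_plan: newPlan });
      analytics.identify('user_123', { subscription_type: newPlan });
    }
    // If the user-level attribute is what you'll use to see the plan they
    // upgraded TO, reverse the order: identify first, then track.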

By avoiding the mistake of overloading event-level attributes with user-level information, and by ensuring regular updates to user-level attributes, you can make your data more manageable and your insights more accurate. This practice not only streamlines your data management but also enhances the precision and relevance of your analytics, ultimately leading to more informed and effective business decisions.

5. Assuming your data is going to be perfectly clean.

The Internet is basically the Wild West, and your data is lucky to make it all the way to the server in one piece. There are so many different causes of weird data that it's hard to enumerate all of them, but I would coarsely bucket them into these categories:

  • Human Data Entry Errors

    Anywhere a human is involved, there is the potential for error. This could be products categorized incorrectly in your product database, typos in your CMS, or incorrect strings in your analytics implementation itself. Benign or not, these issues crop up in every company's data set I’ve ever seen. Missing data from human error is easy to spot, but miscategorized data can be a real pain to sniff out.
  • Bot Traffic

    Ever wonder why some users on your site have 300 page views? Usually, it’s because a bot is generating way more page views than a real user would. “Don’t analytics platforms block bot activity?” Some commonly known bots, like the Google crawler, do get blocked by many analytics platforms; however, bots created by individuals specifically to scrape data from your site fly under the radar, since they only show up on your site. There are bot detection tools that can help, but they’re locked in an arms race with bot creators who don’t want their bots to be discovered, so they aren’t consistently good at catching them.
  • Connectivity Issues

    People tend to take a connection to the internet for granted, but at any given time there are many users, your customers, experiencing some kind of internet connection issue. Whether there’s a storm making their home internet intermittent, they’re on a mobile device switching between cell towers, or something else entirely, there is a good chance a percentage of your users will experience connection issues while on your site or app. We’ve all been there: you go to click a link (possibly generating a Click event), but your internet cuts out before the next page loads. Many people would look at your event stream and wonder how it’s possible that a user has a click event with no page view after it. The next time you see that in your company’s data, you’ll know.
  • Browser Differences

    This used to be a much bigger issue in the old days when browsers were more dissimilar, but the problems can still be observed in the dark corners of your datasets. JavaScript execution and the APIs browsers expose for data collection differ in subtle ways, and those differences can result in irregular data collection.
  • Caching (Stored Version of Pages)

    Are you still seeing bad data from something you patched 3 weeks ago? Chances are one of your users is viewing a cached (stored) version of your page from before the fix was released. This can impact both apps and websites, since users can take forever to upgrade old apps, and there are ways for users to effectively save a copy of your website and keep using it after the live version has changed. These caching issues mean that bad or old data can persist for a surprisingly long time after updates have been made.

The real takeaway here is not to worry about your data being perfectly clean. As I've outlined, it is nearly impossible to eliminate every source of random data errors. Accept that a (hopefully) small amount of your data is going to be weird, and do your best to draw directional conclusions from the majority of it.

ABOUT THE AUTHOR

Alex Schlee

Founder

Alex Schlee is the founder of Anamap and has experience spanning the full gamut of analytics from implementation engineering to warehousing and insight generation. He's a great person to connect with about anything related to analytics or technology.