May 15th, 2019

Data pipeline for a data driven company

We at Innovalabs Technologies love working with data. We focus on the stack a product needs in order to grow by understanding its users' behavior, and that work is made possible by the skilled people in our data engineering and analytics team.

Which data pipeline stack should a data-driven company use?

Which tool comes to mind when you think of analytics? Google Analytics, isn't it?

Well, Google Analytics is a great fit for a small business with little data. But if you want a strong user base and a successful product, you need to run more complex analyses on the data you gather. The more data you capture, the more options you have for what to do with it.

Where to start?

First things first, you need an event analytics tool. Some of my favourites are:

  • Matomo
  • Google Analytics
  • Snowplow
  • Hubilo

The choice is yours; there are plenty of other tools available as well.

Let’s talk about Matomo. It’s an open-source platform that also provides a free mobile app.

Some advanced features:

  • Heatmap analytics
  • Form analytics
  • Session recording
  • User flow
  • Search engine keyword performance

It provides two pipelines for event tracking: batch and real-time.

You can implement either one or both, depending on your requirements. I suggest implementing both to get the bigger picture. Heads-up: Matomo provides a decent-looking built-in visualization, but for custom visualizations we can opt for a third-party visualization tool. We will talk about data visualization tools later in the article.
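As a rough illustration of the difference between the two modes, here is a minimal sketch assuming events are sent through Matomo's HTTP Tracking API. A real-time setup sends each event as its own request, while a batch setup buffers events and sends them together in the API's bulk-tracking format (a JSON body with a `requests` list of query strings). This is one way to picture batch versus real-time sending, not necessarily how Matomo's own pipelines are implemented. The host `analytics.example.com`, the site ID and the helper names are hypothetical, some setups also require a `token_auth` value (omitted here), and the code only builds the requests without sending them.

```python
import json
from urllib.parse import urlencode

# Hypothetical tracker endpoint; replace with your own Matomo host.
MATOMO_URL = "https://analytics.example.com/matomo.php"


def event_params(site_id, category, action, name=None, page_url=None):
    """Tracking API parameters for one custom event (hypothetical helper)."""
    params = {
        "idsite": site_id,  # which Matomo site the event belongs to
        "rec": 1,           # required: tells Matomo to record the request
        "apiv": 1,          # tracking API version
        "e_c": category,    # event category
        "e_a": action,      # event action
    }
    if name is not None:
        params["e_n"] = name      # optional event name
    if page_url is not None:
        params["url"] = page_url  # page the event happened on
    return params


def realtime_request_url(params):
    """Real-time style: one GET request per event."""
    return MATOMO_URL + "?" + urlencode(params)


def bulk_request_body(events):
    """Batch style: many buffered events in one POST body (bulk tracking)."""
    return json.dumps({"requests": ["?" + urlencode(p) for p in events]})


if __name__ == "__main__":
    click = event_params(1, "signup", "click", name="hero-button")
    print(realtime_request_url(click))
    print(bulk_request_body([click, event_params(1, "video", "play")]))
```

The batch variant trades freshness for fewer HTTP round trips, which is the same trade-off the two pipelines above represent.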

Data pipeline

Data visualization tools

Snowplow does not provide an out-of-the-box data visualisation interface like Google Analytics or CleverTap. To overcome this, we started using third-party tools, namely Kibana, Redash, and Superset.

We use Kibana for the data stored in Elasticsearch. It helps us visualise user activity in real time. Its dashboard feature lets us build our own dashboards and track many metrics on a single page.
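Kibana generates its own Elasticsearch queries, but it can help to know roughly what a real-time panel asks for. The sketch below builds a query body for a "last 15 minutes" view: a `range` filter using Elasticsearch date math, a `date_histogram` of event counts per minute, and an approximate distinct-user count via the `cardinality` aggregation. The field names `@timestamp` and `user_id` are assumptions about how events are indexed (`user_id` would need to be a keyword field), and `fixed_interval` requires Elasticsearch 7.2 or later; older versions use `interval`.

```python
import json


def recent_events_query(minutes=15, interval="1m",
                        time_field="@timestamp", user_field="user_id"):
    """Build an Elasticsearch search body for a 'last N minutes' view.

    time_field and user_field are hypothetical field names; use whatever
    your event documents actually contain.
    """
    return {
        "size": 0,  # only aggregations are needed, not the raw hits
        "query": {
            # Date math: keep only events from the last `minutes` minutes
            "range": {time_field: {"gte": f"now-{minutes}m"}}
        },
        "aggs": {
            # Event counts bucketed per `interval`, like a line chart in Kibana
            "events_over_time": {
                "date_histogram": {"field": time_field, "fixed_interval": interval}
            },
            # Approximate number of distinct users in the window
            "unique_users": {"cardinality": {"field": user_field}},
        },
    }


if __name__ == "__main__":
    # This JSON is what you would send as the body of a _search request.
    print(json.dumps(recent_events_query(), indent=2))
```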

We use Redash and Superset with the data stored in Redshift, which can be fetched using simple SQL queries. Superset goes one step further with a great interface for exploring the data without writing SQL. Both offer strong data visualisation features, and both can be used with other data sources such as PostgreSQL, Elasticsearch, BigQuery, MongoDB, MySQL, etc.
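To give a feel for the "simple SQL queries" behind a Redash or Superset chart, here is a minimal daily-active-users query. It assumes Snowplow-style event rows with the canonical `collector_tstamp` and `domain_userid` columns (in a Snowplow Redshift setup these typically live in `atomic.events`). Since Redshift isn't available locally, the sketch uses an in-memory SQLite table named `events` with made-up rows as a stand-in; on Redshift the date expression would likely be written as `collector_tstamp::date` or `DATE_TRUNC('day', collector_tstamp)` instead of SQLite's `DATE()`.

```python
import sqlite3

# Daily active users: distinct user IDs per calendar day.
DAU_SQL = """
SELECT DATE(collector_tstamp) AS day,
       COUNT(DISTINCT domain_userid) AS daily_active_users
FROM events
GROUP BY day
ORDER BY day
"""


def daily_active_users(conn):
    """Run the DAU query and return (day, count) tuples."""
    return conn.execute(DAU_SQL).fetchall()


# In-memory stand-in for the warehouse table, with illustrative rows only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (collector_tstamp TEXT, domain_userid TEXT, event TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2019-05-14 09:00:00", "u1", "page_view"),
        ("2019-05-14 10:30:00", "u1", "page_view"),  # same user, counted once
        ("2019-05-14 11:00:00", "u2", "struct"),
        ("2019-05-15 08:15:00", "u3", "page_view"),
    ],
)

print(daily_active_users(conn))  # [('2019-05-14', 2), ('2019-05-15', 1)]
```

In Redash you would paste just the SQL into a query editor and attach a chart to the result; Superset can produce the same aggregation from its explore interface without writing the query by hand.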

What's next?

This was an introduction to a pipeline for tracking events in a data-driven company. With the above architecture, one can easily track millions of events a day, creating plenty of data for your company to work with. But the more important question is: what do we do with this data? Stay tuned!

