Get Data Smart

Your data science 101 guide

Businesses across the country (and the globe) are fast becoming more proficient with capturing, storing and using data to enhance their decision making—and you can too.

In this guide, we'll teach you the fundamentals of data and how data science works, the best practice data gathering techniques to use, and how you can start using data to:

  • Identify your most profitable customers.
  • Predict and prevent customer churn.
  • Power up your cross selling capabilities.
  • Provide critical insight to your marketing and sales efforts.

The world of data can be daunting, but with the right expertise to guide you it doesn't have to be out of reach.

Short on time? Download a copy of this guide to read offline!

New call-to-action

The difference between data and information

Learn the difference between data and information in business, and discover what your enterprise needs to leverage your market research to its fullest.

Data versus information

Put simply, data is a number, picture, statement, etc that is unprocessed.

It might be a name, the content of an email, an address, a sales figure; anything of that nature. If you printed out a bundle of receipts from your latest transactions and observed the total value, you would have a set of data.

Every business generates data, whether it’s through the cash register, an email list, a loyalty card or even just casual conversations with customers.

It isn’t just numbers in a system. You can get both structured data (numbers), and unstructured data (images, audio, writing, conversations). It is quite similar to the difference between quantitative and qualitative data.

Data in its raw form, as you can imagine, isn’t particularly useful. Knowing that some customer somewhere at some point paid $10 for something only tells you one thing: how much they paid. Once you begin gathering other data points, such as who, when, where, why and for what, and expand that out to your whole customer base, you start getting something more helpful for your strategy.

You start getting information.


Read more: Businesses that embrace data science are set to come out ahead



What about information?

Information is processed data. It has meaning and context. It isn’t just a single data point, it’s a series of data points that have been melded into something that informs or allows you to act.

Take that $10 purchase from earlier. Once you know the specifics of that transaction, and other similar transactions, you might start noticing that one particular product is performing well at a specific time of day, or is being bought alongside other items. Each and every single one of those numbers, sales figures, timings, customer numbers; those are data points.

Once you group them, collect them, organise them, analyse them and manage them, you get information. From information, you can see trends or connections or unfulfilled niches. From information, you can start strategising. From information, you get business intelligence.

Computers need data. Humans need information. The question you need to ask of your business is whether it has the capability to both gather data and changing data into information both efficiently and accurately. It’s the difference between conducting research and generating actionable insights.


What does your business need from data and information?

From data, information. From information, business intelligence. Every business generates data, large and small, and every business can take advantage of data—it is how you process data that makes the difference.

However, all too often, businesses are working from broken, inaccurate or unimportant data. The information you need will change entirely on your growth stage and what industry you are in, and as a result, the data that you focus on will be unique as well.

  • Do you need to find a new demographic to appeal to?
  • Do you need to find a way to cut costs? 
  • Is it time to expand your physical presence? Perhaps your digital?

All of these questions affect the information that you need, and thus the data you need.

Moreover, the data you gather has to be both reliable and valid. Market research, i.e. data gathering, doesn’t stem from a single, short survey from a select group of customers. It is an in-depth, complex process, in which analysts separate the bad data from the good, the important from the unimportant; the wheat from the chaff, in other words.

Only by working with ‘good’ data can you hope to create ‘good’ information—the cornerstone of any solid business strategy.

Data builds information, and information builds strategic success. Without the first, you can’t have the second or the third. A good business is built upon great market research, which can gather and analyse all of the data that your company is currently gathering, separating the useful from the useless.

Get good information. Start with great data.


Read more: Data analytics provides certainty in a time of uncertainty



Behind the scenes in data science

While data science is something of a buzz word, businesses and brands often aren’t aware of what data scientists actually do—or the insights they can offer. We’re here to rectify that.


A data science project lifecycle

From predicting which of your customers are most likely to leave to identifying your most valuable customer segments, all data science projects follow the same lifecycle.

1. Define objective(s)

Define the main and sub objectives of the analysis. This will steer the project going forward. Any context/business knowledge around the objectives is also collected.

2. Obtain data

Acquiring the relevant/required data is crucial. Typically, this is where a data science team will talk to you to understand and obtain the correct data for the project.


At this stage, the data cleaned and explored for general insights. This step gives data analysts a strong understanding of the intricacies of the data, which then guides model development.

4. Develop models

Models are developed, tested and compared to optimise performance against the project’s objectives. In most cases (depending on the type of model used), deeper insights are gained during this stage.

What do we mean by models? By models we mean advanced algorithms that utilise data to identify and predict relationships between behaviours, attitudes, and outcomes. These can involve machine learning, a type of artificial intelligence that automatically adapts and learns depending on the task it has been set.

5. Results

At this stage, analysts will evaluate the model’s results using historic interpolation and forecasting. The models are then integrated into the client’s ongoing operations and strategy in some form.

6. Monitor

Model performance and other metrics are monitored through a dashboard. Because the models force population change, the performance of these models often decay over time, hence the need for model re-calibration.

Repeat steps 2 to 6 to re-calibrate as often as required.


What data science can do for you

Data cleaning, structuring and storage solutions

Data is only useful if it is accurate, meaningfully structured, and accessible.

In most cases, the first hurdle companies face is getting their data to meet the above criteria. Many simply don’t know how to get it there.

Data cleaning and structuring is mandatory for any modelling work. Without client-side data in a workable format, most of the deliverables are unattainable. Cleaning and structuring is a big job (sometimes months or years in the making) and as such, is typically a project on its own.

This type of data science work is ideal for businesses who:

  • Currently collect data but do not do anything with it.
  • Want to become more data-driven and are at the beginning of that journey.
  • Store data in spreadsheets or paper form.
  • Want to develop models or dashboards.


Power BI Dashboarding

Dynamic dashboards customised to display subject-specific data.

Power BI Dashboarding is a good solution for clients wanting to get more meaning out of their transactional/system data.

Currently, we develop them to display cross-sectional and/or longitudinal survey data gathered from our brand trackers.

To see trends and changes over time requires regular data updates. With each update, the data has to be converted into a structured format (which the DS team performs). This means that the way the data is received, collected, and processed has to be repeatable.

This type of data science work is ideal for businesses who:

  • Receive a repeat service where the data can be tracked over time.
  • Have repeated reports that can be converted into a Power BI dashboard.
  • Frequently capture data they want to track.
  • Want to monitor model/strategy performance.


Cluster Analysis

Uses statistical analysis to groups data points into similar populations to help clients tailor and target their marketing.

Cluster analysis can provide powerful psychographic insights into customer segments. Using attitudinal and/or behavioural segmentation data, we group a client’s customer population into like-minded subsets based on their demography and behaviour observed by the client.

This type of data science work is ideal for businesses that:

  • Understand the types of customers they have at a high level but want to delve deeper.
  • Want to know what the drivers/underlying forces of each segment are.
  • Want to understand the impact each segment has within their customer base.
  • Want to leverage attitudinal and behavioural data to improve overall performance.


Propensity Modelling

Assesses the likelihood of an event occurring.

By drawing on historic behavioural data, we can create a predictive model that can predict binary outcomes (i.e. an event happening or an event not happening). It can be used to assess whether a customer will churn, whether they will purchase and whether a customer will react to communication, which helps companies develop a highly targeted approach for engaging these customers. This type of modelling can also quantify the drivers of the event being predicted.

Like cluster analysis, this task also typically moves through the following four stages:

  1. Clean up data.
  2. Apply clustering algorithms.
  3. Review results and tweak algorithm.
  4. Visualise resulting clusters.

This type of data science work is ideal for businesses that:

  • Understand the drivers towards an event occurring
    (such as customer churn, purchase, reaction, etc.)
  • Want to leverage these drives to improve overall performance.
  • Want to reduce negative event rates systematically.


Mixed Media Modelling

A strain of econometric modelling that measures how effective advertising spend/activity is on a target variable, such as sales.

The result is an equation that receives inputs (advertising activity and other seasonal components) and estimates the target variable (e.g. sales).

This task is a more advanced modelling task and is generally slowed due to data acquisition because the client collects the data at a summarised level. Because data is only top-level, it usually means less cleaning is required.

This type of data science work is ideal for businesses that:

  • Want to understand how effective their advertising activity (across media channels) is in driving sales, trials, profit, etc.
  • Want to isolate and quantify the seasonal effects unrelated to advertising activity.
  • Want to optimise their media mix and allocation.
  • Want to reduce advertising budget but maintain current performance.
  • Want to assess the impact any additional advertising budget might have i.e. to uncover what the expected uplift in sales/trials/profit would be.


Not all models are equal—some must come before others

Data modelling falls into three broad categories:

1. Descriptive models

Purpose: To understand the past, interpret and explain.
Measure: What happened/is currently happening?

2. Predictive models

Purpose: To understand the future, predict accuracy and continue evolution.
Measure: What is to come given the current course?

3. Normative models

Purpose: To answer a question posed through scenario prediction.
Measure: How can we achieve a specific outcome based on what we know?

Each category almost always relies on the previous category being completed first
(i.e. 1 before 2, 2 before 3). Descriptive models are a key component of a predictive modelling solution and predictive models can be utilised in a prescriptive way, making it a normative model.

How far we delve into this hierarchy depends on how far you are willing to go.



Four steps to control the quality of your customer data

Excellent marketing starts with excellent data—clean, complete and up-to-date.

Many companies usually update their customer data over a period of time, and do so reactively rather than proactively. To be able to act fast on the insights derived from your data, this data needs to be complete and well maintained in the first place. This will help immensely with generating sales and ultimately driving more revenue for your business.

To achieve this, simply follow these four steps to quality control your data. 


1. Ensure you have complete data

It sounds so straightforward and simple but many businesses get this wrong. Having the exact data you need fully completed means you can make the most of your customer database and use it to identify key trends and customer behaviour.

Complete data makes it easier to develop more targeted marketing strategies that narrow in on your target audience. This will ensure your sales process is more personalised and thereby more effective. For example, in the B2B sphere, knowing key demographic information such as company name, location, job position, industry and number of company staff allows you to be more targeted in your marketing efforts, so make sure these fields are always filled in by your customers.


2. Detailed data to enable personalisation

By now we all know that communicating in a personal way with our existing customers means that we can generate more sales. These people have already bought something from you so they are more inclined to listen to (or read) what you have say.

It’s a great way to stay in touch and ensure they keep your brand top of mind whilst you are informing them about new products, offers and events. This way, you can be specific about the message you’re sending, increasing the chances of further sales.

The key to doing this the right way though is to have as much detailed data as possible about your audience and using it in the right way.

If you're in the B2B space, using info about their company size by staff and industry demonstrates to your customer that you know them as you send relevant info their way.


3. Dubious duplicates

Having duplicate data of customers is an easy mistake to make but can cause headaches as it gives you a false view of your customers’ information and purchase behaviour. This can happen when you capture data from different points in your business.

It may even confuse revenue figures as it doesn’t give you a true picture of what they’ve bought. To properly clean a database from duplicates can take time and the best way to do this is to perform a de-dupe. To save you the time and hassle, you should have a data expert do this for you.


4. Phone connectivity

As digital as we’ve become these days, email contact just isn’t enough sometimes. This depends on your business model, so you will know if the best way to communicate with your key decision makers is simply to just call them.

Ensure that each of your customer contacts has a working phone number. Again—simple but effective.



Six common Q&As on collecting customer data

If you’re new to collecting customer data, it’s not always clear what data you need to collect and what the rules around data collection are. Below are six common data collection questions we frequently encounter.


1. "What type of data should I collect and why?"

The answer depends on what you plan to use the data for.

For surveying purposes

If you’re looking to survey your customers or clients for customer satisfaction and feedback, you’ll want the basics, such as:

  • Name
  • Phone number
  • Email addresses
  • Age
  • Gender


For CRM purposes

For customer data that will go in your CRM system for and sales and marketing purposes, you’ll also want to collect more complete data such as:

  • Transactions
  • Interactions with your business – e.g. activity on your website and interactions on your social media platforms.
  • Customer/client’s unique identifier in your CRM system
  • Tenure (start and end dates)
  • Frequency of activity
  • Targeting reactions – e.g. what email promotions they’ve engaged with.
  • Product/service mix they use
  • Demographic data points (age, region, country etc.)
  • Company (if applicable)
  • Role title (if applicable).

By having this data in your CRM systems, you can harness interaction data between your business and your customer/client to better understand your consumers’ needs and behaviours.

Having this data means you can communicate in a personal way, such as following up with customers on their order and delivering personalised product recommendations based on past purchases.  


2. "How do I get sales reps to enter complete CRM data?"

Incomplete information is a sign of poor data quality. The best method to get your sales reps to fill in the data properly is two-fold:

a. Make the important fields mandatory in your online forms.

You’ll want to have the following mandatory fields:

  • Name
  • Email address
  • Phone number
  • Address
  • Date
  • Type of customer (e.g. their persona, by product/service purchased or the interaction they’ve had with your business).

b. Engage your sales team in the development of these forms and integrate the accuracy and competition of these forms with the team’s KPIs.


3. "How do I avoid duplicate lead and contact entries?"

Prevent duplicate records by comparing the email address of the contacts. These will be unique for each individual.

In many CRM systems today, you now have an option to check whether the newly added record already exists in the account.


4. "How do I deal with inaccurate records?"

For this scenario, instead of deleting one record and potentially losing important data from one that isn't present in the other, merge the two contacts into a single entity instead.


5. "How much data should I collect?"

The biggest mistake when collecting data is to ask for too much at one time and overwhelming your customers. To good news is that persona data is not the be all and end all. While it is important, a business on the journey of collecting data should have a large focus on their organic data collection capabilities. This organic data stems from customers’ interactions with your platforms, such as your website, app, social media, email, online and in-store transaction software and loyalty programmes. This is the data that will provide you with the most long-term payback.

Important note: You should also consider the ethics of how you use your customers’ data. While it is perfectly legal to collect sensitive data if customers agree to it, you should consider how you're leveraging that information and whether it is appropriate.

For example, lower tier lenders often harness data to target lower income households for hire purchase loans. This is legally fine to do (it is their target market) but ethically speaking, giving those families access to high interest loans to buy expensive consumer goods is not helping them. Instead, it puts them under more financial stress.

In short: with great data comes great responsibility.


6. “What data laws do I need to comply with?”

In New Zealand there are three main data protection and privacy laws to be aware of:

Important note: while New Zealand is obviously not part of the European Union or California, if your business is collecting data from citizens of these regions, these laws still apply to you, particularly if you have international customers.

We will go into more detail on these regulations below.



Data collection and the law: the need-to-knows

If you’re collecting customer data, you need to know the regulations around it.


European General Data Protection Regulation (GDPR)

GDPR is a regulation that requires businesses to protect the personal data and privacy of EU citizens for transactions that occur within EU member states.

Briefly, the GDPR gives European Union citizens the right to:

  • Be informed that their data is being collected and how it will be used.
  • Access the data you have on them.
  • Rectify incorrect data you have on them.
  • Have their data removed from your database.
  • Restrict processing, i.e. they have the right to limit how your business uses their data.
  • Data portability. i.e. to move, copy or transfer personal data easily from one IT environment to another in a safe and secure way, without affecting its usability.[4]
  • Right to object to the processing of their personal data, e.g. the right to stop you using their data for direct marketing.[5]

Even if you’re not in the European Union, the law still applies if you’re collecting customer data on EU citizen, for example, if they provide their name and email to sign up to your newsletter.

Learn more here


The California Consumer Privacy Act (CCPA)

Much like the GDPR, if your business collects any customer data from Californian residents, then this law applies to you.

The key differences between the two laws center around:

  • Legal basis—in the GDPR, you can only process personal data when there is a legal ground for it (i.e. consent has been given). The CCPA does not list the legal grounds that businesses can collect and sell personal information. Instead, consumers can opt-out or ask businesses to not sell on their data.
  • The right not to be discriminated against for exercising your personal data rights—this is explicitly included in the CCPA, but not in the GDPR. Although, the GDPR has some provisions based on the same principle.
  • Monetary penalties—while both the CCPA and GDPR have monetary penalties for not adhering to the law, the amounts differ greatly.

Civil remedies for individuals—under the GDPR, an action can be brought for any violation of the law, while under the CCPA actions can only be brought for failure of security measures and in the context of data breaches.[6]

In short, if you follow the GDPR, you’ll be compliant with most of the CCPA. However, it is critical that you are aware of and check your compliance in the sections where the two regulations differ. We recommend using this guide to ensure you are compliant with both laws.

Learn more here


Privacy Act 2020

The Privacy Act of 2020 came into effect on 1 December 2020 and was designed to replace the 1993 act of the same name. Intended to align New Zealand’s privacy laws with the likes of the GDPR, the 2020 Act applies to both local and overseas organisations that conduct business in New Zealand.

The key updates include:

  • Privacy breaches notifications—organisations must notify affected individuals and the Privacy Commission if a privacy breach occurs.
  • The ability for the Privacy Commission to issue compliance notices to businesses who do not comply with the Act.
  • An individual’s right to access their personal information
  • Regulations on how personal information may be sent overseas.
  • New criminal offences. You cannot mislead an agency to access someone else’s personal information, e.g. phishing to acquire someone’s personal information.

You can read about these changes in more detail here.


Data science in action: three scenarios

Below are three data science scenarios where businesses use data science and data modelling to identify and connect with their most valuable customers.


Churn prevention modelling: retail sector

The problem:

Retailer X has over 1.5 million loyalty card members nationwide. However, an annual churn rate of approximately 60 per cent is causing concern, particularly with new competitors entering the market. Retailer X wants to look at ways to reduce their churn rate nationwide in a cost-effective manner. To do this, they supply 2 years’ worth of data on a subset of their loyalty members to a data science team.

The solution:

Using the data available, a number of models are developed and tested to determine which is the most effective at identifying customers on the brink of churning. The winning model provides high-level insights on attribute effectiveness by way of churn indicators (such as how often they shopped in-store). This goes on to form the basis of a low-cost targeting campaign designed to improve yields of approximately five per cent per annum.


The final model and strategy proves to:

  • Prioritise members who were at greatest risk of churning.
  • Prioritise members who had high potential customer value
  • 4 per cent of retained members in the target population would have churned if not for the targeting. This is almost twice as effective as a scattergun retargeting method.
  • Annual revenue is forecast to lift five per cent with the implementation of a simple email campaign driven by the model.


Customer segmentation: finance sector

The problem:

Bank X is currently facing the challenge of now knowing who their customers are or how they behave—and who they are losing or making money on. There is uncertainty about what needs to be done to target customer groups based on their needs and wants. They supply data scientists with four years of data for each of their customers along with product ownership data and some demographic information.

The solution:

Segmentation is utilised to better understand the behaviours, profiles and profitability of Bank X’s customer groups, resulting in four segments being identified within the customer base of Bank X. The segment profiles provide Bank X with possible leverage points to inform their business and marketing strategy going forward.


The customer segmentation proves to:

  • Successfully partition Bank X’s customer base into four distinct segments.
  • Each segment is closely linked to life and financial maturity stages.
  • Monetary values are assigned to each customer within each segment and therefore each life stage.
  • With this information, communications strategies are developed to accelerate the life stages of those in less profitable segments.


Cross-sell and churn modelling: finance sector


With the segmentation setting the groundwork for target marketing, the sales and marketing departments of Bank X now want know what products should be promoted to particular customers. Using the same dataset as the customer segmentation analysis, a second analysis is done to steer product offering and also to shed more light on attrition issues Bank X is suffering from.


A number of models are developed to power cross-selling strategies and each customer is scored on their likelihood to accept each possible product offered to them. A churn model is also developed to minimise attrition while the product offering models help maximise customer potential.


The propensity models proves to:

  • Enable Bank X to throttle the level of pressure they put on their customers to acquire new products.
  • Leverage known behaviours to convert customers to more profitable segments.
  • Reduce guesswork by knowing who Bank X can present cross-sell offers to.
  • Increase customer satisfaction thanks to its more tailored offerings. Customers are now only offered what they need and/or are ready for.
  • Reduce churn rates via cross-selling products customers are most likely to purchase next, according to the data analysis.


Take your data science learning a step further with our free transactional data guide: Transform with Transactional Data.

New call-to-action


[1] Sisense, State of BI & Analytics Report 2020: Special COVID-19 Edition, 2020.

[2] Louis Columbus, How COVID-19 Is Changing Analytics Spending, Forbes, May 10, 2020.

[3] Ibid.

[4] Information Commissioner’s Office, 2020. Right to Data Portability | ICO,

[5] Information Commissioner’s Office, 2020. Right to Object | ICO,

[6] Data Guidance and the Future of Privacy Forum, 2019. Comparing privacy laws: GDPR v. CCPA.