Data Science for Business

Name: Data Science for Business
Rating: 4.6 (782 reviews)
Author: Foster Provost and Tom Fawcett

52 min

9 Key Points

What's inside?

Dive into the essentials of data science, learn about data mining and analytical thinking, and discover how to leverage these tools to make informed business decisions.

You'll learn

1. What's data science and why should businesses care?

2. Using data smarts in business situations.

3. Tools for digging into data and how to use them.

4. Making business choices based on data, not gut feelings.

5. Solving tricky business issues with data.

6. Using data science to get ahead in business.

Key points

01 Shifting to Data-Analytic Thinking

Embracing data-analytic thinking requires a fundamental shift in how we approach business problems, moving away from gut feelings and toward systematic, evidence-based discovery. Far too often, business leaders view data science as a mysterious black box managed exclusively by IT departments or highly specialized mathematicians. However, the true power of data science is only unlocked when business managers and data scientists speak the same language. To succeed in the modern economy, you do not need to know how to write complex code or build neural networks from scratch, but you absolutely must understand the fundamental principles of how data can be leveraged to extract useful knowledge. This is the essence of data-analytic thinking: viewing business problems through the lens of data potential.

Consider a famous, real-world scenario from Walmart. When Hurricane Frances was approaching the eastern coast of the United States, Walmart executives did not just rely on conventional wisdom to stock their shelves. Conventional wisdom would suggest stocking up on flashlights, batteries, and bottled water. While they certainly did that, they also turned to their massive historical database of customer transactions to see what happened during previous hurricanes. By applying data-analytic thinking to mountains of transaction records, they discovered a highly unusual, non-obvious pattern: ahead of a hurricane, sales of strawberry Pop-Tarts increased up to seven times their normal rate, and the top-selling item before the storm was beer. Armed with this predictive insight, Walmart rapidly dispatched trucks filled with strawberry Pop-Tarts and beer to the stores in the hurricane's path, placing them near the front entrances for easy access. The stores sold out completely, maximizing profit while perfectly meeting immediate customer demand. This is data science in action.
It is not about simply summarizing the past; it is about predicting the future to make better, more profitable decisions today. To formalize this process, the industry relies on a structured framework known as CRISP-DM, which stands for the Cross-Industry Standard Process for Data Mining. This framework breaks down the data science lifecycle into sensible, highly logical steps. Let us walk through how this lifecycle operates in a real business environment.

The first and arguably most critical phase is Business Understanding. Before a single line of code is written or a spreadsheet is opened, the team must deeply understand the business problem they are trying to solve. What is the ultimate goal? Are we trying to reduce customer churn, increase the response rate to a marketing campaign, or detect fraudulent credit card transactions? Without a clear, well-defined business objective, data science projects often devolve into aimless fishing expeditions that waste time and resources.

Once the business goal is firmly established, we move to Data Understanding. This involves looking closely at the raw materials available to us. What data do we currently collect? Is it accurate? Are there missing values? If we want to predict customer churn, do we have historical records of customers who have left in the past? Understanding the strengths and limitations of your data is vital because a predictive model can only be as good as the information fed into it.

Following this is the Data Preparation phase, which is notoriously the most time-consuming part of any data project. Real-world data is messy. It is filled with typos, inconsistent formats, and anomalies. Preparing the data involves cleaning it, transforming variables, and merging different datasets together into a neat, tabular format that an algorithm can actually process. Think of this as prepping the ingredients before cooking a complex meal; if you skip the prep work, the final dish will be a disaster.
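
As a rough illustration of what Data Preparation can look like in practice, the sketch below cleans two small, made-up record sets and merges them into one table. The field names (`monthly_bill`, `support_calls`), the sample values, and the choice to treat a missing call record as zero calls are all assumptions for the example, not details from the book.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_customers = [
    {"id": "001", "name": " Alice ", "monthly_bill": "$120.50"},
    {"id": "002", "name": "BOB", "monthly_bill": ""},  # missing bill amount
    {"id": "003", "name": "carol", "monthly_bill": "89"},
]
raw_support_calls = [
    {"customer_id": "001", "calls": 4},
    {"customer_id": "003", "calls": 1},
]


def clean_bill(text):
    """Turn '$120.50' into 120.5; return None when the value is missing."""
    text = text.strip().lstrip("$")
    return float(text) if text else None


def prepare(customers, support_calls):
    """Clean each customer row and merge in the support-call counts."""
    calls_by_id = {row["customer_id"]: row["calls"] for row in support_calls}
    table = []
    for row in customers:
        table.append({
            "id": row["id"],
            "name": row["name"].strip().title(),  # normalize spacing and case
            "monthly_bill": clean_bill(row["monthly_bill"]),
            # Assumption: a customer with no call record made zero calls.
            "support_calls": calls_by_id.get(row["id"], 0),
        })
    return table


table = prepare(raw_customers, raw_support_calls)
for row in table:
    print(row)
```

Real projects typically need far more than this (deduplication, outlier checks, date parsing), but the shape is the same: inconsistent inputs go in, one tidy row per customer comes out.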
Next comes the Modeling phase, which is where the actual machine learning algorithms are applied. This is the glamorous part of data science that gets the most attention. Here, data scientists select the appropriate mathematical techniques (such as decision trees, regression models, or neural networks) to find the hidden patterns in the prepared data. The goal is to build a model that captures the underlying relationship between the variables and the target outcome.

However, building a model is not the end of the journey. The Evaluation phase steps in to ask a crucial question: Does this model actually solve the business problem we defined in step one? A model might be highly accurate in a mathematical sense, but if it is too slow to run in real-time or too complex to explain to regulators, it might be entirely useless for the business. Evaluation ensures that the technical success of the model translates into tangible business value.

Finally, we reach Deployment. This is where the model is integrated into the company's actual operations. It might be a background system that automatically flags a potentially fraudulent transaction in milliseconds, or a weekly report that highlights the top 100 customers most at risk of canceling their subscriptions. Deployment brings the model to life, allowing the business to reap the rewards of their analytical efforts. By understanding this structured lifecycle, managers can guide data projects effectively, ensuring that technical efforts remain strictly aligned with strategic business goals.

02 Identifying Informative Attributes

Finding the right variables to predict future outcomes is much like searching for a needle in a massive, ever-expanding digital haystack. When trying to solve a predictive business problem, we are usually looking for specific attributes, or features, that have a strong relationship with the target variable we want to predict. For instance, if a telecommunications company wants to predict which customers are likely to cancel their contracts (a problem widely known as customer churn), it has hundreds of attributes to consider. It knows the customer's age, location, monthly bill amount, data usage, number of calls to customer service, and the type of phone they use. Out of all these attributes, which ones actually contain valuable information about the customer's likelihood to leave?

To answer this, we must introduce a core concept of data science: Information Gain. Information gain measures how much a given attribute reduces our uncertainty about the target variable. To understand this, we first need to grasp the concept of entropy, which is simply a measure of disorder, uncertainty, or impurity in a set of data.

Let us break this down with a highly relatable example. Suppose you have a jar filled with 100 marbles. If 50 marbles are red and 50 are blue, the jar is completely mixed. The uncertainty, or entropy, is at its absolute maximum because if you reach in blindfolded, you have no idea what color you will pull out. On the other hand, if all 100 marbles are red, the entropy is zero. There is no uncertainty; you know exactly what you will get.

In predictive modeling, we want to find attributes that split our mixed, high-entropy dataset into neat, low-entropy subsets. Returning to our telecommunications company, imagine our total customer base is a mixed jar: some customers will stay, and some will churn. We want to find a question we can ask that separates the loyal customers from the churning customers as cleanly as possible.
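
The marble jar can be checked numerically. The sketch below assumes the standard Shannon entropy measured in bits, a formula the summary does not spell out; the function name `entropy` and the variable names are chosen for this example.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a group described by its class counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:  # a class with no members contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


mixed_jar = entropy([50, 50])  # 50 red, 50 blue
pure_jar = entropy([100, 0])   # all 100 red

print(mixed_jar)  # 1.0, the maximum for two classes
print(pure_jar)   # 0.0, no uncertainty at all
```

With two classes the value always falls between 0 (perfectly pure) and 1 (an even split), which matches the intuition of the blindfolded draw.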
What if we split the customers based on their favorite color? We might find that the group of customers who like blue has a 10% churn rate, and the group who like red also has a 10% churn rate. Splitting the data by favorite color did not reduce our uncertainty at all: each subset is exactly as mixed as the group we started with, so the entropy is unchanged. Therefore, "favorite color" provides zero information gain. It is a useless attribute for predicting churn.

Now, what if we split the customers based on whether their contract expires in less than 30 days? We might discover that the group with expiring contracts has an 80% churn rate, while the group with long-term contracts has only a 2% churn rate. This single question has dramatically separated the churners from the non-churners. The resulting subsets are much purer than the original mixed group. Because this attribute dramatically reduced our uncertainty, we say it provides a massive amount of information gain.

This elegant, logical concept is the fundamental engine behind one of the most popular and intuitive machine learning models used in business today: Decision Trees. Building a decision tree is essentially just having an algorithm play a highly efficient game of "Twenty Questions" with your data. When playing Twenty Questions, you do not start by asking, "Is the object a 2018 Honda Civic?" That is far too specific and unlikely to be true. Instead, you start with broad questions that split the possibilities in half, such as, "Is it an animal?" or "Is it bigger than a breadbox?"

Decision tree algorithms do exactly the same thing using mathematical calculations. The algorithm scans all the available attributes in the dataset and calculates the information gain for every possible split. It finds the single attribute that provides the highest information gain and makes that the first branching point, the root of the tree. Using our telecom example, the first split might be: "Is the contract expiring in less than 30 days?"
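
To make the comparison between the two candidate questions concrete, here is a minimal sketch that computes information gain as the parent group's entropy minus the size-weighted entropy of the subsets. The churn rates (10%, 80%, 2%) come from the text; the group sizes are assumptions added so that the rates can be turned into counts.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a group given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(children):
    """Entropy of the combined group minus the weighted entropy of its subsets.

    children: one [churners, stayers] pair of counts per subset.
    """
    parent = [sum(column) for column in zip(*children)]
    total = sum(parent)
    remainder = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder


# Assumed sizes: 500 customers per color group, each with 10% churn.
favorite_color = [[50, 450], [50, 450]]
# Assumed sizes: 100 expiring contracts (80% churn), 900 long-term (2% churn).
contract_expiring = [[80, 20], [18, 882]]

gain_color = information_gain(favorite_color)
gain_contract = information_gain(contract_expiring)

print(f"favorite color:    {gain_color:.3f}")
print(f"contract expiring: {gain_contract:.3f}")
```

Under these assumed counts, the color split scores zero because each subset has the same churn mix as the whole, while the contract split scores clearly above zero. A tree-building algorithm would therefore ask the contract question first.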
The algorithm then takes the resulting subsets and repeats the process. For the customers with expiring contracts, it asks: out of the remaining attributes, which one gives us the most information gain now? It might find that "number of calls to customer service in the last week" is the next best predictor. It creates another branch. The tree continues to grow, splitting the data into increasingly pure segments until it reaches a point where further splits do not provide any meaningful information gain.

The extraordinary beauty of a decision tree lies in its profound transparency. Unlike complex neural networks that operate as inscrutable black boxes, a decision tree provides a clear, highly readable roadmap of human logic. A business manager can look at a printed decision tree and trace the exact path that leads to a high probability of churn. "Ah, I see. If a customer's contract is expiring soon, AND they have called customer service more than three times this month, AND their monthly bill is over $100, they have a 92% chance of leaving."

This transparency is invaluable for decision-making. It not only tells the business who is likely to churn but also offers deep clues as to why. Armed with this specific knowledge, the marketing team can craft highly targeted interventions. Instead of offering a generic 10% discount to everyone, they can proactively call the specific high-risk segment, apologize for the recent customer service issues, and offer a customized, premium retention package. By systematically identifying informative attributes, businesses can cut through the noise of big data and focus entirely on the signals that drive real-world profitability.
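
The split-and-repeat procedure described above can be sketched in a few lines of plain Python. This is a simplified, ID3-style illustration rather than a production algorithm: the eight customer records, the attribute names (`expiring_soon`, `many_support_calls`, `likes_blue`), and the stopping rule are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, attributes, target):
    """Greedily split on the highest-gain attribute until nothing more is gained."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # pure group, or no questions left to ask
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    if information_gain(rows, best, target) <= 1e-12:
        return majority  # no remaining attribute reduces uncertainty
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"split_on": best, "branches": branches}


def predict(tree, row):
    """Follow the branches that match `row` until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["split_on"]]]
    return tree


def customer(expiring, calls, blue, outcome):
    return {"expiring_soon": expiring, "many_support_calls": calls,
            "likes_blue": blue, "churned": outcome}


# Hypothetical customers, invented for illustration.
customers = [
    customer(True, True, True, "churn"),
    customer(True, True, False, "churn"),
    customer(True, False, True, "stay"),
    customer(False, True, True, "stay"),
    customer(False, True, False, "stay"),
    customer(False, False, True, "stay"),
    customer(False, False, False, "stay"),
    customer(False, False, False, "stay"),
]

tree = build_tree(customers, ["expiring_soon", "many_support_calls", "likes_blue"], "churned")
print(tree)
print(predict(tree, customer(True, True, False, None)))
```

On this toy data the tree asks about the expiring contract first and about support calls second, and it never uses `likes_blue`. Because the result is a nested set of readable questions, a manager can trace any prediction back to the path that produced it.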

03 Navigating the Trap of Overfitting

04 Finding Similarity and Hidden Clusters

05 Making Optimal Business Decisions

06 Visualizing and Evaluating Model Performance

07 Mining Text for Predictive Insights

08 Conclusion

About Foster Provost and Tom Fawcett

Foster Provost is a Professor at NYU Stern School of Business, specializing in data science. Tom Fawcett is a data science consultant with over 20 years of experience in machine learning and data mining, known for his work on ROC analysis. Both are respected authors in the field.