
Data Mining Project

Introduction & Business Problem

Bubba Gump Shrimp Company is a regional food retailer that sells through restaurants and other retail channels. After a few years of growth and increasing revenue, its sales have begun to decline. The company has collected data on its customers and wants to use that data to understand the decline and identify a possible solution. The data provided by Bubba Gump Shrimp Company is a sample of responses to a customer survey. The sample consists of 500 selected customers who have purchased at least one product from any channel: the web store, a restaurant, or a third party. I will analyze the survey responses to reveal the natural clusters in Bubba Gump's customer behavior, and I include visualizations that describe the patterns found.

 

Taking the time to define the business problem is an important first step in data analytics because it establishes an accurate understanding of the current state. The business problem this company hopes to resolve is a decrease in sales and revenue. Addressing it requires understanding the data and identifying the patterns present in the customers' current and historical behavior.

PLAN FOR ANALYSIS


Analytic Method

The best way to address the business problem is to investigate customer behavior and discover the areas that need improvement. This process, called customer analysis, examines how customers experience and interact with the company's services and products. Analytic methods will also provide a detailed procedure for maintaining data quality. Data integrity matters because the results of the analysis will drive business decisions. Observing key measures through these methods can also produce new ideas. To gain insight into our target market, we will analyze customer web store visits, web store purchases, age, marital status, and other variables.


These analytic methods can be paired with qualitative and quantitative research methods to observe and measure the available information. Quantitative data requires statistical analysis so that the results represent the sample accurately, while thematic analysis of qualitative data provides the context needed to identify patterns and understand behavior. Choosing the right analysis strategies and research methods will enhance the decision-making process, identify new opportunities, and reduce operational inefficiencies.
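As one example of statistical analysis that describes how well a sample represents the wider customer base, a confidence interval puts a range around a sample mean. The sketch below uses a normal approximation, which is reasonable for a sample as large as the 500 survey responses; for very small samples a t-distribution would be more appropriate. The spending values are hypothetical and are not taken from the survey.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical web store spending values (USD), for illustration only.
spending = [120.0, 85.5, 240.0, 310.25, 95.0, 180.0, 410.0, 60.0, 150.0, 275.0]
low, high = mean_confidence_interval(spending)
print(f"mean = {mean(spending):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```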

Analysis Tools

Data mining is the process of sorting through large datasets to reveal patterns that can help solve business problems through data analysis (Stedman & Hughes, 2021). Data mining tools are available as software platforms to aid in this process. To address the business problem Bubba Gump Shrimp Company is experiencing, I will use JMP, a data analysis software that lets users apply statistics to their data, discover meaningful insights, and share their findings. JMP provides statistical methods for analytics along with data visualization features, so it can identify relationships in Bubba Gump's customer data and produce graphics that represent them. JMP also supports regression and cluster analysis, including linear and logistic regression as well as k-means and hierarchical clustering.
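JMP runs k-means for you, but the idea behind it is short enough to sketch. The following is a minimal illustration of Lloyd's algorithm for k-means in plain Python, not JMP's implementation (JMP's initialization and options may differ). The `(visits, spending)` points are hypothetical; in practice the variables would be standardized first so that dollar amounts do not dominate visit counts.

```python
import random
from math import dist

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k of the data points
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # centroids stopped moving: converged
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

# Hypothetical (web store visits, web store spending in USD) pairs.
points = [(1, 40.0), (1, 55.0), (2, 60.0), (2, 300.0), (3, 320.0), (3, 350.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Hierarchical clustering, the other method mentioned above, instead merges the closest groups step by step and does not require choosing k up front.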


JMP also has data visualization tools that tell the story of the patterns found in the data, including interactive dashboards and web visualizations that present trends, outliers, and patterns to the audience. Because we are analyzing customer behavior to extract insights that can increase sales, some visualizations fit this circumstance better than others. A chart with two axes is a good option for comparing how two variables relate; for example, a line chart could compare web store visits with web store spending. In this project, visualizing customer behavior can provide context for hypotheses about why sales have decreased, and these visualizations will present complex relationships in a way that supports informed decision-making.
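A line chart of web store visits against web store spending is built from a small summary table: the mean spending at each visit count. The sketch below computes that table from hypothetical `(visits, spending)` rows and picks the visit count with the highest average, which is one way to approach the research question about visits and spending. The highest average shows an association, not that the extra visits cause the spending.

```python
from collections import defaultdict
from statistics import mean

def mean_spending_by_visits(records):
    """Group (visits, spending) records by visit count and average the spending."""
    groups = defaultdict(list)
    for visits, spending in records:
        groups[visits].append(spending)
    return {visits: mean(values) for visits, values in sorted(groups.items())}

# Hypothetical survey rows: (web store visits, web store spending in USD).
records = [(1, 200.0), (1, 220.0), (2, 210.0), (2, 250.0), (3, 280.0), (3, 300.0)]
summary = mean_spending_by_visits(records)
for visits, avg in summary.items():
    print(f"{visits} visit(s): ${avg:.2f}")  # these pairs are the line chart's points
best = max(summary, key=summary.get)  # visit count with the highest mean spending
```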


Research Questions

 

A research question guides the project to address an issue and reach a relevant conclusion. These questions are dynamic and can be changed as the project is developed to refine the topic of interest. 


 

What are the demographics (age, marital status, income) of the customers with the highest web store spending?

 

Which demographic characteristics of customers can be used to predict the likelihood that a consumer purchases Bubba Gump Shrimp Company's products?

 

What number of web store visits is associated with the highest web store spending?

 

These questions will serve as a guiding framework during the research process to ensure cohesion.

 

Research Measurement

It is important to have a clear understanding of a project's goals to determine when the efforts are sufficient. Measuring success in a research project means establishing criteria the team can use to evaluate progress. These criteria should include both objective and subjective metrics, which requires reviewing the project's scope, evaluating the specifications, and reviewing client satisfaction markers (Eby, 2022). For the research questions above, the research can conclude once patterns emerge that answer the questions appropriately. Some questions involve statistics: predicting the likelihood of consumers purchasing our products requires that the relationship be statistically significant. One marker of success in this project is identifying the cluster of customers that spends the most on the web store. Another is identifying the number of visits associated with the most web store sales. The research questions can be resolved effectively if there is ample data to support the claims and connections in the answers.
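One way to check whether a relationship is statistically significant without relying on distributional assumptions is a permutation test: shuffle one variable many times and count how often a correlation at least as strong as the observed one appears by chance. The sketch below is an illustration under assumptions, not the test used in this project, and it is a substitute for, not a reproduction of, the p-values a tool like JMP reports. The age and spending values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_test(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test of the null hypothesis that x and y are unrelated."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # shuffling y breaks any real pairing with x
        if abs(pearson(x, shuffled)) >= abs(observed):
            at_least_as_extreme += 1
    # Adding 1 to both counts treats the observed pairing as one permutation,
    # so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical customer ages and web store spending (USD).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
spending = [50, 80, 95, 130, 150, 170, 210, 220, 260, 280]
observed, p_value = permutation_test(ages, spending)
print(f"r = {observed:.3f}, p = {p_value:.4f}")
```

A small p-value (for example, below 0.05) suggests that a relationship this strong would rarely arise by chance; it does not by itself show that age causes spending.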


Follow-up Questions

Questions asked after most of the project is complete serve to dig deeper and explore further findings. These follow-up questions reveal in-depth context or clarify responses. Follow-up questions can also be used in a customer survey to gather information that is not directly essential to reaching the goal. They may even provide clarity or context that pivots the direction of the research. The same tactic can also be applied within the team to encourage uncovering further insights.
 

What do loyal customers who have purchased products or services more than three times have in common?

What products do customers purchase alongside Bubba Gump Shrimp Company’s items through third party channels?

 

What are the demographics of customers who make repeat purchases through one channel or across multiple channels?

 

Research and support
Considering the nature of the project, the book A Practical Guide to Data Mining for Business and Industry by Andrea Ahlemeyer-Stubbe can serve as a supporting document. The book focuses on the statistics of data mining and will serve as a guiding framework when considering applications and methods. It covers data preparation techniques, analytics and methods, and intra-customer analysis. It will be a beneficial source because it also includes applied elements for each topic and even case studies to model after. Case studies are a useful guide for identifying phenomena in this project because the researcher has little control over events, and they can help direct which measures should be taken. However, case studies fall short when expanding on insights later in the project because their results are difficult to replicate. Depending too heavily on case studies can introduce researcher bias and may reduce the accuracy of the data (Sutton-Tyrrell, 1991).

ANALYSIS


Analysis Organization

It is important to create an organized plan that outlines the course of action before starting the analysis. Planning allows the investigator to anticipate and prepare for obstacles, which proves effective when issues arise. The first step in a stepwise approach is to form a hypothesis, which suggests experimental directions that can uncover helpful data points. The analysis in this project follows an organized approach because it begins by identifying the business problem. After understanding the issue at hand and its possible contributors, I can estimate what the results of the analysis might look like. Even with a careful plan for which tools will produce a thorough project, the chosen approach is not guaranteed to be the best one. Some aspects, such as which analysis tool best fits the dataset, cannot be known in advance. Finding the best fit for the data and the conclusion you are trying to reach takes trial and error.


Sources of Error

Error evaluation provides a thorough look at data quality and highlights how errors affect the accuracy of the insights used to shape business strategy. One source of error I identified in this dataset is that consumer surveys are prone to measurement error by nature. Respondents may misunderstand or misinterpret questions, often because the questions are vague or unclear (Lewis & Sauro, 2021). These measurement errors can lead to response bias and variability, which poses an issue for analysis methods that require standardized data. Another source of error arose while testing models: choosing the right data value and type for certain methods. Resolving this requires reorganizing the data variables and assessing whether they can yield the insights appropriately.
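Because distance-based methods such as k-means clustering compare customers across several variables, columns on very different scales (income in dollars versus a handful of web store visits) are commonly rescaled first. A minimal z-score sketch follows, using hypothetical values. Standardizing changes the scale of a variable, but it does not remove the response bias or measurement error described above.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column has no spread to rescale
    return [(v - mu) / sigma for v in values]

# Hypothetical columns on very different scales.
incomes = [42000, 55450, 62200, 71000, 38500]
visits = [1, 2, 2, 3, 1]
z_incomes = standardize(incomes)
z_visits = standardize(visits)
print([round(z, 2) for z in z_incomes], [round(z, 2) for z in z_visits])
```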

Meaningful Patterns

While analyzing the Bubba Gump data with cluster analysis, I found meaningful patterns that will be valuable in addressing the business problem. The cluster with the highest web store spending, $358.86, averaged 2 web store visits (see graph 3). Customers in this cluster had a mean age of 42, were currently married, and had an income of $55,450. The cluster with the second-highest web store spending, $142.47, had similar characteristics, except that its mean age was 38 and its income was $62,200. These insights will be helpful during customer analysis, the research process of understanding customers to reveal meaningful insights. By uncovering the demographics and behavior of our most loyal, highest-spending customers, we can direct our marketing strategies more effectively. These results also raise further questions, such as "What products do these high-yielding customers buy?" Looking into which products are selling and which are not will further guide this project toward its goal of resolving declining sales and profits. I would also like to cross-examine the link between web store visits and restaurant or third-party visits, which may show which channel brings the most profit or receives the most visits.
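Cluster descriptions like the ones above come from profiling: once each customer has a cluster label, average the numeric fields within each cluster, take the most common value of categorical fields such as marital status, and rank the clusters by spending. The sketch below assumes the labels were already assigned by a clustering step; the field names and rows are hypothetical and are not the survey data.

```python
from collections import defaultdict
from statistics import mean, mode

def profile_clusters(rows):
    """Summarize each cluster and rank clusters by mean web store spending."""
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)
    profiles = []
    for cluster, members in by_cluster.items():
        profiles.append({
            "cluster": cluster,
            "size": len(members),
            "mean_age": mean(r["age"] for r in members),
            "mean_income": mean(r["income"] for r in members),
            "most_common_marital": mode(r["marital"] for r in members),
            "mean_visits": mean(r["visits"] for r in members),
            "mean_spending": mean(r["spending"] for r in members),
        })
    # Highest mean web store spending first.
    return sorted(profiles, key=lambda p: p["mean_spending"], reverse=True)

# Hypothetical customers with cluster labels from an earlier clustering step.
rows = [
    {"cluster": "A", "age": 45, "income": 54000, "marital": "Married", "visits": 2, "spending": 360.0},
    {"cluster": "A", "age": 41, "income": 57000, "marital": "Married", "visits": 2, "spending": 340.0},
    {"cluster": "B", "age": 37, "income": 61000, "marital": "Married", "visits": 1, "spending": 150.0},
    {"cluster": "B", "age": 39, "income": 63000, "marital": "Single", "visits": 2, "spending": 130.0},
]
profiles = profile_clusters(rows)
for p in profiles:
    print(p)
```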

Clustering Analysis 

Graph 2
Graph 3

Alternative analytic methods
Choosing an alternative analytic method requires understanding what insights you expect. One of the leading questions I raised, which other items our customers buy, could be answered with association analysis. This method finds frequent itemsets and rules with high confidence to uncover relationships in the dataset. Overall, this company would benefit from prescriptive analytics, which uses insights to guide decision-making. As a new company that lacks much historical data and cannot afford to take many risks, looking toward the future will aid its business strategy. Clustering analysis techniques can also be combined to extend the insights.
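The two measures behind association analysis can be sketched on a small example. Support is the share of transactions that contain an itemset, and confidence for a rule A -> B is the share of transactions containing A that also contain B. The sketch below only mines two-item sets (a full algorithm such as Apriori extends the same idea to larger itemsets), and the baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Keep item pairs whose support meets min_support, then return the rules
    (lhs, rhs, support, confidence) whose confidence meets min_confidence."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough to consider
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first, ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical baskets, not survey data.
transactions = [
    {"shrimp", "salad", "lemonade"},
    {"shrimp", "salad"},
    {"shrimp", "fries"},
    {"salad", "lemonade"},
    {"shrimp", "salad", "fries"},
]
rules = frequent_pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```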

Display and interpretation
The results of the regression and clustering analyses reveal patterns in the behavior of Bubba Gump's customers. This behavior provides context for marketing to the consumer audience. Given the business problem of declining sales, I began by looking at web store spending and web store visits. The clustering analysis revealed that customers who visited the site twice had the highest web store spending. It also found that the customers with the highest mean spending were 37 to 42 years old (see graph 3). This result will let us market to our consumers more effectively by adjusting marketing strategies to reach the target audience. These conclusions also raised more questions that could develop this project into a well-defined plan.


The graph uses k-means clustering to show that the cluster with the highest web store spending, $458.33, visited the site once (see graph 7). The cluster with the second-highest spending, $405.60, had 2 visits, and the cluster with the third-highest spending, $369.75, had 3 visits. However, after averaging the clusters by web store visits, customers who visited three times spent the most, at an average of $295.01. Those who visited the web store twice spent an average of $225.23, and those who visited only once spent an average of $212.60. These results were obtained with the number of clusters set to 13, the value that optimized the cubic clustering criterion (CCC). The graph also shows that web store spending increases as customers purchase more at our affiliated restaurants. This suggests that the more our consumers are exposed to our products, the more likely they are to purchase through an alternate channel. The linear fit regression model shows that web store spending increases as web store visits increase (see graph 8).
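A linear fit like the one described above is commonly computed by ordinary least squares: the slope is the covariance of visits and spending divided by the variance of visits, and the line passes through the two means. The sketch below shows those formulas on hypothetical data; it is not a reproduction of the model or numbers in graph 8. A positive slope summarizes how spending tends to rise with visits; it does not establish cause and effect.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept

# Hypothetical web store visits and spending (USD).
visits = [1, 1, 2, 2, 3, 3]
spending = [200.0, 220.0, 210.0, 250.0, 280.0, 300.0]
slope, intercept = linear_fit(visits, spending)
print(f"spending = {intercept:.2f} + {slope:.2f} * visits")
```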
 

Graph 6
Graph 7
Graph 8
Graph 10
 

Next steps
Research iterates through steps and rarely follows a linear or predictable process. The scope of future work for this project depends on the inquiries and questions that arise after analyzing the information. I recommend examining which products are purchased alongside certain items and which items are purchased most frequently. Association analysis identifies how frequently items occur together to uncover unconscious consumer buying patterns (Lutkevich, 2023). The next step toward uncovering these patterns is collecting data on which products customers purchase alongside Bubba Gump's items through third-party channels. This raises the related question of which items are ordered most and least frequently on the web store or in a restaurant. A new hypothesis could be "Customers who purchased salad also purchased Bubba Gump Shrimp products." With this investigation, Bubba Gump Shrimp Company could place its products next to associated items in the supermarket or advertise near those items in store or online. This could increase sales by taking advantage of a natural behavior among customers.
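A single hypothesis such as the salad rule can be checked with three standard measures: support (how often both items appear together), confidence (how often shrimp appears given salad), and lift (confidence divided by how often shrimp appears overall). A lift above 1 means the two items co-occur more often than they would if purchases were independent. The baskets and item labels below are hypothetical, standing in for the third-party purchase data that would need to be collected.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline rate.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical baskets from a third-party channel.
baskets = [
    {"salad", "Bubba Gump Shrimp"},
    {"salad", "Bubba Gump Shrimp", "bread"},
    {"salad", "bread"},
    {"Bubba Gump Shrimp"},
    {"bread"},
    {"salad", "Bubba Gump Shrimp"},
]
support, confidence, lift = rule_metrics(baskets, "salad", "Bubba Gump Shrimp")
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.3f}")
```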

References

Eby, K. (2022, February 28). How to measure project success. Smartsheet. https://www.smartsheet.com/content/measuring-project-success
Lewis, J., & Sauro, J. (2021, April 27). Four types of potential survey errors. MeasuringU. https://measuringu.com/four-types-of-survey-err
Lutkevich, B. (2023, January 30). What are association rules in data mining (association rule mining)? Business Analytics. https://www.techtarget.com/searchbusinessanalytics/definition/association-rules-in-data-mining
Moss, A. (2021, June 24). What are survey validity and reliability? CloudResearch. https://www.cloudresearch.com/resources/blog/survey-validity-and-reliability/
Provensal, E. B. (2011, August 31). Modeling type. JMP Resources, Harvard Wiki. https://wiki.harvard.edu/confluence/display/hsphhcm757/Modeling+type
Stedman, C., & Hughes, A. (2021, September 7). What is data mining? Business Analytics. https://www.techtarget.com/searchbusinessanalytics/definition/data-mining
Sutton-Tyrrell, K. (1991). Assessing bias in case-control studies: Proper selection of cases and controls. Stroke, 22(7), 938–942. https://doi.org/10.1161/01.str.22.7.938
 

Contact Information

Department of Mathematics
Science Center

Fresno, Texas 77545

Greenville, South Carolina 29607

  • GitHub
  • LinkedIn


©2023 by Jaquasia Nicole Donald.
