Cure to Bad Data - Speech
- jaquasianicole
- May 22, 2023
Bad data is like a poison that plagues a dataset, and IBM has estimated that poor-quality data costs the U.S. economy $3.1 trillion a year (Redman, 2016). But what is the cure? My name is Jaquasia Donald, and I am an aspiring data analyst. Humankind is currently in the information age, an era in which the economy is centered on information technology. Technology is accelerating rapidly, and information has become a tool that aids decision making. Organizations can leverage data to succeed in a competitive market. However, data stored in a poor format cannot be reliably extracted, analyzed, or interpreted, and it ends up isolated in a silo.
Establishing and maintaining high data quality is key to understanding the target audience, improving operational capabilities, and increasing productivity. Data quality refers to how well a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, and timeliness, and how fit it is for its intended purpose (What is data quality?, n.d.). These six data quality dimensions are used to measure data quality levels, identify data errors, and assess usability. High-quality data is accurate, complete (free of missing values), and reliable. It is also properly formatted, up to date, and relevant (Suer, 2021). Ensuring these standards are met allows organizations to improve predictions, increase profits, and reveal valuable insights.
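To make these dimensions more concrete, here is a minimal Python sketch that scores three of them (completeness, uniqueness, and validity) as simple ratios between 0 and 1. The records, field names, and email rule are hypothetical, and real data quality tools define these scores in their own ways, so treat this as an illustration rather than a standard formula.

```python
"""Minimal sketch: scoring three data quality dimensions on a toy dataset.

The records, field names, and the email validity rule are hypothetical,
chosen only to illustrate the idea.
"""
import re

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "signup": "2023-01-15"},
    {"id": 2, "email": None, "signup": "2023-02-03"},
    {"id": 2, "email": "bo@example.com", "signup": "2023-02-03"},
    {"id": 3, "email": "not-an-email", "signup": "2023-03-20"},
]

# Hypothetical validity rule: a deliberately loose email pattern.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, field):
    """Share of records where `field` is present (not None)."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def uniqueness(records, field):
    """Share of distinct values among all non-missing values of `field`."""
    values = [r[field] for r in records if r.get(field) is not None]
    return len(set(values)) / len(values)


def validity(records, field, pattern):
    """Share of non-missing values of `field` that match `pattern`."""
    values = [r[field] for r in records if r.get(field) is not None]
    valid = sum(1 for v in values if pattern.fullmatch(v))
    return valid / len(values)


if __name__ == "__main__":
    print(f"email completeness: {completeness(RECORDS, 'email'):.2f}")
    print(f"id uniqueness:      {uniqueness(RECORDS, 'id'):.2f}")
    print(f"email validity:     {validity(RECORDS, 'email', EMAIL_PATTERN):.2f}")
```

Scores like these are the kind of numbers a team could track over time to see whether data quality is improving.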
After specific metrics are set, assessing data quality involves analyzing how the health of the dataset affects the business. Having protocols in place encourages every department, at every level of the business, to value data quality. Keeping data clean at the point of entry (upstream prevention) reduces the need for downstream cleansing later on (Liliendahl, 2010). It is a group effort that can boost results and save time for every department in a company. Once metrics and enterprise-wide buy-in are established, teams should implement data quality dashboards to monitor the health of their data assets. These dashboards provide an overview of Key Performance Indicators (KPIs) and help identify which metrics need improvement. Another effective way to ensure data quality is to schedule regular data quality audits. An audit examines the organization's data at a deeper level, using the six data quality dimensions mentioned above. This gives the team a chance to fix mistakes and make improvements, such as filling gaps and removing duplicate records.
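An audit pass that fills gaps and removes duplicate records could be sketched roughly as follows. This is a minimal Python example under assumed conditions: the records are hypothetical, "duplicate" here means an exact copy of another record, and missing values are filled with a placeholder default. In practice, a team would usually trace gaps back to a trusted source rather than fill them blindly.

```python
"""Minimal sketch of one audit pass: drop exact duplicates, then fill gaps.

The records, field names, and the "unknown" placeholder are hypothetical.
"""

RECORDS = [
    {"id": 1, "region": "East", "status": "active"},
    {"id": 1, "region": "East", "status": "active"},   # exact duplicate
    {"id": 2, "region": None, "status": "inactive"},   # missing region
    {"id": 3, "region": "West", "status": None},       # missing status
]


def remove_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def fill_gaps(records, defaults):
    """Return copies of records with None values replaced by per-field defaults."""
    filled = []
    for record in records:
        fixed = dict(record)  # copy so the input records are not modified
        for field, default in defaults.items():
            if fixed.get(field) is None:
                fixed[field] = default
        filled.append(fixed)
    return filled


def audit(records, defaults):
    """Run one audit pass and return the cleaned records plus a small report."""
    deduped = remove_duplicates(records)
    gaps = sum(1 for r in deduped for f in defaults if r.get(f) is None)
    cleaned = fill_gaps(deduped, defaults)
    report = {
        "duplicates_removed": len(records) - len(deduped),
        "gaps_filled": gaps,
    }
    return cleaned, report


if __name__ == "__main__":
    cleaned, report = audit(RECORDS, {"region": "unknown", "status": "unknown"})
    print(report)
    for row in cleaned:
        print(row)
```

Returning a small report alongside the cleaned records is one way to feed the results of each audit back into a data quality dashboard.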
Data cleaning is a crucial step in data analytics, because the health of the data directly affects the ability to reach relevant insights. Data cleaning is the first step in curing bad data. Thank you all for your attention!
References
Liliendahl, H. G. (2010, September 25). Top 5 reasons for downstream cleansing. Liliendahl on Data Quality. https://liliendahl.com/2010/09/25/top-5-reasons-for-downstream-cleansing/
Redman, T. C. (2016, September 22). Bad data costs the U.S. $3 trillion per year. Harvard Business Review. https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
Suer, M. (2021, August 5). What is data quality and why is it important? Alation. https://www.alation.com/blog/what-is-data-quality-why-is-it-important/
What is data quality? (n.d.). IBM. https://www.ibm.com/topics/data-quality
