Every year, insights from business analytics and machine learning (ML) play a larger role in how organizations use data to solve business problems. However, the insights in a business dashboard and the predictions from an ML model are only as valuable as the quality of the data behind them.
Building high-quality data sets is a multistep process known as data wrangling, which includes cleaning, mapping, and transforming raw data into a workable format.
These activities commonly involve the following:
- Merging multiple data sources into a single data set
- Identifying gaps in the data (for example, empty cells in a table) and either filling or deleting them
- Removing data that’s unnecessary or irrelevant to the project at hand, such as duplicate records
- Identifying extreme outliers in the data
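The four activities above can be illustrated with a small plain-Python sketch. It is a minimal example, not a description of how any particular platform performs them. The sample data and field names (`customer_id`, `amount`, `region`) are hypothetical. Filling gaps with the median and flagging outliers with a modified z-score based on the median absolute deviation are assumed choices; other fill and outlier rules are equally common.

```python
from statistics import median

# Hypothetical sample sources; field names are illustrative only.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 3, "region": None},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 120.0},     # exact duplicate row
    {"customer_id": 2, "amount": None},      # gap (empty cell)
    {"customer_id": 3, "amount": 95.0},
    {"customer_id": 4, "amount": 10_000.0},  # no matching customer
]


def merge(orders, customers):
    """Merge sources: left-join orders to customers on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**o, "region": by_id.get(o["customer_id"], {}).get("region")}
        for o in orders
    ]


def drop_duplicates(rows):
    """Remove unnecessary data: keep the first copy of each exact duplicate."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_gaps(rows):
    """Fill missing amounts with the median amount and missing regions with a placeholder."""
    fill_amount = median(r["amount"] for r in rows if r["amount"] is not None)
    return [
        {
            **r,
            "amount": fill_amount if r["amount"] is None else r["amount"],
            "region": r["region"] or "UNKNOWN",
        }
        for r in rows
    ]


def flag_outliers(rows, threshold=3.5):
    """Add an 'outlier' flag using the modified z-score (median absolute deviation)."""
    amounts = [r["amount"] for r in rows]
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    for r in rows:
        if mad == 0:
            # All deviations are zero at the median; flag anything that differs.
            r["outlier"] = r["amount"] != med
        else:
            r["outlier"] = 0.6745 * abs(r["amount"] - med) / mad > threshold
    return rows


clean = flag_outliers(fill_gaps(drop_duplicates(merge(orders, customers))))
for row in clean:
    print(row)
```

Running it prints four rows: the duplicate order is gone, the missing amount is filled with the median of the known amounts (120.0), the missing regions become `"UNKNOWN"`, and only the 10,000 order is flagged as an outlier. The sketch flags rather than deletes outliers, because whether an extreme value is an error or a genuine signal is a project-specific decision.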
This ebook describes how analytics and data science teams can work more efficiently by using a cloud data platform to unify and govern both data wrangling and feature engineering.