Sean Forgatch
May 24, 2019 (1 min read)
Updated: May 31, 2019
Problem: We need to profile a data object to understand its key metrics in preparation for data warehousing, engineering, or science work.
Solution: We will utilize the pandas-profiling package in a Python notebook.
Step 1: Import the pandas-profiling package
Step 2: Create a pandas DataFrame over the source file and run the report
Step 3: Review Profile
pandas-profiling location
To import the library, all we need to do is enter the PyPI package name, pandas-profiling, as shown in the screenshot below:
*Note: a pandas DataFrame is not the same as the other DataFrame types a notebook may offer (for example, a Spark DataFrame), and it must be created using the pandas library.
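Steps 1 and 2 might look roughly like the sketch below. It assumes the package exposes `ProfileReport` under the module name `pandas_profiling`. The inline CSV text and the `build_report` helper are hypothetical stand-ins for illustration; in practice you would point `pd.read_csv` at your own source file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real source file.
csv_text = "id,city,amount\n1,Denver,10.5\n2,Austin,\n3,Denver,7.25\n"

# Step 2a: create a pandas DataFrame over the source.
# With a real file this would be pd.read_csv("<path to your file>").
df = pd.read_csv(io.StringIO(csv_text))


def build_report(frame: pd.DataFrame):
    """Return a profile report for the frame, or None if the package is missing."""
    try:
        # Step 1: import the library (installed as the pandas-profiling package).
        from pandas_profiling import ProfileReport
    except ImportError:
        return None
    # Step 2b: run the report over the pandas DataFrame.
    return ProfileReport(frame)


report = build_report(df)
```

In a notebook, evaluating `report` as the last expression of a cell should render the profile for review in Step 3.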
The resulting report is far more thorough than what other data profiling libraries produce. However, it is quite difficult to get the raw data out. There is a method that returns the data, but you will spend a lot of time getting it into a usable format.
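If only a handful of raw metrics are needed in tabular form, one workaround is to compute them directly with pandas instead of extracting them from the report. The sketch below is not the pandas-profiling method mentioned above. The `basic_profile` helper and the sample data are hypothetical, and the helper covers only a small subset of what the full report contains.

```python
import pandas as pd


def basic_profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with a few simple profiling metrics."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),  # column data type as text
            "count": frame.count(),             # non-null values per column
            "missing": frame.isna().sum(),      # null values per column
            "distinct": frame.nunique(),        # distinct non-null values
        }
    )


# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "city": ["Denver", "Austin", "Denver"],
        "amount": [10.5, None, 7.25],
    }
)

summary = basic_profile(df)
```

Because `summary` is itself a pandas DataFrame indexed by column name, it can be filtered, joined, or written to a table like any other dataset.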