Assignment 7
Description
This dataset contains data from the Wikipedia page "List of municipalities in Texas", extracted with BeautifulSoup.
Summary
I extracted data from Wikipedia and applied a data engineering workflow, ETL (Extract, Transform, Load), to it. I first extracted the data using the BeautifulSoup, requests, and pandas libraries and saved the raw data as a CSV file. I then transformed the data with functions from those libraries, checked for duplicates, and cleaned the data with functions I wrote myself. Once the data was clean, I loaded it into a CSV file. Finally, I analyzed the clean data using matplotlib (pyplot and pylab), SciPy, NumPy, pandas, and IPython's InteractiveShell, and created visualizations in the form of histograms, boxplots, and bar graphs.
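The extract, transform, and load steps described above can be illustrated with a minimal sketch. This is not the code from the repository: to stay runnable without third-party installs it swaps in the standard-library html.parser and csv modules for BeautifulSoup, requests, and pandas, and it parses a small made-up HTML table rather than downloading the Wikipedia page. The place names, numbers, column names, and all function names (extract, transform, load, clean_population) are hypothetical and chosen only for illustration.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Made-up sample standing in for the downloaded page; not real Wikipedia data.
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>Municipality</th><th>Population</th></tr>
<tr><td>Alpha City</td><td>12,345[1]</td></tr>
<tr><td>Alpha City</td><td>12,345[1]</td></tr>
<tr><td> Beta Town </td><td>678</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html):
    """Extract: return the raw table rows (header first) as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean_population(text):
    """Strip footnote markers such as [1] and thousands separators."""
    text = re.sub(r"\[[^\]]*\]", "", text)
    return int(text.replace(",", "").strip())


def transform(rows):
    """Transform: trim whitespace, convert types, and drop duplicate rows."""
    header, *data = rows
    seen = set()
    cleaned = []
    for name, population in data:
        record = (name.strip(), clean_population(population))
        if record not in seen:  # duplication check
            seen.add(record)
            cleaned.append(record)
    return [h.strip() for h in header], cleaned


def load(header, records, out):
    """Load: write the clean records as CSV to any file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)


raw_rows = extract(SAMPLE_HTML)
header, records = transform(raw_rows)
buffer = io.StringIO()  # stands in for the clean CSV file on disk
load(header, records, buffer)
clean_csv = buffer.getvalue()
print(clean_csv)
```

To write to disk instead of an in-memory buffer, the same load function can be given a file opened with open("clean.csv", "w", newline=""), where the file name is again only an example.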
Link to GitHub Repository: https://github.com/ZacharySoo01/I310D_Assignment7