Assignment 7

zacharysoo01·January 23, 2025

Description

This dataset contains data scraped from the Wikipedia page "List of Municipalities in Texas" using BeautifulSoup.
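As a rough illustration of this kind of extraction, the sketch below parses the first table with the class `wikitable` out of an HTML string. The function names, the `wikitable` class filter, and the sample HTML are assumptions made for the example; the actual code is in the linked repository and may be structured differently.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML (hypothetical helper)."""
    # Imported lazily so the parsing function below also works offline.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_first_wikitable(html):
    """Return the rows of the first 'wikitable' as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (th) and data cells (td) are both kept.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample markup standing in for the real page.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Municipality</th><th>Population</th></tr>
  <tr><td>Example City</td><td>1,234</td></tr>
</table>
"""
rows = parse_first_wikitable(SAMPLE_HTML)
```

For a live page, the HTML string would come from `fetch_html(url)`, with the page URL supplied by the caller.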

Summary


I extracted data from Wikipedia and applied data engineering techniques to it, following an ETL (Extract, Transform, Load) workflow. I first extracted the data using BeautifulSoup, requests, and pandas, and saved the raw data as a CSV file. I then transformed the data with functions from those libraries, checked for duplicates, and cleaned the data with my own cleaning functions. Once the data was clean, I loaded it into a new CSV file. After that, I analyzed the clean data using matplotlib (pyplot and pylab), SciPy, NumPy, IPython's InteractiveShell, and pandas, and created visualizations in the form of histograms, boxplots, and bar graphs.
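The transform and load steps might look roughly like the sketch below. It uses only the Python standard library (`re`, `csv`) in place of pandas so that it runs anywhere. The column names, the cleaning rules (stripping footnote markers and thousands separators), and the sample rows are assumptions for illustration, not the author's actual functions.

```python
import csv
import io
import re


def clean_cell(text):
    """Remove bracketed footnote markers such as [1] and trim whitespace."""
    return re.sub(r"\[[^\]]*\]", "", text).strip()


def parse_population(text):
    """Convert a string like '1,234' to an int; return None if it has no digits."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def transform(rows):
    """Clean each (name, population) row and drop exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for name, population in rows:
        record = (clean_cell(name), parse_population(clean_cell(population)))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def load_csv(records, handle):
    """Write the cleaned records to an open text handle as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["municipality", "population"])
    writer.writerows(records)


# Made-up raw rows, including one duplicate.
raw_rows = [
    ("Example City[1]", "1,234"),
    ("Example City[1]", "1,234"),
    ("Sample Town", " 567 "),
]
clean_rows = transform(raw_rows)
buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
load_csv(clean_rows, buffer)
```

With pandas, the duplicate check and the CSV write would typically be done with `DataFrame.drop_duplicates` and `DataFrame.to_csv` instead.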

Link to GitHub repository: https://github.com/ZacharySoo01/I310D_Assignment7

Basic info

Author: zacharysoo01
Shared with: Everyone
Created: February 16, 2023
Size: 61 KB
License: N/A
Dictionary: 1 table
Published: public datasets
