Pandas is a core tool for data science, especially for data processing. This post introduces some of the fundamentals of Pandas.
DataFrame
Create DataFrame
Create a dataframe from CSV files:
import pandas as pd
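Only the import survives in the snippet above, so here is a minimal sketch of loading CSV data with pd.read_csv. The CSV content and column names are made up for illustration, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content. In practice you would usually pass a file
# path, e.g. pd.read_csv('data.csv'), instead of an in-memory buffer.
csv_text = "year,month,day\n2020,1,15\n2021,6,30\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header row
```

By default read_csv treats the first line as the header and infers a dtype for each column.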
From a dictionary:
>>> data = {'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']}
>>> users = pd.DataFrame(data)
>>> users
   weekday    city
0      sun  Austin
1      mon  Dallas
Basic Operations
- Use df.head() to show the first 5 rows of the DataFrame.
- Use df.tail() to show the last 5 rows of the DataFrame.
- type(df) returns the type of the object, pandas.core.frame.DataFrame.
- df.columns returns the column names.
- df.info() prints a summary of the DataFrame: index range, column dtypes, non-null counts, and memory usage.
- df.index returns the index (row labels) of df.
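The operations listed above can be tried together in a short sketch. The DataFrame here is made-up data with ten rows, chosen only so that head() and tail() return different rows.

```python
import pandas as pd

# Made-up data: ten rows so that head() and tail() differ.
df = pd.DataFrame({'n': range(10), 'square': [i * i for i in range(10)]})

first = df.head()   # first 5 rows by default
last = df.tail(3)   # pass a number to change how many rows are returned

print(first)
print(last)
print(type(df))     # the DataFrame class
print(df.columns)   # Index holding the column labels
print(df.index)     # a RangeIndex when no index was given
df.info()           # prints the summary itself and returns None
```

Note that df.info() writes its summary to standard output rather than returning it, so there is no need to wrap it in print().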
Series
Each column of a DataFrame is itself a data structure called a Series.
>>> city = users['city']
>>> type(city)
<class 'pandas.core.series.Series'>
A Series carries its own index, available through its index attribute:
city.index
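To make the link between a column and its index concrete, here is a small sketch. The 'state' column and the 'u1'/'u2' row labels are made up for illustration; they are only there to show that a column keeps the row labels of the DataFrame it came from.

```python
import pandas as pd

# Hypothetical DataFrame with explicit row labels.
users = pd.DataFrame(
    {'city': ['Austin', 'Dallas'], 'state': ['TX', 'TX']},
    index=['u1', 'u2'],
)

city = users['city']     # selecting one column gives a Series

print(type(city))        # the Series class
print(city.index)        # same row labels as users.index
print(city.name)         # the Series remembers its column name
print(city.loc['u2'])    # look up a value by its index label
```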
Convert to NumPy
array = df.values
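As a sketch of what the conversion produces, the example below uses made-up numeric data. It also shows df.to_numpy(), which the pandas documentation recommends over df.values in recent versions; both give the same array here.

```python
import numpy as np
import pandas as pd

# Made-up data with one integer column and one float column.
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

array = df.to_numpy()   # equivalent to df.values for this DataFrame

print(type(array))      # a NumPy ndarray
print(array.shape)      # (rows, columns)
print(array.dtype)      # one common dtype for the whole array
```

Because a NumPy array holds a single dtype, columns of different types are converted to a common one, so the integer column ends up as floats in this example.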
Assign names to columns
column_names = ['year', 'month', 'day']
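The assignment step itself is not shown above, so the following is a sketch of one common way to do it: setting df.columns to a list of names. The data is made up, and the variable is called column_names here to avoid shadowing Python's built-in list.

```python
import pandas as pd

# A DataFrame built without column names gets 0, 1, 2 as labels.
df = pd.DataFrame([[2020, 1, 15], [2021, 6, 30]])
print(list(df.columns))

column_names = ['year', 'month', 'day']

# The list length must match the number of columns,
# otherwise pandas raises a ValueError.
df.columns = column_names
print(list(df.columns))
```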
Writing Files
Writing CSV:
out_csv = 'data.csv'
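The write call is missing from the snippet above. A minimal sketch, assuming the usual DataFrame.to_csv method, could look like this. The data is made up, and a temporary directory is used so the example does not leave files behind.

```python
import os
import tempfile

import pandas as pd

# Made-up data to write out.
df = pd.DataFrame({'year': [2020, 2021], 'month': [1, 6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_csv = os.path.join(tmp_dir, 'data.csv')
    df.to_csv(out_csv, index=False)   # index=False leaves out the row labels
    restored = pd.read_csv(out_csv)   # read the file back to check it

print(restored)
```

Without index=False, the row labels are written as an extra first column, which then shows up as an unnamed column when the file is read back.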
Writing Excel:
out_xlsx = 'data.xlsx'
data.to_excel(out_xlsx)  # data must be a DataFrame; needs an Excel engine such as openpyxl installed