Pandas is an important tool for data science, especially for data processing. This post introduces some of the fundamentals of Pandas.
DataFrame
Create DataFrame
Create a dataframe from CSV files:
import pandas as pd
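The listing above only shows the import. As a minimal sketch of the loading step, assuming `pd.read_csv` is the intended call, the example below reads CSV text from an in-memory buffer so it runs without a file on disk. The column names and values are made up for illustration; with a real file you would pass its path instead of the buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file on disk.
csv_text = "city,weekday\nAustin,sun\nDallas,mon\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names taken from the header row
```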
From dictionary:
>>> data = {'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']}
>>> users = pd.DataFrame(data)
>>> users
  weekday    city
0     sun  Austin
1     mon  Dallas
Basic Operations
- Use df.head() to show the first 5 rows of the dataframe.
- Use df.tail() to show the last 5 rows of the dataframe.
- type(df) shows the object's type, pandas.core.frame.DataFrame.
- df.columns returns the column names.
- df.info() prints a summary of the dataframe: column dtypes, non-null counts, and memory usage.
- df.index returns the row index of df.
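The operations in the list above can be tried on a small frame. This is only a sketch: the frame and its `visits` column are invented for the example and do not come from any real dataset.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({
    'city': ['Austin', 'Dallas', 'Houston', 'El Paso', 'Plano', 'Waco'],
    'visits': [1, 2, 3, 4, 5, 6],
})

first_rows = df.head()           # first 5 rows by default
last_two = df.tail(2)            # pass n to change the number of rows
column_names = list(df.columns)  # column labels as a plain list
row_labels = list(df.index)      # row labels as a plain list

print(type(df))
print(first_rows)
print(last_two)
print(column_names, row_labels)
df.info()                        # prints a summary and returns None
```

Both `head` and `tail` take an optional row count, which is handy when 5 rows are too many or too few.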
 
Series
Each column of a DataFrame is itself a structure called a Series.
>>> city = users['city']
>>> type(city)
<class 'pandas.core.series.Series'>
column.index returns the index of the Series:
city.index
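To make the relationship between a column and its index concrete, here is a short sketch that rebuilds the `users` frame from earlier and inspects the index of the `city` column. It assumes the default integer row labels that pandas assigns when no index is given.

```python
import pandas as pd

users = pd.DataFrame({'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']})
city = users['city']

# A column shares the row labels of the DataFrame it came from.
print(city.index)
print(city.index.equals(users.index))

# Labels can be used to look up individual values.
print(city.loc[1])
```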
Convert to NumPy
array = df.values
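As a quick check of what the conversion produces, the sketch below builds a small numeric frame (values invented for illustration) and converts it both with the `.values` attribute used above and with `DataFrame.to_numpy()`, the method form that the pandas documentation recommends in newer versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2017, 2018], 'month': [1, 2], 'day': [15, 28]})

array = df.values            # attribute used in the text
same_array = df.to_numpy()   # method form, available since pandas 0.24

print(type(array))
print(array.shape)           # (number of rows, number of columns)
```

The column names and the index are not carried over; only the underlying values end up in the array.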
Assign names to columns
column_names = ['year', 'month', 'day']  # named column_names to avoid shadowing the built-in list
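The listing above stops after defining the names. One common way to apply them, shown here as an assumption about the intended step rather than the original code, is to assign the list to `df.columns`. The list is called `column_names` in this sketch so that it does not shadow Python's built-in `list`, and the frame itself is invented for the example.

```python
import pandas as pd

# Frame built without column names, so pandas labels the columns 0, 1, 2.
df = pd.DataFrame([[2017, 1, 15], [2018, 2, 28]])

column_names = ['year', 'month', 'day']
df.columns = column_names   # length must match the number of columns

print(list(df.columns))
```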
Writing Files
Writing CSV:
out_csv = 'data.csv'
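The listing above only defines the output path. A minimal sketch of the write itself, assuming `data` is a DataFrame and that `DataFrame.to_csv` is the intended call, could look like this. It writes into a temporary directory and reads the file back to confirm the round trip; the frame contents are illustrative.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']})

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_csv = os.path.join(tmp_dir, 'data.csv')
    data.to_csv(out_csv, index=False)   # index=False omits the row labels
    restored = pd.read_csv(out_csv)

print(restored)
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then shows up again when the file is read back.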
Writing Excel:
out_xlsx = 'data.xlsx'
data.to_excel(out_xlsx)
