Pandas Foundation

Pandas is an essential tool for data science, especially for data processing. This post introduces some of the basics of Pandas.

DataFrame

Create DataFrame

Create a dataframe from CSV files:

>>> import pandas as pd
>>> users = pd.read_csv('xxx.csv', index_col=0)
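
To see what index_col=0 does without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer. The column names and values are made up for illustration and do not come from a real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'xxx.csv'
csv_text = "id,city,weekday\n10,Austin,sun\n11,Dallas,mon\n"

# index_col=0 uses the first column ('id') as the row index
users = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(users.index.tolist())    # [10, 11]
print(users.columns.tolist())  # ['city', 'weekday']
```

Without index_col, pandas would keep 'id' as an ordinary column and generate a default 0, 1, ... index instead.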

From a dictionary:

>>> data = {'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']}
>>> users = pd.DataFrame(data)
>>> users
  weekday    city
0     sun  Austin
1     mon  Dallas

Basic Operations

  • df.head() shows the first 5 rows of the dataframe.
  • df.tail() shows the last 5 rows of the dataframe.
  • type(df) shows the type of the dataframe.
  • df.columns returns the column names.
  • df.info() prints a summary of the dataframe: index range, column names, non-null counts, dtypes, and memory usage.
  • df.index returns the index of df.
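
The operations listed above can be tried on a small frame. This is a sketch with made-up data; the column names 'day' and 'temp' are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with 8 rows so that head() and tail() differ
df = pd.DataFrame({
    'day': range(1, 9),
    'temp': [20.5, 21.0, 19.8, 22.1, 23.4, 18.9, 20.0, 21.7],
})

first = df.head()   # first 5 rows by default
last = df.tail(3)   # pass n to change the number of rows

print(first.shape)            # (5, 2)
print(last['day'].tolist())   # [6, 7, 8]
print(list(df.columns))       # ['day', 'temp']
print(df.index)               # RangeIndex(start=0, stop=8, step=1)
df.info()                     # prints the summary and returns None
```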

Series

Each column of a dataframe is itself a structure called a Series.

>>> city = users['city']
>>> type(city)
<class 'pandas.core.series.Series'>

column.index returns the index of the series:

>>> city.index
RangeIndex(start=0, stop=2, step=1)
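
A column shares the row index of the dataframe it came from. The sketch below makes that visible by giving the frame explicit row labels; the labels 'u1' and 'u2' are hypothetical.

```python
import pandas as pd

users = pd.DataFrame(
    {'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']},
    index=['u1', 'u2'],  # hypothetical row labels
)

city = users['city']

print(type(city).__name__)  # Series
print(city.name)            # city
print(city.index.tolist())  # ['u1', 'u2']
print(city['u2'])           # Dallas
```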

Convert to NumPy

array = df.values  # df.to_numpy() is the recommended equivalent in pandas 0.24+
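
One detail worth checking after the conversion is the dtype of the result: a NumPy array holds a single dtype, so mixed numeric columns are upcast to a common one. A small sketch with made-up columns 'a' and 'b':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})

array = df.values  # same data as df.to_numpy()

print(type(array).__name__)  # ndarray
print(array.shape)           # (2, 2)
print(array.dtype)           # float64, the common type of the int and float columns
```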

Assign names to columns

# Avoid calling the variable `list`, which would shadow the built-in
column_names = ['year', 'month', 'day']
df.columns = column_names
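
As a sketch of the same assignment on concrete data (the values are invented), note that the new list must contain exactly one name per column, otherwise pandas raises a ValueError:

```python
import pandas as pd

# Hypothetical unlabeled data: columns default to 0, 1, 2
df = pd.DataFrame([[2020, 1, 15], [2021, 6, 30]])
print(df.columns.tolist())  # [0, 1, 2]

df.columns = ['year', 'month', 'day']
print(df.columns.tolist())  # ['year', 'month', 'day']

# The list length must match the number of columns
try:
    df.columns = ['year', 'month']
except ValueError as err:
    print('ValueError:', err)
```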

Writing Files

Writing CSV:

out_csv = 'data.csv'
df.to_csv(out_csv)
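
By default, to_csv also writes the row index as the first column. The round trip below is a minimal sketch of that behavior; it writes to a temporary directory, and the frame contents are only illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'weekday': ['sun', 'mon'], 'city': ['Austin', 'Dallas']})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_csv = os.path.join(tmp_dir, 'data.csv')
    df.to_csv(out_csv)  # the index is written as the first, unnamed column

    with open(out_csv) as f:
        header = f.readline().strip()

    # index_col=0 restores the index when reading the file back
    restored = pd.read_csv(out_csv, index_col=0)

print(header)                     # ,weekday,city
print(restored['city'].tolist())  # ['Austin', 'Dallas']
```

Passing index=False to to_csv leaves the index out of the file if it carries no information.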

Writing Excel:

out_xlsx = 'data.xlsx'
df.to_excel(out_xlsx)  # requires an Excel writer engine such as openpyxl