# DataFrame
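> A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean). A DataFrame has both a row index and a column index, and it can be thought of as a dictionary of `Series` that all share the same index.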
* * *
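The "dictionary of `Series`" view can be made concrete with a small sketch. The data and the names `year`, `pop`, and `frame` are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Hypothetical example data: two Series that share the same index.
year = pd.Series([2000, 2001, 2002], index=['a', 'b', 'c'])
pop = pd.Series([1.5, 1.7, 3.6], index=['a', 'b', 'c'])

# A dict of Series becomes a DataFrame:
# dict keys -> column labels, the shared index -> row labels.
frame = pd.DataFrame({'year': year, 'pop': pop})

# items() yields (column label, Series) pairs, much like a dict.
for name, column in frame.items():
    print(name, type(column).__name__, list(column.index))
```

Each column comes back as a `Series` carrying the same row labels, which is what makes the dictionary analogy useful.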
## Building a `DataFrame` from a dict of equal-length lists or `NumPy` arrays
~~~
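from pandas import Series, DataFrame
import pandas as pd

# Each key becomes a column; every list has the same length.
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)
# Output (recent pandas versions keep the dict's key order;
# older releases sorted the columns as pop, state, year):
#     state  year  pop
# 0    Ohio  2000  1.5
# 1    Ohio  2001  1.7
# 2    Ohio  2002  3.6
# 3  Nevada  2001  2.4
# 4  Nevada  2002  2.9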
~~~
If you pass a sequence of column names, the `DataFrame`'s columns are arranged in exactly that order:
~~~
from pandas import Series, DataFrame
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
# `columns` fixes the column order explicitly.
frame = DataFrame(data, columns=['year', 'state', 'pop'])
print(frame)
# Output:
#    year   state  pop
# 0  2000    Ohio  1.5
# 1  2001    Ohio  1.7
# 2  2002    Ohio  3.6
# 3  2001  Nevada  2.4
# 4  2002  Nevada  2.9
~~~
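The heading mentions `NumPy` arrays, but the examples above only use lists. The following is a minimal sketch of the array case, using made-up data. It also shows what happens when the arrays are not of equal length; the exact error message is left unspecified because it may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a dict of equal-length NumPy arrays works like a dict of lists.
data = {
    'year': np.array([2000, 2001, 2002]),
    'pop': np.array([1.5, 1.7, 3.6]),
}
frame = pd.DataFrame(data, columns=['pop', 'year'])
print(frame.columns.tolist())
print(frame.shape)

# Arrays of different lengths cannot be combined into one frame.
raised = False
try:
    pd.DataFrame({'year': np.array([2000, 2001]),
                  'pop': np.array([1.5, 1.7, 3.6])})
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```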
* * *
## A column of a `DataFrame` can be retrieved as a `Series`, using either dict-like notation or attribute access
~~~
from pandas import Series, DataFrame
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
# 'debt' is not a key in `data`, so that column is filled with NaN.
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])

# Dict-like notation
print(frame2['state'])
# Output:
# one        Ohio
# two        Ohio
# three      Ohio
# four     Nevada
# five     Nevada
# Name: state, dtype: object

# Attribute access
print(frame2.year)
# Output:
# one      2000
# two      2001
# three    2002
# four     2001
# five     2002
# Name: year, dtype: int64
~~~
The returned `Series` has the same index as the original `DataFrame`.
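
One caveat about attribute access, sketched below on a small hypothetical frame: it only works when the column name does not clash with an existing `DataFrame` attribute. A column named `pop`, as in the data used here, collides with the `DataFrame.pop` method, so bracket notation is the safer choice for it.

```python
import pandas as pd

# Hypothetical two-row frame reusing the column names from the notes.
frame = pd.DataFrame({'year': [2000, 2001], 'pop': [1.5, 1.7]},
                     index=['one', 'two'])

# Both access styles return the same Series, indexed like the frame.
year_by_key = frame['year']
year_by_attr = frame.year
print(year_by_key.equals(year_by_attr))
print(year_by_key.index.equals(frame.index))

# 'pop' is also the name of a DataFrame method, so the attribute
# resolves to that method rather than to the column.
print(callable(frame.pop))
print(isinstance(frame['pop'], pd.Series))
```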
## Rows can also be retrieved by position or by label, using the indexers `iloc` and `loc` (the older `ix` indexer was removed in pandas 1.0)
~~~
from pandas import Series, DataFrame
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])
# Select the row labelled 'three'; frame2.iloc[2] selects the same row by position.
print(frame2.loc['three'])
# Output:
# year     2002
# state    Ohio
# pop       3.6
# debt      NaN
# Name: three, dtype: object
~~~
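A minimal sketch contrasting label-based selection (`loc`) with position-based selection (`iloc`), the two row indexers available in current pandas. The frame is a cut-down, hypothetical version of the one above.

```python
import pandas as pd

# Hypothetical three-row frame with string row labels.
frame = pd.DataFrame(
    {'year': [2000, 2001, 2002], 'state': ['Ohio', 'Ohio', 'Nevada']},
    index=['one', 'two', 'three'],
)

by_label = frame.loc['three']   # row whose index label is 'three'
by_position = frame.iloc[2]     # third row, counting from 0

# Both return the same row as a Series whose index is the column labels.
print(by_label.equals(by_position))
print(by_label.name)
print(list(by_label.index))
```

Note that the row comes back as a `Series` named after the row label, with the frame's column labels as its index.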
## Columns can be modified by assignment, for example by assigning a scalar value or an array of values to the empty `debt` column
~~~
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])

# A scalar is broadcast to every row.
frame2['debt'] = 16.5
print(frame2)
# Output:
#        year   state  pop  debt
# one    2000    Ohio  1.5  16.5
# two    2001    Ohio  1.7  16.5
# three  2002    Ohio  3.6  16.5
# four   2001  Nevada  2.4  16.5
# five   2002  Nevada  2.9  16.5

# An array is assigned row by row, in order.
frame2['debt'] = np.arange(5.)
print(frame2)
# Output:
#        year   state  pop  debt
# one    2000    Ohio  1.5   0.0
# two    2001    Ohio  1.7   1.0
# three  2002    Ohio  3.6   2.0
# four   2001  Nevada  2.4   3.0
# five   2002  Nevada  2.9   4.0
~~~
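## When assigning a list or array to a column, its length must match the length of the `DataFrame`. If you assign a `Series`, its labels are aligned exactly to the `DataFrame`'s index, and any gaps are filled with missing values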
~~~
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])
# Only three of the five row labels are present in `val`.
val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
print(frame2)
# Output:
#        year   state  pop  debt
# one    2000    Ohio  1.5   NaN
# two    2001    Ohio  1.7  -1.2
# three  2002    Ohio  3.6   NaN
# four   2001  Nevada  2.4  -1.5
# five   2002  Nevada  2.9  -1.7
~~~
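The two rules in the heading can be sketched side by side. This is an illustrative example on a hypothetical three-row frame: a list of the wrong length is rejected, while a `Series` is aligned by label, so missing labels become `NaN` and labels that do not exist in the frame are dropped.

```python
import pandas as pd

# Hypothetical frame with three labelled rows.
frame = pd.DataFrame({'year': [2000, 2001, 2002]},
                     index=['one', 'two', 'three'])

# A list is assigned by position, so its length must equal len(frame).
length_error = False
try:
    frame['debt'] = [1.0, 2.0]  # 2 values for 3 rows
except ValueError as exc:
    length_error = True
    print('ValueError:', exc)

# A Series is aligned by label instead:
# 'one' has no value -> NaN; 'six' is not a row of the frame -> ignored.
val = pd.Series([-1.2, -1.7, 9.9], index=['two', 'three', 'six'])
frame['debt'] = val
print(frame)
```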
## Assigning to a column that does not exist creates a new column
~~~
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])
val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
# 'eastern' does not exist yet, so this creates a new boolean column.
frame2['eastern'] = frame2['state'] == 'Ohio'
print(frame2)
# Output:
#        year   state  pop  debt  eastern
# one    2000    Ohio  1.5   NaN     True
# two    2001    Ohio  1.7  -1.2     True
# three  2002    Ohio  3.6   NaN     True
# four   2001  Nevada  2.4  -1.5    False
# five   2002  Nevada  2.9  -1.7    False
~~~
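Creating a column this way requires bracket notation. As a hedged sketch on a hypothetical two-row frame (the `western` name is made up for illustration): assigning through a new attribute name sets an ordinary Python attribute on the object and pandas issues a warning, but no column is added.

```python
import warnings

import pandas as pd

# Hypothetical two-row frame.
frame = pd.DataFrame({'state': ['Ohio', 'Nevada']}, index=['one', 'two'])

# Bracket assignment creates the column.
frame['eastern'] = frame['state'] == 'Ohio'
print('eastern' in frame.columns)

# Attribute assignment to a new name does not create a column;
# the warning pandas emits is silenced here to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    frame.western = frame['state'] == 'Nevada'
print('western' in frame.columns)
```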