pandas offers several ways to slice data, and they are easy to mix up if you are not familiar with them. The examples below walk through each method.

The data

First, generate some random data:

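									In [5]: import datetime as dt

									  ...: import random

									  ...: import pandas as pd

									  ...: 

									  ...: # 1000 random integers from 1 to 19 per column (range replaces Python 2's xrange)

									  ...: rnd_1 = [random.randrange(1,20) for x in range(1000)]

									  ...: rnd_2 = [random.randrange(1,20) for x in range(1000)]

									  ...: rnd_3 = [random.randrange(1,20) for x in range(1000)]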

									  ...: fecha = pd.date_range('2012-4-10', '2015-1-4')

									  ...: 

									  ...: data = pd.DataFrame({'fecha':fecha, 'rnd_1': rnd_1, 'rnd_2': rnd_2, 'rnd_3': rnd_3})

									In [6]: data.describe()

									Out[6]: 

									       rnd_1    rnd_2    rnd_3

									count 1000.000000 1000.000000 1000.000000

									mean   9.946000   9.825000   9.894000

									std    5.553911   5.559432   5.423484

									min    1.000000   1.000000   1.000000

									25%    5.000000   5.000000   5.000000

									50%   10.000000  10.000000  10.000000

									75%   15.000000  15.000000  14.000000

									max   19.000000  19.000000  19.000000

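Slicing with []

Square brackets can slice a DataFrame, somewhat like slicing a Python list. Depending on what you pass, they select rows (a slice), columns (a list of column names), or a block (both, one after the other).

									# Row selection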

									In [7]: data[1:5]

									Out[7]: 

									    fecha rnd_1 rnd_2 rnd_3

									1 2012-04-11   1   16   3

									2 2012-04-12   7   6   1

									3 2012-04-13   2   16   7

									4 2012-04-14   4   17   7

									# Column selection

									In [10]: data[['rnd_1', 'rnd_3']]

									Out[10]: 

									   rnd_1 rnd_3

									0    8   12

									1    1   3

									2    7   1

									3    2   7

									4    4   7

									5    12   8

									6    2   12

									7    9   8

									8    13   17

									9    4   7

									10   14   14

									11   19   16

									12    2   12

									13   15   18

									14   13   18

									15   13   11

									16   17   7

									17   14   10

									18    9   6

									19   11   15

									20   16   13

									21   18   9

									22    1   18

									23    4   3

									24    6   11

									25    2   13

									26    7   17

									27   11   8

									28    3   12

									29    4   2

									..   ...  ...

									970   8   14

									971   19   5

									972   13   2

									973   8   10

									974   8   17

									975   6   16

									976   3   2

									977   12   6

									978   12   10

									979   15   13

									980   8   4

									981   17   3

									982   1   17

									983   11   5

									984   7   7

									985   13   14

									986   6   19

									987   13   9

									988   3   15

									989   19   6

									990   7   11

									991   11   7

									992   19   12

									993   2   15

									994   10   4

									995   14   13

									996   12   11

									997   11   15

									998   17   14

									999   3   8

									[1000 rows x 2 columns]

									# Block selection

									In [11]: data[:7][['rnd_1', 'rnd_2']]

									Out[11]: 

									  rnd_1 rnd_2

									0   8   17

									1   1   16

									2   7   6

									3   2   16

									4   4   17

									5   12   19

									6   2   7

When selecting multiple columns, however, you cannot use a range such as 1:5 the way you can for rows.

									In [12]: data[['rnd_1':'rnd_3']]

									 File "<ipython-input-13-6291b6a83eb0>", line 1

									  data[['rnd_1':'rnd_3']]

									         ^

									SyntaxError: invalid syntax
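If a range of columns is what you want, loc and iloc both accept a slice in the column position. The following is a minimal sketch on a small stand-in DataFrame; the values are made up and `df` is not the article's `data`.

```python
import pandas as pd

# Small stand-in for the article's DataFrame (values are illustrative only).
df = pd.DataFrame({
    "fecha": pd.date_range("2012-04-10", periods=4),
    "rnd_1": [8, 1, 7, 2],
    "rnd_2": [17, 16, 6, 16],
    "rnd_3": [12, 3, 1, 7],
})

# Label-based column range: both end points are included.
by_label = df.loc[:, "rnd_1":"rnd_3"]

# Position-based column range: the end position is excluded.
by_position = df.iloc[:, 1:4]

print(list(by_label.columns))        # ['rnd_1', 'rnd_2', 'rnd_3']
print(by_label.equals(by_position))  # True
```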

loc

loc lets you select rows and columns by index label.

									In [13]: data.loc[1:5]

									Out[13]: 

									    fecha rnd_1 rnd_2 rnd_3

									1 2012-04-11   1   16   3

									2 2012-04-12   7   6   1

									3 2012-04-13   2   16   7

									4 2012-04-14   4   17   7

									5 2012-04-15   12   19   8

Note that, unlike the first method, loc includes row 5 in the result, because label-based slices include the end label. The [] slice stops at row 4.
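The end-point difference can be checked on a tiny frame. This is only a sketch; `df` and its values are stand-ins, not the article's data.

```python
import pandas as pd

# Seven rows with the default integer index 0..6.
df = pd.DataFrame({"rnd_1": [8, 1, 7, 2, 4, 12, 2]})

positional = df[1:5]     # positions 1, 2, 3, 4 (end excluded)
labelled = df.loc[1:5]   # labels 1, 2, 3, 4, 5 (end included)

print(len(positional), len(labelled))  # 4 5
print(list(labelled.index))            # [1, 2, 3, 4, 5]
```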

									In [14]: data.loc[2:4, ['rnd_2', 'fecha']]

									Out[14]: 

									  rnd_2   fecha

									2   6 2012-04-12

									3   16 2012-04-13

									4   17 2012-04-14

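loc can select the data between two specific dates. In the pandas version used here, both dates need to be present in the index (recent versions relax this for a sorted index).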

									In [15]: data_fecha = data.set_index('fecha')

									  ...: data_fecha.head()

									Out[15]: 

									      rnd_1 rnd_2 rnd_3

									fecha             

									2012-04-10   8   17   12

									2012-04-11   1   16   3

									2012-04-12   7   6   1

									2012-04-13   2   16   7

									2012-04-14   4   17   7

									In [16]: # Create two specific dates

									  ...: fecha_1 = dt.datetime(2013, 4, 14)

									  ...: fecha_2 = dt.datetime(2013, 4, 18)

									  ...: 

									  ...: # Slice the data

									  ...: data_fecha.loc[fecha_1: fecha_2]

									Out[16]: 

									      rnd_1 rnd_2 rnd_3

									fecha             

									2013-04-14   17   10   5

									2013-04-15   14   4   9

									2013-04-16   1   2   18

									2013-04-17   9   15   1

									2013-04-18   16   7   17

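Update: unless you have a special need, prefer loc and use [] as little as possible. When assigning new values into a DataFrame, loc avoids the chained indexing problem, whereas [] may well make pandas emit a SettingWithCopyWarning.

For details, see the official documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy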
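As a rough illustration of the difference, the sketch below assigns through a single loc call. The frame and the threshold are made up for the example, and the chained form is left as a comment because its behavior depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"rnd_1": [8, 1, 7, 2], "rnd_2": [17, 16, 6, 16]})

# Chained indexing (two separate [] calls) may write to a temporary copy:
#     df[df["rnd_1"] > 5]["rnd_2"] = 0
# A single loc call resolves the row mask and the column label together,
# so the assignment lands in df itself.
df.loc[df["rnd_1"] > 5, "rnd_2"] = 0

print(df["rnd_2"].tolist())  # [0, 16, 0, 16]
```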

iloc

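Where loc selects by the values of the index, iloc selects by position. iloc does not care what the index values are, only where the rows and columns sit, so only integer positions can go inside the brackets.

									# Row selection

									In [17]: data_fecha.iloc[10: 15]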

									Out[17]: 

									      rnd_1 rnd_2 rnd_3

									fecha             

									2012-04-20   14   6   14

									2012-04-21   19   14   16

									2012-04-22   2   6   12

									2012-04-23   15   8   18

									2012-04-24   13   8   18

									# Column selection

									In [18]: data_fecha.iloc[:,[1,2]].head()

									Out[18]: 

									      rnd_2 rnd_3

									fecha          

									2012-04-10   17   12

									2012-04-11   16   3

									2012-04-12   6   1

									2012-04-13   16   7

									2012-04-14   17   7

									# Selecting specific rows and columns

									In [19]: data_fecha.iloc[[1,12,34],[0,2]]

									Out[19]: 

									      rnd_1 rnd_3

									fecha          

									2012-04-11   1   3

									2012-04-22   2   12

									2012-05-14   17   10
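To make the label versus position distinction concrete, here is a small sketch on a date-indexed frame. The frame `df` and its values are assumptions for illustration only.

```python
import pandas as pd

idx = pd.date_range("2012-04-10", periods=5)
df = pd.DataFrame({"rnd_1": [8, 1, 7, 2, 4], "rnd_3": [12, 3, 1, 7, 7]}, index=idx)

# iloc: integer positions, end excluded (rows at positions 1 and 2).
by_pos = df.iloc[1:3]

# loc: index labels, end included (the same two dates).
by_label = df.loc[pd.Timestamp("2012-04-11"):pd.Timestamp("2012-04-12")]

print(list(by_pos.index) == list(by_label.index))  # True
print(by_pos["rnd_1"].tolist())                    # [1, 7]
```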

at

at is used much like loc but accesses data faster. It can only access a single element, not several at once.

									In [20]: timeit data_fecha.at[fecha_1,'rnd_1']

									The slowest run took 3783.11 times longer than the fastest. This could mean that an intermediate result is being cached.

									100000 loops, best of 3: 11.3 µs per loop

									In [21]: timeit data_fecha.loc[fecha_1,'rnd_1']

									The slowest run took 121.24 times longer than the fastest. This could mean that an intermediate result is being cached.

									10000 loops, best of 3: 192 µs per loop

									In [22]: data_fecha.at[fecha_1,'rnd_1']

									Out[22]: 17
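at can also write a single cell. The sketch below reads and then overwrites one value; the three-row frame is a stand-in and the replacement value 99 is arbitrary.

```python
import datetime as dt
import pandas as pd

idx = pd.date_range("2013-04-14", periods=3)
df = pd.DataFrame({"rnd_1": [17, 14, 1]}, index=idx)

fecha_1 = dt.datetime(2013, 4, 14)

value = df.at[fecha_1, "rnd_1"]  # read one cell by row label and column label
df.at[fecha_1, "rnd_1"] = 99     # write one cell the same way

print(value, df.at[fecha_1, "rnd_1"])  # 17 99
```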

iat

iat is to iloc what at is to loc: a faster, position-based way to select, and like at it can only access a single element.

									In [23]: data_fecha.iat[1,0]

									Out[23]: 1

									In [24]: timeit data_fecha.iat[1,0]

									The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.

									100000 loops, best of 3: 8.77 µs per loop

									In [25]: timeit data_fecha.iloc[1,0]

									10000 loops, best of 3: 158 µs per loop
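Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The frame is a stand-in, and the absolute numbers depend on the machine and the pandas version, so no timings are claimed here.

```python
import timeit
import pandas as pd

df = pd.DataFrame(
    {"rnd_1": [8, 1, 7], "rnd_2": [17, 16, 6]},
    index=pd.date_range("2012-04-10", periods=3),
)

# Row position 1, column position 0: both accessors return the same cell.
print(df.iat[1, 0], df.iloc[1, 0])  # 1 1

# Total seconds for 1000 scalar lookups with each accessor.
t_iat = timeit.timeit(lambda: df.iat[1, 0], number=1000)
t_iloc = timeit.timeit(lambda: df.iloc[1, 0], number=1000)
print(f"iat: {t_iat:.4f}s  iloc: {t_iloc:.4f}s")
```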

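ix

All of the methods described above require the requested label to be in the index, or the position to be within range. ix, by contrast, lets you slice with values that are not in the DataFrame's index. Note that ix was deprecated in pandas 0.20 and removed in 1.0, so the session below only runs on older versions.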

									In [28]: date_1 = dt.datetime(2013, 1, 10, 8, 30)

									  ...: date_2 = dt.datetime(2013, 1, 13, 4, 20)

									  ...: 

									  ...: # Slice the data

									  ...: data_fecha.ix[date_1: date_2]

									Out[28]: 

									      rnd_1 rnd_2 rnd_3

									fecha             

									2013-01-11   19   17   19

									2013-01-12   10   9   17

									2013-01-13   15   3   10
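On current pandas versions, where ix is no longer available, loc accepts slice bounds that are absent from a sorted DatetimeIndex. The sketch below assumes a small daily-indexed frame with made-up values.

```python
import datetime as dt
import pandas as pd

# Daily index from 2013-01-09 to 2013-01-14, each label at midnight.
idx = pd.date_range("2013-01-09", periods=6)
df = pd.DataFrame({"rnd_1": [3, 5, 19, 10, 15, 2]}, index=idx)

# Neither bound is an index label: both carry a time of day.
date_1 = dt.datetime(2013, 1, 10, 8, 30)
date_2 = dt.datetime(2013, 1, 13, 4, 20)

subset = df.loc[date_1:date_2]
print(subset.index.strftime("%Y-%m-%d").tolist())
# ['2013-01-11', '2013-01-12', '2013-01-13']
```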