
Study Notes on "Python Data Analysis: Making Good Use of the pandas Library" (《Python数据分析:活用pandas库》), Day 1: Pandas DataFrame Basics

About the book: Python Data Analysis: Making Good Use of the pandas Library
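Python is powerful and easy to use, which makes it a great tool for data processing and data analysis, and its many libraries make it stronger still. Pandas is one of the most popular of these open-source libraries: it helps ensure data accuracy, visualize data, and work efficiently with large datasets. With it, Python can quickly automate and carry out almost any data analysis task.
The book explains the fundamentals and common usage of Pandas in detail, uses simple examples to show how Pandas solves complex real-world problems, and shows how libraries such as matplotlib, seaborn, statsmodels, and sklearn support data analysis in Python. It covers data processing, data visualization, and data modeling, and briefly introduces the Python data analysis ecosystem.
This is an introductory book on data analysis with Python. Every concept is illustrated with a simple example so that readers can understand it and get started quickly. Topics include: Python and Pandas basics; loading and inspecting datasets; the Pandas DataFrame and Series objects; plotting for exploratory data analysis with matplotlib, seaborn, and the plotting methods built into Pandas; concatenating and merging datasets; handling missing data; tidying data; converting data types; working with strings; applying functions; groupby operations; fitting and evaluating models; and regularization and clustering techniques.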

Chapter 1: Pandas DataFrame Basics

1.1 Introduction

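pandas is a third-party library used mainly for data processing and data analysis. Beyond data manipulation it also provides statistical computations, and it is built on top of NumPy.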

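The main pandas data structures are Series (one-dimensional) and DataFrame (two-dimensional). Older releases also had a three-dimensional Panel, but it was deprecated and has been removed from current pandas.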
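To make the two structures concrete, here is a minimal sketch. The values are a few rows copied from the gapminder output later in these notes; the variable names `life_exp` and `small` are only illustrative.

```python
import pandas as pd

# A Series is one-dimensional: values plus an index.
life_exp = pd.Series([28.801, 30.332, 31.997], index=[1952, 1957, 1962], name="lifeExp")

# A DataFrame is two-dimensional: each column is a Series sharing the same row index.
small = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
})

print(life_exp.ndim, small.ndim)   # 1 2
print(type(small["lifeExp"]))      # selecting one column gives back a Series
print(small.shape)                 # (3, 3)
```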

1.2 Loading the dataset

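Download the dataset from the book's official page: https://www.ituring.com.cn/book/2557 . After unzipping the archive, the Python source code for these notes lives in the notebooks folder, and this section works with gapminder.tsv in the data folder.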

import pandas as pd

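# Forward slashes work on Windows too and avoid the invalid "\d" and "\g" escape sequences in the path.
df = pd.read_csv("../data/gapminder.tsv", sep="\t")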
print(df.shape)
(1704, 6)
print(df.dtypes)
country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object
print(df.head(6))
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106
5  Afghanistan      Asia  1977   38.438  14880372  786.113360
print(df.tail(6))
       country continent  year  lifeExp       pop   gdpPercap
1698  Zimbabwe    Africa  1982   60.363   7636524  788.855041
1699  Zimbabwe    Africa  1987   62.351   9216418  706.157306
1700  Zimbabwe    Africa  1992   60.377  10704340  693.420786
1701  Zimbabwe    Africa  1997   46.809  11404948  792.449960
1702  Zimbabwe    Africa  2002   39.989  11926563  672.038623
1703  Zimbabwe    Africa  2007   43.487  12311143  469.709298
print(df.sample(6))
       country continent  year  lifeExp       pop     gdpPercap
545      Gabon    Africa  1977   52.790    706367  21745.573280
1341    Serbia    Europe  1997   72.232  10336594   7914.320304
1494     Syria      Asia  1982   64.590   9410494   3761.837715
1389  Slovenia    Europe  1997   75.130   2011612  17161.107350
274       Chad    Africa  2002   50.525   8835739   1156.181860
462      Egypt    Africa  1982   56.006  45681811   3503.729636
df.info()  # info() prints the summary itself and returns None, so wrapping it in print() only adds a stray "None"
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   country    1704 non-null   object 
 1   continent  1704 non-null   object 
 2   year       1704 non-null   int64  
 3   lifeExp    1704 non-null   float64
 4   pop        1704 non-null   int64  
 5   gdpPercap  1704 non-null   float64
dtypes: float64(2), int64(2), object(2)
memory usage: 66.6+ KB
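For readers without the data folder at hand, the loading step can be tried on an in-memory stand-in. This is only a sketch: `tsv_text` is a hand-typed copy of the first three rows shown above, not the real file, and `sample` is an illustrative name.

```python
import io
import pandas as pd

# Hypothetical stand-in for gapminder.tsv: header plus the first three rows, tab-separated.
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.853030\n"
    "Afghanistan\tAsia\t1962\t31.997\t10267083\t853.100710\n"
)

# read_csv accepts a file-like object as well as a path; sep="\t" selects tab-separated parsing.
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(sample.shape)    # (3, 6)
print(sample.dtypes)   # one dtype per column, inferred from the text
print(sample.head(2))  # first two rows
```

The same calls (`shape`, `dtypes`, `head`, `tail`, `sample`, `info`) then behave as in the full run above, just on three rows.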

1.3 Viewing columns, rows, and cells

1.3.1 Selecting a subset of columns: by name, position, or range

print(df["country"].head())
0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
Name: country, dtype: object
print(df["country"])
0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: country, Length: 1704, dtype: object
print(df[["country","continent","year"]])
          country continent  year
0     Afghanistan      Asia  1952
1     Afghanistan      Asia  1957
2     Afghanistan      Asia  1962
3     Afghanistan      Asia  1967
4     Afghanistan      Asia  1972
...           ...       ...   ...
1699     Zimbabwe    Africa  1987
1700     Zimbabwe    Africa  1992
1701     Zimbabwe    Africa  1997
1702     Zimbabwe    Africa  2002
1703     Zimbabwe    Africa  2007

[1704 rows x 3 columns]
print(df.loc[:,["country","continent","year"]])
          country continent  year
0     Afghanistan      Asia  1952
1     Afghanistan      Asia  1957
2     Afghanistan      Asia  1962
3     Afghanistan      Asia  1967
4     Afghanistan      Asia  1972
...           ...       ...   ...
1699     Zimbabwe    Africa  1987
1700     Zimbabwe    Africa  1992
1701     Zimbabwe    Africa  1997
1702     Zimbabwe    Africa  2002
1703     Zimbabwe    Africa  2007

[1704 rows x 3 columns]
print(df.iloc[:,[0,1,-4]])
          country continent  year
0     Afghanistan      Asia  1952
1     Afghanistan      Asia  1957
2     Afghanistan      Asia  1962
3     Afghanistan      Asia  1967
4     Afghanistan      Asia  1972
...           ...       ...   ...
1699     Zimbabwe    Africa  1987
1700     Zimbabwe    Africa  1992
1701     Zimbabwe    Africa  1997
1702     Zimbabwe    Africa  2002
1703     Zimbabwe    Africa  2007

[1704 rows x 3 columns]
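The heading mentions selecting by range, which the run above does not show. A minimal sketch on a two-row excerpt (the frame `cols` and the result names are illustrative, not from the book) puts the variants side by side:

```python
import pandas as pd

cols = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "continent": ["Asia", "Africa"],
    "year": [1952, 2007],
    "lifeExp": [28.801, 43.487],
    "pop": [8425333, 12311143],
    "gdpPercap": [779.445314, 469.709298],
})

by_name = cols[["country", "continent", "year"]]            # list of column names
by_label = cols.loc[:, ["country", "continent", "year"]]    # same selection through loc
by_position = cols.iloc[:, [0, 1, -4]]                      # -4 counts from the end: year
by_range = cols.iloc[:, list(range(3))]                     # positions 0, 1, 2
by_slice = cols.iloc[:, 0:3]                                # slice, end position excluded

# Single brackets return a Series, double brackets a one-column DataFrame.
print(type(cols["country"]).__name__, type(cols[["country"]]).__name__)
print(by_name.equals(by_position), by_range.equals(by_slice))
```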

1.3.2 Selecting a subset of rows: by row label (loc) or row position (iloc)

print(df.loc[0])  # first row
country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap     779.445314
Name: 0, dtype: object
print(df.loc[99])  # 100th row
country      Bangladesh
continent          Asia
year               1967
lifeExp          43.453
pop            62821884
gdpPercap    721.186086
Name: 99, dtype: object
print(df.iloc[1703])  # last row
country        Zimbabwe
continent        Africa
year               2007
lifeExp          43.487
pop            12311143
gdpPercap    469.709298
Name: 1703, dtype: object
print(df.iloc[-1])  # last row, counted from the end
country        Zimbabwe
continent        Africa
year               2007
lifeExp          43.487
pop            12311143
gdpPercap    469.709298
Name: 1703, dtype: object
print(df.loc[-1])  # attempt at the last row; fails because loc looks up the label -1, which is not in the index
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

f:\zk\py\jupyter\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    350                 try:
--> 351                     return self._range.index(new_key)
    352                 except ValueError as err:


ValueError: -1 is not in range


The above exception was the direct cause of the following exception:


KeyError                                  Traceback (most recent call last)

<ipython-input-37-1c8cb0fb85f1> in <module>
----> 1 print(df.loc[-1])  # last row


f:\zk\py\jupyter\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    893 
    894             maybe_callable = com.apply_if_callable(key, self.obj)
--> 895             return self._getitem_axis(maybe_callable, axis=axis)
    896 
    897     def _is_scalar_access(self, key: Tuple):


f:\zk\py\jupyter\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1122         # fall thru to straight lookup
   1123         self._validate_key(key, axis)
-> 1124         return self._get_label(key, axis=axis)
   1125 
   1126     def _get_slice_axis(self, slice_obj: slice, axis: int):


f:\zk\py\jupyter\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1071     def _get_label(self, label, axis: int):
   1072         # GH#5667 this will fail if the label is not present in the axis.
-> 1073         return self.obj.xs(label, axis=axis)
   1074 
   1075     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):


f:\zk\py\jupyter\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3737                 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
   3738         else:
-> 3739             loc = index.get_loc(key)
   3740 
   3741             if isinstance(loc, np.ndarray):


f:\zk\py\jupyter\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    351                     return self._range.index(new_key)
    352                 except ValueError as err:
--> 353                     raise KeyError(key) from err
    354             raise KeyError(key)
    355         return super().get_loc(key, method=method, tolerance=tolerance)


KeyError: -1
print(df.tail(1), "\n")  # tail(1) returns the last row as a one-row DataFrame
print(df.iloc[-1])  # iloc[-1] returns the same row as a Series
       country continent  year  lifeExp       pop   gdpPercap
1703  Zimbabwe    Africa  2007   43.487  12311143  469.709298 

country        Zimbabwe
continent        Africa
year               2007
lifeExp          43.487
pop            12311143
gdpPercap    469.709298
Name: 1703, dtype: object
print(type(df.tail(1)),type(df.iloc[-1]))
<class 'pandas.core.frame.DataFrame'> <class 'pandas.core.series.Series'>
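The difference between label and position is easier to see when the two do not coincide. The sketch below keeps only three rows but leaves them their original labels 0, 99, and 1703; the frame `rows` and the flag `missing_label` are illustrative names.

```python
import pandas as pd

rows = pd.DataFrame(
    {"country": ["Afghanistan", "Bangladesh", "Zimbabwe"], "year": [1952, 1967, 2007]},
    index=[0, 99, 1703],   # labels copied from the rows shown above
)

print(rows.loc[99, "country"])    # label 99      -> Bangladesh
print(rows.iloc[1]["country"])    # position 1    -> Bangladesh
print(rows.iloc[-1]["country"])   # last position -> Zimbabwe

# loc treats -1 as a label, and no row carries that label, so it raises KeyError.
try:
    rows.loc[-1]
    missing_label = False
except KeyError:
    missing_label = True
print(missing_label)

# tail(1) keeps the table shape (DataFrame); iloc[-1] collapses the row to a Series.
print(type(rows.tail(1)).__name__, type(rows.iloc[-1]).__name__)
```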

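1.3.3 Combining row and column selection: df.loc[[rows], [columns]] and df.iloc[[rows], [columns]]. The index lists inside loc and iloc can be replaced by slices of the form a:b:c.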

print(df.loc[:,["year","pop"]],df.iloc[:,[2,4]])
      year       pop
0     1952   8425333
1     1957   9240934
2     1962  10267083
3     1967  11537966
4     1972  13079460
...    ...       ...
1699  1987   9216418
1700  1992  10704340
1701  1997  11404948
1702  2002  11926563
1703  2007  12311143

[1704 rows x 2 columns]       year       pop
0     1952   8425333
1     1957   9240934
2     1962  10267083
3     1967  11537966
4     1972  13079460
...    ...       ...
1699  1987   9216418
1700  1992  10704340
1701  1997  11404948
1702  2002  11926563
1703  2007  12311143

[1704 rows x 2 columns]    
print(df.loc[:,["lifeExp","pop"]],"\n",df.iloc[:,list(range(3,5))])
      lifeExp       pop
0      28.801   8425333
1      30.332   9240934
2      31.997  10267083
3      34.020  11537966
4      36.088  13079460
...       ...       ...
1699   62.351   9216418
1700   60.377  10704340
1701   46.809  11404948
1702   39.989  11926563
1703   43.487  12311143

[1704 rows x 2 columns] 
       lifeExp       pop
0      28.801   8425333
1      30.332   9240934
2      31.997  10267083
3      34.020  11537966
4      36.088  13079460
...       ...       ...
1699   62.351   9216418
1700   60.377  10704340
1701   46.809  11404948
1702   39.989  11926563
1703   43.487  12311143

[1704 rows x 2 columns]
print(df.iloc[1:10:2,:2])
       country continent
1  Afghanistan      Asia
3  Afghanistan      Asia
5  Afghanistan      Asia
7  Afghanistan      Asia
9  Afghanistan      Asia
print(df.iloc[1:10:2,::])
       country continent  year  lifeExp       pop   gdpPercap
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
5  Afghanistan      Asia  1977   38.438  14880372  786.113360
7  Afghanistan      Asia  1987   40.822  13867957  852.395945
9  Afghanistan      Asia  1997   41.763  22227415  635.341351
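One detail worth checking by hand is how the end of a slice is treated: loc slices by label and includes the end, while iloc slices by position and excludes it. A small sketch on five rows copied from the output above (the name `grid` is illustrative):

```python
import pandas as pd

grid = pd.DataFrame({
    "country": ["Afghanistan"] * 5,
    "year": [1952, 1957, 1962, 1967, 1972],
    "pop": [8425333, 9240934, 10267083, 11537966, 13079460],
})

# Same slice text, different row counts: labels 0..2 inclusive vs positions 0..1.
print(len(grid.loc[0:2]), len(grid.iloc[0:2]))   # 3 2

# Column labels can be sliced too; both endpoints are kept.
print(list(grid.loc[:, "year":"pop"].columns))   # ['year', 'pop']

# Every second row starting at position 1, first two columns.
print(grid.iloc[1:5:2, :2])
```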

1.4 Grouping and aggregation

print(df.groupby("year"))
print(df.groupby("year")["lifeExp"])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0DB33F28>
<pandas.core.groupby.generic.SeriesGroupBy object at 0x0DB33D30>
for year, group in df.groupby("year"):  # each iteration yields the group key and that group's rows
    print(year, "\n", group)
1952 
                  country continent  year  lifeExp       pop    gdpPercap
0            Afghanistan      Asia  1952   28.801   8425333   779.445314
12               Albania    Europe  1952   55.230   1282697  1601.056136
24               Algeria    Africa  1952   43.077   9279525  2449.008185
36                Angola    Africa  1952   30.015   4232095  3520.610273
48             Argentina  Americas  1952   62.485  17876956  5911.315053
...                  ...       ...   ...      ...       ...          ...
1644             Vietnam      Asia  1952   40.412  26246839   605.066492
1656  West Bank and Gaza      Asia  1952   43.160   1030585  1515.592329
1668         Yemen, Rep.      Asia  1952   32.548   4963829   781.717576
1680              Zambia    Africa  1952   42.038   2672000  1147.388831
1692            Zimbabwe    Africa  1952   48.451   3080907   406.884115

[142 rows x 6 columns]
1957 
                  country continent  year  lifeExp       pop    gdpPercap
1            Afghanistan      Asia  1957   30.332   9240934   820.853030
13               Albania    Europe  1957   59.280   1476505  1942.284244
25               Algeria    Africa  1957   45.685  10270856  3013.976023
37                Angola    Africa  1957   31.999   4561361  3827.940465
49             Argentina  Americas  1957   64.399  19610538  6856.856212
...                  ...       ...   ...      ...       ...          ...
1645             Vietnam      Asia  1957   42.887  28998543   676.285448
1657  West Bank and Gaza      Asia  1957   45.671   1070439  1827.067742
1669         Yemen, Rep.      Asia  1957   33.970   5498090   804.830455
1681              Zambia    Africa  1957   44.077   3016000  1311.956766
1693            Zimbabwe    Africa  1957   50.469   3646340   518.764268

[142 rows x 6 columns]
1962 
                  country continent  year  lifeExp       pop    gdpPercap
2            Afghanistan      Asia  1962   31.997  10267083   853.100710
14               Albania    Europe  1962   64.820   1728137  2312.888958
26               Algeria    Africa  1962   48.303  11000948  2550.816880
38                Angola    Africa  1962   34.000   4826015  4269.276742
50             Argentina  Americas  1962   65.142  21283783  7133.166023
...                  ...       ...   ...      ...       ...          ...
1646             Vietnam      Asia  1962   45.363  33796140   772.049160
1658  West Bank and Gaza      Asia  1962   48.127   1133134  2198.956312
1670         Yemen, Rep.      Asia  1962   35.180   6120081   825.623201
1682              Zambia    Africa  1962   46.023   3421000  1452.725766
1694            Zimbabwe    Africa  1962   52.358   4277736   527.272182

[142 rows x 6 columns]
1967 
                  country continent  year  lifeExp       pop    gdpPercap
3            Afghanistan      Asia  1967   34.020  11537966   836.197138
15               Albania    Europe  1967   66.220   1984060  2760.196931
27               Algeria    Africa  1967   51.407  12760499  3246.991771
39                Angola    Africa  1967   35.985   5247469  5522.776375
51             Argentina  Americas  1967   65.634  22934225  8052.953021
...                  ...       ...   ...      ...       ...          ...
1647             Vietnam      Asia  1967   47.838  39463910   637.123289
1659  West Bank and Gaza      Asia  1967   51.631   1142636  2649.715007
1671         Yemen, Rep.      Asia  1967   36.984   6740785   862.442146
1683              Zambia    Africa  1967   47.768   3900000  1777.077318
1695            Zimbabwe    Africa  1967   53.995   4995432   569.795071

[142 rows x 6 columns]
1972 
                  country continent  year  lifeExp       pop    gdpPercap
4            Afghanistan      Asia  1972   36.088  13079460   739.981106
16               Albania    Europe  1972   67.690   2263554  3313.422188
28               Algeria    Africa  1972   54.518  14760787  4182.663766
40                Angola    Africa  1972   37.928   5894858  5473.288005
52             Argentina  Americas  1972   67.065  24779799  9443.038526
...                  ...       ...   ...      ...       ...          ...
1648             Vietnam      Asia  1972   50.254  44655014   699.501644
1660  West Bank and Gaza      Asia  1972   56.532   1089572  3133.409277
1672         Yemen, Rep.      Asia  1972   39.848   7407075  1265.047031
1684              Zambia    Africa  1972   50.107   4506497  1773.498265
1696            Zimbabwe    Africa  1972   55.635   5861135   799.362176

[142 rows x 6 columns]
1977 
                  country continent  year  lifeExp       pop     gdpPercap
5            Afghanistan      Asia  1977   38.438  14880372    786.113360
17               Albania    Europe  1977   68.930   2509048   3533.003910
29               Algeria    Africa  1977   58.014  17152804   4910.416756
41                Angola    Africa  1977   39.483   6162675   3008.647355
53             Argentina  Americas  1977   68.481  26983828  10079.026740
...                  ...       ...   ...      ...       ...           ...
1649             Vietnam      Asia  1977   55.764  50533506    713.537120
1661  West Bank and Gaza      Asia  1977   60.765   1261091   3682.831494
1673         Yemen, Rep.      Asia  1977   44.175   8403990   1829.765177
1685              Zambia    Africa  1977   51.386   5216550   1588.688299
1697            Zimbabwe    Africa  1977   57.674   6642107    685.587682

[142 rows x 6 columns]
1982 
                  country continent  year  lifeExp       pop    gdpPercap
6            Afghanistan      Asia  1982   39.854  12881816   978.011439
18               Albania    Europe  1982   70.420   2780097  3630.880722
30               Algeria    Africa  1982   61.368  20033753  5745.160213
42                Angola    Africa  1982   39.942   7016384  2756.953672
54             Argentina  Americas  1982   69.942  29341374  8997.897412
...                  ...       ...   ...      ...       ...          ...
1650             Vietnam      Asia  1982   58.816  56142181   707.235786
1662  West Bank and Gaza      Asia  1982   64.406   1425876  4336.032082
1674         Yemen, Rep.      Asia  1982   49.113   9657618  1977.557010
1686              Zambia    Africa  1982   51.821   6100407  1408.678565
1698            Zimbabwe    Africa  1982   60.363   7636524   788.855041

[142 rows x 6 columns]
1987 
                  country continent  year  lifeExp       pop    gdpPercap
7            Afghanistan      Asia  1987   40.822  13867957   852.395945
19               Albania    Europe  1987   72.000   3075321  3738.932735
31               Algeria    Africa  1987   65.799  23254956  5681.358539
43                Angola    Africa  1987   39.906   7874230  2430.208311
55             Argentina  Americas  1987   70.774  31620918  9139.671389
...                  ...       ...   ...      ...       ...          ...
1651             Vietnam      Asia  1987   62.820  62826491   820.799445
1663  West Bank and Gaza      Asia  1987   67.046   1691210  5107.197384
1675         Yemen, Rep.      Asia  1987   52.922  11219340  1971.741538
1687              Zambia    Africa  1987   50.821   7272406  1213.315116
1699            Zimbabwe    Africa  1987   62.351   9216418   706.157306

[142 rows x 6 columns]
1992 
                  country continent  year  lifeExp       pop    gdpPercap
8            Afghanistan      Asia  1992   41.674  16317921   649.341395
20               Albania    Europe  1992   71.581   3326498  2497.437901
32               Algeria    Africa  1992   67.744  26298373  5023.216647
44                Angola    Africa  1992   40.647   8735988  2627.845685
56             Argentina  Americas  1992   71.868  33958947  9308.418710
...                  ...       ...   ...      ...       ...          ...
1652             Vietnam      Asia  1992   67.662  69940728   989.023149
1664  West Bank and Gaza      Asia  1992   69.718   2104779  6017.654756
1676         Yemen, Rep.      Asia  1992   55.599  13367997  1879.496673
1688              Zambia    Africa  1992   46.100   8381163  1210.884633
1700            Zimbabwe    Africa  1992   60.377  10704340   693.420786

[142 rows x 6 columns]
1997 
                  country continent  year  lifeExp       pop     gdpPercap
9            Afghanistan      Asia  1997   41.763  22227415    635.341351
21               Albania    Europe  1997   72.950   3428038   3193.054604
33               Algeria    Africa  1997   69.152  29072015   4797.295051
45                Angola    Africa  1997   40.963   9875024   2277.140884
57             Argentina  Americas  1997   73.275  36203463  10967.281950
...                  ...       ...   ...      ...       ...           ...
1653             Vietnam      Asia  1997   70.672  76048996   1385.896769
1665  West Bank and Gaza      Asia  1997   71.096   2826046   7110.667619
1677         Yemen, Rep.      Asia  1997   58.020  15826497   2117.484526
1689              Zambia    Africa  1997   40.238   9417789   1071.353818
1701            Zimbabwe    Africa  1997   46.809  11404948    792.449960

[142 rows x 6 columns]
2002 
                  country continent  year  lifeExp       pop    gdpPercap
10           Afghanistan      Asia  2002   42.129  25268405   726.734055
22               Albania    Europe  2002   75.651   3508512  4604.211737
34               Algeria    Africa  2002   70.994  31287142  5288.040382
46                Angola    Africa  2002   41.003  10866106  2773.287312
58             Argentina  Americas  2002   74.340  38331121  8797.640716
...                  ...       ...   ...      ...       ...          ...
1654             Vietnam      Asia  2002   73.017  80908147  1764.456677
1666  West Bank and Gaza      Asia  2002   72.370   3389578  4515.487575
1678         Yemen, Rep.      Asia  2002   60.308  18701257  2234.820827
1690              Zambia    Africa  2002   39.193  10595811  1071.613938
1702            Zimbabwe    Africa  2002   39.989  11926563   672.038623

[142 rows x 6 columns]
2007 
                  country continent  year  lifeExp       pop     gdpPercap
11           Afghanistan      Asia  2007   43.828  31889923    974.580338
23               Albania    Europe  2007   76.423   3600523   5937.029526
35               Algeria    Africa  2007   72.301  33333216   6223.367465
47                Angola    Africa  2007   42.731  12420476   4797.231267
59             Argentina  Americas  2007   75.320  40301927  12779.379640
...                  ...       ...   ...      ...       ...           ...
1655             Vietnam      Asia  2007   74.249  85262356   2441.576404
1667  West Bank and Gaza      Asia  2007   73.422   4018332   3025.349798
1679         Yemen, Rep.      Asia  2007   62.698  22211743   2280.769906
1691              Zambia    Africa  2007   42.384  11746035   1271.211593
1703            Zimbabwe    Africa  2007   43.487  12311143    469.709298

[142 rows x 6 columns]
print(df.groupby("year")["lifeExp"].mean())
year
1952    49.057620
1957    51.507401
1962    53.609249
1967    55.678290
1972    57.647386
1977    59.570157
1982    61.533197
1987    63.212613
1992    64.160338
1997    65.014676
2002    65.694923
2007    67.007423
Name: lifeExp, dtype: float64
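What `groupby("year")["lifeExp"].mean()` does can be spelled out as split, apply, combine. The sketch below does the three steps by hand on four rows taken from the output above and compares the result with the one-line version; `mini`, `manual`, and `grouped` are illustrative names.

```python
import pandas as pd

mini = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe", "Afghanistan", "Zimbabwe"],
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [28.801, 48.451, 43.828, 43.487],
})

# Manual version: split by year, apply mean to each piece, combine into a dict.
manual = {}
for year, group in mini.groupby("year"):
    manual[year] = group["lifeExp"].mean()

# groupby(...).mean() performs the same three steps in one call.
grouped = mini.groupby("year")["lifeExp"].mean()

print(manual)
print(grouped)
```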
# Double brackets: the columns to aggregate are passed as a list.
print(df.groupby(["year","continent"])[["lifeExp","gdpPercap"]].mean())
                  lifeExp     gdpPercap
year continent                         
1952 Africa     39.135500   1252.572466
     Americas   53.279840   4079.062552
     Asia       46.314394   5195.484004
     Europe     64.408500   5661.057435
     Oceania    69.255000  10298.085650
1957 Africa     41.266346   1385.236062
     Americas   55.960280   4616.043733
     Asia       49.318544   5787.732940
     Europe     66.703067   6963.012816
     Oceania    70.295000  11598.522455
1962 Africa     43.319442   1598.078825
     Americas   58.398760   4901.541870
     Asia       51.563223   5729.369625
     Europe     68.539233   8365.486814
     Oceania    71.085000  12696.452430
1967 Africa     45.334538   2050.363801
     Americas   60.410920   5668.253496
     Asia       54.663640   5971.173374
     Europe     69.737600  10143.823757
     Oceania    71.310000  14495.021790
1972 Africa     47.450942   2339.615674
     Americas   62.394920   6491.334139
     Asia       57.319269   8187.468699
     Europe     70.775033  12479.575246
     Oceania    71.910000  16417.333380
1977 Africa     49.580423   2585.938508
     Americas   64.391560   7352.007126
     Asia       59.610556   7791.314020
     Europe     71.937767  14283.979110
     Oceania    72.855000  17283.957605
1982 Africa     51.592865   2481.592960
     Americas   66.228840   7506.737088
     Asia       62.617939   7434.135157
     Europe     72.806400  15617.896551
     Oceania    74.290000  18554.709840
1987 Africa     53.344788   2282.668991
     Americas   68.090720   7793.400261
     Asia       64.851182   7608.226508
     Europe     73.642167  17214.310727
     Oceania    75.320000  20448.040160
1992 Africa     53.629577   2281.810333
     Americas   69.568360   8044.934406
     Asia       66.537212   8639.690248
     Europe     74.440100  17061.568084
     Oceania    76.945000  20894.045885
1997 Africa     53.598269   2378.759555
     Americas   71.150480   8889.300863
     Asia       68.020515   9834.093295
     Europe     75.505167  19076.781802
     Oceania    78.190000  24024.175170
2002 Africa     53.325231   2599.385159
     Americas   72.422040   9287.677107
     Asia       69.233879  10174.090397
     Europe     76.700600  21711.732422
     Oceania    79.740000  26938.778040
2007 Africa     54.806038   3089.032605
     Americas   73.608120  11003.031625
     Asia       70.728485  12473.026870
     Europe     77.648600  25054.481636
     Oceania    80.719500  29810.188275

Note: indexing the groupby result with a bare tuple of keys, ["lifeExp","gdpPercap"], makes the pandas version used for these notes emit "FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead." Passing a list, [["lifeExp","gdpPercap"]], produces the same table without the warning.
# reset_index() turns the year and continent index levels back into ordinary columns.
print(df.groupby(["year","continent"])[["lifeExp","gdpPercap"]].mean().reset_index())
    year continent    lifeExp     gdpPercap
0   1952    Africa  39.135500   1252.572466
1   1952  Americas  53.279840   4079.062552
2   1952      Asia  46.314394   5195.484004
3   1952    Europe  64.408500   5661.057435
4   1952   Oceania  69.255000  10298.085650
5   1957    Africa  41.266346   1385.236062
6   1957  Americas  55.960280   4616.043733
7   1957      Asia  49.318544   5787.732940
8   1957    Europe  66.703067   6963.012816
9   1957   Oceania  70.295000  11598.522455
10  1962    Africa  43.319442   1598.078825
11  1962  Americas  58.398760   4901.541870
12  1962      Asia  51.563223   5729.369625
13  1962    Europe  68.539233   8365.486814
14  1962   Oceania  71.085000  12696.452430
15  1967    Africa  45.334538   2050.363801
16  1967  Americas  60.410920   5668.253496
17  1967      Asia  54.663640   5971.173374
18  1967    Europe  69.737600  10143.823757
19  1967   Oceania  71.310000  14495.021790
20  1972    Africa  47.450942   2339.615674
21  1972  Americas  62.394920   6491.334139
22  1972      Asia  57.319269   8187.468699
23  1972    Europe  70.775033  12479.575246
24  1972   Oceania  71.910000  16417.333380
25  1977    Africa  49.580423   2585.938508
26  1977  Americas  64.391560   7352.007126
27  1977      Asia  59.610556   7791.314020
28  1977    Europe  71.937767  14283.979110
29  1977   Oceania  72.855000  17283.957605
30  1982    Africa  51.592865   2481.592960
31  1982  Americas  66.228840   7506.737088
32  1982      Asia  62.617939   7434.135157
33  1982    Europe  72.806400  15617.896551
34  1982   Oceania  74.290000  18554.709840
35  1987    Africa  53.344788   2282.668991
36  1987  Americas  68.090720   7793.400261
37  1987      Asia  64.851182   7608.226508
38  1987    Europe  73.642167  17214.310727
39  1987   Oceania  75.320000  20448.040160
40  1992    Africa  53.629577   2281.810333
41  1992  Americas  69.568360   8044.934406
42  1992      Asia  66.537212   8639.690248
43  1992    Europe  74.440100  17061.568084
44  1992   Oceania  76.945000  20894.045885
45  1997    Africa  53.598269   2378.759555
46  1997  Americas  71.150480   8889.300863
47  1997      Asia  68.020515   9834.093295
48  1997    Europe  75.505167  19076.781802
49  1997   Oceania  78.190000  24024.175170
50  2002    Africa  53.325231   2599.385159
51  2002  Americas  72.422040   9287.677107
52  2002      Asia  69.233879  10174.090397
53  2002    Europe  76.700600  21711.732422
54  2002   Oceania  79.740000  26938.778040
55  2007    Africa  54.806038   3089.032605
56  2007  Americas  73.608120  11003.031625
57  2007      Asia  70.728485  12473.026870
58  2007    Europe  77.648600  25054.481636
59  2007   Oceania  80.719500  29810.188275
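To see the effect of grouping by two keys and then flattening the result, here is a small sketch on six rows copied from the output above (Afghanistan, Algeria, and Angola in 1952 and 2007). The names `multi`, `means`, and `flat` are illustrative. The columns are selected with a list rather than a bare tuple of keys.

```python
import pandas as pd

multi = pd.DataFrame({
    "year": [1952, 1952, 1952, 2007, 2007, 2007],
    "continent": ["Asia", "Africa", "Africa", "Asia", "Africa", "Africa"],
    "lifeExp": [28.801, 43.077, 30.015, 43.828, 72.301, 42.731],
    "gdpPercap": [779.445314, 2449.008185, 3520.610273,
                  974.580338, 6223.367465, 4797.231267],
})

# Two grouping keys produce a two-level MultiIndex (year, continent).
means = multi.groupby(["year", "continent"])[["lifeExp", "gdpPercap"]].mean()
print(means.index.nlevels)   # 2

# reset_index moves both levels back into ordinary columns.
flat = means.reset_index()
print(list(flat.columns))
print(flat)
```

The flat form is usually easier to filter, merge, or export, while the MultiIndex form is convenient for lookups such as `means.loc[(1952, "Africa")]`.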
print(df.groupby(["continent"])["country"].nunique())

continent
Africa      52
Americas    25
Asia        33
Europe      30
Oceania      2
Name: country, dtype: int64
print(df.groupby(["continent"])["country"].value_counts())
continent  country       
Africa     Algeria           12
           Angola            12
           Benin             12
           Botswana          12
           Burkina Faso      12
                             ..
Europe     Switzerland       12
           Turkey            12
           United Kingdom    12
Oceania    Australia         12
           New Zealand       12
Name: country, Length: 142, dtype: int64
print(df.groupby(["continent"])["country"].unique())
continent
Africa      [Algeria, Angola, Benin, Botswana, Burkina Fas...
Americas    [Argentina, Bolivia, Brazil, Canada, Chile, Co...
Asia        [Afghanistan, Bahrain, Bangladesh, Cambodia, C...
Europe      [Albania, Austria, Belgium, Bosnia and Herzego...
Oceania                              [Australia, New Zealand]
Name: country, dtype: object
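The three calls above answer different questions, which is easier to see on a frame small enough to check by eye. A minimal sketch (the frame `obs` is made up for illustration, using country names from the dataset):

```python
import pandas as pd

obs = pd.DataFrame({
    "continent": ["Oceania", "Oceania", "Oceania", "Asia", "Asia"],
    "country": ["Australia", "Australia", "New Zealand", "Afghanistan", "Afghanistan"],
})

by_continent = obs.groupby("continent")["country"]

distinct_count = by_continent.nunique()     # how many different countries per continent
row_counts = by_continent.value_counts()    # how many rows each country contributes
distinct_names = by_continent.unique()      # the distinct country names themselves

print(distinct_count)
print(row_counts)
print(distinct_names)
```

In the full dataset every country has 12 rows (one per survey year), which is why `value_counts()` shows 12 everywhere while `nunique()` gives the number of countries.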
import matplotlib
print(df.groupby("year")["lifeExp"].mean().plot())
---------------------------------------------------------------------------

ModuleNotFoundError                       Traceback (most recent call last)

<ipython-input-112-3be0cbb50fbe> in <module>
----> 1 import matplotlib
      2 print(df.groupby("year")["lifeExp"].mean().plot())


ModuleNotFoundError: No module named 'matplotlib'
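The ModuleNotFoundError above means matplotlib was not installed in that environment, so no chart was drawn. One way to handle this, sketched here under assumptions, is to check for the package first and fall back to printing the numbers. The four-row frame `trend` is an excerpt typed in by hand, and the "Agg" backend is chosen only so the sketch runs without a display; in a Jupyter notebook that line would normally be dropped.

```python
import importlib.util
import pandas as pd

trend = pd.DataFrame({
    "year": [1952, 1952, 2007, 2007],
    "lifeExp": [28.801, 48.451, 43.828, 43.487],
})
mean_by_year = trend.groupby("year")["lifeExp"].mean()

# Check for matplotlib up front instead of letting the import fail mid-notebook.
if importlib.util.find_spec("matplotlib") is None:
    print("matplotlib is not installed; try: pip install matplotlib")
    print(mean_by_year)
else:
    import matplotlib
    matplotlib.use("Agg")            # assumption: off-screen backend, no window needed
    ax = mean_by_year.plot()         # Series.plot() draws a line chart by default
    ax.set_ylabel("mean lifeExp")
    print(type(ax).__name__)
```

Note that `plot()` returns a matplotlib Axes object, so wrapping it in `print()` as in the original cell only prints that object's description; calling `plot()` on its own is enough in a notebook.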

Original article: https://blog.csdn.net/xcntime/article/details/144290033
