NumPy, Pandas, and Matplotlib are three well-known modules in the data analysis field. This article walks through the basics of all three.

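### NumPy arrays

NumPy is a third-party Python library whose name stands for Numerical Python. It is a core library for scientific computing, built around a powerful multidimensional array object.

A NumPy array is an N-dimensional array object; a 2-D array is laid out in rows and columns. We can initialize a NumPy array from a Python list and then access its elements.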
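As a small illustration of element access (a sketch only; the variable names and values are made up for this example), indexing works much like it does for Python lists, with one index per dimension:

```python
import numpy as np

# Build arrays from Python lists (illustrative values).
row = np.array([10, 20, 30])
grid = np.array([[1, 2, 3], [4, 5, 6]])

first = row[0]      # first element of the 1-D array
last = row[-1]      # negative indices count from the end
cell = grid[1, 2]   # row 1, column 2 of the 2-D array

print(first, last, cell)
```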

#### 1-D array

```
import numpy as np

a = np.array([1, 2, 3])
print(a)
```

Output:

`[1 2 3]`

#### Multidimensional array

```
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
```

Output:

```
[[1 2 3]
 [4 5 6]]
```

#### Python NumPy Array v/s List

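```
import sys

import numpy as np

S = range(1000)
# Rough memory estimate for 1000 Python ints: size of one int object times the count
print(sys.getsizeof(5) * len(S))

D = np.arange(1000)
# Memory used by the array's data buffer: element count times bytes per element
print(D.size * D.itemsize)
```

The exact values depend on the platform and Python version, but the array's data buffer is typically several times smaller than the integer objects behind the equivalent list. A similar gap shows up in speed: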

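```
import time

import numpy as np

SIZE = 1000000
L1 = range(SIZE)
L2 = range(SIZE)
A1 = np.arange(SIZE)
A2 = np.arange(SIZE)

# Element-wise addition with a Python-level loop
start = time.time()
result = [x + y for x, y in zip(L1, L2)]
print((time.time() - start) * 1000)

# The same addition as a single vectorized NumPy operation
start = time.time()
result = A1 + A2
print((time.time() - start) * 1000)
```

Output (elapsed milliseconds; exact timings vary by machine):

```
156.21376037597656
46.89955711364746
```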
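A variant of this speed comparison can be sketched with the standard-library `timeit` module, which repeats each measurement and so is less sensitive to one-off noise. The array size and repeat count below are arbitrary choices for illustration, and the timings will differ from machine to machine:

```python
import timeit

import numpy as np

SIZE = 100_000  # smaller than above so the sketch runs quickly
L1, L2 = list(range(SIZE)), list(range(SIZE))
A1, A2 = np.arange(SIZE), np.arange(SIZE)

# Both approaches compute the same element-wise sums.
list_sum = [x + y for x, y in zip(L1, L2)]
array_sum = A1 + A2
same = np.array_equal(list_sum, array_sum)

# timeit runs each callable `number` times and returns the total seconds.
t_list = timeit.timeit(lambda: [x + y for x, y in zip(L1, L2)], number=10)
t_array = timeit.timeit(lambda: A1 + A2, number=10)

print(same, t_list, t_array)
```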

#### ndim

```
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])
print(a.ndim)  # number of dimensions
```

Output:

`2`

#### itemsize

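```
import numpy as np

a = np.array([(1, 2, 3)])
print(a.itemsize)  # bytes per element
```

Output (4 bytes per element for the `int32` dtype shown in the next section; a 64-bit integer dtype gives `8`):

`4`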

#### dtype

```
import numpy as np

a = np.array([(1, 2, 3)])
print(a.dtype)  # data type of the elements
```

Output:

`int32`
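The default integer dtype depends on the platform and NumPy version, so the `int32` shown here may appear as `int64` elsewhere. A minimal sketch of pinning the dtype explicitly instead of relying on the default:

```python
import numpy as np

# Request specific integer widths instead of the platform default.
a32 = np.array([1, 2, 3], dtype=np.int32)
a64 = np.array([1, 2, 3], dtype=np.int64)

print(a32.dtype, a32.itemsize)  # itemsize is bytes per element
print(a64.dtype, a64.itemsize)
```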

#### size and shape

```
import numpy as np

a = np.array([(1, 2, 3, 4, 5, 6)])
print(a.size)   # total number of elements
print(a.shape)  # (rows, columns)
```

Output:

```
6
(1, 6)
```

#### reshape

```
import numpy as np

a = np.array([(8, 9, 10), (11, 12, 13)])
print(a)
a = a.reshape(3, 2)
print(a)
```

Output:

```
[[ 8  9 10]
 [11 12 13]]
[[ 8  9]
 [10 11]
 [12 13]]
```
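`reshape` also accepts `-1` for one dimension, in which case NumPy infers that dimension from the total number of elements. A short sketch using the same array:

```python
import numpy as np

a = np.array([(8, 9, 10), (11, 12, 13)])

# -1 asks NumPy to infer that dimension from the element count.
b = a.reshape(-1, 2)  # 6 elements / 2 columns -> 3 rows
c = a.reshape(-1)     # flatten to 1-D

print(b.shape, c.shape)
```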

#### slicing

```
import numpy as np

a = np.array([(1, 2, 3, 4), (3, 4, 5, 6)])
print(a[0, 2])  # row 0, column 2
```

Output:

`3`

```
import numpy as np

a = np.array([(1, 2, 3, 4), (3, 4, 5, 6)])
print(a[0:, 2])  # column 2 of every row
```

Output:

`[3 5]`

```
import numpy as np

a = np.array([(8, 9), (10, 11), (12, 13)])
print(a[0:2, 1])  # column 1 of the first two rows
```

Output:

`[ 9 11]`

#### linspace

```
import numpy as np

a = np.linspace(1, 3, 10)  # 10 evenly spaced values from 1 to 3, inclusive
print(a)
```

Output:

```
[1.         1.22222222 1.44444444 1.66666667 1.88888889 2.11111111
 2.33333333 2.55555556 2.77777778 3.        ]
```

#### min, max and sum

```
import numpy as np

a = np.array([1, 2, 3])
print(a.min())
print(a.max())
print(a.sum())
```

Output:

```
1
3
6
```

#### axis

```
import numpy as np

a = np.array([(1, 2, 3), (3, 4, 5)])
print(a.sum(axis=0))  # sum down each column
```

Output:

`[4 6 8]`
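For contrast, here is a short sketch of the same array summed along each axis and with no axis at all. `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row):

```python
import numpy as np

a = np.array([(1, 2, 3), (3, 4, 5)])

col_sums = a.sum(axis=0)  # collapse rows: one sum per column
row_sums = a.sum(axis=1)  # collapse columns: one sum per row
total = a.sum()           # no axis: sum of every element

print(col_sums, row_sums, total)
```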

#### Square Root & Standard Deviation

```
import numpy as np

a = np.array([(1, 2, 3), (3, 4, 5)])
print(np.sqrt(a))  # element-wise square root
print(np.std(a))   # standard deviation over all elements
```

Output:

```
[[1.         1.41421356 1.73205081]
 [1.73205081 2.         2.23606798]]
1.2909944487358056
```

```
import numpy as np

x = np.array([(1, 2, 3), (3, 4, 5)])
y = np.array([(1, 2, 3), (3, 4, 5)])
print(x + y)  # element-wise addition
```

Output:

```
[[ 2  4  6]
 [ 6  8 10]]
```

```
import numpy as np

x = np.array([(1, 2, 3), (3, 4, 5)])
y = np.array([(1, 2, 3), (3, 4, 5)])
print(x - y)
print(x * y)
print(x / y)
```

Output:

```
[[0 0 0]
 [0 0 0]]
[[ 1  4  9]
 [ 9 16 25]]
[[1. 1. 1.]
 [1. 1. 1.]]
```
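Note that `*` on arrays multiplies matching entries rather than computing a matrix product. A brief sketch of the difference, using the `@` operator for the matrix product (the second operand is transposed so the shapes line up):

```python
import numpy as np

x = np.array([(1, 2, 3), (3, 4, 5)])
y = np.array([(1, 2, 3), (3, 4, 5)])

elementwise = x * y  # multiplies matching entries, shape stays (2, 3)
matrix = x @ y.T     # matrix product of (2, 3) and (3, 2), shape (2, 2)

print(elementwise)
print(matrix)
```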

#### Vertical & Horizontal Stacking

```
import numpy as np

x = np.array([(1, 2, 3), (3, 4, 5)])
y = np.array([(1, 2, 3), (3, 4, 5)])
print(np.vstack((x, y)))  # stack rows on top of each other
print(np.hstack((x, y)))  # stack columns side by side
```

Output:

```
[[1 2 3]
 [3 4 5]
 [1 2 3]
 [3 4 5]]
[[1 2 3 1 2 3]
 [3 4 5 3 4 5]]
```

#### ravel

```
import numpy as np

x = np.array([(1, 2, 3), (3, 4, 5)])
print(x.ravel())  # flatten to a 1-D array
```

Output:

`[1 2 3 3 4 5]`
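One detail worth knowing: `ravel` returns a view of the original data when the memory layout allows it, whereas `flatten` always returns a copy. The sketch below shows the consequence for an ordinary contiguous array like this one:

```python
import numpy as np

x = np.array([(1, 2, 3), (3, 4, 5)])

r = x.ravel()    # a view here, because x is contiguous in memory
f = x.flatten()  # always an independent copy

r[0] = 99        # writes through to x because r shares its memory
print(x[0, 0], f[0])
```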

#### NumPy special functions

NumPy provides many mathematical functions, such as sine, cosine, tangent, and logarithm. Let's start with sine. The plots use Matplotlib, which is introduced later in this article.

```
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
plt.plot(x, y)
plt.show()
```

```
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)
y = np.tan(x)
plt.plot(x, y)
plt.show()
```

```
import numpy as np

a = np.array([1, 2, 3])
print(np.exp(a))  # e raised to each element
```

Output:

`[ 2.71828183  7.3890561  20.08553692]`

```
import numpy as np

a = np.array([1, 2, 3])
print(np.log(a))  # natural logarithm
```

Output:

`[0.         0.69314718 1.09861229]`

```
import numpy as np

a = np.array([1, 2, 3])
print(np.log10(a))  # base-10 logarithm
```

Output:

`[0.         0.30103    0.47712125]`

### Pandas

This article skips a long introduction to Pandas and goes straight to its most common operations.

#### Slicing the Data Frame

```
import pandas as pd

XYZ_web = {
    "Day": [1, 2, 3, 4, 5, 6],
    "Visitors": [1000, 700, 6000, 1000, 400, 350],
    "Bounce_Rate": [20, 20, 23, 15, 10, 34],
}
df = pd.DataFrame(XYZ_web)
print(df)
```

Output (recent pandas versions keep the dictionary's column order):

```
   Day  Visitors  Bounce_Rate
0    1      1000           20
1    2       700           20
2    3      6000           23
3    4      1000           15
4    5       400           10
5    6       350           34
```

`print(df.head(2))`

Output:

```
   Day  Visitors  Bounce_Rate
0    1      1000           20
1    2       700           20
```

`print(df.tail(2))`

Output:

```
   Day  Visitors  Bounce_Rate
4    5       400           10
5    6       350           34
```
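Beyond `head` and `tail`, rows and columns can be selected directly. The following is a minimal sketch on the same data; the variable names and the threshold of 1000 visitors are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5, 6],
    "Visitors": [1000, 700, 6000, 1000, 400, 350],
    "Bounce_Rate": [20, 20, 23, 15, 10, 34],
})

middle = df.iloc[2:4]              # rows at positions 2 and 3
busy = df[df["Visitors"] >= 1000]  # rows where the condition holds
rates = df["Bounce_Rate"]          # a single column as a Series

print(middle)
print(busy)
```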

#### Merging & Joining

```
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2005, 2006, 2007, 2008])
# With no `on` argument, merge uses every column the two frames share
merged = pd.merge(df1, df2)
print(merged)
```

Output:

```
   HPI  Int_Rate  IND_GDP
0   80         2       50
1   90         1       45
2   70         2       45
3   60         3       67
```

```
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2005, 2006, 2007, 2008])
# Merge on HPI only; the other shared columns get _x and _y suffixes
merged = pd.merge(df1, df2, on="HPI")
print(merged)
```

Output:

```
   HPI  Int_Rate_x  IND_GDP_x  Int_Rate_y  IND_GDP_y
0   80           2         50           2         50
1   90           1         45           1         45
2   70           2         45           2         45
3   60           3         67           3         67
```
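When two frames share column names beyond the merge key, `pd.merge` keeps both copies and appends suffixes (`_x` and `_y` by default). A sketch with custom suffixes follows; the suffix strings are illustrative choices, not anything prescribed by this article:

```python
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2005, 2006, 2007, 2008])

# Name the overlapping columns explicitly instead of using _x / _y.
merged = pd.merge(df1, df2, on="HPI", suffixes=("_left", "_right"))

print(merged.columns.tolist())
```

Merging on a column also discards the original year index, so the result is numbered from 0.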

```
import pandas as pd

df1 = pd.DataFrame({"Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"Low_Tier_HPI": [50, 45, 67, 34], "Unemployment": [1, 3, 5, 6]},
                   index=[2001, 2003, 2004, 2004])
# join aligns rows on the index and keeps every row of df1
joined = df1.join(df2)
print(joined)
```

Output:

```
      Int_Rate  IND_GDP  Low_Tier_HPI  Unemployment
2001         2       50          50.0           1.0
2002         1       45           NaN           NaN
2003         2       45          45.0           3.0
2004         3       67          67.0           5.0
2004         3       67          34.0           6.0
```

#### Concatenation

Concatenation glues DataFrames together, and we can choose the dimension along which to concatenate. To do so, call `pd.concat` and pass in the list of DataFrames to join.

```
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2005, 2006, 2007, 2008])
concat = pd.concat([df1, df2])  # default axis=0: stack rows
print(concat)
```

Output:

```
      HPI  Int_Rate  IND_GDP
2001   80         2       50
2002   90         1       45
2003   70         2       45
2004   60         3       67
2005   80         2       50
2006   90         1       45
2007   70         2       45
2008   60         3       67
```

```
import pandas as pd

df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2001, 2002, 2003, 2004])
df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]},
                   index=[2005, 2006, 2007, 2008])
concat = pd.concat([df1, df2], axis=1)  # axis=1: place columns side by side
print(concat)
```

Output:

```
       HPI  Int_Rate  IND_GDP   HPI  Int_Rate  IND_GDP
2001  80.0       2.0     50.0   NaN       NaN      NaN
2002  90.0       1.0     45.0   NaN       NaN      NaN
2003  70.0       2.0     45.0   NaN       NaN      NaN
2004  60.0       3.0     67.0   NaN       NaN      NaN
2005   NaN       NaN      NaN  80.0       2.0     50.0
2006   NaN       NaN      NaN  90.0       1.0     45.0
2007   NaN       NaN      NaN  70.0       2.0     45.0
2008   NaN       NaN      NaN  60.0       3.0     67.0
```
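The `NaN` values appear because the two frames have no index labels in common, so each row has data from only one side. As a contrasting sketch (the frame names here are invented for the example), frames that share the same index line up without gaps:

```python
import pandas as pd

years = [2001, 2002, 2003, 2004]
rates = pd.DataFrame({"Int_Rate": [2, 1, 2, 3]}, index=years)
gdp = pd.DataFrame({"IND_GDP": [50, 45, 45, 67]}, index=years)

# Rows align on the shared index, so no NaN values are introduced.
side_by_side = pd.concat([rates, gdp], axis=1)

print(side_by_side)
```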

#### Change the index

```
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300], "Bounce_Rate": [20, 45, 60, 10]})
df.set_index("Day", inplace=True)  # use the Day column as the row index
print(df)
```

Output:

```
     Visitors  Bounce_Rate
Day
1         200           20
2         100           45
3         230           60
4         300           10
```
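Once a column is the index, rows can be looked up by its labels. A minimal sketch, using the non-in-place form of `set_index` so the original frame is left untouched (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300], "Bounce_Rate": [20, 45, 60, 10]})
by_day = df.set_index("Day")  # returns a new frame; df is unchanged

visitors_day2 = by_day.loc[2, "Visitors"]  # label-based lookup on the Day index
restored = by_day.reset_index()            # move Day back to a regular column

print(visitors_day2)
print(restored.columns.tolist())
```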

```
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300], "Bounce_Rate": [20, 45, 60, 10]})
df = df.rename(columns={"Visitors": "Users"})  # rename a column
print(df)
```

Output:

```
   Day  Users  Bounce_Rate
0    1    200           20
1    2    100           45
2    3    230           60
3    4    300           10
```

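```
import pandas as pd

# NOTE: `country` is assumed to be a DataFrame loaded earlier;
# the loading step is not shown in this article.
country.to_html('edu.html')  # write the DataFrame to an HTML file
```

```
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style

style.use('fivethirtyeight')

# NOTE: assumes `country` (see above) has a "Country Code" column
# and year columns named '2010' and '2011'.
df = country.set_index(["Country Code"])
sd = df.reindex(columns=['2010', '2011'])  # keep only the two year columns
db = sd.diff(axis=1)                       # change from one year column to the next
db.plot(kind="bar")
plt.show()
```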

### Matplotlib

Matplotlib supports many kinds of charts.

```
from matplotlib import pyplot as plt

# Plot to the canvas
plt.plot([1, 2, 3], [4, 5, 1])
# Show what we plotted
plt.show()
```


```
from matplotlib import pyplot as plt

x = [5, 2, 7]
y = [2, 16, 4]
plt.plot(x, y)
plt.title('Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
```


```
from matplotlib import pyplot as plt
from matplotlib import style

style.use('ggplot')
x = [5, 8, 10]
y = [12, 16, 6]
x2 = [6, 9, 11]
y2 = [6, 15, 7]
plt.plot(x, y, 'g', label='line one', linewidth=5)
plt.plot(x2, y2, 'c', label='line two', linewidth=5)
plt.title('Epic Info')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.legend()
plt.grid(True, color='k')
plt.show()
```


#### Bar Graph

```
from matplotlib import pyplot as plt

plt.bar([0.25, 1.25, 2.25, 3.25, 4.25], [50, 40, 70, 80, 20],
        label="BMW", width=.5)
plt.bar([.75, 1.75, 2.75, 3.75, 4.75], [80, 20, 20, 50, 60],
        label="Audi", color='r', width=.5)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Information')
plt.show()
```


#### Histogram

```
import matplotlib.pyplot as plt

population_age = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95, 85, 55,
                  110, 120, 70, 65, 55, 111, 115, 80, 75, 65, 54, 44, 43, 42, 48]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
```
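To see the numbers behind the bars, the same binning can be sketched with `np.histogram`, which returns the per-bin counts without drawing anything. Note that ages above the last bin edge (100) fall outside every bin and are not counted:

```python
import numpy as np

population_age = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95, 85, 55,
                  110, 120, 70, 65, 55, 111, 115, 80, 75, 65, 54, 44, 43, 42, 48]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# counts[i] is the number of ages falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(population_age, bins=bins)

# Ages beyond the last edge are silently dropped from the histogram.
outside = sum(age > bins[-1] for age in population_age)

print(counts, outside)
```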


#### Scatter Plot

```
import matplotlib.pyplot as plt

x = [1, 1.5, 2, 2.5, 3, 3.5, 3.6]
y = [7.5, 8, 8.5, 9, 9.5, 10, 10.5]
x1 = [8, 8.5, 9, 9.5, 10, 10.5, 11]
y1 = [3, 3.5, 3.7, 4, 4.5, 5, 5.2]
plt.scatter(x, y, label='high income low saving', color='r')
plt.scatter(x1, y1, label='low income high savings', color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()
```


#### Area Plot

```
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]
# Empty plots that only provide legend entries for the stacked areas
plt.plot([], [], color='m', label='Sleeping', linewidth=5)
plt.plot([], [], color='c', label='Eating', linewidth=5)
plt.plot([], [], color='r', label='Working', linewidth=5)
plt.plot([], [], color='k', label='Playing', linewidth=5)
plt.stackplot(days, sleeping, eating, working, playing, colors=['m', 'c', 'r', 'k'])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Stack Plot')
plt.legend()
plt.show()
```


#### Pie Chart

```
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]
# The slices match the day 5 values of the four activity lists
slices = [7, 2, 2, 13]
activities = ['sleeping', 'eating', 'working', 'playing']
cols = ['c', 'm', 'r', 'b']
plt.pie(slices,
        labels=activities,
        colors=cols,
        startangle=90,
        explode=(0, 0.1, 0, 0),
        autopct='%1.1f%%')
plt.title('Pie Plot')
plt.show()
```


#### Working With Multiple Plots

```
import numpy as np
import matplotlib.pyplot as plt


def f(t):
    # Damped cosine
    return np.exp(-t) * np.cos(2 * np.pi * t)


t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)
plt.subplot(221)  # 2x2 grid, first cell
plt.plot(t1, f(t1), 'bo', t2, f(t2))
plt.subplot(222)  # 2x2 grid, second cell
plt.plot(t2, np.cos(2 * np.pi * t2))
plt.show()
```
