Deep learning notes: NumPy dot, cumsum, fancy indexing, boolean masking, and more

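1. `os.path.abspath(__file__)` returns the absolute path of the current .py file. Note that `__file__` is only defined when the code runs from a .py file, so this expression cannot be evaluated on its own in an interactive interpreter session. `os.path.dirname()` returns the folder that contains the given path.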
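As a minimal sketch of the two calls, the helper below returns the absolute path and the containing folder for any path. `describe_path` is a hypothetical name introduced here, not part of `os`. Inside a script you would pass `__file__`; the sketch falls back to a made-up file name, `example.py`, when `__file__` is not defined, as happens in an interactive session.

```python
import os


def describe_path(path):
    """Return (absolute path, containing folder) for the given path."""
    abs_path = os.path.abspath(path)
    return abs_path, os.path.dirname(abs_path)


# __file__ exists only when the code runs from a .py file; "example.py" is a
# placeholder (an assumption) so the sketch also runs interactively.
current = globals().get("__file__", "example.py")
abs_path, folder = describe_path(current)
print(abs_path)  # absolute path of the file
print(folder)    # folder that contains it
```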

2. NumPy

2.1 `numpy.ndarray()`

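The function in the heading is the constructor of NumPy's `ndarray` class, and it can be used to create an ndarray object directly. The constructor takes the following parameters (all of them optional except `shape`):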

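| Parameter | Type | Purpose |
| --- | --- | --- |
| shape | tuple of ints | Shape of the multidimensional array |
| dtype | data-type | Type of the elements in the array |
| buffer | object exposing the buffer interface | Buffer used to initialize the array |
| offset | int | Offset, in bytes, of the first array element within the buffer |
| strides | tuple of ints | Number of bytes the data pointer advances in memory when the index along each axis increases by 1 |
| order | 'C' or 'F' | 'C': row-major; 'F': column-major |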
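To make the `buffer`, `offset`, and `strides` parameters from the table concrete, here is a small sketch. The variable names (`raw`, `view`, `initial`) and the chosen numbers are illustrative assumptions; the point is only that the constructor reinterprets existing memory rather than copying it.

```python
import numpy as np

# Backing storage: 8 int32 values, 8 * 4 = 32 bytes in total.
raw = np.arange(8, dtype=np.int32)

# Build a 2x3 array on top of that memory.
view = np.ndarray(
    shape=(2, 3),
    dtype=np.int32,
    buffer=raw,
    offset=4,          # skip the first element (4 bytes)
    strides=(12, 4),   # next row: +12 bytes, next column: +4 bytes
)
initial = view.copy()  # snapshot of the values before any change
print(initial)

# The new array shares memory with the buffer, so a write to raw is visible.
raw[1] = 100
print(view[0, 0])
```

In everyday code `np.array`, `np.zeros`, and similar helpers are the usual way to create arrays; calling the constructor directly is mostly useful when wrapping memory that already exists.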

Reference: What is the difference between C order and F order in NumPy?

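The values C and F of the order parameter are the two layouts NumPy can use for an array's element storage: the C-language layout and the Fortran-language layout.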

Create a 3×3 two-dimensional array:

>>> import numpy as np
>>> a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)

How array a is laid out in its data storage area in memory (default order="C"; each cell is 4 bytes):

|1|2|3|4|5|6|7|8|9|

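In C order, when the index along the first dimension (axis 0) increases by 1, the element's address in memory increases by the size of 3 elements, which is 12 bytes in this example: from the address of 1 to the address of 4. Therefore:

>>> a.strides
(12, 4)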

If the array is created in F order instead:

>>> b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32, order="F")

How array b is laid out in its data storage area in memory:

|1|4|7|2|5|8|3|6|9|

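In Fortran order, when the index along the first dimension (axis 0) increases by 1, the element's address increases by the size of one element, which is 4 bytes in this example: from the address of 1 to the address of 4. Therefore:

>>> b.strides
(4, 12)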
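The stride values and memory layouts described above can be checked directly. The sketch below is one way to do so; it uses `ravel(order="K")`, which reads elements in the order they occur in memory, to make the two layouts visible. The variable names are illustrative.

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = np.array(values, dtype=np.float32)             # default C order
b = np.array(values, dtype=np.float32, order="F")  # Fortran order

# Both arrays hold the same logical 3x3 matrix ...
print(np.array_equal(a, b))

# ... but they step through memory differently.
print(a.strides, b.strides)

# order="K" reads the elements in the order they sit in memory.
memory_a = a.ravel(order="K")
memory_b = b.ravel(order="K")
print(memory_a)
print(memory_b)
```

Indexing, arithmetic, and comparisons behave the same for both arrays; the order only affects how the elements are arranged in memory.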

2.2

The order in which multidimensional arrays are stored in memory.

Take a two-dimensional array a[2][2] as an example. In C, it is stored in memory as:

a[0][0] a[0][1] a[1][0] a[1][1]

whereas in Fortran the order is:

a[0][0] a[1][0] a[0][1] a[1][1]
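The two orderings follow from a simple rule: in C order the last index varies fastest, in Fortran order the first index does. The pure-Python sketch below computes an element's position in the flat storage sequence; `flat_position` is a hypothetical helper written for this note, not a NumPy function.

```python
def flat_position(index, shape, order="C"):
    """Position of the element `index` in the flat storage sequence."""
    pairs = list(zip(index, shape))
    if order == "F":
        pairs.reverse()  # Fortran: the first index varies fastest
    position = 0
    for i, n in pairs:
        position = position * n + i
    return position


shape = (2, 2)
indices = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Sort the indices by their storage position to list them in memory order.
c_order = sorted(indices, key=lambda ix: flat_position(ix, shape, "C"))
f_order = sorted(indices, key=lambda ix: flat_position(ix, shape, "F"))
print(c_order)
print(f_order)
```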

Example:

>>> np.ndarray(shape=(2,3), dtype=int, buffer=np.array([1,2,3,4,5,6,7]), offset=0, order="C")
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.ndarray(shape=(2,3), dtype=int, buffer=np.array([1,2,3,4,5,6,7]), offset=0, order="F")
array([[1, 3, 5],
       [2, 4, 6]])

Reference: A summary of data formats and the differences between type, astype, and dtype

2.3 The `dot()` function

>>> a
array([4, 6, 2])
>>> b
array([-2,  5, 10])
>>> c = a.dot(b)
>>> c  # 4*(-2) + 6*5 + 2*10 = 42; for 1-D arrays the result is the inner product (multiply matching elements, then sum)
42
>>> type(c)
<class 'numpy.int32'>
>>> d
array([[1, 2],
       [3, 4]])
>>> e
array([[5, 6],
       [7, 8]])
>>> f = d.dot(e)
>>> f  # for 2-D arrays the result is the matrix product, e.g. 1*5 + 2*7 = 19 and 3*5 + 4*7 = 43
array([[19, 22],
       [43, 50]])
>>> type(f)
<class 'numpy.ndarray'>
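A short sketch that reproduces the two cases above and contrasts them with related operations. The `@` operator and `np.dot` give the same result for 2-D arrays, while `*` multiplies element by element. The variable names are illustrative.

```python
import numpy as np

a = np.array([4, 6, 2])
b = np.array([-2, 5, 10])

# 1-D case: inner product, computed by NumPy and by hand.
inner = np.dot(a, b)
manual_inner = sum(int(x) * int(y) for x, y in zip(a, b))
print(inner, manual_inner)

d = np.array([[1, 2], [3, 4]])
e = np.array([[5, 6], [7, 8]])

# 2-D case: matrix product, via dot() and via the @ operator.
product = np.dot(d, e)
same_with_operator = d @ e
print(product)

# For contrast: * is element-wise multiplication, not a matrix product.
elementwise = d * e
print(elementwise)
```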

2.4 The `numpy.cumsum(a, axis=None)` function

>>> import numpy as np
>>> a = np.arange(4)
>>> a = a.reshape((2, 2))
>>> a
array([[0, 1],
       [2, 3]])
>>> np.cumsum(a)  # no axis argument: a is treated as a flattened 1-D array, giving [0, 0+1, 0+1+2, 0+1+2+3]
array([0, 1, 3, 6], dtype=int32)
>>> np.cumsum(a, 0)  # axis=0: accumulate down the rows, within each column, giving [[0, 1], [0+2, 1+3]]
array([[0, 1],
       [2, 4]], dtype=int32)
>>> np.cumsum(a, 1)  # axis=1: accumulate across the columns, within each row, giving [[0, 0+1], [2, 2+3]]
array([[0, 1],
       [2, 5]], dtype=int32)
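The three calls above can be cross-checked against a hand-written running total. This is a minimal sketch with illustrative variable names; it also shows that the last entry along the accumulated axis equals the plain sum along that axis.

```python
import numpy as np

a = np.arange(4).reshape((2, 2))

flat = np.cumsum(a)                  # flattened running total
down_rows = np.cumsum(a, axis=0)     # accumulate along axis 0
across_cols = np.cumsum(a, axis=1)   # accumulate along axis 1

# The same flattened result, computed with an explicit loop.
running = []
total = 0
for value in a.ravel():
    total += int(value)
    running.append(total)

print(flat, running)
print(down_rows)
print(across_cols)

# The final row/column of a cumulative sum is the ordinary sum.
print(np.array_equal(down_rows[-1], a.sum(axis=0)))
print(np.array_equal(across_cols[:, -1], a.sum(axis=1)))
```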

2.5 Fancy indexing

>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
>>> indices = [1, 5, -1]
>>> b = a[indices]  # pass a list of indices to pick several elements at once
>>> b
array([10, 50, 90])
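Two properties of fancy indexing are worth seeing in code: the result is a copy rather than a view (unlike a slice), and on a 2-D array a pair of index lists selects individual elements. The sketch below illustrates both; the `grid` array and the other variable names are made up for this example.

```python
import numpy as np

a = np.arange(0, 100, 10)

# Fancy indexing returns a copy: changing it leaves a untouched.
picked = a[[1, 5, -1]]
picked[0] = -1
print(a[1])       # still 10

# A slice, by contrast, is a view onto the same memory.
window = a[1:3]
window[0] = 11
print(a[1])       # now 11

# On a 2-D array, two index lists are paired up element by element:
# this selects grid[0, 1] and grid[2, 3].
grid = np.arange(12).reshape(3, 4)
corners = grid[[0, 2], [1, 3]]
print(corners)
```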

2.6 Boolean masking

import matplotlib.pyplot as plt
a = np.linspace(0, 2 * np.pi, 50) 
a    
array([0.        , 0.12822827, 0.25645654, 0.38468481, 0.51291309,
       0.64114136, 0.76936963, 0.8975979 , 1.02582617, 1.15405444,
       1.28228272, 1.41051099, 1.53873926, 1.66696753, 1.7951958 ,
       1.92342407, 2.05165235, 2.17988062, 2.30810889, 2.43633716,
       2.56456543, 2.6927937 , 2.82102197, 2.94925025, 3.07747852,
       3.20570679, 3.33393506, 3.46216333, 3.5903916 , 3.71861988,
       3.84684815, 3.97507642, 4.10330469, 4.23153296, 4.35976123,
       4.48798951, 4.61621778, 4.74444605, 4.87267432, 5.00090259,
       5.12913086, 5.25735913, 5.38558741, 5.51381568, 5.64204395,
       5.77027222, 5.89850049, 6.02672876, 6.15495704, 6.28318531])
b = np.sin(a)
b
array([ 0.00000000e+00,  1.27877162e-01,  2.53654584e-01,  3.75267005e-01,
        4.90717552e-01,  5.98110530e-01,  6.95682551e-01,  7.81831482e-01,
        8.55142763e-01,  9.14412623e-01,  9.58667853e-01,  9.87181783e-01,
        9.99486216e-01,  9.95379113e-01,  9.74927912e-01,  9.38468422e-01,
        8.86599306e-01,  8.20172255e-01,  7.40277997e-01,  6.48228395e-01,
        5.45534901e-01,  4.33883739e-01,  3.15108218e-01,  1.91158629e-01,
        6.40702200e-02, -6.40702200e-02, -1.91158629e-01, -3.15108218e-01,
       -4.33883739e-01, -5.45534901e-01, -6.48228395e-01, -7.40277997e-01,
       -8.20172255e-01, -8.86599306e-01, -9.38468422e-01, -9.74927912e-01,
       -9.95379113e-01, -9.99486216e-01, -9.87181783e-01, -9.58667853e-01,
       -9.14412623e-01, -8.55142763e-01, -7.81831482e-01, -6.95682551e-01,
       -5.98110530e-01, -4.90717552e-01, -3.75267005e-01, -2.53654584e-01,
       -1.27877162e-01, -2.44929360e-16])
plt.plot(a,b) # plot the curve (a, b): Figure 1
b >= 0	# the elements at indices 0 through 24 are all True
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])
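mask = (b >= 0) # True where the element of b is greater than or equal to zero, False otherwise; the resulting boolean array is assigned to mask
plt.plot(a[mask], b[mask], 'bo') # plot the points of a and b where the mask is True, i.e. indices 0-24: Figure 2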
a <= np.pi / 2
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])
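mask = [(b >= 0) & (a <= np.pi / 2)] # the intersection of the two conditions is True for indices 0-12
plt.plot(a[mask], b[mask], 'go') # plot these (a, b) points: Figure 3
plt.show() # display the figure; the three plot() calls above had not been shown yet, so show() draws them all together: Figure 4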

Figure 1

Figure 2

Figure 3

Figure 4

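Note that the following code triggers a warning. It says that a non-tuple sequence should not be used as an array index for multidimensional indexing, and that later versions will interpret it differently, producing an error or a different result.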

plt.plot(a[mask], b[mask], 'go')
Warning (from warnings module):
  File "__main__", line 1
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
[<matplotlib.lines.Line2D object at 0x0C78ADF0>]

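Following the hint, the call can be changed to:

plt.plot(a[tuple(mask)], b[tuple(mask)], 'go')
[<matplotlib.lines.Line2D object at 0x0C6391B0>]

The warning appears because mask was built as a list that wraps the boolean array. A simpler alternative is to drop the square brackets, `mask = (b >= 0) & (a <= np.pi / 2)`, so that mask is an ordinary boolean array and `a[mask]` works directly.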
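The sketch below rebuilds the combined mask as a plain boolean array, without the enclosing list, and checks which indices it selects. Plotting is left out so the example needs only NumPy; the variable names are illustrative.

```python
import numpy as np

a = np.linspace(0, 2 * np.pi, 50)
b = np.sin(a)

# Combine the two conditions element-wise with &.
# Note: no surrounding [] here, so mask is a 1-D boolean array.
mask = (b >= 0) & (a <= np.pi / 2)

selected_indices = np.flatnonzero(mask)  # positions where mask is True
print(selected_indices)

# Indexing with the boolean array keeps only the selected points.
a_selected = a[mask]
b_selected = b[mask]
print(a_selected.shape, b_selected.shape)
```

With this mask, `plt.plot(a[mask], b[mask], 'go')` can be called directly and `tuple(mask)` is not needed.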

One last addition:

>>> a
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> b = (a>3)
>>> b
array([[False, False],
       [False, False],
       [ True,  True],
       [ True,  True],
       [ True,  True]])
>>> a[b]
array([4, 5, 6, 7, 8, 9])
>>> a[a>3]  # equivalent to indexing with a boolean array, as above
array([4, 5, 6, 7, 8, 9])
>>>
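A few related uses of a boolean mask on a 2-D array, as a hedged sketch: selection flattens the result to 1-D, assignment through the mask changes elements in place, and reducing the mask along an axis lets you keep whole rows. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)

# Selection with a 2-D mask returns the matching elements as a 1-D array.
selected = a[a > 3]
print(selected)

# Assignment through a mask modifies only the matching elements.
clipped = a.copy()
clipped[clipped > 3] = 0
print(clipped)

# Reduce the mask along axis 1 to keep the rows in which every element is > 3.
row_mask = (a > 3).all(axis=1)
rows = a[row_mask]
print(rows)
```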
