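While working through part 2 of Andrew Ng's 2022 course, 'Advanced Learning Algorithm', I ran into the following code in the .py file for the multiclass classification material in week 2: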
from sklearn.datasets import make_blobs

# make dataset for example
centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
X_train, y_train = make_blobs(n_samples=2000, centers=centers, cluster_std=1.0, random_state=30)
For the centers parameter, the official documentation gives this explanation:
centers : int or ndarray of shape (n_centers, n_features), default=None
    The number of centers to generate, or the fixed center locations. If n_samples is an int and centers is None, 3 centers are generated. If n_samples is array-like, centers must be either None or an array of length equal to the length of n_samples.
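Searching in Chinese and in English only turns up the same explanation again:
centers: int or array of shape [n_centers, n_features], optional (default=None). The number of centers to generate, or the fixed center locations. If n_samples is an int and centers is None, 3 centers are generated. If n_samples is array-like, centers must be either None or an array whose length equals the length of n_samples.
Even after reading this, I still could not make sense of it!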
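One way to make the documentation text more concrete is to call make_blobs once with each form of centers and look at what comes back. The sketch below is my own illustration, not part of the course file. It assumes scikit-learn 0.23 or later (for the return_centers flag), and the sample counts and random_state values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Form 1: centers is an int, so that many center locations are drawn at random.
X_a, y_a, c_a = make_blobs(n_samples=12, centers=3, n_features=2,
                           random_state=0, return_centers=True)
print(c_a.shape)          # (3, 2): 3 centers with 2 features each

# Form 2: centers is an array of fixed locations, one row per class.
fixed = np.array([[-5, 2], [-2, -2], [1, 2], [5, -2]])
X_b, y_b, c_b = make_blobs(n_samples=12, centers=fixed,
                           random_state=0, return_centers=True)
print(c_b.shape)          # (4, 2)

# Form 3: n_samples is array-like, giving the sample count for each center,
# so centers must have the same length (here 3).
X_c, y_c = make_blobs(n_samples=[3, 5, 2], centers=fixed[:3], random_state=0)
print(np.bincount(y_c))   # [3 5 2]
```

In every case the number of rows in centers is the number of classes, and the number of columns is the number of features.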
Example:
So let's modify the code and see how the output changes:
centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
X_train, y_train = make_blobs(n_samples=20, centers=centers, cluster_std=1.0, random_state=30)
print(X_train)
print(y_train)
The output is:
[[-5.10069672  2.30379318]
 [ 6.11347211 -3.92116972]
 [ 0.994222    1.53252103]
 [-5.97071094  2.47055962]
 [-2.28564551 -1.46163252]
 [ 5.81050091 -3.04477837]
 [-4.86570341  0.89314453]
 [-0.42177445 -1.89250206]
 [ 4.29859758 -1.15091215]
 [-6.72596243  3.58509537]
 [-1.9033676   3.61689037]
 [ 1.98501786  0.29953473]
 [-6.26405266  3.52790535]
 [-2.76404783 -2.77518851]
 [-0.61615283 -1.23961492]
 [ 2.42550989  1.33524488]
 [-4.08389663 -1.06221829]
 [ 4.31077063 -2.85275686]
 [ 0.5769847   3.06448209]
 [ 3.89985619 -3.31564409]]
[0 3 2 0 1 3 0 1 3 0 2 2 0 1 1 2 1 3 2 3]
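Analysis:
The first row of the result, [-5.10069672 2.30379318], is close to the preset center [-5, 2] (close rather than equal, because the standard deviation is set to 1.0), and its y label is 0. The second row, [ 6.11347211 -3.92116972], is close to the preset center [5, -2] for the same reason, and its y label is 3.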
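Checking rows by eye works for 20 samples but not for 2000. As a rough sketch of the same comparison done programmatically (my own check, not from the course), the code below finds the nearest center for every sample and measures how often its index equals the label. Because the clusters overlap slightly, the agreement is expected to be high but not necessarily 100%.

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = np.array([[-5, 2], [-2, -2], [1, 2], [5, -2]])
X_train, y_train = make_blobs(n_samples=2000, centers=centers,
                              cluster_std=1.0, random_state=30)

# Euclidean distance from every sample to every center: shape (2000, 4).
dists = np.linalg.norm(X_train[:, None, :] - centers[None, :, :], axis=2)

# Index of the closest center for each sample.
nearest = dists.argmin(axis=1)

# Fraction of samples whose closest center index equals their label.
agreement = (nearest == y_train).mean()
print(f"nearest center matches label for {agreement:.1%} of samples")
```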
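Inference:
In the centers list [[-5, 2], [-2, -2], [1, 2], [5, -2]], the index of each inner list is the value that y takes for that class. The list does not describe intervals. Each inner list is a point in feature space, that is, the (x1, x2) coordinates of one cluster center (verified below), and the samples of that class are scattered around it with standard deviation 1. For example, the fifth row of the result, [-2.28564551 -1.46163252], is close to [-2, -2], and its y value is 1, which is exactly the index of [-2, -2] in the centers list.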
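If the index of each center really is the label, then averaging all samples with label k should land close to centers[k]. The following is a minimal sketch of that check using the original 2000-sample setup; the 0.3 tolerance mentioned in the comment is my own loose choice, not something from the documentation.

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
X_train, y_train = make_blobs(n_samples=2000, centers=centers,
                              cluster_std=1.0, random_state=30)

# Mean of the samples in each class, in label order 0, 1, 2, 3.
class_means = np.array([X_train[y_train == k].mean(axis=0)
                        for k in range(len(centers))])

for k, (center, mean) in enumerate(zip(centers, class_means)):
    print(f"label {k}: center {center}, sample mean {np.round(mean, 2)}")

# Largest coordinate-wise gap between a class mean and its center;
# with about 500 samples per class this should be well below 0.3.
max_gap = np.abs(class_means - np.array(centers)).max()
print(f"largest gap: {max_gap:.3f}")
```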
Verification:
Suppose I now want to generate a sample set with 3 feature variables (x1, x2, x3) and 5 classes.
centers = [[-5, 2, 1], [5, -2, 2], [10, 2, 3], [15, -2, 4], [20, 3, 6]]
X_train, y_train = make_blobs(n_samples=20, centers=centers, cluster_std=1.0, random_state=30)
print(X_train)
print(y_train)
The output:
[[ 5.76038508 -2.28564551  2.53836748]
 [ 7.0966324   3.61689037  4.42550989]
 [19.00816561  1.83671763  5.9786649 ]
 [ 9.33524488  2.98501786  1.29953473]
 [-4.52944038  1.89930328  1.30379318]
 [21.2418555   4.70774688  6.3231534 ]
 [ 8.89985619  0.68435591  3.81050091]
 [-6.26405266  3.52790535  0.02928906]
 [19.61298353  1.07269461  6.55075659]
 [ 8.95522163  1.31077063  2.14724314]
 [14.97080728 -0.60594402  3.60213256]
 [-6.72596243  3.58509537  1.13429659]
 [20.95435581  3.7827765   4.2067621 ]
 [ 4.23595217 -2.77518851  3.38384717]
 [ 2.91610337 -1.06221829  1.994222  ]
 [16.11347211 -3.92116972  3.29859758]
 [-6.10685547  3.57822555  1.10749794]
 [16.01912738 -0.1011187   3.64515036]
 [ 4.53252103 -2.4230153   3.06448209]
 [15.84908785 -0.94930021  3.46312554]]
[1 2 4 2 0 4 2 0 4 2 3 0 4 1 1 3 0 3 1 3]
Take any row, say row 13, [20.95435581 3.7827765 4.2067621 ]. It is close to [20, 3, 6] in centers, and its y label is the fifth class (index 4).
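The same conclusion can be read off the array shapes and label counts rather than from individual rows. This is a small sketch of my own using the same call as above: the number of columns in X_train equals the length of each inner list, the labels run over the indices of centers, and the 20 samples are split evenly across the 5 centers.

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = [[-5, 2, 1], [5, -2, 2], [10, 2, 3], [15, -2, 4], [20, 3, 6]]
X_train, y_train = make_blobs(n_samples=20, centers=centers,
                              cluster_std=1.0, random_state=30)

print(X_train.shape)          # (20, 3): 20 samples, 3 features
print(np.unique(y_train))     # [0 1 2 3 4]: one label per center index
print(np.bincount(y_train))   # [4 4 4 4 4]: samples divided evenly
```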