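[Machine Learning] K-means (Very Detailed)
Four priests go out to the countryside to preach. At first the priests pick a few preaching spots at random and announce them to all the villagers, so every villager goes to the spot closest to home to listen.
After the sermons, everyone feels the walk is too long, so each priest collects the addresses of all the villagers who attended his sermon, moves to the center of those addresses, and updates the location of his preaching spot on the poster.
A priest's move cannot bring him closer to everyone. Some villagers find that, after priest A moves, priest B's spot is now the closer one, so every villager again goes to the nearest preaching spot...
In this way the priests update their positions every week, the villagers choose a preaching spot according to their own situation, and eventually everything settles down.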
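1.2 Algorithm steps
So the steps of the K-means algorithm are:
- Choose k samples as the initial cluster centers a = a_1, a_2, \ldots, a_k ;
- For each sample x_i in the dataset, compute its distance to each of the k cluster centers and assign it to the class of the nearest center;
- For each class c_j with center a_j , recompute the center as a_j = \frac{1}{\left| c_j \right|}\sum_{x \in c_j} x (that is, the centroid of all samples belonging to that class);
- Repeat steps 2 and 3 until a stopping condition is met (number of iterations, minimum change in error, and so on).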
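The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the tolerance `tol`, and the rule that an empty cluster keeps its previous center are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the centroid of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once no center moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


# Toy data: two small, well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random start.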
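1.3 Complexity
Time complexity: O(tknm) , where t is the number of iterations, k the number of clusters, n the number of samples, and m the dimensionality of each sample.
Space complexity: O(m(n+k)) , where k is the number of clusters, m the dimensionality of each sample, and n the number of samples.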
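2. Advantages and disadvantages
2.1 Advantages
- Easy to understand, and the clustering quality is good; the result is only a local optimum, but a local optimum is often good enough;
- Scales well when processing large datasets;
- Works very well when the clusters are approximately Gaussian;
- Low algorithmic complexity.
2.2 Disadvantages
- K has to be set by hand, and different values of K give different results;
- Sensitive to the initial cluster centers: different initializations give different results;
- Sensitive to outliers;
- Each sample can belong to only one class, so it is not suited to tasks in which a sample may belong to several classes;
- Not suited to classes that are too dispersed, to imbalanced class sizes, or to non-convex cluster shapes.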
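3. Tuning and improvements
To address the shortcomings of K-means there are many ways to tune it, such as data preprocessing (removing outliers), choosing K sensibly, and mapping to a higher-dimensional space. They are briefly introduced below:
3.1 Data preprocessing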
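3.2 Choosing K sensibly
The choice of K strongly affects K-means, and this is also its biggest weakness. Common methods for choosing K are the elbow method and the Gap statistic.
In the elbow plot of the loss against K, the curve drops sharply when K < 3 and levels off when K > 3, so by the elbow method we take the inflection point K = 3 as the best value of K.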
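The elbow method can be tried out with a short script that records the loss for several values of K. The sketch below assumes the loss is the sum of squared distances to the assigned center; the helper `kmeans_sse`, the toy data (three Gaussian blobs), and the choice of 10 random restarts per K are all hypothetical choices for illustration.

```python
import math
import random


def kmeans_sse(points, k, rng, max_iter=100):
    """Run one K-means from a random start and return the final loss (SSE)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return sum(math.dist(p, centers[lab]) ** 2
               for p, lab in zip(points, labels))


rng = random.Random(0)
# Hypothetical toy data: three Gaussian blobs in 2-D.
blob_means = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
points = [(rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
          for mx, my in blob_means for _ in range(30)]

sse_by_k = {}
for k in range(1, 7):
    # Best of several random starts, since a single run can get stuck.
    sse_by_k[k] = min(kmeans_sse(points, k, rng) for _ in range(10))

for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting or simply reading the printed values shows where the loss stops dropping quickly; that bend is the elbow.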
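Gap(K)=\text{E}(\log D_k)-\log D_k
Here D_k is the loss function and E(\log D_k) is the expectation of \log D_k . This value is usually estimated by Monte Carlo simulation: within the region occupied by the samples, we draw as many random points as there are original samples from a uniform distribution, run K-means on these random points, and obtain one D_k . Repeating this many times, typically 20, gives 20 values of \log D_k ; their average approximates E(\log D_k) . The Gap statistic can then be computed, and the K at which it is largest is the best K.
As the figure shows, Gap(K) is largest at K=3, so the best number of clusters is K=3.
There is a project on GitHub called gap_statistic that makes it more convenient to get a suggested number of clusters.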
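The Monte Carlo procedure described above can be sketched directly. This is a hedged illustration, not the gap_statistic project's code: it assumes D_k is the sum of squared distances, draws the reference points uniformly inside the bounding box of the data, and uses hypothetical names (`kmeans_sse`, `gap`) and hypothetical toy data.

```python
import math
import random


def kmeans_sse(points, k, rng, max_iter=100):
    """Run one K-means from a random start and return the final loss D_k."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return sum(math.dist(p, centers[lab]) ** 2
               for p, lab in zip(points, labels))


def gap(points, k, rng, n_refs=20):
    """Gap(K) = mean over reference sets of log D_k, minus log D_k on the data."""
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    ref_logs = []
    for _ in range(n_refs):
        # Uniform reference sample with as many points as the data.
        ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
               for _ in points]
        ref_logs.append(math.log(kmeans_sse(ref, k, rng)))
    return sum(ref_logs) / n_refs - math.log(kmeans_sse(points, k, rng))


rng = random.Random(0)
# Hypothetical toy data: three Gaussian blobs in 2-D.
blob_means = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
points = [(rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
          for mx, my in blob_means for _ in range(20)]

gaps = {k: gap(points, k, rng) for k in range(1, 6)}
best_k = max(gaps, key=gaps.get)
print(gaps, best_k)
```

Because each K-means run starts from a random initialization, a single run on the data can land in a poor local optimum; taking the best of several restarts before computing the gap would make the estimate steadier.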
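3.3 Using a kernel function
K-means based on Euclidean distance assumes that the clusters have the same prior probability and are spherically distributed, but such distributions are not common in practice. For non-convex data shapes we can introduce a kernel function; the resulting algorithm is called kernel K-means and is one kind of kernel clustering method. The main idea of kernel clustering is to map the data points from the input space into a high-dimensional feature space through a nonlinear mapping and to cluster in the new feature space. The nonlinear mapping increases the probability that the data points are linearly separable, so where the classical clustering algorithm fails, introducing a kernel function can give more accurate clustering results.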
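To make the idea concrete, here is a small sketch of kernel K-means that never builds the feature map explicitly: distances to a cluster mean in feature space are computed from kernel values alone. The RBF kernel, its `gamma` value, the random initial labels, and the function names are assumptions chosen for the example.

```python
import math
import random


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is a free parameter chosen for illustration."""
    return math.exp(-gamma * math.dist(a, b) ** 2)


def kernel_kmeans(points, k, kernel=rbf_kernel, max_iter=50, seed=0):
    """Cluster in the implicit feature space using only kernel evaluations."""
    rng = random.Random(seed)
    n = len(points)
    gram = [[kernel(p, q) for q in points] for p in points]
    labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == j] for j in range(k)]
        # Squared distance from phi(x_i) to the mean of cluster c:
        #   K_ii - (2/|c|) * sum_l K_il + (1/|c|^2) * sum_{l,m} K_lm
        within = [
            sum(gram[l][m] for l in c for m in c) / len(c) ** 2 if c else 0.0
            for c in clusters
        ]
        new_labels = []
        for i in range(n):
            dists = []
            for j, c in enumerate(clusters):
                if not c:
                    dists.append(math.inf)  # never assign to an empty cluster
                    continue
                cross = sum(gram[i][l] for l in c) / len(c)
                dists.append(gram[i][i] - 2.0 * cross + within[j])
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical non-convex toy data: two concentric rings.
rng = random.Random(1)
points = []
for radius in (1.0, 5.0):
    for _ in range(30):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))

labels = kernel_kmeans(points, k=2)
print(labels)
```

Whether the two rings are actually separated depends on `gamma` and on the random start, so the result should be inspected rather than assumed.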
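3.4 K-means++
- Pick one center a_1 at random;
- For each data point x , compute the distance D(x) to the nearest of the centers chosen so far, and select the new center a_i with probability \frac{D(x)^2}{\sum D(x)^2} ;
- Repeat the second step until k centers have been chosen.
Put simply, K-means++ favors points that are far from the centers already chosen. This is intuitive: cluster centers should be as far apart from each other as possible.
The drawback of this algorithm is that it is hard to parallelize. k-means|| therefore changes the sampling strategy: instead of sampling only one point per pass over the data as k-means++ does, it samples k points per pass and repeats this about log(n) times, giving a set of about k log(n) points, from which k are then selected. In practice log(n) passes are usually unnecessary; 5 are enough.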
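The K-means++ seeding steps can be sketched as follows. The function name `kmeans_pp_init` is hypothetical, and the sketch assumes the data contains at least k distinct points (otherwise all weights become zero and `random.choices` raises an error).

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers with D(x)^2-weighted sampling."""
    rng = random.Random(seed)
    # Step 1: the first center is a uniformly random sample.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2: squared distance from x to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Step 2: sample the next center with probability D(x)^2 / sum D(x)^2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
        (20.0, 0.0), (21.0, 0.0), (20.0, 1.0)]
init_centers = kmeans_pp_init(data, k=3)
print(init_centers)
```

A point that is already a center has D(x) = 0 and therefore zero probability of being drawn again, which is why the chosen centers are distinct. The sampling is still random, so far-away points are only more likely, not guaranteed, to be picked.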
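3.5 ISODATA
ISODATA stands for Iterative Self-Organizing Data Analysis Technique. It addresses the drawback that K has to be fixed by hand in advance: with high-dimensional, massive datasets it is often hard to estimate K accurately. The idea behind ISODATA is intuitive: when a class has too few samples, remove it; when a class has too many samples and is widely dispersed, split it into two sub-classes.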
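The two rules named above (remove classes that are too small, split classes that are too spread out) can be illustrated with one simplified adjustment pass. This is not the full ISODATA procedure: the function name and the thresholds `min_size` and `max_std` are hypothetical, and the split simply places two new centers one standard deviation either side of the old center along the widest dimension.

```python
import math


def isodata_adjust(points, labels, k, min_size=3, max_std=2.0):
    """One simplified ISODATA pass: drop tiny clusters, split spread-out ones."""
    new_centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if len(members) < min_size:
            continue  # too few samples: remove this class
        center = [sum(c) / len(members) for c in zip(*members)]
        # Per-dimension standard deviation of the cluster.
        stds = [
            math.sqrt(sum((p[d] - center[d]) ** 2 for p in members) / len(members))
            for d in range(len(center))
        ]
        widest = max(range(len(stds)), key=stds.__getitem__)
        if stds[widest] > max_std and len(members) >= 2 * min_size:
            # Too dispersed: split into two sub-classes along the widest axis.
            lo, hi = list(center), list(center)
            lo[widest] -= stds[widest]
            hi[widest] += stds[widest]
            new_centers += [tuple(lo), tuple(hi)]
        else:
            new_centers.append(tuple(center))
    return new_centers


# Cluster 0 is really two groups lumped together; cluster 1 is a lone point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
          (50.0, 50.0)]
labels = [0, 0, 0, 0, 0, 0, 1]
new_centers = isodata_adjust(points, labels, k=2)
print(new_centers)
```

Here the lone point's class is dropped and the wide class is split, so the number of centers stays at two but their positions change. In a full loop, ordinary K-means assignment and update steps would run between such passes.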
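4. Convergence proof
Let us first recall the steps of K-means: pick initial centers at random, compute the class of each sample, then update the centers from the classes. Does this process remind you of the EM algorithm introduced earlier?
What we need to know is that the iterative K-means procedure is in fact an EM algorithm. EM solves parameter estimation in probabilistic models that contain unobserved latent variables. In K-means the latent variable is the class each sample belongs to. The step "relabel the samples once the centers are fixed" corresponds to the E step of EM (compute the expectation under the current parameters), and the step "recompute the centers from the labels" corresponds to the M step (find the parameters that maximize the likelihood, that is, minimize the loss).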
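A quick numerical check makes the E step / M step reading tangible: neither step can increase the loss, so the recorded values never go up. The sketch below is an illustration, not a proof; the function names and the uniform toy data are assumptions, and the loss is taken to be the sum of squared distances to the assigned center.

```python
import math
import random


def loss(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def e_step(points, centers):
    """Relabel every point with its nearest center (centers held fixed)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def m_step(points, labels, centers):
    """Recompute each center as the centroid of its points (labels held fixed)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)  # an empty cluster keeps its center
    return new_centers


rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
centers = rng.sample(points, 3)

history = []
for _ in range(15):
    labels = e_step(points, centers)
    history.append(loss(points, centers, labels))  # loss after the E step
    centers = m_step(points, labels, centers)
    history.append(loss(points, centers, labels))  # loss after the M step

print([round(h, 3) for h in history])
```

The E step picks the nearest center for each point, and the M step picks the centroid, which minimizes the squared distance within each class. Since the loss is bounded below by zero and there are only finitely many ways to partition the samples, the sequence must settle.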
Download K-Lite Codec Pack
There are four different variants of the K-Lite Codec Pack, ranging from a very small bundle that contains only the most essential decoders to a large and more comprehensive bundle. The global differences between the variants can be found below. Detailed differences can be found on the comparison of abilities and comparison of contents pages.
These codec packs are compatible with Windows Vista/7/8/8.1/10. Old versions also with XP.
The packs include both 32-bit and 64-bit codecs, so they work great on both x86 and x64 variants of Windows!
Basic
Small but extremely powerful!
Already contains everything you need to play all common audio and video file formats.
Supports playback of:
- AVI, MKV, MP4, FLV, MPEG, MOV, TS, M2TS, WMV, RM, RMVB, OGM, WebM
- MP3, FLAC, M4A, AAC, OGG, 3GP, AMR, APE, MKA, Opus, Wavpack, Musepack
- DVD and Blu-ray (after decryption)
- and many more audio and video formats!
Provides lots of useful functionality, such as:
- Subtitle display
- Hardware accelerated video decoding
- Audio bitstreaming
- Video thumbnails in Explorer
- File association options
- Broken codec detection
- and much more!
Note: the Basic version does NOT include a player
You need to use it together with an already installed DirectShow player such as Windows Media Player. For playback issues with WMP please read our F.A.Q. for solutions.
We strongly recommend using K-Lite Standard. That includes MPC-HC, which is a much better player than WMP.
Standard
Same as Basic, plus:
- Media Player Classic Home Cinema (MPC-HC)
This is an excellent player. Highly recommended!
It provides high quality playback and many useful options.
- MediaInfo Lite
This is a tool for getting details about media files.
This is the recommended variant for the average user. Use this if you don't know what you need. It already contains everything that you need for playback. The extra components that are included in the larger versions provide no benefit for the majority of users.
Full
Same as Standard, plus:
- MadVR
An advanced video renderer with high quality upscaling algorithms.
- Plugin for 3D video decoding (H.264 MVC)
Note: this requires using MPC-HC with madVR, and also a compatible graphics driver. Recent NVIDIA drivers no longer support 3D video (but you could try "3D Fix Manager").
Mega
Same as Full, plus:
- DC-Bass Source Mod
For decoding OptimFrog and Tracker audio files (very rare formats).
- GraphStudioNext
A tool for creating and testing DirectShow graphs.
- A few ACM/VFW codecs such as x264VFW and Lagarith
This type of codec is used by certain video editing/encoding applications for working with AVI files. For example VirtualDub.
These types of codecs are not used or needed for video playback!
- ffdshow audio processor
DirectShow filter that provides some audio processing options. Not used by default. Also not needed or recommended.
- ffdshow video processor
DirectShow filter that provides some video processing options. Not used by default. Also not needed or recommended.
Important note:
The K-Lite Codec Pack does not expand the import abilities of professional video editors such as Vegas Movie Studio or Adobe Premiere. Those applications often only support importing a small set of file formats, and do not support using the type of codecs that are included in the codec pack (DirectShow/VFW). Modern editors often only use their own internal codecs or only support external codecs of the Media Foundation type.
Update
Additional updates for the latest version of the codec pack.
This is not a stand-alone installer. This update requires that the latest version of Basic/Standard/Full/Mega is already installed.
Beta
Beta versions contain the latest updates and improvements, but they have not yet been tested as well as normal releases.
For experienced users who like to try out the latest stuff and want to provide feedback.
Beta versions are available for Basic/Standard/Full/Mega.
Old versions
Are you looking for an older version? Those can be found here.
If the latest version gives you a problem, then please report that to us so we can fix it!
It’s almost time to ride the K!
The K Line will serve the communities of West Adams, Jefferson Park, Baldwin Hills, Leimert Park, Hyde Park, Inglewood, Westchester and more. Seven K Line stations will be opening in fall 2022.
Welcome
The K Line was designed and built with the help of community input and local voices to provide a faster, more convenient and reliable way to connect to jobs, schools and the rich cultural places throughout these communities. The K Line will connect to the Metro E Line (Expo), which travels between downtown LA and Santa Monica. By 2024, the K Line will also connect to the new LAX/Metro Transit Center Station, the new Aviation/Century Station and the Metro C Line (Green).
Meet the K Line!
Note: Aviation/Century Station is projected to open in 2023, while LAX/Metro Transit Center Station is scheduled to open in 2024.
What exactly are Q, K, and V in a Transformer?
吴昊
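Q and K are used to transport information, while V is itself the information carried out of a token.
1. What are Q and K
Clever Xiao Ming is writing a very hard assignment in a bright classroom.
Suppose the token for the subject "Xiao Ming" is a point in a high-dimensional space \mathbb{R}^d . Once the adjective "clever" modifies this subject, the subject's position in \mathbb{R}^d should shift somewhat relative to the position of the unmodified token; otherwise there would be no difference between being modified and not being modified.
How does this shift arise? The query vector \mathbf{q} of the modified token is dotted with the key vector \mathbf{k} of the modifying token, the result multiplies the value vector \mathbf{v} of the modifying token, and the product is added to the original token vector \mathbf{x} . In other words, Q and K determine the magnitude of the shift (which can be positive or negative), while V determines its direction.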
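The description above can be turned into a tiny numeric example. This is a deliberately simplified sketch of the picture in the text, with made-up 2-D vectors and hypothetical names; a real Transformer additionally obtains q, k, and v from learned projections, scales the scores, and normalizes them with a softmax over all tokens.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def modify(x, q, k, v):
    """Shift x along v by an amount set by the inner product q . k."""
    score = dot(q, k)
    return [xi + score * vi for xi, vi in zip(x, v)]


x_subject = [1.0, 1.0]    # hypothetical vector of the token being modified
q_subject = [1.0, 0.0]    # its query vector
v_modifier = [0.0, 1.0]   # value vector of the modifying token

# A key that agrees with the query gives a positive shift along v ...
pushed = modify(x_subject, q_subject, [2.0, 3.0], v_modifier)
# ... and a key that opposes the query gives a negative shift along v.
pulled = modify(x_subject, q_subject, [-1.0, 5.0], v_modifier)

print(pushed, pulled)
```

In both cases the token moves only along the direction of v; q and k decide how far and with which sign.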