Basic Idea of the CLIQUE Algorithm.ppt


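CLIQUE takes a density-based approach. A cluster is a region in which the density of points is higher than in the adjacent regions. The data space is partitioned into grid units, and the number of points that fall into a unit is taken as that unit's density. Given a threshold value, a unit is called dense when the number of points in it exceeds that value. A cluster is then defined as a set of connected dense units.

Basic concepts

Let A = {A1, A2, ..., Ad} be a set of d domains. Then S = A1 × A2 × ... × Ad is a d-dimensional space, and A1, A2, ..., Ad are the dimensions (attributes) of S.

The input of the algorithm is a set of points in this d-dimensional space, V = {v1, v2, ..., vm}, where vi = <vi1, vi2, ..., vid> and the j-th component vij ∈ Aj.

An input parameter ξ divides every dimension of S into ξ intervals of equal length, so that the whole space is partitioned into finitely many non-overlapping rectangular units. Each unit can be written as {u1, u2, ..., ud}, where ui = [li, hi) is a left-closed, right-open interval.

A point v = <v1, v2, ..., vd> falls into a unit u = {u1, u2, ..., ud} if and only if li ≤ vi < hi for every ui. The selectivity of a unit is the fraction of all data points that fall into it, and a unit is dense if its selectivity exceeds the density threshold τ, which is the other input parameter.

For any subspace of S, for example Sub = At1 × At2 × ... × Atk (with k < d, and ti < tj whenever i < j), units, selectivity and the other notions are defined in the same way.

A cluster is a maximal set of connected dense units in a k-dimensional space. Two k-dimensional units u1 and u2 are connected if and only if:
(1) they share a common face; or
(2) both u1 and u2 are connected to another unit u3.
Two units u1 = {rt1, rt2, ..., rtk} and u2 = {r't1, r't2, ..., r'tk} share a common face when there are k-1 dimensions (say At1, At2, ..., Atk-1) such that rtj = r'tj for j = 1, 2, ..., k-1, and for the remaining dimension Atk either htk = l'tk or h'tk = ltk.

A region is an axis-parallel rectangular set: every side is parallel to a coordinate axis. Such a region is made up of units and has a regular shape, so it can be expressed as an intersection of intervals.
A region R is contained in a cluster C if and only if R ∩ C = R.
R is maximal if and only if no proper superset R' of R is also contained in C.
A minimal description of a cluster C is a set ℛ of such maximal regions that exactly covers C and is non-redundant, that is, no proper subset of ℛ covers C.

Example

[Figure not preserved. Labels: d-dimensional space, number of intervals, unit, selectivity of a unit, density threshold, dense unit, cluster, region, maximal region, minimal description of a cluster.]

Example

[Figure not preserved. Label: subspace.]

Problem statement

Given a set of data points and the input parameters ξ and τ, find clusters in all subspaces of the original data space and present a minimal description of each cluster in the form of a DNF expression.

The CLIQUE algorithm

1. Identification of subspaces that contain clusters
2. Identification of clusters
3. Generation of minimal descriptions for the clusters

Step 1: Identifying subspaces that contain clusters

A bottom-up algorithm finds the dense units.
It determines the 1-dimensional dense units by making a pass over the data.
Having determined the (k-1)-dimensional dense units, it determines the candidate k-dimensional units using the candidate generation procedure.
MDL-based pruning (MDL: Minimal Description Length) decides which subspaces, and the corresponding dense units, are interesting.

Candidate generation procedure

Input: Dk-1, the set of all (k-1)-dimensional dense units
Output: a superset of the set of all k-dimensional dense units
Algorithm: [listing not preserved]

MDL-based pruning

Compute the coverage of each subspace Sj.
Sort the subspaces in descending order of their coverage.
Divide the sorted list of subspaces into two sets: the selected set I and the pruned set P.
How to arrive at the cut point: the code length is minimized to determine the optimal cut point i.

Step 2: Identifying clusters

Input: a set of dense units D, all in the same k-dimensional space S
Output: a partition of D into D1, ..., Dq such that all units in Di are connected and no two units ui ∈ Di, uj ∈ Dj with i ≠ j are connected. Each such partition is a cluster.
Method: depth-first search algorithm
Start with some unit u in D, assign it the first cluster number, and find all the units it is connected to.
If there are still units in D that have not yet been visited, find one and repeat the procedure.

Step 3: Generating minimal cluster descriptions

Input: disjoint sets of connected k-dimensional units in the same subspace; each such set is a cluster
Output: a concise description for each cluster
Method:
1. Covering with maximal regions
2. Minimal cover

Concept: cover of a cluster

For a cluster C in a k-dimensional subspace S, a set W of regions in the same subspace S is a cover of C if every region R ∈ W is contained in C, and each unit in C is contained in at least one of the regions in W.

1. Covering with maximal regions

Input: a set C of connected dense units in the same k-dimensional space S
Output: a set W of maximal regions such that W is a cover of C
Method: greedy growth algorithm

Greedy growth algorithm

Begin with an arbitrary dense unit u1 ∈ C and greedily grow a maximal region R1 that covers u1. Add R1 to W.
Find another unit u2 ∈ C that is not yet covered by any of the maximal regions in W. Greedily grow a maximal region R2 that covers u2. Add R2 to W.
Repeat this procedure until all units in C are covered by some maximal region in W.

Obtaining a maximal region covering a dense unit u

Start with u and grow it along dimension a1 as much as possible in both directions (to the left and to the right of the unit), using connected dense units contained in C.
Grow the resulting region along a2.
Repeat for all the dimensions, yielding a maximal region covering u.

2. Minimal cover

Input: a cover for each cluster
Output: a minimal cover (minimality is defined in terms of the number of maximal regions required to cover the cluster)
Method: remove from the cover the smallest (in number of units) maximal region that is redundant. Repeat the procedure until no maximal region can be removed.

Algorithm summary

Step 1: Using the value of delta, divide every dimension of the original data table into equal intervals, and save the interval definitions of each dimension in the "Interval_Define" table.
Step 2: Set n = 1. At this point all units are candidate dense units.
Step 3: Scan the original data table and count the data points that fall into each candidate dense unit of the n-dimensional subspaces.
Step 4: Using the value of select thresh, find the dense units in the n-dimensional subspaces.
Step 5: Prune the subspaces with the MDL-based algorithm.
Step 6: From the set of dense units in the n-dimensional subspaces, derive the set of candidate dense units in the (n+1)-dimensional subspaces. If this candidate set is not empty, go to Step 3.
Step 7: Use the depth-first search algorithm to find the clusters in the n-dimensional space.
Step 8: Use the greedy growth algorithm to find the set of maximal regions covering each cluster.
Step 9: Compute the minimal cover of each cluster.
Step 10: Save the cluster information in the "Minning_Result_XB" table.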
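
The first two steps described above (bottom-up search for dense units, then depth-first search for connected units) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' implementation: the function names are hypothetical, every attribute is assumed to lie in [lo, hi), MDL-based pruning is omitted, and candidate generation is written as a generic Apriori-style join and prune, because the original procedure listing is not preserved.

```python
from collections import defaultdict
from itertools import combinations


def find_dense_units(points, xi, tau, lo=0.0, hi=1.0):
    """Bottom-up search for dense units in every subspace (no MDL pruning).

    A unit is a frozenset of (dimension, interval_index) pairs.
    Assumption: every attribute value lies in [lo, hi).
    """
    m, d = len(points), len(points[0])
    width = (hi - lo) / xi
    # Interval index of every point along every dimension.
    cells = [tuple(min(int((x - lo) / width), xi - 1) for x in p)
             for p in points]

    def keep_dense(candidates):
        counts = defaultdict(int)
        for c in cells:  # one pass over the data per level
            for u in candidates:
                if all(c[dim] == idx for dim, idx in u):
                    counts[u] += 1
        # Dense means: selectivity (fraction of points) exceeds tau.
        return {u for u, n in counts.items() if n / m > tau}

    # Level 1: every interval of every dimension is a candidate.
    level = keep_dense({frozenset([(dim, i)])
                        for dim in range(d) for i in range(xi)})
    result, k = {}, 1
    while level:
        result[k] = level
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            # Join: k pairs on k distinct dimensions.
            if len(u) == k and len({dim for dim, _ in u}) == k:
                # Prune: every (k-1)-dimensional projection must be dense.
                if all(u - {pair} in level for pair in u):
                    candidates.add(u)
        level = keep_dense(candidates)
    return result


def find_clusters(units):
    """Split dense units of one subspace into connected components by DFS."""
    units, seen, clusters = set(units), set(), []

    def neighbors(u):
        # Units sharing a common face differ by one interval in one dimension.
        for dim, idx in u:
            for step in (-1, 1):
                v = (u - {(dim, idx)}) | {(dim, idx + step)}
                if v in units:
                    yield v

    for start in units:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            component.add(u)
            for v in neighbors(u):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(component)
    return clusters


def clusters_by_subspace(dense_units):
    """Group dense units by subspace (tuple of dimensions), then cluster."""
    by_subspace = defaultdict(set)
    for level in dense_units.values():
        for u in level:
            by_subspace[tuple(sorted(dim for dim, _ in u))].add(u)
    return {s: find_clusters(us) for s, us in by_subspace.items()}


# Illustrative data: two adjacent dense cells plus one isolated dense cell.
points = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.15, 0.15),
          (0.3, 0.1), (0.35, 0.2), (0.4, 0.05),
          (0.8, 0.8), (0.9, 0.9), (0.85, 0.95)]
dense_units = find_dense_units(points, xi=4, tau=0.2)
clusters = clusters_by_subspace(dense_units)
for subspace, found in sorted(clusters.items()):
    print(subspace, [sorted(sorted(u) for u in c) for c in found])
```

With these ten points, xi = 4 and tau = 0.2, the sketch finds five 1-dimensional and three 2-dimensional dense units, and two clusters in the full subspace (0, 1): one made of two adjacent units and one made of a single unit. The counting loop is deliberately naive; a practical version would index candidates by subspace instead of testing every candidate against every point.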
