Similar Literature
Found 20 similar documents (search time: 468 ms)
1.
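To address the low similarity values caused by high-dimensional, sparse data in text clustering and classification, a clustering method based on an improved text similarity calculation is proposed. First, texts are represented with the vector space model (VSM), and the cosine function is used to compute the similarity between texts. Next, following the principle of similarity propagation between nodes in a network, a threshold is set to find, for each text, the set of texts most similar to it; the Jaccard coefficient then turns the similarity between two texts into the similarity between their two text sets. Finally, spectral clustering is applied to the resulting text similarity matrix. Experiments on WebKB show that, compared with traditional K-means and spectral clustering, the method improves clustering accuracy, recall, and F-measure.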
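The similarity transformation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the threshold value, the toy vectors, the function names, and the choice to include each text in its own neighbor set are all assumptions, and the final spectral clustering step is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_sets(vectors, threshold):
    """For each text, collect the indices of texts whose cosine similarity
    reaches the threshold (assumption: a text counts as its own neighbor)."""
    n = len(vectors)
    return [
        {j for j in range(n) if cosine(vectors[i], vectors[j]) >= threshold}
        for i in range(n)
    ]


def jaccard(a, b):
    """Jaccard coefficient of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_similarity_matrix(vectors, threshold=0.5):
    """Replace text-to-text similarity with similarity between neighbor sets."""
    sets = neighbor_sets(vectors, threshold)
    return [[jaccard(si, sj) for sj in sets] for si in sets]


# Hypothetical toy VSM vectors: texts 0 and 1 share a term, text 2 does not.
docs = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
matrix = jaccard_similarity_matrix(docs, threshold=0.5)
```

The resulting matrix could then be passed to any spectral clustering routine that accepts a precomputed affinity matrix.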

2.
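杨莉云, 颜远海. 《河南科学》 (Henan Science), 2019, 37(4): 507-513
Outliers introduce large errors into the computation of cluster centers and degrade the clustering quality of the K-means algorithm. To address this, the Schelling model is introduced so that outliers automatically move to where their neighbors are, which eliminates them; in addition, the distance calculation and initial-center selection steps of K-means are improved, yielding an outlier-adaptive K-means algorithm. The algorithm first normalizes the raw data to make distance calculations more accurate. Then, following the basic idea of the Schelling model, it moves each outlier to its nearest neighbor that itself has many neighbors. Next, it uses the number of clusters to set the search range for neighboring samples and determines the initial cluster centers. Finally, it runs K-means on the relocated data set with those initial centers. Experiments on classic clustering data sets from the UCI Machine Learning Repository show that the algorithm significantly improves clustering accuracy and that the resulting clusters are also fairly cohesive.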

3.
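To solve the problem of parallel clustering of massive data in cloud computing environments, a MapReduce-based parallel clustering optimization algorithm is proposed, taking the typical distance-based K-means algorithm as an example. First, differential evolution is combined with K-means, so that the strong global search capability of differential evolution overcomes the sensitivity of standard K-means to the initial centers and helps stabilize the global optimum. The optimized algorithm is then parallelized under the MapReduce framework of Hadoop. Experiments show that, compared with several other distributed designs, the proposed algorithm greatly reduces running time while preserving clustering quality, improving clustering efficiency on large-scale data.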

4.
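In bisecting K-means, misassigned instances cannot take part in subsequent splits, which lowers clustering accuracy. To address this, a bisecting K-means algorithm based on re-judging a subset of instances is proposed. By distinguishing the target cluster from the candidate clusters, it filters out the recalled instances in the candidate clusters and re-judges which cluster each recalled instance should belong to, so that misassigned instances are clustered correctly. Experiments show that the improved algorithm is effective on all three experimental data sets: it improves clustering accuracy to varying degrees and also slightly improves running speed.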

5.
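To address synonym recognition and name standardization for disease-diagnosis text in electronic medical records, an adaptive text clustering method is proposed. First, a new set-based text similarity measure is introduced. Then a text clustering algorithm based on the similarity distribution is used to identify synonymous texts; this algorithm determines the number of clusters automatically. Finally, a central-concept extraction algorithm based on sequential patterns standardizes disease names while merging and optimizing the clusters, which further improves clustering accuracy. Tests show that the method achieves high accuracy and clustering efficiency and is broadly useful for the preprocessing, classification, and analysis of medical record text.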

6.
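Short-text clustering plays an important role in data mining, but traditional short-text clustering models suffer from high dimensionality, sparse data, and a lack of semantic information. To address the poor clustering performance caused by the sparse features, semantic ambiguity, and dynamic nature of Internet short texts, a text representation based on feature word vectors and a short-text clustering algorithm based on feature word mover's distance are proposed. First, the Skip-gram model (Continuous Skip-gram Model) is trained on a large corpus to obtain word vectors that capture the semantics of feature words. Then Euclidean distance is used to measure the similarity between feature words, and EMD (Earth Mover's Distance) is introduced to compute the similarity between short texts. Finally, this measure is applied within the K-means clustering algorithm to cluster short texts. Evaluation on three data sets shows that the method outperforms traditional clustering algorithms.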

7.
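Affinity Propagation (AP) clustering is an algorithm based on message passing between data points; it clusters mainly by means of the similarities between data points. Compared with traditional clustering methods, AP clustering does not need the number of clusters to be given in advance and is therefore fast and efficient. On high-dimensional, complex data sets, however, its accuracy remains low even as clustering efficiency improves. To improve both the efficiency and the accuracy of AP clustering, a coarse-grained parallel AP clustering algorithm based on intra-class and inter-class distances, IOCAP, is proposed. First, the idea of granularity is introduced to partition the initial data set into several subsets. Second, an improved similarity matrix is computed for each subset by combining intra-class and inter-class distances. Finally, the improved parallel AP clustering is implemented on the MapReduce model. Experiments on real data sets show that IOCAP adapts well to large data sets and effectively improves accuracy while preserving the clustering quality of AP.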

8.
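The traditional K-means clustering algorithm measures sample similarity with Euclidean distance, treating all attribute features equally and ignoring their different contributions, so the similarity computation is not accurate enough. To remedy this, a feature-weighted K-means algorithm is proposed. First, feature weights are computed with the Softmax and Sigmoid logistic regression functions so that the weighted Euclidean distance represents sample similarity more accurately. Second, the strategy for choosing initial cluster centers is optimized: K samples that are far apart are selected as the initial centers, which effectively avoids misclustering and empty clusters. Experiments show that on UCI standard data sets the weighted K-means algorithm effectively reduces the number of iterations and improves clustering accuracy, precision, and recall.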
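A weighted Euclidean distance of the kind this abstract mentions might look like the sketch below. The abstract does not say how the Softmax and Sigmoid functions are combined or what they are applied to, so this example makes a simplifying assumption: it applies softmax alone to a hypothetical list of per-feature importance scores. The function names and sample values are illustrative only.

```python
import math


def softmax(scores):
    """Turn arbitrary per-feature scores into positive weights summing to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_euclidean(x, y, weights):
    """Euclidean distance in which each squared difference is scaled
    by the weight of its feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Hypothetical importance scores: the first feature matters more.
weights = softmax([2.0, 0.0])
distance = weighted_euclidean([0.0, 0.0], [1.0, 1.0], weights)
```

With equal scores the weights are uniform, and the distance is the ordinary Euclidean distance up to a constant factor, so the weighting only changes results when features really differ in importance.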

9.
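A latent semantic document retrieval algorithm based on pre-clustering is proposed. First, the document collection to be searched is pre-clustered: building on latent semantic analysis, the k-means clustering algorithm is used to find the center of each cluster. Then, at query time, retrieval is performed by computing the similarity between the query vector and each cluster center. This overcomes a shortcoming of existing latent semantic retrieval algorithms, which spend a great deal of time computing the similarity between the query vector and every text vector. In addition, the feature weight calculation is redefined to suit the characteristics of document retrieval. Experiments show that the method shortens retrieval time and improves retrieval efficiency.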
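The retrieval step can be illustrated with a short sketch. It assumes the clusters and their centers have already been computed, and it assumes that documents inside the best-matching cluster are then ranked by cosine similarity; the abstract states only that the query is compared with the cluster centers. The function `retrieve` and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, centroids, clusters, top_k=3):
    """Pick the cluster whose center is most similar to the query, then
    rank only that cluster's documents instead of the whole collection.

    centroids: list of center vectors, one per cluster
    clusters:  list of lists of (doc_id, vector) pairs, aligned with centroids
    """
    best = max(range(len(centroids)), key=lambda c: cosine(query, centroids[c]))
    ranked = sorted(clusters[best], key=lambda d: cosine(query, d[1]), reverse=True)
    return best, [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical two-cluster collection in a 2-dimensional latent space.
centroids = [[1.0, 0.0], [0.0, 1.0]]
clusters = [
    [("a", [1.0, 0.1]), ("b", [1.0, 0.5])],
    [("c", [0.1, 1.0])],
]
result = retrieve([1.0, 0.0], centroids, clusters, top_k=1)
```

The query is compared with one vector per cluster plus the members of a single cluster, rather than with every document, which is where the time saving would come from.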

10.
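To address a series of problems in the original K-means algorithm, an improved semi-supervised K-means clustering algorithm is proposed that clusters automatically, finds the optimal value of K, and identifies as many outliers as possible. First, based on the characteristics of the sample set itself, data sets are formed step by step following the principle that members of a class should be "as similar as possible". Then the data sets are "denoised" and similar clusters are merged. Finally, a small amount of label information is used to guide and correct the clustering result. On several UCI data sets...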

11.
Language markedness is a common phenomenon in languages and is reflected in hearing, vision and sense, i.e. in variation in three aspects: phonology, morphology and semantics. This paper focuses on the interpretation of markedness in language use from three perspectives (pragmatic, psychological and cognitive), with the aim of defining the function of markedness.

12.
The Williston Basin is a significant petroleum province, containing oil production zones that include the Middle Cambrian to Lower Ordovician, Upper Ordovician, Middle Devonian, Upper Devonian and Mississippian, as well as zones within the Jurassic and Cretaceous. The oils of the Williston Basin exhibit a wide range of geochemical characteristics defined as "oil families", although the geochemical signature of the oils reservoired in the Cambrian Deadwood Formation and Lower Ordovician Winnipeg Formation does not match any "oil family". Despite their close stratigraphic proximity, it is evident that the oils of the Lower Palaeozoic within the Williston Basin are distinct. This suggests the presence of a new "oil family" within the Williston Basin. Diagnostic geochemical signatures occur in the gasoline range chromatograms, within saturate fraction gas chromatograms and biomarker fingerprints. However, some of the established criteria and cross-plots that are currently used to segregate oils into distinct genetic families within the basin do not always meet with success, particularly when applied to the Lower Palaeozoic oils of the Deadwood and Winnipeg Formations.

13.
王慧. 《科技信息》 (Science & Technology Information), 2008, (10): 240
Wuthering Heights, Emily Bronte's only novel, was published in December of 1847 under the pseudonym Ellis Bell. The book did not gain immediate success, but it is now considered one of the finest novels in the English language. Catherine is the key character of this masterpiece, because everybody and everything centers on her even though her life was short. We can understand this masterpiece better if we know Catherine well.

14.
何延凌. 《科技信息》 (Science & Technology Information), 2008, (4): 258
Language is a means of verbal communication. People use language to communicate with each other. In society, no two speakers are exactly alike in the way they speak. Some differences are due to age, gender, status and personality. Above all, gender is one of the most obvious reasons. This paper describes the features of women's language from these perspectives: pronunciation, intonation, diction, subjects, grammar and discourse. The discussion of these features suggests that more attention should be paid to language use in social context. What is more, the linguistic phenomena in a speech community can then be understood more thoroughly.

15.
The discovery of the prolific Ordovician Red River reservoirs in 1995 in southeastern Saskatchewan was the catalyst for extensive exploration activity which resulted in the discovery of more than 15 new Red River pools. The best yields of Red River production to date have been from dolomite reservoirs. Understanding the processes of dolomitization is, therefore, crucial for the prediction of the connectivity, spatial distribution and heterogeneity of dolomite reservoirs. The Red River reservoirs in the Midale area consist of 3~4 thin dolomitized zones, with a total thickness of about 20 m, which occur at the top of the Yeoman Formation. Two types of replacement dolomite were recognized in the Red River reservoir: dolomitized burrow infills and dolomitized host matrix. The spatial distribution of dolomite suggests that burrowing organisms played an important role in facilitating the fluid flow in the backfilled sediments. This resulted in penecontemporaneous dolomitization of burrow infills by normal seawater. The dolomite in the host matrix is interpreted as having occurred at shallow burial by evaporitic seawater during precipitation of Lake Almar anhydrite that immediately overlies the Yeoman Formation. However, the low δ18O values of dolomitized burrow infills (-5.9‰~ -7.8‰, PDB) and matrix dolomites (-6.6‰~ -8.1‰, avg. -7.4‰ PDB) compared to the estimated values for the late Ordovician marine dolomite could be attributed to modification and alteration of dolomite at higher temperatures during deeper burial, which could also be responsible for its 87Sr/86Sr ratios (0.7084~0.7088) that are higher than suggested for the late Ordovician seawaters (0.7078~0.7080). The trace amounts of saddle dolomite cement in the Red River carbonates are probably related to "cannibalization" of earlier replacement dolomite during the chemical compaction.

16.
Location-based services are promising because of their novel working style and content. A software platform is proposed that provides application programs for typical location-based services and supports efficient development of new applications. The analysis shows that this scheme is easy to implement, low in cost, and adaptable to all kinds of mobile network systems.

17.
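Based on the AC-13 gradation, rubber particles were blended into the mixture in place of part of the aggregate. Low-temperature bending tests were used to study the low-temperature crack resistance of asphalt mixtures with different rubber particle contents, and the strain energy density was introduced for a comprehensive evaluation of that resistance. The test results show that the failure microstrain of the rubber-particle asphalt mixture specimens exceeded 2300 in every case, meeting the technical requirement for winter-cold regions. With or without rubber particles, as the temperature falls, the maximum flexural tensile strength of the asphalt mixture at failure increases, the flexural tensile strain decreases, and the stiffness modulus increases. The bending strain energy density is relatively high at a rubber particle content of about 1%, at which point the rubber-particle asphalt mixture has good low-temperature crack resistance.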

18.
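Combining theoretical derivation with laboratory experiments, a method is established for determining the threshold pressure gradient of low-permeability heterogeneous sandstone reservoirs. First, using the analogy between the reservoir flow field and an electric field, a formula for the threshold pressure gradient of heterogeneous sandstone reservoirs is derived. Second, a test method for this gradient is established based on the steady-state flow experimental method. The results show that the threshold pressure gradient of such reservoirs follows two equivalence principles. For an areally heterogeneous reservoir, it equals the length-weighted average of the threshold pressure gradients of the individual permeability segments. For a vertically heterogeneous reservoir, it equals the average of the threshold pressure gradients of the individual permeability layers, weighted by the product of permeability and flow cross-sectional area. The results can guide the determination of reasonable well spacing in low-permeability heterogeneous sandstone reservoirs and promote their efficient development.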
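The two equivalence principles stated in this abstract amount to two weighted averages, which the sketch below writes out. The function names, argument layout, and numeric values are assumptions chosen for illustration; any consistent set of units works, and nothing here reproduces the paper's own derivation or data.

```python
def areal_threshold_gradient(gradients, lengths):
    """Areal heterogeneity: length-weighted average of the threshold
    pressure gradients of the permeability segments."""
    return sum(g * l for g, l in zip(gradients, lengths)) / sum(lengths)


def vertical_threshold_gradient(gradients, permeabilities, areas):
    """Vertical heterogeneity: average of the layer threshold pressure
    gradients, weighted by permeability times flow cross-sectional area."""
    weights = [k * a for k, a in zip(permeabilities, areas)]
    return sum(g * w for g, w in zip(gradients, weights)) / sum(weights)


# Hypothetical values: two segments (or layers) with gradients 0.02 and 0.04.
areal = areal_threshold_gradient([0.02, 0.04], [1.0, 3.0])
vertical = vertical_threshold_gradient([0.02, 0.04], [10.0, 30.0], [1.0, 1.0])
```

In both toy cases the second segment or layer carries three times the weight of the first, so the equivalent gradient sits three quarters of the way from 0.02 to 0.04.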

19.
As a modern American novelist famous in the literary world, Hemingway was not a person who always followed the trend but a sharp observer. He was also a master of tragedy who paid great attention to existence, fate and final outcomes. The tragedy of the characters in his works is, in every sense, a tragedy of the extreme limit: a fearless challenge that fails. The beauty of this tragedy does not come from the destruction of life; rather, its value lies in the act of striving itself. His characters enact for the reader the tragedy of challenging limits and death.

20.
Quality traits in wheat (Triticum aestivum L.) were studied by quantitative trait locus (QTL) analysis in a recombinant inbred line (RIL) population, a set of 131 lines derived from the Chuan 35050 × Shannong 483 cross (ChSh). Grains from RILs were assayed for 21 quality traits related to protein and starch. A total of 35 putative QTLs for 19 traits, with a single QTL explaining 7.99-40.52% of phenotypic variation, were detected on 10 chromosomes: 1D, 2A, 2D, 3B, 3D, 5A, 6A, 6B, 6D, and 7B. The additive effects of 30 QTLs were positive, contributed by Chuan 35050; the remaining 5 QTLs were negative, with the additive effect contributed by Shannong 483. For protein traits, 15 QTLs were obtained and most of them were located on chromosomes 1D, 3B and 6D, while 20 QTLs for starch traits were detected and most of them were located on chromosomes 3D, 6B and 7B. Only 7 QTLs for protein and starch traits were co-located in three regions on chromosomes 1D, 2A and 2D. These protein and starch trait QTLs showed a distinct distribution pattern in certain regions and chromosomes. Twenty-two QTLs were clustered in 6 regions of 5 chromosomes. Two QTL clusters for protein traits were located on chromosomes 1D and 3B, respectively, three clusters for starch traits on chromosomes 3D, 6B and 7B, and one cluster including protein and starch traits on chromosome 1D.
