Similar literature
1.
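To better solve the classification of DNA microarray data and further improve the recognition rate, an AdaBoost-based selective ensemble learning method is proposed for evolvable-hardware multi-classifiers applied to DNA microarray data. For the system-integration stage, two improved AdaBoost algorithms are introduced, and two selective ensemble strategies are examined: one that uses sample labeling to raise the effective sampling capacity, and one that directly targets the classification accuracy of the combined classifier. Experiments were carried out on acute leukemia, lung cancer, and colon cancer data sets. The results show that the evolvable-hardware method based on AdaBoost ensemble learning achieves average recognition rates of 97.06%, 99.32%, and 94.44% on the leukemia, lung cancer, and colon cancer data, respectively. Compared with conventional evolvable-hardware ensemble learning methods, the proposed method delivers a better recognition rate while effectively reducing the hardware implementation cost.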

2.
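[Objective] Different machine learning algorithms suit data sets with different distribution characteristics. A single classifier trained on the entire training set is not specialized for local regions of the sample space, so its predictions may be poor for data in some regions, producing misclassifications. To address this, a multi-classifier selection algorithm based on k-means++ is proposed. [Methods] First, three algorithms with good overall classification performance, AdaBoost, SVM, and random forest (RF), are each trained on the training set to obtain three candidate base classifiers. The k-means++ algorithm then partitions the training data into k clusters, each candidate is tested on every cluster, and the candidate with the highest accuracy on a cluster is chosen as the classifier for data similar to that cluster's data. To predict the class of a new sample, the cluster the sample belongs to is determined first, and that cluster's classifier is then used for prediction. [Results] Experiments show that the new algorithm outperforms the single classification algorithms on 9 UCI data sets. [Conclusions] Dynamically selecting the best classifier for each local region can improve classification accuracy.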
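The selection step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cluster centroids are assumed to have been computed already (for example by k-means++), each candidate classifier is assumed to be a plain Python callable, and the names `nearest_cluster`, `select_per_cluster`, and `predict` are hypothetical.

```python
from collections import defaultdict


def nearest_cluster(x, centroids):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
    )


def accuracy(clf, points):
    """Fraction of (x, label) pairs that clf labels correctly."""
    return sum(clf(x) == label for x, label in points) / len(points)


def select_per_cluster(X, y, centroids, candidates):
    """Map each cluster index to the name of its most accurate candidate."""
    members = defaultdict(list)
    for x, label in zip(X, y):
        members[nearest_cluster(x, centroids)].append((x, label))
    return {
        c: max(candidates, key=lambda name: accuracy(candidates[name], pts))
        for c, pts in members.items()
    }


def predict(x, centroids, candidates, best):
    """Route x to its cluster, then apply that cluster's chosen classifier."""
    c = nearest_cluster(x, centroids)
    # Assumption: fall back to the first candidate for a cluster with no data.
    name = best.get(c, next(iter(candidates)))
    return candidates[name](x)
```

In practice the candidates would be the trained AdaBoost, SVM, and RF models named in the abstract; any object exposing a per-sample prediction could be wrapped as a callable here.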

3.
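Classifier simulation algorithm and its application   Cited by: 3 (self-citations: 0, citations by others: 3)
To address the shortcomings of standard data sets for evaluating combination methods in multiple classifier systems, a new classifier simulation algorithm is designed. The algorithm builds a confusion matrix from a classifier's recognition rate, generates the base classifier's decisions from the confusion matrix, and then uses a measure of the correlation between classifiers to generate the complete simulated data. Experimental evaluation shows that the algorithm can simulate data for any number of classifiers and any number of pattern classes, and that it can express the correlation between classifiers. The simulated data sets were then used to test two combination methods, majority voting and stacked generalization. The results show that negative correlation between classifiers helps improve system performance: in particular, when the recognition rate of a single classifier is 0.8 and the correlation falls from 0.8295 to -0.4847, the performance of majority voting and stacked generalization improves by 14.98% and 41.99%, respectively.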
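The first two steps of this abstract (recognition rate to confusion matrix, confusion matrix to simulated decisions) can be sketched as below. The abstract does not say how errors are spread over the wrong classes, so the uniform split used here is an assumption, and the correlation-control step is omitted. The function names are hypothetical.

```python
import random


def confusion_from_rate(rate, n_classes):
    """Row-stochastic confusion matrix with `rate` on the diagonal.

    Assumption: the remaining probability mass is split evenly
    among the wrong classes.
    """
    off = (1.0 - rate) / (n_classes - 1)
    return [
        [rate if i == j else off for j in range(n_classes)]
        for i in range(n_classes)
    ]


def simulate_decisions(true_labels, confusion, rng):
    """Draw one simulated decision per true label from its confusion row."""
    classes = list(range(len(confusion)))
    return [rng.choices(classes, weights=confusion[t])[0] for t in true_labels]
```

Passing a seeded `random.Random` instance as `rng` keeps the simulated data set reproducible between runs.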

4.
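To make full use of the information in the data and thereby improve classification accuracy, an evidential neural network classifier is proposed and used to build a multiple classifier system. Ambiguous samples in the training data are first treated as new classes, called mixed classes, and the original training data are reorganized into training data that include these mixed classes. The evidential neural network classifiers are then trained on the reorganized data, the classification outputs are modeled as evidence, and several different evidence combination rules are used to fuse the classifiers. Comparative experiments on artificial and UCI data sets show that, compared with other neural-network-based multiple classifier systems, the system built on evidential neural networks effectively improves classification accuracy. On the Magic 04 and Waveform2 data sets, the proposed system improves classification accuracy by about 6% and 10%, respectively, over a neural network multiple classifier system that uses voting.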

5.
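To improve the performance of classification systems, a meta-learning framework that unifies several meta-learning algorithms is proposed, and two combination modes, parallel and serial, are defined and described. The outputs of the base classifiers form new attributes, which are added to the feature vector to produce the meta-data. By extending the feature vector, meta-learning strengthens the ability to represent the hypothesis space and reduces the bias of the system. The meta-learning strategies were studied experimentally on the standard data sets provided by the University of California. The results show that, compared with multiple classifier systems built with fusion methods such as majority voting, the maximum rule, and the minimum rule, the parallel and serial combinations reduce the average classification error rate on the data sets used by 39.12% and 40.56%, respectively. In addition, increasing n in n-fold cross-validation does not improve classification performance, and the order of the base classifiers in the serial combination has no significant effect on the classification error rate.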
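The feature-extension idea in this abstract can be illustrated with a short sketch. The abstract names the parallel and serial modes without defining them in detail, so the reading used here is an assumption: in the parallel mode every base classifier sees only the original features, while in the serial mode each base classifier also sees the outputs appended before it. The function names are hypothetical.

```python
def extend_parallel(x, base_classifiers):
    """Append every base classifier's output on the original features x."""
    return list(x) + [clf(x) for clf in base_classifiers]


def extend_serial(x, base_classifiers):
    """Append outputs one at a time; later classifiers see earlier outputs."""
    meta = list(x)
    for clf in base_classifiers:
        meta.append(clf(meta))
    return meta
```

Either function yields the extended vector (the meta-data) on which a meta-level learner would then be trained.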

6.
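Chinese text classification method based on evolutionary hypernetworks   Cited by: 2 (self-citations: 0, citations by others: 2)
To improve the classification of Chinese text, a Chinese text classification method based on evolutionary hypernetworks is proposed. The Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, is used to segment the text, and the nouns, verbs, and adjectives are retained as features. Features are selected with the χ² statistic, and feature weights are computed with Boolean weighting. The resulting feature vectors serve as the training and test data. A hyperedge replacement strategy is used to train the hypernetwork classification model, which then classifies the test-set feature vectors. The performance of the evolutionary hypernetwork model is analyzed for different order settings and compared with the conventional KNN and SVM algorithms. The results show that on the Fudan University corpus and the Sohu corpus the method achieves macro recognition rates of 87.2% and 72.5%, macro recall of 86.9% and 70.5%, and macro F1 of 87.0% and 71.5%, close to or better than KNN and SVM. The proposed method is an effective means of Chinese text classification.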

7.
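The representative-based neighborhood covering rough set classification algorithm performs well on some data sets, but class imbalance in the data seriously degrades its classification accuracy. To reduce the effect of class imbalance as far as possible, three ensemble strategies built on k-fold cross-validation are proposed for this algorithm. Strategy 1 uses k-fold cross-validation to obtain k base classifiers, all of which form a committee that classifies unlabeled samples. Building on strategy 1, strategy 2 selects the base classifiers with relatively high accuracy to form the committee. Strategy 3 builds on the first two and applies the idea of active learning to expand the training set, producing a new classifier that then classifies the unlabeled samples. The experiments use standard UCI data sets and include a comparison of different values of k. The results show that all three strategies bring improvements of varying degree, and that k = 5 consistently gives a good improvement. The improvement strategy should be chosen to suit the data set.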
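Strategies 1 and 2 can be sketched as a committee vote plus an accuracy-based filter. The sketch assumes the k base classifiers have already been trained on the k folds and are plain callables, that the committee decides by majority vote (the abstract does not state the voting rule), and that a validation accuracy is available for each classifier. The function names are hypothetical.

```python
from collections import Counter


def committee_vote(x, committee):
    """Strategy 1: every committee member votes; the most common label wins."""
    votes = Counter(clf(x) for clf in committee)
    return votes.most_common(1)[0][0]


def select_committee(classifiers, accuracies, keep):
    """Strategy 2: keep only the `keep` classifiers with the highest accuracy."""
    ranked = sorted(zip(accuracies, range(len(classifiers))), reverse=True)
    return [classifiers[i] for _, i in ranked[:keep]]
```

An odd committee size avoids ties in two-class problems; for more classes a tie-breaking rule would need to be chosen explicitly.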

8.
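Machine-learning-based network anomaly detection is an important research topic in intrusion detection. Conventional machine learning methods need large numbers of labeled samples to train a classifier, but labeled samples are usually hard to obtain, which makes training difficult. In addition, training a single classifier leads to classification bias and detection holes that are hard to eliminate. To address these problems, this paper proposes MCAD, an anomaly detection method based on co-training of multiple classifiers. The method uses a small number of labeled samples and a large number of unlabeled samples to co-train several classifiers, reducing classification bias and detection holes. Comparative experiments on the classic network anomaly detection data set KDD CUP99 were used to verify the detection performance of MCAD. The results show that MCAD effectively lowers the cost of training the detector and improves network anomaly detection performance.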

9.
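Diversity evaluation in multiple classifier systems cannot handle fuzzy data directly. To address this, a classifier ensemble diversity measure based on complementary information entropy (CIE) is proposed. First, the training data are used to generate a series of base classifiers, which classify the test data, and the classification results are combined in sequence to form a classification data space. Next, complementary information entropy under a fuzzy relation is used to measure the amount of uncertain information contained in that space, and the diversity among the base classifiers is judged from this amount. Finally, an ensemble classifier system is built under the basic selection criterion that adding a base classifier must increase the diversity of the data space; this system is used to verify the relationship between the CIE diversity measure and ensemble classification accuracy. Experimental results show that, compared with the Q-statistic method, classifier ensembles built with the CIE method improve average ensemble classification accuracy by 2.03%, reduce the ensemble size by about 17%, and improve the ensemble's ability to handle diverse data.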

10.
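Dynamic assignment of base classifier weights in the Boosting algorithm   Cited by: 3 (self-citations: 1, citations by others: 2)
Boosting is an effective classifier combination method that combines multiple base classifiers through weighted voting. When weighting the base classifiers, the algorithm uses some transformation of each base classifier's error rate on the current training set, which is a static assignment. This paper introduces a method that assigns the base classifier weights dynamically. The method uses the degree to which the current test instance belongs to a subset of misclassified data and assigns an appropriate weight to the corresponding base classifier according to that degree. Unlike static weighting, this method takes the attribute values of the test instance into account, so it can adjust the base classifier weights dynamically and further optimize classification performance. Experiments show that in most cases dynamic weight assignment gives better classification performance than static assignment.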
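The static baseline that this abstract contrasts with can be sketched as follows. The abstract only says the weight is "some transformation" of the training error rate; the formula below is the standard AdaBoost weight, used here as an assumed example of such a transformation. The dynamic scheme itself is not sketched because the abstract does not give enough detail to reproduce it. The function names are hypothetical.

```python
import math
from collections import defaultdict


def static_weight(error_rate):
    """Standard AdaBoost weight: 0.5 * ln((1 - e) / e), for 0 < e < 1."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def weighted_vote(x, classifiers, weights):
    """Sum the weight behind each predicted label and return the heaviest."""
    score = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        score[clf(x)] += w
    return max(score, key=score.get)
```

A dynamic variant would replace the fixed `weights` list with values computed per test instance before calling `weighted_vote`.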

11.
The discovery of the prolific Ordovician Red River reservoirs in 1995 in southeastern Saskatchewan was the catalyst for extensive exploration activity, which resulted in the discovery of more than 15 new Red River pools. The best yields of Red River production to date have been from dolomite reservoirs. Understanding the processes of dolomitization is, therefore, crucial for predicting the connectivity, spatial distribution, and heterogeneity of dolomite reservoirs. The Red River reservoirs in the Midale area consist of 3 to 4 thin dolomitized zones, with a total thickness of about 20 m, which occur at the top of the Yeoman Formation. Two types of replacement dolomite were recognized in the Red River reservoir: dolomitized burrow infills and dolomitized host matrix. The spatial distribution of dolomite suggests that burrowing organisms played an important role in facilitating fluid flow in the backfilled sediments. This resulted in penecontemporaneous dolomitization of burrow infills by normal seawater. The dolomite in the host matrix is interpreted as having formed at shallow burial from evaporitic seawater during precipitation of the Lake Almar anhydrite that immediately overlies the Yeoman Formation. However, the low δ18O values of the dolomitized burrow infills (-5.9‰ to -7.8‰, PDB) and matrix dolomites (-6.6‰ to -8.1‰, avg. -7.4‰, PDB), compared with the estimated values for late Ordovician marine dolomite, could be attributed to modification and alteration of the dolomite at higher temperatures during deeper burial. This alteration could also be responsible for the 87Sr/86Sr ratios (0.7084 to 0.7088), which are higher than those suggested for late Ordovician seawater (0.7078 to 0.7080). The trace amounts of saddle dolomite cement in the Red River carbonates are probably related to "cannibalization" of earlier replacement dolomite during chemical compaction.

12.
A computer generator for randomly layered structures. YU Jia-shun 1,2, HE Zhen-hua 2 (1. The Institute of Geological and Nuclear Sciences, New Zealand; 2. State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, China) Abstract: An algorithm is introd…

13.
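This paper describes the full course of a paleomagnetic study of Cretaceous to Quaternary strata on Hainan Island and the adjacent continental margin. From an analysis of the variation in declination and inclination of the remanent magnetization vectors in the rocks, the following model is proposed for the tectonic evolution of Hainan Island since the Cretaceous: in the early stage the island migrated southward while rotating clockwise, and in the later stage it moved northward while rotating counterclockwise. In light of geological and geophysical data from this and neighboring regions, the following interpretation of this terrane motion is put forward. An early extensional event in the Beibu Gulf markedly stretched and thinned the crust there, forming the Beibu Gulf Basin and driving the early tectonic movement of Hainan Island, whereas the later movement was controlled mainly by seafloor spreading in the South China Sea. Clarifying the motion of the Hainan terrane is of theoretical and practical significance for understanding the formation and evolution of the Beibu Gulf oil and gas basin.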

14.
Various applications relevant to exciton dynamics, such as organic solar cells, large-area organic light-emitting diodes, and thermoelectricity, operate under a temperature gradient. The potential abnormal behavior of the exciton dynamics driven by the temperature difference may affect the efficiency and performance of the corresponding devices. In the above situations, the exciton dynamics under temperature difference is mixed with

15.
The elongation method, originally proposed by Imamura, was further developed over many years in our group as an O(N) method with high efficiency and high accuracy for systems of any dimension. This treatment, designed for one-dimensional (1D) polymers, is now available for three-dimensional (3D) systems, but geometry optimization is currently possible only for 1D systems. As an approach toward post-Hartree-Fock, it was also extended to

16.
17.
The explosive growth of the Internet and of database applications has pushed databases to be more scalable and available, and to support on-line scaling without interrupting service. To serve more client queries without downtime or degraded response time, more nodes have to be added while the database is running. This paper presents an overview of scalable and available databases that satisfy these requirements and proposes a novel on-line scaling method. The method improves on the existing on-line scaling method, giving faster response times and higher throughput. It reduces unnecessary network use, i.e., it decreases the number of data copies by reusing the backup data. The on-line scaling operation can also be processed in parallel by selecting suitable nodes as the new nodes. Our performance study shows that the method significantly reduces data copy time.

18.
The R-tree is a good structure for spatial searching. In this index structure, however, both the order of the nodes within a level and the order in which they are traversed during a query are arbitrary. Because the probability that the target object lies in each of the MBRs sharing a parent node differs, visiting the subnode with the highest probability first reduces the time cost in most cases. In some cases, the probability that a point belongs to a rectangle is directly proportional to the size of the rectangle. This conclusion, however, rests on the assumption that the objects are evenly distributed over the area, and this assumption does not always hold. We found a more direct parameter for measuring this probability and made a small change to the structure of the R-tree to increase the probability of finding a satisfying answer in the first subtrees. We name this structure the probability-based arranged R-tree (PBAR-tree).
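The traversal idea can be sketched as a point query that visits children in descending order of a stored hit probability. The abstract does not say which parameter the authors use to estimate that probability, so the `prob` field below is a hypothetical stand-in, and the `Node` layout is a simplified assumption rather than the PBAR-tree's actual node format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    prob: float = 1.0                              # hypothetical hit probability
    children: list = field(default_factory=list)
    objects: list = field(default_factory=list)    # leaf entries: (name, mbr)


def contains(mbr, point):
    """True if point lies inside the rectangle, borders included."""
    xmin, ymin, xmax, ymax = mbr
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def find_first(node, point, visited):
    """Depth-first point query returning the first matching object name.

    Children are tried from most to least probable; every node examined
    is appended to `visited` so the search cost can be inspected.
    """
    visited.append(node)
    if not contains(node.mbr, point):
        return None
    for name, mbr in node.objects:
        if contains(mbr, point):
            return name
    for child in sorted(node.children, key=lambda c: c.prob, reverse=True):
        hit = find_first(child, point, visited)
        if hit is not None:
            return hit
    return None
```

A real index would keep the children stored in probability order instead of sorting on every query; the sort here only keeps the sketch short.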

19.
Spatial databases store large numbers of geometric objects. An important function of a spatial database is to let users browse these objects efficiently as a map, so the database should draw the objects of interest in the display window quickly. This process involves two operations: retrieving the data from the database and drawing them on screen. To improve efficiency, the time spent on both retrieving and displaying objects should be reduced. The former can be achieved with a spatial index such as the R-tree; the latter requires simplifying the objects. Simplification means showing objects with sufficient but not unnecessary detail, which depends on the browsing scale. The main problem is therefore how to retrieve data efficiently at different levels of detail. This paper introduces the implementation of a multi-scale index, generalized from the R-tree, in the spatial database SISP (Spatial Information Shared Platform). The generalization differs from the R-tree in two respects. First, every node and every geometric object is assigned an importance value, as is every vertex of each object; these values determine which data should be retrieved from disk for a query. Second, geometric objects are divided into one or more sub-blocks, and their vertices are totally ordered by importance value. With the generalized R-tree, data can easily be retrieved at different levels of detail. Experiments on real-life data compare solutions that use a normal spatial index with solutions that use the multi-scale spatial index. The results show that the solution using the multi-scale index in SISP is satisfactory.
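The vertex-ordering idea can be sketched as follows: vertices are stored sorted by descending importance, so a query at a given detail level reads only a prefix and then restores drawing order. This is a minimal in-memory illustration, not the SISP implementation; the tuple layout and the function names are assumptions.

```python
def store_by_importance(vertices):
    """Sort (x, y, importance) vertices by descending importance.

    Each stored entry keeps the vertex's original position so that
    drawing order can be restored after retrieval.
    """
    indexed = [(imp, i, (x, y)) for i, (x, y, imp) in enumerate(vertices)]
    return sorted(indexed, key=lambda entry: -entry[0])


def retrieve_detail(stored, min_importance):
    """Read the prefix with importance >= min_importance, in drawing order."""
    prefix = []
    for imp, i, point in stored:
        if imp < min_importance:
            break                  # everything after this is less important
        prefix.append((i, point))
    return [point for _, point in sorted(prefix)]
```

Because the stored list is ordered, the loop stops at the first vertex below the threshold, which mirrors reading only the leading part of an object from disk at coarse scales.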

20.
Geographic information services are enabled by advances in general Web service technology and by the focused efforts of the OGC in defining XML-based Web GIS services. Based on these models, this paper addresses service chaining, the process of combining or pipelining results from several interoperable GIS Web services to create a customized solution. It presents a mediated chaining architecture in which a specific service takes responsibility for performing the process that describes a service chain. We designed the Spatial Information Process Language (SIPL) for dynamically modeling and describing service chains, and implemented a prototype of the Spatial Information Process Execution Engine (SIPEE) for executing processes written in SIPL. Measures to improve the functionality and performance of such a system are also discussed.
