首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
针对神经网络分类模型对美国联邦运输统计局(Bureau of Transportation Statistics, BTS)航班数据集中的不均衡数据预测误差较大的问题,采用自适应合成采样算法(adaptive synthetic sampling approach, ADASYN)和合成少数类过采样算法(synthetic minority over-sampling technique, SMOTE)对航班延误类别进行平衡处理,并用随机森林(random forest, RF)模型进行训练和贝叶斯调参。结果表明:与不经过平衡采样的方法比较,该方法在权重平均下的精确率、召回率和F1评分分别提高了19%、8%和16%;分类预测准确率提升8.03%,模型拟合指数AUC(area under curve)提升5.4%。同时,采用多特征相融合的图神经网络模型Graph WaveNet对航班平均延误时间进行预测。实验结果表明:与单特征模型比较,该模型平均绝对误差和均方根误差分别降低了16%和12.45%。这些方法和结果对研究航班延误分类和预测算法研究具有参考价值。  相似文献   

2.
采用少类样本合成过采样技术(SMOTE)与二叉树多类支持向量机(BTSVM)相结合的入侵检测算法来解决实际应用中经常遇到的类别不平衡的分类问题.该方法首先对不平衡类别的训练集使用BTSVM分类,然后对求出各分类器中的支持向量使用SMOTE方法进行向上采样,最后用不平衡类别的测试集在新的分类模型中进行测试.实验结果表明本算法能够有效地提高不平衡数据集的分类性能.  相似文献   

3.
针对不平衡数据集的低分类准确性,提出基于蚁群聚类改进的SMOTE不平衡数据过采样算法ACC-SMOTE。一方面利用改进的蚁群聚类算法将少数类样本划分为不同的子簇,充分考虑类间与类内数据的不平衡,根据子簇所占样本的比例运用SMOTE算法进行过采样,从而降低类内数据的不平衡度;另一方面对过采样后的少数类样本采用Tomek Links数据清理技术进行及时修正,清除数据集中的噪声和抽样方法产生的重叠样例,从而保证合成样本的质量。本文所用训练数据集和测试数据集均为UCI数据集。实验结果表明本算法可以明显提高不平衡数据集的分类精度,从而提高分类器的分类性能。  相似文献   

4.
提出一种改进随机子空间与C4.5决策树算法相结合的分类算法.以C4.5算法构建决策树作为集成学习的基分类器,每次迭代初始,将SMOTE采样技术与随机子空间方法相结合,生成在特征空间和数据分布上差异明显的合成样例,为基分类器提供多样化的平衡训练数据集,采用绝大多数投票方法进行最终决策的融合输出.实验结果表明,该方法对少数类和多数类均具有较高的识别率.  相似文献   

5.
张阳  张涛  陈锦  王禹  邹琪 《北京理工大学学报》2019,39(12):1258-1262
网络入侵检测已经广泛运用机器学习模型,但是研究者们多关注模型选择和参数优化,很少考虑数据不平衡的影响,往往会导致少数类入侵样本的检测效果较差.针对该问题,以SMOTE (synthetic minority oversampling technique)数据再平衡算法为研究重点,应用入侵检测数据集KDD99作为原始训练集,使用简单抽样和SMOTE算法生成再平衡训练集.采用多种机器学习模型分别在原始训练集和再平衡训练集进行5折交叉验证.实验结果表明,与原始训练集相比,使用再平衡训练集建模能够在不降低甚至提高多数类样本识别效果前提下,使少数类样本的识别准确率和召回率增强10%~20%.因此,SMOTE算法对不平衡样本下的网络入侵检测有显著的提升作用.   相似文献   

6.
面向不平衡数据集的一种精化Borderline-SMOTE方法   总被引:2,自引:0,他引:2  
合成少数类过采样技术(SMOTE)是一种被广泛使用的用来处理不平衡问题的过采样方法,SMOTE方法通过在少数类样本和它们的近邻间线性插值来实现过采样.Borderline-SMOTE方法在SMOTE方法的基础上进行了改进,只对少数类的边界样本进行过采样,从而改善样本的类别分布.通过进一步对边界样本加以区分,对不同的边界样本生成不同数目的合成样本,提出了面向不平衡数据集的一种精化Borderline-SMOTE方法(RB-SMOTE).仿真实验采用支持向量机作为分类器对几种过采样方法进行比较,实验中采用了10个不平衡数据集,它们的不平衡率从0.064 7到0.536 0.实验结果表明:RB-SMOTE方法能有效地改善不平衡数据集的类分布的不平衡性.  相似文献   

7.
现实世界中的数据挖掘经常涉及从类别分布不平衡的数据集学习,少数类的数量相比于其他类较少.从包含少数类的数据集中学习,通常会产生偏向于多数类的预测分类器,但对少数类的预测精度较差.针对少数类学习提出一种新的集成算法Cost-SMOTEBoost,该算法是SMOTE算法和AdaCost算法的结合.通过实验表明,Cost-SMOTEBoost算法在不降低精确率的情况下提高了召回率,从而提高了在分布不平衡数据集上的表现.  相似文献   

8.
不平衡数据集广泛存在,对其的有效识别往往是分类的重点,但传统的支持向量机在不平衡数据集上的分类效果不佳.本文提出将数据采样方法与SVM结合,先对原始数据中的少类样本进行SMOTE采样,再使用SVM进行分类.人工数据集和UCI数据集的实验均表明,使用SMOTE采样以后,SVM的分类性能得到了提升.  相似文献   

9.
少数类样本合成过抽样技术(SMOTE)是一种过抽样数据预处理算法,是在两个少数类之间随机插入一个新的少数类样本.为了解决SMOTE算法生成少数样本随机性的局限性,在考虑多数类样本分布会对少数样本的生成产生影响的基础上,提出了改进的SMOTE算法.在WEKA平台上分别使用改进前后的SMOTE算法对选用的UCI数据集进行过抽样数据预处理,并使用朴素贝叶斯、决策树和K邻近分类器对过抽样后的数据集进行分类,选择几何均数(G-mean)和曲线下面积(AUC)两个评价指标,实验显示改进后的SMOTE算法预处理的数据集的分类效果更好,证明改进后的SMOTE算法生成的少数类样本更加合理.  相似文献   

10.
针对传统的合成少数类过采样技术(synthetic minority oversampling technique,SMOTE)在类别区域重合的数据集应用时,可能产生多个更接近多数类的人工样例,甚至突破类别边界,从而影响整体分类性能的情况,提出了一种最近三角区域的SMOTE方法,使合成的人工样例只出现在少数类样例的最近三角区域内部,并且删除掉距离多数类更近的合成样例,从而使生成的样例更接近少数类,且不突破原始的类别边界。实验分别在人工数据集和改进的UCI数据集上进行,并和原始的SMOTE方法分别在G-mean和F-value的评价指标上进行了对比。实验结果验证了改进的SMOTE方法在类别区域有重合的数据集上要优于原始SMOTE方法。  相似文献   

11.
Language markedness is a common phenomenon in languages, and is reflected from hearing, vision and sense, i.e. the variation in the three aspects such as phonology, morphology and semantics. This paper focuses on the interpretation of markedness in language use following the three perspectives, i.e. pragmatic interpretation, psychological interpretation and cognitive interpretation, with an aim to define the function of markedness.  相似文献   

12.
The discovery of the prolific Ordovician Red River reservoirs in 1995 in southeastern Saskatchewan was the catalyst for extensive exploration activity which resulted in the discovery of more than 15 new Red River pools. The best yields of Red River production to date have been from dolomite reservoirs. Understanding the processes of dolomitization is, therefore, crucial for the prediction of the connectivity, spatial distribution and heterogeneity of dolomite reservoirs.The Red River reservoirs in the Midale area consist of 3~4 thin dolomitized zones, with a total thickness of about 20 m, which occur at the top of the Yeoman Formation. Two types of replacement dolomite were recognized in the Red River reservoir: dolomitized burrow infills and dolomitized host matrix. The spatial distribution of dolomite suggests that burrowing organisms played an important role in facilitating the fluid flow in the backfilled sediments. This resulted in penecontemporaneous dolomitization of burrow infills by normal seawater. The dolomite in the host matrix is interpreted as having occurred at shallow burial by evaporitic seawater during precipitation of Lake Almar anhydrite that immediately overlies the Yeoman Formation. However, the low δ18O values of dolomited burrow infills (-5.9‰~ -7.8‰, PDB) and matrix dolomites (-6.6‰~ -8.1‰, avg. -7.4‰ PDB) compared to the estimated values for the late Ordovician marine dolomite could be attributed to modification and alteration of dolomite at higher temperatures during deeper burial, which could also be responsible for its 87Sr/86Sr ratios (0.7084~0.7088) that are higher than suggested for the late Ordovician seawaters (0.7078~0.7080). The trace amounts of saddle dolomite cement in the Red River carbonates are probably related to "cannibalization" of earlier replacement dolomite during the chemical compaction.  相似文献   

13.
何延凌 《科技信息》2008,(4):258-258
Language is a means of verbal communication. People use language to communicate with each other. In the society, no two speakers are exactly alike in the way of speaking. Some differences are due to age, gender, statue and personality. Above all, gender is one of the obvious reasons. The writer of this paper tries to describe the features of women's language from these perspectives: pronunciation, intonation, diction, subjects, grammar and discourse. From the discussion of the features of women's language, more attention should be paid to language use in social context. What's more, the linguistic phenomena in a speaking community can be understood more thoroughly.  相似文献   

14.
AcomputergeneratorforrandomlylayeredstructuresYUJia shun1,2,HEZhen hua2(1.TheInstituteofGeologicalandNuclearSciences,NewZealand;2.StateKeyLaboratoryofOilandGasReservoirGeologyandExploitation,ChengduUniversityofTechnology,China)Abstract:Analgorithmisintrod…  相似文献   

15.
理论推导与室内实验相结合,建立了低渗透非均质砂岩油藏启动压力梯度确定方法。首先借助油藏流场与电场相似的原理,推导了非均质砂岩油藏启动压力梯度计算公式。其次基于稳定流实验方法,建立了非均质砂岩油藏启动压力梯度测试方法。结果表明:低渗透非均质砂岩油藏的启动压力梯度确定遵循两个等效原则。平面非均质油藏的启动压力梯度等于各级渗透率段的启动压力梯度关于长度的加权平均;纵向非均质油藏的启动压力梯度等于各渗透率层的启动压力梯度关于渗透率与渗流面积乘积的加权平均。研究成果可用于有效指导低渗透非均质砂岩油藏的合理井距确定,促进该类油藏的高效开发。  相似文献   

16.
As an American modern novelist who were famous in the literary world, Hemingway was not a person who always followed the trend but a sharp observer. At the same time, he was a tragedy maestro, he paid great attention on existence, fate and end-result. The dramatis personae's tragedy of his works was an extreme limit by all means tragedy on the meaning of fearless challenge that failed. The beauty of tragedy was not produced on the destruction of life, but now this kind of value was in the impact activity. They performed for the reader about the tragedy on challenging for the limit and the death.  相似文献   

17.
本文叙述了对海南岛及其毗邻大陆边缘白垩纪到第四纪地层岩石进行古地磁研究的全部工作过程。通过分析岩石中剩余磁矢量的磁偏角及磁倾角的变化,提出海南岛白垩纪以来经历的构造演化模式如下:早期伴随顺时针旋转而向南迁移,后期伴随逆时针转动并向北运移。联系该地区及邻区的地质、地球物理资料,对海南岛上述的构造地体运动提出以下认识:北部湾内早期有一拉张作用,主要是该作用使湾内地壳显著伸长减薄,形成北部湾盆地。从而导致了海南岛的早期构造运动,而海南岛后期的构造运动则主要是受南海海底扩张的影响。海南地体运动规律的阐明对于了解北部湾油气盆地的形成演化有重要的理论和实际意义。  相似文献   

18.
There are numerous geometric objects stored in the spatial databases. An importance function in a spatial database is that users can browse the geometric objects as a map efficiently. Thus the spatial database should display the geometric objects users concern about swiftly onto the display window. This process includes two operations:retrieve data from database and then draw them onto screen. Accordingly, to improve the efficiency, we should try to reduce time of both retrieving object and displaying them. The former can be achieved with the aid of spatial index such as R-tree, the latter require to simplify the objects. Simplification means that objects are shown with sufficient but not with unnecessary detail which depend on the scale of browse. So the major problem is how to retrieve data at different detail level efficiently. This paper introduces the implementation of a multi-scale index in the spatial database SISP (Spatial Information Shared Platform) which is generalized from R-tree. The difference between the generalization and the R-tree lies on two facets: One is that every node and geometric object in the generalization is assigned with a importance value which denote the importance of them, and every vertex in the objects are assigned with a importance value,too. The importance value can be use to decide which data should be retrieve from disk in a query. The other difference is that geometric objects in the generalization are divided into one or more sub-blocks, and vertexes are total ordered by their importance value. With the help of the generalized R-tree, one can easily retrieve data at different detail levels.Some experiments are performed on real-life data to evaluate the performance of solutions that separately use normal spatial index and multi-scale spatial index. The results show that the solution using multi-scale index in SISP is satisfying.  相似文献   

19.
20.
The elongation method,originally proposed by Imamura was further developed for many years in our group.As a method towards O(N)with high efficiency and high accuracy for any dimensional systems.This treatment designed for one-dimensional(ID)polymers is now available for three-dimensional(3D)systems,but geometry optimization is now possible only for 1D-systems.As an approach toward post-Hartree-Fock,it was also extended to  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号