Parallel Implementation Method of t-SNE Algorithm Combined With PCA
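Cite this article: XU Yang, WANG Jiabin, PENG Kai. Parallel Implementation Method of t-SNE Algorithm Combined With PCA[J]. Journal of Huaqiao University (Natural Science), 2022, 0(5): 685-692.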
Authors: XU Yang, WANG Jiabin, PENG Kai
Affiliation: College of Engineering, Huaqiao University, Quanzhou 362021, Fujian, China
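Abstract: To improve the speed and accuracy of processing high-dimensional nonlinear data in big-data environments, a t-distributed stochastic neighbor embedding (t-SNE) algorithm combined with principal component analysis (PCA) is proposed. First, the original data are preprocessed with PCA to remove noise points. Then, within the t-SNE algorithm, a K-nearest-neighbor (K-NN) graph is constructed to represent the similarity relationships among the data in the high-dimensional space. Finally, the computation is parallelized on the Spark platform, and experiments are conducted on the BREAST CANCER, MNIST, and CIFAR-10 data sets. The results show that the proposed algorithm effectively maps high-dimensional data into a low-dimensional space, improves efficiency and accuracy, and can be applied to dimensionality reduction of large-scale high-dimensional data.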

Keywords: high-dimensional data; Spark platform; dimensionality reduction; visualization; t-SNE algorithm
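The abstract outlines three steps: PCA preprocessing, a K-NN graph that encodes high-dimensional similarities, and a t-SNE embedding. Below is a minimal single-machine NumPy sketch of those steps under stated assumptions; it is not the authors' code. The function names, the number of retained components, the neighbourhood size k (about 3 × perplexity, the rule Barnes-Hut t-SNE uses), the bisection search for each point's bandwidth, the momentum schedule, the early-exaggeration factor of 12, and the step-size rule are all assumptions. The low-dimensional repulsion is computed exactly in O(n²), not with whatever approximation the paper may use.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def knn_graph(Y, k):
    """Indices and squared Euclidean distances of each row's k nearest neighbours."""
    sq = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbour
    idx = np.argsort(D, axis=1)[:, :k]
    return idx, np.take_along_axis(D, idx, axis=1)


def knn_affinities(idx, dist, perplexity, tol=1e-5, max_iter=100):
    """Symmetric joint probabilities P, nonzero only on K-NN graph edges.

    Assumption: the caller picks k > perplexity (e.g. k = 3 * perplexity).
    """
    n = idx.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)  # desired Shannon entropy, in nats
    for i in range(n):
        d = dist[i] - dist[i].min()  # shift leaves p unchanged, avoids underflow
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            p = np.exp(-beta * d)
            p /= p.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if abs(H - target) < tol:
                break
            if H > target:  # kernel too wide: increase beta
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # kernel too narrow: decrease beta
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, idx[i]] = p  # conditional p_{j|i} on i's neighbours only
    return (P + P.T) / (2.0 * n)  # p_ij = (p_{j|i} + p_{i|j}) / (2n)


def tsne(P, n_dims=2, n_iter=500, exaggeration=12.0, lr=None, seed=0):
    """Gradient descent with momentum on KL(P || Q), Student-t (1 dof) kernel."""
    n = P.shape[0]
    if lr is None:
        lr = n / (4.0 * exaggeration)  # assumption: step size scaled with n
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.standard_normal((n, n_dims))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        early = it < 100  # early-exaggeration phase
        sq = np.sum(Y ** 2, axis=1)
        num = 1.0 / (1.0 + sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = ((exaggeration if early else 1.0) * P - Q) * num
        # dC/dy_i = 4 * sum_j W_ij (y_i - y_j), written in matrix form
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = (0.5 if early else 0.8) * velocity - lr * grad
        Y = Y + velocity
    return Y
```

Restricting p_{j|i} to the K-NN graph leaves at most n·k nonzeros before symmetrization, which is what makes the high-dimensional side cheap to partition; the dense n × n arrays above are kept only for readability.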

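The abstract does not say how the work is divided on Spark. One plausible scheme, assumed here, is to partition the rows: each partition finds the nearest neighbours of its own points against the full data set, and the partial results are concatenated. In the sketch below a thread pool stands in for the cluster; on Spark, the per-block function could run inside `RDD.mapPartitions` with the full matrix shipped as a `SparkContext.broadcast` variable. `block_knn` and `parallel_knn` are hypothetical names.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def block_knn(rows, start, data, k):
    """K nearest neighbours for one partition of rows, searched over all of `data`.

    `start` is the global index of the partition's first row; it is used to
    exclude each point's distance to itself.
    """
    sq_rows = np.sum(rows ** 2, axis=1)
    sq_all = np.sum(data ** 2, axis=1)
    D = np.maximum(sq_rows[:, None] + sq_all[None, :] - 2.0 * rows @ data.T, 0.0)
    local = np.arange(rows.shape[0])
    D[local, start + local] = np.inf  # a point is never its own neighbour
    idx = np.argsort(D, axis=1)[:, :k]
    return idx, np.take_along_axis(D, idx, axis=1)


def parallel_knn(data, k, n_partitions=4):
    """Split rows into contiguous partitions, solve each independently, concatenate."""
    bounds = np.linspace(0, data.shape[0], n_partitions + 1).astype(int)
    blocks = [(data[a:b], a) for a, b in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # map() preserves input order, so rows come back in their original order
        parts = list(pool.map(lambda blk: block_knn(blk[0], blk[1], data, k), blocks))
    idx = np.vstack([p[0] for p in parts])
    dist = np.vstack([p[1] for p in parts])
    return idx, dist
```

Each partition needs only its own rows plus read-only access to the full matrix, so no data moves between partitions; this is why a broadcast-plus-map pattern fits the K-NN step.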