
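A Methodology of Massive Bus Passenger Origin-Destination Mining Based on Hive
Citation: XU Zhi-hong, WANG Yi-zheng, WANG Li-qin, DONG Yong-feng. A Methodology of Massive Bus Passenger Origin-Destination Mining Based on Hive[J]. Science Technology and Engineering, 2020, 20(20): 8300-8309.
Authors: XU Zhi-hong  WANG Yi-zheng  WANG Li-qin  DONG Yong-feng
Affiliations: School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401; Hebei Key Laboratory of Big Data Computing, Tianjin 300401; School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401
Funding: Tianjin Science and Technology Plan Project (No. 14ZCDGSF00124); Tianjin Natural Science Foundation (No. 16JCYBJC15600)
Abstract: Existing origin-destination (OD) mining methods commonly share three problems: they cannot analyze multiple bus lines in parallel, they are inefficient, and their prediction rates are insufficient. Given Hive's query-performance advantages on massive data, OD mining was implemented on Hive to overcome these problems. Boarding stations are matched using a time threshold, and records that fail to match are matched again based on the number of passengers boarding at each station. Alighting stations are predicted with a trip-chaining algorithm built on table joins, and records whose prediction fails undergo two further rounds of probability-based prediction. OD mining was performed on IC card transaction data and dispatch data from Shijiazhuang for January 1 to March 27, 2018: from 11 312 505 cleaned trip records, 11 270 037 OD records were mined, a prediction rate of 99.6%. Trip generation and attraction checks indicated high result quality, and with Hive parallel tuning enabled the run took 17 829.04 s. The method therefore meets the business requirements of offline OD mining in a production environment.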
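The boarding-station step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' Hive implementation: it assumes the dispatch data can be reduced to per-station vehicle arrival times (the `Arrival` records), uses an illustrative 120 s threshold, and interprets the fallback "based on the number of passengers boarding at each station" as assigning the line's busiest boarding station. All names, values, and toy data are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Swipe:
    card_id: str
    line_id: str
    vehicle_id: str
    ts: int  # swipe time in seconds since midnight


@dataclass(frozen=True)
class Arrival:
    line_id: str
    vehicle_id: str
    station_id: str
    ts: int  # vehicle arrival time at the station, seconds since midnight


THRESHOLD_S = 120  # hypothetical matching threshold in seconds


def match_boarding(
    swipe: Swipe,
    arrivals: List[Arrival],
    boarding_counts: Dict[str, Dict[str, int]],
    threshold: int = THRESHOLD_S,
) -> Optional[str]:
    """Return the inferred boarding station id, or None if nothing matches."""
    # Only arrivals of the same vehicle on the same line are candidates.
    same_vehicle = [
        a for a in arrivals
        if a.line_id == swipe.line_id and a.vehicle_id == swipe.vehicle_id
    ]
    # First pass: the nearest arrival whose time gap is within the threshold.
    within = [a for a in same_vehicle if abs(swipe.ts - a.ts) <= threshold]
    if within:
        return min(within, key=lambda a: abs(swipe.ts - a.ts)).station_id
    # Second pass (assumed rule): fall back to the line's station with the
    # most recorded boardings.
    counts = boarding_counts.get(swipe.line_id, {})
    if counts:
        return max(counts, key=lambda s: counts[s])
    return None


# Toy data, invented for illustration only.
ARRIVALS = [
    Arrival("L1", "V1", "S1", 1000),
    Arrival("L1", "V1", "S2", 1300),
    Arrival("L1", "V1", "S3", 1700),
]
BOARDING_COUNTS = {"L1": {"S1": 40, "S2": 15, "S3": 5}}
```

With this toy data, a swipe at 1310 s matches `S2` (10 s gap), while a swipe at 1500 s is at least 200 s from every arrival and falls back to `S1`, the busiest station on line `L1`. In Hive, the same logic would be expressed as a join between swipe and dispatch tables rather than a Python loop.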

Keywords: passenger flow  origin-destination (OD)  alighting station  Hive
Received: 2019/8/29
Revised: 2020/4/17

A Methodology of Massive Bus Passenger Origin-Destination Mining Based on Hive
XU Zhi-hong, WANG Yi-zheng, WANG Li-qin, DONG Yong-feng. A Methodology of Massive Bus Passenger Origin-Destination Mining Based on Hive[J]. Science Technology and Engineering, 2020, 20(20): 8300-8309.
Authors: XU Zhi-hong  WANG Yi-zheng  WANG Li-qin  DONG Yong-feng
Institution: School of Artificial Intelligence, Hebei University of Technology
Abstract: Current OD mining methods commonly suffer from three problems: they cannot analyze multiple lines in parallel, they are inefficient, and their prediction rates are low. Considering Hive's query-performance advantages on massive data, OD mining based on Hive was implemented to overcome these problems. A time threshold was used to match boarding stations, and records that failed to match were matched again based on the number of boarding passengers at each station. A trip-chaining method based on table joins was used to predict alighting stations, and records whose prediction failed were predicted twice more based on probability. IC card transaction data and scheduling data from Shijiazhuang from January 1, 2018, to March 27, 2018, were used for OD mining; 11,270,037 OD records were mined from 11,312,505 cleaned trip records. The prediction rate reached 99.6%, and trip generation and attraction checks showed high quality. With Hive parallel execution enabled, the method ran in 17,829.04 s. The results show that the method satisfies the business requirements of offline OD mining in a production environment.
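The alighting-station step can likewise be sketched in plain Python. The paper implements trip chaining with Hive table joins; this sketch only mirrors the common trip-chaining rule (a passenger alights at the downstream stop closest to the card's next boarding stop, and the day's last trip chains back to the first) and replaces the paper's two probability-based rounds with a single lookup in a hypothetical per-stop alighting-probability table. Planar coordinates in metres, the 1000 m walking threshold, and all identifiers and toy data are assumptions. The helper `prediction_rate` shows how a figure such as 99.6% is computed (11,270,037 / 11,312,505 ≈ 0.996).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar x/y in metres (assumption)


@dataclass(frozen=True)
class Trip:
    card_id: str
    ts: int         # boarding time, seconds since midnight
    line_id: str
    board_idx: int  # position of the boarding stop in the line's stop list


MAX_WALK_M = 1000.0  # hypothetical walking threshold between stops


def predict_alighting(
    trips: List[Trip],
    line_stops: Dict[str, List[str]],
    stop_xy: Dict[str, Point],
    alight_prob: Dict[str, Dict[str, float]],
    max_walk: float = MAX_WALK_M,
) -> Dict[Tuple[str, int], Optional[str]]:
    """Map (card_id, ts) to a predicted alighting stop, or None."""
    chains: Dict[str, List[Trip]] = {}
    for t in trips:
        chains.setdefault(t.card_id, []).append(t)

    result: Dict[Tuple[str, int], Optional[str]] = {}
    for card, chain in chains.items():
        chain.sort(key=lambda t: t.ts)  # order each card's trips by time
        for i, trip in enumerate(chain):
            downstream = line_stops[trip.line_id][trip.board_idx + 1:]
            key = (card, trip.ts)
            if not downstream:  # boarded at the terminal: nothing to predict
                result[key] = None
                continue
            if len(chain) > 1:
                # Trip chaining: the next boarding stop (the last trip of the
                # day wraps around to the first) anchors the alighting stop.
                nxt = chain[(i + 1) % len(chain)]
                target = stop_xy[line_stops[nxt.line_id][nxt.board_idx]]
                best = min(downstream, key=lambda s: math.dist(stop_xy[s], target))
                if math.dist(stop_xy[best], target) <= max_walk:
                    result[key] = best
                    continue
            # Simplified fallback: the most probable downstream alighting stop.
            probs = alight_prob.get(trip.line_id, {})
            result[key] = max(downstream, key=lambda s: probs.get(s, 0.0))
    return result


def prediction_rate(result: Dict[Tuple[str, int], Optional[str]]) -> float:
    """Share of trips that received an alighting stop."""
    return sum(v is not None for v in result.values()) / len(result) if result else 0.0


# Toy data, invented for illustration only.
LINE_STOPS = {"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2", "B3"]}
STOP_XY = {
    "A1": (0.0, 0.0), "A2": (1000.0, 0.0), "A3": (2000.0, 0.0), "A4": (3000.0, 0.0),
    "B1": (2100.0, 100.0), "B2": (2100.0, 1100.0), "B3": (2100.0, 2100.0),
}
ALIGHT_PROB = {"A": {"A3": 0.6, "A4": 0.4}, "B": {"B2": 0.3, "B3": 0.7}}
DEMO_TRIPS = [
    Trip("c1", 63000, "B", 0),
    Trip("c1", 25200, "A", 0),
    Trip("c2", 30000, "A", 1),
    Trip("c3", 40000, "A", 3),
]
```

In the demo, card `c1`'s morning trip is chained to stop `A3` (about 141 m from its evening boarding stop `B1`), while its evening trip finds no downstream stop within 1000 m of `A1` and falls back to the probability table (`B3`). Card `c2` has a single trip, so it goes straight to the fallback, and card `c3` boarded at the last stop, so three of the four trips receive a prediction.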
Keywords: passenger flow  origin-destination  alighting station  Hive  OD
This article is indexed in CNKI, Wanfang Data, and other databases.