数据科学与工程系列学术报告

发布时间:2017-12-13浏览量:169

报告题目:Trajectory Similarity Join in Spatial Networks
主讲人:Prof. Dr. Shuo Shang
时间:2017-09-22 13:30
地点:华师大中北校区数学馆201室
主办单位:华东师范大学数据科学与工程学院

个人简介:
商烁,沙特阿卜杜拉国王科技大学(KAUST)极限计算中心研究员,博士生导师,中国人民大学大数据管理与分析方法研究北京市重点实验室主任助理。中国GIS协会理论与方法委员会委员,中国计算机学会数据库专委会委员,2016年教育部-中移动联合实验室评审专家组成员。曾入选北京市科技新星计划、北京市优秀人才计划及中国石油大学(北京)青年拔尖人才计划。2008年本科毕业于北京大学,2012年博士毕业于澳大利亚昆士兰大学,2012-2013在丹麦奥尔堡大学任博士后/研究助理教授,2013-2016任教于中国石油大学(北京)。研究方向包括城市计算、时空数据库、社交媒体分析等。在相关领域发表论文40余篇,含CCF A类论文15篇,所发表SCI论文影响因子之和大于50,SCI引用150余次,Google Scholar引用560余次。曾担任APWeb/WAIM 2017大会演示主席、ICDE 2013移动对象分会场主席,SIGMOD 2018、CIKM 2017、DASFAA 2014、2015 程序委员会委员,并担任VLDB Journal、IEEE TKDE、ACM TIST、ACM TSAS、IEEE TITS、Geoinformatica等数据管理和数据挖掘领域顶级期刊的评审专家。

报告内容:The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction.

With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost.  An empirical study with real data offers insight in the performance of the algorithm and demonstrates that is capable of outperforming a well-designed baseline algorithm by an order of magnitude.

shangshuo.jpg