Data Science and Engineering Academic Seminar Series: "From Structure-based to Semantic-based: Towards Effective XML Keyword Search"

Published: 2018-01-12

Time: 2018-01-18, 11:00-12:00

Venue: Room 201, Mathematics Building, Zhongbei Campus, East China Normal University

Title: From Structure-based to Semantic-based: Towards Effective XML Keyword Search

Abstract:

Keyword search in XML has gained popularity because it gives users an easy way to query XML data. Existing keyword search approaches on XML trees, namely Lowest Common Ancestor (LCA) and its variants such as SLCA, MLCA, VLCA, and ELCA, are all LCA-based and rely on the hierarchical structure of the XML document. This causes serious problems in processing XML keyword queries, such as meaningless answers, duplicated answers, incomplete answers, missing answers, and schema-dependent answers. We analyze these problems and show that their main cause is unawareness of the Object-Relationship-Attribute (ORA) semantics in XML.
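To make the LCA-based behavior concrete, the following is a minimal sketch of SLCA-style matching (the smallest subtrees that contain every keyword) on a toy document. The document, the `slca` helper, and the simple substring matching rule are illustrative assumptions, not the speaker's implementation.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical): the student "Anna" is stored under two courses.
DOC = """
<university>
  <course><title>Database</title>
    <student><name>Anna</name></student>
    <student><name>Bob</name></student>
  </course>
  <course><title>Network</title>
    <student><name>Anna</name></student>
  </course>
</university>
"""
ROOT = ET.fromstring(DOC)


def slca(root, keywords):
    """Return elements whose subtree contains every keyword
    while none of their child subtrees does (SLCA-style answers)."""
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Keywords found in this node's own text (assumed matching rule).
        text = (node.text or "").lower()
        matched = {k for k in wanted if k in text}
        child_covers_all = False
        for child in node:
            child_matched, child_covers = visit(child)
            matched |= child_matched
            child_covers_all = child_covers_all or child_covers
        covers_all = matched == wanted
        if covers_all and not child_covers_all:
            results.append(node)
        return matched, covers_all

    visit(root)
    return results


for query in (["Anna", "Bob"], ["Bob", "Network"], ["Anna"]):
    print(query, [e.tag for e in slca(ROOT, query)])
```

In this sketch, the query `["Bob", "Network"]` returns the document root, which illustrates a meaningless answer, and the query `["Anna"]` returns two `name` elements for what is presumably the same student, which illustrates a duplicated answer.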

With knowledge of the ORA-semantics in the XML document, we are able to detect duplications of objects and relationships and resolve the first three problems of the LCA-based search approaches (meaningless, duplicated, and incomplete answers).

We present a novel concept, called Common Relative (CR), and an algorithm based on the CR semantics to find answers beyond LCA, i.e., the missing answers. The algorithm is also independent of the schema design chosen for the same data content.

We extend the keyword query language to include keywords that match the metadata, i.e., the tag names in the XML document, and to support group-by and aggregate functions such as count, max, min, and sum. To process extended keyword queries correctly, we must use the ORA-semantics in the XML document to detect duplications of objects and relationships. Without ORA-semantics, keyword queries with aggregate functions will be computed wrongly and return incorrect answers.
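As a rough illustration of why duplicated objects distort aggregates, the sketch below counts students in a toy document in two ways. The document, the `sid` identifier attribute, and both helper functions are hypothetical; discovering which attribute identifies an object is the part that ORA-semantics would supply and is not shown here.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical): student s1 is enrolled in two courses,
# so the same object appears twice in the tree.
DOC = """
<university>
  <course code="CS1">
    <student sid="s1"><name>Anna</name></student>
    <student sid="s2"><name>Bob</name></student>
  </course>
  <course code="CS2">
    <student sid="s1"><name>Anna</name></student>
  </course>
</university>
"""
ROOT = ET.fromstring(DOC)


def naive_count(root, tag):
    """Count every element with the given tag, duplicates included."""
    return sum(1 for _ in root.iter(tag))


def object_count(root, tag, id_attr):
    """Count distinct objects, assuming id_attr identifies an object."""
    return len({e.get(id_attr) for e in root.iter(tag)})


print(naive_count(ROOT, "student"))          # structure-based count
print(object_count(ROOT, "student", "sid"))  # count after removing duplicates
```

Here the structure-based count treats the two occurrences of student `s1` as separate students, while the identifier-based count treats them as one object.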

ORA-semantics can also be used to improve results in other database research areas, such as relational database (RDB) keyword search and data and schema integration.

Speaker: Ling, Tok Wang


Dr. LING Tok Wang is a professor in the Department of Computer Science at the National University of Singapore. He was Head of the IT Division, Deputy Head of the Department of Information Systems and Computer Science, and Vice Dean of the School of Computing of the University. His research interests include Database Modeling, Entity-Relationship Approach, Object-Oriented Data Model, Normalization Theory, Semi-Structured Data Model, XML Twig Pattern Query Processing, and XML and Relational Database Keyword Query Processing. He serves or has served on the steering committees of 5 international conferences, including ER, DASFAA, DOOD, and BigComp. He was the steering committee chair of both ER and DASFAA, and is currently the steering committee chair of BigComp. He served as Conference Co-chair of 11 international conferences, including ER 2004, DASFAA 2005, SIGMOD 2007, VLDB 2010, BigComp 2015, and ER 2018. He served as Program Committee Co-chair of 6 international conferences, including DASFAA 1995, ER 1998, ER 2003, and ER 2011. He received the ACM Recognition of Service Award in 2007, the DASFAA Outstanding Contributions Award in 2010, and the Peter P. Chen Award in 2011.