Web Database Metasearch Engine Project

Collaborative Research
Achieving Information Integration of Web Databases
Through the Construction of Metasearch Engines


This project was sponsored by research grants from the National Science Foundation from July 2005 to June 2009.

Principal Investigators at SUNY at Binghamton

PI: Prof. Weiyi Meng, Co-PI: Prof. Madhusudhan Govindaraju
Department of Computer Science
State University of New York at Binghamton
NSF grant number: IIS-0414981

Principal Investigator at University of Illinois at Chicago

PI: Prof. Clement Yu
Department of Computer Science
University of Illinois at Chicago
NSF grant number: IIS-0414939

Project Overview

The goal of this collaborative research project is to develop technologies for providing integrated access to Web databases. The approach consists of developing highly automated solutions to the following tasks: discovering Web databases from the Web, clustering them according to their application domains, integrating the search interfaces of the Web databases in the same domain into an integrated search interface, mapping each query submitted to an integrated interface to its underlying Web databases, extracting and annotating the search result records from the result pages returned from the local Web databases, and merging the results to form an integrated response for presentation to the user. Web services based interfaces of Web databases, if available, will be utilized. The evaluation of the developed algorithms is based on real Web databases from different domains. This research is expected to produce new algorithms, useful datasets, a software toolkit, and several operational Web database metasearch engines. The developed technology can be used in many applications including comparison shopping and collecting data from the deep Web.
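The last three tasks in the overview (query mapping, result annotation, and result merging) can be illustrated with a minimal sketch. This is not the project's software: the names WebDatabase, map_query, annotate, and metasearch, the attribute-mapping dictionaries, and the two in-memory "stores" are all hypothetical, and the search functions stand in for submitting a form to a real Web database and extracting records from its result page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class WebDatabase:
    """Hypothetical description of one local Web database."""
    name: str
    # Integrated attribute name -> local field name (assumed to be known
    # from a prior interface-integration step).
    field_map: Dict[str, str]
    # Stand-in for form submission plus result-record extraction.
    search: Callable[[Dict[str, str]], List[Record]]


def map_query(query: Dict[str, str], db: WebDatabase) -> Dict[str, str]:
    """Translate an integrated query into the local field names of db.

    Attributes the local interface does not support are dropped.
    """
    return {db.field_map[a]: v for a, v in query.items() if a in db.field_map}


def annotate(record: Record, db: WebDatabase) -> Record:
    """Relabel a local record with integrated attribute names and its source."""
    reverse = {local: glob for glob, local in db.field_map.items()}
    out = {reverse[k]: v for k, v in record.items() if k in reverse}
    out["source"] = db.name
    return out


def metasearch(query: Dict[str, str], dbs: List[WebDatabase]) -> List[Record]:
    """Query every database, then merge results, deduplicating by title."""
    merged: Dict[str, Record] = {}
    for db in dbs:
        for rec in db.search(map_query(query, db)):
            ann = annotate(rec, db)
            key = ann.get("title", "").strip().lower()
            merged.setdefault(key, ann)  # keep the first copy seen
    return sorted(merged.values(), key=lambda r: r.get("title", "").lower())


def _make_search(rows: List[Record]) -> Callable[[Dict[str, str]], List[Record]]:
    """Build a toy exact-match search over in-memory rows."""
    def search(local_query: Dict[str, str]) -> List[Record]:
        return [r for r in rows
                if all(r.get(k) == v for k, v in local_query.items())]
    return search


STORE_A = WebDatabase(
    name="A",
    field_map={"title": "book_title", "author": "writer"},
    search=_make_search([
        {"book_title": "Deep Web Basics", "writer": "Lee"},
        {"book_title": "Cooking", "writer": "Lee"},
    ]),
)
STORE_B = WebDatabase(
    name="B",
    field_map={"title": "name", "author": "by"},
    search=_make_search([
        {"name": "deep web basics ", "by": "Lee"},
        {"name": "Metasearch", "by": "Lee"},
    ]),
)

RESULTS = metasearch({"author": "Lee"}, [STORE_A, STORE_B])
```

In this toy setup the integrated query on "author" is rewritten to "writer" for store A and to "by" for store B, and the duplicate title returned by both stores appears once in the merged list. A real system would need approximate matching for both query translation and duplicate detection; exact title comparison is used here only to keep the sketch short.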

Prof. Xiaofeng Meng of the School of Information at Renmin University in China also collaborates on this project.

We also conduct research on metasearch technologies for document search engines. Please visit http://www.cs.binghamton.edu/~meng/metasearch.html, the homepage of our Document Metasearch Engine Project.

This project is also supported in part by the following equipment grant from NSF: CNS-0454298. Any opinions, findings, and conclusions or recommendations expressed on this site are those of the PIs and do not necessarily reflect the views of the National Science Foundation (NSF).

New NIH-sponsored Project

Dr. Neil Smalheiser of the University of Illinois at Chicago is leading a team on a new project sponsored by the National Institutes of Health, entitled Mining Pipeline to Accelerate Systematic Reviews in Evidence-Based Medicine (09/30/2010-09/29/2014). Part of the project is to build a metasearch system for data sources containing information related to clinical data (e.g., clinical-trial reviews). Weiyi Meng and Clement Yu are participants in this project.

Other Participants


Related Publications

  1. Clement Yu, Prasoon Sharma, Weiyi Meng, and Yan Qin. Database Selection for Processing k Nearest Neighbors Queries in Distributed Environments. First ACM/IEEE Joint Conference on Digital Libraries, Roanoke, VA, June 2001, pp.215-222.
  2. Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu. WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce. Proc. of 29th International Conference on Very Large Data Bases (VLDB'03), pp.357-368, Berlin, Germany, September 2003.
  3. Clement Yu, George Philip, and Weiyi Meng. Distributed Top-N Query Processing with Possibly Uncooperative Local Systems. Proc. of 29th International Conference on Very Large Data Bases (VLDB'03), pp.117-128, Berlin, Germany, September 2003. (A longer version with all the proofs is also available.)
  4. Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu. Automatic Extraction of Web Search Interfaces for Interface Schema Integration. World Wide Web Conference (WWW2004), poster paper, pp.414-415, New York City, May 2004.
  5. Wensheng Wu, Clement Yu, AnHai Doan, and Weiyi Meng. An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web. Proceedings of the 33rd ACM SIGMOD Conference, pp.95-106, Paris, France, June 2004.
  6. Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu. Automatic Integration of Web Search Interfaces with WISE-Integrator. VLDB Journal, Vol.13, No.3, pp.256-273, September 2004. (Special Issue for Best Papers of VLDB 2003)
  7. Qian Peng, Weiyi Meng, Hai He, and Clement Yu. WISE-Cluster: Clustering E-Commerce Search Engines Automatically. 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), pp.104-111, Washington, DC, November 2004.
  8. Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Raghavan, and Clement Yu. Fully Automatic Wrapper Generation for Search Engines. Proc. of 14th International World Wide Web Conference (WWW14), pp.66-75, Chiba, Japan, May 2005.
  9. Hai He, Weiyi Meng, Clement Yu, Zonghuan Wu. WISE-Integrator: A System for Extracting and Integrating Complex Web Search Interfaces of the Deep Web. International Conference on Very Large Data Bases (VLDB'05), pp.1314-1317, Demo paper, Trondheim, Norway, August 2005.
  10. Wensheng Wu, AnHai Doan, Clement Yu, and Weiyi Meng. Bootstrapping Domain Ontology for Semantic Web Services from the Source Web Sites. 6th VLDB Workshop on Technologies for E-Services (TES 2005), pp.11-22, Trondheim, Norway, September 2005.
  11. Wensheng Wu, AnHai Doan, Clement Yu. Merging interface schemas on the Deep Web via clustering aggregation. 5th IEEE International Conference on Data Mining, pp.801-804, November 2005.
  12. Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu. Constructing Interface Schemas for Search Interfaces of Web Databases. 6th International Conference on Web Information Systems Engineering (WISE05), pp.29-42, New York City, November 2005.
  13. Michael R. Head, Madhusudhan Govindaraju, Aleksander Slominski, Pu Liu, Nayef Abu-Ghazaleh, Robert van Engelen, Kenneth Chiu, Michael J. Lewis. A Benchmark Suite for SOAP-based Communication in Grid Web Services. SC|05 (Supercomputing): International Conference for High Performance Computing, Networking, and Storage, pp.19, Seattle WA, November 2005.
  14. Madhusudhan Govindaraju, Michael R. Head, Kenneth Chiu. XCAT-C++: Design and Performance of a Distributed CCA Framework. 12th Annual IEEE International Conference on High Performance Computing (HiPC), pp.270-279, December 2005, Goa, India.
  15. Eduard Dragut, Wensheng Wu, Prasad Sistla, Clement Yu, and Weiyi Meng. Merging Source Query Interfaces on Web Databases. 22nd International Conference on Data Engineering (ICDE'06), pp.679-690, Atlanta, Georgia, April 2006.
  16. Wensheng Wu, AnHai Doan, and Clement Yu. WebIQ: Learning from the Web to Match Deep-Web Query Interfaces. 22nd International Conference on Data Engineering (ICDE'06), April 2006.
  17. Wei Liu, Xiaofeng Meng, Weiyi Meng. Vision-based Web Data Records Extraction. Ninth International Workshop on the Web and Databases (WebDB 2006), pp.20-25, Chicago, June 2006.
  18. Hongkun Zhao, Weiyi Meng, Clement Yu. Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages. 32nd International Conference on Very Large Data Bases (VLDB06), pp.989-1000, Seoul, Korea, September 2006.
  19. Eduard Dragut, Clement Yu, Weiyi Meng. Meaningful Labeling of Integrated Query Interfaces. 32nd International Conference on Very Large Data Bases (VLDB06), pp.679-690, Seoul, Korea, September 2006.
  20. Yiyao Lu, Hai He, Qian Peng, Weiyi Meng, and Clement Yu. Clustering E-Commerce Search Engines based on Their Search Interface Pages Using WISE-Cluster. Data & Knowledge Engineering (DKE) Journal, Vol.59, No.2, pp.231-246, November 2006.
  21. Yiyao Lu, Hai He, Hongkun Zhao, Weiyi Meng, and Clement Yu. Annotating Structured Data of the Deep Web. IEEE 23rd International Conference on Data Engineering (ICDE 2007), pp.376-385, Istanbul, Turkey, April 2007.
  22. Xian Li, Weiyi Meng, Xiaofeng Meng. EasyQuerier: A Keyword Based Interface for Web Database Integration System. 12th International Conference on Database Systems for Advanced Applications (DASFAA), pp.936-942, Bangkok, Thailand, April 2007.
  23. Hai He, Weiyi Meng, Yiyao Lu, Clement Yu, and Zonghuan Wu. Towards Deeper Understanding of the Search Interfaces of the Deep Web. World Wide Web Journal, Vol.10, No.2, pp.133-155, June 2007.
  24. Chaitali Gupta, Rajdeep Bhowmik, Michael Head, Madhusudhan Govindaraju, Weiyi Meng. A Query-based System for Automatic Invocation of Web Services. 2007 IEEE International Conference on Web Services (ICWS), Application Services and Industry Track, pp.759-766, Salt Lake City, Utah, July 2007.
  25. Janette Hicks, Madhusudhan Govindaraju, Weiyi Meng. Search Algorithms for Discovery of Web Services. 2007 IEEE International Conference on Web Services (ICWS), Work in Progress track, pp.1172-1173, Salt Lake City, July 2007.
  26. Hongkun Zhao, Weiyi Meng, and Clement Yu. Mining Templates from Search Result Records of Search Engines. 13th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007), pp.884-893, San Jose, California, August 2007.
  27. Wei Liu, Xiaofeng Meng, and Weiyi Meng. A Survey of Deep Web Data Integration. Chinese Journal of Computers, Vol.30, No.9, pp.1475-1489, September 2007.
  28. Janette Hicks, Madhusudhan Govindaraju, and Weiyi Meng. Enhancing the Discovery of Web Services through Optimized Algorithms. IEEE International Conference on Granular Computing, pp.695-698, Silicon Valley, November 2007.
  29. Chaitali Gupta, Rajdeep Bhowmik, Michael R. Head, Madhusudhan Govindaraju, and Weiyi Meng. Improving Performance of Web Services Query Matchmaking with Automated Knowledge Acquisition. IEEE/WIC/ACM International Conference on Web Intelligence (WI'07), pp.559-563, Silicon Valley, November 2007.
  30. Liangcai Shu, Weiyi Meng, Hai He, Clement Yu. Querying Capability Modeling and Construction. 8th International Conference on Web Information Systems Engineering (WISE), pp.13-25, Nancy, France, December 2007.
  31. Chaitali Gupta, Rajdeep Bhowmik, Madhusudhan Govindaraju. Ontological Framework for a Free-Form Query Based Grid Search Engine. The 17th IEEE International Symposium on High Performance Distributed Computing (HPDC-17), Hot Topics Session, Boston, June 2008.
  32. Fangjiao Jiang, Linlin Jia, Weiyi Meng, Xiaofeng Meng. MrCoM: A Cost Model for Range Query Translation in Deep Web Data Integration. Fourth International Conference on Semantics, Knowledge and Grid (SKG), pp.263-270, December 2008, Beijing, China.
  33. Weiyi Meng, and Hai He. Data Search Engine. In Encyclopedia of Computer Science and Engineering (Benjamin Wah, ed.), John Wiley & Sons, pp.826-834, January 2009.
  34. Liangcai Shu, Bo Long, Weiyi Meng. A Latent Topic Model for Complete Entity Resolution. 25th IEEE International Conference on Data Engineering (ICDE), pp.880-891, Shanghai, China, March 2009.
  35. Fangjiao Jiang, Weiyi Meng, Xiaofeng Meng. Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration. International Conference on Database Systems for Advanced Applications (DASFAA), pp.595-600, Brisbane, Australia, April 2009.
  36. Eduard Dragut, Fang Fang, Prasad Sistla, Clement Yu, Weiyi Meng. Stop Word and Related Problems in Web Interface Integration. 35th International Conference on Very Large Data Bases (VLDB), pp.349-360, Lyon, France, August 2009.
  37. Eduard Dragut, T. Kabisch, Clement Yu and U. Leser. A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration. 35th International Conference on Very Large Data Bases (VLDB), pp.325-336, Lyon, France, August 2009.
  38. Eduard Dragut, Fang Fang, Clement Yu and Weiyi Meng. Deriving Customized Integrated Web Query Interfaces. IEEE/WIC/ACM International Conference on Web Intelligence, Milan, Italy, pp.685-688, September 2009.
  39. Wensheng Wu, AnHai Doan, Clement Yu, and Weiyi Meng. Modeling and Extracting Deep-Web Query Interfaces. In Advances in Information and Intelligent Systems, edited by Zbigniew W. Ras and William Ribarsky. Springer, pp.65-90, October 2009.
  40. Wei Liu, Xiaofeng Meng, Weiyi Meng. ViDE: A Vision-based Approach for Deep Web Data Extraction. IEEE Transactions on Knowledge and Data Engineering, Vol.22, No.3, pp.447-460, March 2010.
  41. E. Dragut, T. Kabisch, C. Yu, and U. Leser. Deep Web Integration with VisQI. 36th International Conference on Very Large Data Bases (VLDB), Demo paper, pp.1613-1616, Singapore, Sept 2010.
  42. Aaron M. Cohen, Clive E. Adams, John M. Davis, Clement Yu, Philip S. Yu, Weiyi Meng, Lorna Duggan, Marian McDonagh, and Neil R. Smalheiser. Evidence-based Medicine: The Essential Role of Systematic Reviews and the Need for Automated Text Mining Tools. 1st ACM International Health Informatics Symposium (IHI 2010), pp.376-380, Arlington, Virginia, November 2010.
  43. Liangcai Shu, Aiyou Chen, Ming Xiong, Weiyi Meng. Efficient Spectral Neighborhood Blocking for Entity Resolution. IEEE International Conference on Data Engineering (ICDE), pp.1067-1078, Hannover, Germany, April 2011.
  44. Eduard Dragut, Weiyi Meng, Clement Yu. Deep Web Query Interface Understanding and Integration. Morgan & Claypool Publishers, 2012.
  45. Liangcai Shu, Can Lin, Weiyi Meng, Yue Han, Clement Yu, and Neil R. Smalheiser. A Framework for Entity Resolution with Efficient Blocking. IEEE International Conference on Information Reuse and Integration (IRI), Las Vegas, August 2012.
  46. Yiyao Lu, Hai He, Hongkun Zhao, Weiyi Meng, Clement Yu. Annotating Search Results From Web Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE), 25(3), pp.514-527, March 2013.
  47. Eduard Dragut, B. P. Beirne, Bhaskar DasGupta, A. Neyestani, B. Atassi, Clement Yu, Weiyi Meng. YumiInt - A Deep Web Integrating System for Local Search Engines for Geo-referenced Objects. IEEE International Conference on Data Engineering (ICDE), Demo paper, Brisbane, Australia, April 2013.
  48. Neil R. Smalheiser, Can Lin, Lifeng Jia, Yu Jiang, Aaron M. Cohen, Clement Yu, John M. Davis, Clive E. Adams, Marian S. McDonagh, and Weiyi Meng. Design and implementation of Metta, a metasearch engine for biomedical literature retrieval intended for systematic reviewers. Health Information Science and Systems, 2(1), 2014.
  49. Yu Jiang, Can Lin, Weiyi Meng, Clement Yu, Aaron M. Cohen, and Neil R. Smalheiser. Rule-based deduplication of article records from bibliographic databases. Database, January 2014.
  50. Eduard Dragut, Bhaskar DasGupta, Brian Beirne, Ali Neyestani, Badr Atassi, Clement Yu, and Weiyi Meng. Merging Query Results From Local Search Engines for Georeferenced Objects. ACM Transactions on the Web (TWEB), 8(4):20, October 2014.
  51. Wensheng Wu, Weiyi Meng, Weifeng Su, Guangyou Zhou, Yao-Yi Chiang. Q2P: Discovering Query Templates via Autocompletion. ACM Transactions on the Web (TWEB), 10(2):10, May 2016.
  52. Jing Yuan, Lihong He, Eduard Dragut, Weiyi Meng, Clement Yu. Result Merging for Structured Queries on the Deep Web with Active Relevance Weight Estimation. Information Systems, September 2016.

Prototypes and Demos


Last change: September 15, 2016 / meng@cs.binghamton.edu