Effective and Scalable Metasearch Engine Research


This project was sponsored by research grants from the National Science Foundation.

Principal Investigator at SUNY at Binghamton

PI: Prof. Weiyi Meng
Department of Computer Science
State University of New York at Binghamton
NSF grant number: IIS-0208574

Principal Investigator at University of Illinois at Chicago

PI: Prof. Clement Yu
Department of Computer Science
University of Illinois at Chicago
NSF grant number: IIS-0208434

The objective of this project was to develop techniques for building highly scalable and effective metasearch engines, along with related techniques. A metasearch engine interacts with multiple local search engines so that a single query can be used to search all of them.

We study two types of metasearch engines. The first type combines multiple document search engines and will be called document metasearch engines. The second type combines multiple database-driven search engines and will be called database metasearch engines. For both types of metasearch engines, the issues that we study include how to discover and classify search engines, how to build wrappers for search engines, how to identify potentially useful local search engines for each user query, and how to merge the results from multiple local search engines. For database metasearch engines, we also study how to integrate the search interfaces of multiple search engines into a unified interface and how to annotate the retrieved results (see http://www.cs.binghamton.edu/~meng/DMSE.html for the homepage of our Web Database Metasearch Project).
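As a rough illustration of two of the issues listed above, search engine selection and result merging, the following Python sketch shows how a single query might flow through a toy metasearch engine. It is not the method developed in this project: the `LocalEngine` class, the term-count summaries, and the max-score normalization used for merging are all hypothetical simplifications chosen for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a wrapped local search engine: a name, a small
# term-frequency summary of its contents, and a search function that
# returns (url, local_score) pairs.
@dataclass
class LocalEngine:
    name: str
    summary: Dict[str, int]
    search: Callable[[str], List[Tuple[str, float]]]

def select_engines(query: str, engines: List[LocalEngine], k: int) -> List[LocalEngine]:
    """Keep the k engines whose summaries mention the query terms most often."""
    terms = query.lower().split()
    scored = [(sum(e.summary.get(t, 0) for t in terms), e) for e in engines]
    useful = [(s, e) for s, e in scored if s > 0]  # drop engines with no match
    useful.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in useful[:k]]

def merge_results(result_lists: List[List[Tuple[str, float]]]) -> List[Tuple[str, float]]:
    """Normalize each engine's scores by its top score, then keep the best
    normalized score per URL (assumed merging rule, for illustration only)."""
    merged: Dict[str, float] = {}
    for results in result_lists:
        if not results:
            continue
        top = max(score for _, score in results)
        for url, score in results:
            norm = score / top if top > 0 else 0.0
            merged[url] = max(merged.get(url, 0.0), norm)
    # Highest score first; ties broken by URL for a deterministic order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

def metasearch(query: str, engines: List[LocalEngine], k: int = 2) -> List[Tuple[str, float]]:
    """Send one query to the selected local engines and merge their results."""
    selected = select_engines(query, engines, k)
    return merge_results([e.search(query) for e in selected])
```

Local scores from different engines are generally not comparable, which is why the sketch normalizes them before merging; the project's publications listed below study more careful selection and merging strategies.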

For document metasearch engines, our WebScales project aims to create a metasearch engine on top of essentially all useful search engines on the Web. Due to the scale of the problem (it is estimated that there are hundreds of thousands of search engines), we are developing scalable solutions and building automated tools to construct metasearch engines.

This research is funded by the following grants from the National Science Foundation: IIS-9902872, IIS-9902792, EIA-9911099, IIS-0208574, and IIS-0208434. The Principal Investigators of these projects are Prof. Weiyi Meng at SUNY Binghamton (BU) and Prof. Clement Yu at the University of Illinois at Chicago (UIC). Zonghuan Wu and Vijay Raghavan of the University of Louisiana at Lafayette are also collaborators. Any opinions, findings, and conclusions or recommendations expressed on this site are those of the PIs and do not necessarily reflect the views of the National Science Foundation (NSF).


Related Publications and Technical Reports

  1. Weiyi Meng, King-Lup Liu, Clement Yu, Xiaodong Wang, Yu-Hsi Chang, and Naphtali Rishe. Determining Text Databases to Search in the Internet. Proc. of the 24th International Conference on Very Large Data Bases (VLDB'98), New York City, August 1998, pp.14-25.
  2. Weiyi Meng, King-Lup Liu, Clement Yu, Wensheng Wu, and Naphtali Rishe. Estimating the Usefulness of Search Engines. Proc. of the 15th International Conference on Data Engineering (ICDE'99), Sydney, Australia, March 1999, pp.146-153.
  3. Clement Yu, King-Lup Liu, Wensheng Wu, Weiyi Meng, and Naphtali Rishe. Finding the Most Similar Documents across Multiple Text Databases. Proc. of the IEEE Conference on Advances in Digital Libraries (ADL'99), Baltimore, Maryland, May 1999, pp.150-162.
  4. King-Lup Liu, Clement Yu, Weiyi Meng, and Naphtali Rishe. Discovery of Similarity Computation on the Internet. Proc. of the ACM Conference on Digital Libraries (DL'99) (poster paper), University of California, Berkeley, August 1999, pp.232-233.
  5. Weiyi Meng, Clement Yu, and King-Lup Liu. Detection of Heterogeneities in a Multiple Text Database Environment. Proc. of the Fourth IFCIS Conference on Cooperative Information Systems (CoopIS'99), Edinburgh, Scotland, September 1999, pp.22-33.
  6. Clement Yu, Weiyi Meng, King-Lup Liu, Wensheng Wu, and Naphtali Rishe. Efficient and Effective Metasearch for a Large Number of Text Databases. Proc. of Eighth ACM International Conference on Information and Knowledge Management (CIKM'99), Kansas City, November 1999, pp.217-224.
  7. Wenxian Wang, Weiyi Meng, and Clement Yu. Concept Hierarchy Based Text Database Categorization in a Metasearch Engine Environment. Proc. of First International Conference on Web Information Systems Engineering (WISE'2000), Hong Kong, June 2000, pp.283-290.
  8. King-Lup Liu, Weiyi Meng, and Clement Yu. Discovery of Similarity Computations of Search Engines. Proc. of Ninth ACM International Conference on Information and Knowledge Management (CIKM'00), Washington, D.C., November 2000, pp.290-297.
  9. Zonghuan Wu, Weiyi Meng, Clement Yu, and Zhuogang Li. Towards a Highly-Scalable and Effective Metasearch Engine. Proc. of Tenth World Wide Web Conference (WWW10), Hong Kong, May 2001, pp.386-395.
  10. Clement Yu, Weiyi Meng, Wensheng Wu, and King-Lup Liu. Efficient and Effective Metasearch for Text Databases Incorporating Linkages among Documents. ACM SIGMOD Conference, May 2001, pp.187-198.
  11. Weiyi Meng, Zonghuan Wu, Clement Yu, and Zhuogang Li. A Highly-Scalable and Effective Method for Metasearch. ACM Transactions on Information Systems 19(3), pp.310-335, July 2001.
  12. King-Lup Liu, Clement Yu, Weiyi Meng, A. Santoso, and C. Zhang. Discovering the Representative of a Search Engine. Tenth ACM International Conference on Information and Knowledge Management (CIKM'01), (poster paper), Atlanta, Georgia, November 2001, pp.577-579.
  13. Weiyi Meng, Wenxian Wang, Hongyu Sun, and Clement Yu. Concept Hierarchy Based Text Database Categorization. International Journal on Knowledge and Information Systems, Vol. 4, No. 2, pp.132-150, March 2002.
  14. Weiyi Meng, Clement Yu, and King-Lup Liu. Building Efficient and Effective Metasearch Engines. ACM Computing Surveys, Vol. 34, No. 1, March 2002, pp.48-89.
  15. Fang Liu, Clement Yu, and Weiyi Meng. Personalize Web Search by Mapping User Queries to Categories. Proc. of Eleventh ACM International Conference on Information and Knowledge Management (CIKM'02), McLean, Virginia, November 2002, pp.558-565.
  16. King-Lup Liu, Clement Yu, and Weiyi Meng. Discovering the Representative of a Search Engine. Proc. of Eleventh ACM International Conference on Information and Knowledge Management (CIKM'02), (poster paper), pp.652-654, McLean, Virginia, November 2002.
  17. Clement Yu, King-Lup Liu, Weiyi Meng, Zonghuan Wu, and Naphtali Rishe. A Methodology to Retrieve Text Documents from Multiple Databases. IEEE Transactions on Knowledge and Data Engineering, Vol.14, No.6, November/December 2002, pp.1347-1361.
  18. King-Lup Liu, Clement Yu, Weiyi Meng, Wensheng Wu, and Naphtali Rishe. A Statistical Method for Estimating the Usefulness of Text Databases. IEEE Transactions on Knowledge and Data Engineering, 14(6), pp.1422-1437, November/December 2002.
  19. Zonghuan Wu, Vijay Raghavan, Chun Du, M. Sai C, Weiyi Meng, Hai He, and Clement Yu. SE-LEGO: Creating Metasearch Engine on Demand. Proc. of 26th ACM SIGIR Conference, Demo paper, pp.464, Toronto, Canada, July 2003.
  20. Zonghuan Wu, Vijay Raghavan, Chun Du, Weiyi Meng, Hai He, and Clement Yu. Creating Customized Metasearch Engines on Demand Using SE-LEGO. Proc. of Fourth International Conference on Web-Age Information Management (WAIM'03), Demo paper, pp.503-505, Chengdu, China, August 2003.
  21. Zonghuan Wu, Vijay Raghavan, Hua Qian, V. Rama K, Weiyi Meng, Hai He, and Clement Yu. Towards Automatic Incorporation of Search Engines into a Large-Scale Metasearch Engine. 2003 IEEE/WIC International Conference on Web Intelligence, pp.658-661, Halifax, Canada, October 2003.
  22. Clement Yu, and Weiyi Meng. Web Search Technology. In The Internet Encyclopedia edited by Hossein Bidgoli, Wiley Publishers, pp.738-753, 2003.
  23. Fang Liu, Clement Yu, and Weiyi Meng. Personalized Web Search for Improving Retrieval Effectiveness. IEEE Transactions on Knowledge and Data Engineering, Vol.16, No.1, pp.28-40, January 2004.
  24. Wensheng Wu, Clement Yu, and Weiyi Meng. Database Selection for Longer Queries. Proceedings of the 2004 Meeting of the International Federation of Classification Societies, pp.575-584, Chicago, July 2004 (invited).
  25. Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Raghavan, and Clement Yu. Fully Automatic Wrapper Generation for Search Engines. Proc. of 14th International World Wide Web Conference (WWW14), pp.66-75, Chiba, Japan, May 2005.
  26. Yiyao Lu, Weiyi Meng, Liangcai Shu, Clement Yu, and King-Lup Liu. Evaluation of Result Merging Strategies for Metasearch Engines. 6th International Conference on Web Information Systems Engineering (WISE05), pp.53-66, New York City, November 2005.
  27. Dheerendranath Mundluru, Zonghuan Wu, Vijay Raghavan, Weiyi Meng, Hongkun Zhao. Automatically Extracting Subsequent Response Pages from Web Search Sources. IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, Houston, Texas, November 2005.
  28. Yiyao Lu, Weiyi Meng, Wanjing Zhang, King-Lup Liu, and Clement Yu. Automatic Extraction of Publication Time from News Search Results. 2nd International Workshop on Challenges in Web Information Retrieval and Integration (WIRI2006), pp.141-150, Atlanta, Georgia, April 2006.
  29. Yanyan Ling, Xiaofeng Meng, and Weiyi Meng. Automated Extraction of Hit Numbers From Search Result Pages. Seventh International Conference on Web-Age Information Management (WAIM 2006), pp.73-84, Hong Kong, June 2006.
  30. Wei Liu, Xiaofeng Meng, Weiyi Meng. Vision-based Web Data Records Extraction. Ninth International Workshop on the Web and Databases (WebDB 2006), pp.20-25, Chicago, June 2006.
  31. Hongkun Zhao, Weiyi Meng, Clement Yu. Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages. 32nd International Conference on Very Large Data Bases (VLDB06), pp.989-1000, Seoul, Korea, September 2006.
  32. Ronak Desai, Qi Yang, Zonghuan Wu, Weiyi Meng, Clement Yu. Identifying Redundant Search Engines in a Very Large Scale Metasearch Engine Context. Proc. of 8th ACM International Workshop on Web Information and Data Management (WIDM 2006), pp.51-58, November 2006.
  33. Reza T. Hemayati, Weiyi Meng, Clement Yu. Semantic-based Grouping of Search Engine Results Using WordNet. Joint Conference of the 9th Asia-Pacific Web Conference and the 8th International Conference on Web-Age Information Management (APWeb/WAIM'07), pp.678-686, HuangShan, China, June 2007.
  34. Yiyao Lu, Zonghuan Wu, Hongkun Zhao, Weiyi Meng, King-Lup Liu, Vijay Raghavan, Clement Yu. MySearchView: A Customized Metasearch Engine Generator. 26th ACM SIGMOD International Conference on Management of Data (SIGMOD 2007), Demo paper, pp.1113-1115, Beijing, China, June 2007.
  35. King-Lup Liu, Weiyi Meng, Jing Qiu, Clement Yu, Vijay Raghavan, Zonghuan Wu, Yiyao Lu, Hai He, Hongkun Zhao. AllInOneNews: Development and Evaluation of a Large-Scale News Metasearch Engine. 26th ACM SIGMOD International Conference on Management of Data (SIGMOD 2007), Industrial track, pp.1017-1028, Beijing, China, June 2007.
  36. Hongkun Zhao, Weiyi Meng, and Clement Yu. Mining Templates from Search Result Records of Search Engines. 13th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007), pp.884-893, San Jose, California, August 2007.
  37. Weiyi Meng, and Hai He. Data Search Engine. In Encyclopedia of Computer Science and Engineering (Benjamin Wah, ed.), John Wiley & Sons, pp.826-834, January 2009.
  38. Weiyi Meng. Metasearch Engines. In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Ozsu, Springer, pp.1730-1734, August 2009.
  39. Weiyi Meng and Clement Yu. Web Search Technologies for Text Documents. In The Handbook of Technology Management, edited by Hossein Bidgoli, Wiley Publishers, January 2010.
  40. Wei Liu, Xiaofeng Meng, Weiyi Meng. ViDE: A Vision-based Approach for Deep Web Data Extraction. IEEE Transactions on Knowledge and Data Engineering, Vol.22, No.3, pp.447-460, March 2010.
  41. Aaron M. Cohen, Clive E. Adams, John M. Davis, Clement Yu, Philip S. Yu, Weiyi Meng, Lorna Duggan, Marian McDonagh, and Neil R. Smalheiser. Evidence-based Medicine: The Essential Role of Systematic Reviews and the Need for Automated Text Mining Tools. 1st ACM International Health Informatics Symposium (IHI 2010), pp.376-380, Arlington, Virginia, November 2010.
  42. Reza T. Hemayati, Weiyi Meng and Clement Yu. Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries. International Conference on Web Information System Engineering (WISE), Hong Kong, December 2010.
  43. Weiyi Meng, Clement Yu. Advanced Metasearch Engine Technology. Morgan & Claypool Publishers, December 2010.
  44. Reza T. Hemayati and Weiyi Meng. Semantic-Based Grouping of Search Engine Results. In Book "Introduction to the Semantic Web: Concepts, Technologies and Applications" edited by Gabriel Fung, iConcept Press, pp.1-16, 2011.
  45. Reza T. Hemayati, Weiyi Meng, Clement Yu. Categorizing Search Results Using WordNet and Wikipedia. International Conference on Web-Age Information Management (WAIM), pp.185-197, Harbin, China, August 2012.
  46. Reza T. Hemayati, Laleh J. Dehkordi, Weiyi Meng. mNIR: Diversifying Search Results based on a Mixture of Novelty, Intention and Relevance. International Conference on Web Information System Engineering (WISE), pp.594-608, Paphos, Cyprus, November 2012.
  47. C. Yu, W. Meng. Determining Text Databases to Search in the Internet. PI Report to NSF IDM 2000 PI Workshop.
  48. C. Yu, W. Meng. Determining Text Databases to Search in the Internet. PI Report to NSF IDM 2001 PI Workshop.
  49. C. Yu, W. Meng. Determining Text Databases to Search in the Internet. PI Report to NSF IDM 2002 PI Workshop.
  50. C. Yu, W. Meng. WebScales: Towards a Large-scale Metasearch Engine. PI Report to NSF IDM 2003 PI Workshop.
  51. C. Yu, W. Meng. WebScales: Towards a Large-scale Metasearch Engine. PI Report to NSF IDM 2004 PI Workshop.

Prototypes and Demos


Last change: November 20, 2012 / meng@cs.binghamton.edu