Project Overview
The goal of this collaborative research project is to develop technologies for
providing integrated access to Web databases. The approach consists of
developing highly automated solutions to the following tasks: discovering Web
databases from the Web, clustering them according to their application domains,
integrating the search interfaces of the Web databases in the same domain
into an integrated search interface, mapping each query submitted to an
integrated interface to its underlying Web databases, extracting and
annotating the search result records from the result pages returned from the
local Web databases, and merging the results to form an integrated response
for presentation to the user. Web-service-based interfaces of Web databases,
if available, will be utilized. The evaluation of the developed algorithms is
based on real Web databases from different domains. This research is expected
to produce new algorithms, useful datasets, a software toolkit, and several
operational Web database metasearch engines. The developed technology can be
used in many applications including comparison shopping and collecting data
from the deep Web.
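The query-mapping, record-annotation, and result-merging steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Record` and `LocalDatabase` types, the field names, the in-memory stub catalogs, and the rule of keeping the cheapest offer per title are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """One annotated search result record (hypothetical shape)."""
    title: str
    price: float
    source: str


@dataclass
class LocalDatabase:
    """A local Web database, reduced to a name, a field mapping, and a search stub."""
    name: str
    field_map: Dict[str, str]  # integrated field name -> local field name
    search: Callable[[Dict[str, str]], List[Dict[str, str]]]


def make_stub_search(catalog):
    """Stand-in for submitting a local search form: substring match on every field."""
    def search(local_query):
        return [
            row for row in catalog
            if all(value.lower() in row.get(field, "").lower()
                   for field, value in local_query.items())
        ]
    return search


def map_query(query, field_map):
    """Translate an integrated-interface query into a local one, dropping unsupported fields."""
    return {field_map[f]: v for f, v in query.items() if f in field_map}


def annotate(row, field_map):
    """Relabel a local result row with the integrated field names."""
    inverse = {local: integrated for integrated, local in field_map.items()}
    return {inverse[f]: v for f, v in row.items() if f in inverse}


def metasearch(query, databases):
    """Query every local database and merge the results, keeping the cheapest offer per title."""
    best = {}
    for db in databases:
        for row in db.search(map_query(query, db.field_map)):
            fields = annotate(row, db.field_map)
            rec = Record(fields["title"], float(fields["price"]), db.name)
            key = rec.title.strip().lower()
            if key not in best or rec.price < best[key].price:
                best[key] = rec
    return sorted(best.values(), key=lambda r: (r.price, r.title))


if __name__ == "__main__":
    store_a = LocalDatabase(
        "A", {"title": "book_title", "price": "cost"},
        make_stub_search([{"book_title": "Deep Web Basics", "cost": "30.00"}]))
    store_b = LocalDatabase(
        "B", {"title": "name", "price": "usd"},
        make_stub_search([{"name": "deep web basics", "usd": "25.50"}]))
    for record in metasearch({"title": "web"}, [store_a, store_b]):
        print(record)
```

In a comparison-shopping setting, each `LocalDatabase` would wrap a real search interface and a result-page extractor rather than an in-memory list, and duplicate detection would need something more robust than a lowercased title.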
Prof. Xiaofeng Meng of the School of Information at Renmin University of China
also collaborates on this project.
We also conduct research on metasearch technologies
for document search engines. The homepage of our Document Metasearch Engine
Project is at
http://www.cs.binghamton.edu/~meng/metasearch.html
This project is also supported in part by the following equipment grant
from NSF:
CNS-0454298.
Any opinions, findings, and conclusions or recommendations expressed
on this site are those of the PIs and do not necessarily reflect
the views of the National Science Foundation (NSF).
New NIH-sponsored Project
Dr. Neil Smalheiser of the University of Illinois at Chicago is leading a team
on a new National Institutes of Health-sponsored project entitled
Mining Pipeline to Accelerate Systematic Reviews in Evidence-Based
Medicine (09/30/2010-09/29/2014). Part of the project is to
build a metasearch system for data sources containing clinical
information (e.g., clinical-trial reviews). Weiyi Meng and Clement Yu
are participants in this project.
Other Participants
The following students have participated/are participating
in this project:
- Wensheng Wu (UIUC, graduated with a PhD degree in 2006)
- George Philip (UIC, graduated with a Masters degree)
- Eduard Dragut (UIC, graduated with a PhD degree in 2010)
- Hai He (BU, graduated with a PhD degree in 2005)
- Hongkun Zhao (BU, graduated with a PhD degree in 2007)
- Yiyao Lu (BU, graduated with a PhD degree in 2011)
- Qian Peng (BU, graduated with a Masters degree)
- Janette Hicks (BU, graduated with a Masters degree)
- Yicheng Doe (BU, graduated with a Masters degree)
- Sahana Krishnamurthy (BU, graduated with a Masters degree)
- Nirav Pandya (BU, graduated with a Masters degree)
- Liangcai Shu (BU, graduated with a PhD degree in 2012)
- Chaitali Gupta (BU, graduated with a PhD degree in 2011)
- Michael Head (BU, graduated with a PhD degree in 2009)
- Yi-Jing (Jannie) Tan (Amherst College, REU student during summer 2009)
Related Publications
- Clement Yu, Prasoon Sharma, Weiyi Meng, and Yan Qin.
Database Selection
for Processing k Nearest Neighbors Queries in Distributed
Environments. First ACM/IEEE Joint Conference
on Digital Libraries, Roanoke, VA, June 2001, pp.215-222.
- Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu.
WISE-Integrator: An
Automatic Integrator of Web Search Interfaces for E-Commerce.
Proc. of 29th International Conference on Very
Large Data Bases (VLDB'03), pp.357-368, Berlin, Germany,
September 2003.
- Clement Yu, George Philip, and Weiyi Meng.
Distributed Top-N Query Processing with Possibly
Uncooperative Local Systems. Proc. of 29th
International Conference on Very Large Data Bases (VLDB'03),
pp.117-128, Berlin, Germany, September 2003. (A longer version
with all the proofs is also available.)
- Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu.
Automatic Extraction of
Web Search Interfaces for Interface Schema Integration.
World Wide Web Conference (WWW2004), poster paper, pp.414-415,
New York City, May 2004.
- Wensheng Wu, Clement Yu, Anhai Doan, and Weiyi Meng.
An Interactive
Clustering-based Approach to Integrating Source Query Interfaces
on the Deep Web. Proceedings of the 33rd
ACM SIGMOD Conference, pp.95-106, Paris, France, June 2004.
- Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu.
Automatic Integration
of Web Search Interfaces with WISE-Integrator.
VLDB Journal, Vol.13, No.3, pp.256-273, September 2004.
(Special Issue for Best Papers of VLDB 2003)
- Qian Peng, Weiyi Meng, Hai He, and Clement Yu.
WISE-Cluster: Clustering
E-Commerce Search Engines Automatically.
6th ACM International Workshop on Web Information and Data
Management (WIDM 2004), pp.104-111, Washington, DC, November 2004.
- Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Raghavan,
and Clement Yu.
Fully Automatic Wrapper Generation for Search Engines.
Proc. of 14th International World Wide Web
Conference (WWW14), pp.66-75, Chiba, Japan, May 2005.
- Hai He, Weiyi Meng, Clement Yu, Zonghuan Wu.
WISE-Integrator:
A System for Extracting and Integrating Complex Web Search Interfaces
of the Deep Web. International Conference on
Very Large Data Bases (VLDB'05), pp.1314-1317, Demo paper, Trondheim,
Norway, August 2005.
- Wensheng Wu, AnHai Doan, Clement Yu, and Weiyi Meng.
Bootstrapping
Domain Ontology for Semantic Web Services from the Source Web
Sites. 6th VLDB Workshop on Technologies for
E-Services (TES 2005), pp.11-22, Trondheim, Norway, September 2005.
- Wensheng Wu, AnHai Doan, Clement Yu.
Merging interface schemas on the Deep Web via clustering
aggregation. 5th IEEE International Conference on
Data Mining, pp.801-804, November 2005.
- Hai He, Weiyi Meng, Clement Yu, and Zonghuan Wu.
Constructing Interface Schemas
for Search Interfaces of Web Databases.
6th International Conference on Web Information Systems Engineering
(WISE05), pp.29-42, New York City, November 2005.
- Michael R. Head, Madhusudhan Govindaraju, Aleksander Slominski,
Pu Liu, Nayef Abu-Ghazaleh, Robert van Engelen, Kenneth Chiu,
Michael J. Lewis.
A Benchmark Suite for SOAP-based Communication
in Grid Web Services. SC|05 (Supercomputing): International
Conference for High Performance Computing, Networking, and Storage,
pp.19, Seattle, WA, November 2005.
- Madhusudhan Govindaraju, Michael R. Head, Kenneth Chiu.
XCAT-C++: Design and Performance of a Distributed CCA
Framework.
12th Annual IEEE International Conference on High Performance
Computing (HiPC), pp.270-279, December 2005, Goa, India.
- Eduard Dragut, Wensheng Wu, Prasad Sistla, Clement Yu, and
Weiyi Meng. Merging
Source Query Interfaces on Web Databases.
22nd International Conference on Data Engineering (ICDE'06),
pp.679-690, Atlanta, Georgia, April 2006.
- Wensheng Wu, AnHai Doan, and Clement Yu.
WebIQ: Learning from
the Web to Match Deep-Web Query Interfaces. 22nd
International Conference on Data Engineering (ICDE'06), April 2006.
- Wei Liu, Xiaofeng Meng, Weiyi Meng.
Vision-based Web Data Records Extraction.
Ninth International Workshop on the Web and Databases
(WebDB 2006), pp.20-25, Chicago, June 2006.
- Hongkun Zhao, Weiyi Meng, Clement Yu.
Automatic Extraction of Dynamic Record Sections From Search
Engine Result Pages. 32nd International Conference
on Very Large Data Bases (VLDB06), pp.989-1000, Seoul, Korea,
September 2006.
- Eduard Dragut, Clement Yu, Weiyi Meng.
Meaningful Labeling of Integrated Query Interfaces.
32nd International Conference on Very Large Data Bases
(VLDB06), pp.679-690, Seoul, Korea, September 2006.
- Yiyao Lu, Hai He, Qian Peng, Weiyi Meng, and Clement Yu.
Clustering E-Commerce
Search Engines based on Their Search Interface Pages Using WISE-Cluster.
Data & Knowledge Engineering (DKE) Journal,
Vol.59, No.2, pp.231-246, November 2006.
- Yiyao Lu, Hai He, Hongkun Zhao, Weiyi Meng, and Clement Yu.
Annotating
Structured Data of the Deep Web. IEEE 23rd International
Conference on Data Engineering (ICDE 2007), pp.376-385, Istanbul,
Turkey, April 2007.
- Xian Li, Weiyi Meng, Xiaofeng Meng.
EasyQuerier: A Keyword Based Interface for Web Database
Integration System. 12th International Conference
on Database Systems for Advanced Applications (DASFAA), pp.936-942,
Bangkok, Thailand, April 2007.
- Hai He, Weiyi Meng, Yiyao Lu, Clement Yu, and Zonghuan Wu.
Towards Deeper Understanding of the
Search Interfaces of the Deep Web. World Wide Web
Journal, Vol.10, No.2, pp.133-155, June 2007.
- Chaitali Gupta, Rajdeep Bhowmik, Michael Head, Madhusudhan
Govindaraju, Weiyi Meng.
A Query-based System for Automatic Invocation of Web Services.
2007 IEEE International Conference on Web Services
(ICWS), Application Services and Industry Track, pp.759-766,
Salt Lake City, Utah, July 2007.
- Janette Hicks, Madhusudhan Govindaraju, Weiyi Meng.
Search Algorithms for
Discovery of Web Services. 2007 IEEE International
Conference on Web Services (ICWS), Work in Progress track, pp.1172-1173,
Salt Lake City, July 2007.
- Hongkun Zhao, Weiyi Meng, and Clement Yu.
Mining Templates
from Search Result Records of Search Engines. 13th
ACM International Conference on Knowledge Discovery and Data
Mining (SIGKDD 2007), pp.884-893, San Jose, California, August 2007.
- Wei Liu, Xiaofeng Meng, and Weiyi Meng.
A Survey of
Deep Web Data Integration. Chinese Journal of Computers,
Vol.30, No.9, pp.1475-1489, September 2007.
- Janette Hicks, Madhusudhan Govindaraju, and Weiyi Meng.
Enhancing the Discovery
of Web Services through Optimized Algorithms.
IEEE International Conference on Granular Computing, pp.695-698,
Silicon Valley, November 2007.
- Chaitali Gupta, Rajdeep Bhowmik, Michael R. Head, Madhusudhan
Govindaraju, and Weiyi Meng.
Improving Performance of Web Services Query Matchmaking with Automated
Knowledge Acquisition. IEEE/WIC/ACM International
Conference on Web Intelligence (WI'07), pp.559-563, Silicon Valley,
November 2007.
- Liangcai Shu, Weiyi Meng, Hai He, Clement Yu.
Querying Capability
Modeling and Construction. 8th
International Conference on Web Information Systems Engineering (WISE),
pp.13-25, Nancy, France, December 2007.
- Chaitali Gupta, Rajdeep Bhowmik, Madhusudhan Govindaraju.
Ontological Framework
for a Free-Form Query Based Grid Search Engine.
The 17th IEEE International Symposium on High Performance Distributed
Computing (HPDC-17), Hot Topics Session, Boston, June 2008.
- Fangjiao Jiang, Linlin Jia, Weiyi Meng, Xiaofeng Meng.
MrCoM: A Cost Model for Range
Query Translation in Deep Web Data Integration.
Fourth International Conference on Semantics, Knowledge and Grid (SKG),
pp.263-270, December 2008, Beijing, China.
- Weiyi Meng and Hai He.
Data Search Engine. In Encyclopedia of Computer Science
and Engineering (Benjamin Wah, ed.), John Wiley & Sons, pp.826-834,
January 2009.
- Liangcai Shu, Bo Long, Weiyi Meng.
A Latent Topic Model for Complete Entity Resolution.
25th IEEE International Conference on Data Engineering (ICDE), pp.880-891,
Shanghai, China, March 2009.
- Fangjiao Jiang, Weiyi Meng, Xiaofeng Meng.
Selectivity Estimation
for Exclusive Query Translation in Deep Web Data Integration.
International Conference on Database Systems for Advanced Applications
(DASFAA), pp.595-600, Brisbane, Australia, April 2009.
- Eduard Dragut, Fang Fang, Prasad Sistla, Clement Yu, Weiyi Meng.
Stop Word and Related Problems
in Web Interface Integration. 35th International Conference
on Very Large Data Bases (VLDB), pp.349-360, Lyon, France, August 2009.
- Eduard Dragut, T. Kabisch, Clement Yu and U. Leser.
A Hierarchical Approach to Model
Web Query Interfaces for Web Source Integration. 35th
International Conference on Very Large Data Bases (VLDB), pp.325-336,
Lyon, France, August 2009.
- Eduard Dragut, Fang Fang, Clement Yu and Weiyi Meng.
Deriving Customized Integrated
Web Query Interfaces.
IEEE/WIC/ACM International Conference
on Web Intelligence, Milan, Italy, pp.685-688, September 2009.