Scientific research often requires data from many different sources, ranging from sensor networks to lab analysis equipment. Working with SDSC and the University of Wisconsin, we are developing an integrated data acquisition system that will be able to acquire data from devices as varied as PDAs and sensor networks. An autonomous agent approach is being used to reprogram and reconfigure both hardware, such as data loggers, and software. Ontologies and RDF are being used to describe sensors and their deployment information.
We are currently able to detect SDI-12 sensors automatically and to reprogram Campbell data loggers for a selected subset of sensor models. Current work is generalizing the sensor description and data logger reprogramming so that they work with any SDI-12 sensor. In the coming months, we will begin incorporating PDAs and analytical instruments into our acquisition system.
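The detection step can be illustrated with a short sketch. SDI-12 sensors answer the acknowledge-active command `a!` by echoing their address, and answer the send-identification command `aI!` with a fixed-width record: address, two-digit SDI-12 version, 8-character vendor, 6-character model, 3-character sensor version, and an optional field. The sketch below scans every address, parses those records into a sensor description, and emits that description as RDF N-Triples. It assumes a hypothetical `send_command` callback that writes a command to the bus and returns the reply (or `None` on timeout); `SensorIdentity`, `scan_bus`, `to_ntriples`, and the `http://example.org/sensor#` vocabulary are placeholders, not the project's actual code or ontology.

```python
from __future__ import annotations

import string
from dataclasses import dataclass
from typing import Callable, Optional

# Valid SDI-12 addresses: '0'-'9' (standard) plus 'a'-'z' and 'A'-'Z' (extended).
SDI12_ADDRESSES = string.digits + string.ascii_lowercase + string.ascii_uppercase
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class SensorIdentity:
    address: str
    sdi12_version: str   # e.g. "13" for SDI-12 v1.3
    vendor: str          # 8-character vendor field, padding stripped
    model: str           # 6-character model field
    sensor_version: str  # 3-character sensor version field
    optional: str        # up to 13 characters, often a serial number


def parse_identification(reply: str) -> SensorIdentity:
    """Parse the fixed-width reply to the SDI-12 send-identification command 'aI!'."""
    body = reply.rstrip("\r\n")
    if len(body) < 20:  # 1 + 2 + 8 + 6 + 3 mandatory characters
        raise ValueError(f"identification reply too short: {body!r}")
    return SensorIdentity(
        address=body[0],
        sdi12_version=body[1:3],
        vendor=body[3:11].strip(),
        model=body[11:17].strip(),
        sensor_version=body[17:20].strip(),
        optional=body[20:].strip(),
    )


def scan_bus(send_command: Callable[[str], Optional[str]]) -> list[SensorIdentity]:
    """Probe every address (hypothetical send_command); identify each sensor that answers."""
    found = []
    for addr in SDI12_ADDRESSES:
        ack = send_command(f"{addr}!")  # acknowledge-active: sensor echoes its address
        if ack is None or ack.rstrip("\r\n") != addr:
            continue  # nothing listening at this address
        ident = send_command(f"{addr}I!")  # send-identification
        if ident is not None:
            found.append(parse_identification(ident))
    return found


def to_ntriples(ident: SensorIdentity, subject: str,
                ns: str = "http://example.org/sensor#") -> list[str]:
    """Describe a sensor as N-Triples using a placeholder vocabulary `ns`."""
    def literal(value: str) -> str:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    triples = [f"<{subject}> <{RDF_TYPE}> <{ns}SDI12Sensor> ."]
    for prop in ("address", "sdi12_version", "vendor", "model",
                 "sensor_version", "optional"):
        value = getattr(ident, prop)
        if value:  # skip empty optional fields
            triples.append(f"<{subject}> <{ns}{prop}> {literal(value)} .")
    return triples
```

Probing all 62 addresses is slow on a real bus, and a real implementation would also need retries and the line-timing rules of the SDI-12 specification, which this sketch leaves to the hypothetical `send_command`.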
This work is part of the GLEON activities.
Lake circulation plays an important role in a number of scientific questions, so knowing the circulation can help researchers understand, and ultimately answer, these questions.
Directly measuring lake circulation is difficult, however. Acoustic Doppler current profilers (ADCPs) can measure circulation locally, but are too costly to deploy throughout an entire lake. Computing the circulation with computational fluid dynamics (CFD) models is thus a reasonable alternative, since these models generally require a tractable amount of input data, such as lake morphology and wind speed.
We are working on a system for hindcasting, nowcasting, and forecasting lake circulation, and for verifying the results against ADCP data. The system includes databases at Wisconsin's Center for Limnology, non-hydrostatic 3-D models developed at Wisconsin, and integration and presentation software developed at SUNY-Binghamton.
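Verification against ADCP data ultimately comes down to comparing modeled and measured velocities wherever both exist. As a minimal sketch, assuming both are available as horizontal velocity components keyed by (time, depth), and using bias and root-mean-square error as the metrics (our choice for illustration, not necessarily the system's), the comparison could look like this; `compare_to_adcp` and the data layout are hypothetical.

```python
from math import sqrt
from typing import Dict, Tuple

# Hypothetical layout: (time in hours, depth in metres) -> (u east, v north) in m/s.
Key = Tuple[float, float]
Velocity = Tuple[float, float]


def compare_to_adcp(model: Dict[Key, Velocity],
                    adcp: Dict[Key, Velocity]) -> Dict[str, float]:
    """Bias and RMSE of modeled vs. ADCP velocities at matching (time, depth) samples."""
    common = sorted(set(model) & set(adcp))
    if not common:
        raise ValueError("model and ADCP data share no (time, depth) samples")
    n = len(common)
    du = [model[k][0] - adcp[k][0] for k in common]  # east-component errors
    dv = [model[k][1] - adcp[k][1] for k in common]  # north-component errors
    return {
        "n": n,
        "bias_u": sum(du) / n,
        "bias_v": sum(dv) / n,
        "rmse_u": sqrt(sum(d * d for d in du) / n),
        "rmse_v": sqrt(sum(d * d for d in dv) / n),
        # RMS magnitude of the velocity-error vector
        "rmse_vector": sqrt(sum(a * a + b * b for a, b in zip(du, dv)) / n),
    }
```

Samples present in only one source are ignored; in practice the model output would first have to be interpolated onto the ADCP's times and depth bins, which this sketch assumes has already been done.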
In the longer term, we wish to package this into a toolkit that can be distributed to other sites, possibly with backend services hosted at a central computing facility, so that lake sites need not supply their own scientific computing expertise.
The CrystalGrid Framework (CGF) project will research the acquisition, transport, and curation of data over the entire data space of the field of X-ray crystallography, addressing methods for managing wide heterogeneity in data representations, formats, data containers, administrative domains, and diverse instruments and equipment. Until recently, individual labs had simply imposed local homogeneity of format and procedure, and did not store lab-dependent metadata. This ad hoc system is limited, however, as crystallographers begin to cross between labs to accomplish their research objectives, and as increasing numbers and sizes of output data streams leave less time for each investigation. Local workflow must be made explicit, procedures must be formally described, and the history and assemblages of data must be expressed in an open, shareable way. Creating and managing complete, accessible records for each experiment is critical, as is handling heterogeneity in data acquisition and management across the field.
To meet that need, this project will develop a framework of web service interfaces and data and metadata systems addressing the whole spectrum of crystallography. Project participants and collaborators will leverage existing projects that address narrower issues in the problem domain, such as Reciprocal Net and the Common Instrument Middleware Architecture (CIMA). The CGF will also draw on collaborating projects with overlapping areas of interest, such as the UK-based Comb-e-Chem project. The resulting framework will be a useful environment for crystallographic investigations and an extensible platform on which new web-based applications can be built.
The CGF project involves two classic problems: dealing with heterogeneity in data, procedures, and instruments in the crystallography application space, and integrating the domain's entire data collection, transport, and curation requirements into a seamless end-to-end system. The challenge is to create a virtualization system that manages heterogeneity in more than a single aspect and provides vertical integration using only open, extensible, and interoperable standards and methodologies. While the project constitutes research into pertinent computer science problems, the research plan is centered on producing a product (the CGF) that will be immediately useful in addressing emerging technical problems in X-ray crystallography.
Within crystallography, one specific goal is to make accessible structural results that might otherwise never be seen; the CGF will thus help increase the body of scientific knowledge and improve the return on federal investment in the large number of X-ray diffractometers and associated instruments nationwide. Although the project specifically targets a few hundred crystallography labs worldwide, the software and methods it creates are intended to be reusable by any science moving from individual lab practices to a shared, global collaboratory system. In sciences such as high-energy physics and astronomy, scientists have long shared single, unique, large instruments and have had to create shared data management and instrument metadata. The CGF is likely to be useful in other scientific disciplines that still rely on widely distributed, lab-based instruments that now need to be linked in data grids.
Large, expensive instruments are typically shared among many scientists and even nations, so the ability to use these instruments fully from remote locations is very valuable. By using Grid services to represent the instrument itself directly, rather than merely the data it generates, we can achieve better integration of instruments into the grid computing infrastructure. This integration will lead to more effective utilization of these resources.
At the opposite end of the spectrum, scientists are currently engaged in building many large arrays of sensors. By bringing Grid service concepts and standards out to these sensors, we can streamline and unify scientific data processing.
Currently, most Grid service middleware is developed with the assumption that it will run on large computers, and thus it tends to have relatively large memory and CPU requirements. We believe that such demands are not inherent to Grid services, however, and seek to develop lighter-weight Grid middleware for sensors.
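To illustrate what representing the instrument itself, rather than only its data, might look like at small scale, the sketch below wraps a sensor in a service object that answers JSON requests to describe itself, read and change its parameters, and acquire a reading, using only the Python standard library. It is not the CIMA interface or any Grid service standard: the `InstrumentService` class, the operation names, the `sample_interval_s` parameter, and the message format are hypothetical, and the network transport is left out.

```python
import json
from typing import Any, Callable, Dict


class InstrumentService:
    """Hypothetical lightweight service exposing an instrument's state and operations."""

    def __init__(self, name: str, read_sensor: Callable[[], float]):
        self.name = name
        self._read = read_sensor
        self.params: Dict[str, Any] = {"sample_interval_s": 60}
        # Dispatch table: operation name -> handler taking an args dict.
        self._ops: Dict[str, Callable[[Dict[str, Any]], Any]] = {
            "describe": self._describe,
            "get_param": self._get_param,
            "set_param": self._set_param,
            "acquire": self._acquire,
        }

    def handle(self, request_json: str) -> str:
        """Answer one JSON request such as {"op": "acquire"} with a JSON reply."""
        try:
            request = json.loads(request_json)
            handler = self._ops[request["op"]]
            result = handler(request.get("args", {}))
            return json.dumps({"ok": True, "result": result})
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, unknown operation or parameter, or wrong message shape.
            return json.dumps({"ok": False, "error": repr(exc)})

    def _describe(self, args: Dict[str, Any]) -> Dict[str, Any]:
        return {"name": self.name, "operations": sorted(self._ops),
                "params": dict(self.params)}

    def _get_param(self, args: Dict[str, Any]) -> Any:
        return self.params[args["name"]]

    def _set_param(self, args: Dict[str, Any]) -> Any:
        name = args["name"]
        if name not in self.params:  # only existing parameters may be changed
            raise KeyError(name)
        self.params[name] = args["value"]
        return self.params[name]

    def _acquire(self, args: Dict[str, Any]) -> Dict[str, float]:
        return {"value": self._read()}
```

A dispatch table over a small message format is one way to keep memory and CPU demands within reach of sensor-class hardware; a production Grid service would still need security, discovery, and a standard transport on top.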
More details are available at the CIMA web site.