Web Services Performance Projects



Binary XML

XML is a text-based format for structured data. It also implicitly defines an abstract, tree-structured collection of information, known as the XML infoset.

The concept of the infoset can be seen in APIs such as DOM. We can also make the relationship more explicit by defining an infoset API (which would likely resemble DOM). If applications are then written to this infoset API, the actual form of the XML can then change without rewriting the program.
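
As a minimal sketch of this idea, the Python below defines a hypothetical infoset node type (Element) and one application function written only against it. The names Element, from_text_xml, and sum_values are illustrative assumptions, not part of any existing infoset API; the only real library used is the standard xml.etree.ElementTree module.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Element:
    """Hypothetical infoset node: a name, attributes, text, and child elements."""
    name: str
    attributes: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def from_text_xml(source: str) -> Element:
    """One possible backend: build the infoset tree from textual XML."""
    def convert(node):
        return Element(
            name=node.tag,
            attributes=dict(node.attrib),
            text=(node.text or "").strip(),
            children=[convert(child) for child in node],
        )
    return convert(ET.fromstring(source))


def sum_values(root: Element) -> float:
    """Application code: it sees only Element, never the serialized form."""
    return sum(float(c.text) for c in root.children if c.name == "value")


# The same application function works on a tree parsed from text ...
parsed = from_text_xml("<data><value>1.5</value><value>2.5</value></data>")
# ... and on a tree produced by some other (here: in-memory) source.
direct = Element("data", children=[Element("value", text="1.5"),
                                   Element("value", text="2.5")])
total_parsed = sum_values(parsed)
total_direct = sum_values(direct)
```

Because sum_values depends only on the tree interface, a backend that reads a different serialization could be substituted for from_text_xml without touching it.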

This is the basic idea of binary XML. We define a serialization of XML that is not text-based, but rather closer to the in-memory representation of the data. This can yield significant performance advantages, especially for scientific computing, because numeric data such as floating-point arrays need not be converted to and from text.
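
To illustrate what a non-text serialization might look like, here is a small sketch that encodes one element holding an array of doubles. The byte layout (a length-prefixed element name, a count, then raw little-endian doubles) is an assumption made up for this example; it is not the format of any particular binary XML proposal.

```python
import struct


def encode_doubles(name: str, values: list) -> bytes:
    """Encode an element of doubles in a hypothetical binary layout:
    2-byte name length, UTF-8 name, 4-byte count, then 8 bytes per value."""
    tag = name.encode("utf-8")
    header = struct.pack("<H", len(tag)) + tag + struct.pack("<I", len(values))
    return header + struct.pack(f"<{len(values)}d", *values)


def decode_doubles(buf: bytes):
    """Inverse of encode_doubles; no text-to-number conversion is involved."""
    (tag_len,) = struct.unpack_from("<H", buf, 0)
    name = buf[2:2 + tag_len].decode("utf-8")
    offset = 2 + tag_len
    (count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    values = list(struct.unpack_from(f"<{count}d", buf, offset))
    return name, values


samples = [0.1, 2.5, 1e-9]
encoded = encode_doubles("samples", samples)
name, decoded = decode_doubles(encoded)
```

The values round-trip bit for bit, since the doubles are copied rather than printed and re-parsed as decimal text.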

Schema-Specific Parsing

Most applications today use a general-purpose XML parser. The parser takes any well-formed XML and parses it into a set of data structures. These data structures are then validated, either implicitly or explicitly, to ensure that the XML document conforms to what the application expects.

The conventional wisdom is that XML parsing and validation are slow. But if one were to hand-write parsing and validation code based on knowing exactly what XML is expected, one could write much more efficient code than what is typically used for parsing and validation. This suggests that schema information can be used during parsing to improve performance, using what has been termed schema-specific parsing.
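
The contrast can be sketched for a made-up schema, a point element containing exactly one x and one y. The first function follows the conventional route (general-purpose parse, then a separate validation step); the second is a hand-written recognizer that parses and validates in a single pass. The schema, the function names, and the use of a regular expression are all assumptions for illustration, and the specific version deliberately ignores attributes, comments, entities, and exponent notation.

```python
import re
import xml.etree.ElementTree as ET


def parse_point_generic(source: str):
    """General-purpose parse into a tree, followed by explicit validation."""
    root = ET.fromstring(source)
    if root.tag != "point" or [child.tag for child in root] != ["x", "y"]:
        raise ValueError("not a point document")
    return float(root[0].text), float(root[1].text)


# Schema-specific recognizer: the expected structure is built into the
# pattern, so matching it both parses and validates the document.
_NUM = r"\s*([-+]?\d+(?:\.\d+)?)\s*"
_POINT = re.compile(rf"\s*<point>\s*<x>{_NUM}</x>\s*<y>{_NUM}</y>\s*</point>\s*")


def parse_point_specific(source: str):
    """Parse and validate in one pass, assuming the hypothetical point schema."""
    match = _POINT.fullmatch(source)
    if match is None:
        raise ValueError("not a point document")
    return float(match.group(1)), float(match.group(2))


document = "<point><x>1</x><y>-2.5</y></point>"
generic_result = parse_point_generic(document)
specific_result = parse_point_specific(document)
```

A real schema-specific parser would more likely be generated from the schema than written by hand, but the principle is the same: knowledge of the expected structure is compiled into the parsing code itself.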

Proteus Multiprotocol Web Services Library

SOAP is a widely adopted communications format for distributed systems. Unfortunately, being XML-based, it can be quite slow, especially for scientific computing. This suggests using SOAP for an initial negotiation phase and then switching to a faster protocol, such as binary XML, if both sides understand it.

Proteus is a multiprotocol library for facilitating such switching.
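
The negotiation step described above can be sketched as a simple selection over protocol names. This is not the Proteus API; the function negotiate and the protocol identifiers "binary-xml" and "soap" are hypothetical, chosen only to show the fallback behavior.

```python
def negotiate(client_protocols, server_protocols, fallback="soap"):
    """Return the first protocol in the client's preference order that the
    server also supports, or the fallback if there is no common choice.
    (Hypothetical helper; protocol names are illustrative.)"""
    supported = set(server_protocols)
    for protocol in client_protocols:
        if protocol in supported:
            return protocol
    return fallback


# Both sides understand binary XML, so the exchange can switch to it.
chosen = negotiate(["binary-xml", "soap"], ["soap", "binary-xml"])
# The server only speaks SOAP, so the conversation stays on SOAP.
legacy = negotiate(["binary-xml", "soap"], ["soap"])
```

In this sketch the initial exchange would itself be carried over SOAP, so a peer that knows nothing about the faster protocol still interoperates.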