XML Compression and Its Role in SOA Performance
Dealing with the increased size of SOA payloads

Looking at the uncertainty and volatility of market conditions today, enterprises are depending on new cutting-edge technology to have an edge over their fierce competitors. At the same time, they try extracting more value from their existing IT investments. Adding to these disparate applications and technologies are the acquisitions and mergers that inherently bring in different sets of applications.

Better integration of these myriad applications built on different technologies clearly makes them more valuable. Using Service Oriented Architecture (SOA), enterprises can not only achieve better integration but also be future-ready as an agile enterprise that can swiftly respond to change in business processes.

XML and Its Role in SOA
XML is emerging as the lingua franca of data representation and exchange across applications interacting in an SOA world. A close look at the standards stack for SOA (Figure 1) shows that XML is the foundation for all the Web Services standards like XML Schema, SOAP, WSDL, and UDDI. These standards leverage the core concept of XML-based representations to carry out information interchange between service providers and requestors in an SOA.

Beyond the core syntactic standards of SOA shown in Figure 1, semantics is another important dimension. Communication between a service provider and a service consumer in an SOA infrastructure requires that the contents of the messages be mutually understood, which leads to the requirement of semantic interoperability.

XML helps solve the semantic interoperability problems associated with working with different data formats in different applications across multiple platforms. Stakeholders in different vertical business domains have come together and defined shared XML-based vocabularies to address the semantic interoperability issue. (See http://xml.coverpages.org for a comprehensive list of such standardization efforts.) Using XML inherently brings ease of representation since it's text-based, flexible, and extensible. The platform- and language-independence of XML has made it SOA's mainstream representation format.

SOA Performance Challenge and XML Compression Solutions
While self-describing XML-based service descriptions and messages make data exchange in an SOA easier, lending reusability and extensibility, they also increase the size of the data significantly. An XML message typically contains not only the data as text but also the markup that describes its format, and some document vocabularies additionally carry presentation details such as font, size, and style. The verbosity of text-based representation by itself also inflates SOA payloads. As a result, XML increases storage requirements, transfer times, and parsing times, which creates a performance challenge for an SOA.

The following are the salient points driving the need for compressing XML documents in the context of SOA:

  1. Redundant data in XML documents, e.g., white space, similar node names.
  2. Text-based XML document sizes tend to be large.
  3. The need for an efficient way to store files based on XML.
  4. Large volumes of XML data sent over the network as SOAP payloads.

One category of industry solutions used to solve SOA performance management problems relies on the notion of XML compression. These solutions use compression techniques to reduce the size of the XML payloads carried in SOA messages and transfer the data in compressed format. However, the additional cost of compression and decompression at either end has to be included when computing the overall cost.
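
The trade-off between smaller payloads and the extra compression work can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `transfer_cost`, the sample order payload, and the 1 Mbit/s link speed are hypothetical and do not come from any particular SOA product.

```python
import gzip
import time

def transfer_cost(payload: bytes, bandwidth_bytes_per_s: float) -> dict:
    """Compare estimated end-to-end time with and without gzip compression."""
    start = time.perf_counter()
    compressed = gzip.compress(payload)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    decompress_s = time.perf_counter() - start
    assert restored == payload  # compression must be lossless

    # Transfer time is modeled simply as size divided by bandwidth.
    plain_s = len(payload) / bandwidth_bytes_per_s
    packed_s = compress_s + len(compressed) / bandwidth_bytes_per_s + decompress_s
    return {
        "plain_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "plain_seconds": plain_s,
        "compressed_seconds": packed_s,
    }

# Hypothetical SOAP-like payload with many repeated element names.
items = "".join(
    f"<item><sku>SKU-{i:05d}</sku><qty>{i % 7}</qty></item>" for i in range(2000)
)
payload = f"<order>{items}</order>".encode("utf-8")

# Assumed link speed: 1 Mbit/s, i.e. 125,000 bytes per second.
report = transfer_cost(payload, bandwidth_bytes_per_s=125_000)
print(report)
```

On a slow link the time saved in transfer tends to dominate, while on a fast local network the compression and decompression steps may cost more than they save, so the numbers are worth measuring for each deployment.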

While the issues related to data storage and data transfer times can be resolved to a significant level by using compression techniques, the problem related to the processing overhead can be solved using both software and hardware solutions. A variety of tools and methodologies are already on the market to overcome XML processing limitations. Some prominent categories of tools and technologies that help overcome the limitations associated with using XML are briefly mentioned here:

XML Hardware
Processing large XML documents can consume enormous amounts of CPU, memory, and network bandwidth. Traditionally, processors did general-purpose processing, but with the advent of XML and XML-based applications a new breed of custom acceleration processors is being developed. This specialized hardware, called XML accelerators, speeds up not only time-consuming tasks like XSL transformation and schema validation but also security-related features like encryption. XML accelerators are network devices that process XML at wire speed and so offload overtaxed servers.

Compact Representation
The key premise of this approach is to use a compact encoding to reduce the size of the messages being carried around. The usual textual format of XML gives no advance indication of where a data value ends, so the application has to examine every byte received, which costs time and hurts performance. A different approach is to represent XML in a binary format such as Abstract Syntax Notation One (ASN.1). This notation is associated with standardized encoding rules such as the Basic Encoding Rules (BER) and Packed Encoding Rules (PER) and is useful for applications that have bandwidth restrictions. Such encodings can significantly reduce message size and processing time.
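
The benefit of knowing a value's length up front can be illustrated with a simplified tag-length-value layout, in the spirit of BER's structure. This is a sketch, not a conformant ASN.1 encoder: the fixed one-byte tag, the four-byte length, and the tag numbers `TAG_SKU` and `TAG_QTY` are assumptions made for the example.

```python
import struct

def encode_fields(fields: list[tuple[int, bytes]]) -> bytes:
    """Encode (tag, value) pairs as tag byte + 4-byte big-endian length + value."""
    out = bytearray()
    for tag, value in fields:
        out += struct.pack(">BI", tag, len(value))
        out += value
    return bytes(out)

def decode_fields(buffer: bytes) -> list[tuple[int, bytes]]:
    """Decode by jumping from header to header; values are sliced, not scanned."""
    fields = []
    offset = 0
    while offset < len(buffer):
        tag, length = struct.unpack_from(">BI", buffer, offset)
        offset += 5  # 1 tag byte + 4 length bytes
        fields.append((tag, buffer[offset:offset + length]))
        offset += length
    return fields

TAG_SKU, TAG_QTY = 1, 2  # hypothetical tag numbers agreed on by both sides

xml_text = b"<item><sku>SKU-00042</sku><qty>3</qty></item>"
binary = encode_fields([(TAG_SKU, b"SKU-00042"), (TAG_QTY, b"3")])

print(len(xml_text), len(binary))
print(decode_fields(binary))
```

Because each header states how many bytes follow, the decoder never has to search for a closing delimiter. The price is that both sides must agree on the tag numbers in advance, which is the role a schema plays in ASN.1.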

XML Cache/Component Parsers
Repeatedly used XML data can be cached to reduce XML processing overheads. Similarly, specialized XML parsers can be used that cater to the specific needs of an SOA application.
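
A minimal sketch of the caching idea follows, assuming the same document bytes arrive repeatedly (for example a service description that rarely changes). The class name `ParsedXmlCache` and the sample document are hypothetical. The cache is unbounded and hands out shared trees, so callers are assumed to treat them as read-only.

```python
import hashlib
import xml.etree.ElementTree as ET

class ParsedXmlCache:
    """Cache parsed XML trees keyed by a digest of the raw document bytes."""

    def __init__(self) -> None:
        self._trees: dict[str, ET.Element] = {}
        self.hits = 0
        self.misses = 0

    def parse(self, document: bytes) -> ET.Element:
        key = hashlib.sha256(document).hexdigest()
        tree = self._trees.get(key)
        if tree is None:
            # First time this exact document is seen: parse and remember it.
            self.misses += 1
            tree = ET.fromstring(document)
            self._trees[key] = tree
        else:
            self.hits += 1
        return tree

cache = ParsedXmlCache()
description = b"<definitions><service name='Orders'/></definitions>"
first = cache.parse(description)
second = cache.parse(description)  # served from the cache, no re-parsing
print(cache.hits, cache.misses)
```

Hashing the bytes still touches the whole document, but it is typically much cheaper than building a tree, which is why the lookup can pay off for documents that recur often.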

XML Software Compression
Since XML is text-based, we can use general-purpose compressors such as gzip, which combines Lempel-Ziv (LZ77) matching with Huffman coding, or bzip2, which combines the Burrows-Wheeler transform with Huffman coding. These generic text compressors are effective and achieve very good compression ratios. They are designed for sequential data, however, whereas XML data is tree-structured. XMILL from AT&T is a compressor designed specifically for XML. It regroups similar XML nodes and uses conventional compressors such as gzip to compress the result of the regrouping. The salient features of gzip and XMILL compare as follows:
gzip:

  • available in both Open Source and commercial implementations
  • provides a good compression rate
  • free from patented algorithms
  • knowledge of the document structure isn't needed

XMILL:

  • better compression rate compared to gzip (by a factor of two)
  • separates structure from content
  • moderately faster than gzip
  • three types of compressors available:
    - atomic compressors for the basic data types
    - combined compressors
    - user-defined compressors
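
The idea of separating structure from content and grouping similar nodes before handing the result to gzip can be sketched as follows. This is a toy illustration of the principle, not XMILL's actual format or algorithm: the function names `regroup` and `regrouped_size` and the sample document are hypothetical, and attributes and tail text are ignored for brevity.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import defaultdict

def regroup(xml_bytes: bytes) -> tuple[bytes, dict[str, bytes]]:
    """Split a document into a structure stream and per-tag value containers.

    Simplification: attributes and tail text are ignored.
    """
    structure: list[str] = []
    values: dict[str, list[str]] = defaultdict(list)

    def walk(node: ET.Element) -> None:
        structure.append(f"<{node.tag}")      # element start
        text = (node.text or "").strip()
        if text:
            structure.append("#")             # placeholder for a stored value
            values[node.tag].append(text)     # same-tag values sit together
        for child in node:
            walk(child)
        structure.append(">")                 # element end

    walk(ET.fromstring(xml_bytes))
    containers = {tag: "\n".join(v).encode("utf-8") for tag, v in values.items()}
    return "\n".join(structure).encode("utf-8"), containers

def regrouped_size(xml_bytes: bytes) -> int:
    """Total gzip size of the structure stream plus every container."""
    structure, containers = regroup(xml_bytes)
    parts = [structure, *containers.values()]
    return sum(len(gzip.compress(part)) for part in parts)

# Hypothetical sample document.
items = "".join(
    f"<item><sku>SKU-{i:05d}</sku><qty>{i % 7}</qty></item>" for i in range(2000)
)
document = f"<order>{items}</order>".encode("utf-8")
print("plain gzip:", len(gzip.compress(document)))
print("regrouped :", regrouped_size(document))
```

Whether the regrouped form comes out smaller depends on the document. Grouping pays off most when values under the same tag resemble each other, and each separately compressed container carries a small fixed overhead.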

Binary XML Standards: XOP, MTOM & RRSHB
New schemas are being developed to solve the problem of exchanging large documents between the service provider and the consumer. These schemas address the problem of fitting binary data directly into an XML message.

MTOM describes how XML-binary Optimized Packaging (XOP) is layered onto the SOAP HTTP transport. It uses XOP to let SOAP bindings speed up data transmission by selectively encoding portions of the XML message. However, MTOM uses a MIME package as opposed to pure XML, so it trades the overhead of base-64 encoding for the overhead of MIME processing.
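
The base-64 overhead that XOP and MTOM are designed to avoid is easy to quantify: base-64 turns every three bytes of binary data into four text characters. The short sketch below measures that expansion. The function name `base64_overhead` and the randomly generated attachment are stand-ins chosen for illustration.

```python
import base64
import os

def base64_overhead(raw: bytes) -> float:
    """Return the size of the base-64 text relative to the raw bytes."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)

attachment = os.urandom(300_000)  # stand-in for a binary attachment
print(f"{base64_overhead(attachment):.3f}")
```

Carrying the same bytes in a separate MIME part avoids this expansion of roughly one third, at the cost of the MIME packaging and parsing work mentioned above.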

Resource Representation SOAP Header Block (RRSHB) lets a message carry all the data needed to process it by sending a representation of a Web resource as part of the SOAP message. This is intended for cases where the receiver's access to the resource is restricted or where retrieving it separately would add unacceptable network overhead.

Conclusion
SOA infrastructure relies heavily on XML to be the lingua franca, and effective SOA performance management requires efficient ways of handling XML. XML compression techniques can go a long way in handling the SOA performance challenge. Needless to say, specific application needs are very decisive in choosing a compression technique from the myriad of techniques mentioned in this article.

References

  1. XMILL http://sourceforge.net/projects/xmill
  2. gzip www.gzip.org
  3. Datapower XML hardware www.datapower.com/products/xa35.html
  4. Sarvega Hardware www.sarvega.com/xml-security-products.html
  5. www.w3.org/TR/soap12-mtom/
  6. www.w3.org/TR/soap12-rep/
  7. www.w3.org/TR/soap12-mtom/#XOP

About Dr. Srinivas Padmanabhuni
Dr. Srinivas Padmanabhuni is a principal researcher with the Web Services Centre of Excellence in SETLabs, Infosys Technologies, and specializes in Web Services, service-oriented architecture, and grid technologies alongside pursuing interests in Semantic Web, intelligent agents, and enterprise architecture. He has authored several papers in international conferences. Dr. Padmanabhuni holds a PhD degree in computing science from University of Alberta, Edmonton, Canada.

About Akash Saurav Das
The authors are interning and/or working as part of the Web Services COE (Center of Excellence) for Infosys Technologies, a global IT consulting firm, and have substantial experience in publishing papers, presenting papers at conferences, and defining standards for SOA and Web services. The Web Services COE specializes in SOA, Web services, and other related technologies.
