XML and Distributed Computing

There are three big challenges when implementing distributed computing systems: data transfer, interface management, and remote invocation. This article examines how XML can help with each of these, and how XML-based semantic messaging can unify disparate distributed architectures.

Most popular distributed computing models, such as DCE, DCOM, RMI, and CORBA, attempt to present the developer with the standard function/method invocation paradigm, which looks exactly like a local invocation:

Foo (int a, char b, double c)
This is usually done by describing the interface in some type of interface definition language (IDL). IDL is independent of operating system, hardware, and programming language, and provides a standard approach to defining interfaces. IDL compilers exist for different operating system/programming language combinations, which allows remote invocation across operating systems and programming languages. The code generated by the IDL compiler (and the underlying middleware services) is responsible for accepting data from the sender in the sender's format and presenting it to the receiver in a format the receiver understands.

This paradigm is convenient for the developer, because it hides the distribution aspect from him or her, making it nearly transparent. Unfortunately, the penalty for this convenience is tight coupling between communicating applications. This usually doesn't pose a problem for local communications (within the application itself), but can be a real challenge for multiple applications within the enterprise or, even worse, in interenterprise communications.

The most typical problems when using this approach are:

  • Data format: Sending and receiving applications must have the same format and sequence of data.
  • Data semantics: The receiver must have the same understanding of the parameters' semantics (based on the order) as the sender.
  • Extraneous data: The information submitted by the sender must be the same as the receiver expects.

For several years semantic messaging has been suggested as a way to solve these problems. Semantic messaging is defined in many different ways; even execution and transactional semantics have been considered part of the definition. Throughout this article we define a semantic message purely in terms of data semantics: a message should contain data and a definition of what each data element represents. Thus, applications that deal with semantic messages are driven not by the sequence of data or its type, but by naming conventions (data semantics). This paradigm is significantly better suited to data transfer.

The simplest case of semantic messages is name/value pairs, which are used internally by distributed computing models. These types of messages are self-describing in the sense that transferred data is defined not by its position in the data stream, but rather by the name of this particular piece of data. Self-describing data allows two applications to share data semantics (names), instead of agreeing on an internal data representation and a data sequence within the message.

Instead of parsing incoming information and providing access to every piece (as in the IDL-based approach), a self-describing data approach presents all the input data to the application as a single message. This approach forces an application to extract the required information by parsing the incoming message.

XML enables self-describing structured data of any complexity to be represented uniformly as XML documents. The availability of standardized XML parsers and a standard object model for XML documents (DOM) simplifies parsing and extracting XML data on the fly. Although this approach requires more effort than standard distributed computing models, it yields significantly less tightly coupled applications. A typical example of an XML-based semantic message is presented below:

<PurchaseOrder>
  <OrderHeader>
    <OrderID>5</OrderID>
    <BuyerID>111-0798</BuyerID>
  </OrderHeader>
</PurchaseOrder>
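To make the receiver side concrete, here is a minimal sketch using Python's standard DOM implementation (`xml.dom.minidom`). It looks data up by element name only, so reordering the elements or adding one the receiver has never heard of (the `<Priority>` and `ShipTo` names here are hypothetical additions) doesn't disturb it. The helper `text_of` is an illustrative name, not part of any library API.

```python
from xml.dom import minidom

def text_of(doc, tag, default=None):
    """Return the text of the first element named `tag`, or `default` if it is absent."""
    nodes = doc.getElementsByTagName(tag)
    node = nodes[0].firstChild if nodes else None
    if node is None or node.nodeType != node.TEXT_NODE:
        return default
    return node.data.strip()

# The sender reordered the elements and added <Priority>, which this receiver ignores.
message = """<PurchaseOrder>
  <OrderHeader>
    <Priority>high</Priority>
    <BuyerID>111-0798</BuyerID>
    <OrderID>5</OrderID>
  </OrderHeader>
</PurchaseOrder>"""

doc = minidom.parseString(message)
order_id = text_of(doc, "OrderID")
buyer_id = text_of(doc, "BuyerID")
ship_to = text_of(doc, "ShipTo", default="on file")  # absent element falls back to a default
print(order_id, buyer_id, ship_to)  # 5 111-0798 on file
```

The receiver agrees with the sender only on the names `OrderID` and `BuyerID`; position, extra elements, and missing optional elements are all handled in application code.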

Because it allows arbitrary structures inside documents, XML supports arbitrary, complex-structured data semantics. An additional advantage of XML is that it allows typeless data transmission: variable types are taken out of the equation, and every application converts data to the types it wants to handle.
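A small sketch of what typeless transmission means in practice, assuming a hypothetical `<Item>` message: both values travel as strings, and each receiver decides which types to convert them to. Python's `xml.etree.ElementTree` is used here for brevity.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical message: both values travel as plain strings.
message = "<Item><Quantity>12</Quantity><Price>19.99</Price></Item>"
root = ET.fromstring(message)

# Each receiver chooses its own types for the same data.
quantity = int(root.findtext("Quantity"))
price = Decimal(root.findtext("Price"))       # e.g. an accounting system wants exact decimals
approx_price = float(root.findtext("Price"))  # e.g. a reporting system is happy with floats

print(quantity * price)  # 239.88
```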

An additional problem of distributed systems is that two applications can share the same data, but use different semantics for it (for example, POID versus PurchaseOrderNumber). Standard XML tools such as XSL processors automate the transformation of the data semantics, simplifying the integration of the systems with different vocabularies.
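Python's standard library has no XSL processor, so the sketch below substitutes a plain element-renaming pass with `xml.etree.ElementTree` to illustrate vocabulary translation; in practice an XSLT stylesheet would typically do this job. The mapping table is a hypothetical example built from the POID/PurchaseOrderNumber case above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the sender's vocabulary to the receiver's.
VOCABULARY_MAP = {"POID": "PurchaseOrderNumber"}

def translate(xml_text, mapping):
    """Rename elements according to `mapping`; everything else passes through unchanged."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

incoming = "<Order><POID>5</POID><BuyerID>111-0798</BuyerID></Order>"
print(translate(incoming, VOCABULARY_MAP))
# <Order><PurchaseOrderNumber>5</PurchaseOrderNumber><BuyerID>111-0798</BuyerID></Order>
```

Because the translation lives outside both applications, neither the sender nor the receiver has to change when the two vocabularies differ.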

Semantic messaging builds significantly less coupled systems for the following reasons:

  • The position of the data element within the message is irrelevant: The sender and receiver have to agree only on the data semantics (naming).
  • The receiving application looks only for the data with the predefined semantics: The sending application is free to include additional data without disrupting the receiver.
  • XML documents are effectively typeless: The sender and the receiver don't have to agree on the data types of the information exchanged.
  • The receiver programmatically processes input semantic messages: A set of defaults can be established for the parameters that are not submitted by the sender.
  • Standard XML DOM APIs allow both the sender and receiver to build and parse XML documents regardless of programming language and operating system.

Although XML-based semantic messages provide a powerful distributed systems model, two things must be kept in mind when implementing this approach:
  1. Use of semantic messages adds to the programming effort of creating a distributed system: In this approach it's the responsibility of the application programmer, not the IDL compiler-generated code, to process incoming data.
  2. The semantic message processing code adds to the latency of the actual execution of the component that accepts semantic messages: This is partially compensated for by the fact that XML semantic messages are usually transported in the form of a string, so the underlying distributed systems software doesn't have to pack or parse transferred data.
The interesting question here is whether XML validation (DTDs or schemas) should be used for data transfer. Our feeling is that using XML validation in this case will defeat the purpose of the semantic messages, which is to make systems less coupled. Introducing XML validation is similar to using IDL. If this is done, the agility of the semantic interfaces disappears.

In our experience, using semantic messages leads to the creation of flexible, loosely coupled systems.

Interface Management
Interface management becomes an increasingly significant issue as the system grows. The problem is that not only new functionality but every required combination of input parameters entails the creation of a new interface. For example, if the same value of x can be calculated in two different ways, based on parameters a and b or on a and c, two interfaces are required:

Foo1 (double x, int a, char b)
Foo2 (double x, int a, double c)
Because these two interfaces essentially support the same functionality, they are often merged into a single, very "wide" interface that requires all feasible parameters to be present. For the example above, this is:
Foo (double x, int a, char b, double c)
This approach reduces the number of interfaces required, but forces every application invoking it to submit all the parameters, including ones it might not have. To solve this problem, a system of flags, or something similar, is introduced to indicate the absence of certain parameters. Continuing the example, we get something like the following:
Foo (double x, int a, char b, double c, boolean type)
where the value of "type" defines the actual type of invocation.

The overall approach is equivalent to introducing a proprietary implementation of data semantics. Due to the proprietary nature of this solution, it can't be generalized, thus requiring a new implementation for every interface.

XML-based semantic messages allow a single interface to be created for every required function with a well-defined signature using input and output parameters in the form of XML documents. For example:

Foo (xmldoc input, xmldoc output)
The input document contains any allowable combination of the input parameters, and the output document contains the output values corresponding to the input parameters received. For the previous example, the input document can be either:
<input>
  <a>5</a>
  <b>abc</b>
</input>
or:
<input>
  <a>5</a>
  <c>78</c>
</input>
or even:
<input>
  <a>5</a>
</input>
if a proper set of defaults for b or c exists.

In the examples above, the same kind of input document carries different combinations of the data the sender wants to submit for the actual invocation. Because the input document contains the information semantics (names of the data elements), there's no need for additional "flags" (see the example above) defining which information has actually been submitted. The receiving application can obtain this information by traversing the input document.
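Here is a minimal sketch of the single `Foo (xmldoc input, xmldoc output)` interface, written in Python with `xml.etree.ElementTree`. It returns the output document instead of filling an output parameter, and the formulas for x and the default `DEFAULT_C` are hypothetical placeholders, since the article doesn't define how x is computed. The point is that the receiver decides which variant to run by checking which elements are present, with no "type" flag.

```python
import xml.etree.ElementTree as ET

DEFAULT_C = 1.0  # hypothetical default used when neither b nor c is submitted

def foo(input_xml):
    """One interface for every variant: the receiver checks which elements are present."""
    doc = ET.fromstring(input_xml)
    a = int(doc.findtext("a"))
    b = doc.findtext("b")  # None if the sender did not submit <b>
    c = doc.findtext("c")  # None if the sender did not submit <c>

    if b is not None:
        x = float(a * len(b))  # variant based on a and b (illustrative formula only)
    else:
        # variant based on a and c, falling back to the default for c
        x = a * (float(c) if c is not None else DEFAULT_C)

    output = ET.Element("output")
    ET.SubElement(output, "x").text = str(x)
    return ET.tostring(output, encoding="unicode")

print(foo("<input><a>5</a><b>abc</b></input>"))  # <output><x>15.0</x></output>
print(foo("<input><a>5</a><c>78</c></input>"))   # <output><x>390.0</x></output>
print(foo("<input><a>5</a></input>"))            # <output><x>5.0</x></output>
```

Adding a third variant later means adding another branch here; existing senders keep working because their documents are still valid input.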

This approach also simplifies interface maintenance. In a standard distributed environment, every time an interface changes on one of the systems, a corresponding change has to be applied to all users of this interface. Moreover, all these changes have to be deployed simultaneously; otherwise the overall system stops functioning. Semantic messaging, which produces a more loosely coupled system, usually simplifies this problem because the parameters are processed programmatically within the application itself. As discussed above, a well-defined system of defaults allows incremental changes in both sending and receiving applications.

Remote Invocation
All distributed environments implement remote invocation the same way: a proxy of the remote component (object) is created within the address space of the sender and a stub is created in the address space of the receiver. The sender communicates with the proxy, which in turn talks to the stub that communicates with the receiver (see Figure 1).

The introduction of the proxy and stub, though simplifying the programming model, complicates the overall system and makes it more expensive computationally and memory-wise. In this architecture, remote communications between components are encapsulated in the proxy/stub communications, which are generated by an IDL compiler that's based on the middleware APIs. The sender only communicates with the local proxy, and the receiver gets all the requests from the local stub.

Because proxies and stubs are created in the memory spaces of the sender and receiver, every connection requires additional memory. In the simple example in Figure 2, a component in the client process connects to four different components in the server process. The client process contains four proxies, and the server process contains the four corresponding stubs. If the server supports 100 clients, it holds 400 stubs, which adds up to a significant amount of memory. These stubs also have to be created and destroyed at some point, increasing the overall computational expense.

A separate logical connection is required for each proxy/stub pair. Although most distributed model implementations can share the underlying physical connection (usually TCP/IP), establishing a proxy/stub connection is an expensive operation. As a result, all distributed computing models recommend establishing a connection once and keeping it open for the life of the component on the sender. This ties the life cycles of the components on the receiver to those on the sender. When the receiver is a server, this reduces scalability (due to memory usage), which is usually not acceptable. Both the DCOM and EJB specifications are introducing intermediate context objects to solve this problem.

XML-based semantic messaging for stateless components provides an elegant solution (see Figure 3). Stateless here means having no conversational state with the receiver. Stateless components can still have persistent state, which is kept in the database.

A "gate" object is introduced that represents all the components in the process, so a sender can send messages to the process rather than to any particular component of the process. The number of proxy/stub pairs in this case is one (for the gate object), rather than four as in Figure 2. A single gate accepts all the messages destined for any component within the process and orchestrates the actual execution. Since XML doesn't impose any limitations on the message semantics, it's possible to place every message in a standard envelope. This envelope includes the name of the component and the name of the method for which the message is intended. The gate object consists of two major parts (see Figure 4).
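The article doesn't define an envelope format, so the following Python sketch assumes one: a root `envelope` element whose `component` and `method` attributes name the target, wrapping the semantic message unchanged. These names, and the `OrderService`/`submit` target, are hypothetical.

```python
import xml.etree.ElementTree as ET

def wrap(component, method, body_xml):
    """Place a semantic message inside an envelope that names its target."""
    envelope = ET.Element("envelope", component=component, method=method)
    envelope.append(ET.fromstring(body_xml))
    return ET.tostring(envelope, encoding="unicode")

print(wrap("OrderService", "submit",
           "<PurchaseOrder><OrderID>5</OrderID></PurchaseOrder>"))
# <envelope component="OrderService" method="submit"><PurchaseOrder><OrderID>5</OrderID></PurchaseOrder></envelope>
```

Only the envelope has a fixed shape; the message inside remains an arbitrary semantic message.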

The message receiver is a listener on a particular protocol (DCOM, CORBA, RMI, etc.). This object is responsible for accepting messages using a predefined communications protocol. After the message is received, the rest of the communication is internal to the receiving process. The message router is responsible for parsing the envelope and extracting the name of the component, the method for which the message is intended, and the message itself. The router then instantiates the appropriate component, passes control to it for execution, and deletes it when processing is complete.

The idea of the gate object is not a new one. It's somewhat similar to the "façade" pattern or to the stateless session beans introduced by J2EE. The biggest difference here is the use of XML-based semantic messaging, whose flexibility standardizes the gate interface and makes the relationship between the gates and the supported components dynamic. Unlike those approaches, this architecture doesn't require rewriting the gate to support additional components or additional methods on the components. Support for introspection, available in most modern middleware environments, makes implementing routers even simpler.
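The message router described above can be sketched in a few lines of Python. This assumes the hypothetical envelope format of an `envelope` root with `component` and `method` attributes, a simple registry dictionary in place of a real component directory, and Python's `getattr` standing in for middleware introspection. `OrderService` is a made-up stateless component.

```python
import xml.etree.ElementTree as ET

class OrderService:
    """Hypothetical stateless component: it keeps nothing between calls."""
    def submit(self, message):
        result = ET.Element("result")
        ET.SubElement(result, "accepted").text = message.findtext("OrderID")
        return ET.tostring(result, encoding="unicode")

# Registry of components behind this gate. Supporting a new component means
# registering it here; the routing code below stays unchanged.
COMPONENTS = {"OrderService": OrderService}

def route(envelope_xml):
    """Parse the envelope, instantiate the target component, and invoke the method."""
    envelope = ET.fromstring(envelope_xml)
    component_cls = COMPONENTS[envelope.get("component")]
    method_name = envelope.get("method")
    if method_name.startswith("_"):
        raise PermissionError(f"method {method_name!r} is not remotely callable")
    component = component_cls()               # a fresh instance per request
    method = getattr(component, method_name)  # introspection finds the method by name
    return method(envelope[0])                # the instance is discarded after the call

print(route('<envelope component="OrderService" method="submit">'
            '<PurchaseOrder><OrderID>5</OrderID></PurchaseOrder></envelope>'))
# <result><accepted>5</accepted></result>
```

Adding a method to `OrderService` makes it reachable without touching `route`; the underscore check is one simple way to keep internal methods from being invoked remotely.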

So far we've discussed using the gate object to "multiplex" access to components within a process. The same architecture can also be used to multiplex access to a distributed system built on a particular middleware product. In this case, the gate can serve as both a "gate" and a "bridge," meaning that in addition to multiplexing, it converts messages between different middleware types. For example, a gate can accept RMI messages while internal communications use DCOM.

Let's analyze the complexities of "bridging" components running on top of different middleware products and see how XML-based semantic messages can help solve them.

Name Resolution
Different middleware systems usually have different naming services, which makes name resolution inherently complex. This name resolution has to be done by every component that tries to invoke a component running on different middleware. By enhancing the architecture as in Figure 3 (we don't show stubs and proxies here because they're not relevant to our discussion), it's possible to move name resolution into an outbound gate, which relieves every component of this operation and encapsulates it. Basically, if a component isn't found within the local middleware platform, the request is forwarded to the outbound gate, which resolves the name and forwards the request to the receiver's inbound gate. The proposed system of inbound/outbound gates makes name resolution much coarser-grained (per gate rather than per component), simplifying this problem.
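The lookup order described above (local platform first, then the outbound gate) can be sketched as follows. Both lookup tables are hypothetical dictionaries standing in for the local naming service and the outbound gate's knowledge of remote inbound gates; a real outbound gate would query the other middleware's naming service instead.

```python
# Hypothetical tables: components hosted on the local middleware, and the
# outbound gate's directory of inbound gates on other middleware platforms.
LOCAL_COMPONENTS = {"OrderService"}
REMOTE_INBOUND_GATES = {"CreditCheck": "billing-inbound-gate"}

def resolve(component_name):
    """Decide where a request for component_name should be delivered."""
    if component_name in LOCAL_COMPONENTS:
        return ("local", component_name)
    # Not found locally: the outbound gate resolves the name on behalf of every
    # component and forwards the request to the receiver's inbound gate.
    gate = REMOTE_INBOUND_GATES.get(component_name)
    if gate is None:
        raise LookupError(f"unknown component: {component_name}")
    return ("outbound", gate)

print(resolve("OrderService"))  # ('local', 'OrderService')
print(resolve("CreditCheck"))   # ('outbound', 'billing-inbound-gate')
```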

Components Invocation
Different middleware products have different proxy/stub implementations, and the majority of bridging solutions use two proxies (native proxy and the proxy from the receiver's middleware) for remote invocation. Introducing the generalized gate architecture (see Figure 5) simplifies this mechanism by limiting invocation requirements. Basically, only outbound gates have to invoke appropriate inbound gates.

Data Transformations
Every middleware product implements its own approach to data packing, representation, and conversion. As a result, keeping data representation consistent across multiple middleware packages is often a challenge. XML-based messaging simplifies this problem enormously by exchanging data as strings, a representation that all middleware systems handle in a fairly standard way.

Gate-based architecture solves the following problems:

  • Minimizes the number of required proxy/stub pairs to one per process rather than one per component.
  • Doesn't require coupling the components' life cycles to improve performance. A connection is established to the process, and components are created dynamically based on incoming requests.
  • Load balancing is simpler in this architecture as it's based on processes, not components.
  • Bridges multiple middleware platforms into a seamless environment.
To simplify message router implementation in a gate-based approach, XML validation makes sense on the level of envelope processing, but we still assume nonvalidated XML for the message itself.
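As a rough illustration of validating only the envelope, the Python sketch below uses a hand-written structural check, because Python's standard library doesn't validate against DTDs or schemas; a real gate might use an envelope DTD or XML Schema instead. The envelope shape (`envelope` root, `component` and `method` attributes, exactly one child) is the same hypothetical format assumed elsewhere, not one defined by the article.

```python
import xml.etree.ElementTree as ET

def check_envelope(envelope_xml):
    """Strictly check the envelope; return the message body without validating it."""
    envelope = ET.fromstring(envelope_xml)
    if envelope.tag != "envelope":
        raise ValueError("root element must be <envelope>")
    for attr in ("component", "method"):
        if not envelope.get(attr):
            raise ValueError(f"envelope is missing the {attr!r} attribute")
    if len(envelope) != 1:
        raise ValueError("envelope must contain exactly one message element")
    return envelope[0]  # the body is passed on as-is, whatever its structure

body = check_envelope('<envelope component="OrderService" method="submit">'
                      '<AnythingGoes><Extra/></AnythingGoes></envelope>')
print(body.tag)  # AnythingGoes
```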

This type of architecture is the foundation of today's Web technology. The Web server can be viewed as a complex gate bridging HTTP communications with the system's internal middleware, whether it's CGI, RMI, CORBA, or DCOM (see Figure 6).

Traditionally, routing has been based on URLs, and data transfer on semantic name/value pairs submitted to the Web server. Advances in B2Bi and XML are changing this approach by introducing XML-based semantic messaging that carries both routing information and data (very similar to the solution we discussed above). Examples of XML semantic messaging over the Web include XML and B2B servers and Web services.

Unified Distributed Systems Architecture
Using XML-based semantic messages and a gate-based architecture to unify the Web and non-Web cases creates a generic architecture that can support any kind of access protocol (HTTP, CORBA, DCOM, etc.) (see Figure 7).

The presented architecture lets you bridge multiple systems running on different middleware platforms in a standardized fashion. The advantages of this generic architecture are:

  • XML-based semantic messages enable flexible, loosely coupled messaging between multiple systems.
  • Gate objects coupled with XML-based semantic messages simplify complex operations, such as name resolution, remote invocation, and data transformation between disparate middleware environments.
  • It's easily extensible, since introducing additional components and/or methods on an existing component doesn't require modifying the routers that control access to the components.

A natural extension of the described architecture is conversational state support, which can easily be introduced in a fashion similar to session support in Web servers. The conversational state of the components can be externalized in the form of XML documents and stored in the XML router. When a new request arrives, the router can append the state information to the request data.
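A minimal Python sketch of that idea, assuming a hypothetical in-memory dictionary keyed by a session identifier and a hypothetical `<state>` element; a production router would need persistence and expiry, much like Web server sessions.

```python
import xml.etree.ElementTree as ET

class StatefulRouter:
    """Keeps each session's conversational state as an XML document (hypothetical design)."""

    def __init__(self):
        self._sessions = {}  # session id -> serialized <state> document

    def save_state(self, session_id, state_xml):
        self._sessions[session_id] = state_xml

    def prepare(self, session_id, request_xml):
        """Append the stored state, if any, to the incoming request data."""
        request = ET.fromstring(request_xml)
        stored = self._sessions.get(session_id)
        if stored is not None:
            request.append(ET.fromstring(stored))
        return ET.tostring(request, encoding="unicode")

router = StatefulRouter()
router.save_state("s1", "<state><cartItems>2</cartItems></state>")
print(router.prepare("s1", "<request><add>book</add></request>"))
# <request><add>book</add><state><cartItems>2</cartItems></state></request>
```

The components themselves stay stateless: they receive their conversational state as part of the semantic message, just like any other data.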

When implementing this architecture, it's important to use XML-based semantic messaging correctly. Although using semantic messaging adds flexibility and agility to the overall system, it also usually increases the latency of the overall execution. This is why I suggest using XML-based semantic messages as follows:

  • For communications between multiple systems and/or processes
  • As an internal middleware platform for communications within particular processes

Conclusion
XML-based semantic messaging revolutionized distributed systems development. The main advantages of using semantic messaging for building distributed systems are:

  • Significantly more flexible data transfer: XML-based semantic messaging lets you deal with data semantics rather than data position and type.
  • Simplified interface management: XML-based semantic messaging lets you simplify interface management by expressing parameters in the form of XML documents, thus making them more generic and more resilient to the changes of the parameters.
  • Simplified remote invocation: XML-based semantic messaging, coupled with the introduction of the gate object, reduces the number of required proxy/stub pairs to one per process and doesn't require coupling the components' life cycles to improve overall performance.
Using XML-based semantic messages and a gate-based approach you can create a generic architecture that provides any kind of access protocol (HTTP, CORBA, DCOM, etc.) to the existing systems.
About Boris Lublinsky
Boris Lublinsky is an Enterprise Architect at CNA Insurance, where he is involved in the design and implementation of CNA’s integration strategy, building application frameworks, and implementing service-oriented architecture for the company. Prior to this he was Director of Technology at Inventa Technologies, where he oversaw and actively participated in engagements in EAI and B2B integration implementations and the development of large-scale Web applications. While a Technical Architect at Platinum Technology and SSA, Boris was involved in component-based systems development and the design and implementation of execution platforms for component-based systems. In all, he has over 25 years of experience in software engineering and technical architecture.
