High-Performance Data Services with Smart Caching
Reduce potential performance bottlenecks and ensure timely delivery of information

One of the main concerns among IT architects planning an implementation of an enterprise data virtualization layer in their service-oriented architecture (SOA) or overall information system is the performance of the participating data services. Performance becomes particularly important in real- or near-real-time environments as well as in environments with highly distributed data sources where network latency cannot be controlled. This article examines how to reduce potential performance bottlenecks by utilizing high-performance caching with data virtualization middleware. Different scenarios within single-, cluster- and distributed-caching implementations are covered.

Introduction
A data virtualization implementation normally includes a wide variety of data sources, both relational and non-relational, often distributed across several business units, and sometimes located in different geographical regions. Therefore, the data virtualization layer's performance depends heavily on response latency. As the amount of data retrieved through the data virtualization layer increases, network latency can quickly turn into a bottleneck for the entire IT department overseeing the implementation. If the implementation includes various client applications distributed across multiple business units or geographies, the data virtualization layer implementation must treat response latency as one of the major SLA items.

In its simplest implementation, the data virtualization layer's response latency comprises three factors: the data sources, the middleware layer, and the network. Hence, when the data sources have non-uniform performance characteristics and are located on networks with varying throughput capacity, the total request/response latency can be measured as follows (see Figure 1):

Total response latency = MAX(DSL1, DSL2, ..., DSLN) + MAX(DSS1, DSS2, ..., DSSN) + MAX(DSC1, DSC2, ..., DSCN) + DVL, where

DSL - Data Source Latency
DSS - Data Source Network Subnet Latency
DSC - Client Network Subnet Latency
DVL - Data Virtualization Latency
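
As a minimal sketch of the formula above, the following Python snippet computes the total response latency from per-source and per-subnet measurements. The function name and all latency figures (in milliseconds) are hypothetical values chosen for illustration; they do not come from a real deployment.

```python
def total_response_latency(dsl, dss, dsc, dvl):
    """Return MAX(DSL) + MAX(DSS) + MAX(DSC) + DVL.

    dsl: data source latencies
    dss: data source network subnet latencies
    dsc: client network subnet latencies
    dvl: data virtualization (middleware) latency
    All values must use the same time unit.
    """
    return max(dsl) + max(dss) + max(dsc) + dvl


# Hypothetical measurements in milliseconds.
source_latencies = [40, 120, 900]   # the third source is the slow one
source_subnets = [2, 5, 30]
client_subnets = [1, 15]
middleware = 25

baseline = total_response_latency(
    source_latencies, source_subnets, client_subnets, middleware
)

# Assume the slow source is served from a cache that answers in 10 ms.
with_cache = total_response_latency(
    [40, 120, 10], source_subnets, client_subnets, middleware
)

print(f"baseline: {baseline} ms, slowest source cached: {with_cache} ms")
```

Because each term is a maximum, only the slowest member of each group contributes. In this sketch, speeding up either of the two faster sources would leave the total unchanged.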

In deployments where the data virtualization middleware, all data sources and all clients are located on the same subnet, network latency for the data sources and clients will be similar and more or less constant. Therefore, to reduce the response latency of the entire solution, architects should concentrate on reducing the latency of the slowest data source to minimize the amount of time the data virtualization middleware idles, waiting for a response. While changing the data source or partitioning it are both valid options, a less invasive approach is to utilize a high-performance caching system in the data virtualization layer. A wide variety of implementation choices is available in the data virtualization layer, ranging from simple table-level caching, through more advanced materialized-view caching, to the most complex option, dynamic result-set caching. Using these options, IT architecture teams can boost the performance of their data virtualization deployments by 10 percent to 50 percent (see Figure 2).

The effectiveness of caching in the data virtualization layer depends on a number of additional solution characteristics, including the frequency of changes in the underlying data and the applications' tolerance for "stale" data (i.e., how often the cache system has to refresh itself to comply with the required SLA). It is easy to imagine a case in which caching, if not implemented properly, actually slows down the overall solution. This can happen if the underlying source data changes frequently and the client application requires access to real-time data. In this case, most application requests will result in a cache miss, and therefore will initiate either a pass-through request or a cache refresh. The additional time it takes the request to travel to the cache system and back adds to the overall latency. Hence, the architecture team needs to consider a number of critical characteristics before deciding whether caching is suitable for its environment and business requirements. In such a scenario, implementing an incremental cache update that utilizes change data capture will eliminate the need for a full data refresh, yet still provide freshly updated data to the requesting application while maintaining SLA requirements (see Figure 3).
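
One rough way to reason about this trade-off is an expected-latency model. The Python sketch below is an assumption added for illustration, not a formula from this article. It supposes that every request pays the cache round trip and that a miss additionally pays the full source latency. The function names and the sample figures are hypothetical.

```python
def expected_latency(hit_ratio, cache_latency, source_latency):
    """Average latency per request when a cache sits in front of a source.

    Assumed model: a hit costs cache_latency; a miss costs
    cache_latency + source_latency (lookup first, then pass-through).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return cache_latency + miss_ratio * source_latency


def caching_pays_off(hit_ratio, cache_latency, source_latency):
    """True when the cached path is faster on average than going direct."""
    return expected_latency(hit_ratio, cache_latency, source_latency) < source_latency


# Hypothetical figures in milliseconds: 5 ms cache lookup, 100 ms source.
for ratio in (0.02, 0.50, 0.90):
    avg = expected_latency(ratio, 5, 100)
    verdict = "helps" if caching_pays_off(ratio, 5, 100) else "hurts"
    print(f"hit ratio {ratio:.2f}: average {avg:.1f} ms, caching {verdict}")
```

Under this model, caching wins only when the hit ratio multiplied by the source latency exceeds the cache lookup cost. That matches the case described above: with rapidly changing data and real-time requirements, the hit ratio approaches zero and the cache becomes pure overhead.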

While performance improvement is typically the most sought-after benefit of the caching system in a data virtualization implementation, an overlooked, but equally important, advantage is the reduced impact or stress on the production systems. With the caching system enabled, many, if not most, of the client requests will be fulfilled by cached data, thus reducing the number of requests going against the production data sources. With high request volumes, this additional benefit supplements the performance gains of the caching system.

Single Cache Instance Implementations
Single cache instance deployment is the most basic implementation in the data virtualization layer. Single cache instances are typically preferred for small- to medium-sized departmental projects with low-to-moderate client load activity. As mentioned earlier, caching systems can improve data virtualization layer performance and reduce stress on the production data sources, depending on the implementation characteristics. If performance improvement is the primary objective, the implementation team should consider co-locating the cache on the same subnet as the data virtualization middleware, to minimize the network latency between the middleware and the cache. This is an important consideration because the caching system typically bears the brunt of the request load, and hence the amount of traffic between the middleware and the caching system is expected to increase significantly. In situations where the cached data is relatively small, but is accessed frequently, it may be beneficial to co-locate the caching system with the middleware on the same blade server, thus eliminating network latency altogether. To further improve the performance of a cache co-located on the same blade, the cache database may be configured to pin the cache table into memory, further reducing the time needed to fetch the data from the cache (see Figure 4).

Depending on the topology and nature of the underlying data sources, the caching system may be configured to cache raw table data, materialized views or procedural data. Caching raw table data is suitable for environments where the performance of a single data source is significantly worse than that of the other data sources, causing the data virtualization middleware to idle while waiting for a response. Caching the table data from slow data sources in the higher-performance caching system improves the performance of the overall solution by removing the incremental latency delta associated with the idling middleware (see Figure 5). Materialized-view caching is most suitable when numerous clients send identical requests, thereby clogging production systems with requests that elicit identical responses. In such scenarios, the data virtualization middleware will execute the first client request as usual against the production systems, and then cache the returned result-set instead of discarding it, so that subsequent client requests will be fulfilled by the cache system instead of the production systems. Finally, if one or more of the underlying data sources is a web service with long or unpredictable response latency, then enabling procedural caching will allow the data virtualization middleware to optimize the overall performance by caching the result-sets returned by the web service sources, keyed on the passed parameters, thus eliminating potential web-service latency.
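
The procedural caching idea described above can be sketched in a few lines of Python. This is an illustrative assumption, not the API of any particular data virtualization product: the `ProceduralCache` class, the `slow_web_service` stand-in and the time-to-live value are all hypothetical. Results are stored per parameter tuple and reused until they expire.

```python
import time


class ProceduralCache:
    """Caches results of a callable, keyed on the parameters passed to it."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # the slow source, e.g. a web service call
        self._ttl = ttl_seconds      # how long a cached result stays valid
        self._clock = clock          # injectable clock, useful for testing
        self._entries = {}           # params tuple -> (expires_at, result)

    def call(self, *params):
        now = self._clock()
        entry = self._entries.get(params)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit, still fresh
        result = self._fetch(*params)            # miss or expired: go to source
        self._entries[params] = (now + self._ttl, result)
        return result


calls = []  # records every request that actually reaches the source


def slow_web_service(customer_id, region):
    """Hypothetical stand-in for a web service with long response latency."""
    calls.append((customer_id, region))
    return {"customer_id": customer_id, "region": region}


cache = ProceduralCache(slow_web_service, ttl_seconds=60)
cache.call(42, "EMEA")   # first request goes to the web service
cache.call(42, "EMEA")   # identical parameters: served from the cache
cache.call(7, "APAC")    # different parameters: goes to the web service
print(f"requests that reached the web service: {len(calls)}")
```

The time-to-live stands in for the application's tolerance for stale data; a shorter value trades fewer stale answers for more calls to the underlying service.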

Cluster Cache Implementation
For more complex deployments, such as environments with heavy client request loads, a single instance of middleware and a single cache instance might be insufficient to handle all the requests within the allotted SLA. In such cases, the most common approach is to cluster the data virtualization middleware into multiple nodes. Although middleware clustering adds capacity to handle additional client requests, it also exacerbates the load on the production data sources, because each individual client request, even if the subsequent requests are identical, is executed against the production data sources. Enabling a caching system in a clustered environment, therefore, will potentially have a significant impact on the solution's performance as well as on offloading stress from the production systems (see Figure 6).
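
To illustrate the effect on the production sources, the Python sketch below simulates a cluster of middleware nodes with and without a shared cache and counts how many queries reach the source. The class names, the round-robin dispatch and the request mix are hypothetical simplifications; a real cluster would use a networked cache rather than an in-process dictionary.

```python
class ProductionSource:
    """Counts how many queries are executed against the production system."""

    def __init__(self):
        self.executions = 0

    def query(self, sql):
        self.executions += 1
        return f"rows for: {sql}"


class MiddlewareNode:
    """One node of the clustered middleware, optionally backed by a shared cache."""

    def __init__(self, source, shared_cache=None):
        self.source = source
        self.cache = shared_cache    # the same dict object is shared by all nodes

    def handle(self, sql):
        if self.cache is None:
            return self.source.query(sql)          # no cache: always hit the source
        if sql not in self.cache:
            self.cache[sql] = self.source.query(sql)
        return self.cache[sql]


def run_cluster(node_count, requests, use_cache):
    """Dispatch requests round-robin and return the number of source executions."""
    source = ProductionSource()
    cache = {} if use_cache else None
    nodes = [MiddlewareNode(source, cache) for _ in range(node_count)]
    for i, sql in enumerate(requests):
        nodes[i % node_count].handle(sql)
    return source.executions


requests = ["SELECT * FROM orders"] * 9 + ["SELECT * FROM customers"] * 3
uncached = run_cluster(3, requests, use_cache=False)
cached = run_cluster(3, requests, use_cache=True)
print(f"source executions without cache: {uncached}, with shared cache: {cached}")
```

Sharing one cache across the nodes matters here: if each node kept a private cache, every distinct request could still reach the production source once per node.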

Distributed Cache Implementation
Finally, in environments where one or more clients are located remotely, a distributed caching system helps reduce the network latency associated with frequent requests over long-distance networks. Such a distributed caching system typically has a central cache repository and multiple remote edge caches for servicing requests from the remote clients. There is usually no need for the edge caches to replicate the central cache system one to one. Because each edge cache monitors its remote clients' requests, it can simply replicate the portion of the central cache that is relevant to those requests. After initial replication, edge caches register change data capture requests with the central cache and are notified automatically whenever the central cache data changes, thus eliminating the need for a complete edge cache re-sync (see Figure 7).
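
The interaction between a central cache and an edge cache can be sketched as a simple publish/subscribe arrangement. The Python code below is a minimal in-process illustration under stated assumptions: the `CentralCache` and `EdgeCache` classes, their methods and the sample keys are hypothetical, and a real deployment would deliver the change notifications over the network.

```python
class CentralCache:
    """Central repository that notifies subscribers when an entry changes."""

    def __init__(self, data):
        self._data = dict(data)
        self._subscribers = {}       # key -> list of callbacks

    def get(self, key):
        return self._data[key]

    def subscribe(self, key, callback):
        self._subscribers.setdefault(key, []).append(callback)

    def update(self, key, value):
        self._data[key] = value
        for callback in self._subscribers.get(key, []):
            callback(key, value)     # push only the changed entry


class EdgeCache:
    """Remote cache that replicates only the entries its clients request."""

    def __init__(self, central):
        self._central = central
        self._local = {}
        self.central_reads = 0       # round trips to the central cache

    def get(self, key):
        if key not in self._local:
            self.central_reads += 1
            self._local[key] = self._central.get(key)
            # Register for change notifications on this key only.
            self._central.subscribe(key, self._on_change)
        return self._local[key]

    def _on_change(self, key, value):
        self._local[key] = value     # incremental update, no full re-sync

    def replicated_keys(self):
        return set(self._local)


central = CentralCache({"eur_prices": 1, "us_prices": 2, "asia_prices": 3})
edge = EdgeCache(central)

edge.get("eur_prices")               # initial replication of one entry
central.update("eur_prices", 10)     # pushed to the edge cache
central.update("us_prices", 20)      # not replicated, so the edge ignores it
print(edge.get("eur_prices"), edge.replicated_keys(), edge.central_reads)
```

In this sketch the edge cache never copies entries its clients have not asked for, and an update to a replicated entry arrives as a single pushed change rather than a reload of the whole cache.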

Conclusion
As global enterprises and government agencies implement data virtualization to federate data across disparate systems and geographic locations, IT teams are considering the data virtualization layer's performance in relation to the overall information system. By using advanced data virtualization middleware with high-performance caching, IT architects can reduce potential performance bottlenecks and thus ensure timely delivery of information.

About Avtandil Garakanidze
Avtandil Garakanidze currently serves as the Vice President of Product Management and Strategy at Composite Software Inc, the leader in data virtualization solutions. Prior to Composite, he held executive and senior product and engineering management positions with high-tech companies including Symantec/VERITAS, Siebel Systems, Yahoo! and Starfish Software. Garakanidze earned an MBA from MIT’s Sloan School of Management and an MS from the Georgian Technical University.
