High-Performance Data Services with Smart Caching
Reduce potential performance bottlenecks and ensure timely delivery of information
Jan. 23, 2010 03:30 PM
One of the main concerns among IT architects planning an enterprise data virtualization layer in their service-oriented architecture (SOA) or overall information system is the performance of the participating data services. Performance becomes particularly important in real-time or near-real-time environments, as well as in environments with highly distributed data sources where network latency cannot be controlled. This article examines how to reduce potential performance bottlenecks by using high-performance caching with data virtualization middleware. Different scenarios within single-, cluster- and distributed-caching implementations are covered.

Introduction

In its simplest implementation, the data virtualization layer's response latency comprises three factors: the data sources, the middleware layer, and the network. Hence, when the data sources have non-uniform performance characteristics and are located on networks with varying throughput capacity, the total request/response latency can be measured as follows (see Figure 1):

Total response latency = MAX(DSL1, DSL2, ..., DSLN) + MAX(DSS1, DSS2, ..., DSSN) + MAX(DSC1, DSC2, ..., DSCN) + DVL

where DSL is the data source latency.

In deployments where the data virtualization middleware, all data sources and all clients are located on the same subnets, network latency for the data sources and clients will be similar and more or less constant. Therefore, to reduce the response latency of the entire solution, architects should concentrate on reducing the latency of the slowest data source, which minimizes the time the data virtualization middleware spends idle, waiting for a response. While replacing or partitioning the data source are both valid options, a less invasive approach is to use a high-performance caching system in the data virtualization layer.
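To make the formula concrete, here is a minimal sketch that evaluates it for three data sources. All numbers are hypothetical millisecond values, and the 30 ms cache latency is an assumption chosen for illustration. The DSS and DSC terms are treated as opaque per-source values, since the text defines only DSL.

```python
def total_response_latency(dsl, dss, dsc, dvl):
    """Total response latency = MAX(DSL) + MAX(DSS) + MAX(DSC) + DVL."""
    return max(dsl) + max(dss) + max(dsc) + dvl


# Hypothetical per-source values in milliseconds (not taken from the article).
dsl = [40, 55, 900]  # the third data source is far slower than the others
dss = [5, 5, 5]
dsc = [5, 5, 5]
dvl = 20

baseline = total_response_latency(dsl, dss, dsc, dvl)

# Assumption: the slowest source is now answered from a cache in about 30 ms.
dsl_with_cache = [40, 55, 30]
with_cache = total_response_latency(dsl_with_cache, dss, dsc, dvl)

print(f"baseline: {baseline} ms, slowest source cached: {with_cache} ms")
```

Because only the maximum of each term counts, improving any source other than the slowest one would leave the total unchanged in this model.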
A wide variety of implementation choices is available in the data virtualization layer, ranging from simple table-level caching through more advanced materialized-view caching to the most complex option, dynamic result-set caching. Using these options, IT architecture teams can achieve a performance boost of 10 percent to 50 percent in their data virtualization deployments (see Figure 2).

The effectiveness of caching in the data virtualization layer depends on a number of additional solution characteristics, including the frequency of changes in the underlying data and the applications' tolerance for "stale" data (that is, how often the cache system has to refresh itself to comply with the required SLA).

It is easy to imagine a case in which caching, if not implemented properly, actually slows down the overall solution. This can happen if the underlying source data changes frequently and the client application requires access to real-time data. In this case, most application requests will result in a cache miss and will therefore initiate either a pass-through request or a cache refresh. The additional time it takes the request to travel to the cache system and back adds to the overall latency. Hence, the architecture team needs to consider a number of critical characteristics before deciding whether caching is suitable for its environment and business requirements.

In this scenario, implementing an incremental cache update using change data capture eliminates the need for a full data refresh, yet still provides freshly updated data to the requesting application while maintaining SLA requirements (see Figure 3).

While performance improvement is typically the most sought-after benefit of a caching system in a data virtualization implementation, an often overlooked but equally important advantage is the reduced impact, or stress, on the production systems.
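Both effects discussed above, the latency penalty of a poorly performing cache and the reduced load on production systems, can be illustrated with a deliberately simple model. It assumes that every request pays the cache round trip and that every miss additionally pays a pass-through request to the source. The function names and the millisecond figures are hypothetical.

```python
def expected_latency(hit_rate, cache_ms, source_ms):
    """Average latency: every request visits the cache, misses also hit the source."""
    return cache_ms + (1.0 - hit_rate) * source_ms


def source_request_fraction(hit_rate):
    """Share of client requests that still reach the production data source."""
    return 1.0 - hit_rate


def break_even_hit_rate(cache_ms, source_ms):
    """Hit rate above which the cached path beats going to the source directly."""
    return cache_ms / source_ms


CACHE_MS = 10.0    # assumed cache round trip
SOURCE_MS = 200.0  # assumed latency of the production data source

for hit_rate in (0.0, 0.5, 0.9):
    latency = expected_latency(hit_rate, CACHE_MS, SOURCE_MS)
    load = source_request_fraction(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {latency:.1f} ms average, "
          f"{load:.0%} of requests reach the source")
```

Under these assumptions, a cache that almost never hits is slower than no cache at all (210 ms against 200 ms at a 0 percent hit rate), and caching only pays off once the hit rate exceeds the ratio of cache latency to source latency.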
With the caching system enabled, many, if not most, client requests will be fulfilled from cached data, reducing the number of requests going against the production data sources. With high request volumes, this additional benefit supplements the performance gains of the caching system.

Single Cache Instance Implementations

Depending on the topology and nature of the underlying data sources, the caching system may be configured to cache raw table data, materialized views or procedural data.

Caching raw table data is suitable for environments where the performance of a single data source is significantly worse than that of the other data sources, causing the data virtualization middleware to idle while waiting for a response. Caching the table data from slow data sources in the higher-performance caching system improves the performance of the overall solution by removing the additional latency associated with the idling middleware (see Figure 5).

Materialized-view caching is most suitable when numerous clients send identical requests, clogging production systems with requests that elicit identical responses. In such scenarios, the data virtualization middleware executes the first client request against the production systems as usual and then caches the returned result set instead of discarding it, so that subsequent client requests are fulfilled by the cache system instead of the production systems.

Finally, if one or more of the underlying data sources is a web service with long or unpredictable response latency, enabling procedural caching allows the data virtualization middleware to optimize overall performance by caching the result sets returned by the web service, keyed on the parameters passed, thus eliminating potential web-service latency on repeated calls.
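The procedural caching idea can be sketched as a small in-memory cache keyed on call parameters, with a time-to-live standing in for the staleness the application tolerates. This is an illustration under stated assumptions, not the mechanism of any particular data virtualization product: `ProcedureCache`, `slow_quote_service` and the 60-second TTL are all hypothetical, and the arguments are assumed to be hashable.

```python
import time


class ProcedureCache:
    """Caches results of a slow call, keyed on the parameters passed to it."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # the slow call, e.g. a web-service client
        self._ttl = ttl_seconds  # how long a cached result may be served
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, result)

    def call(self, *args, **kwargs):
        # Identical parameters produce identical keys (arguments must be hashable).
        key = (args, tuple(sorted(kwargs.items())))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the source is not contacted
        result = self._fetch(*args, **kwargs)  # miss or expired: pass through
        self._entries[key] = (now + self._ttl, result)
        return result


calls = []  # records every request that actually reaches the "web service"


def slow_quote_service(symbol, currency="USD"):
    """Hypothetical stand-in for a web service with long response latency."""
    calls.append((symbol, currency))
    return {"symbol": symbol, "currency": currency, "price": 100.0}


cache = ProcedureCache(slow_quote_service, ttl_seconds=60)
first = cache.call("ACME", currency="USD")   # reaches the service
second = cache.call("ACME", currency="USD")  # same parameters: served from cache
third = cache.call("ACME", currency="EUR")   # different parameters: new request
print(f"requests that reached the service: {len(calls)}")
```

A materialized-view cache follows the same pattern with the query text as the key. The time-to-live is the simplest refresh policy; an incremental update driven by change data capture would replace expiry with targeted invalidation of the affected entries.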