{"id":42651,"date":"2016-11-26T10:40:33","date_gmt":"2016-11-26T05:10:33","guid":{"rendered":"http:\/\/www.tothenew.com\/blog\/?p=42651"},"modified":"2017-01-26T16:22:48","modified_gmt":"2017-01-26T10:52:48","slug":"caching-what-why-and-how-with-hazelcast","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/caching-what-why-and-how-with-hazelcast\/","title":{"rendered":"Caching: What, Why and How with Hazelcast"},"content":{"rendered":"<p><span style=\"font-weight: 400\"> In modern user facing and <a title=\"Application Development\" href=\"http:\/\/www.tothenew.com\/mobile-application-development-services\" target=\"_blank\">real-time applications<\/a>, performance is the top concern with usually having data at its core. <strong>What if you were able to\u00a0offload some work from the database and at the same time increase the performance and <a title=\"MEAN stack development\" href=\"http:\/\/www.tothenew.com\/java-development-services\" target=\"_blank\">response times of your application<\/a>?<\/strong><\/span><\/p>\n<p><span style=\"font-weight: 400\">Most of the time data handling is done by relational databases which provide access to data. Applications talk directly to the database which at times can have a hot-swappable backup machine. In most cases, the only solution to\u00a0increase the application performance is to scal<\/span><span style=\"font-weight: 400\">e it vertically, i.e, increase more RAM, add more CPU power, etc. and that comes at a cost.<\/span><\/p>\n<p><strong>What is Software Caching?<\/strong><\/p>\n<p>In <a href=\"https:\/\/en.wikipedia.org\/wiki\/Computing\">computing<\/a>, a cache is a hardware or software component that stores data for server\u00a0future data requests faster. 
The data stored in a cache can be the result of an earlier computation, or a duplicate of data stored elsewhere.<\/p>\n<p><strong>Why is it needed?<\/strong><\/p>\n<p><span style=\"font-weight: 400\">We are all\u00a0aware of the cache used by the CPU\/hardware, which helps it process faster. What if this concept could be taken to the application\/software level?\u00a0This is where caching helps.<\/span><\/p>\n<p>Such solutions are already prevalent in the industry in many forms: key-value stores (Redis, etc.), in-memory databases (GridGain, etc.) and many more. In fact, one type of software cache that we all might have used someday is Hibernate\u2019s second-level caching.<\/p>\n<p><strong>How is Caching done?<\/strong><\/p>\n<p><span style=\"font-weight: 400\">Below are a few possibilities for handling such scenarios at an architectural level:<\/span><\/p>\n<p><strong><span style=\"font-weight: 400\">1. You could let the application consult the cache first for data and, if it is a miss, have the same application consult the backing store for the data. A schematic diagram is shown below.<\/span><\/strong><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-42652 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.10.39-PM.png\" alt=\"Cache Hazelcast\" width=\"1272\" height=\"680\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.10.39-PM.png 1272w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.10.39-PM-300x160.png 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.10.39-PM-1024x547.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.10.39-PM-624x333.png 624w\" sizes=\"(max-width: 1272px) 100vw, 1272px\" \/><\/p>\n<p><strong><span style=\"font-weight: 400\">2. 
Or else, you could move that functionality out of the application into a separate application\/process; the application then transparently gets its data from this third component, which is responsible for fetching data from the cache\/database and keeping the cache in sync with the backing store. This particular approach is termed the Read-Through\/Write-Through approach. Refer to the\u00a0diagram below.<\/span><\/strong><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-42653 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.12.26-PM.png\" alt=\"Cache Hazelcast\" width=\"1314\" height=\"528\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.12.26-PM.png 1314w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.12.26-PM-300x120.png 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.12.26-PM-1024x411.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Screen-Shot-2016-11-25-at-5.12.26-PM-624x250.png 624w\" sizes=\"(max-width: 1314px) 100vw, 1314px\" \/><\/p>\n<p>The first approach falls short when the database is saturated, or when the application performs mostly &#8220;put&#8221; operations (writes), because it offloads the database only from the &#8220;get&#8221; (read) load. Even if the application is read-intensive, there can be consistency problems when the\u00a0data changes. 
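<\/p>
<p><span style=\"font-weight: 400\">To make the first approach concrete, here is a minimal cache-aside sketch in Java. The &#8220;database&#8221; is simulated with a plain in-memory map, and all names are illustrative:<\/span><\/p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside sketch: the application consults the cache first and, on a
// miss, loads the value from the backing store and populates the cache itself.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new HashMap<>();

    public CacheAside() {
        database.put("user:1", "Alice"); // seed the simulated backing store
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {              // cache miss
            value = database.get(key);    // fall back to the database
            if (value != null) {
                cache.put(key, value);    // populate the cache for next time
            }
        }
        return value;
    }

    public void put(String key, String value) {
        database.put(key, value);         // the write goes to the store...
        cache.remove(key);                // ...and invalidates the cached entry
    }
}
```

<p><span style=\"font-weight: 400\">Note that a write here invalidates the cached entry instead of updating it, which is the simplest way to avoid serving stale data.<\/span><\/p>
<p>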
This is when concepts like time-to-live (TTL) or write-through come in.<\/p>\n<p><strong><span style=\"font-weight: 400\">Also, in the case of TTL, if the access is less frequent than the TTL value, the result will always be a cache miss.\u00a0<\/span><\/strong><\/p>\n<p><b>The above discussion\/concerns regarding caching can be summarized\u00a0as follows:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Memory size<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Synchronization complexity:<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Consistency between the cached data state and the data source&#8217;s original data<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Maintaining consistency across multiple nodes where data is replicated<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Durability:<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Eviction policy, e.g., LRU, LFU, FIFO<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Eviction percentage<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Expiration. 
E.g., TTL<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><b>Cache Types<\/b><span style=\"font-weight: 400\">:<\/span><\/p>\n<ol>\n<li><span style=\"font-weight: 400\"><strong>Local Cache:<\/strong> It is local to a particular application server\u00a0<\/span>\n<ul>\n<li><strong>Pros:<\/strong>\n<ul>\n<li><span style=\"font-weight: 400\">Simplicity<\/span><\/li>\n<li><span style=\"font-weight: 400\">Performance<\/span><\/li>\n<li><span style=\"font-weight: 400\">No serialisation \/ deserialisation overhead<\/span><\/li>\n<\/ul>\n<\/li>\n<li><strong>Cons:<\/strong>\n<ul>\n<li><span style=\"font-weight: 400\">Not fault-tolerant<\/span><\/li>\n<li><span style=\"font-weight: 400\">Limited scalability<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><span id=\"replicated_cache\" style=\"font-weight: 400\"><strong>Replicated Cache:<\/strong> It replicates all its data to all cluster nodes<\/span><\/li>\n<\/ol>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-43103 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed-cache.jpg\" alt=\"Replicated Cache\" width=\"985\" height=\"595\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed-cache.jpg 985w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed-cache-300x181.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed-cache-624x376.jpg 624w\" sizes=\"(max-width: 985px) 100vw, 985px\" \/><\/p>\n<ul>\n<li><strong>Data Operation:<\/strong><\/li>\n<\/ul>\n<p style=\"padding-left: 30px\"><strong>Get:<\/strong> As the picture shows, each cluster node accesses the data from its own memory, i.e.,\u00a0a local read<\/p>\n<p style=\"padding-left: 30px\"><strong>Put:<\/strong> Pushes the new version of the data to all cluster nodes<\/p>\n<ul>\n<li><strong>Pros:<\/strong>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Best for read operations<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Fault-tolerant<\/span><\/li>\n<\/ul>\n<\/li>\n<li 
style=\"font-weight: 400\"><strong>Cons:<\/strong>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Poor write performance<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Additional network load<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Memory consumption<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span id=\"distributed_cache\" style=\"font-weight: 400\">4) <strong>Distributed Cache:<\/strong> It partitions its data among all cluster nodes. The data is being sent to a primary cluster node and a backup cluster node if the backup count is 1.\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Data Operation:<\/strong>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><strong>Get:<\/strong> Access goes over the network to another cluster node<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><strong>Put:<\/strong> Pushing the version of data to multiple cluster nodes<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><strong>Failover:<\/strong> Involves promoting backup data to be primary storage<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-43105 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed1.jpg\" alt=\"Distributed Cache\" width=\"985\" height=\"595\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed1.jpg 985w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed1-300x181.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributed1-624x376.jpg 624w\" sizes=\"(max-width: 985px) 100vw, 985px\" \/><\/p>\n<ul>\n<li style=\"font-weight: 400\"><strong>Pros:<\/strong>\n<ul>\n<li style=\"font-weight: 400\">As the above picture states\u00a0distributed cache can be configured in a way where data is not fully replicated through out the cluster, thus write performance can be enhanced.<\/li>\n<li 
style=\"font-weight: 400\"><span style=\"font-weight: 400\">Linear performance scalability for reads and writes<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400\"><strong>Cons:<\/strong>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Increased latency of reads (due to network round-trip and serialization \/ deserialization expenses)<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-43108 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributor-3.jpg\" alt=\"distributor 3\" width=\"985\" height=\"595\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributor-3.jpg 985w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributor-3-300x181.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/distributor-3-624x376.jpg 624w\" sizes=\"(max-width: 985px) 100vw, 985px\" \/><\/strong><\/p>\n<p><strong>Fault-tolerant: \u00a0<\/strong>As the above picture states in both replicated and distributed cache on the absence of node 2 data can be served from the other node.<\/p>\n<p><span style=\"font-weight: 400\">We can conclude that an ideal cache would combine TTL and write-through features with distributed cluster mode with data consistency and provide high read-write performance.<\/span><\/p>\n<h3><strong>Hazelcast Caching<\/strong><\/h3>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-42656 size-medium\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/HazelcastLogo-Blue_Dark_1200px-300x59.png\" alt=\"Hazelcast\" width=\"300\" height=\"59\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/HazelcastLogo-Blue_Dark_1200px-300x59.png 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/HazelcastLogo-Blue_Dark_1200px-1024x202.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/HazelcastLogo-Blue_Dark_1200px-624x123.png 624w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/HazelcastLogo-Blue_Dark_1200px.png 1200w\" sizes=\"(max-width: 300px) 100vw, 300px\" 
\/><\/p>\n<p><span style=\"font-weight: 400\">Hazelcast takes a brand new approach to data and is designed around the concept of distribution: it shares data around the cluster for flexibility and performance. It is an in-memory data grid for clustering and highly scalable data distribution.<\/span><\/p>\n<p><span style=\"font-weight: 400\">One of the main features of Hazelcast is that it has no master node. Each node in the cluster is configured to be the same in terms of functionality by sharing the cluster metadata called the <strong>partition table<\/strong>. It contains information such as member details, cluster health, backup information, re-partitioning, etc. The first node created in the cluster manages the cluster members, i.e., it automatically performs the assignment of data to nodes. If the oldest node dies, the second-oldest node will manage the cluster members.<\/span><\/p>\n<p><span style=\"font-weight: 400\">All clients are <\/span><b>smart clients<\/b><span style=\"font-weight: 400\"> by default, i.e., they also hold metadata about the cluster, though with restricted information. So, a client can connect directly to the member holding the primary data to\u00a0reduce network lag.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Another main feature of Hazelcast is that the data is held entirely in-memory. 
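<\/span><\/p>
<p><span style=\"font-weight: 400\">The concerns discussed earlier &#8211; TTL, eviction and backups &#8211; map directly onto Hazelcast&#8217;s declarative map configuration. The fragment below is a sketch following the Hazelcast 3.x XML schema; the map name and values are purely illustrative:<\/span><\/p>

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <map name="sessions">
    <!-- keep one synchronous backup copy of each entry on another node -->
    <backup-count>1</backup-count>
    <!-- entries expire 300 seconds after they were last written -->
    <time-to-live-seconds>300</time-to-live-seconds>
    <!-- evict least-recently-used entries once the map is full -->
    <eviction-policy>LRU</eviction-policy>
    <max-size policy="PER_NODE">10000</max-size>
  </map>
</hazelcast>
```

<p><span style=\"font-weight: 400\">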
In the case of a failure, such as a node crash, no data will be lost, since Hazelcast distributes copies of the data across the nodes of the cluster.<\/span><\/p>\n<p><strong><span style=\"font-weight: 400\">Hopefully, this helps in understanding the importance of caching, in-memory data grids and distributed caching\u00a0in modern applications.<\/span><\/strong><\/p>\n<p>If you are keen to know how to\u00a0integrate Hazelcast with Grails, read through this blog &#8211;\u00a0<a title=\"Getting started with Hazelcast using Grails in 10 minutes\" href=\"http:\/\/www.tothenew.com\/blog\/hazelcast-integration-with-grails\/\" target=\"_blank\">Getting started with Hazelcast using Grails in 10 minutes<\/a><\/p>\n<p><span style=\"font-size: 1rem\">Stay tuned for more on:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><a href=\"http:\/\/www.tothenew.com\/blog\/sharing-the-load-cache-clustering-with-hazelcast\/\">Sharing the Load: Cache Clustering with Hazelcast<\/a><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Hazelcast as a second-level cache<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Hazelcast as a Spring Data cache<\/span><\/li>\n<\/ul>\n<p>I hope you enjoyed this blog and it was helpful. 
Here is another interesting blog on <a title=\"Enhancing throughput using Hazelcast\" href=\"http:\/\/www.tothenew.com\/blog\/hazelcast-object-deserialization-with-distributed-query\/#sthash.WeKvuXpv.dpuf\" target=\"_blank\">Enhancing throughput of Java apps performance by optimization of object deserialization and distributed query<\/a>.<\/p>\n<p>Thanks for reading, see you next time.<\/p>\n<p><span style=\"font-weight: 400\">Here\u2019s a quick reference to &#8211; <\/span><\/p>\n<p><a title=\"Distributed Caching with Hazelcast\" href=\"https:\/\/drive.google.com\/file\/d\/0B4LHpTbNYCypanE5MlJUOTIwaTQ\/view?usp=sharing\" target=\"_blank\">Distributed Caching with Hazelcast<\/a><\/p>\n<p><a title=\"Grails Plugin Contributions by experts @ TO THE NEW\" href=\"http:\/\/www.tothenew.com\/blog\/grails-plugin-contributions-by-experts-to-the-new\/\" target=\"_blank\">Grails Plugin Contributions by experts @ TO THE NEW<\/a><\/p>\n<p><strong>For more reference:<\/strong><\/p>\n<p><a title=\"Clients &amp; Languages Compatibility Matrix\" href=\"https:\/\/hazelcast.org\/clients-languages\/compatibility-matrix\/\">Clients &amp; Languages Compatibility Matrix<\/a><\/p>\n<p><a title=\"Redis 3.0.7 vs Hazelcast 3.6 Benchmark\" href=\"https:\/\/hazelcast.com\/resources\/benchmark-redis-vs-hazelcast\/\">Redis-vs-hazelcast | benchmark<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In modern user facing and real-time applications, performance is the top concern with usually having data at its core. What if you were able to\u00a0offload some work from the database and at the same time increase the performance and response times of your application? 
Most of the time data handling is done by relational databases [&hellip;]<\/p>\n","protected":false},"author":349,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":32},"categories":[446,1],"tags":[4238,118,1137,4235,4154,4242,2659,4237,4236,4243,4241,4239,4240],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/42651"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/349"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=42651"}],"version-history":[{"count":0,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/42651\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=42651"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=42651"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=42651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}