{"id":57000,"date":"2023-03-30T08:44:59","date_gmt":"2023-03-30T03:14:59","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=57000"},"modified":"2023-09-08T16:57:29","modified_gmt":"2023-09-08T11:27:29","slug":"optimal-use-from-snowflake-warehouse-and-tables","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/optimal-use-from-snowflake-warehouse-and-tables\/","title":{"rendered":"Optimal use from Snowflake Warehouse and Tables"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">In the previous blog, we discussed the <a href=\"https:\/\/www.tothenew.com\/blog\/best-practices-and-hacks-for-data-loading-in-snowflake-from-stages\/\">Best practices to be followed while Data loading into Snowflake from Stages<\/a><\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Continuing the Snowflake blog series, let us understand how to use Snowflake Warehouses and Tables optimally.<\/span><\/p>\n<h2><b>Snowflake&#8217;s Virtual Warehouses<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Virtual Warehouses are one of the critical components of the Snowflake architecture, and deciding the correct configuration for them can save a lot of Snowflake credits.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Below are some of the best practices you need to consider while selecting the configuration for a warehouse.<\/span><\/p>\n<h2><b>How to decide which Warehouse size is optimal?<\/b><\/h2>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For simple queries, it&#8217;s preferable to use x-small or small warehouses.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For complex queries, or if large datasets need to be scanned, use larger-sized warehouses.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Query 
performance improves linearly while increasing the warehouse size until optimal performance is reached. After that, there wouldn\u2019t be any significant difference in performance.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">To determine the optimal warehouse for complex queries, it is recommended to test with different warehouse sizes and note the query execution time for each.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">If you are running a single query on the warehouse, it is best to choose a warehouse size on which the query runs for at least a minute, to optimize the cost.<\/span><\/li>\n<\/ul>\n<h2><b>Auto Suspension of warehouse<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Auto Suspend allows warehouses to suspend automatically when they are not in use, hence saving cost. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The Auto Suspend property can be disabled when:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400;\">There are frequent, steady workloads on the Warehouse 24\/7.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">You need the virtual warehouse readily available all the time for faster query results from the cache.\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Note that disabling auto suspend can lead to heavy billing, so choose wisely. <\/span><span style=\"font-weight: 400;\">The general practice is to keep auto suspension enabled so that you only pay for the active time and not for the idle time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By default, the AUTO_SUSPEND time interval is 600 seconds. 
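<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If the default does not suit your workload, the interval can be adjusted per warehouse. A minimal sketch (the warehouse name MY_WH is hypothetical):<\/span><\/p>\n<pre>-- Suspend the warehouse after 120 seconds of inactivity\r\nALTER WAREHOUSE MY_WH SET AUTO_SUSPEND = 120;<\/pre>\n<p><span style=\"font-weight: 400;\">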
<\/span><span style=\"font-weight: 400;\">This may not be optimal if, say, you run the queries once every 10 minutes and the query execution time is 1 minute.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In such cases, it&#8217;s always better to set the AUTO_SUSPEND time interval according to your needs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The time interval for Auto Suspend can be decided based on the factors below:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Time interval between two subsequent queries executed in the warehouse.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Average query execution time.<\/span><\/li>\n<\/ol>\n<h2><b>Cost vs. Performance while setting the AUTO_SUSPEND limit<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Let&#8217;s assume that there is a recurring pattern of executing similar queries every 75 seconds, with an average query execution time of 10 seconds, and the warehouse has been set to AUTO_SUSPEND after 60 seconds.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In such a case, below is what would happen:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 0th \u00a0 sec\u00a0 \u00a0 \u00a0 \u00a0\u2192 Query is initiated and starts the warehouse<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 10th sec\u00a0 \u00a0 \u00a0 \u00a0\u2192 Query is executed successfully<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 70th sec\u00a0 \u00a0 \u00a0 \u00a0\u2192 Warehouse auto suspends<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 75th sec\u00a0 \u00a0 \u00a0 \u00a0\u2192 Query is initiated and starts the warehouse<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 85th sec\u00a0 \u00a0 \u00a0 \u00a0\u2192 Query is executed successfully<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 145th sec\u00a0 \u00a0 \u00a0\u2192 Warehouse auto suspends<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 150th sec\u00a0 \u00a0 \u00a0\u2192 Query is initiated and starts the warehouse<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 160th sec\u00a0 \u00a0 \u00a0\u2192 Query is executed successfully<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 220th sec\u00a0 \u00a0 \u00a0\u2192 Warehouse auto suspends<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And so on\u2026<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you notice, the AUTO_SUSPEND of 60 sec is not benefiting us much when we consider the cost vs. performance factor.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The total uptime of the warehouse in the above case is 210 sec. <\/span><span style=\"font-weight: 400;\">The total uptime, if AUTO_SUSPEND was disabled, would have been 220 sec.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, the benefit of disabling AUTO_SUSPEND in this scenario would have been faster query processing time. <\/span><span style=\"font-weight: 400;\">Every time the warehouse is restarted, the data has to be fetched from the remote disk to the local cache before the query is processed.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">But with AUTO_SUSPEND disabled, since the queries were similar, they would be served from the local disk cache, resulting in faster query performance, maybe a couple of seconds instead of 10 seconds. 
<\/span><span style=\"font-weight: 400;\">And if the same query was reissued and if there was no data change, the output would have been in milliseconds directly from the Result cache.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So always consider the trade-off between saving credits by suspending a warehouse versus maintaining the cache of data from previous queries to help with performance. <\/span><span style=\"font-weight: 400;\">Now that we understand the costs associated with warehouses in Snowflake, let&#8217;s take a look at how data storage in Snowflake affects the overall billing.<\/span><\/p>\n<h2><b>Data Storage costs over Snowflake internal tables<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Data storage costs are frequently overlooked in Snowflake as they are considered inexpensive. However, it is crucial to carefully consider the type of tables to create in Snowflake, taking into account the costs associated with time travel, fail safe, data sharing, and cloning strategies.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This understanding will aid in developing effective approaches for managing internal tables.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you come from an RDBMS background, you may assume that executing &#8220;create table&#8221; in Snowflake will create a normal table. However, this is not the case. Instead, it will create a table with time travel enabled, which can lead to increased costs if such a table is not needed. 
<\/span><span style=\"font-weight: 400;\">Any inserts, updates, and deletes on these tables count toward data storage, and in the case of frequent DML operations, the size of tables with Time Travel data can grow in no time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Hence, if you have not decided which kind of table you need to create, then always use\u00a0<\/span><\/p>\n<blockquote><p><span style=\"font-weight: 400;\">CREATE TRANSIENT TABLE instead of CREATE TABLE<\/span><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">This has to be communicated to developers, as the general habit is always to use CREATE TABLE.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For a normal table that is large in size with high churn, the costs can grow exponentially. <\/span><span style=\"font-weight: 400;\">Note that not all tables need Time Travel features; hence, use CREATE TABLE wisely.<\/span><\/p>\n<p><strong>For example<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">Suppose we have a table that is 200GB in size and receives frequent updates. This table is configured for Time Travel, specifically for 90 days, and it&#8217;s assumed that each record in the table undergoes an update at least 20 times during this period. 
After the 90-day period, the historical data will be moved to Fail-safe storage by Snowflake, where it will be stored for 7 days.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Hence, below are the storage stats for the table:<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-57002 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2023\/03\/table-for-snowflake-blog.png\" alt=\"\" width=\"678\" height=\"109\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2023\/03\/table-for-snowflake-blog.png 678w, \/blog\/wp-ttn-blog\/uploads\/2023\/03\/table-for-snowflake-blog-300x48.png 300w, \/blog\/wp-ttn-blog\/uploads\/2023\/03\/table-for-snowflake-blog-624x100.png 624w\" sizes=\"(max-width: 678px) 100vw, 678px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Although the table is 0.2TB in size, the storage billed is 32.2TB when Time Travel is enabled.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Below is the case if the same table had been a Transient Table with 0 days of Time Travel:<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-57003 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2023\/03\/table-for-snowflake-blog2.png\" alt=\"\" width=\"256\" height=\"124\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Although you can enable Time Travel for up to 90 days, choose the number of days that suits your needs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">E.g., in production, if I know that any data-related issues can be identified and fixed within 7 days, then I&#8217;d choose the Time Travel period as 7 days. 
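<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a sketch (the table name MY_TABLE is hypothetical), the retention period can be set when creating the table or changed later:<\/span><\/p>\n<pre>-- Keep 7 days of Time Travel on an existing table\r\nALTER TABLE MY_TABLE SET DATA_RETENTION_TIME_IN_DAYS = 7;<\/pre>\n<p><span style=\"font-weight: 400;\">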
<\/span><span style=\"font-weight: 400;\">Even in the worst case, if the issue persists for more than 7 days, you can contact Snowflake support and get a copy of the Fail-safe data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you are going with the Transient table approach for critical tables, then the best practice is always to keep a backup at periodic intervals.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Although the backup table also costs the same as the actual table, the total cost of both combined would still be much less than that of the table with Time Travel.\u00a0<\/span><\/p>\n<h2><b>Use Cloning instead of CTAS in Snowflake<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Cloning in Snowflake is a powerful concept and can save a lot of cost when used well.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Use cases would be:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Creating a copy of the table, for example, for debugging.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Creating a backup copy of the existing table.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">When a table is cloned, the micro-partitions are shared between the main table and the cloned table as of the time of cloning.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">CTAS queries, which are used in many databases, would duplicate the data, but in the case of cloning, the underlying data in the form of micro-partitions remains the same, hence saving storage cost.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If any DMLs are performed on the main table or the cloned table after cloning, the new micro-partitions are not shared. <\/span><span style=\"font-weight: 400;\">Hence, the best practice is to clone the table wherever needed and not to use CTAS queries. 
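<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A minimal sketch (table names are hypothetical) contrasting the two approaches:<\/span><\/p>\n<pre>-- Zero-copy clone: shares the existing micro-partitions, no extra storage at clone time\r\nCREATE TABLE ORDERS_BACKUP CLONE ORDERS;\r\n\r\n-- CTAS: physically duplicates the data, doubling the storage\r\nCREATE TABLE ORDERS_COPY AS SELECT * FROM ORDERS;<\/pre>\n<p><span style=\"font-weight: 400;\">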
<\/span><span style=\"font-weight: 400;\">Similarly, cloning can be done at the Database and Schema levels, also saving a lot of cost.<\/span><\/p>\n<h2><b>Use Data Sharing to share data across accounts<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Cloning an object is not possible across accounts, so we tend to opt for replication of objects between accounts; data sharing is often the better option.<\/span><\/p>\n<p><strong>Some use cases here could be:<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">The production database is in Snowflake Account A1 under Org O. <\/span><span style=\"font-weight: 400;\">The dev database is in Snowflake Account A2 under the same Org O.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You need to test the Dev data engineering pipeline with the same source tables as in the Production account. <\/span><span style=\"font-weight: 400;\">Now, since cloning the source tables across accounts is not possible, in such cases, we can opt for data sharing between accounts.<\/span><\/p>\n<p><strong>Let&#8217;s see how this works:<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">In the Production account, let&#8217;s assume that we have a database PAYER_PROD and a schema PAYER_ANALYTICS_MASTER, inside which we have a source table AWSCUR_MAIN that needs to be shared with the Development account.<\/span><\/p>\n<p><strong>Then follow the below steps:<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">In the Production account:<\/span><\/p>\n<pre>-- Use the ACCOUNTADMIN role\r\nuse role ACCOUNTADMIN;\r\n\r\n-- Create a Share object\r\ncreate share AWSCUR_MAIN_NON_PROD;\r\n\r\n-- Grant the necessary privileges to the share\r\ngrant usage on database PAYER_PROD to share AWSCUR_MAIN_NON_PROD;\r\ngrant usage on schema PAYER_PROD.PAYER_ANALYTICS_MASTER to share AWSCUR_MAIN_NON_PROD;\r\ngrant select on table PAYER_PROD.PAYER_ANALYTICS_MASTER.AWSCUR_MAIN to share AWSCUR_MAIN_NON_PROD;\r\n\r\n-- Add the account id of the Development Snowflake account to the share\r\nalter share AWSCUR_MAIN_NON_PROD add accounts=&lt;dev_account_id&gt;;<\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">In the Development account:<\/span><\/p>\n<pre>-- Use the ACCOUNTADMIN role\r\nuse role ACCOUNTADMIN;\r\n\r\n-- Create a database out of the shared object\r\nCREATE DATABASE PAYER_PROD FROM SHARE &lt;&lt;orgid&gt;&gt;.&lt;&lt;accountid&gt;&gt;.\"AWSCUR_MAIN_NON_PROD\";\r\n\r\n-- Grant the database to the respective roles in the Development account\r\nGRANT IMPORTED PRIVILEGES ON DATABASE PAYER_PROD TO ROLE \"&lt;&lt;rolename&gt;&gt;\";<\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Since data is shared between accounts and not replicated, there are no additional storage costs associated with it. Only metadata is shared.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Any updates on Production data will be reflected in the Dev Snowflake database at no additional cost. 
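<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once the shared database is created, the Development account can query the table like any local one, for example:<\/span><\/p>\n<pre>-- Runs on a Dev-side warehouse; only the compute is billed\r\nSELECT COUNT(*) FROM PAYER_PROD.PAYER_ANALYTICS_MASTER.AWSCUR_MAIN;<\/pre>\n<p><span style=\"font-weight: 400;\">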
<\/span><span style=\"font-weight: 400;\">Cost is only for querying the data in the Dev environment, which is per Warehouse usage.<\/span><\/p>\n<div class=\"ap-custom-wrapper\"><\/div><!--ap-custom-wrapper-->","protected":false},"excerpt":{"rendered":"<p>In the previous blog, we discussed the Best practices to be followed while Data loading into Snowflake from Stages. Continuing the snowflake blog series lets us understand how to use Snowflake Warehouse and Tables optimally. Snowflakes Virtual Warehouses Virtual Warehouses is one of the critical components in Snowflake architecture and deciding the correct configurations for [&hellip;]<\/p>\n","protected":false},"author":1552,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":33},"categories":[4685,1395,4308,4682,4831],"tags":[5156],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/57000"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1552"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=57000"}],"version-history":[{"count":5,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/57000\/revisions"}],"predecessor-version":[{"id":58436,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/57000\/revisions\/58436"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=57000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=57000"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=57000"}],"curies":[{"name":"wp
","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}