{"id":36530,"date":"2016-06-29T03:58:21","date_gmt":"2016-06-28T22:28:21","guid":{"rendered":"http:\/\/www.tothenew.com\/blog\/?p=36530"},"modified":"2016-06-29T14:42:11","modified_gmt":"2016-06-29T09:12:11","slug":"mystery-behind-s3-costing","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/mystery-behind-s3-costing\/","title":{"rendered":"Mystery Behind S3 Costing"},"content":{"rendered":"<p>AWS S3 is a <strong>Simple Storage Service<\/strong> provided by Amazon that can store any amount of data, at any time, from anywhere on the web. It is one of the most heavily used <a href=\"http:\/\/www.tothenew.com\/devops-aws\">AWS services<\/a>. It is not just a storage service; it is also used for hosting websites with static content, and it integrates easily with many other AWS services and tools.<\/p>\n<p><span style=\"font-weight: 400;\">When it comes to S3 pricing, three factors determine the total cost of using S3:<\/span><\/p>\n<ol>\n<li>The amount of storage.<\/li>\n<li>The amount of data transferred every month.<\/li>\n<li>The number of requests made monthly.<\/li>\n<\/ol>\n<p><strong> <span style=\"font-weight: 400;\">In most cases, only the amount of storage and the data transferred make a significant difference in cost. 
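As a back-of-the-envelope illustration of how these three factors combine into a monthly bill, here is a small sketch; the rates below are hypothetical placeholders chosen for illustration, not actual AWS prices:

```python
def estimate_s3_cost(storage_gb, transfer_out_gb, requests,
                     storage_rate=0.03,          # hypothetical USD per GB-month stored
                     transfer_rate=0.09,         # hypothetical USD per GB transferred out
                     request_rate=0.005 / 1000): # hypothetical USD per request
    """Rough monthly S3 bill from the three pricing factors."""
    return (storage_gb * storage_rate
            + transfer_out_gb * transfer_rate
            + requests * request_rate)

# e.g. 100 GB stored, 50 GB transferred out, 200,000 requests
print(estimate_s3_cost(100, 50, 200000))
```

With these placeholder rates, storage and transfer dominate the total, which matches the observation above that requests rarely move the bill much.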
However, data transfer between S3 and AWS resources within the same region is free.<\/span><\/strong><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-36675\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/final1.png\" alt=\"final1\" width=\"309\" height=\"140\" \/>\u00a0\u00a0\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0<img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-36999\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/images1.jpg\" alt=\"images\" width=\"196\" height=\"139\" \/><\/p>\n<h2><b>Use-Case<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">While analyzing S3 costs, I have always wondered how they are actually calculated and whether they can be matched against the bill. This blog shows how to analyze S3 bucket logs to find the following details about the buckets hosted in S3:<\/span><\/p>\n<ul>\n<li>Total number of each type of request received by each bucket.<\/li>\n<li>Total storage size of each bucket.<\/li>\n<li>Total data transfer from S3.<\/li>\n<\/ul>\n<h2><strong> Requirements:<\/strong><\/h2>\n<ul>\n<li>S3 bucket logging should be enabled on each bucket.<\/li>\n<li>Python (boto module)<\/li>\n<li>AWS account (Access Key ID, Secret Key ID)<\/li>\n<li>s3cmd<\/li>\n<\/ul>\n<h2><strong> Steps<\/strong><\/h2>\n<p><strong>1. Enable bucket logging on each bucket.<\/strong><\/p>\n<p>Log in to the <strong>AWS console<\/strong> and go to the <strong>S3 service<\/strong>. Select the <strong>bucket<\/strong>, and under <strong>properties<\/strong>, select <strong>logging<\/strong>. Select <strong>enabled<\/strong> and provide a name for the <strong>target bucket<\/strong>. The target bucket stores the logs for each bucket. 
Each bucket has a separate folder where its logs are stored.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36556\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/bucket_logging_enable2.png\" alt=\"bucket_logging_enable2\" width=\"655\" height=\"237\" \/><\/p>\n<p><strong>2. <\/strong><strong>Download and merge the logs<\/strong><\/p>\n<p>Install <strong>s3cmd<\/strong> using the command <strong>apt-get install s3cmd<\/strong>, and configure it by providing the AWS Access Key and AWS Secret Key in the <strong>.s3cfg<\/strong> file created in the user's home directory.<\/p>\n<p>The following command downloads logs from the log bucket:<br \/>\n<strong>s3cmd sync --delete-removed --preserve s3:\/\/bucket.logging.access\/ &lt;Path\/to\/store\/logs&gt; --exclude &quot;*&quot; --include &quot;test-bucket2016-06*&quot;<\/strong><\/p>\n<ul>\n<li><strong>&quot;test-bucket2016-06*&quot;<\/strong> is a pattern that selects logs for the bucket named <strong>&quot;test-bucket&quot;<\/strong> for the month of <strong>June 2016<\/strong>.<\/li>\n<li>Filters like <strong>--include<\/strong> and <strong>--exclude<\/strong> can be used to download specific logs from the logging bucket.<\/li>\n<li><strong>--preserve<\/strong> keeps filesystem attributes (such as timestamps) on the downloaded logs; <strong>sync<\/strong> itself only fetches logs that are new or changed, so previously downloaded logs are not re-transferred, and <strong>--delete-removed<\/strong> removes local copies of logs that no longer exist in the bucket.<\/li>\n<\/ul>\n<p>Concatenate the logs of all buckets into one file so that all further operations can be performed on a single file. 
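The merge step can also be done from Python instead of the shell; a minimal sketch of the concatenation under the assumption that the logs are already downloaded locally (the glob pattern and destination path here are hypothetical):

```python
import glob

def merge_logs(pattern, dest):
    """Concatenate every log file matching `pattern` into `dest`,
    the equivalent of: cat log1 log2 ... > final_log_file"""
    with open(dest, "w") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path) as src:
                out.write(src.read())
```

Sorting the matched paths keeps the merged file in a deterministic order, which makes repeated runs comparable.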
This concatenation can be carried out using the <strong>cat<\/strong> command:<\/p>\n<p><strong>cat \/path\/to\/log1 \/path\/to\/log2 \/path\/to\/log3 &gt; \/path\/to\/final_log_file<\/strong><\/p>\n<p><strong>3. <\/strong><strong>Format of the logs<\/strong><\/p>\n<p>The logs for a <strong>PUT<\/strong> request are shown in this format:<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36693\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/PUT1.png\" alt=\"PUT1\" width=\"1299\" height=\"73\" \/><\/p>\n<p>The logs for a <strong>LIST<\/strong> request are shown in this format:<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36694\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/LIST1.png\" alt=\"LIST1\" width=\"1300\" height=\"74\" \/><strong>**<\/strong>An important point to note is that a <strong>LIST<\/strong> request is logged as <strong>REST.GET.BUCKET<\/strong> in the logs.<\/p>\n<p>The logs for a <strong>COPY<\/strong> request are shown in this format:<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36695\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/COPY1.png\" alt=\"COPY1\" width=\"1299\" height=\"99\" \/><\/p>\n<p>The logs for a <strong>GET<\/strong> request are shown in this format:<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36696\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/GET1.png\" alt=\"GET1\" width=\"1300\" height=\"75\" \/><\/p>\n<p>The logs for a <strong>HEAD<\/strong> request are shown in this format:<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36697\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/HEAD1.png\" alt=\"HEAD1\" width=\"1300\" height=\"76\" \/><\/p>\n<p><strong>4. 
Calculating the number of requests<\/strong><\/p>\n<p>A simple <strong>grep<\/strong> piped to the <strong>wc -l<\/strong> filter is used to count the number of each type of request in the logs. The following commands count each type of request:<\/p>\n<ul>\n<li><strong>PUT<\/strong>: grep REST.PUT &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<li><strong>POST<\/strong>: grep REST.POST &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<li><strong>LIST<\/strong>: grep REST.GET.BUCKET &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<li><strong>COPY<\/strong>: grep REST.COPY &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<li><strong>GET<\/strong>: grep REST.GET &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<li><strong>HEAD<\/strong>: grep REST.HEAD &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<\/ul>\n<p>In my case, counting the <strong>GET<\/strong> and <strong>LIST<\/strong> requests with the following commands <strong>(wrong method)<\/strong>:<\/p>\n<ul>\n<li><strong>GET<\/strong>: grep GET &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<li><strong>LIST<\/strong>: grep LIST &lt;\/path\/to\/final_log_file&gt; | wc -l<\/li>\n<\/ul>\n<p>gave me results that were not even close to the numbers shown in my bill. It is <strong>important<\/strong> to know that, to get the true number of <strong>GET<\/strong> requests, we must subtract the number of <strong>LIST<\/strong> requests from the <strong>GET<\/strong> count obtained above, because a <strong>LIST<\/strong> request is logged as a type of <strong>GET<\/strong> request. 
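This counting logic, including the GET-minus-LIST correction, can be sketched in pure Python over the merged log file; the operation tokens follow the formats shown above, while the sample lines here are synthetic stand-ins for real log entries:

```python
def count_requests(log_lines):
    """Count request types; matching REST.GET.BUCKET before REST.GET
    is equivalent to subtracting the LIST count from the raw GET count."""
    counts = {"PUT": 0, "POST": 0, "LIST": 0, "COPY": 0, "GET": 0, "HEAD": 0}
    for line in log_lines:
        if "REST.PUT" in line:
            counts["PUT"] += 1
        elif "REST.POST" in line:
            counts["POST"] += 1
        elif "REST.GET.BUCKET" in line:   # LIST is logged as REST.GET.BUCKET
            counts["LIST"] += 1
        elif "REST.GET" in line:          # remaining REST.GET lines are true GETs
            counts["GET"] += 1
        elif "REST.COPY" in line:
            counts["COPY"] += 1
        elif "REST.HEAD" in line:
            counts["HEAD"] += 1
    return counts

# Synthetic sample lines (real entries carry many more fields)
sample = [
    "... test-bucket [24/Jun/2016] REST.GET.OBJECT key1 ...",
    "... test-bucket [24/Jun/2016] REST.GET.BUCKET - ...",
    "... test-bucket [24/Jun/2016] REST.PUT.OBJECT key2 ...",
]
print(count_requests(sample))
```

Checking the more specific `REST.GET.BUCKET` token first is what keeps LIST requests out of the GET total.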
A <strong>LIST<\/strong> request is logged as <strong>REST.GET.BUCKET<\/strong>, whereas a <strong>GET<\/strong> request is logged as <strong>REST.GET<\/strong> in the logs.<\/p>\n<p>Following is a snippet of the code I used to calculate the requests on each bucket:<\/p>\n<pre>import boto.s3\r\nimport subprocess\r\n\r\nlog_file='\/home\/mayank\/Desktop\/s3_costing\/s3_logs\/access.log'\r\n\r\nregion_bucks=boto.s3.connect_to_region('ap-southeast-1')\r\nlist_bucket_bucks=region_bucks.get_all_buckets()\r\nfor a_bucks in list_bucket_bucks:\r\n    bucket_name_bucks=a_bucks.name\r\n\r\n    print 'For Bucket : %s ' %(bucket_name_bucks)\r\n\r\n    # Each count: filter the merged log by operation, then by bucket name\r\n    post_bucks=subprocess.Popen(\"grep REST.POST \"+log_file+\" | grep \"+bucket_name_bucks+\" | wc -l\",stdout=subprocess.PIPE,shell=True)\r\n    post_bucks_f,e=post_bucks.communicate()\r\n    print 'Number of POST requests on %s : %s' %(bucket_name_bucks, post_bucks_f)\r\n\r\n    put_bucks=subprocess.Popen(\"grep REST.PUT \"+log_file+\" | grep \"+bucket_name_bucks+\" | wc -l\",stdout=subprocess.PIPE,shell=True)\r\n    put_bucks_f,e=put_bucks.communicate()\r\n    print 'Number of PUT requests on %s : %s' %(bucket_name_bucks, put_bucks_f)\r\n\r\n    list_bucks=subprocess.Popen(\"grep REST.GET.BUCKET \"+log_file+\" | grep \"+bucket_name_bucks+\" | wc -l\",stdout=subprocess.PIPE,shell=True)\r\n    list_bucks_f,e=list_bucks.communicate()\r\n    print 'Number of LIST requests on %s : %s' %(bucket_name_bucks, list_bucks_f)\r\n\r\n    copy_bucks=subprocess.Popen(\"grep REST.COPY \"+log_file+\" | grep \"+bucket_name_bucks+\" | wc -l\",stdout=subprocess.PIPE,shell=True)\r\n    copy_bucks_f,e=copy_bucks.communicate()\r\n    print 'Number of COPY requests on %s : %s' %(bucket_name_bucks, copy_bucks_f)\r\n\r\n    get_bucks=subprocess.Popen(\"grep REST.GET \"+log_file+\" | grep \"+bucket_name_bucks+\" | wc -l\",stdout=subprocess.PIPE,shell=True)\r\n    get_bucks_f,e=get_bucks.communicate()\r\n    # REST.GET also matches REST.GET.BUCKET, so subtract the LIST count\r\n    get_only=float(get_bucks_f) - float(list_bucks_f)\r\n    print 'Number of GET requests on %s : %s' %(bucket_name_bucks, get_only)\r\n\r\n    head_bucks=subprocess.Popen(\"grep REST.HEAD \"+log_file+\" | grep \"+bucket_name_bucks+\" | wc -l\",stdout=subprocess.PIPE,shell=True)\r\n    head_bucks_f,e=head_bucks.communicate()\r\n    print 'Number of HEAD requests on %s : %s' %(bucket_name_bucks, head_bucks_f)<\/pre>\n<p><strong>5. Calculating storage on each bucket<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">The size we calculate for a bucket is not always equal to the size shown in AWS billing, because billing reflects only the storage AWS has charged you for so far in the month. The following formula estimates the billed size as of the present date:<br \/>\n<\/span><b>(Total Size in GB \/ Total number of days in the month) * (Day of the month)<br \/>\n<\/b><span style=\"font-weight: 400;\">In my case the storage was calculated to be <\/span><b>5 GB <\/b><span style=\"font-weight: 400;\">on the <\/span><b>24th<\/b><span style=\"font-weight: 400;\"> of the month, while AWS billing showed a total size of <\/span><b>3.950 GB<\/b><span style=\"font-weight: 400;\">. According to the formula,<br \/>\n<\/span><b>(5\/30) * 24 = 4 GB<\/b><span style=\"font-weight: 400;\">, which is an error of about <\/span><b>1.25%<\/b><span style=\"font-weight: 400;\"> against the billed figure.<\/span><\/p>\n<p>Following is a snippet of the code I used to calculate the current storage size:<\/p>\n<pre>import boto.s3\r\nimport datetime\r\n\r\ndef sizeof_fmt(num):\r\n   for x in ['bytes','KB','MB','GB','TB']:\r\n       if num &lt; 1024.0:\r\n           return \"%3.1f %s\" % (num, x)\r\n       num \/= 1024.0\r\n\r\nregion=boto.s3.connect_to_region('ap-southeast-1')\r\nlist_bucket=region.get_all_buckets()\r\nt_size=0\r\nfor bucket in list_bucket:\r\n        total_bytes = 0\r\n        print bucket.name\r\n        for key in bucket:\r\n                total_bytes += key.size\r\n        t_size=t_size+total_bytes\r\n        print sizeof_fmt(total_bytes)\r\n\r\n# Convert the grand total to GB directly instead of parsing the\r\n# human-readable string\r\ntotal_gb=t_size \/ (1024.0 ** 3)\r\n\r\nday=datetime.datetime.now().day\r\n\r\nprint '\\n'\r\n\r\n# Pro-rate over a 30-day month, as in the formula above\r\nsize=(total_gb \/ 30) * float(day)\r\n\r\nprint 'Chargeable size as on current date : %s' %(size)\r\n\r\nprint '\\n'\r\n\r\n# 0.0295 is the per-GB-month storage rate used for this analysis\r\nprint 'Storage cost as on current date is : %s' %(size * 0.0295)<\/pre>\n<p><strong>6. <\/strong><strong>Calculate data transferred from S3<\/strong><\/p>\n<p>AWS charges for <strong>data transfer<\/strong> outside the region. It can be calculated from the logs, as the size of the data transferred in each request is present in the log. Following is a snippet of a normal HEAD request, with the data transfer size marked in it.<img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-36690\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/06\/HEAD.png\" alt=\"HEAD\" width=\"1300\" height=\"75\" \/><\/p>\n<p>The request shown above initiated a data transfer of <strong>293 bytes<\/strong>. The data size is mentioned in the <strong>15th column<\/strong> of the log. 
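That per-request column can also be summed with a pure-Python sketch; the 15th-field position follows the log line above, and non-numeric sizes (S3 logs a &quot;-&quot; when no bytes are sent) are skipped:

```python
def total_transfer_gb(log_lines):
    """Sum the bytes-sent field (15th whitespace-separated column,
    index 14) across all log lines and convert the total to GB."""
    total = 0
    for line in log_lines:
        fields = line.split()
        # Skip short or malformed lines and "-" placeholders
        if len(fields) >= 15 and fields[14].isdigit():
            total += int(fields[14])
    return total / (1024.0 ** 3)

# Synthetic sample line: 14 filler fields, then a 1 GiB bytes-sent value
sample_line = " ".join(["field%d" % i for i in range(14)]) + " 1073741824"
print(total_transfer_gb([sample_line]))  # -> 1.0
```

Splitting on whitespace is a simplification: real S3 access log lines contain quoted request strings with embedded spaces, so on real data the field index may need adjusting, exactly as with the awk approach.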
The total data size can be calculated using the following code:<\/p>\n<pre>datatransfer=subprocess.Popen(\"awk '{print $15}' \/home\/mayank\/Desktop\/s3_costing\/s3_logs\/access.log | awk '{ sum += $1} END {print sum\/1024\/1024\/1024}'\",stdout=subprocess.PIPE,shell=True)\r\n\r\ndatatransfer_f,e=datatransfer.communicate()\r\n\r\nprint 'Total data transfer is : %s GB' %float(datatransfer_f)<\/pre>\n<p>I hope this blog has given you a proper understanding of how to analyze S3 bucket logs to get a detailed analysis of the buckets hosted in AWS S3.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AWS S3 is a Simple Storage Service provided by Amazon that can store any amount of data, at any time, from anywhere on the web. It is one of the most heavily used AWS services. It is not just used as a storage service, it is also used for hosting websites with static content. It [&hellip;]<\/p>\n","protected":false},"author":916,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":9},"categories":[1174,2348,1],"tags":[2160,1332,3667,1892,3665,3666,3668,3664,3669],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/36530"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/916"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=36530"}],"version-history":[{"count":0,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/36530\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=36530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/catego
ries?post=36530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=36530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}