{"id":45231,"date":"2017-01-24T19:42:13","date_gmt":"2017-01-24T14:12:13","guid":{"rendered":"http:\/\/www.tothenew.com\/blog\/?p=45231"},"modified":"2017-01-25T15:21:41","modified_gmt":"2017-01-25T09:51:41","slug":"introduction-to-amazon-athena","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/introduction-to-amazon-athena\/","title":{"rendered":"Introduction to Amazon Athena"},"content":{"rendered":"<p style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-45233\" src=\"\/blog\/wp-ttn-blog\/uploads\/2017\/01\/aws-athena.png\" alt=\"aws-athena\" width=\"412\" height=\"192\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2017\/01\/aws-athena.png 774w, \/blog\/wp-ttn-blog\/uploads\/2017\/01\/aws-athena-300x139.png 300w, \/blog\/wp-ttn-blog\/uploads\/2017\/01\/aws-athena-624x290.png 624w\" sizes=\"(max-width: 412px) 100vw, 412px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>What is Amazon Athena?<br \/>\n<\/strong>Amazon Athena is an interactive query service that uses standard SQL to analyze data stored on Amazon Simple Storage Service (S3). It is serverless, i.e.\u00a0there is no need to set up instances for the datastore or manage the infrastructure. Also, there is no need to load data into Athena or run complex ETL processes. All you need to do is direct the data from your <a title=\"AWS DevOps\" href=\"http:\/\/www.tothenew.com\/devops-aws\">application to AWS S3<\/a>. 
You can also simply store logs on S3 and analyze\u00a0<\/span><span style=\"color: #000000;\">them with SQL queries.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>What Kind of Data Does Amazon Athena Analyze?<br \/>\n<\/strong><span style=\"font-weight: 400;\">Amazon Athena can analyze and process <\/span><span style=\"font-weight: 400;\">structured data sets in the form of tables, as well as semi-structured and unstructured data. Supported formats include CSV, JSON, and\u00a0<\/span>columnar data formats such as Apache Parquet and Apache <\/span>ORC.\u00a0<span style=\"color: #000000;\">You can also use Amazon Athena to generate reports or to explore data with business intelligence tools or SQL clients connected via a JDBC driver.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong>Back-End Structure of Amazon Athena<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Amazon Athena uses Presto, a distributed SQL engine, to run queries. It uses <span style=\"font-weight: 400;\">Apache Hive for DDL, to define the structure of the data. Apache Hive is a data warehouse tool that creates, drops, and alters tables and partitions in the\u00a0<\/span><span style=\"font-weight: 400;\">data sets.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><span style=\"font-weight: 400;\">Amazon Athena provides its own query editor, which helps developers write queries against the data. Table definitions are written as Hive-compliant DDL <\/span><span style=\"font-weight: 400;\">statements, such as CREATE TABLE<\/span><span style=\"font-weight: 400;\">. Apache Hive facilitates reading, writing, and managing large, distributed data sets. Apache Hive supports various SQL functions, data partitioning, and external tables. 
The Athena metadata store is a repository that holds metadata such as\u00a0column names and table definitions. Athena also supports window functions, complex joins, and nested queries, and uses <\/span><\/span>an approach known as schema-on-read, which allows developers to project their schema onto the data at the time the query is executed.<\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\">Athena uses\u00a0<\/span>compute<span style=\"color: #000000;\"> resources from multiple AZs (Availability Zones) <\/span><span style=\"color: #000000;\">to improve\u00a0<\/span><span style=\"color: #000000;\">query performance. <\/span>Also,\u00a0<span style=\"color: #000000;\">it allows developers\u00a0to run queries\u00a0in parallel on massive data sets, which may be terabytes or petabytes in size.<\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Pricing of Amazon Athena<br \/>\n<\/strong>Amazon Athena is a pay-as-you-go service,\u00a0charged based on the amount of data each query scans. Since\u00a0<\/span><span style=\"color: #000000;\">the data is stored on AWS S3, the charge is $5 per TB of data scanned from Amazon S3. Athena does not charge <\/span><span style=\"color: #000000;\">for failed queries. DDL statements like CREATE, ALTER, and DROP, as well as partitioning queries, are free. If you cancel a query, you are charged only for the data scanned up to that point. You can reduce costs by using columnar formats, compression, and partitions. With these techniques, Athena scans\u00a0<\/span>less<span style=\"color: #000000;\"> data from Amazon S3.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #000000;\"><strong><span style=\"font-weight: 400;\">Calculating the charges for AWS Athena is simple, as they are based on the amount of data that needs to be analyzed. 
Converting the data to columnar formats such as Apache Parquet and Apache ORC, using open-source tools, improves query performance and helps organizations save cost.\u00a0<\/span><\/strong><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Amazon Athena? Amazon Athena is an interactive query service that uses standard SQL to analyze data stored on Amazon Simple Storage Service (S3). It is serverless, i.e.\u00a0there is no need to set up instances for the datastore or manage the infrastructure. Also, there is no need to load data [&hellip;]<\/p>\n","protected":false},"author":969,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":5},"categories":[1174,1395,4308,2348,1],"tags":[248,1552,670],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/45231"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/969"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=45231"}],"version-history":[{"count":0,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/45231\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=45231"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=45231"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=45231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}