Tweaking Logstash’s S3 plugin to create folders in YYYY/MM/DD format on AWS S3

25 May 2016 by Navjot Singh

Logstash is a service that accepts logs from a variety of systems, processes them, and lets us index them in Elasticsearch, where they can be visualised using Kibana.

Our DevOps engineers have been using the Logstash S3 plugin, which simply puts all the data in an S3 bucket location. Since we have configured files to be created on S3 every hour, the number of files in the S3 location crossed a thousand in just one and a half months. We decided to store the files in a YYYY/MM/DD folder structure. For example, files created on 2016-06-16 would go inside s3://location/2016/06/16. The YYYY, MM, and DD folders would get created dynamically.


Scenario: Store data received by Logstash on S3 in a YYYY/MM/DD folder structure, such that each day’s data goes into its respective folder.

The Logstash S3 plugin does not provide a way to pass a date variable into the prefix. To achieve our objective, we decided to tweak the Ruby code of the S3 plugin. You can do that by following the steps below:

  1. Go to your Logstash’s home directory.
  2. Open vendor/bundle/jruby/1.9/gems/logstash-output-s3-1.0.2/lib/logstash/outputs/s3.rb file.
  3. Search for the string “remote_filename”. We need to tweak the line below:
    remote_filename = "#{@prefix}#{File.basename(file)}"
  4. Create a time object, say “t”, above this line and store the desired format, i.e. YYYY/MM/DD, in a variable, say “date_s3”, as below:

    t = Time.new 
    date_s3 = t.strftime("%Y/%m/%d/")
  5. Append this variable to the line specified in step 3, as below:
    remote_filename = "#{@prefix}#{date_s3}#{File.basename(file)}"
  6. The modified code would look like this:

    t = Time.new
    date_s3 = t.strftime("%Y/%m/%d/")
    remote_filename = "#{@prefix}#{date_s3}#{File.basename(file)}"
  7. Save the file, verify the configuration, and restart Logstash.
  8. The S3 plugin configuration in Logstash’s output section would be:
    s3 {
        access_key_id => "xxxxxxxxxxxxxxxxxxxxxxxxx"
        secret_access_key => "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        endpoint_region => "ap-southeast-1"
        bucket => "s3-bucket-name"
        prefix => "directory-inside-s3/"
        time_file => 60
        canned_acl => "private"
    }

    Note: Do not forget the trailing “/” in the prefix parameter; without it, the date folders would be appended directly to the prefix string.

    This configuration would create a date-based directory structure inside “s3://s3-bucket-name/directory-inside-s3/”, as the sketch below illustrates.
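
    To see what the tweaked line produces, here is a small standalone Ruby sketch. The file path below is only a made-up example of the temporary part files the plugin uploads, and “prefix” stands in for the plugin’s @prefix setting:

    # Standalone illustration of how the patched line builds the S3 object key.
    prefix = "directory-inside-s3/"  # value of the prefix parameter above
    file = "/tmp/ls.s3.part0.txt"    # hypothetical local part file created by the plugin

    t = Time.new
    date_s3 = t.strftime("%Y/%m/%d/") # e.g. "2016/06/16/"
    remote_filename = "#{prefix}#{date_s3}#{File.basename(file)}"

    puts remote_filename
    # => directory-inside-s3/2016/06/16/ls.s3.part0.txt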

    And that is it. You can now browse the files easily on S3. Hope this blog was useful to you. I will be coming up with more such interesting use cases.


Comments (3)

  1. bizzy

    This seems to work well, thank you for sharing. One thing though: upon folder creation I get a logstash-programmatic-access-test-object-########## in the folder. It appears that this is a default of s3.rb? Do you know how to get it to delete the object after it successfully tests in the newly created day folder?

    1. Magno

      Hello, I had this problem.

      You need to apply the same approach in the “delete_on_bucket” method.

      The article doesn’t cover that method, but in the source code the S3 permission test first does “write_on_bucket” and then “delete_on_bucket”. If you don’t apply the change in “delete_on_bucket” too, the test file isn’t deleted.

      But I think that problem has since been fixed.

      Thanks for the article =)
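
      For illustration, the same date prefix would need to go into the key that “delete_on_bucket” builds, along these lines (a sketch only, assuming that method receives the local file path as “file” and constructs its remote filename the same way “write_on_bucket” does):

      t = Time.new
      date_s3 = t.strftime("%Y/%m/%d/")
      # Mirror the write_on_bucket tweak, so the test object is deleted
      # under the same dated key it was written to.
      remote_filename = "#{@prefix}#{date_s3}#{File.basename(file)}"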

  2. Ranvijay Jamwal

    Nice blog.
    The parameter endpoint_region => "ap-southeast-1" is getting deprecated.
    One can use region => "regionname" instead.
    Also, if you are using this for the Mumbai region, go to the s3.rb file mentioned in the blog, search for “endpoint_region”, and then add “ap-south-1” to the list of regions to make it work.

