Most of us use some tool to centralize, analyze, and store logs. Logentries is one such tool.
Logentries provides a straightforward way to analyze logs that contain KVPs, i.e. key-value pairs. When KVPs are not present, however, analyzing the logs becomes tedious.
To handle such cases, Logentries supports RegEx named capture groups, which we can use to create KVPs dynamically as required. We will walk through one for the following scenario.
Scenario: Count the occurrences of identical URLs that returned a 404 error in the nginx access log.
Let’s start with the following log entry from my Logentries account. I have changed the URL randomly. All nginx access logs arrive in this format, and the format stays the same unless I change it. We will build our KVP on the basis of this format.
2015-05-27T08:34:01.557805Z web-server1 nginx-access-log - - - hostname=web-server1 appname=nginx-access-log 0.350 18.104.22.168 - - [27/May/2015:14:04:01 +0530] "GET /undefined/uekf=12ed1 HTTP/1.1" 404 14657 "http://www.mydomain.com/?abc=xyz"
Now we are going to create a RegEx named capture group, which is simply a way of creating a KVP dynamically.
The syntax of a RegEx named capture group is as follows.
/some anchor text to find the key location in the log (?P<KEY>regEx_to_find_the_Value)/
Here, whatever “regEx_to_find_the_Value” matches is assigned to “KEY”.
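To see the syntax in action outside Logentries, here is a minimal sketch in Python, whose re module uses the same (?P<KEY>...) notation. The anchor text HTTP/1.1" and the key name STATUS are illustrative choices of mine and not part of the scenario; the log line is the sample entry from above.

```python
import re

# Sample nginx access log entry from the article.
line = (
    '2015-05-27T08:34:01.557805Z web-server1 nginx-access-log - - - '
    'hostname=web-server1 appname=nginx-access-log 0.350 18.104.22.168 - - '
    '[27/May/2015:14:04:01 +0530] "GET /undefined/uekf=12ed1 HTTP/1.1" '
    '404 14657 "http://www.mydomain.com/?abc=xyz"'
)

# Anchor text: HTTP/1.1" followed by a space (illustrative assumption).
# Named group: STATUS, holding one or more digits.
pattern = re.compile(r'HTTP/1\.1" (?P<STATUS>\d+)')

match = pattern.search(line)
print(match.groupdict())  # the dynamic key-value pair as a dict
```

The anchor text only locates the position in the line; just the part matched inside the parentheses is stored under the key.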
“regEx_to_find_the_Value” should match the URL, which in this particular case is “/undefined/uekf=12ed1” but could be any URL.
In the log, the URL is “/undefined/uekf=12ed1”, and the text ‘"GET ’ that appears just before it can serve as the anchor text. Any string can be used as the key; let it be “URL”. Next we need a RegEx for the URLs. URLs can contain word characters, digits, and spaces, so we define the RegEx accordingly.
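So, the regex would be “[\/\w\d\s.]+”.
where “\/” is an escaped “/” (the “\” is the escape character),
\w denotes a word character (a letter, digit, or underscore),
\s denotes a whitespace character,
\d denotes a digit,
. inside square brackets denotes a literal dot (outside brackets it would match any character other than newline).
The [ ] brackets define a character class: any single character listed inside them is allowed.
“+” means the preceding class can occur one or more times.
Characters that are not listed in the class, such as “=” or “?”, end the match, so add them to the class if the full query string should be captured.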
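A quick way to check what this character class actually captures is to try it locally. The sketch below uses Python's re module and assumes the Logentries regex engine treats the class the same way; the helper name extract_url is hypothetical, and the sample line is a shortened copy of the log entry above.

```python
import re

# The article's pattern without the surrounding / delimiters.
URL_PATTERN = re.compile(r'"GET (?P<URL>[\/\w\d\s.]+)')


def extract_url(line):
    """Return the value captured as URL, or None if the anchor text is absent."""
    match = URL_PATTERN.search(line)
    return match.group("URL") if match else None


# Shortened copy of the sample log entry.
sample = '[27/May/2015:14:04:01 +0530] "GET /undefined/uekf=12ed1 HTTP/1.1" 404 14657'
captured = extract_url(sample)
print(captured)  # the class has no "=", so the match ends just before it

# Inside square brackets the dot is literal: it matches "." but not "a".
dot_is_literal = (
    re.fullmatch(r"[.]", ".") is not None
    and re.fullmatch(r"[.]", "a") is None
)
print(dot_is_literal)
```

On the sample entry the captured value stops at the “=” sign. If the whole request path is wanted, one possible alternative (my assumption, not the article's expression) is a class such as [^\s"]+ that accepts everything up to the next space or quote.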
So far, we have built the following RegEx named capture group.
/"GET (?P<URL>[\/\w\d\s.]+)/
Our dynamic KVP is now ready.
As per our scenario, we want all URLs that returned a 404 error, so we first search for 404 and then apply the RegEx named capture group to the result. We then group identical URLs and count the occurrences of each. So the final expression is as follows.
404 AND /"GET (?P<URL>[\/\w\d\s.]+)/ groupby(URL) calculate(count)
Here we are using the “AND” operator and the groupby and calculate functions provided by Logentries.
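For readers who want to reproduce the same idea on a local log file, the query can be approximated in Python. This is only a rough sketch of the logic, not how Logentries implements it: the function name count_404_urls and the sample lines are hypothetical, and the 404 keyword search is replaced by a plain substring test.

```python
import re
from collections import Counter

# The article's named capture group without the surrounding / delimiters.
URL_PATTERN = re.compile(r'"GET (?P<URL>[\/\w\d\s.]+)')


def count_404_urls(lines):
    """Approximate: 404 AND /regex/ groupby(URL) calculate(count)."""
    counts = Counter()
    for line in lines:
        # Rough stand-in for the "404" keyword search: a substring test.
        if "404" not in line:
            continue
        match = URL_PATTERN.search(line)
        if match:
            # groupby(URL) + calculate(count): tally each captured value.
            counts[match.group("URL")] += 1
    return counts


# Hypothetical log lines, shortened to the request part.
sample_lines = [
    '"GET /missing/page HTTP/1.1" 404 512',
    '"GET /missing/page HTTP/1.1" 404 512',
    '"GET /index.html HTTP/1.1" 200 1024',
    '"GET /old/link HTTP/1.1" 404 300',
]

counts = count_404_urls(sample_lines)
for url, hits in counts.most_common():
    print(hits, url)
```

The keys of the result are whatever the capture group matched, so any change to the character class changes how the lines are grouped.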
Below is the graphical view I get when I search with the above expression in Logentries.
Team AWS, TO THE NEW Digital.