Extracting information from log file


#1

Hi All,

I am a beginner in Linux trying to extract some customer information from a big log file and display it in a dashboard. The log file is in the format below.

log file name : logfile_date

[2018-06-02 09:06:22,407] hostname 1527944791770 AS:UP:ZS492266 http-nio-8080-exec-24 ERROR Get Price Failed! ZS492266 , AS , UP (AbstractLocalPricingServiceV1)
[2018-06-02 09:06:22,407] hostname 1527944791770 AS:UP:ZS492266 http-nio-8080-exec-24 ERROR Exception while search for products (CartService)
com.xx.service.pricing.xx.exception.InvalidCartRequestException: Error: Failed to calculate US Tax. Error Number: '1' Error Description: 'No record found in StateDetail for state AE' | Params: Address={
  "line1" : "Unit 8400",
  "line2" : "Box 41",
  "city" : "DPO",
  "stateCode" : "AE",
  "postalCode" : "09498",
  "countryCode" : "US"
} taxType=Sales

What I need to get is Date, Time, program [AS], var [UP], customerID [ZS492266], line1, line2, city, stateCode, postalCode, countryCode.

This is a huge log file with many such records. How can this be achieved? Any advice is appreciated.


#2

Hi,

Welcome to LinuxConfig.org forums.

Looking at the log, perhaps the best approach would be to use the sed and/or awk commands in combination with the jq command to extract the embedded JSON data into CSV.

Given the information you have provided, I assume that each customer record in your log file consists of 10 lines. This is important, as we need to create separate objects to differentiate customer records.

Below you can find one possible solution, but some prior tweaking and data cleaning may be required, as the information you have given lists only a single customer record. It would be great to have more data to see what the next record for another customer looks like. In any case, this should give you a good start.

In order to execute the script below you will first need to install jq ( a lightweight and flexible command-line JSON processor ). On Debian or Ubuntu execute:

$ sudo apt install jq

I have extrapolated your data to include a dummy record for a second customer. The sample logfile.txt looks like this:

[2018-06-02 09:06:22,407] hostname 1527944791770 AS:UP:ZS492266 http-nio-8080-exec-24 ERROR Get Price Failed! ZS492266 , AS , UP (AbstractLocalPricingServiceV1)
[2018-06-02 09:06:22,407] hostname 1527944791770 AS:UP:ZS492266 http-nio-8080-exec-24 ERROR Exception while search for products (CartService)
com.xx.service.pricing.xx.exception.InvalidCartRequestException: Error: Failed to calculate US Tax. Error Number: '1' Error Description: 'No record found in StateDetail for state AE' | Params: Address={
  "line1" : "Unit 8400",
  "line2" : "Box 41",
  "city" : "DPO",
  "stateCode" : "AE",
  "postalCode" : "09498",
  "countryCode" : "US"
} taxType=Sales
[2018-06-02 09:06:22,407] hostname 1527944791772 AS:UP:ZS492267 http-nio-8080-exec-24 ERROR Get Price Failed! ZS492266 , AS , UP (AbstractLocalPricingServiceV1)
[2018-06-02 09:06:22,407] hostname 1527944791772 AS:UP:ZS492267 http-nio-8080-exec-24 ERROR Exception while search for products (CartService)
com.xx.service.pricing.xx.exception.InvalidCartRequestException: Error: Failed to calculate US Tax. Error Number: '1' Error Description: 'No record found in StateDetail for state AE' | Params: Address={
  "line1" : "Unit 6400",
  "line2" : "Box 46",
  "city" : "DPO",
  "stateCode" : "TX",
  "postalCode" : "07498",
  "countryCode" : "US"
} taxType=Sales

Now create a script named e.g. extract_log.sh with the following content:

#!/bin/bash
# Usage: ./extract_log.sh logfile.txt
# Reads the log file on file descriptor 5, ten lines ( one customer record ) at a time.

exec 5< "$1"

while read -r line1 <&5 ; do
        read -r line2 <&5
        read -r line3 <&5
        read -r line4 <&5
        read -r line5 <&5
        read -r line6 <&5
        read -r line7 <&5
        read -r line8 <&5
        read -r line9 <&5
        read -r line10 <&5

        # First log line: extract date, time, program, var and customer ID as CSV
        echo "$line1" | sed -e 's/\[/ /' -e 's/\]/ /' -e 's/\,...//g' -e 's/:/,/3' -e 's/:/,/3' -e 's/ \+/,/g' | cut -d , -f2,3,6,7,8 | tr '\n' ','
        # Lines 4-10: reassemble the JSON block and extract the address fields with jq
        echo "$line4" "$line5" "$line6" "$line7" "$line8" "$line9" "$line10" | sed 's/^/\{ /' | sed 's/\}.*$/\}/' | jq '[.line1, .line2, .city, .stateCode, .postalCode, .countryCode] | join(", ")' | sed -e 's/\"//g' -e 's/, /,/g'
done

exec 5<&-

Next make the script executable:

$ chmod +x extract_log.sh

Once ready, execute the script, passing the log file as an argument:

$ ./extract_log.sh logfile.txt
2018-06-02,09:06:22,AS,UP,ZS492266,Unit 8400,Box 41,DPO,AE,09498,US
2018-06-02,09:06:22,AS,UP,ZS492267,Unit 6400,Box 46,DPO,TX,07498,US

A little bit about the script: using the while loop, the script iterates over the entire file, reading 10 lines at a time.

Using the first echo command we print the first line, clean it up with sed into CSV form, and extract the needed fields from it using the cut command.
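To see what this first pipeline does in isolation, you can run it on a single sample log line:

```shell
# Clean up one log line and pull out date, time, program, var and customer ID
line='[2018-06-02 09:06:22,407] hostname 1527944791770 AS:UP:ZS492266 http-nio-8080-exec-24 ERROR Get Price Failed! ZS492266 , AS , UP (AbstractLocalPricingServiceV1)'
echo "$line" | sed -e 's/\[/ /' -e 's/\]/ /' -e 's/\,...//g' -e 's/:/,/3' -e 's/:/,/3' -e 's/ \+/,/g' | cut -d , -f2,3,6,7,8
```

This prints 2018-06-02,09:06:22,AS,UP,ZS492266 — the sed chain strips the brackets and milliseconds, turns the 3rd and 4th colons ( the ones in AS:UP:ZS492266 ) into commas, and collapses runs of spaces into commas, so cut can select the wanted fields.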

Using the second echo command we print the rest of the lines and assemble a valid JSON data block. Next, using the jq command, we extract the JSON data into a CSV ( comma-separated values ) string, which gets appended to the previous output.
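For reference, here is the jq step in isolation. Note that jq's -r ( raw output ) flag prints the joined string without the surrounding quotes, so the trailing sed quote-stripping in the script could also be simplified away:

```shell
# The reassembled JSON block, as the second echo + sed produce it
json='{ "line1" : "Unit 8400", "line2" : "Box 41", "city" : "DPO", "stateCode" : "AE", "postalCode" : "09498", "countryCode" : "US" }'
# -r prints the raw string, so no quote-stripping sed is needed afterwards
echo "$json" | jq -r '[.line1, .line2, .city, .stateCode, .postalCode, .countryCode] | join(",")'
```

This prints Unit 8400,Box 41,DPO,AE,09498,US.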

The script is far from perfect and I'm sure it can be greatly improved for efficiency. As is normally the case with GNU/Linux, there is no single answer to a single question.
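For what it's worth, the same extraction could also be sketched as a single awk pass, which avoids spawning sed, cut and jq for every record. This is only a sketch assuming the fixed field positions from your sample and that the address values themselves never contain a colon:

```shell
awk '
/Get Price Failed/ {
    date = substr($1, 2)            # strip the leading "["
    time = substr($2, 1, 8)         # strip the ",407]" milliseconds
    split($5, id, ":")              # AS:UP:ZS492266 -> program, var, customer ID
    rec = date "," time "," id[1] "," id[2] "," id[3]
}
/^ *"/ {                            # a JSON field line, e.g.  "line1" : "Unit 8400",
    val = $0
    sub(/^[^:]*: *"/, "", val)      # drop the key, separator and opening quote
    sub(/",? *$/, "", val)          # drop the closing quote and trailing comma
    rec = rec "," val
}
/^\} taxType/ { print rec }         # end of the record: print the CSV line
' logfile.txt
```

On the sample logfile.txt above this produces the same two CSV lines as the bash script.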

Hope this helps

Lubos


#3

Thanks Lubos, I will work on the idea as suggested.