Text logs in CSV format
Panther parses CSV files by converting each row into a simple JSON object that maps keys to values. To do that, each column must be given a name.
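Conceptually, this row-to-object conversion can be sketched in a few lines of Python. This is only an illustration of the idea, not Panther's implementation, and the column names here are made up:

```python
import csv
import io
import json

# Hypothetical column names; in Panther these come from the schema or the header row.
columns = ["year", "month", "day"]

raw = "2020,09,01\n"
for row in csv.reader(io.StringIO(raw)):
    # Pair each column name with the value at the same index to build the event object.
    event = dict(zip(columns, row))
    print(json.dumps(event))  # {"year": "2020", "month": "09", "day": "01"}
```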

CSV logs without header

To parse CSV logs without a header row, Panther needs to know which names to assign to each column.
Let's assume our logs are CSV with 7 columns: year, month, day, time, action, ip_address, message. Some example rows of this file could be:
```
2020,09,01,10:35:23,SEND,192.168.1.3,"PING"
2020,09,01,10:35:25,RECV,192.168.1.3,"PONG"
2020,09,01,10:35:25,RESTART,-,"System restarts"
```
We would use the following LogSchema to define the log type:
```yaml
version: 0
parser:
  csv:
    # CSV files come in many flavors; you can choose the delimiter character used to split each row
    delimiter: ","
    # Names in the 'columns' array will be mapped to columns in each row.
    # If you want to skip a column, set the name at the same index to an empty string ("")
    columns:
      - year
      - month
      - day
      - time
      - action
      - ip_address
      - message
    # CSV files sometimes use placeholder values for missing or N/A data.
    # You can define such values with 'emptyValues' and they will be ignored.
    emptyValues: ["-"]
    # The 'expandFields' directive renders a template string, injecting generated fields into the key/value pairs
    expandFields:
      # Since the timestamp is split across multiple columns, we need to re-assemble it into RFC3339 format.
      # The following adds a 'timestamp' field by substituting the CSV column values
      timestamp: '%{year}-%{month}-%{day}T%{time}Z'
fields:
  - name: timestamp
    type: timestamp
    timeFormat: rfc3339
    isEventTime: true
    required: true
  - name: action
    type: string
    required: true
  - name: ip_address
    type: string
    indicators: [ip]
  - name: message
    type: string
```
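The steps this schema describes (split each row on the delimiter, map the names in `columns` to values, drop any `emptyValues` placeholder, and expand the `timestamp` template) can be modeled with a short Python sketch. This mimics the behavior for illustration only; it is not Panther's actual parser:

```python
import csv
import io
import re

COLUMNS = ["year", "month", "day", "time", "action", "ip_address", "message"]
EMPTY_VALUES = {"-"}
TIMESTAMP_TEMPLATE = "%{year}-%{month}-%{day}T%{time}Z"

def parse_line(line: str) -> dict:
    row = next(csv.reader(io.StringIO(line)))
    # Map column names to values, dropping placeholder 'emptyValues'.
    event = {k: v for k, v in zip(COLUMNS, row) if v not in EMPTY_VALUES}
    # expandFields: replace each %{field} reference with the field's value.
    event["timestamp"] = re.sub(
        r"%\{(\w+)\}", lambda m: event.get(m.group(1), ""), TIMESTAMP_TEMPLATE
    )
    return event

print(parse_line('2020,09,01,10:35:25,RESTART,-,"System restarts"'))
# 'ip_address' is omitted (its value was "-") and 'timestamp' is '2020-09-01T10:35:25Z'
```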

CSV logs with header

Avoid combining such schemas with others; use a separate source or S3 prefix instead.
To parse CSV logs that start with a header row, Panther offers two options:
  • Use the names defined in the header as the names for the JSON fields, or
  • Skip the header and define the names the same way we did for headerless CSV files.
To use the names in the header, the parser configuration should be:
```yaml
parser:
  csv:
    delimiter: ","
    # Setting 'hasHeader' to true without specifying a 'columns' field
    # tells Panther to take the column names from the values in the header.
    hasHeader: true
    # In case you want to rename a column, you can use the 'expandFields' directive.
    expandFields:
      # Let's assume the header contains '$cost' as a column name and you want to normalize it as 'cost_us_dollars'
      "cost_us_dollars": '%{$cost}'
```
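The effect of reading field names from the header, plus the `expandFields` rename, can be sketched with Python's `csv.DictReader`. Again this only models the behavior; the `item` and `$cost` columns are made-up example data:

```python
import csv
import io

# A made-up two-column CSV whose header contains '$cost' (illustrative data only).
raw = "item,$cost\nwidget,9.99\n"

# DictReader uses the header row as the keys, like 'hasHeader: true' with no 'columns'.
for event in csv.DictReader(io.StringIO(raw)):
    # Equivalent of the 'expandFields' entry: inject 'cost_us_dollars' from '$cost'.
    event["cost_us_dollars"] = event["$cost"]
    print(event)
```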
To ignore the header and define your own set of names for the columns, use:
```yaml
parser:
  csv:
    delimiter: ","
    # Setting 'hasHeader' to true while also specifying a 'columns' field
    # tells Panther to ignore the header and use the names in the 'columns' array.
    hasHeader: true
    columns:
      - foo
      - bar
      - baz
```