Text logs with fastmatch
The fastmatch parser uses simple string patterns that specify the position of fields within a log line. As the name suggests, it is very fast and should be the preferred method for parsing text logs. It can handle most structured text logs where the order of fields is known. Because it is so fast, you can specify multiple patterns, which are tested in order, to handle logs whose line structure has a few variations.
We will use the following example log line, which is in Apache Common Log format:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
To parse this log type using fastmatch, we would define the following log schema:
version: 0
parser:
  fastmatch:
    # Define an array of patterns to match against.
    # In this example we only use one pattern because the log format is the same for all lines.
    # If we wanted to include the Apache Extended Log format, we could provide an additional pattern.
    match:
      - '%{remote_ip} %{identity} %{user} [%{timestamp}] "%{method} %{request_uri} %{protocol}" %{status} %{bytes_sent}'
    emptyValues: [ '-' ] # specify that `-` string values are considered null
fields:
  - name: remote_ip
    type: string
    indicators:
      - ip
  - name: identity
    type: string
  - name: user
    type: string
  - name: timestamp
    type: timestamp
    isEventTime: true
    timeFormat: '%d/%b/%Y:%H:%M:%S %z'
  - name: method
    type: string
  - name: request_uri
    type: string
  - name: protocol
    type: string
  - name: status
    type: int
  - name: bytes_sent
    type: bigint
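
The comments in the schema note that an additional pattern could be added to cover a second format. As a rough sketch of what that might look like, the fragment below adds a pattern for lines that end with two extra quoted values, as in the Apache Combined Log format. The field names referer and user_agent are assumptions chosen for illustration, and the longer pattern is listed first on the assumption that the first pattern to match is the one used.

```yaml
parser:
  fastmatch:
    match:
      # Tried first: lines with two extra quoted values at the end.
      - '%{remote_ip} %{identity} %{user} [%{timestamp}] "%{method} %{request_uri} %{protocol}" %{status} %{bytes_sent} "%{referer}" "%{user_agent}"'
      # Tried second: the Common Log format pattern from the schema above.
      - '%{remote_ip} %{identity} %{user} [%{timestamp}] "%{method} %{request_uri} %{protocol}" %{status} %{bytes_sent}'
    emptyValues: [ '-' ]
fields:
  # The nine fields from the schema above stay unchanged and are omitted here.
  # Hypothetical extra fields for the longer pattern:
  - name: referer
    type: string
  - name: user_agent
    type: string
```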

Understanding fastmatch patterns

The patterns use %{field_name} placeholders to mark where in the log line a field is expected. For example, to match the text:
2020-10-10T14:32:05 [[email protected]] [DEBUG] "" Something when wrong
We can use this pattern (surrounded by single quotes for clarity):
'%{timestamp} [%{service}@%{ip}] [%{log_level}] %{message}'

Delimiters

The text between two consecutive fields defines the 'delimiter' between them.
Delimiters cannot be empty.
In the example above, we cannot omit the "@" between service and ip in the pattern.
The field preceding a delimiter cannot contain the delimiter text. In the example above:
    timestamp cannot contain space " "
    service cannot contain "@"
    ip cannot contain "] ["
    log_level cannot contain "] "
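
As an illustration of the rules above, the fragment below contrasts a pattern with an empty delimiter against the working pattern from this section. It is only a sketch; the rejected pattern is left commented out.

```yaml
parser:
  fastmatch:
    match:
      # Not allowed: no delimiter text between service and ip.
      # - '%{timestamp} [%{service}%{ip}] [%{log_level}] %{message}'
      # Allowed: "@" separates service from ip.
      - '%{timestamp} [%{service}@%{ip}] [%{log_level}] %{message}'
```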

Anonymous fields

Field placeholders without names (%{}) are ignored.
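
For instance, if the identity value in the Apache example is never needed, its placeholder could be left unnamed. This is a sketch based on the schema shown earlier; dropping identity from the fields list is an assumption about how an ignored value would be handled.

```yaml
parser:
  fastmatch:
    match:
      # %{} occupies the position of the identity value; per the text above, it is ignored.
      - '%{remote_ip} %{} %{user} [%{timestamp}] "%{method} %{request_uri} %{protocol}" %{status} %{bytes_sent}'
    emptyValues: [ '-' ]
# Assumption: the fields list is the same as before, minus the identity entry.
```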

Tail capture

If the last field in a pattern does not have any delimiter text after it, it will capture everything until the end of the text. In the example above, message will capture "Something when wrong".
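
Putting the example pattern into a schema could look like the sketch below. The overall layout follows the Apache schema at the top of this page; the field types and the timeFormat value are assumptions made for illustration.

```yaml
version: 0
parser:
  fastmatch:
    match:
      # message has no delimiter text after it, so it captures the rest of the line.
      - '%{timestamp} [%{service}@%{ip}] [%{log_level}] %{message}'
fields:
  - name: timestamp
    type: timestamp
    isEventTime: true
    # Assumed strftime-style layout for values such as 2020-10-10T14:32:05.
    timeFormat: '%Y-%m-%dT%H:%M:%S'
  - name: service
    type: string
  - name: ip
    type: string
    indicators:
      - ip
  - name: log_level
    type: string
  - name: message
    type: string
```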

Handling quotes

In some cases fields can be quoted within the text:
2020-10-10T14:32:05 "Some quoted text with \"escaped quotes\" inside"
To properly unescape such fields, surround the field placeholder with quotes:
%{timestamp} "%{message}"
This works for both single and double quotes.
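
When such a pattern is placed in a YAML schema, the YAML string quoting has to coexist with the quotes inside the pattern. A minimal sketch, assuming the same match key as in the schema at the top of this page:

```yaml
parser:
  fastmatch:
    match:
      # Double quotes in the pattern: wrap the YAML string in single quotes.
      - '%{timestamp} "%{message}"'
      # Single quotes in the pattern: wrap the YAML string in double quotes.
      - "%{timestamp} '%{message}'"
```

Alternating the quote styles this way avoids having to escape quotes inside the YAML string itself.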