To add support for a new log type we need to:

1. Define the schema for the new log type
2. Provide a parser to parse a single log item
3. Build and validate a log type entry
4. Export the log type entry so Panther can use it
Log types are identified by a name of the form `Service.EventName`. The `Service` prefix groups log types based on the service that produced them (e.g. `NGINX`, `Juniper`, `Syslog`). All log types for a `Service` are grouped under a Go module in the parsers module. For example, all `AWS.*` logs live under the `awslogs` module at `internal/log_analysis/log_processor/parsers/awslogs`.
Each log type is described in its own `.go` file inside its module. For example, `AWS.CloudTrail` is defined in the `cloudtrail.go` file at `internal/log_analysis/log_processor/parsers/awslogs/cloudtrail.go`. All `AWS.*` log types exist in the `awslogs` module. That module exports a `LogTypes() logtypes.Group` function to declare all log types it contains. Panther knows to include all exported log types in modules under `internal/log_analysis/log_processor/parsers/` at runtime. A new module typically has a layout like the one sketched below.
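As a sketch, a new `foologs` module (the example used in the rest of this guide) could be laid out as follows. The file names here are illustrative; only the `testdata` YAML file and the test file are referenced later in this guide.

```
internal/log_analysis/log_processor/parsers/foologs/
├── foologs.go              # schema, parser and the LogTypes() export
├── foologs_test.go         # runs the tests in testdata/
└── testdata/
    └── foologs_tests.yml   # YAML test cases
```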
As an example we will use `Foo.Event`, a log event produced by `FooService` that describes the result of a user request. An example `Foo.Event` log line is:

```json
{"time": "2020-08-06T15:42:23.415Z","message": "failed to serve a request","remote_ip": "127.0.0.1","referrer_url": "https://example.com","request_id": "123456789","user": {"uid": 123456,"groups": ["admin","ssh"]},"duration_s": 0.15}
```
NOTE: The following steps describe the full process of defining a parser in order to explain the concepts. If you want, you can skip to the "TL;DR Give me the code" section below.
We will start by writing a test, both to better understand what we want to achieve and to be sure our code works as intended. Our goal is to parse a JSON string of a `Foo.Event` and produce a Panther log event that will be stored in our storage backend, processed by the rules engine, and queried in the security data lake.
Panther provides the `logtesting` package with helpers for testing log parsers. This package allows running tests declared in YAML files using the `logtesting.RunTestsFromYAML` function. These tests verify that the parser produces the expected log event(s) for a log entry, ensuring we haven't missed any fields or indicator values along the way.
We write a test for `Foo.Event` in a YAML file at `foologs/testdata/foologs_tests.yml`:

```yaml
name: sample_foo_event
logType: Foo.Event
input: |
  {"time": "2020-08-06T15:42:23.415Z","message": "failed to serve a request","remote_ip": "127.0.0.1","referrer_url": "https://example.com","request_id": "123456789","user": {"uid": 123456,"groups": ["admin","ssh"]},"duration_s": 0.15}
result: |
  {"time": "2020-08-06T15:42:23.415Z","message": "failed to serve a request","remote_ip": "127.0.0.1","referrer_url": "https://example.com","request_id": "123456789","user": {"uid": 123456,"groups": ["admin","ssh"]},"duration_s": 0.15,"p_event_time": "2020-08-06T15:42:23.415Z","p_any_ip_addresses": ["127.0.0.1"],"p_any_domain_names": ["example.com"],"p_any_trace_ids": ["123456789"],"p_log_type": "Foo.Event"}
```
Note that Panther adds some fields to the output event. We call these fields 'panther fields' and they are prefixed with `p_` to avoid name collisions. Also note that the `result` JSON includes all the panther fields, so we can verify that the log processing is correct. For testing purposes, `p_parse_time` and `p_row_id` are omitted since they would vary on each run of the test; the helper only verifies that these fields are non-empty and of valid format in the parsed result.
Then, to run the test, we write a simple Go test at `foologs/foologs_test.go`. The test reads the tests from the YAML file and runs them.

```go
package foologs_test

import (
	"testing"

	"github.com/panther-labs/panther/internal/log_analysis/log_processor/logtypes/logtesting"
	"github.com/panther-labs/panther/internal/log_analysis/log_processor/parsers/foologs"
)

func TestFooEvent(t *testing.T) {
	logtesting.RunTestsFromYAML(t, foologs.LogTypes(), "./testdata/foologs_tests.yml")
}
```
The test should fail at this point since we haven't defined any log types yet.
First order of business is to define the schema for the `Foo.Event` log type. Panther uses Go structs to define the schema of a log type.
We can represent the schema of a `Foo.Event` as a Go struct like this:

```go
package foologs

import "github.com/panther-labs/panther/internal/log_analysis/log_processor/pantherlog"

type FooEvent struct {
	Time        pantherlog.Time    `json:"time" validate:"required" tcodec:"rfc3339" event_time:"true" description:"The foo event time"`
	Message     pantherlog.String  `json:"message" validate:"required" description:"The foo event message"`
	RemoteIP    pantherlog.String  `json:"remote_ip" panther:"ip" description:"The remote ip used for the request"`
	ReferrerURL pantherlog.String  `json:"referrer_url" panther:"url" description:"The referrer URL used for the request"`
	RequestID   pantherlog.String  `json:"request_id" panther:"trace_id" description:"The id of the request that generated this event"`
	User        *User              `json:"user" description:"The user that made the request"`
	Duration    pantherlog.Float64 `json:"duration_s" description:"The number of seconds it took to serve the request"`
}

type User struct {
	UID    pantherlog.Int64 `json:"uid" description:"The id of the user that made the request"`
	Groups []string         `json:"groups" description:"The groups the user is a member of"`
}
```
Fields in a log event struct should use the types defined in the `pantherlog` package. These types handle `null` values and missing JSON fields by omitting them in the output. Empty strings (`""`) and zero numeric values are never omitted, in order to preserve as much as possible of the original log event.

Panther uses struct tags to specify field attributes. Every field must have a `description` tag documenting its contents; these documentation strings are used to generate user documentation from the code. The `json` tag must use the exact field name used in the logs. Panther automatically adds `omitempty` to all fields.
`pantherlog.String`

String values can be tagged with `panther:"SCANNER"` to define indicator fields. Note that a scanner can produce multiple indicator fields from a single value, or a different indicator field depending on the value. Panther defines the following scanners:
| Scanner | Description |
| --- | --- |
| `ip` | Adds a `p_any_ip_addresses` indicator |
| `domain` | Adds a `p_any_domain_names` indicator |
| `url` | Adds a `p_any_domain_names` or `p_any_ip_addresses` indicator, depending on the URL host |
| `hostname` | Adds a `p_any_domain_names` or `p_any_ip_addresses` indicator, depending on the value |
| `username` | Adds a `p_any_usernames` indicator |
| `email` | Adds a `p_any_emails` indicator |
| `trace_id` | Adds a `p_any_trace_ids` indicator |
| `md5` | Adds a `p_any_md5_hashes` indicator |
| `sha1` | Adds a `p_any_sha1_hashes` indicator |
| `sha256` | Adds a `p_any_sha256_hashes` indicator |
| `aws_arn` | Scans an ARN string and adds any `p_any_aws_arns`, `p_any_aws_instance_ids` and `p_any_aws_account_ids` indicators found |
| `aws_instance_id` | Adds a `p_any_aws_instance_ids` indicator |
| `aws_account_id` | Adds a `p_any_aws_account_ids` indicator |
| `aws_tag` | Adds a `p_any_aws_tags` indicator |
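For instance, assuming the `hostname` scanner listed above, a single tagged field can yield different indicator fields depending on its value (a hypothetical field, shown only to illustrate the behavior):

```go
// Hypothetical field: the hostname scanner emits p_any_ip_addresses when the
// value is an IP address and p_any_domain_names when it is a domain name.
Host pantherlog.String `json:"host" panther:"hostname" description:"The host that served the request"`
```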
`pantherlog.Time`

Timestamps use a separate Go type to allow easier querying of logs using date-time ranges. By default, timestamps use the RFC3339 format. To specify a different format for a timestamp, use the `tcodec` tag.
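For example, a timestamp stored as UNIX milliseconds could be declared like this (a sketch; the field is hypothetical and assumes the `tcodec` registry includes a `unix_ms` codec):

```go
// Hypothetical field parsed as a UNIX-milliseconds timestamp instead of RFC3339.
CreatedAt pantherlog.Time `json:"created_at" tcodec:"unix_ms" description:"Creation time in UNIX milliseconds"`
```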
All numeric values can be parsed from either a JSON number or a JSON numerical string (i.e. both `42` and `"42"` are valid).
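As a sketch building on the `FooEvent` struct above, both of the following inputs produce the same `Duration` value (this relies on the `pantherlog` field types, which the package wires into jsoniter):

```go
// Both a JSON number and a JSON numerical string parse into pantherlog.Float64.
var a, b FooEvent
if err := jsoniter.UnmarshalFromString(`{"duration_s": 0.15}`, &a); err != nil {
	panic(err)
}
if err := jsoniter.UnmarshalFromString(`{"duration_s": "0.15"}`, &b); err != nil {
	panic(err)
}
// a.Duration and b.Duration now hold the same value.
```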
`pantherlog.Float64`, `pantherlog.Float32`, `pantherlog.Int64`, `pantherlog.Int32`, `pantherlog.Int16`, `pantherlog.Int8`

If you are certain about the range of values an integer can take, you can use one of the smaller sizes accordingly. This will limit the storage requirements for the columns. If you are unsure about the range limits, use `pantherlog.Int64`.
`pantherlog.Uint64`, `pantherlog.Uint32`, `pantherlog.Uint16`, `pantherlog.Uint8`

If you are certain about the range of values an unsigned integer can take, you can use one of the smaller sizes accordingly. A usual example is a port number, which fits in a `pantherlog.Uint16`. This will limit the storage requirements for the columns. If you are unsure about the range limits, use `pantherlog.Uint64`.
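The port-number case mentioned above would look like this (a hypothetical field, shown only to illustrate the sizing choice):

```go
// Ports range from 0 to 65535, so Uint16 is sufficient and keeps the column small.
Port pantherlog.Uint16 `json:"port" description:"The port used for the request"`
```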
`pantherlog.Bool`

Boolean values handle the `null` case by omitting the field when encoding to JSON.
If our log has nested objects, we define a separate struct for each object. Fields that hold an object value should use a pointer, so the nested object can be safely omitted from the output if it is `null` or missing in the log input. For arrays, we use a simple Go slice, which will be omitted from the output if it is empty.
Now that we have defined the schema of `Foo.Event`, we need to provide a way for Panther to parse log input into a Panther log event. To achieve this we need to provide a type implementing `parsers.Interface`:

```go
type Interface interface {
	ParseLog(input string) (results []*pantherlog.Result, err error)
}
```
The parser takes a log item (in most cases a line of text) and tries to parse it into one or more log events. If it fails to parse the line, it should return a `nil` slice and an error.
For our example the parser would be:

```go
package foologs

import (
	jsoniter "github.com/json-iterator/go"

	"github.com/panther-labs/panther/internal/log_analysis/log_processor/pantherlog"
	"github.com/panther-labs/panther/internal/log_analysis/log_processor/parsers"
)

type FooParser struct {
	pantherlog.ResultBuilder
}

func (p *FooParser) ParseLog(input string) ([]*pantherlog.Result, error) {
	event := FooEvent{}
	// We unmarshal the string using jsoniter.
	if err := jsoniter.UnmarshalFromString(input, &event); err != nil {
		return nil, err
	}
	// We validate the struct using the `validate` struct tags.
	if err := parsers.ValidateStruct(&event); err != nil {
		return nil, err
	}

	result, err := p.BuildResult("Foo.Event", &event)
	if err != nil {
		return nil, err
	}

	return []*pantherlog.Result{result}, nil
}
```
Parser code follows this general pattern:

1. Parse the input text into a struct
2. Validate the parsed output to ensure the input is valid for this log type
3. Package the log event(s) into `pantherlog.Result` values so the log processor can store them
Fields marked with a `panther` struct tag are only processed when the `Result` is encoded to JSON. This is deliberate, in order to support both JSON and text-based log types. In the log processor pipeline this happens in the final stage, when the result is written to a buffer that will be uploaded to S3.
Attention: This means that `EventTime` will only be set on the `Result` when it is encoded to JSON.
If a log event requires special logic to produce the event timestamp, it can implement the `pantherlog.EventTimer` interface:

```go
type EventTimer interface {
	PantherEventTime() time.Time
}
```
The `time.Time` returned by `EventTimer` instances takes precedence over event timestamps defined with struct tags.
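A minimal sketch of the shape of such an implementation (the fallback logic here is hypothetical; a real parser would compute the timestamp from the event's own fields):

```go
// PantherEventTime implements pantherlog.EventTimer.
// Hypothetical example: fall back to the current time when the log line
// does not carry a usable event timestamp of its own.
func (e *FooEvent) PantherEventTime() time.Time {
	return time.Now().UTC()
}
```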
Now that we have our parser and schema ready, we need to map the `Foo.Event` log type to them. Panther keeps track of supported log types using a registry of log types. To describe a log type entry we use `logtypes.Config`, a struct including all the information required to build a `logtypes.Entry` for any log type. The configuration for `Foo.Event` would look like this:
```go
package foologs

var configFooEvent = logtypes.Config{
	Name:         "Foo.Event",
	Description:  "Foo log event",
	ReferenceURL: "https://example.com/logs/foo-event", // used in generated documentation
	Schema:       FooEvent{},
	NewParser: parsers.FactoryFunc(func(_ interface{}) (parsers.Interface, error) {
		return &FooParser{}, nil
	}),
}
```
To validate the configuration and build a `logtypes.Entry`, we can use the config's `BuildEntry() (logtypes.Entry, error)` method or the equivalent package-level helper:

```go
logtypes.Build(config logtypes.Builder) (logtypes.Entry, error)
```
We prefer to use the panicking helper `logtypes.MustBuild`, so any invalid configuration surfaces as soon as the package is initialized.
```go
package foologs

var logTypeFooEvent = logtypes.MustBuild(configFooEvent)
```
To export all `logtypes.Entry` values declared in a package, group them together in a `logtypes.Group`, which provides a read-only view of the `logtypes.Entry` collection. Note that each group has a name describing the purpose of the grouping; we choose "Foo" as the name of our group.

```go
package foologs

var exportedFooLogTypes = logtypes.Must("Foo", logTypeFooEvent)
```
Finally, we need to export the log types using a `LogTypes() logtypes.Group` function so Panther knows to register and use them at runtime.

```go
package foologs

func LogTypes() logtypes.Group {
	return exportedFooLogTypes
}
```
Putting it all together and avoiding intermediate variables, we have:

```go
package foologs

import (
	jsoniter "github.com/json-iterator/go"

	"github.com/panther-labs/panther/internal/log_analysis/log_processor/logtypes"
	"github.com/panther-labs/panther/internal/log_analysis/log_processor/pantherlog"
	"github.com/panther-labs/panther/internal/log_analysis/log_processor/parsers"
)

func LogTypes() logtypes.Group {
	return exportedFooLogTypes
}

var exportedFooLogTypes = logtypes.Must("Foo", logtypes.Config{
	Name:         "Foo.Event",
	Description:  "Foo log event",
	ReferenceURL: "https://example.com/logs/foo-event", // used in generated documentation
	Schema:       FooEvent{},
	NewParser: parsers.FactoryFunc(func(_ interface{}) (parsers.Interface, error) {
		return &FooParser{}, nil
	}),
})

type FooEvent struct {
	Time        pantherlog.Time    `json:"time" validate:"required" tcodec:"rfc3339" event_time:"true" description:"The foo event time"`
	Message     pantherlog.String  `json:"message" validate:"required" description:"The foo event message"`
	RemoteIP    pantherlog.String  `json:"remote_ip" panther:"ip" description:"The remote ip used for the request"`
	ReferrerURL pantherlog.String  `json:"referrer_url" panther:"url" description:"The referrer URL used for the request"`
	RequestID   pantherlog.String  `json:"request_id" panther:"trace_id" description:"The id of the request that generated this event"`
	User        *User              `json:"user" description:"The user that made the request"`
	Duration    pantherlog.Float64 `json:"duration_s" description:"The number of seconds it took to serve the request"`
}

type User struct {
	UID    pantherlog.Int64 `json:"uid" description:"The id of the user that made the request"`
	Groups []string         `json:"groups" description:"The groups the user is a member of"`
}

type FooParser struct {
	pantherlog.ResultBuilder
}

func (p *FooParser) ParseLog(input string) ([]*pantherlog.Result, error) {
	event := FooEvent{}
	// We unmarshal the string using jsoniter.
	if err := jsoniter.UnmarshalFromString(input, &event); err != nil {
		return nil, err
	}
	// We validate the struct using the `validate` struct tags.
	if err := parsers.ValidateStruct(&event); err != nil {
		return nil, err
	}

	result, err := p.BuildResult("Foo.Event", &event)
	if err != nil {
		return nil, err
	}

	return []*pantherlog.Result{result}, nil
}
```
That is still a lot of boilerplate for the simple case of parsing a JSON line containing a single log entry. The code above is an example that can be adapted to cover any case, regardless of whether the logs are in JSON or whether we want to produce multiple events from a single log entry. For the common case of a JSON log entry containing a single log event, Panther provides `logtypes.ConfigJSON`, which automates the definition of the parser. In practice, the code required to define such a log type is much shorter:
```go
package foologs

import (
	"github.com/panther-labs/panther/internal/log_analysis/log_processor/logtypes"
	"github.com/panther-labs/panther/internal/log_analysis/log_processor/pantherlog"
)

// FooEvent describes the schema of a Foo.Event JSON log entry.
type FooEvent struct {
	// We use the `tcodec` struct tag to define the time format for the timestamp.
	// We use the `event_time` struct tag to mark a timestamp as the partitioning timestamp.
	// We use the `validate:"required"` struct tag to mark a field as required.
	// This allows Panther to distinguish log entries of different log types from each other.
	Time pantherlog.Time `json:"time" validate:"required" tcodec:"rfc3339" event_time:"true" description:"The foo event time"`
	// We use field types from the `pantherlog` package to handle JSON null values
	// without the need for pointers and with minimum allocations.
	Message pantherlog.String `json:"message" validate:"required" description:"The foo event message"`
	// We use `panther` struct tags to mark strings that should be used as indicator values.
	RemoteIP    pantherlog.String `json:"remote_ip" panther:"ip" description:"The remote ip used for the request"`
	ReferrerURL pantherlog.String `json:"referrer_url" panther:"url" description:"The referrer URL used for the request"`
	RequestID   pantherlog.String `json:"request_id" panther:"trace_id" description:"The id of the request that generated this event"`
	// We use a pointer to a struct (*User) to handle the case of a missing or null `user` field.
	User     *User              `json:"user" description:"The user that made the request"`
	Duration pantherlog.Float64 `json:"duration_s" description:"The number of seconds it took to serve the request"`
}

type User struct {
	UID    pantherlog.Int64 `json:"uid" description:"The id of the user that made the request"`
	Groups []string         `json:"groups" description:"The groups the user is a member of"`
}

// LogTypes exports a group of all log type entries defined in this module.
// Panther scans for this function signature and includes these log types at runtime.
func LogTypes() logtypes.Group {
	return exportedFooLogTypes
}

// exportedFooLogTypes is a group of all log type entries exported by this module.
// By defining a variable at package level we ensure that we only build the entry once during init().
// By using `logtypes.Must` we make sure all our log type entries are valid as soon as the package loads.
var exportedFooLogTypes = logtypes.Must("Foo", logtypes.ConfigJSON{
	Name:         "Foo.Event",
	Description:  "Foo log event",
	ReferenceURL: "https://example.com/logs/foo-event", // used in generated documentation
	NewEvent: func() interface{} {
		return &FooEvent{}
	},
})
```
By now our test should be passing, verifying that Panther can properly handle `Foo.Event` logs.
- Ensure your code is formatted: run `mage fmt`.
- Ensure all tests pass: run `mage test:ci`.
- Be sure to check in the documentation that will be automatically generated, and update the SUMMARY.md if you added a new family of log types (enterprise-only).
- Deploy Panther and add a Source that uses the log types you defined. You should be able to see a new table for your added parser in the Glue Data Catalog.
- Do an end-to-end test. You can use s3queue to copy test files into the `panther-bootstrap-auditlogs-<id>` bucket to drive log processing, or use the development tool `./out/bin/devtools/<os>/<arch>/logprocessor` to read files from the local file system.
- Write a test rule for the new type to ensure data is flowing.