Runbooks

Resource names below refer to resources in the CloudFormation templates in Panther.

Each resource entry describes its function and the impact of its failure.

panther-alert-delivery-api

This lambda dispatches alerts to their specified outputs (destinations).

Failure Impact

  • Failure of this lambda will impact delivery of alerts.

  • Failed events will go into the panther-alerts-queue-dlq. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.

panther-alert-forwarder

This lambda reads from a DDB stream for the panther-alert-forwarder table and writes alerts to the panther-log-alert-info ddb table. It also forwards alerts to panther-alerts-queue SQS queue where the appropriate Lambda picks them up for delivery.

Failure Impact

  • Delivery of alerts could be slowed or stopped.

  • There will be no data loss until events are purged from the ddb stream (24 hours).

  • This Lambda processes alerts in batches. In case a batch partially fails, the whole batch will be retried, which might lead to duplicate notifications for some alerts.
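The duplicate notifications mentioned above follow directly from retrying a whole batch. A minimal sketch of the effect, using an in-memory batch and a hypothetical sender that fails once (none of these names come from Panther):

```python
from typing import Callable, List, Set


def process_batch(alerts: List[str], send: Callable[[str], None], max_attempts: int = 3) -> None:
    """Send every alert in the batch; on any error, retry the whole batch."""
    for _ in range(max_attempts):
        try:
            for alert_id in alerts:
                send(alert_id)
            return
        except RuntimeError:
            continue  # partial failure: the entire batch is retried from the start
    raise RuntimeError("batch failed after retries")


def make_flaky_sender(fail_once_on: str, delivered: List[str]) -> Callable[[str], None]:
    """Hypothetical sender that fails the first time it sees `fail_once_on`."""
    failed: Set[str] = set()

    def send(alert_id: str) -> None:
        if alert_id == fail_once_on and alert_id not in failed:
            failed.add(alert_id)
            raise RuntimeError("transient failure")
        delivered.append(alert_id)

    return send


delivered: List[str] = []
process_batch(["a1", "a2", "a3"], make_flaky_sender("a2", delivered))
print(delivered)  # a1 was sent before the failure and again on the retry
```

Alert a1 is delivered twice because it had already succeeded when a2 failed. Destinations that cannot tolerate duplicates would need to deduplicate on an alert identifier.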

panther-alert-forwarder

The panther-alert-forwarder ddb table is updated conditionally when new policies have a violation or a time limit has been exceeded.

Failure Impact

  • Processing of alerts could be slowed or stopped if there are errors/throttles.
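As an illustration of the conditional update described above, the sketch below uses an in-memory dict as a stand-in for the table. The key format, the timestamp field and the time limit value are assumptions made for the example, not Panther's actual schema or DynamoDB call.

```python
from typing import Dict


def conditional_update(table: Dict[str, float], key: str, now: float, time_limit: float) -> bool:
    """Write `now` for `key` only if the key is new or its last update is
    older than `time_limit` seconds. Returns True when a write happened."""
    last_updated = table.get(key)
    if last_updated is None or now - last_updated > time_limit:
        table[key] = now
        return True
    return False  # condition not met: the existing record is left untouched


table: Dict[str, float] = {}
print(conditional_update(table, "policy-1:resource-1", now=0, time_limit=3600))     # new violation
print(conditional_update(table, "policy-1:resource-1", now=10, time_limit=3600))    # within the limit
print(conditional_update(table, "policy-1:resource-1", now=4000, time_limit=3600))  # limit exceeded
```

Only successful writes change the table, so a downstream reader of the table's changes would see one event per new violation or expired time limit rather than one per evaluation.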

panther-alert-processor

This lambda reads events from the panther-alert-processor-queue generated by the panther-policy-engine lambda. It updates the panther-alert-forwarder ddb table (which enables deduplication) and may trigger remediation by calling the panther-remediation-api.

Failure Impact

  • Failure of this lambda will impact alerts generated for policy violations.

  • Failed events will go into the panther-alert-processor-queue-dlq. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.

panther-alert-processor-queue

This sqs queue contains events that may become alerts. The panther-alert-processor lambda reads from this queue to determine if the event requires action or is a duplicate.

Failure Impact

  • Failure of this sqs queue will impact delivery of alerts to output destinations.

  • Failed events will go into the panther-alert-processor-queue-dlq. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.

panther-alert-processor-queue-dlq

This is the dead letter queue for the panther-alert-processor-queue. Items are in this queue due to a failure of the panther-alert-processor lambda. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.
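Each dead letter queue in this runbook is paired with one source queue, and the naming is not fully regular (some names end in -queue-dlq, others in -dlq or -dlq.fifo). A small lookup table, built only from the queue names listed in this document, can help avoid re-queueing to the wrong target. The function name is hypothetical, and the sketch does not invoke the Panther requeue tool itself.

```python
# DLQ name -> source queue name, as listed in this runbook.
DLQ_TO_SOURCE = {
    "panther-alert-processor-queue-dlq": "panther-alert-processor-queue",
    "panther-alerts-queue-dlq": "panther-alerts-queue",
    "panther-aws-events-queue-dlq": "panther-aws-events-queue",
    "panther-datacatalog-compactor-dlq.fifo": "panther-datacatalog-compactor-queue.fifo",
    "panther-datacatalog-compactor-reaper-dlq": "panther-datacatalog-compactor-reaper-queue",
    "panther-datacatalog-updater-dlq": "panther-datacatalog-updater-queue",
    "panther-input-data-notifications-queue-dlq": "panther-input-data-notifications-queue",
    "panther-layer-manager-queue-dlq": "panther-layer-manager-queue",
    "panther-remediation-queue-dlq": "panther-remediation-queue",
    "panther-resources-queue-dlq": "panther-resources-queue",
    "panther-rules-engine-queue-dlq": "panther-rules-engine-queue",
    "panther-snapshot-queue-dlq": "panther-snapshot-queue",
}


def requeue_target(dlq_name: str) -> str:
    """Return the queue that items from `dlq_name` should be re-queued to."""
    try:
        return DLQ_TO_SOURCE[dlq_name]
    except KeyError:
        raise ValueError(f"unknown dead letter queue: {dlq_name}") from None


print(requeue_target("panther-alert-processor-queue-dlq"))
```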

panther-alerts-api

Lambda for CRUD actions for the alerts API.

Failure Impact

  • Failure of this lambda will impact the Panther user interface.

panther-alerts-queue

This sqs queue holds alerts to be delivered to user configured destinations.

Failure Impact

  • Failure of this sqs queue will impact delivery of alerts to output destinations.

  • Failed events will go into the panther-alerts-queue-dlq. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.

panther-alerts-queue-dlq

This is the dead letter queue for the panther-alerts-queue. Items are in this queue due to a failure of the panther-alert-delivery-api lambda. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.

panther-analysis

This ddb table holds the policies applied by the panther-rules-engine lambda and managed by the panther-analysis-api.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

  • The Panther user interface could be impacted.

panther-analysis-api

This lambda implements the analysis API which is responsible for creating, updating, and deleting policies/rules.

Failure Impact

  • Failure of this lambda will prevent policies/rules from being created, updated, or deleted. Additionally, policies and rules will stop being evaluated by the policy/rules engines.

panther-analysis-packs

This ddb table holds the detection packs and is managed by the panther-analysis-api.

Failure Impact

  • Updates or rollbacks to enabled detection packs could be delayed

panther-athena-admin-api

The panther-athena-admin-api lambda is used to create tables in Athena for scheduled queries, and is also used by the panther-datacatalog-compactor lambda to generate Parquet data.

If there are persistent Athena timeouts or other limit-related errors, request a limit increase: https://docs.aws.amazon.com/athena/latest/ug/service-limits.html

Failure Impact

  • Failure of this lambda will stop the Panther UI from creating scheduled query tables.

  • Failure of this lambda will interfere with Parquet compaction.

panther-athena-api

The panther-athena-api lambda is used by GraphQL to query Athena.

If there are persistent Athena timeouts or other limit-related errors, request a limit increase: https://docs.aws.amazon.com/athena/latest/ug/service-limits.html

Failure Impact

  • Failure of this lambda will stop the Panther UI from doing Athena queries.

panther-auditlog-processing

The panther-auditlog-processing topic is used to send s3 notifications to log processing for log sources internal to the Panther account.

panther-aws-event-processor

This lambda reads events from the panther-aws-events-queue sqs queue and determines if the infrastructure referenced in the event has changed. If so, it writes events to the panther-snapshot-queue to be scanned for the nature of the changes.

Failure Impact

  • Failure of this lambda will impact continuous infrastructure monitoring.

  • Failed events will go into the panther-aws-events-queue-dlq. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.

panther-aws-events-queue

This sqs queue receives CloudTrail events delivered by CloudWatch events as well as S3 notifications from log processing. The panther-aws-event-processor lambda processes these events.

Failure Impact

  • Failure of this sqs queue will impact the continuous scanning of accounts.

  • Failed events will go into the panther-aws-events-queue-dlq. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.

panther-aws-events-queue-dlq

This is the dead letter queue for the panther-aws-events-queue. Items are in this queue due to a failure of the panther-aws-event-processor lambda. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.

panther-aws-remediation

The panther-aws-remediation lambda executes automated infrastructure remediations.

Failure Impact

  • Failure of this lambda will mean specific remediations are failing and infrastructure will remain in violation of policy.

panther-boostrap-input-data-notifications

This topic triggers the log analysis flow for data integrations configured internally by Panther, e.g. data delivered by Amazon EventBridge.

panther-cfn-custom-resources

Used by CloudFormation when deploying or updating Panther.

Failure Impact

  • Panther itself will not be affected, but deployments may be failing

panther-cloudsecurity-datalake-forwarder

The panther-cloudsecurity-datalake-forwarder lambda reads from the ddb stream for the panther-resources and panther-compliance tables, summarizes changes, and forwards them to the log analysis data puller bucket.

Failure Impact

  • Failure of this lambda will stop delivery of cloud security snapshots to the datalake.

  • There will be no data loss until events are purged from the ddb stream (24 hours).
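The 24-hour stream retention sets the recovery deadline. A minimal sketch of the arithmetic, assuming the timestamp of the oldest unprocessed stream record is known (how that timestamp is obtained is outside this sketch, and the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Retention of the ddb stream, as stated in this runbook.
STREAM_RETENTION = timedelta(hours=24)


def time_left_before_loss(oldest_unprocessed: datetime, now: datetime) -> timedelta:
    """Time remaining until the oldest unprocessed record is purged from the stream.

    Returns a zero timedelta if the record has already passed the retention window.
    """
    remaining = oldest_unprocessed + STREAM_RETENTION - now
    return max(remaining, timedelta(0))


outage_start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
now = outage_start + timedelta(hours=20)
print(time_left_before_loss(outage_start, now))  # 4 hours remain in this example
```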

panther-compliance

This ddb table holds policy violation events for associated resources in the panther-resources ddb table.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

panther-compliance-api

This lambda implements the compliance API which is responsible for tracking resource and policy pass/fail states.

Failure Impact

  • The UI experiences errors on nearly every page for cloud security related data.

  • Alerts for cloud security stop.

  • Policy failures are no longer recorded.

panther-cw-alarms

CloudWatch alarms are configured to notify this topic

Failure Impact

  • CloudWatch alarm notifications will not be delivered to subscribers

panther-database-workflow

The panther-database-workflow is a Step Functions state machine used to execute queries and notify callers when done.

Failure Impact

  • Failure of this state machine will stop compaction and scheduled queries.

panther-datacatalog-compactor

This SINGLETON lambda polls the Glue catalog every hour, converting partitions to Parquet.

Failure Impact

  • Parquet conversion will stop

panther-datacatalog-compactor-callbacks

This lambda runs the same code as the panther-datacatalog-compactor lambda but is NOT a singleton. This allows asynchronous callbacks from CTAS Step functions to complete.

Failure Impact

  • Parquet conversion will stop

panther-datacatalog-compactor-dlq.fifo

This is the dead letter queue for the panther-datacatalog-compactor-queue.fifo. Items are in this queue due to a failure of the panther-datacatalog-compactor lambda. When the system has recovered they should be re-queued to the panther-datacatalog-compactor-queue.fifo using the Panther tool requeue.

panther-datacatalog-compactor-queue.fifo

This FIFO queue contains Glue partition data notifications. This is used by the panther-datacatalog-compactor lambda to detect if there is late data (new JSON data after a partition has been converted to Parquet) in order to trigger re-processing of the partition.

Failure Impact

  • Late data processing will stop
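The late data check described above can be pictured as a timestamp comparison per partition. The sketch below is an assumption about the shape of that check, with hypothetical partition names and inputs, and is not the compactor's actual code.

```python
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def partitions_to_reprocess(
    converted_at: Dict[str, datetime],
    notifications: Iterable[Tuple[str, datetime]],
) -> Set[str]:
    """Return partitions that received new JSON data after their Parquet conversion."""
    late: Set[str] = set()
    for partition, arrived_at in notifications:
        done = converted_at.get(partition)
        # Only partitions that were already converted can have late data.
        if done is not None and arrived_at > done:
            late.add(partition)
    return late


converted = {"p1": datetime(2024, 1, 1, 2)}
notes = [
    ("p1", datetime(2024, 1, 1, 1)),  # arrived before conversion: not late
    ("p1", datetime(2024, 1, 1, 3)),  # arrived after conversion: late
    ("p2", datetime(2024, 1, 1, 3)),  # partition not converted yet: not late
]
print(partitions_to_reprocess(converted, notes))
```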

panther-datacatalog-compactor-reaper

This lambda deletes parquet data orphaned due to job failures or late data.

Failure Impact

  • Deleting orphaned parquet data will stop and useless data will accumulate.

panther-datacatalog-compactor-reaper-dlq

This is the dead letter queue for the panther-datacatalog-compactor-reaper-queue. Items are in this queue due to a failure of the panther-datacatalog-compactor-reaper lambda. When the system has recovered they should be re-queued to the panther-datacatalog-compactor-reaper-queue using the Panther tool requeue.

panther-datacatalog-compactor-reaper-queue

This queues requests to clean up orphaned compacted data.

Failure Impact

  • Orphaned compacted data will not be removed and, if the problem persists, the s3 costs will rise.

panther-datacatalog-updater

This lambda reads events from the panther-datacatalog-updater-queue generated by the panther-rules-engine and panther-log-processor lambdas. It adds new partitions to the Glue tables in panther* Glue databases.

Failure Impact

The tables in panther* Glue databases will not be updated with new partitions. This will result in:

  • Users will not be able to search the latest log data

  • Users will not be able to see new events that matched some rule.

panther-datacatalog-updater-dlq

This is the dead letter queue for the panther-datacatalog-updater-queue. Items are in this queue due to a failure of the panther-datacatalog-updater lambda. When the system has recovered they should be re-queued to the panther-datacatalog-updater-queue using the Panther tool requeue.

panther-datacatalog-updater-queue

This queue contains notifications that include information of new log data.

Failure Impact

The tables in panther* Glue databases will not be updated with new partitions. This will result in:

  • Users will not be able to search the latest log data

  • Users will not be able to see new events that matched some rule.

panther-graphql-api

This is the GraphQL endpoint for the Panther UI.

Failure Impact

  • The Panther user interface will show errors.

panther-input-data-notifications-queue

This sqs queue receives S3 notifications of log files to be processed by panther-log-processor after routing through the panther-log-router lambda.

Failure Impact

  • Failure of this sqs queue will stop log analysis (log ingest and rule processing).

  • Failed events will go into the panther-input-data-notifications-queue-dlq. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.

panther-input-data-notifications-queue-dlq

This is the dead letter queue for the panther-input-data-notifications-queue. Items are in this queue due to a failure of the panther-log-processor lambda. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.

panther-kv-store

Key-value store for Python policies/rules to use however they like

Failure Impact

  • Custom rules / policies which leverage the table may be failing

  • Panther itself is not affected

panther-layer-manager

This lambda manages updates to the lambda layers attached to the Panther policy and rule engines.

Failure Impact

  • Failure of this lambda will prevent users from updating global helper functions.

  • Failed events will go into the panther-layer-manager-queue-dlq. When the system has recovered they should be re-queued to the panther-layer-manager-queue using the Panther tool requeue.

panther-layer-manager-queue

This sqs queue is used to communicate layer update requests to the layer manager function.

Failure Impact

  • Failure of this sqs queue will prevent users from updating the globals layer.

  • Failed events will go into the panther-layer-manager-queue-dlq. When the system has recovered, one event should be re-queued to the panther-layer-manager-queue using the Panther tool requeue and the rest should be purged.
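The recovery step for this queue differs from the others: one event is re-queued and the rest are purged. A minimal sketch of that split follows. Which event to keep is not specified here, so keeping the most recent one is an assumption, and the message values are hypothetical.

```python
from typing import List, Optional, Tuple


def plan_layer_manager_recovery(dlq_messages: List[str]) -> Tuple[Optional[str], List[str]]:
    """Split DLQ messages into (the one to re-queue, the ones to purge).

    Assumption: the last message in the list is the one kept.
    """
    if not dlq_messages:
        return None, []
    return dlq_messages[-1], dlq_messages[:-1]


to_requeue, to_purge = plan_layer_manager_recovery(["update-1", "update-2", "update-3"])
print(to_requeue, to_purge)
```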

panther-layer-manager-queue-dlq

This is the dead letter queue for the panther-layer-manager-queue. Items are in this queue due to a failure of the panther-layer-manager lambda. When the system has recovered they should be re-queued to the panther-layer-manager-queue using the Panther tool requeue.

panther-lock

This table holds Panther's distributed lock data.

Failure Impact

  • Different aspects of Panther normal operation might be impacted.

panther-log-alert-dedup

The panther-rules-engine lambda manages this table and it is used to deduplicate alerts. The panther-log-alert-forwarder reads the ddb stream from this table.

Failure Impact

  • Processing of rules could be slowed or stopped if there are errors/throttles.

panther-log-alert-forwarder

This lambda reads from a DDB stream for the panther-log-alert-dedup table and writes alerts to the panther-log-alert-info ddb table. It also forwards alerts to panther-alerts-queue SQS queue where the appropriate Lambda picks them up for delivery.

Failure Impact

  • Delivery of alerts could be slowed or stopped.

  • There will be no data loss until events are purged from the ddb stream (24 hours).

  • This Lambda processes alerts in batches. In case a batch partially fails, the whole batch will be retried, which might lead to duplicate notifications for some alerts.

panther-log-alert-info

This table holds the alerts history and is managed by the panther-log-alert-forwarder lambda.

Failure Impact

  • Delivery of alerts could be slowed or stopped if there are errors/throttles.

  • The Panther user interface may be impacted.

panther-log-processor

This lambda function processes S3 files from notifications posted to the panther-input-data-notifications-queue SQS queue and routed to source-specific sqs queues by the panther-log-router lambda. EventBridge scheduled rules trigger processing once a minute to read from the per-source queues.

Troubleshooting

  • If files cannot be processed errors will be generated. Some root causes can be:

    • S3 event configured against a whole bucket rather than a prefix, which causes files other than the intended logs to be processed.

    • Variations in the log format not handled by the parsers. Open a bug report.

Failure Impact

  • Failure of this lambda will cause log processing and rule processing (because rules match processed logs) to stop.

  • Failed events will go into the DLQs of the per-source SQS queues. When the system has recovered they should be re-queued to the source queue using the Panther tool requeue.

  • There is the possibility of duplicate data ingested if the failures had partial results.

panther-log-puller

This lambda runs on a schedule, pulling data from SaaS log providers.

Failure Impact

  • Panther will stop pulling data from SaaS log sources.

panther-log-router

This lambda function routes S3 notifications posted to the panther-input-data-notifications-queue SQS queue, dispatching them to source-specific sqs queues. EventBridge scheduled rules trigger the panther-log-processor once a minute to read from the per-source queues.

Failure Impact

  • Failure of this lambda will cause log processing and rule processing (because rules match processed logs) to stop.

  • Failed events will go into the panther-input-data-notifications-queue-dlq. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.

  • There is the possibility of duplicate data ingested if the failures had partial results.
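To make the routing step concrete, the sketch below dispatches an S3 notification to a per-source queue by matching bucket and key prefix. How Panther actually matches notifications to sources is not described here, so the matching rule, the source configuration and the queue names are all assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical source configuration: (bucket, key prefix) -> per-source queue name.
SOURCES: Dict[Tuple[str, str], str] = {
    ("acme-logs", "cloudtrail/"): "source-queue-cloudtrail",
    ("acme-logs", "vpcflow/"): "source-queue-vpcflow",
}


def route(bucket: str, key: str, sources: Dict[Tuple[str, str], str] = SOURCES) -> Optional[str]:
    """Return the queue for the longest matching prefix, or None if no source matches."""
    best_queue: Optional[str] = None
    best_len = -1
    for (src_bucket, prefix), queue in sources.items():
        if bucket == src_bucket and key.startswith(prefix) and len(prefix) > best_len:
            best_queue, best_len = queue, len(prefix)
    return best_queue


print(route("acme-logs", "cloudtrail/2024/01/01/events.json.gz"))
print(route("acme-logs", "billing/report.csv"))  # no source configured for this prefix
```

In this model, a source configured with an empty prefix would match every object in the bucket, which corresponds to the whole-bucket misconfiguration listed under Troubleshooting for the panther-log-processor.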

panther-logtypes

This ddb table stores settings about log types.

panther-logtypes-api

This lambda implements the logtypes API to manage logtypes.

panther-message-forwarder

This Lambda pulls data from user configured SQS sources and pushes it to Panther for further processing.

Failure Impact

  • Panther will stop processing data from SQS sources.

panther-metrics-api

The panther-metrics-api lambda handles requests for metric data by properly translating them to CloudWatch requests and then translating the results back.

Failure Impact

  • Failure of this lambda will prevent requests for metric data.

panther-organization

This ddb table stores general settings about an organization.

panther-organization-api

This lambda implements the organization API to manage settings.

panther-outputs

This table describes the user configured destinations for alerts.

Failure Impact

  • Processing of alerts could be slowed or stopped if there are errors/throttles.

  • The Panther user interface for managing destinations may be impacted.

panther-outputs-api

This lambda implements CRUD actions for alert outputs (destinations).

Failure Impact

  • Failure of this lambda will impact the Panther user interface for managing destinations.

panther-policy-engine

This lambda executes the user-defined policies against infrastructure events. It is called directly from the panther-resource-processor lambda.

Failure Impact

  • Failure of this lambda will impact evaluating policies.

panther-processed-data-notifications

This topic triggers the log analysis flow

panther-query-executions

This table holds execution history for all queries.

Failure Impact

  • The query history user interface will not function.

  • Queries will likely fail.

panther-remediation-api

The panther-remediation-api lambda triggers AWS remediations.

Failure Impact

  • Failure of this lambda will impact performing remediations and infrastructure will remain in violation of policy.

panther-remediation-processor

The panther-remediation-processor lambda processes queued remediations in the panther-remediation-queue and calls the panther-aws-remediation lambda.

Failure Impact

  • Failure of this lambda will impact performing remediations and infrastructure will remain in violation of policy.

  • Failed events will go into the panther-remediation-queue-dlq. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.

panther-remediation-queue

This sqs queue has remediation request events that are processed by the panther-remediation-processor lambda.

Failure Impact

  • Failure of this sqs queue will impact the remediation of policy violations.

  • Failed events will go into the panther-remediation-queue-dlq. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.

panther-remediation-queue-dlq

This is the dead letter queue for the panther-remediation-queue. Items are in this queue due to a failure of the panther-remediation-processor lambda. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.

panther-resource-processor

This lambda reads from panther-resources-queue which has events concerning recently changed infrastructure. The lambda calls the panther-policy-engine lambda to determine if there are any policy violations. Each violation is sent to the panther-alert-processor-queue for deduplication and delivery.

Failure Impact

  • Failure of this lambda will impact continuous monitoring of infrastructure.

  • Failed events will go into the panther-resources-queue-dlq. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.

panther-resources

This table holds descriptions of the AWS resources in all accounts being monitored. The panther-resources-api lambda manages this table.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

  • The Panther user interface could be impacted.

panther-resources-api

The panther-resources-api lambda implements the resources API.

Failure Impact

  • Infrastructure scans may be impacted when updating resources.

  • The Panther user interface for display of resources may be impacted.

panther-resources-queue

This sqs queue has events from recently changed infrastructure. The lambda panther-resource-processor consumes these events to generate alerts.

Failure Impact

  • Failure of this sqs queue will impact the delivery of alerts for recently changed infrastructure.

  • Failed events will go into the panther-resources-queue-dlq. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.

panther-resources-queue-dlq

The dead letter queue for the panther-resources-queue. Items are in this queue due to a failure of the panther-resource-processor lambda. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.

panther-roles

This table stores role metadata (name, id, permissions, lastModified)

Failure Impact

  • The web app may show errors when trying to list or modify roles

panther-rules-engine

The panther-rules-engine lambda function processes S3 files from notifications posted to the panther-rules-engine-queue SQS queue. Matching events are written to S3.

Failure Impact

  • Failure of this lambda will impact alerts generated for rule matches against log data.

  • Failed events will go into the panther-rules-engine-queue-dlq. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.

panther-rules-engine-queue

The panther-rules-engine-queue sqs queue receives S3 notifications of log files to be processed by panther-rules-engine lambda.

Failure Impact

  • Failure of this sqs queue will impact executions of rules on log files.

  • Failed events will go into the panther-rules-engine-queue-dlq. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.

panther-rules-engine-queue-dlq

This is the dead letter queue for the panther-rules-engine-queue. Items are in this queue due to a failure of the panther-rules-engine lambda. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.

panther-saved-queries

This table holds named saved queries (SQL).

Failure Impact

  • The saved query user interface will not function.

panther-snapshot-pollers

This lambda reads requests from the panther-snapshot-queue and scans infrastructure, calling the panther-resources-api to trigger policy evaluations.

Failure Impact

  • Failure of this lambda will impact cloud security infrastructure editing.

  • Failed events will go into the panther-snapshot-queue-dlq. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.

panther-snapshot-queue

This sqs queue has snapshot request events processed by the panther-snapshot-pollers lambda.

Failure Impact

  • Failure of this sqs queue will impact the scanning for changes in infrastructure.

  • Failed events will go into the panther-snapshot-queue-dlq. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.

panther-snapshot-queue-dlq

The dead letter queue for the panther-snapshot-queue. Items are in this queue due to a failure of the panther-snapshot-pollers lambda. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.

panther-snapshot-scheduler

The panther-snapshot-scheduler lambda enumerates aws-scan sources by calling the panther-source-api and then scans those sources. It is triggered by 24 hour CloudWatch timer events.

Failure Impact

  • Failure of this lambda will prevent daily infrastructure scans from running.

panther-snowflake-admin-api

The panther-snowflake-admin-api lambda is used by deployment and source api lambdas to create tables and views.

Failure Impact

  • Failure of this lambda will stop the registration of new tables and views in Snowflake.

panther-snowflake-api

The panther-snowflake-api lambda is used by AppSync to query Snowflake and Glue for readonly queries.

Failure Impact

  • Failure of this lambda will stop the Panther UI from doing Snowflake queries.

panther-source-api

The panther-source-api lambda manages Cloud Security and Log Analysis sources. This includes creating, testing, updating, listing, and deleting sources.

Failure Impact

  • Failure of this lambda will prevent sources from being manageable, and will interrupt daily scans.

panther-source-integrations

This table holds the configured accounts and log sources for monitoring.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

  • The Panther user interface could be impacted.

panther-system-status

The panther-system-status lambda reports the status of different parts of the system.

Failure Impact

  • Failure of this lambda will prevent System Status from being updated

panther-system-status-notifications

This topic receives notifications every time a system alarm switches to ALARM state

panther-users-api

This lambda implements the user API.

Failure Impact

  • Failure of this lambda will impact user administration in the Panther user interface.

web

The load balancer associated with the Panther UI.

Failure Impact

  • The Panther user interface will not be available.

Contents
panther-alert-delivery-api
panther-alert-forwarder
panther-alert-forwarder
panther-alert-processor
panther-alert-processor-queue
panther-alert-processor-queue-dlq
panther-alerts-api
panther-alerts-queue
panther-alerts-queue-dlq
panther-analysis
panther-analysis-api
panther-analysis-packs
panther-athena-admin-api
panther-athena-api
panther-auditlog-processing
panther-aws-event-processor
panther-aws-events-queue
panther-aws-events-queue-dlq
panther-aws-remediation
panther-boostrap-input-data-notifications
panther-cfn-custom-resources
panther-cloudsecurity-datalake-forwarder
panther-compliance
panther-compliance-api
panther-cw-alarms
panther-database-workflow
panther-datacatalog-compactor
panther-datacatalog-compactor-callbacks
panther-datacatalog-compactor-dlq.fifo
panther-datacatalog-compactor-queue.fifo
panther-datacatalog-compactor-reaper
panther-datacatalog-compactor-reaper-dlq
panther-datacatalog-compactor-reaper-queue
panther-datacatalog-updater
panther-datacatalog-updater-dlq
panther-datacatalog-updater-queue
panther-graphql-api
panther-input-data-notifications-queue
panther-input-data-notifications-queue-dlq
panther-kv-store
panther-layer-manager
panther-layer-manager-queue
panther-layer-manager-queue-dlq
panther-lock
panther-log-alert-dedup
panther-log-alert-forwarder
panther-log-alert-info
panther-log-processor
panther-log-puller
panther-log-router
panther-logtypes
panther-logtypes-api
panther-message-forwarder
panther-metrics-api
panther-organization
panther-organization-api
panther-outputs
panther-outputs-api
panther-policy-engine
panther-processed-data-notifications
panther-query-executions
panther-remediation-api
panther-remediation-processor
panther-remediation-queue
panther-remediation-queue-dlq
panther-resource-processor
panther-resources
panther-resources-api
panther-resources-queue
panther-resources-queue-dlq
panther-roles
panther-rules-engine
panther-rules-engine-queue
panther-rules-engine-queue-dlq
panther-saved-queries
panther-snapshot-pollers
panther-snapshot-queue
panther-snapshot-queue-dlq
panther-snapshot-scheduler
panther-snowflake-admin-api
panther-snowflake-api
panther-source-api
panther-source-integrations
panther-system-status
panther-system-status-notifications
panther-users-api
web