Run-books

Refer to the Cloud Security and Log Analysis architecture diagrams for context.

Resource names below refer to resources in the CloudFormation templates in Panther.

Each entry describes the resource's function and the impact of its failure.

panther-alert-delivery-api

This lambda dispatches alerts to their specified outputs (destinations).

Failure Impact

  • Failure of this lambda will impact delivery of alerts.

  • Failed events will go into the panther-alerts-queue-dlq. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.
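
The re-queue step, which recurs throughout this run-book, amounts to draining a dead letter queue back into its source queue. The sketch below is not the Panther requeue tool. It is a minimal illustration, assuming an SQS client such as the one returned by boto3.client("sqs"); the function name requeue and its parameters are hypothetical.

```python
def requeue(sqs, dlq_url, target_url, batch_size=10):
    """Move every message from dlq_url to target_url; return the count moved.

    `sqs` is assumed to be a boto3 SQS client (or any object exposing the
    same receive_message / send_message / delete_message methods).
    """
    moved = 0
    while True:
        resp = sqs.receive_message(
            QueueUrl=dlq_url,
            MaxNumberOfMessages=batch_size,  # SQS allows at most 10 per call
            WaitTimeSeconds=1,               # long poll so all servers are queried
        )
        messages = resp.get("Messages", [])
        if not messages:
            return moved
        for msg in messages:
            # Send first, delete second: a crash in between duplicates a
            # message rather than losing it.
            sqs.send_message(QueueUrl=target_url, MessageBody=msg["Body"])
            sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=msg["ReceiptHandle"])
            moved += 1
```

Only run a loop like this once the consuming lambda is healthy again; otherwise the messages are likely to fail and return to the dead letter queue.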

panther-alert-forwarder

The panther-alert-forwarder lambda reads events from the ddb stream for the table panther-alert-forwarder and sends them to the panther-alerts-queue sqs queue.

Failure Impact

  • Failure of this lambda will stop delivery of alerts to destinations.

  • There will be no data loss until events are purged from the ddb stream (24 hours).
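
The 24 hour stream retention gives a hard deadline for recovery. As a rough aid during an incident, the sketch below computes how much time remains before the oldest unread stream records expire. The function name and the idea of tracking a failure start time are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

STREAM_RETENTION = timedelta(hours=24)  # retention window stated above


def time_until_data_loss(failure_start, now=None):
    """Return the time left before records written at failure_start expire.

    Both arguments are timezone-aware datetimes; the result never goes
    below zero.
    """
    now = now or datetime.now(timezone.utc)
    remaining = failure_start + STREAM_RETENTION - now
    return max(remaining, timedelta(0))
```

A result of zero means records written at the start of the outage may already have been purged.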

panther-alert-forwarder

The panther-alert-forwarder ddb table is updated conditionally when new policies have a violation or a time limit has been exceeded.

Failure Impact

  • Processing of alerts could be slowed or stopped if there are errors/throttles.

panther-alert-processor

This lambda reads events from the panther-alert-processor-queue generated by the panther-policy-engine lambda. It updates the panther-alert-forwarder ddb table (which enables deduplication) and may trigger remediation by calling the panther-remediation-api.

Failure Impact

  • Failure of this lambda will impact alerts generated for policy violations.

  • Failed events will go into the panther-alert-processor-queue-dlq. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.

panther-alert-processor-queue

This sqs queue contains events that may become alerts. The panther-alert-processor lambda reads from this queue to determine if the event requires action or is a duplicate.

Failure Impact

  • Failure of this sqs queue will impact delivery of alerts to output destinations.

  • Failed events will go into the panther-alert-processor-queue-dlq. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.

panther-alert-processor-queue-dlq

This is the dead letter queue for the panther-alert-processor-queue. Items are in this queue due to a failure of the panther-alert-processor lambda. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.
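
Before re-queuing, it helps to know which dead letter queues actually hold messages. The sketch below is one way to check, assuming a boto3 SQS client and a caller-supplied list of queue URLs; the function name nonempty_dlqs is hypothetical.

```python
def nonempty_dlqs(sqs, dlq_urls):
    """Return {url: approximate message count} for DLQs that hold messages.

    `sqs` is assumed to be a boto3 SQS client, e.g. boto3.client("sqs").
    """
    backlog = {}
    for url in dlq_urls:
        attrs = sqs.get_queue_attributes(
            QueueUrl=url,
            AttributeNames=["ApproximateNumberOfMessages"],
        )["Attributes"]
        count = int(attrs["ApproximateNumberOfMessages"])  # returned as a string
        if count > 0:
            backlog[url] = count
    return backlog
```

The count is approximate by design, so treat it as a signal that a queue needs attention rather than an exact figure.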

panther-alerts-api

Lambda for CRUD actions for the alerts API.

Failure Impact

  • Failure of this lambda will impact the Panther user interface.

panther-alerts-queue

This sqs queue holds alerts to be delivered to user configured destinations.

Failure Impact

  • Failure of this sqs queue will impact delivery of alerts to output destinations.

  • Failed events will go into the panther-alerts-queue-dlq. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.

panther-alerts-queue-dlq

This is the dead letter queue for the panther-alerts-queue. Items are in this queue due to a failure of the panther-alert-delivery-api lambda. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.

panther-analysis

This ddb table holds the policies applied by the panther-rules-engine lambda and managed by the panther-analysis-api.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

  • The Panther user interface could be impacted.

panther-analysis-api

This lambda implements the analysis API, which is responsible for creating, updating, and deleting policies/rules.

Failure Impact

  • Failure of this lambda will prevent policies/rules from being created, updated, or deleted. Additionally, policies and rules will stop being evaluated by the policy/rules engines.

panther-analysis-api

The panther-analysis-api API Gateway calls the panther-analysis-api lambda.

panther-athena-api

The panther-athena-api lambda is used by AppSync to query Athena and Glue, as well as by the panther-datacatalog-compactor lambda to generate Parquet data.

If there are persistent Athena timeouts or other limit-related errors, request a limit increase: https://docs.aws.amazon.com/athena/latest/ug/service-limits.html

Failure Impact

  • Failure of this lambda will stop the Panther UI from doing Athena queries.

  • Failure of this lambda will interfere with Parquet compaction.

panther-athena-workflow

The panther-athena-workflow is a Step Functions state machine used to execute Athena queries and notify callers when done.

Failure Impact

  • Failure of this state machine will stop the Panther UI from doing Athena queries.

panther-auditlog-processing

The panther-auditlog-processing topic is used to send s3 notifications to log processing for log sources internal to the Panther account.

panther-aws-event-processor

This lambda reads events from the panther-aws-events-queue sqs queue and determines if the infrastructure referenced in the event has changed. If so, it writes events to the panther-snapshot-queue to be scanned for the nature of the changes.

Failure Impact

  • Failure of this lambda will impact continuous infrastructure monitoring.

  • Failed events will go into the panther-aws-events-queue-dlq. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.

panther-aws-events-queue

This sqs queue receives CloudTrail events delivered by CloudWatch events as well as S3 notifications from log processing. The panther-aws-event-processor lambda processes these events.

Failure Impact

  • Failure of this sqs queue will impact the continuous scanning of accounts.

  • Failed events will go into the panther-aws-events-queue-dlq. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.

panther-aws-events-queue-dlq

This is the dead letter queue for the panther-aws-events-queue. Items are in this queue due to a failure of the panther-aws-event-processor lambda. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.

panther-aws-remediation

The panther-aws-remediation lambda executes automated infrastructure remediations.

Failure Impact

  • Failure of this lambda will mean specific remediations are failing and infrastructure will remain in violation of policy.

panther-boostrap-input-data-notifications

This topic triggers the log analysis flow for data integrations configured internally by Panther, e.g. data delivered by Amazon EventBridge.

panther-cfn-custom-resources

Used by CloudFormation when deploying or updating Panther.

Failure Impact

  • Panther itself will not be affected, but deployments may fail.

panther-compliance

This ddb table holds policy violation events for associated resources in the panther-resources ddb table.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

panther-compliance-api

This lambda implements the compliance API which is responsible for tracking resource and policy pass/fail states.

Failure Impact

  • The UI experiences errors on nearly every page for cloud security related data.

  • Alerts for cloud security stop.

  • Policy failures are no longer recorded.

panther-compliance-api

The panther-compliance-api API Gateway calls the panther-compliance-api lambda.

panther-cw-alarms

CloudWatch alarms are configured to notify this topic

Failure Impact

  • CloudWatch alarm notifications will not be delivered to subscribers

panther-datacatalog-compactor

This SINGLETON lambda polls the Glue catalog every hour, converting partitions to Parquet.

Failure Impact

  • Parquet conversion will stop

panther-datacatalog-compactor-callbacks

This lambda runs the same code as the panther-datacatalog-compactor lambda but is NOT a singleton. This allows asynchronous callbacks from CTAS Step Functions to complete.

Failure Impact

  • Parquet conversion will stop

panther-datacatalog-compactor-dlq.fifo

This is the dead letter queue for the panther-datacatalog-compactor-queue.fifo. Items are in this queue due to a failure of the panther-datacatalog-compactor lambda. When the system has recovered they should be re-queued to the panther-datacatalog-compactor-queue.fifo using the Panther tool requeue.

panther-datacatalog-compactor-queue.fifo

This FIFO queue contains Glue partition data notifications. The panther-datacatalog-compactor lambda uses it to detect whether there is late data (new JSON data after a partition has been converted to Parquet) in order to trigger re-processing of the partition.

Failure Impact

  • Late data processing will stop
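
The late data condition described above can be stated as a simple comparison: data counts as late when it arrives for a partition after that partition was converted to Parquet. The sketch below only illustrates that rule. It is not the compactor's implementation, and the function name, the partition keys, and the converted_at mapping are all hypothetical.

```python
from datetime import datetime, timezone


def is_late_data(partition, data_arrived_at, converted_at):
    """Return True if JSON data arrived after the partition became Parquet.

    converted_at maps a partition key to its conversion time; a partition
    that has not been converted yet cannot have late data.
    """
    conversion_time = converted_at.get(partition)
    return conversion_time is not None and data_arrived_at > conversion_time
```

For example, a notification stamped 03:00 for a partition converted at 02:00 would be flagged for re-processing, while the same notification for an unconverted partition would not.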

panther-datacatalog-updater

This lambda reads events from the panther-datacatalog-updater-queue generated by the panther-rules-engine and panther-log-processor lambdas. It adds new partitions to the Glue tables in the panther* Glue databases.

Failure Impact

The tables in panther* Glue databases will not be updated with new partitions. This will result in:

  • Users will not be able to search the latest log data

  • Users will not be able to see new events that matched some rule.

panther-datacatalog-updater-dlq

This is the dead letter queue for the panther-datacatalog-updater-queue. Items are in this queue due to a failure of the panther-datacatalog-updater lambda. When the system has recovered they should be re-queued to the panther-datacatalog-updater-queue using the Panther tool requeue.

panther-datacatalog-updater-queue

This queue contains notifications that include information of new log data.

Failure Impact

The tables in panther* Glue databases will not be updated with new partitions. This will result in:

  • Users will not be able to search the latest log data

  • Users will not be able to see new events that matched some rule.

panther-graphql-api

This is the GraphQL endpoint for the Panther UI.

Failure Impact

  • The Panther user interface will show errors.

panther-input-data-notifications-queue

This sqs queue receives S3 notifications of log files to be processed by panther-log-processor lambda.

Failure Impact

  • Failure of this sqs queue will stop log analysis (log ingest and rule processing).

  • Failed events will go into the panther-input-data-notifications-queue-dlq. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.

panther-input-data-notifications-queue-dlq

This is the dead letter queue for the panther-input-data-notifications-queue. Items are in this queue due to a failure of the panther-log-processor lambda. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.

panther-kv-store

Key-value store for Python policies/rules to use however they like

Failure Impact

  • Custom rules / policies which leverage the table may be failing

  • Panther itself is not affected

panther-layer-manager

This lambda manages updates to the lambda layers attached to the Panther policy and rule engines.

Failure Impact

  • Failure of this lambda will prevent users from updating global helper functions.

  • Failed events will go into the panther-layer-manager-queue-dlq. When the system has recovered they should be re-queued to the panther-layer-manager-queue using the Panther tool requeue.

panther-layer-manager-queue

This sqs queue is used to communicate layer update requests to the layer manager function.

Failure Impact

  • Failure of this sqs queue will prevent users from updating the globals layer.

  • Failed events will go into the panther-layer-manager-queue-dlq. When the system has recovered, one event should be re-queued to the panther-layer-manager-queue using the Panther tool requeue and the rest should be purged.
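
The recovery procedure for this queue differs from the others: forward one event, then discard the rest. A minimal sketch of that sequence follows, assuming a boto3 SQS client; it is not the Panther requeue tool, and the function name is hypothetical.

```python
def requeue_one_and_purge(sqs, dlq_url, target_url):
    """Forward a single message from the DLQ, then discard the remainder.

    Returns True if a message was forwarded, False if the DLQ was empty.
    `sqs` is assumed to be a boto3 SQS client.
    """
    resp = sqs.receive_message(
        QueueUrl=dlq_url, MaxNumberOfMessages=1, WaitTimeSeconds=1
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False
    sqs.send_message(QueueUrl=target_url, MessageBody=messages[0]["Body"])
    # purge_queue removes every message in the queue, including the one just
    # received, so no separate delete_message call is needed.
    sqs.purge_queue(QueueUrl=dlq_url)
    return True
```

Note that SQS purges are not instantaneous; the deletion can take up to a minute to complete.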

panther-layer-manager-queue-dlq

This is the dead letter queue for the panther-layer-manager-queue. Items are in this queue due to a failure of the panther-layer-manager lambda. When the system has recovered they should be re-queued to the panther-layer-manager-queue using the Panther tool requeue.

panther-lock

This table holds Panther's distributed lock data.

Failure Impact

  • Various aspects of Panther's normal operation may be impacted.

panther-log-alert-dedup

The panther-rules-engine lambda manages this table, which is used to deduplicate alerts. The panther-log-alert-forwarder reads the ddb stream from this table.

Failure Impact

  • Processing of rules could be slowed or stopped if there are errors/throttles.

panther-log-alert-forwarder

This lambda reads from the ddb stream for the panther-log-alert-dedup table and writes alerts to the panther-log-alert-info ddb table. It also forwards alerts to the panther-alerts-queue SQS queue, where the appropriate Lambda picks them up for delivery.

Failure Impact

  • Delivery of alerts could be slowed or stopped.

  • There will be no data loss until events are purged from the ddb stream (24 hours).

  • This Lambda processes alerts in batches. In case a batch partially fails, the whole batch will be retried, which might lead to duplicate notifications for some alerts.
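
To see why a partial batch failure produces duplicates, consider the small simulation below. It is purely illustrative and does not reflect the Lambda's actual code; the function name and alert IDs are made up. Alerts that were delivered before the failing item are delivered again when the whole batch is retried.

```python
def deliver_with_batch_retry(batch, fails_once_on):
    """Simulate whole-batch retry; return every notification that was sent.

    `fails_once_on` is the alert whose first delivery attempt fails.
    """
    sent = []
    already_failed = False
    while True:
        try:
            for alert in batch:
                if alert == fails_once_on and not already_failed:
                    already_failed = True
                    raise RuntimeError(f"delivery of {alert} failed")
                sent.append(alert)
            return sent
        except RuntimeError:
            continue  # retry the whole batch from the start
```

With a batch of a1, a2, a3 where a3 fails once, a1 and a2 are each sent twice, which is the duplicate notification behaviour described above.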

panther-log-alert-info

This table holds the alerts history and is managed by the panther-log-alert-forwarder lambda.

Failure Impact

  • Delivery of alerts could be slowed or stopped if there are errors/throttles.

  • The Panther user interface may be impacted.

panther-log-processor

The lambda function that processes S3 files from notifications posted to the panther-input-data-notifications-queue SQS queue.

Troubleshooting

  • If files cannot be processed errors will be generated. Some root causes can be:

    • S3 event configured against a whole bucket rather than a prefix, which causes files other than the intended logs to be processed.

    • Variations in the log format not handled by the parsers. Open a bug report.

Failure Impact

  • Failure of this lambda will cause log processing and rule processing (because rules match processed logs) to stop.

  • Failed events will go into the panther-input-data-notifications-queue-dlq. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.

  • There is the possibility of duplicate data ingested if the failures had partial results.
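
For the first root cause listed under Troubleshooting, the usual remedy is to scope the S3 event notification to the log prefix. The sketch below builds such a configuration in the shape accepted by the S3 put_bucket_notification_configuration API. It assumes notifications are sent directly to an SQS queue (use TopicConfigurations with a TopicArn instead if they go through SNS); the function name, queue ARN, and prefix are placeholders.

```python
def prefix_scoped_notification(queue_arn, prefix):
    """Build an S3 notification configuration limited to one key prefix."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                # Only objects whose key starts with `prefix` trigger events.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }
```

The result would be passed as the NotificationConfiguration argument of a boto3 S3 client's put_bucket_notification_configuration call. That call replaces the bucket's entire notification configuration, so merge in any existing entries first.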

panther-log-puller

This lambda runs on a schedule, pulling data from SaaS log providers.

Failure Impact

  • Panther will stop pulling data from SaaS log sources.

panther-logtypes

This ddb table stores settings about log types.

panther-logtypes-api

This lambda implements the logtypes API to manage log types.

panther-message-forwarder

This Lambda pulls data from user configured SQS sources and pushes it to Panther for further processing.

Failure Impact

  • Panther will stop processing data from SQS sources.

panther-metrics-api

The panther-metrics-api lambda handles requests for metric data by properly translating them to CloudWatch requests and then translating the results back.

Failure Impact

  • Failure of this lambda will prevent requests for metric data.

panther-organization

This ddb table stores general settings about an organization.

panther-organization-api

This lambda implements the organization API to manage settings.

panther-outputs

This table describes the user configured destinations for alerts.

Failure Impact

  • Processing of alerts could be slowed or stopped if there are errors/throttles.

  • The Panther user interface for managing destinations may be impacted.

panther-outputs-api

This lambda implements CRUD actions for alert outputs (destinations).

Failure Impact

  • Failure of this lambda will impact the Panther user interface for managing destinations.

panther-policy-engine

This lambda executes the user-defined policies against infrastructure events. It is called directly from the panther-resource-processor lambda.

Failure Impact

  • Failure of this lambda will impact evaluating policies.

panther-processed-data-notifications

This topic triggers the log analysis flow

panther-remediation-api

The panther-remediation-api lambda triggers AWS remediations.

Failure Impact

  • Failure of this lambda will impact performing remediations and infrastructure will remain in violation of policy.

panther-remediation-api

The panther-remediation-api API Gateway calls the panther-remediation-api lambda.

panther-remediation-processor

The panther-remediation-processor lambda processes queued remediations in the panther-remediation-queue and calls the panther-aws-remediation lambda.

Failure Impact

  • Failure of this lambda will impact performing remediations and infrastructure will remain in violation of policy.

  • Failed events will go into the panther-remediation-queue-dlq. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.

panther-remediation-queue

This sqs queue has remediation request events that are processed by the panther-remediation-processor lambda.

Failure Impact

  • Failure of this sqs queue will impact the remediation of policy violations.

  • Failed events will go into the panther-remediation-queue-dlq. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.

panther-remediation-queue-dlq

This is the dead letter queue for the panther-remediation-queue. Items are in this queue due to a failure of the panther-remediation-processor lambda. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.

panther-resource-processor

This lambda reads from the panther-resources-queue, which has events concerning recently changed infrastructure. The lambda calls the panther-policy-engine lambda to determine if there are any policy violations. Each violation is sent to the panther-alert-processor-queue for deduplication and delivery.

Failure Impact

  • Failure of this lambda will impact continuous monitoring of infrastructure.

  • Failed events will go into the panther-resources-queue-dlq. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.

panther-resources

This table holds descriptions of the AWS resources in all accounts being monitored. The panther-resources-api lambda manages this table.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

  • The Panther user interface could be impacted.

panther-resources-api

The panther-resources-api lambda implements the resources API.

Failure Impact

  • Infrastructure scans may be impacted when updating resources.

  • The Panther user interface for displaying resources may be impacted.

panther-resources-api

The panther-resources-api API Gateway calls the panther-resources-api lambda.

panther-resources-queue

This sqs queue has events from recently changed infrastructure. The lambda panther-resource-processor consumes these events to generate alerts.

Failure Impact

  • Failure of this sqs queue will impact the delivery of alerts for recently changed infrastructure.

  • Failed events will go into the panther-resources-queue-dlq. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.

panther-resources-queue-dlq

The dead letter queue for the panther-resources-queue. Items are in this queue due to a failure of the panther-resource-processor lambda. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.

panther-roles

This table stores role metadata (name, id, permissions, lastModified)

Failure Impact

  • The web app may show errors when trying to list or modify roles

panther-rules-engine

The panther-rules-engine lambda function processes S3 files from notifications posted to the panther-rules-engine-queue SQS queue. Matching events are written to S3.

Failure Impact

  • Failure of this lambda will impact alerts generated for rule matches against log data.

  • Failed events will go into the panther-rules-engine-queue-dlq. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.

panther-rules-engine-queue

The panther-rules-engine-queue sqs queue receives S3 notifications of log files to be processed by panther-rules-engine lambda.

Failure Impact

  • Failure of this sqs queue will impact executions of rules on log files.

  • Failed events will go into the panther-rules-engine-queue-dlq. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.

panther-rules-engine-queue-dlq

This is the dead letter queue for the panther-rules-engine-queue. Items are in this queue due to a failure of the panther-rules-engine lambda. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.

panther-snapshot-pollers

This lambda reads requests from the panther-snapshot-queue and scans infrastructure, calling the panther-resources-api to trigger policy evaluations.

Failure Impact

  • Failure of this lambda will impact cloud security infrastructure scanning.

  • Failed events will go into the panther-snapshot-queue-dlq. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.

panther-snapshot-queue

This sqs queue has snapshot request events processed by the panther-snapshot-pollers lambda.

Failure Impact

  • Failure of this sqs queue will impact the scanning for changes in infrastructure.

  • Failed events will go into the panther-snapshot-queue-dlq. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.

panther-snapshot-queue-dlq

The dead letter queue for the panther-snapshot-queue. Items are in this queue due to a failure of the panther-snapshot-pollers lambda. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.

panther-snapshot-scheduler

The panther-snapshot-scheduler lambda enumerates aws-scan sources by calling the panther-source-api and then scans those sources. It is triggered by CloudWatch timer events every 24 hours.

Failure Impact

  • Failure of this lambda will prevent daily infrastructure scans from running.

panther-snowflake-api

The panther-snowflake-api lambda is used by AppSync to query Snowflake and Glue.

Failure Impact

  • Failure of this lambda will stop the Panther UI from doing Snowflake queries.

panther-source-api

The panther-source-api lambda manages Cloud Security and Log Analysis sources. This includes creating, testing, updating, listing, and deleting sources.

Failure Impact

  • Failure of this lambda will prevent sources from being manageable, and will interrupt daily scans.

panther-source-integrations

This table holds the configured accounts and log sources for monitoring.

Failure Impact

  • Processing of policies could be slowed or stopped if there are errors/throttles.

  • The Panther user interface could be impacted.

panther-users-api

This lambda implements the users API.

Failure Impact

  • Failure of this lambda will impact user administration in the Panther user interface.

web

The load balancer associated with the Panther UI.

Failure Impact

  • The Panther user interface will not be available.
