Refer to the Cloud Security and Log Analysis architecture diagrams for context.
Resource names below refer to resources in the CloudFormation templates in Panther.
Each resource entry describes its function and failure impact.
This lambda dispatches alerts to their specified outputs (destinations).
Failure Impact
Failure of this lambda will impact delivery of alerts.
Failed events will go into the panther-alerts-queue-dlq. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.
The panther-alert-forwarder lambda reads records from the ddb stream for the table panther-alert-forwarder and sends them to the panther-alerts-queue sqs queue.
Failure Impact
Failure of this lambda will stop delivery of alerts to destinations.
There will be no data loss until events are purged from the ddb stream (24 hours).
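Because the stream retains records for 24 hours, the recovery deadline follows directly from the time the lambda stopped consuming. A minimal sketch of that arithmetic follows; the helper names and the example timestamp are illustrative assumptions, not part of Panther.

```python
from datetime import datetime, timedelta, timezone

# DynamoDB stream retention window, as stated above.
STREAM_RETENTION = timedelta(hours=24)


def recovery_deadline(failure_start: datetime) -> datetime:
    """Return the time after which the oldest unread stream records are purged."""
    return failure_start + STREAM_RETENTION


def time_remaining(failure_start: datetime, now: datetime) -> timedelta:
    """Return how long is left before data loss begins (never negative)."""
    return max(recovery_deadline(failure_start) - now, timedelta(0))


# Hypothetical outage that began at 09:30 UTC.
start = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)
print(recovery_deadline(start).isoformat())
```

Comparing `time_remaining` against the expected repair time gives a quick indication of whether alerts are at risk of being lost.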
The panther-alert-forwarder ddb table is updated conditionally when new policies have a violation or a time limit has been exceeded.
Failure Impact
Processing of alerts could be slowed or stopped if there are errors/throttles.
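The conditional update described for this table can be pictured with an in-memory stand-in. This is only a sketch under stated assumptions: the key format, the one-hour time limit, and the `DedupTable` class are hypothetical, and the real table's attributes and condition expression are not shown in this document.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DedupTable:
    """In-memory stand-in for a table that is written conditionally."""

    time_limit_seconds: int
    # Maps a dedup key (hypothetical: policy id + resource id) to last write time.
    last_written: Dict[str, int] = field(default_factory=dict)

    def conditional_update(self, key: str, now: int) -> bool:
        """Write the item only if it is new or its time limit has been exceeded.

        Returns True when the write happened (a stream record would follow),
        False when the event is treated as a duplicate.
        """
        previous = self.last_written.get(key)
        if previous is not None and now - previous < self.time_limit_seconds:
            return False  # condition failed: still inside the time limit
        self.last_written[key] = now
        return True


table = DedupTable(time_limit_seconds=3600)
results = [
    table.conditional_update("policy-a:resource-1", now=0),     # new violation
    table.conditional_update("policy-a:resource-1", now=600),   # duplicate
    table.conditional_update("policy-a:resource-1", now=4000),  # limit exceeded
]
print(results)  # [True, False, True]
```

Only the writes that succeed would produce stream records, which is why errors or throttles on the table slow or stop alert processing.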
This lambda reads events from the panther-alert-processor-queue generated by the panther-policy-engine lambda. It updates the panther-alert-forwarder ddb table (which enables deduplication) and may trigger remediation by calling the panther-remediation-api.
Failure Impact
Failure of this lambda will impact alerts generated for policy violations.
Failed events will go into the panther-alert-processor-queue-dlq. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.
This sqs queue contains events that may become alerts. The panther-alert-processor lambda reads from this queue to determine if the event requires action or is a duplicate.
Failure Impact
Failure of this sqs queue will impact delivery of alerts to output destinations.
Failed events will go into the panther-alert-processor-queue-dlq. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.
This is the dead letter queue for the panther-alert-processor-queue. Items are in this queue due to a failure of the panther-alert-processor lambda. When the system has recovered they should be re-queued to the panther-alert-processor-queue using the Panther tool requeue.
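The re-queue step that recurs throughout this runbook amounts to moving messages from a dead letter queue back to its source queue. The sketch below is not the implementation of the Panther requeue tool; it only illustrates the idea. `FakeSQS` is a hypothetical in-memory stand-in, and the `requeue` function assumes a client exposing the boto3-style `receive_message`, `send_message`, and `delete_message` calls.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class FakeSQS:
    """Tiny in-memory stand-in exposing the three SQS calls the sketch needs."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Tuple[str, str]]] = {}
        self._next_handle = 0

    def send_message(self, QueueUrl: str, MessageBody: str) -> dict:
        self._next_handle += 1
        handle = f"handle-{self._next_handle}"
        self.queues.setdefault(QueueUrl, deque()).append((handle, MessageBody))
        return {"MessageId": handle}

    def receive_message(self, QueueUrl: str, MaxNumberOfMessages: int = 1) -> dict:
        queue = self.queues.get(QueueUrl, deque())
        batch = list(queue)[:MaxNumberOfMessages]
        messages = [{"ReceiptHandle": h, "Body": b} for h, b in batch]
        return {"Messages": messages} if messages else {}

    def delete_message(self, QueueUrl: str, ReceiptHandle: str) -> None:
        queue = self.queues.get(QueueUrl, deque())
        self.queues[QueueUrl] = deque(m for m in queue if m[0] != ReceiptHandle)


def requeue(client, dlq_url: str, queue_url: str) -> int:
    """Move every message from the dead letter queue back to its source queue."""
    moved = 0
    while True:
        response = client.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            return moved
        for message in messages:
            # Send first, delete second: a crash in between duplicates a
            # message rather than losing it.
            client.send_message(QueueUrl=queue_url, MessageBody=message["Body"])
            client.delete_message(QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"])
            moved += 1


sqs = FakeSQS()
for body in ("event-1", "event-2", "event-3"):
    sqs.send_message(QueueUrl="dlq", MessageBody=body)
moved = requeue(sqs, dlq_url="dlq", queue_url="main")
print(moved)  # 3
```

Re-queueing only after the system has recovered matters: if the consuming lambda is still failing, the messages will simply return to the dead letter queue.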
Lambda for CRUD actions for the alerts API.
Failure Impact
Failure of this lambda will impact the Panther user interface.
This sqs queue holds alerts to be delivered to user configured destinations.
Failure Impact
Failure of this sqs queue will impact delivery of alerts to output destinations.
Failed events will go into the panther-alerts-queue-dlq. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.
This is the dead letter queue for the panther-alerts-queue. Items are in this queue due to a failure of the panther-alerts-delivery lambda. When the system has recovered they should be re-queued to the panther-alerts-queue using the Panther tool requeue.
This ddb table holds the policies applied by the panther-rules-engine lambda and managed by the panther-analysis-api.
Failure Impact
Processing of policies could be slowed or stopped if there are errors/throttles.
The Panther user interface could be impacted.
This lambda implements the analysis API, which is responsible for creating, updating, and deleting policies/rules.
Failure Impact
Failure of this lambda will prevent policies/rules from being created, updated, or deleted. Additionally, policies and rules will stop being evaluated by the policy/rules engines.
The panther-analysis-api API Gateway calls the panther-analysis-api lambda.
The panther-athena-api lambda is used by AppSync to query Athena and Glue, and by the panther-datacatalog-compactor lambda to generate Parquet data.
If there are persistent Athena timeouts or other limit-related errors, request a limit increase: https://docs.aws.amazon.com/athena/latest/ug/service-limits.html
Failure Impact
Failure of this lambda will stop the Panther UI from doing Athena queries.
Failure of this lambda will interfere with Parquet compaction.
The panther-athena-workflow is a Step Functions state machine used to execute Athena queries and notify callers when done.
Failure Impact
Failure of this state machine will stop the Panther UI from doing Athena queries.
The panther-auditlog-processing topic is used to send S3 notifications to log processing for log sources internal to the Panther account.
This lambda reads events from the panther-aws-events-queue sqs queue and determines if the infrastructure referenced in the event has changed. If so, it writes events to the panther-snapshot-queue to be scanned for the nature of the changes.
Failure Impact
Failure of this lambda will impact continuous infrastructure monitoring.
Failed events will go into the panther-aws-events-queue-dlq. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.
This sqs queue receives CloudTrail events delivered by CloudWatch events as well as S3 notifications from log processing. The panther-aws-event-processor lambda processes these events.
Failure Impact
Failure of this sqs queue will impact the continuous scanning of accounts.
Failed events will go into the panther-aws-events-queue-dlq. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.
This is the dead letter queue for the panther-aws-events-queue. Items are in this queue due to a failure of the panther-aws-event-processor lambda. When the system has recovered they should be re-queued to the panther-aws-events-queue using the Panther tool requeue.
The panther-aws-remediation lambda executes automated infrastructure remediations.
Failure Impact
Failure of this lambda will mean specific remediations are failing and infrastructure will remain in violation of policy.
This topic triggers the log analysis flow for data integrations configured internally by Panther, e.g. data delivered by Amazon EventBridge.
Used by CloudFormation when deploying or updating Panther.
Failure Impact
Panther itself will not be affected, but deployments may fail.
This ddb table holds policy violation events for associated resources in the panther-resources ddb table.
Failure Impact
Processing of policies could be slowed or stopped if there are errors/throttles.
This lambda implements the compliance API which is responsible for tracking resource and policy pass/fail states.
Failure Impact
The UI experiences errors on nearly every page for cloud security related data.
Alerts for cloud security stop.
Policy failures are no longer recorded.
The panther-compliance-api API Gateway calls the panther-compliance-api lambda.
CloudWatch alarms are configured to notify this topic.
Failure Impact
CloudWatch alarm notifications will not be delivered to subscribers.
This SINGLETON lambda polls the Glue catalog every hour, converting partitions to Parquet.
Failure Impact
Parquet conversion will stop.
This lambda runs the same code as the panther-datacatalog-compactor lambda but is NOT a singleton. This allows asynchronous callbacks from CTAS Step Functions to complete.
Failure Impact
Parquet conversion will stop.
This is the dead letter queue for the panther-datacatalog-compactor-queue.fifo. Items are in this queue due to a failure of the panther-datacatalog-compactor lambda. When the system has recovered they should be re-queued to the panther-datacatalog-compactor-queue.fifo using the Panther tool requeue.
This FIFO queue contains Glue partition data notifications. The panther-datacatalog-compactor lambda uses it to detect late data (new JSON data arriving after a partition has been converted to Parquet) in order to trigger re-processing of the partition.
Failure Impact
Late data processing will stop.
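Late data detection can be sketched as a membership check: a notification for a partition that has already been converted means new JSON data arrived afterwards. The function name and the partition naming scheme below are assumptions for illustration only; the actual compactor logic is not shown in this document.

```python
from typing import Iterable, List, Set


def find_late_partitions(converted: Set[str], notifications: Iterable[str]) -> List[str]:
    """Return partitions that received new data after Parquet conversion.

    `converted` holds partitions already converted to Parquet; `notifications`
    is the ordered stream of partition names read from the FIFO queue.
    """
    late: List[str] = []
    for partition in notifications:
        if partition in converted and partition not in late:
            late.append(partition)  # needs re-processing, listed once
    return late


converted = {"year=2020/month=06/day=01/hour=10"}
notifications = [
    "year=2020/month=06/day=01/hour=11",  # not converted yet: normal flow
    "year=2020/month=06/day=01/hour=10",  # already converted: late data
    "year=2020/month=06/day=01/hour=10",  # repeat notification, listed once
]
late = find_late_partitions(converted, notifications)
print(late)  # ['year=2020/month=06/day=01/hour=10']
```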
This lambda reads events from the panther-datacatalog-updater-queue generated by the panther-rules-engine and panther-log-processor lambdas. It adds new partitions to the Glue tables in the panther* Glue databases.
Failure Impact
The tables in the panther* Glue databases will not be updated with new partitions. As a result:
Users will not be able to search the latest log data.
Users will not be able to see new events that matched a rule.
This is the dead letter queue for the panther-datacatalog-updater-queue. Items are in this queue due to a failure of the panther-datacatalog-updater lambda. When the system has recovered they should be re-queued to the panther-datacatalog-updater-queue using the Panther tool requeue.
This queue contains notifications that include information about new log data.
Failure Impact
The tables in the panther* Glue databases will not be updated with new partitions. As a result:
Users will not be able to search the latest log data.
Users will not be able to see new events that matched a rule.
This is the GraphQL endpoint for the Panther UI.
Failure Impact
The Panther user interface will show errors.
This sqs queue receives S3 notifications of log files to be processed by the panther-log-processor lambda.
Failure Impact
Failure of this sqs queue will stop log analysis (log ingest and rule processing).
Failed events will go into the panther-input-data-notifications-queue-dlq. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.
This is the dead letter queue for the panther-input-data-notifications-queue. Items are in this queue due to a failure of the panther-log-processor lambda. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.
Key-value store for Python policies/rules to use however they like.
Failure Impact
Custom rules/policies which leverage the table may fail.
Panther itself is not affected.
This lambda manages updates to the lambda layers attached to the Panther policy and rule engines.
Failure Impact
Failure of this lambda will prevent users from updating global helper functions.
Failed events will go into the panther-layer-manager-queue-dlq. When the system has recovered they should be re-queued to the panther-layer-manager-queue using the Panther tool requeue.
This sqs queue is used to communicate layer update requests to the layer manager function.
Failure Impact
Failure of this sqs queue will prevent users from updating the globals layer.
Failed events will go into the panther-layer-manager-queue-dlq. When the system has recovered, one event should be re-queued to the panther-layer-manager-queue using the Panther tool requeue and the rest should be purged.
This is the dead letter queue for the panther-layer-manager-queue. Items are in this queue due to a failure of the panther-layer-manager lambda. When the system has recovered they should be re-queued to the panther-layer-manager-queue using the Panther tool requeue.
This table holds Panther's distributed lock data.
Failure Impact
Various aspects of Panther's normal operation might be impacted.
The panther-rules-engine lambda manages this table, which is used to deduplicate alerts. The panther-log-alert-forwarder reads the ddb stream from this table.
Failure Impact
Processing of rules could be slowed or stopped if there are errors/throttles.
This lambda reads from a DDB stream for the panther-alert-dedup table and writes alerts to the panther-log-alert-info ddb table. It also forwards alerts to the panther-alerts-queue SQS queue, where the appropriate Lambda picks them up for delivery.
Failure Impact
Delivery of alerts could be slowed or stopped.
There will be no data loss until events are purged from the ddb stream (24 hours).
This Lambda processes alerts in batches. If a batch partially fails, the whole batch will be retried, which might lead to duplicate notifications for some alerts.
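The duplicate notifications follow from the retry granularity: alerts delivered before the failure are delivered again on the retry. The simulation below is a simplified sketch with hypothetical names (`process_batch`, `deliver`, the alert ids), not the Lambda's actual code.

```python
from typing import Callable, List


def process_batch(alerts: List[str], deliver: Callable[[str], None], max_attempts: int = 3) -> None:
    """Retry the whole batch whenever any single delivery fails."""
    for _ in range(max_attempts):
        try:
            for alert in alerts:
                deliver(alert)
            return
        except RuntimeError:
            continue  # the entire batch is retried, including alerts already sent
    raise RuntimeError("batch failed after retries")


sent: List[str] = []
failures_left = {"alert-3": 1}  # hypothetical: alert-3 fails once, then succeeds


def deliver(alert: str) -> None:
    if failures_left.get(alert, 0) > 0:
        failures_left[alert] -= 1
        raise RuntimeError(f"delivery failed for {alert}")
    sent.append(alert)


process_batch(["alert-1", "alert-2", "alert-3"], deliver)
print(sent)  # ['alert-1', 'alert-2', 'alert-1', 'alert-2', 'alert-3']
```

In this run alert-1 and alert-2 are each delivered twice, even though only alert-3 failed.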
This table holds the alerts history and is managed by the panther-log-alert-forwarder lambda.
Failure Impact
Delivery of alerts could be slowed or stopped if there are errors/throttles.
The Panther user interface may be impacted.
The lambda function that processes S3 files from notifications posted to the panther-input-data-notifications-queue SQS queue.
Troubleshooting
If files cannot be processed, errors will be generated. Some root causes can be:
S3 event configured against a whole bucket rather than a prefix, which causes files other than the intended logs to be processed.
Variations in the log format not handled by the parsers.
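The first root cause can be checked by inspecting the object keys in the notifications that failed. The sketch below assumes the message body is a raw S3 event notification (`Records[].s3.object.key`); the function name, bucket name, and prefix are hypothetical, and notifications wrapped in another envelope would need unwrapping first.

```python
from typing import List
from urllib.parse import unquote_plus


def keys_outside_prefix(event: dict, expected_prefix: str) -> List[str]:
    """List object keys in an S3 event notification that fall outside the prefix."""
    unexpected = []
    for record in event.get("Records", []):
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.startswith(expected_prefix):
            unexpected.append(key)
    return unexpected


event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "logs/cloudtrail/a.json.gz"}}},
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "backups/db+dump.sql"}}},
    ]
}
unexpected = keys_outside_prefix(event, "logs/")
print(unexpected)  # ['backups/db dump.sql']
```

Any keys returned suggest the S3 event is scoped too broadly and should be restricted to the log prefix.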
Failure Impact
Failure of this lambda will cause log processing and rule processing (because rules match processed logs) to stop.
Failed events will go into the panther-input-data-notifications-queue-dlq. When the system has recovered they should be re-queued to the panther-input-data-notifications-queue using the Panther tool requeue.
Duplicate data may be ingested if the failed invocations produced partial results.
This lambda runs on a schedule, pulling data from SaaS log providers.
Failure Impact
Panther will stop pulling data from SaaS log sources.
This ddb table stores settings about log types.
This lambda implements the logtypes API to manage log types.
This Lambda pulls data from user configured SQS sources and pushes it to Panther for further processing.
Failure Impact
Panther will stop processing data from SQS sources.
The panther-metrics-api lambda handles requests for metric data by translating them to CloudWatch requests and then translating the results back.
Failure Impact
Failure of this lambda will prevent requests for metric data.
This ddb table stores general settings about an organization.
This lambda implements the organization API to manage settings.
This table describes the user configured destinations for alerts.
Failure Impact
Processing of alerts could be slowed or stopped if there are errors/throttles.
The Panther user interface for managing destinations may be impacted.
This lambda implements CRUD actions for alert outputs (destinations).
Failure Impact
Failure of this lambda will impact the Panther user interface for managing destinations.
This lambda executes the user-defined policies against infrastructure events. It is called directly from the panther-resource-processor lambda.
Failure Impact
Failure of this lambda will impact evaluating policies.
This topic triggers the log analysis flow.
The panther-remediation-api lambda triggers AWS remediations.
Failure Impact
Failure of this lambda will impact performing remediations and infrastructure will remain in violation of policy.
The panther-remediation-api API Gateway calls the panther-remediation-api lambda.
The panther-remediation-processor lambda processes queued remediations in the panther-remediation-queue and calls the panther-aws-remediation lambda.
Failure Impact
Failure of this lambda will impact performing remediations and infrastructure will remain in violation of policy.
Failed events will go into the panther-remediation-queue-dlq. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.
This sqs queue has remediation request events that are processed by the panther-remediation-processor lambda.
Failure Impact
Failure of this sqs queue will impact the remediation of policy violations.
Failed events will go into the panther-remediation-queue-dlq. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.
This is the dead letter queue for the panther-remediation-queue. Items are in this queue due to a failure of the panther-remediation-processor lambda. When the system has recovered they should be re-queued to the panther-remediation-queue using the Panther tool requeue.
This lambda reads from the panther-resources-queue, which has events concerning recently changed infrastructure. The lambda calls the policy-engine lambda to determine if there are any policy violations. Each violation is sent to the panther-alert-processor-queue for deduplication and delivery.
Failure Impact
Failure of this lambda will impact continuous monitoring of infrastructure.
Failed events will go into the panther-resources-queue-dlq. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.
This table holds descriptions of the AWS resources in all accounts being monitored. The panther-resources-api lambda manages this table.
Failure Impact
Processing of policies could be slowed or stopped if there are errors/throttles.
The Panther user interface could be impacted.
The panther-resources-api lambda implements the resources API.
Failure Impact
Infrastructure scans may be impacted when updating resources.
The Panther user interface for displaying resources may be impacted.
The panther-resources-api API Gateway calls the panther-resources-api lambda.
This sqs queue has events from recently changed infrastructure. The panther-resource-processor lambda consumes these events to generate alerts.
Failure Impact
Failure of this sqs queue will impact the delivery of alerts for recently changed infrastructure.
Failed events will go into the panther-resources-queue-dlq. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.
The dead letter queue for the panther-resources-queue. Items are in this queue due to a failure of the panther-resource-processor lambda. When the system has recovered they should be re-queued to the panther-resources-queue using the Panther tool requeue.
This table stores role metadata (name, id, permissions, lastModified).
Failure Impact
The web app may show errors when trying to list or modify roles.
The panther-rules-engine lambda function processes S3 files from notifications posted to the panther-rules-engine-queue SQS queue. Matching events are written to S3.
Failure Impact
Failure of this lambda will impact alerts generated for rule matches against log data.
Failed events will go into the panther-rules-engine-queue-dlq. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.
The panther-rules-engine-queue sqs queue receives S3 notifications of log files to be processed by the panther-rules-engine lambda.
Failure Impact
Failure of this sqs queue will impact executions of rules on log files.
Failed events will go into the panther-rules-engine-queue-dlq. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.
This is the dead letter queue for the panther-rules-engine-queue. Items are in this queue due to a failure of the panther-rules-engine lambda. When the system has recovered they should be re-queued to the panther-rules-engine-queue using the Panther tool requeue.
This lambda reads requests from the panther-snapshot-queue and scans infrastructure, calling the panther-resources-api to trigger policy evaluations.
Failure Impact
Failure of this lambda will impact cloud security infrastructure editing.
Failed events will go into the panther-snapshot-queue-dlq. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.
This sqs queue has snapshot request events processed by the panther-snapshot-pollers lambda.
Failure Impact
Failure of this sqs queue will impact the scanning for changes in infrastructure.
Failed events will go into the panther-snapshot-queue-dlq. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.
The dead letter queue for the panther-snapshot-queue. Items are in this queue due to a failure of the panther-snapshot-pollers lambda. When the system has recovered they should be re-queued to the panther-snapshot-queue using the Panther tool requeue.
The panther-snapshot-scheduler lambda enumerates aws-scan sources by calling the panther-source-api and then scans those sources. It is triggered by 24 hour CloudWatch timer events.
Failure Impact
Failure of this lambda will prevent daily infrastructure scans from running.
The panther-snowflake-api lambda is used by AppSync to query Snowflake and Glue.
Failure Impact
Failure of this lambda will stop the Panther UI from doing Snowflake queries.
The panther-source-api lambda manages Cloud Security and Log Analysis sources. This includes creating, testing, updating, listing, and deleting sources.
Failure Impact
Failure of this lambda will prevent sources from being manageable, and will interrupt daily scans.
This table holds the configured accounts and log sources for monitoring.
Failure Impact
Processing of policies could be slowed or stopped if there are errors/throttles.
The Panther user interface could be impacted.
This lambda implements the user API.
Failure Impact
Failure of this lambda will impact user administration in the Panther user interface.
The load balancer associated with the Panther UI.
Failure Impact
The Panther user interface will not be available.