All queries should bound the result size. This can be done with a LIMIT
or GROUP BY
clause.
Please note that all Athena queries should be qualified with partition columns (year, month, day, hour) for performance reasons.
This is often one of the first questions asked in an investigation. Given there is some known bad indicator such as an IP address, then if there is related activity in your network/systems, a detailed investigation will be needed.
Panther makes asking such questions easy using the 'all_logs' Athena view which will search all data for the indicator of interest. Since most often the answers to these question are negative, making this a fast and efficient operation reduces investigation time.
In this example the Panther field p_any_ip_addresses
is used. Panther extracts a number of common indicator fields over all data sources into standard fields (see Panther Fields).
-- AthenaSELECTp_log_type, count(1) AS row_countFROM panther_views.all_logsWHERE year=2020 AND month=1 AND day=31AND contains(p_any_ip_addresses, '1.2.3.4')GROUP BY p_log_type
-- SnowflakeSELECTp_log_type, count(1) AS row_countFROM panther_views.public.all_logsWHERE p_event_time BETWEEN '2020-01-31 00:00:00' AND '2020-01-31 23:59:59'AND array_contains('1.2.3.4'::variant, p_any_ip_addresses)GROUP BY p_log_type
Ranking activity (top or bottom) is a useful technique to gain visibility into a network. High ranking activity might help locate IP addresses involved in a DDOS attack while low ranking (change ORDER BY to ASC) might highlight sneaky activity.
-- AthenaSELECTip,count(1) as total_rowsFROM panther_views.all_logsCROSS JOIN UNNEST(p_any_ip_addresses) AS t(ip)WHERE year=2020 AND month=1 AND day=23GROUP BY ipORDER BY total_rows DESCLIMIT 10
-- SnowflakeSELECTip.value as ip,count(1) as total_rowsFROM panther_views.public.all_logs, LATERAL FLATTEN(input => p_any_ip_addresses) ipWHERE p_event_time BETWEEN '2020-01-23 00:00:00' AND '2020-01-23 23:59:59'GROUP BY ip.valueORDER BY total_rows DESCLIMIT 10
This is a variant of the above query where we are ranking the IPs by how many data sources they show activity. This shows the degree of "reach" the IP address has over all your systems.
-- AthenaSELECTip,count(distinct p_log_type) as datasetsFROM(SELECTp_log_type,ipFROM panther_views.all_logsCROSS JOIN UNNEST(p_any_ip_addresses) AS t(ip)WHERE year=2020 AND month=1 AND day=23GROUP BY ip, p_log_type)GROUP BY ipORDER BY datasets DESCLIMIT 10
-- SnowflakeSELECTip,count(distinct p_log_type) as datasetsFROM(SELECTp_log_type,ip. value as ipFROM panther_views.public.all_logs, LATERAL FLATTEN(input => p_any_ip_addresses) ipWHERE p_event_time BETWEEN '2020-01-23 00:00:00' AND '2020-01-23 23:59:59'GROUP BY ip.value, p_log_type)GROUP BY ipORDER BY datasets DESCLIMIT 10
Remote shells typically have a human at one end. During an investigation, isolating sessions from SSH and RDP is often a standard procedure to identify specific actor activity.
-- AthenaSELECT*FROM panther_logs.aws_vpcflowWHEREyear=2020 AND month=1 AND day=23AND(srcport IN (22, 3389) OR dstport IN (22, 3389))ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECT*FROM panther_logs.public.aws_vpcflowWHEREp_event_time BETWEEN '2020-01-23 00:00:00' AND '2020-01-23 23:59:59'AND(srcport IN (22, 3389) OR dstport IN (22, 3389))ORDER BY p_event_time ASCLIMIT 100
During an investigation often particular IP addresses are identified as being of interest (e.g, a known command and control node). Once the role of an IP address is identified, isolating and explaining that activity is of interest. This can indicate which resources are likely to be compromised.
-- AthenaSELECT*FROM panther_logs.aws_vpcflowWHERE year=2020 AND month=1 AND day=31AND contains(p_any_ip_addresses, '1.2.3.4')ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECT*FROM panther_logs.public.aws_vpcflowWHERE p_event_time BETWEEN '2020-01-31 00:00:00' AND '2020-01-31 23:59:59'AND array_contains('1.2.3.4'::variant, p_any_ip_addresses)ORDER BY p_event_time ASCLIMIT 100
If there are concerns of a credential breach, then accounting for all AWS console activity is of critical importance. This query will find all the CloudTrail sourceIPaddresses involved in console signins and then return all the VPC Flow activity related. This will show if there are common IP addresses. Of particular interest are IP addresses outside of your organization communicating with the instances as well as logging into the console. This may indicate a compromise where an unauthorized actor is accessing account resources.
-- AthenaWITH cloudTrailIPs as(SELECTDISTINCT sourceIPAddress AS ipFROM panther_logs.aws_cloudtrailWHEREyear=2020 AND month=2 AND day=1ANDeventtype = 'AwsConsoleSignIn')SELECT*FROM cloudTrailIPs ips JOIN panther_logs.aws_vpcflow flow ON (ips.ip = flow.srcaddr OR ips.ip = flow.dstaddr)WHEREyear=2020 AND month=2 AND day=1ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeWITH cloudTrailIPs as(SELECTDISTINCT sourceIPAddress AS ipFROM panther_logs.public.aws_cloudtrailWHEREp_event_time BETWEEN '2020-02-01 00:00:00' AND '2020-02-01 23:59:59'ANDeventtype = 'AwsConsoleSignIn')SELECT*FROM cloudTrailIPs ips JOIN panther_logs.public.aws_vpcflow flow ON (ips.ip = flow.srcaddr OR ips.ip = flow.dstaddr)WHEREp_event_time BETWEEN '2020-02-01 00:00:00' AND '2020-02-01 23:59:59'ORDER BY p_event_time ASCLIMIT 100
The root account should almost never sign into the AWS console; find all such signins.
-- AthenaSELECT*FROM panther_logs.aws_cloudtrailWHEREyear=2020 AND month=1 AND day=23ANDeventtype = 'AwsConsoleSignIn'ANDuseridentity.arn LIKE '%root%'ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECT*FROM panther_logs.public.aws_cloudtrailWHEREp_event_time BETWEEN '2020-01-31 00:00:00' AND '2020-01-31 23:59:59'ANDeventtype = 'AwsConsoleSignIn'ANDuseridentity:arn LIKE '%root%'ORDER BY p_event_time ASCLIMIT 100
This query is similar to the above query, with the IP addresses ranked for all console logins. This helps identify which IP addresses are signing into the console while ranking the relative activity. This can often highlight anomalous behaviors.
-- AthenaSELECTsourceipaddress,count(1) as total_rowsFROM panther_logs.aws_cloudtrailWHEREyear=2020 AND month=1 AND day=23ANDeventtype = 'AwsConsoleSignIn'GROUP BY sourceipaddressORDER BY total_rows DESCLIMIT 100
-- SnowflakeSELECTsourceipaddress,count(1) as total_rowsFROM panther_logs.public.aws_cloudtrailWHEREp_event_time BETWEEN '2020-01-31 00:00:00' AND '2020-01-31 23:59:59'ANDeventtype = 'AwsConsoleSignIn'GROUP BY sourceipaddressORDER BY total_rows DESCLIMIT 100
During an investigation a particular instance may become the focus. For example, if it is compromised. This query uses the the Panther field p_any_aws_instance_ids
to easily search over all CloudTrail events for any related activity.
-- AthenaSELECT*FROM panther_logs.aws_cloudtrailWHERE year=2020 AND month=1AND contains(p_any_aws_instance_ids, 'i-0c4f541ef2f82481c')ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECT*FROM panther_logs.public.aws_cloudtrailWHERE p_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'AND array_contains('i-0c4f541ef2f82481c'::variant, p_any_aws_instance_ids)ORDER BY p_event_time ASCLIMIT 100
Similar to the above query, the Panther field p_any_aws_arns
can be used to quickly and easily find all CloudTrail activity related to an ARN of interest (perhaps an ARN of role known to be compromised).
-- AthenaSELECT*FROM panther_logs.aws_cloudtrailWHERE year=2020 AND month=1AND contains(p_any_aws_arns, 'arn:aws:iam::123456789012:role/SomeRole')ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECT*FROM panther_logs.public.aws_cloudtrailWHERE p_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'AND array_contains('arn:aws:iam::123456789012:role/SomeRole'::variant, p_any_aws_arns)ORDER BY p_event_time ASCLIMIT 100
This is another variation of using a Panther field to broadly query. In this case finding all CloudTrail data related to an account of interest using p_any_aws_account_ids
(perhaps the account is compromised and the concern is lateral movement).
-- AthenaSELECT*FROM panther_logs.aws_cloudtrailWHERE year=2020 AND month=1AND contains(p_any_aws_account_ids, '123456789012')ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECT*FROM panther_logs.public.aws_cloudtrailWHERE p_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'AND array_contains('123456789012'::variant, p_any_aws_account_ids)ORDER BY p_event_time ASCLIMIT 100
Often when credentials have been breached, there is concern about an actor creating or modifying infrastructure. The below query finds all RunInstances commands. These should be reviewed for anomalous activity. For example, actors have been known to spin-up large numbers of GPU instances for bitcoin mining in compromised accounts.
-- AthenaSELECTp_event_time,p_any_aws_instance_idsFROM panther_logs.aws_cloudtrailWHERE year=2020 AND month=1AND eventname = 'RunInstances'ORDER BY p_event_time ASCLIMIT 100
-- SnowflakeSELECTp_event_time,p_any_aws_instance_idsFROM panther_logs.public.aws_cloudtrailWHERE p_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'AND eventname = 'RunInstances'ORDER BY p_event_time ASCLIMIT 100
GuardDuty is a valuable source of visibility into threats against your infrastructure. However, it can produce a large number of findings. This query shows the distribution of findings which be used to assess the posture of an account.
-- AthenaSELECTseverity,count(1) AS total_rowsFROM panther_logs.aws_guarddutyWHERE year=2020 AND month=1GROUP BY severityORDER BY total_rows DESC
-- SnowflakeSELECTseverity,count(1) AS total_rowsFROM panther_logs.public.aws_guarddutyWHERE p_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'GROUP BY severityORDER BY total_rows DESC
Similar to the above example, but in this example the query characterizes the findings by ranking affected resources.
-- AthenaSELECTjson_extract(resource, '$.resourcetype') AS resource_type,count(1) AS total_rowsFROM panther_logs.aws_guarddutyWHERE year=2020 AND month=1GROUP BY json_extract(resource, '$.resourcetype')ORDER BY total_rows DESC
-- SnowflakeSELECTresource:resourceType AS resource_type,count(1) AS total_rowsFROM panther_logs.public.aws_guarddutyWHERE p_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'GROUP BY resource:resourceTypeORDER BY total_rows DESC
The misconfiguration of S3 buckets is a major threat vector. If an open bucket is detected that was not intended to be world readable, it is of critical importance to understand if there were any inappropriate accesses. This query will collect and rank all IP addresses accessing the bucket of interest. These should be reviewed to determine if any are outside your organization (if so, you may have had a data leak).
-- AthenaSELECTremoteip,count(1) AS total_rowsFROM panther_logs.aws_s3serveraccessWHEREyear=2020 AND month=1ANDbucket='somebucket'GROUP BY remoteipORDER BY total_rows DESC
-- SnowflakeSELECTremoteip,count(1) AS total_rowsFROM panther_logs.public.aws_s3serveraccessWHEREp_event_time BETWEEN '2020-01-01 00:00:00' AND '2020-01-31 23:59:59'ANDbucket='somebucket'GROUP BY remoteipORDER BY total_rows DESC
This query will characterize activity by UserAgent over ALB and Nginx logs. This can be useful in an investigation, if an actor has a known set of characteristic UserAgents.
-- AthenaSELECTuseragent,sum(row_count) AS total_rowsFROM (​SELECTuseragent,count(1) AS row_countFROM panther_logs.aws_albWHERE year=2020 AND month=1 AND day=31GROUP BY useragent​UNION ALL​SELECThttpuseragent AS useragent,count(1) AS row_countFROM panther_logs.nginx_accessWHERE year=2020 AND month=1 AND day=31GROUP BY httpuseragent)GROUP BY useragentORDER BY total_rows DESC
-- SnowflakeSELECTuseragent,sum(row_count) AS total_rowsFROM (​SELECTuseragent,count(1) AS row_countFROM panther_logs.public.aws_albWHERE p_event_time BETWEEN '2020-01-31 00:00:00' AND '2020-01-31 23:59:59'GROUP BY useragent​UNION ALL​SELECThttpuseragent AS useragent,count(1) AS row_countFROM panther_logs.public.nginx_accessWHERE p_event_time BETWEEN '2020-01-31 00:00:00' AND '2020-01-31 23:59:59'GROUP BY httpuseragent)GROUP BY useragentORDER BY total_rows DESC