SQL For Threat Hunters
SQL Basics for Threat Hunters
This module introduces SQL concepts in the context of Amazon Athena, focusing on querying security logs stored in S3. You’ll learn SQL syntax, how to work with various data types, and best practices for building effective queries to detect suspicious activity. By the end of this module, you’ll be able to run essential queries for threat hunting.
Introduction to SQL Syntax in Athena
Athena supports standard SQL for querying data. Below are the core SQL components you need to know:
SELECT: Retrieves data from a table.
FROM: Specifies the table or dataset.
WHERE: Filters rows based on conditions.
GROUP BY: Groups rows with the same values.
ORDER BY: Sorts the result set.
LIMIT: Restricts the number of returned rows.
Basic Query Structure:
SELECT column1, column2
FROM table_name
WHERE condition
GROUP BY column
ORDER BY column DESC
LIMIT 10;
Working with Athena Data Types
Athena can query data in structured (CSV), semi-structured (JSON), and columnar formats (Parquet). Threat hunters frequently encounter JSON objects, arrays, and nested structures.
Common Data Types in Athena:
STRING: Text values
BIGINT: Large integers
ARRAY: A list of values
STRUCT: Nested objects (common in JSON logs)
Example: Accessing Nested JSON Data
CloudTrail logs often store user identity information in a nested structure:
SELECT userIdentity.userName, eventName, eventTime
FROM cloudtrail_logs
WHERE userIdentity.type = 'IAMUser';
Writing Queries for Threat Hunting Use Cases
Here are practical SQL queries tailored for threat hunting using CloudTrail logs, VPC Flow Logs, and GuardDuty findings.
Query 1: Detect Failed Console Logins
Failed logins can indicate brute-force attempts or misconfigurations.
SELECT eventTime, userIdentity.userName, errorMessage
FROM cloudtrail_logs
WHERE eventName = 'ConsoleLogin'
AND errorCode IS NOT NULL
ORDER BY eventTime DESC
LIMIT 10;
Explanation:
Filters CloudTrail logs to show failed login attempts.
Orders the results by the most recent events.
Query 2: Track Large Outbound Data Transfers
Monitoring large data transfers is critical for detecting data exfiltration.
SELECT srcAddr, dstAddr, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
GROUP BY srcAddr, dstAddr
HAVING SUM(bytes) > 1000000
ORDER BY total_bytes DESC;
Explanation:
Groups data by source and destination IPs and sums the bytes transferred.
Filters to show only large data transfers (greater than 1 MB).
Query 3: Identify High-Severity GuardDuty Alerts
Quickly identify the most severe alerts for incident response.
sqlCopy codeSELECT eventType, severity, description, count(*) AS alert_count
FROM guardduty_logs
WHERE severity >= 7
GROUP BY eventType, severity, description
ORDER BY severity DESC;
Explanation:
Retrieves high-severity alerts from GuardDuty.
Groups and counts alerts based on type and description.
Handling Arrays and Structs in Athena
When analyzing nested data, such as in CloudTrail or GuardDuty logs, it’s common to work with arrays and structs. Below is an example of how to query nested data.
Example: Extracting Information from Structs
SELECT userIdentity.arn, eventSource, eventTime
FROM cloudtrail_logs
WHERE userIdentity.type = 'AssumedRole';
Explanation:
Queries the user ARN for actions performed by assumed roles.
Optimizing Athena Queries for Large Datasets
When working with massive datasets, query optimization is essential to reduce costs and improve performance.
Optimization Techniques:
Use SELECT Specific Columns Avoid using
SELECT *
to reduce the amount of data scanned.sqlCopy codeSELECT userName, eventName FROM cloudtrail_logs WHERE eventName = 'CreateUser';
Use WHERE Clauses Filter data as early as possible using
WHERE
clauses.Partitioning Data Partition logs by time to make queries faster (e.g., year/month/day).
Example query using partition:
SELECT eventTime, eventName FROM cloudtrail_logs WHERE year = 2024 AND month = 10 AND day = 14;
Limit Rows Returned Use
LIMIT
to reduce the number of rows returned for quick inspections.
Creating Saved Queries for Reusability
In threat hunting, it’s common to reuse queries. You can save queries in Athena for quick access during investigations.
Steps to Save a Query:
Run a query in the Athena Console.
Click Save and give the query a meaningful name (e.g., “Failed Console Logins”).
Reuse the query whenever needed without rewriting SQL.
Practical Exercise: Building a Query from Scratch
Scenario:
You suspect an insider is trying to access resources they shouldn’t. Build a query to:
Find all actions taken by IAM users.
Focus on high-privilege operations, such as
CreateUser
.
SELECT userIdentity.userName, eventName, eventTime
FROM cloudtrail_logs
WHERE userIdentity.type = 'IAMUser'
AND eventName = 'CreateUser';
Last updated