SQL For Threat Hunters

SQL Basics for Threat Hunters

This module introduces SQL concepts in the context of Amazon Athena, focusing on querying security logs stored in S3. You’ll learn SQL syntax, how to work with various data types, and best practices for building effective queries to detect suspicious activity. By the end of this module, you’ll be able to run essential queries for threat hunting.

Introduction to SQL Syntax in Athena

Athena supports standard SQL for querying data. Below are the core SQL components you need to know:

  1. SELECT: Retrieves data from a table.

  2. FROM: Specifies the table or dataset.

  3. WHERE: Filters rows based on conditions.

  4. GROUP BY: Groups rows with the same values.

  5. ORDER BY: Sorts the result set.

  6. LIMIT: Restricts the number of returned rows.

Basic Query Structure:

SELECT column1, column2 
FROM table_name 
WHERE condition 
GROUP BY column 
ORDER BY column DESC 
LIMIT 10;

Working with Athena Data Types

Athena can query data in structured (CSV), semi-structured (JSON), and columnar formats (Parquet). Threat hunters frequently encounter JSON objects, arrays, and nested structures.

Common Data Types in Athena:

  • STRING: Text values

  • BIGINT: Large integers

  • ARRAY: A list of values

  • STRUCT: Nested objects (common in JSON logs)

Example: Accessing Nested JSON Data

CloudTrail logs often store user identity information in a nested structure:

SELECT userIdentity.userName, eventName, eventTime 
FROM cloudtrail_logs 
WHERE userIdentity.type = 'IAMUser';

Writing Queries for Threat Hunting Use Cases

Here are practical SQL queries tailored for threat hunting using CloudTrail logs, VPC Flow Logs, and GuardDuty findings.

Query 1: Detect Failed Console Logins

Failed logins can indicate brute-force attempts or misconfigurations.

SELECT eventTime, userIdentity.userName, errorMessage 
FROM cloudtrail_logs 
WHERE eventName = 'ConsoleLogin' 
  AND errorCode IS NOT NULL 
ORDER BY eventTime DESC 
LIMIT 10;

Explanation:

  • Filters CloudTrail logs to show failed login attempts.

  • Orders the results by the most recent events.

Query 2: Track Large Outbound Data Transfers

Monitoring large data transfers is critical for detecting data exfiltration.

SELECT srcAddr, dstAddr, SUM(bytes) AS total_bytes 
FROM vpc_flow_logs 
WHERE action = 'ACCEPT' 
GROUP BY srcAddr, dstAddr 
HAVING SUM(bytes) > 1000000 
ORDER BY total_bytes DESC;

Explanation:

  • Groups data by source and destination IPs and sums the bytes transferred.

  • Filters to show only large data transfers (greater than 1 MB).

Query 3: Identify High-Severity GuardDuty Alerts

Quickly identify the most severe alerts for incident response.

sqlCopy codeSELECT eventType, severity, description, count(*) AS alert_count 
FROM guardduty_logs 
WHERE severity >= 7 
GROUP BY eventType, severity, description 
ORDER BY severity DESC;

Explanation:

  • Retrieves high-severity alerts from GuardDuty.

  • Groups and counts alerts based on type and description.

Handling Arrays and Structs in Athena

When analyzing nested data, such as in CloudTrail or GuardDuty logs, it’s common to work with arrays and structs. Below is an example of how to query nested data.

Example: Extracting Information from Structs

SELECT userIdentity.arn, eventSource, eventTime 
FROM cloudtrail_logs 
WHERE userIdentity.type = 'AssumedRole';

Explanation:

  • Queries the user ARN for actions performed by assumed roles.

Optimizing Athena Queries for Large Datasets

When working with massive datasets, query optimization is essential to reduce costs and improve performance.

Optimization Techniques:

  1. Use SELECT Specific Columns Avoid using SELECT * to reduce the amount of data scanned.

    sqlCopy codeSELECT userName, eventName FROM cloudtrail_logs WHERE eventName = 'CreateUser';
  2. Use WHERE Clauses Filter data as early as possible using WHERE clauses.

  3. Partitioning Data Partition logs by time to make queries faster (e.g., year/month/day).

    Example query using partition:

    SELECT eventTime, eventName 
    FROM cloudtrail_logs 
    WHERE year = 2024 AND month = 10 AND day = 14;
  4. Limit Rows Returned Use LIMIT to reduce the number of rows returned for quick inspections.

Creating Saved Queries for Reusability

In threat hunting, it’s common to reuse queries. You can save queries in Athena for quick access during investigations.

Steps to Save a Query:

  1. Run a query in the Athena Console.

  2. Click Save and give the query a meaningful name (e.g., “Failed Console Logins”).

  3. Reuse the query whenever needed without rewriting SQL.

Practical Exercise: Building a Query from Scratch

Scenario:

You suspect an insider is trying to access resources they shouldn’t. Build a query to:

  • Find all actions taken by IAM users.

  • Focus on high-privilege operations, such as CreateUser.

SELECT userIdentity.userName, eventName, eventTime 
FROM cloudtrail_logs 
WHERE userIdentity.type = 'IAMUser' 
  AND eventName = 'CreateUser';

Last updated