Setting Up Amazon Athena for Threat Hunting

In this module, we'll cover the end-to-end setup of Amazon Athena for threat hunting, including preparing S3 buckets, configuring Athena workgroups, and integrating the Glue Data Catalog. A careful setup keeps log data easy to locate, keeps queries fast and inexpensive by limiting the data they scan, and keeps access to the logs restricted. This module includes step-by-step instructions and best practices to help you get started.

Configuring Amazon S3 for Log Storage

Since Athena queries data from S3, organizing and securing your S3 buckets is critical. Here’s how you can prepare your S3 buckets for log storage.

Step 1: Creating S3 Buckets for Log Storage

  1. Open the S3 Console and create a new bucket (e.g., security-logs-bucket).

  2. Choose the Region where your logs are produced (e.g., us-east-1), and run Athena in the same Region to avoid cross-Region transfer latency and cost.

  3. Enable versioning (to track changes in logs) and server-side encryption with SSE-S3 or SSE-KMS.
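
If you prefer to script step 3 rather than click through the console, the settings can be expressed as request parameters. The following is a minimal sketch, not a definitive implementation: the helper name bucket_hardening_params is hypothetical, and the dictionaries follow the request shapes of the S3 PutBucketVersioning and PutBucketEncryption APIs. Nothing is sent to AWS here.

```python
import json


def bucket_hardening_params(bucket, kms_key_id=None):
    """Build parameters for enabling versioning and default encryption.

    Hypothetical helper: it only builds dictionaries and makes no AWS calls.
    """
    if kms_key_id:
        # SSE-KMS with a caller-supplied key
        default_sse = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    else:
        # SSE-S3 (S3-managed keys)
        default_sse = {"SSEAlgorithm": "AES256"}
    return {
        "versioning": {
            "Bucket": bucket,
            "VersioningConfiguration": {"Status": "Enabled"},
        },
        "encryption": {
            "Bucket": bucket,
            "ServerSideEncryptionConfiguration": {
                "Rules": [{"ApplyServerSideEncryptionByDefault": default_sse}]
            },
        },
    }


params = bucket_hardening_params("security-logs-bucket")
print(json.dumps(params, indent=2))
```

Assuming boto3 is installed and credentials are configured, these dictionaries could be passed as keyword arguments to the S3 client's put_bucket_versioning and put_bucket_encryption methods.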

Best Practices for Log Organization:

Organize logs by service and date to make queries faster and easier to manage:

/cloudtrail/year=2024/month=10/day=14/
/vpcflowlogs/year=2024/month=10/day=14/
/guardduty/year=2024/month=10/day=14/
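
The layout above can be generated consistently by whatever process delivers the logs. Here is a small sketch, assuming a hypothetical helper named log_prefix. It returns the key prefix without a leading slash, since S3 object keys normally do not start with one.

```python
from datetime import date


def log_prefix(service, day):
    """Return the service/year=/month=/day= key prefix for one day of logs."""
    return f"{service}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


print(log_prefix("cloudtrail", date(2024, 10, 14)))
```

Zero-padding the month and day keeps prefixes sorted in date order and keeps every path the same length, which makes date-range filters easier to write.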

Step 2: Configuring Bucket Policies for Security

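Use an S3 bucket policy to restrict access to only authorized users and services. Athena reads S3 with the permissions of the IAM principal that runs the query, not with a service principal of its own, so grant access to the analysts' IAM role. Besides s3:GetObject, that role needs s3:ListBucket and s3:GetBucketLocation on the bucket itself. Example read-only policy (the account ID and role name are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/ThreatHuntingAnalyst" },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::security-logs-bucket/*"
    },
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/ThreatHuntingAnalyst" },
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::security-logs-bucket"
    }
  ]
}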
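
Before attaching a bucket policy, it can help to check it for statements that are broader than intended. The sketch below is one possible check, not an exhaustive policy analyzer: the function audit_bucket_policy and the READ_ONLY_ACTIONS allow-list are assumptions made for illustration. It flags Allow statements that are open to any principal or that grant actions outside the read-only set.

```python
# Hypothetical allow-list of the S3 actions a read-only log consumer needs.
READ_ONLY_ACTIONS = {"s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"}


def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def audit_bucket_policy(policy):
    """Return a list of findings for overly broad Allow statements."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if isinstance(principal, str):
            principals = [principal]
        else:
            principals = [
                p for value in (principal or {}).values() for p in _as_list(value)
            ]
        if "*" in principals:
            findings.append(f"statement {index}: allows any principal")
        extra = [a for a in _as_list(stmt.get("Action", [])) if a not in READ_ONLY_ACTIONS]
        if extra:
            findings.append(f"statement {index}: non-read-only actions {extra}")
    return findings


too_broad = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": "*", "Action": "s3:*",
         "Resource": "arn:aws:s3:::security-logs-bucket/*"}
    ],
}
for finding in audit_bucket_policy(too_broad):
    print(finding)
```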

Setting Up Athena Workgroups for Threat Hunting

Workgroups in Athena allow you to organize queries, manage costs, and monitor usage.

Step 1: Creating a Workgroup

  1. In the Athena Console, go to Workgroups and create a new one (e.g., ThreatHuntingGroup).

  2. Specify an S3 location to store query results (e.g., s3://security-logs-bucket/results/).

Step 2: Configuring Workgroup Settings

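  • Enforce Encryption: Turn on encryption of query results in the workgroup (SSE-S3, SSE-KMS, or CSE-KMS) and enable "Override client-side settings" so individual queries cannot opt out.

  • Track Costs: Publish query metrics to Amazon CloudWatch to monitor usage such as the amount of data scanned, and consider setting a per-query data scan limit to cap costs.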

  • Limit Access: Use IAM policies to restrict who can run queries within the workgroup.
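
The workgroup and its settings can also be captured as a single request. This is a sketch under stated assumptions: the helper workgroup_request is hypothetical, and the dictionary follows the shape of the Configuration parameter of Athena's CreateWorkGroup API. The 10 GiB scan limit is only an example value.

```python
def workgroup_request(name, results_location, bytes_cutoff=None):
    """Build a CreateWorkGroup-style request; makes no AWS calls."""
    config = {
        "ResultConfiguration": {
            "OutputLocation": results_location,
            # Encrypt query results with S3-managed keys
            "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
        },
        # Workgroup settings override whatever the client asks for
        "EnforceWorkGroupConfiguration": True,
        # Publish per-query metrics to CloudWatch
        "PublishCloudWatchMetricsEnabled": True,
    }
    if bytes_cutoff is not None:
        # Cancel any query that scans more than this many bytes
        config["BytesScannedCutoffPerQuery"] = bytes_cutoff
    return {"Name": name, "Configuration": config}


request = workgroup_request(
    "ThreatHuntingGroup",
    "s3://security-logs-bucket/results/",
    bytes_cutoff=10 * 1024**3,
)
print(request)
```

Assuming boto3 is available, the result could be passed to the Athena client's create_work_group method as keyword arguments.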

Using AWS Glue Data Catalog to Manage Metadata

Athena relies on schemas and tables to query data effectively. The AWS Glue Data Catalog helps manage this metadata, so Athena knows how to read your logs.

Step 1: Creating a Glue Crawler

  1. Open the AWS Glue Console and go to Crawlers.

  2. Create a new crawler and point it to the S3 prefix that contains your logs (e.g., s3://security-logs-bucket/cloudtrail/).

  3. Choose an IAM role with permission to read from S3 and create tables in the Data Catalog.
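
The same crawler definition can be written down as a request for the Glue CreateCrawler API. The sketch below only builds the request. The helper name crawler_request, the crawler name, the database name security_logs, and the role ARN are all placeholders chosen for illustration.

```python
def crawler_request(name, role_arn, database, s3_paths):
    """Build a CreateCrawler-style request with one S3 target per log prefix."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }


crawler = crawler_request(
    name="security-logs-crawler",                                  # placeholder
    role_arn="arn:aws:iam::111122223333:role/GlueCrawlerRole",     # placeholder
    database="security_logs",                                      # placeholder
    s3_paths=[
        "s3://security-logs-bucket/vpcflowlogs/",
        "s3://security-logs-bucket/guardduty/",
    ],
)
print(crawler)
```

Pointing each target at a log prefix rather than the whole bucket keeps unrelated objects, such as Athena query results, out of the catalog.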

Step 2: Running the Crawler to Create Tables

  1. Run the crawler to scan and detect schemas in the logs (e.g., JSON, Parquet).

  2. The crawler will automatically create tables in the Glue Data Catalog based on the log structure, and Athena can query those tables directly.

Example: CloudTrail Logs Table Schema

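-- CloudTrail files wrap events in a top-level "Records" array, so the table
-- needs the CloudTrail SerDe and input format instead of the default text format.
CREATE EXTERNAL TABLE cloudtrail_logs (
  eventTime string,
  userIdentity struct<userName: string, type: string>,
  eventName string,
  eventSource string,
  errorCode string
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://security-logs-bucket/cloudtrail/';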

Querying Log Data with Athena

Once your S3 buckets, workgroups, and Glue catalog are set up, you can start querying log data using Athena.

Example Query 1: Identify Failed Logins in CloudTrail Logs

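-- Failed console sign-ins usually carry errorMessage = 'Failed authentication';
-- if errorCode is empty in your data, add errorMessage to the table and filter on it.
SELECT eventTime, userIdentity.userName, eventName, errorCode
FROM cloudtrail_logs
WHERE eventName = 'ConsoleLogin'
  AND errorCode IS NOT NULL;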

Example Query 2: Detect Large Data Transfers in VPC Flow Logs

SELECT srcAddr, dstAddr, bytes 
FROM vpc_flow_logs 
WHERE action = 'ACCEPT' AND bytes > 1000000;
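
When queries like these are run on a schedule rather than typed into the console, they are usually submitted through the Athena API. Here is a minimal sketch, assuming hypothetical helpers large_transfer_sql and query_request and a placeholder database named security_logs. The request dictionary follows the shape of Athena's StartQueryExecution API, and nothing is sent to AWS.

```python
def large_transfer_sql(min_bytes):
    """Return the VPC Flow Logs query with a caller-supplied byte threshold."""
    # Coerce to int so that only a number can end up in the SQL text.
    threshold = int(min_bytes)
    return (
        "SELECT srcAddr, dstAddr, bytes "
        "FROM vpc_flow_logs "
        f"WHERE action = 'ACCEPT' AND bytes > {threshold}"
    )


def query_request(sql, database, workgroup):
    """Build a StartQueryExecution-style request for the given workgroup."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
    }


hunt = query_request(
    large_transfer_sql(1_000_000),
    database="security_logs",  # placeholder database name
    workgroup="ThreatHuntingGroup",
)
print(hunt["QueryString"])
```

Running the query inside the workgroup means the result location and encryption settings configured earlier apply without being repeated in each request.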

Securing Athena Queries and Data

Security is paramount in threat hunting. Follow these steps to secure your queries and protect your data:

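  1. IAM Role Management: Grant users only the Athena actions they need to run queries and read results, scoped to the threat hunting workgroup. They also need read access to the log bucket and the Glue Data Catalog, and write access to the query results location. Example IAM policy (the account ID is a placeholder):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "athena:StartQueryExecution",
            "athena:GetQueryExecution",
            "athena:GetQueryResults",
            "athena:StopQueryExecution"
          ],
          "Resource": "arn:aws:athena:us-east-1:111122223333:workgroup/ThreatHuntingGroup"
        }
      ]
    }

  2. Enable Query Encryption: Ensure that all Athena query results are encrypted when stored in S3 by enabling encryption in the workgroup settings.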

  3. Enable Logging for Queries: Use CloudTrail to track Athena query executions. This helps in auditing who accessed what data.
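
To illustrate the auditing in step 3, the sketch below filters CloudTrail event records for Athena query submissions. It is an assumption-laden example: the function name athena_query_events is hypothetical, the sample record is hand-written rather than real log data, and it assumes the query text appears under requestParameters.queryString, which you should confirm against your own events.

```python
def athena_query_events(records):
    """Yield (time, caller ARN, query text) for Athena StartQueryExecution events."""
    for record in records:
        if record.get("eventSource") != "athena.amazonaws.com":
            continue
        if record.get("eventName") != "StartQueryExecution":
            continue
        params = record.get("requestParameters") or {}
        identity = record.get("userIdentity") or {}
        yield (record.get("eventTime"), identity.get("arn"), params.get("queryString"))


# Hand-written sample records, for illustration only.
sample_records = [
    {
        "eventTime": "2024-10-14T12:00:00Z",
        "eventSource": "athena.amazonaws.com",
        "eventName": "StartQueryExecution",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/analyst"},
        "requestParameters": {"queryString": "SELECT 1"},
    },
    {
        "eventTime": "2024-10-14T12:05:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/analyst"},
        "requestParameters": None,
    },
]

audit_trail = list(athena_query_events(sample_records))
for when, who, query in audit_trail:
    print(when, who, query)
```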

Testing Your Setup

To ensure everything is configured correctly:

  1. Run a basic query on your CloudTrail logs to ensure Athena can read from S3.

  2. Verify that query results are stored in the correct S3 bucket.

  3. Check CloudTrail logs to confirm query activity is being tracked.
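
The first two checks can be automated by inspecting the status that Athena reports for a test query. The following is a sketch rather than a complete test harness: the function check_query_execution is hypothetical, and the sample response is hand-written in the shape returned by Athena's GetQueryExecution API.

```python
def check_query_execution(response, expected_prefix):
    """Return a list of problems found in a GetQueryExecution-style response."""
    execution = response["QueryExecution"]
    status = execution["Status"]
    output = execution.get("ResultConfiguration", {}).get("OutputLocation", "")
    problems = []
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        problems.append(f"query state is {status['State']}: {reason}")
    if not output.startswith(expected_prefix):
        problems.append(
            f"results written to {output!r}, expected prefix {expected_prefix!r}"
        )
    return problems


# Hand-written sample response, for illustration only.
sample_response = {
    "QueryExecution": {
        "Status": {"State": "SUCCEEDED"},
        "ResultConfiguration": {
            "OutputLocation": "s3://security-logs-bucket/results/abc123.csv"
        },
    }
}

problems = check_query_execution(sample_response, "s3://security-logs-bucket/results/")
print(problems or "setup looks good")
```

An empty list means the query succeeded and its results landed under the expected prefix. The third check, confirming that CloudTrail recorded the query, still needs to be done against the CloudTrail logs themselves.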

Troubleshooting Common Issues

  1. Athena Can't Access S3 Bucket:

    • Check the S3 bucket policy to ensure the IAM user or role running the query has the necessary read permissions.

    • Verify that the same IAM principal also has S3 permissions in its own IAM policy, since Athena accesses S3 with the caller's credentials.

  2. Query Results Not Showing Up in S3:

    • Confirm that the S3 result location is correctly configured in the workgroup.

    • Ensure the query completed successfully without errors.

  3. Tables Not Appearing in Athena:

    • Check the Glue Crawler logs to ensure it ran successfully and detected the schemas.

    • Verify the correct IAM role is assigned to the crawler.
