Guides
Guides
Jul 2, 2023

AWS Athena: A Step-by-Step Guide for a Secure Setup

AWS Athena: A Step-by-Step Guide for a Secure Setup

Welcome to our hands-on guide on configuring read access for AWS Athena. For this guide, we will be using ELB log examples directly available on AWS Athena. Keep in mind, these steps can be adapted to suit your own dataset. If you're wondering why read access matters, revisit our blog post 'The Principle of Least Privilege, Read Access, and Why Ctrl+Z Can't Always Save You'. Let's dive in!


Here's the snapshot of what we are going to do:

- Create a Destination S3 Bucket: This is where Athena will write the query results.

- Setup Athena: Configure Athena to query ELB logs from a public AWS bucket.

- Create a Restricted IAM User: This user will have only read access to Athena.

- Configure Airdot Connection: Set up a new connection on Airdot Notebook to integrate with Athena

- Establish a Workgroup in Athena: Ensure the IAM user can write query results to a specific S3 bucket, and nothing more.

Step 1: Create a Destination S3 Bucket for Athena Query Results

1. Navigate to S3 Service: Log into your AWS Management Console and select S3 under Storage.

2. Create Destination Bucket: Click on 'Create bucket', give it a unique name, select the region, and use the default settings for everything else. Click 'Create'. This will be the bucket where Athena stores query results. Remember this bucket name, we will need it in Step 3.

Step 2: Setup Athena to Query Our ELB Logs

1. Navigate to Athena: In your AWS Management Console, select Athena under Analytics.


2. Set up Query Editor: Once in the Athena dashboard, ensure that the Query Editor tab is active. 

3. Create a Database: Copy and paste the following SQL statement in the Query Editor.

                    
CREATE DATABASE athena_test_db;
                        


4. Create a Table: In the Query Editor, you'll create a table that maps to your ELB logs. Copy and paste the following SQL statement and click 'Run query'.

                    
CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs (
    request_timestamp string, 
    elb_name string, 
    request_ip string, 
    request_port int, 
    backend_ip string, 
    backend_port int, 
    request_processing_time double, 
    backend_processing_time double, 
    client_response_time double, 
    elb_response_code string, 
    backend_response_code string, 
    received_bytes bigint, 
    sent_bytes bigint, 
    request_verb string, 
    url string, 
    protocol string, 
    user_agent string, 
    ssl_cipher string, 
    ssl_protocol string
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION 's3://athena-examples-us-west-2/elb/parquet/'
TBLPROPERTIES ('has_encrypted_data'='false');
                        


Note: this CREATE statement is specific to the example log file we've considered for this blog post. If you're using your own data, you'll need to adjust this statement to match the schema of your dataset. Replace 's3://athena-examples-us-west-2/elb/parquet/' with the S3 location of your data if needed, and run the query.

   


4. Check the Table: In the left-hand panel under 'Database: default', you should see your new 'elb_logs' table.


Now, you're all set to perform queries on your ELB logs data using Athena!


Step 3: Create a Restricted IAM User

1. Access IAM: In the AWS Management Console, select "IAM" under "Security, Identity, & Compliance."


2. Navigate to Users: On the IAM dashboard, choose the "Users" section.


3. Create a New User: Click the "Add user" button. 


4. Configure User Details: In the "Set user details" section, input a "User name" (e.g., AthenaReadOnlyUser), and check the "Provide user access to the AWS Management Console" and select "I want to create an IAM user" under User type.

 

5. Set Permissions: Click "Next: Permissions". In the "Set permissions" section, choose "Attach existing policies directly" and then "Create policy".

  

6. Create Custom Policy: A new tab will open for creating a policy. Choose the JSON tab and copy-paste the following policy, replacing "your_source_bucket" and "your_destination_bucket" with your own bucket names. Use the s3 bucket created in Step 1 as the destination bucket. For source bucket, use wildcard '*' to allow read access to all the buckets, or mention specific s3 buckets to restrict read access.

                    
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": [
				"athena:StartQueryExecution",
				"athena:GetQueryExecution",
				"athena:GetQueryResults",
				"athena:ListWorkGroups",
				"athena:GetWorkGroup"
			],
			"Resource": "*"
		},
		{
			"Effect": "Allow",
			"Action": "s3:GetObject",
			"Resource": [
				"arn:aws:s3:::your_source_bucket/*"
			]
		},
		{
			"Effect": "Allow",
			"Action": [
				"s3:PutObject",
				"s3:GetObject",
				"s3:AbortMultipartUpload",
				"s3:ListBucket"
			],
			"Resource": [
				"arn:aws:s3:::your_destination_bucket",
				"arn:aws:s3:::your_destination_bucket/*"
			]
		},
		{
			"Effect": "Allow",
			"Action": [
				"glue:GetTable",
				"glue:GetDatabases",
				"glue:GetTables"
			],
			"Resource": "*"
		}
	]
}
                        



7. Review and Create Policy: After entering the policy, click 'Review policy'. On the next page, assign a 'Name' to your policy, such as 'AthenaReadAccessPolicy', and provide a 'Description' that explains the policy, like 'This policy grants the user read access to Athena and specified S3 buckets'. After filling in these details, click 'Create policy'."

The "Name" and "Description" are used for identifying and understanding the policy later, so it's important to choose a name that reflects the policy's purpose and a description that details its effects.


8. Attach Policy to User: Navigate back to the "Add user" tab, refresh the policy list, and search for the newly created policy (AthenaReadAccessPolicy). Select the checkbox beside it and click "Next: Tags".


9. Add Tags (Optional): If you want, add tags to your user (e.g., Name: Role, Value: AthenaReadOnly). Click "Next: Review".


10. Review and Create User: Check the details, then click "Create user". AWS will then provide you with username and password. Save these safely and prefer downloading the .csv file for future reference.

11. Create Access Keys: To generate these, you need to select the newly created user (under users in IAM) and navigate to the "Security credentials" tab. Once there, find the "Access keys" section and click on "Create access keys". Select "other" as use case and click "Next". Provide description tag value (optional) and click on "Create access key".

AWS will then generate a new Access Key ID and Secret Access Key for the user. Make sure to save these credentials in a safe place as they are essential for programmatic access to AWS services. Prefer downloading .csv file for future reference.

You've now created a new IAM user with limited read access to your ELB logs in Athena and write access to a specified S3 bucket for query results!"

Step 4: Connect Athena with Airdot Notebook

Now that we have our Athena setup ready, let's connect it with Airdot Notebook. Follow these steps:

1. Open Airdot Notebook: Log into your Airdot account and click on "New Project".

2. Add New Connection: Look for the "Data Sources" option in the left navigation menu and select "Athena".

3. Fill Connection Details: In the 'Add a data connection' window, input the following details:

  General
  -Name: Enter a name for the connection. This can be any meaningful name like 'ELB Logs'.

  Database
  - Host: Enter the host address in the format 'athena.[region].amazonaws.com'. Replace '[region]' with the AWS region code where you've setup Athena (e.g., 'athena.ap-south-1.amazonaws.com').
  - Port: Enter the port number. This is usually '443' for Athena.


  S3 Output Path

  This is the S3 path to the destination bucket we created in Step 1 (e.g., 's3://airdot-io-athena-demo'). Ensure that the IAM user created in Step 3 has write access to this path.


  Workgroup

   If you've created a workgroup (see Step 5), enter its name here. This field can be left blank if no workgroup was created.

  Authentication
  AWS Access Key ID and AWS Secret Key: Input the credentials that were generated in Step 3.

   

4. Save Connection: Once you've filled in all the details, click on the 'Test Connection' button. Airdot Notebook will validate the details and establish the connection.

5. Query Athena from Airdot: Add a SQL cell and select the newly established connection "ELB Logs" from Source dropdown menu. Write a sample SQL query and hit run.

Now that you've successfully set up Athena and connected it to Airdot Notebook, it's time for the fun part 🥳: exploring your data! Feel free to manipulate, analyze, and visualize the ELB logs data, just as you would with any other dataset. The power of AWS Athena combined with the flexibility of Airdot Notebook opens up new possibilities and enhances your data analysis workflow. Exciting things are waiting to be discovered in your data. Happy querying and exploring 🚀


(Optional) Step 5: Establish a Workgroup in Athena

Workgroups in Athena are a way of separating users and their queries for better management and control. When a user runs a query, it is attributed to a specific workgroup, and Athena can use this information for things like access control and query-related metrics. This step is optional, but it can be handy for better Athena management and following the PoLP (Principle of Least Privilege).


Here are the steps:

1. Navigate to Workgroups: From the Athena home page, click on "Workgroups" located on the left-side menu.  

2. Create a New Workgroup: Click on "Create workgroup".  

3. Configure Workgroup Details: Fill in the "Workgroup name" (e.g., AthenaReadOnlyWorkgroup), provide an optional description, and click "Next".  

4. Set Workgroup Settings: In the "Workgroup settings" section, enter the output location in the "Query result location" field (e.g., s3://your-output-bucket). Ensure that "Enforce workgroup configuration" and "Publish CloudWatch Metrics" are checked for tighter control and monitoring. Click "Create workgroup".  


With the workgroup set up, your IAM user will be able to write query results to the specified S3 bucket without having access to alter other aspects of your AWS infrastructure. By doing so, we have adhered to the Principle of Least Privilege, granting the user just enough access to perform their job and nothing more.

That's it folks. Happy data hunting! 🧭