AWS – S3 Allow Access for Organization Members

To allow read access to the S3 bucket for all member accounts of the organization, the following policy must be applied to the bucket:

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "AllowOrganizationToReadBucket",
    "Effect": "Allow",
    "Principal": "*",
    "Action": [
      "s3:GetObject",
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::stackset-lambdas",
      "arn:aws:s3:::stackset-lambdas/*"
    ],
    "Condition": {
      "StringEquals": {"aws:PrincipalOrgID":["o-xxxxxxxxxx"]}
    }
  }
}

 

Where "stackset-lambdas" is the S3 Bucket name and "o-xxxxxxxxxx" is your Organization ID.
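
Once the policy is in place, any principal from an account inside the organization should be able to read the bucket, for example via the AWS CLI (the object name here is just a placeholder):

aws s3 ls s3://stackset-lambdas/
aws s3 cp s3://stackset-lambdas/function.zip .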

Lambda – Stop EC2 instances, RDS instances, and scale down ASGs in all regions

This Python script gets a list of all regions, finds EC2 instances, RDS instances, and ASGs in them, and if the resource does not have a "prevent_stop" tag equal to "true", it stops that resource; in the case of an ASG, it scales it down to 0.

main.py:

import boto3

custom_ec2_filter = [
    {
        'Name': 'instance-state-name',
        'Values': ['running', 'pending']
    }
]

# Get list of EC2 regions
ec2 = boto3.client('ec2')
regions = [region['RegionName'] for region in ec2.describe_regions()['Regions']]

def main(event, context):
    for region in regions:
        print("Region: " + str(region))
        # ASG
        print("Searching for ASG...")
        asg = boto3.client('autoscaling', region_name = region)
        asg_list = [asg_name['AutoScalingGroupName'] for asg_name in asg.describe_auto_scaling_groups()['AutoScalingGroups']]
        for asg_name in asg_list:
            api_response = asg.describe_auto_scaling_groups(
                AutoScalingGroupNames = [
                    asg_name,
                ]
            )
            tags = (api_response["AutoScalingGroups"][0]["Tags"])
            asg_scaledown = False 
            if 'prevent_stop' not in [tag['Key'] for tag in tags]:
                asg_scaledown = True
            else:
                for tag in tags:
                    if tag["Key"] == 'prevent_stop' and tag["Value"] != 'true':
                        asg_scaledown = True
            if asg_scaledown == True:
                minSize = (api_response["AutoScalingGroups"][0]["MinSize"])
                maxSize = (api_response["AutoScalingGroups"][0]["MaxSize"])
                desiredCapacity = (api_response["AutoScalingGroups"][0]["DesiredCapacity"])
                if minSize != 0 or maxSize != 0 or desiredCapacity != 0:
                    print("Scaling down ASG: " + str(asg_name))
                    asg.update_auto_scaling_group(
                        AutoScalingGroupName = asg_name,
                        MinSize = 0,
                        MaxSize = 0,
                        DesiredCapacity = 0
                    )
        # EC2 Instances
        print("Searching for running instances...")
        ec2 = boto3.resource('ec2', region_name = region)
        instances_list = ec2.instances.filter(Filters = custom_ec2_filter)
        for instance in instances_list:
            # instance.tags is None when the instance has no tags at all
            tags = instance.tags or []
            if 'prevent_stop' not in [tag['Key'] for tag in tags]:
                print("Stopping instance: " + str(instance.id))
                instance.stop()
            else:
                for tag in tags:
                    if tag["Key"] == 'prevent_stop' and tag["Value"] != 'true':
                        print("Stopping instance: " + str(instance.id))
                        instance.stop()
        # RDS
        print("Searching for running RDS instances...")
        rds = boto3.client('rds', region_name = region)
        rds_list = [instance['DBInstanceIdentifier'] for instance in rds.describe_db_instances()['DBInstances']]
        for instance in rds_list:
            arn = rds.describe_db_instances(DBInstanceIdentifier = instance)['DBInstances'][0]['DBInstanceArn']
            tags = rds.list_tags_for_resource(ResourceName = arn)['TagList']
            stop_instance = False
            if 'prevent_stop' not in [tag['Key'] for tag in tags]:
                stop_instance = True
            else:
                for tag in tags:
                    if tag["Key"] == 'prevent_stop' and tag["Value"] != 'true':
                        stop_instance = True
            if stop_instance == True:
                instance_status = rds.describe_db_instances(DBInstanceIdentifier = instance)['DBInstances'][0]['DBInstanceStatus']
                if instance_status != "stopping" and instance_status != "stopped":
                    engine = rds.describe_db_instances(DBInstanceIdentifier = instance)['DBInstances'][0]['Engine']
                    print("Engine: " + str(engine))
                    if engine.startswith('aurora') == True:
                        cluster_id = rds.describe_db_instances(DBInstanceIdentifier = instance)['DBInstances'][0]['DBClusterIdentifier']
                        print("Stopping Aurora cluster: " + str(cluster_id))
                        rds.stop_db_cluster(DBClusterIdentifier = cluster_id)
                    else:
                        print("Stopping RDS instance: " + str(instance))
                        rds.stop_db_instance(DBInstanceIdentifier = instance)
        print("----------------------------------------")

if __name__ == '__main__':
    # local run; in Lambda the event and context are passed in by the runtime
    main(None, None)

 

List of required permissions to run (besides the "AWSLambdaExecute" policy); a sample IAM policy covering them is shown after the list:

  • ec2:DescribeRegions
  • ec2:DescribeInstances
  • ec2:StopInstances
  • rds:ListTagsForResource
  • rds:DescribeDBInstances
  • rds:StopDBInstance
  • rds:StopDBCluster (to stop Aurora clusters)
  • autoscaling:DescribeAutoScalingGroups
  • autoscaling:UpdateAutoScalingGroup
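
A minimal IAM policy sketch covering these actions could look like the following (narrow the "Resource" element down if possible):

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "AllowStoppingResources",
    "Effect": "Allow",
    "Action": [
      "ec2:DescribeRegions",
      "ec2:DescribeInstances",
      "ec2:StopInstances",
      "rds:ListTagsForResource",
      "rds:DescribeDBInstances",
      "rds:StopDBInstance",
      "rds:StopDBCluster",
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:UpdateAutoScalingGroup"
    ],
    "Resource": "*"
  }
}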

PagerDuty – Python script for creating events

This Python script creates events in PagerDuty using APIv2.

The following script was taken as a basis.

First you need to create a "Routing Key", aka "Integration Key". It should not be confused with an "API Access Key", which can be used for any API call; we only need a key for a specific service.

Go to the service settings (in my case the service is called "AWS") and open the "Integrations" tab

 

Select – "Add a new integration"

 

Set the name, select "Integration Type" -> "Use our API directly" -> "Events API v2", and copy the "Integration Key"

main.py:

import requests
import json

from typing import Any, Dict, List, Optional

routing_key = 'abc123'
api_url = 'https://events.pagerduty.com/v2/enqueue'

def create_event(
    summary: str,
    severity: str,
    source: str = 'aws',
    action: str = 'trigger',
    dedup_key: Optional[str] = None,
    custom_details: Optional[Any] = None,
    group: Optional[str] = None,
    component: Optional[str] = None,
    class_type: Optional[str] = None,
    images: Optional[List[Any]] = None,
    links: Optional[List[Any]] = None
) -> Dict:
    payload = {
        "summary": summary,
        "severity": severity,
        "source": source,
    }
    if custom_details is not None:
        payload["custom_details"] = custom_details
    if component:
        payload["component"] = component
    if group:
        payload["group"] = group
    if class_type:
        payload["class"] = class_type

    actions = ('trigger', 'acknowledge', 'resolve')
    if action not in actions:
        raise ValueError("Event action must be one of: %s" % ', '.join(actions))
    data = {
        "event_action": action,
        "routing_key": routing_key,
        "payload": payload,
    }
    if dedup_key:
        data["dedup_key"] = dedup_key
    elif action != 'trigger':
        raise ValueError(
            "The dedup_key property is required for event_action=%s events, and it must \
            be a string."
            % action
        )
    if images is not None:
        data["images"] = images
    if links is not None:
        data["links"] = links

    response = requests.post(api_url, json = data)
    print(response.content)
    # return the parsed API response, as declared in the function signature
    return response.json()

def main():
    create_event(summary = "Instance is down", severity = 'critical', custom_details = "Instance ID: i-12345")
  
if __name__ == '__main__':
    main()

 

Change the value of the "routing_key" variable to your "Integration Key".

In this example, an event with the "critical" severity will be created with the title "Instance is down", and "Instance ID: i-12345" will be specified in the details.
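
Passing a "dedup_key" lets you resolve the same incident later; a small sketch (the key value is arbitrary, it just has to match between the two calls):

# trigger an alert with an explicit dedup_key...
create_event(summary = "Instance is down", severity = 'critical',
             custom_details = "Instance ID: i-12345", dedup_key = "i-12345-down")
# ...and resolve it later using the same dedup_key
create_event(summary = "Instance is down", severity = 'critical',
             action = 'resolve', dedup_key = "i-12345-down")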

The script can also be used to create events based on AWS CloudWatch Alarms without using the AWS PagerDuty integration. To do this, create an SNS topic, a Lambda function and specify the SNS topic as the source for the Lambda function. After that, you can create a CloudWatch Alarm and specify the SNS topic as an action when triggered.

main.py (Lambda + SNS):

import requests
import json

from typing import Any, Dict, List, Optional

routing_key = 'abc123'
api_url = 'https://events.pagerduty.com/v2/enqueue'

def get_message_info(event):
    subject = event["Records"][0]["Sns"]["Subject"]
    message = event["Records"][0]["Sns"]["Message"]
    return subject, message

def create_event(
    summary: str,
    severity: str,
    source: str = 'cloudwatch',
    action: str = 'trigger',
    dedup_key: Optional[str] = None,
    custom_details: Optional[Any] = None,
    group: Optional[str] = None,
    component: Optional[str] = None,
    class_type: Optional[str] = None,
    images: Optional[List[Any]] = None,
    links: Optional[List[Any]] = None
) -> Dict:
    payload = {
        "summary": summary,
        "severity": severity,
        "source": source,
    }
    if custom_details is not None:
        payload["custom_details"] = custom_details
    if component:
        payload["component"] = component
    if group:
        payload["group"] = group
    if class_type:
        payload["class"] = class_type

    actions = ('trigger', 'acknowledge', 'resolve')
    if action not in actions:
        raise ValueError("Event action must be one of: %s" % ', '.join(actions))
    data = {
        "event_action": action,
        "routing_key": routing_key,
        "payload": payload,
    }
    if dedup_key:
        data["dedup_key"] = dedup_key
    elif action != 'trigger':
        raise ValueError(
            "The dedup_key property is required for event_action=%s events, and it must \
            be a string."
            % action
        )
    if images is not None:
        data["images"] = images
    if links is not None:
        data["links"] = links

    response = requests.post(api_url, json = data)
    print(response.content)
    # return the parsed API response, as declared in the function signature
    return response.json()

def main(event, context):
    info = get_message_info(event)
    create_event(summary = info[0], severity = 'critical', custom_details = info[1])

if __name__ == '__main__':
    # local test with a minimal SNS-like event; in Lambda the real event is passed in
    test_event = {"Records": [{"Sns": {"Subject": "Test alarm", "Message": "Test message"}}]}
    main(test_event, None)

 

The Python package "requests" will have to be zipped together with the Lambda function, since the Lambda runtime does not include it. You can use "virtualenv" for this.
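
One possible way to build such a package without virtualenv is to install the dependency into a local directory with pip and zip it together with the code (file and directory names here are just an example):

pip install requests -t package/
cp main.py package/
cd package/ && zip -r ../lambda.zip . && cd ..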

AWS Transfer – Public FTP

AWS Transfer supports three protocols: SFTP, FTP, and FTPS. Only SFTP can have a public endpoint; FTP/FTPS can only run inside a VPC. Also, for login/password authentication you must use a custom identity provider; you can find more information about this here.

Goal:

Create an AWS Transfer server for the FTP protocol; the service must be public, and authentication must be by login/password.

FTP is insecure and AWS does not recommend using it on public networks.

 

The first thing you need is the AWS SAM CLI installed.

Create a directory where we will download the template, go to it and download:

wget https://s3.amazonaws.com/aws-transfer-resources/custom-idp-templates/aws-transfer-custom-idp-secrets-manager-sourceip-protocol-support-apig.zip

 

Unzip and run the following command:

sam deploy --guided --stack-name aws-transfer-ftp

 

Here "aws-transfer-ftp" is the name of the CloudFormation stack to be created; if you specify the name of an existing stack, it will be updated.

 

Then the interactive installation will start, where you will be prompted to specify the following parameters:

  • Stack Name – the name of the CloudFormation stack; the default is the value of the "--stack-name" parameter;
  • AWS Region – the region where the CloudFormation stack will be deployed;
  • Parameter CreateServer – whether the AWS Transfer server will be created (default – true);
  • Parameter SecretManagerRegion – if your region does not support Secrets Manager, you can specify a separate region for it;
  • Parameter TransferEndpointType – PUBLIC or VPC; since FTP does not support public endpoints, specify VPC;
  • Parameter TransferSubnetIDs – IDs of the subnets in which the AWS Transfer endpoint will be placed;
  • Parameter TransferVPCID – the VPC ID where the subnets specified in the previous parameter are located.

 

 

Let's create a Security Group for the FTP service in the required VPC and allow incoming traffic from any address to TCP port 21 and TCP ports 8192-8200. For now, just save the created SG; we will attach it later.
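
The same Security Group can also be created from the CLI; a sketch with placeholder IDs:

aws ec2 create-security-group --group-name transfer-ftp --description "FTP for AWS Transfer" --vpc-id vpc-xxxxxxxx
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 21 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 8192-8200 --cidr 0.0.0.0/0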

Then go to the AWS Console – "AWS Transfer Family", find the server created by AWS Transfer and edit its protocols: uncheck "SFTP", select "FTP", and save the changes.

Now we need to make FTP accessible from the outside world; for this we will use an NLB. First, let's find the private IP addresses of the VPC endpoint for AWS Transfer: in the "Endpoint details" block, click the link to the VPC endpoint.

 

Go to the "Subnets" tab and copy all the IP addresses; they will be needed to create target groups.

 

Go to the "Security Groups" tab and change the default security group to the one created earlier.

Now let's create a target group: in the AWS console go to "EC2" -> "Load Balancing" -> "Target Groups" and create the first target group for TCP port 21.

  • Target type: IP address
  • Protocol: TCP
  • Port: 21

It is better to put the port number at the end of the target group name, since there will be 10 of them and it is easy to get confused.

Also specify the VPC in which the AWS Transfer service was created. On the next tab, add, one by one, the IP addresses of the VPC endpoint that we looked up earlier, and save the target group.

Now you need to create 9 more target groups for the port range TCP 8192-8200. The procedure is the same as for the port 21 target group, except that you need to specify port 21 for the health check. To do this, in the "Health checks" block, expand "Advanced health check settings", select "Override", and specify port 21.

After we are done with the target groups, we need to create an "internet-facing" Network Load Balancer and place it on the public subnets of the same VPC where the AWS Transfer service is. We also create 10 listeners: one for TCP port 21 and one for each port in the range 8192-8200, and point each listener at the target group corresponding to its port. After that the FTP service should be accessible from outside.
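
A quick way to check it, once a user is created (see below), is with curl, which uses FTP passive mode by default (hostname and credentials are placeholders):

curl -u user-name:PASSWORD --list-only ftp://ftp.example.com/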

In order to add an FTP user, go to the "Secrets Manager" in the AWS console and create a secret with the "Other type of secrets" type.

 

Create 3 "key/value" pairs:

  • Password – password for the new FTP user;
  • Role – ARN of the role that has write permission to the required S3 bucket;
  • HomeDirectoryDetails – [{"Entry": "/", "Target": "/s3-bucket/user-name"}]

Where "s3-bucket" is the name of the S3 bucket and "user-name" is the name of the directory that the user will land in when connecting to the FTP server (the directory name does not have to match the username, and it may also be located outside the root of the bucket).

 

Save the secret with a name in the format "server_id/user_name", where "server_id" is the AWS Transfer server ID and "user_name" is the username that will be used to connect to the FTP server.
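
The same secret can be created from the CLI; a sketch with placeholder values (note that "HomeDirectoryDetails" is stored as a JSON string):

aws secretsmanager create-secret \
  --name "s-1234567890abcdef0/ftp-user" \
  --secret-string '{"Password": "PASSWORD", "Role": "arn:aws:iam::123456789012:role/ftp-write-role", "HomeDirectoryDetails": "[{\"Entry\": \"/\", \"Target\": \"/s3-bucket/user-name\"}]"}'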

For convenience, you can also create a DNS CNAME record pointing to the NLB DNS name.

FIX ERROR – RDS: Error creating DB Parameter Group: InvalidParameterValue: ParameterGroupFamily

When creating an RDS DB Parameter Group with an incorrect value for the "ParameterGroupFamily" parameter, an error like the following may occur:

Error creating DB Parameter Group: InvalidParameterValue: ParameterGroupFamily default.mariadb10.2 is not a valid parameter group family

 

To see a list of all possible values for the "ParameterGroupFamily" parameter, you can use the following command:

aws rds describe-db-engine-versions --query "DBEngineVersions[].DBParameterGroupFamily"
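
To narrow the output down to a single engine, for example MariaDB:

aws rds describe-db-engine-versions --engine mariadb --query "DBEngineVersions[].DBParameterGroupFamily" --output text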

 

 

Nginx – Regular Expression Tester

 

For quick testing of Nginx regular expressions, you can use a ready-made docker image. To do this, you need to clone the NGINX-Demos repository:

git clone https://github.com/nginxinc/NGINX-Demos

 

Go to the "nginx-regex-tester" directory:

cd NGINX-Demos/nginx-regex-tester/

 

And launch the container using "docker-compose":

docker-compose up -d

 

And open the following page:

http://localhost/regextester.php

 

AWS – EKS Fargate – Fluentd CloudWatch

At the time of writing, EKS Fargate does not support a logging driver for writing to CloudWatch. The only option is to use a sidecar container.

Let's create a ConfigMap in which we specify the EKS cluster name and the region, in the required namespace:

kubectl create configmap cluster-info \
--from-literal=cluster.name=YOUR_EKS_CLUSTER_NAME \
--from-literal=logs.region=YOUR_EKS_CLUSTER_REGION -n KUBERNETES_NAMESPACE

 

Next, let's create a service account and a ConfigMap with the Fluentd configuration file. To do this, copy the text below and save it as "fluentd.yaml":

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: {{NAMESPACE}}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd-role
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd-role
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: {{NAMESPACE}}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: {{NAMESPACE}}
  labels:
    k8s-app: fluentd-cloudwatch
data:
  fluent.conf: |
    @include containers.conf

    <match fluent.**>
      @type null
    </match>
  containers.conf: |
    <source>
      @type tail
      @id in_tail_container_logs
      @label @containers
      path /var/log/application.log
      pos_file /var/log/fluentd-containers.log.pos
      tag *
      read_from_head true
      <parse>
        @type none
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <label @containers>
      <filter **>
        @type kubernetes_metadata
        @id filter_kube_metadata
      </filter>

      <filter **>
        @type record_transformer
        @id filter_containers_stream_transformer
        <record>
          stream_name "#{ENV.fetch('HOSTNAME')}"
        </record>
      </filter>

      <filter **>
        @type concat
        key log
        multiline_start_regexp /^\S/
        separator ""
        flush_interval 5
        timeout_label @NORMAL
      </filter>

      <match **>
        @type relabel
        @label @NORMAL
      </match>
    </label>

    <label @NORMAL>
      <match **>
        @type cloudwatch_logs
        @id out_cloudwatch_logs_containers
        region "#{ENV.fetch('REGION')}"
        log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
        log_stream_name_key stream_name
        remove_log_stream_name_key true
        auto_create_stream true
        <buffer>
          flush_interval 5
          chunk_limit_size 2m
          queued_chunks_limit_size 32
          retry_forever true
        </buffer>
      </match>
    </label>

 

And apply it:

sed "s/{{NAMESPACE}}/default/" fluentd.yaml | kubectl apply -f -

 

Where "default" is the name of the required namespace.

 

An example of a sidecar deployment:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: testapp
  name: testapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testapp
  strategy: {}
  template:
    metadata:
      labels:
        app: testapp
    spec:
      serviceAccountName: fluentd
      terminationGracePeriodSeconds: 30
      initContainers:
        - name: copy-fluentd-config
          image: busybox
          command: ['sh', '-c', 'cp /config-volume/..data/* /fluentd/etc']
          volumeMounts:
            - name: config-volume
              mountPath: /config-volume
            - name: fluentdconf
              mountPath: /fluentd/etc
      containers:
      - image: alpine:3.10
        name: alpine
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo hello 2>&1 | tee -a /var/log/application.log; sleep 10;done"]
        volumeMounts:
        - name: fluentdconf
          mountPath: /fluentd/etc
        - name: varlog
          mountPath: /var/log
      - image: fluent/fluentd-kubernetes-daemonset:v1.7.3-debian-cloudwatch-1.0
        name: fluentd-cloudwatch
        env:
          - name: REGION
            valueFrom:
              configMapKeyRef:
                name: cluster-info
                key: logs.region
          - name: CLUSTER_NAME
            valueFrom:
              configMapKeyRef:
                name: cluster-info
                key: cluster.name
          - name: AWS_ACCESS_KEY_ID
            value: "XXXXXXXXXXXXXXX"
          - name: "AWS_SECRET_ACCESS_KEY"
            value: "YYYYYYYYYYYYYYY"
        resources:
          limits:
            memory: 400Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
          - name: config-volume
            mountPath: /config-volume
          - name: fluentdconf
            mountPath: /fluentd/etc
          - name: varlog
            mountPath: /var/log
      volumes:
        - name: config-volume
          configMap:
            name: fluentd-config
        - name: fluentdconf
          emptyDir: {}
        - name: varlog
          emptyDir: {}

 

In this deployment, the "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY" variables are set, since at the moment IAM role endpoints are only available for EC2, ECS Fargate, and Lambda. To avoid this, you can use an OpenID Connect provider for EKS (IAM Roles for Service Accounts).
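
A sketch of creating such a service account with eksctl (the cluster name and attached policy are placeholders; this assumes an IAM OIDC provider is already associated with the cluster):

eksctl create iamserviceaccount \
  --cluster YOUR_EKS_CLUSTER_NAME \
  --namespace default \
  --name fluentd \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve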

Kubernetes – One role for multiple namespaces

 

 

Goal:

There are two namespaces: "kube-system" and "default". We need to run a cron task in the "kube-system" namespace that will clean up completed jobs and pods in the "default" namespace. To do this, create a service account in the "kube-system" namespace, a role with the necessary permissions in the "default" namespace, and bind the created role to the created account (a sample CronJob that uses this service account is shown after the manifest).

cross-namespace-role.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jobs-cleanup
  namespace: kube-system
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jobs-cleanup
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]
- apiGroups: ["batch", "extensions"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jobs-cleanup
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jobs-cleanup
subjects:
- kind: ServiceAccount
  name: jobs-cleanup
  namespace: kube-system
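
A sketch of a CronJob in the "kube-system" namespace that uses this service account to clean up finished resources in "default" (the schedule, image, and exact cleanup commands are illustrative):

apiVersion: batch/v1   # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: jobs-cleanup
  namespace: kube-system
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: jobs-cleanup
          # the ServiceAccount disables token automounting, so enable it at the pod level
          automountServiceAccountToken: true
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  kubectl -n default delete jobs --field-selector status.successful=1
                  kubectl -n default delete pods --field-selector status.phase=Succeeded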