AWS CSAA Study Notes (Part 2)

June 10, 2021
Posted by: codestar

Elastic Load Balancers (ELB)

An ELB is never given a static IP address, just a DNS name.
Instances behind an ELB are reported as “In Service” or “Out of Service”
Thresholds
- Unhealthy Threshold = how many intervals with no response before flagging as Out of Service
- Healthy Threshold = how many intervals with response before flagging as In Service
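A minimal sketch of how the two thresholds could interact, assuming the check result for each interval is already known. The function name, the default threshold values and the boolean input are hypothetical; this only models the counting rule from the notes, not the ELB itself.

```python
def track_health(responses, unhealthy_threshold=2, healthy_threshold=3,
                 start_in_service=True):
    """Return the instance state after each health-check interval.

    responses: iterable of booleans, True = the instance answered the check.
    """
    in_service = start_in_service
    streak = 0  # consecutive checks that disagree with the current state
    states = []
    for ok in responses:
        if ok == in_service:
            streak = 0  # the check agrees with the current state
        else:
            streak += 1
            needed = healthy_threshold if ok else unhealthy_threshold
            if streak >= needed:
                in_service = ok  # threshold reached, flip the state
                streak = 0
        states.append("InService" if in_service else "OutOfService")
    return states
```

With these defaults, one missed check does nothing, two in a row flag the instance Out of Service, and it then takes three good checks in a row to bring it back.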
Supports the following X-Forwarded headers:
- X-Forwarded-For
- X-Forwarded-Proto
- X-Forwarded-Port
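Because the ELB sits between the client and the instance, the backend sees the load balancer as the caller. A small sketch of how an app behind the ELB might read the three headers back; the function name and the returned dictionary keys are hypothetical.

```python
def original_request(headers):
    """Recover the client's view of the request from X-Forwarded-* headers."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    forwarded_for = h.get("x-forwarded-for", "")
    # With several proxies in the chain the value is a comma-separated list;
    # the left-most entry is the address reported for the original client.
    hops = [ip.strip() for ip in forwarded_for.split(",") if ip.strip()]
    port = h.get("x-forwarded-port")
    return {
        "client_ip": hops[0] if hops else None,
        "proto": h.get("x-forwarded-proto"),
        "port": int(port) if port else None,
    }
```

Note that the left-most X-Forwarded-For entry can be set by the client itself, so it should not be trusted for anything security sensitive.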

CloudWatch – Performance Monitoring Service

Standard monitoring = 5 minutes
- Turned on by default
Detailed monitoring = 1 minute
Monitors the hypervisor, NOT the guest OS
- Does not monitor memory
Dashboards – create/configure widgets to monitor your environment
Alarms – notify when a given threshold is hit
Events – automatically respond to state changes in your AWS resources
Logs – aggregate, monitor & store logs. Agent installed onto EC2 instances

Auto scaling Groups

You need a launch configuration before you can create an auto scaling group
Can create rules to spin-up and/or shut down instances based on monitor triggers
Deleting an auto scaling group will automatically delete any instances it created
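The spin-up / shut-down rules can be pictured as a simple decision on a monitored metric. A rough sketch, assuming average CPU is the trigger; the thresholds, group sizes and function name are all made up for illustration.

```python
def desired_capacity(current, cpu_percent, min_size=1, max_size=4,
                     scale_out_above=70.0, scale_in_below=30.0):
    """Return the new instance count for one evaluation of the rules."""
    if cpu_percent > scale_out_above:
        target = current + 1   # spin up one more instance
    elif cpu_percent < scale_in_below:
        target = current - 1   # shut one instance down
    else:
        target = current       # metric is in the comfortable band
    # The group never leaves its configured min/max bounds.
    return max(min_size, min(max_size, target))
```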

EC2 Placement Groups

A logical grouping of instances within a single AZ.
- Can’t span AZs (duh)
Enables applications to participate in a low-latency, 10 Gbps network
Recommended for apps that benefit from low latency networks, high network throughput, or both
- Grid computing
- Hadoop clusters
Name must be unique within your AWS account
Only certain types of instances can be launched in a placement group
- Compute Optimized
- GPU
- Memory Optimized
- Storage Optimized
AWS recommends homogenous instances within a placement group (size & family)
Can’t merge placement groups
Can’t move an existing instance into a placement group. You can create an AMI from your existing instance THEN launch a new instance from that AMI into a placement group… if you really wanted to.

Lambda

Compute service that runs your code in response to events and it automatically manages the underlying compute resources for you
Can automatically run code in response to events
- Modifications to objects in S3 buckets
- Messages arriving in Kinesis stream
- Table updates in DynamoDB
- API call logs created by CloudTrail
- Etc…
A new abstraction layer – run code without worrying about infrastructure at all
JavaScript (Node.js) was the first supported language; Lambda has since added other runtimes such as Python and Java
99.99% availability for the service and the functions it operates
1st 1 million requests are free, $0.20 per 1 million requests afterwards
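The request pricing above works out as follows. This sketch only covers the per-request charge quoted in these notes (it ignores any compute-duration charge), and the constant and function names are hypothetical.

```python
FREE_REQUESTS = 1_000_000   # first 1 million requests are free
PRICE_PER_MILLION = 0.20    # USD per 1 million requests afterwards


def request_charge(requests):
    """Return the request charge in USD for a given number of requests."""
    billable = max(0, requests - FREE_REQUESTS)
    return billable / 1_000_000 * PRICE_PER_MILLION
```

For example, 3 million requests leave 2 million billable, so the request charge is 2 x $0.20 = $0.40.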

Route53 (DNS)

IPv6 not fully supported yet.
Alias records work like CNAME records
- Used to map resource record sets in your hosted zone to ELB, CloudFront distributions, or S3 buckets that are configured as websites.
- Difference – a CNAME can’t be used for naked domain names (i.e. w/out “www”), but an A record or Alias can.
- Automatically recognizes changes in the record sets
ELBs don’t have a pre-defined IPv4 address, resolved using DNS
- This can be an issue because naked domain names need an IP address.
- Hence the need for Alias records
Given a choice, always choose an Alias record because you won’t incur additional charges (as you would with a CNAME)

DNS Routing Policies:

Simple
- Default when you create a new record set
- Most commonly used when you have a single resource that performs a given function (i.e. 1 webserver)
- No built-in intelligence
Weighted
- Split traffic based on weighted assignments (10% to X, 90% to Y)
- Different regions, ELBs, AZs, etc.
- Commonly used when testing a new website & you only want a small subset to see the new site
Latency
- Route traffic based on lowest network latency for your end user
- Need to create a latency resource record set for the EC2 or ELB resource in each region you want participating.
- Great for improving global page load times
Failover
- Used when you want to create an active/passive set up.
- Route53 will monitor health of primary site using a health check (which monitors your end points)
Geolocation
- You choose where traffic will be sent based on location of users
- Ex. All EU users get routed to servers w/ local language and prices in Euros
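To make the weighted and failover policies concrete, here is a toy model of the selection logic. It is only a sketch of the idea (Route53 does this at the DNS level); the function names and endpoint labels are hypothetical.

```python
import random


def pick_weighted(records, rng=random):
    """Pick one endpoint; records is a list of (endpoint, weight) pairs."""
    endpoints = [endpoint for endpoint, _ in records]
    weights = [weight for _, weight in records]
    # Each endpoint is chosen with probability weight / sum(weights).
    return rng.choices(endpoints, weights=weights, k=1)[0]


def pick_failover(primary, secondary, primary_healthy):
    """Active/passive: use the secondary only when the health check fails."""
    return primary if primary_healthy else secondary
```

With weights of 10 and 90, roughly one answer in ten points at the first endpoint, which is the "small subset sees the new site" case.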

Databases

RDS – relational databases have been around since the 70s. Database: tables, rows, fields (columns) -> think spreadsheet

Read this FAQ: https://aws.amazon.com/rds/faqs/
For OLTP
SQL Server
Oracle
MySQL
PostgreSQL
Aurora
MariaDB

DynamoDB – non-relational databases (NoSQL)

Database:
- Collection = Table
- Document = Row
- Key/Value pairs = Fields

ElastiCache

web service that deploys, operates & scales an in-memory cache. Improves performance of web apps by retrieving info from RAM instead of disk.
Supports 2 open source in-mem caching engines
- Memcached
- Redis
Caches most consistently queried data

Redshift (data warehousing)

OLAP
Used for BI. Cognos, Jaspersoft, SAP Netweaver
Used to pull in large & complex data sets. Usually used to do queries on data.

DMS (database migration services)

Migrate your prod DB into AWS
AWS manages all the complexities of migration like data type transformation, compression & parallel xfer
Schema conversion tool:
- Convert source DB to a different target DB (Oracle -> Aurora, etc…)

Backups, Multi-AZ & Read Replicas

Backups (2 types):

Automated
- Recover DB to any point in time within retention period (between 1 – 35 days)
- Point in time recovery down to a second, up to the last 5 minutes
- Enabled by default
- Backup data is stored in S3
- Free backup storage equal to size of DB
- Backups are taken within a defined window, retention period up to 35 days
- During backup, I/O suspended (typically a few minutes)
- This can be avoided if you go Multi-AZ as the backup is taken of the standby
DB Snapshots
- Done manually (user initiated), full backup
- Stored even after you delete the original RDS instance, until you explicitly delete them
- When you restore either automated or snap, the restored version will be a new RDS instance with a new endpoint

Encryption

At rest is supported for MySQL, Oracle, SQL Server, PostgreSQL & MariaDB
Done using AWS KMS
Once your RDS instance is encrypted at rest – underlying storage, backups, read replicas and snaps are also encrypted
Turning on encryption for an existing instance isn’t supported… create a new encrypted instance & migrate data to it

Multi-AZ

Primary RDS instance uses synchronous replication to an RDS in a diff AZ.
Automatic failover, same DNS point, AWS handles replication
Disaster Recovery only, not performance improvement
Only in:
- SQL Server
- Oracle
- MySQL
- PostgreSQL
- MariaDB

Read Replica

Uses asynchronous replication to create up to 5 read-only DB copies
Used for performance improvement & Scaling, not DR:
Write to prod, read from read replicas
Must have automatic backups turned on
You can have read replicas OF read replicas, but watch out for latency if you do this.
Each read replica will have its own DNS end point.
Cannot have read replicas that have Multi-AZ but you CAN create read replicas of Multi-AZ source DBs
Can break replication & turn a read replica into its own source DB
Only in:
- MySQL
- PostgreSQL
- MariaDB
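"Write to prod, read from read replicas" is something the application has to do itself, since each replica has its own endpoint. A minimal sketch of such a router, assuming reads can be recognised by a leading SELECT; the class name and endpoint names are hypothetical.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replica endpoints, if there are any.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes (and reads with no replicas) go to prod
```

Because replication is asynchronous, a read sent to a replica right after a write may still see the old data.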

DynamoDB vs RDS

DynamoDB offers “push button” scaling -> scale DB on the fly with no downtime
RDS isn’t as easy -> usually need to create bigger instance size manually or add a read replica

DynamoDB

Fast, flexible NoSQL DB service.
Used for apps that need consistent, single-digit millisecond latency at any scale
Fully managed & supports document and key/value data models
Stored on SSD storage
Spread across 3 “geographically distinct” data centers
Multiple consistency models:
- Eventually consistent reads (default)
  - Consistency usually reached within 1 second (best read performance)
- Strongly consistent reads
  - Returns a result that reflects all writes that got a successful response prior to the read
  - Use this if your app needs to read an update back less than 1 second after it was written.
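The difference between the two read models can be shown with a toy table that has one up-to-date copy and one lagging copy. This is purely an illustration and not how DynamoDB is implemented; the class and method names are hypothetical, and the `consistent_read` flag is only named after the ConsistentRead option of the DynamoDB read APIs.

```python
class ToyTable:
    """Toy model: one up-to-date copy plus one copy that lags behind."""

    def __init__(self):
        self._leader = {}    # always reflects every acknowledged write
        self._replica = {}   # catches up only when replicate() runs

    def put(self, key, value):
        self._leader[key] = value

    def replicate(self):
        # Stands in for the (usually sub-second) background propagation.
        self._replica = dict(self._leader)

    def get(self, key, consistent_read=False):
        source = self._leader if consistent_read else self._replica
        return source.get(key)
```

A default (eventually consistent) `get` right after a `put` can miss the new value, while `get(key, consistent_read=True)` always sees it.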

Redshift

Fast (10 times faster), fully managed petabyte-scale data warehouse service
Can start small for $0.25 per hour with no commitments & scale up to PB or more for $1,000 per TB per year.
OLAP transactions
Data warehousing DBs use a different type of architecture, both from a DB perspective & at the infrastructure layer.
2 Configurations:
- Single node (160 GB)
- Multi-node
  - Leader Node (manages client connections and receives queries)
  - Compute Node (store data & perform queries and computations). Up to 128 Compute Nodes
- Columnar Data Storage – instead of rows, redshift organizes data by column
  - Only columns involved in the queries are processed
  - Columnar data is stored sequentially on the storage media
  - Block size of 1MB for columnar storage
  - Therefore requires far fewer I/Os, greatly improving performance
- Advanced Compression
  - Columnar data can be compressed much better than row based data
  - Redshift automatically samples data & chooses the best compression scheme
- Massively Parallel Processing (MPP):
  - Automatically distributes data & query load across all nodes & newly added nodes
- Pricing:
  - Compute Node Hours
    - 1 unit per node per hour
  - Backup
  - Data Transfer
- Security
  - Encrypted in transit using SSL
  - At rest using AES-256
  - By default Redshift does its own key mgmt.
    - Can manage keys through HSM (hardware security modules) or KMS if you want
- Only available in 1 AZ
  - Can restore snaps to new AZs in the event of an outage
- Good choice if mgmt. runs lots of OLAP transactions & it’s stressing the DB
- Think Business Intelligence (BI)
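The columnar storage idea from this section, sketched with plain Python lists. The sample rows and names are invented; the point is only that an aggregate over one column never has to touch the other columns.

```python
# Row-oriented layout: every record carries all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(rows):
    """Re-organise row records into one sequential list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
# SUM(amount) reads a single column; order_id and region are never scanned.
total_amount = sum(columns["amount"])
```

Values in one column also tend to look alike (same type, repeated values), which is why columnar data compresses better than row data.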

ElastiCache

Caches things – if your app is constantly going to a DB to pull the same data over and over, you can cache it for faster performance
Used to improve latency and throughput for read-heavy app workloads (social networks, gaming, media sharing) or compute heavy workloads (recommendation engine)
Improves application performance by storing critical pieces of data in mem for low-latency access.
Types of ElastiCache
- Memcached
  - Widely adopted mem object caching system.
- Redis
  - Open source in-mem key/value store.
- Supports master/slave replication & multi-AZ to achieve cross AZ redundancy
Good choice if your DB is read heavy & not prone to frequent changing
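One common way to use a cache in front of a DB is the cache-aside pattern: look in the cache first and only go to the DB on a miss. A minimal sketch, with an in-process dict standing in for Memcached/Redis; the class name and the loader callback are hypothetical.

```python
class CacheAside:
    """Check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # function that queries the real DB
        self._cache = {}            # stand-in for Memcached / Redis
        self.db_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]   # served from memory
        self.db_hits += 1
        value = self._load(key)       # slow path: go to the database
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this when the underlying row changes.
        self._cache.pop(key, None)
```

This also shows why a cache suits data that rarely changes: until `invalidate` is called, readers keep getting the cached (possibly stale) value.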

Aurora

MySQL compatible RDS DB engine
Speed & availability of commercial DBs
Simplicity & cost-effectiveness of open source DBs
5x better performance than MySQL @ 1/10th the price of commercial DB w/ similar performance & availability
Big challenge to Oracle
Scaling capabilities:
- Start w/ 10 GB, scales in 10 GB increments up to 64 TB
- Compute scales up to 32 vCPUs & 244 GB of mem
- 2 copies of DB in each AZ w/ a min of 3 AZs (6 copies of data)
- Can handle loss of 2 copies w/out affecting write availability
- Can handle loss of 3 copies w/out affecting read availability
- Storage is self-healing. Blocks & disks are constantly scanned & repaired
Replica features:
- Aurora Replicas (currently 15)
- MySQL read replicas (currently 5)
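The copy-loss numbers from the scaling list (6 copies, writes survive losing 2, reads survive losing 3) can be written down as a small check. The constant and function names are hypothetical; the thresholds are the ones stated in these notes.

```python
TOTAL_COPIES = 6          # 2 copies in each of 3 AZs
MAX_LOST_FOR_WRITES = 2   # writes survive the loss of up to 2 copies
MAX_LOST_FOR_READS = 3    # reads survive the loss of up to 3 copies


def availability(copies_lost):
    """Report whether reads and writes are still available."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    return {
        "writes": copies_lost <= MAX_LOST_FOR_WRITES,
        "reads": copies_lost <= MAX_LOST_FOR_READS,
    }
```

Losing a whole AZ takes out 2 copies, so by these numbers both reads and writes keep working; losing 3 copies leaves the data readable but not writable.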

VPC = Think of it as a Virtual Datacenter

By default you are allowed 5 VPCs per region
Logically isolated section of AWS where you can launch AWS resources in a virtual network of your own definition
You control the network environment: IP address range, subnets, routing tables, gateways, etc
By default when you create a VPC it will automatically create a route table
If you choose dedicated tenancy for your VPC, any instances you create in that VPC will also be dedicated
1 subnet = 1 AZ, you cannot have subnets cross AZ
Don’t forget to add internet gateway
- 1 IGW per VPC
- Need to attach IGW after you create it
Need to create InternetRouteTable if you want VPC to communicate in/out

Default VPC vs Custom VPC

Default is user friendly, can deploy instances right away
All subnets in default VPC have an internet gateway attached
Each EC2 instance has both a public & private IP address
If you delete default VPC, you have to call AWS to get it back

VPC Peering

Connect 1 VPC to another VPC via direct network route using private IP addresses
Instances behave as if they were on the same private network
You can peer VPC’s with other AWS accounts & with other VPC’s in the same account within a single region
AWS uses the existing infrastructure of a VPC to create a VPC peering connection.
It is not a gateway or a VPN connection.
It does not rely on a separate piece of hardware
No SPoF for communication or bandwidth bottleneck
Peering is done in a star configuration. VPC A <-> VPC B <-> VPC C = A cannot talk to C unless you connect them directly (no transitive peering)
Peers cannot have matching or overlapping CIDR blocks
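Both peering rules (no overlapping CIDR blocks, no transitive peering) are easy to check with Python's standard `ipaddress` module. A small sketch; the function names and the VPC labels are hypothetical.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Two VPCs can only be peered if their CIDR blocks do not overlap."""
    a = ipaddress.ip_network(cidr_a)
    b = ipaddress.ip_network(cidr_b)
    return not a.overlaps(b)


def can_talk(peerings, src, dst):
    """Peering is not transitive: only directly peered VPCs can talk."""
    direct = {frozenset(pair) for pair in peerings}
    return frozenset((src, dst)) in direct
```

For example, with peerings A-B and B-C, `can_talk` is true for A and B but false for A and C until an A-C peering is added.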

Network Address Translation (NAT)

Allows your instances that do not have internet access the ability to access the internet via a NAT server instance
create security group
allow inbound & outbound on HTTP and HTTPS
provision NAT inside public subnet
On a NAT instance, you need to change source/destination check to disabled
Set up route on private subnet to route through NAT instance

Access Control Lists (ACLs)

A numbered list of rules (in order, lowest applies first)
Put down network access lists across the entire subnet
Overrules security groups
Acts as a basic firewall
VPC automatically comes with an ACL
When you create a new ACL, by default everything is DENY
Only one ACL per subnet, but many subnets can have the same ACL
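A sketch of how a numbered rule list is evaluated: rules are tried in ascending order, the first match decides, and anything unmatched is denied. The rule tuple layout and function name are hypothetical and much simpler than a real network ACL (no protocols, port ranges or separate inbound/outbound lists).

```python
import ipaddress


def evaluate_acl(rules, source_ip, port):
    """rules: list of (rule_number, cidr, port, action) tuples."""
    addr = ipaddress.ip_address(source_ip)
    # Lowest rule number is checked first; the first match wins.
    for number, cidr, rule_port, action in sorted(rules, key=lambda r: r[0]):
        if addr in ipaddress.ip_network(cidr) and port == rule_port:
            return action
    return "DENY"  # nothing matched: traffic is denied
```

Putting a DENY for one address range at rule 100 and a general ALLOW at rule 200 blocks that range while letting everyone else in, because rule 100 is reached first.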
