AWS CSAA Study Notes (Part 2)
- June 10, 2021
- Posted by: codestar
Elastic Load Balancers (ELB)
- ELB is never given a static IP address, just DNS name.
- Instances behind an ELB are reported as “In Service” or “Out of Service”
- Thresholds
- Unhealthy Threshold = how many intervals with no response before flagging as Out of Service
- Healthy Threshold = how many intervals with response before flagging as In Service
- Supports the following X-Forwarded headers:
- X-Forwarded-For
- X-Forwarded-Proto
- X-Forwarded-Port
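The two thresholds above can be pictured as a small state machine. This is a minimal sketch, not the ELB's actual implementation: the class name `HealthTracker` and the default threshold values (3 and 2) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one instance's state from a stream of health check results."""
    healthy_threshold: int = 3    # hypothetical value, chosen for the example
    unhealthy_threshold: int = 2  # hypothetical value, chosen for the example
    state: str = "In Service"
    _streak: int = 0              # consecutive results that contradict `state`

    def record(self, responded: bool) -> str:
        """Feed one check interval's result and return the resulting state."""
        in_service = self.state == "In Service"
        if responded == in_service:
            # The result agrees with the current state, so the streak resets.
            self._streak = 0
            return self.state
        self._streak += 1
        limit = self.unhealthy_threshold if in_service else self.healthy_threshold
        if self._streak >= limit:
            # Enough consecutive contradicting results: flip the state.
            self.state = "Out of Service" if in_service else "In Service"
            self._streak = 0
        return self.state


tracker = HealthTracker()
states = [tracker.record(ok) for ok in [True, False, False, True, True, True]]
print(states)
```

Two missed intervals in a row flag the instance as Out of Service, and it only returns to In Service after three consecutive responses.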
CloudWatch – Performance Monitoring Service
- Standard monitoring = 5 minutes
- Turned on by default
- Detailed monitoring = 1 minute
- Monitors the hypervisor, NOT the guest OS
- Does not monitor memory
- Dashboards – create/configure widgets to monitor your environment
- Alarms – notify when a given threshold is hit
- Events – automatically respond to state changes in your AWS resources
- Logs – aggregate, monitor & store logs. Agent installed onto EC2 instances
Auto Scaling Groups
- Have to have a launch configuration to have an auto scaling group
- Can create rules to spin-up and/or shut down instances based on monitor triggers
- Deleting an auto scaling group will automatically delete any instances it created
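A rough sketch of what a scaling rule does, assuming a single CPU-based trigger. The function name, the 70%/30% trigger levels and the group size limits are all hypothetical; real policies are configured on the group, not written as code like this.

```python
def desired_capacity(current: int, cpu_percent: float, *,
                     scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0,
                     min_size: int = 1,
                     max_size: int = 4) -> int:
    """Return the instance count a simple one-step scaling rule would aim for."""
    if cpu_percent > scale_out_above:
        target = current + 1      # spin up one more instance
    elif cpu_percent < scale_in_below:
        target = current - 1      # shut one instance down
    else:
        target = current          # inside the comfortable band: do nothing
    # Never leave the group's configured size limits.
    return max(min_size, min(max_size, target))


print(desired_capacity(2, 85.0))  # 3
print(desired_capacity(2, 10.0))  # 1
```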
EC2 Placement Groups
- A logical grouping of instances within a single AZ.
- Can’t span AZs (duh)
- Enables applications to participate in a low-latency, 10 Gbps network
- Recommended for apps that benefit from low latency networks, high network throughput, or both
- Grid computing
- Hadoop clusters
- Name must be unique within your AWS account
- Only certain types of instances can be launched in a placement group
- Compute Optimized
- GPU
- Memory Optimized
- Storage Optimized
- AWS recommends homogeneous instances within a placement group (size & family)
- Can’t merge placement groups
- Can’t move an existing instance into a placement group. You can create an AMI from your existing instance THEN launch a new instance from that AMI into a placement group… if you really wanted to.
Lambda
- Compute service that runs your code in response to events and it automatically manages the underlying compute resources for you
- Can automatically run code in response to events
- Modifications to objects in S3 buckets
- Messages arriving in Kinesis stream
- Table updates in DynamoDB
- API call logs created by CloudTrail
- Etc…
- A new abstraction layer – run code without worrying about infrastructure at all
- JavaScript (Node.js) was the original supported language; Lambda now also supports other runtimes such as Python, Java, Go, Ruby and .NET
- 99.99% availability for the service and the functions it operates
- 1st 1 million requests are free, $0.20 per 1 million requests afterwards
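The request pricing above works out as simple arithmetic. A minimal sketch using only the two figures from these notes; it assumes the free allowance applies once per billing period and it ignores any compute-duration charges, which the notes don't cover. The function name is made up for the example.

```python
FREE_REQUESTS = 1_000_000   # first 1 million requests are free
PRICE_PER_MILLION = 0.20    # USD per 1 million requests afterwards


def request_cost(requests: int) -> float:
    """Request charge in USD for one billing period (requests only)."""
    billable = max(0, requests - FREE_REQUESTS)
    return billable / 1_000_000 * PRICE_PER_MILLION


print(request_cost(500_000))              # 0.0, still inside the free allowance
print(round(request_cost(3_000_000), 2))  # 0.4, 2 million billable requests
```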
Route53 (DNS)
- IPv6 not fully supported yet.
- Alias records work like CNAME records
- Used to map resource record sets in your hosted zone to ELB, CloudFront distributions, or S3 buckets that are configured as websites.
- Difference – a CNAME can’t be used for naked domain names (i.e. w/out “www”), you can with A record or Alias.
- Automatically recognizes changes in the record sets
- ELBs don’t have a pre-defined IPv4 address, resolved using DNS
- This can be an issue because naked domain names need an IP address.
- Hence the need for Alias records
- Given a choice, always choose an Alias record because you won’t incur additional charges (as you would with a CNAME)
DNS Routing Policies:
- Simple
- Default when you create a new record set
- Most commonly used when you have a single resource that performs a given function (i.e. 1 webserver)
- No built-in intelligence
- Weighted
- Split traffic based on weighted assignments (10% to X, 90% to Y)
- Different regions, ELBs, AZs, etc.
- Commonly used when testing a new website & you only want a small subset to see the new site
- Latency
- Route traffic based on lowest network latency for your end user
- Need to create a latency resource record set for the EC2 or ELB resource in each region you want participating.
- Great for improving global page load times
- Failover
- Used when you want to create an active/passive set up.
- Route53 will monitor health of primary site using a health check (which monitors your end points)
- Geolocation
- You choose where traffic will be sent based on location of users
- Ex. All EU users get routed to servers w/ local language and prices in Euros
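To make the weighted and latency policies concrete, here is a small simulation. It is only a sketch of the selection logic, not how Route53 is implemented; the target names, weights and latency figures are invented for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_weighted(records, roll):
    """Pick a target from (target, weight) pairs.

    `roll` is a number in [0, 1), e.g. from random.random().
    """
    cumulative = list(accumulate(weight for _, weight in records))
    # Scale the roll to the total weight and find the bucket it falls into.
    index = bisect_right(cumulative, roll * cumulative[-1])
    return records[index][0]


def pick_lowest_latency(latency_ms):
    """Pick the region with the lowest measured latency for this user."""
    return min(latency_ms, key=latency_ms.get)


records = [("new-site", 10), ("old-site", 90)]   # 10% / 90% split
print(pick_weighted(records, 0.05))   # new-site
print(pick_weighted(records, 0.50))   # old-site
print(pick_lowest_latency({"us-east-1": 80, "eu-west-1": 25}))  # eu-west-1
```

Passing `random.random()` as `roll` over many lookups sends roughly 10% of them to `new-site`, which is the "small subset sees the new site" case above.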
Databases
RDS – relational databases, which have been around since the 70s. Database: tables, rows, fields (columns) -> think spreadsheet
- Read this FAQ: https://aws.amazon.com/rds/faqs/
- For OLTP
- SQL Server
- Oracle
- MySQL
- PostgreSQL
- Aurora
- MariaDB
DynamoDB – non-relational database (NoSQL)
- Database:
- Collection = Table
- Document = Row
- Key/Value pairs = Fields
ElastiCache
- web service that deploys, operates & scales an in-memory cache. Improves performance of web apps by retrieving info from RAM instead of disk.
- Supports 2 open source in-mem caching engines
- Memcached
- Redis
- Caches most consistently queried data
Redshift (data warehousing)
- OLAP
- Used for BI. Cognos, Jaspersoft, SAP Netweaver
- Used to pull in large & complex data sets. Usually used to do queries on data.
DMS (database migration services)
- Migrate your prod DB into AWS
- AWS manages all the complexities of migration like data type transformation, compression & parallel xfer
- Schema conversion tool:
- Convert source DB to a different target DB (Oracle -> Aurora, etc…)
Backups, Multi-AZ & Read Replicas
Backups (2 types):
- Automated
- Recover DB to any point in time within retention period (between 1 – 35 days)
- Point in time recovery down to a second, up to the last 5 minutes
- Enabled by default
- Backup data is stored in S3
- Free backup storage equal to size of DB
- Backups are taken within a defined window, retention period up to 35 days
- During backup, I/O suspended (typically a few minutes)
- This can be avoided if you go Multi-AZ as the backup is taken of the standby
- DB Snapshots
- Done manually (user initiated), full backup
- Stored even after you delete the original RDS instance, until you explicitly delete them
- When you restore either automated or snap, the restored version will be a new RDS instance with a new endpoint
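The automated backup rules above (retention of 1 to 35 days, recovery up to roughly the last 5 minutes) can be expressed as a quick check. This is a simplified sketch: it assumes backups have been running for the whole retention period, and the function name is made up.

```python
from datetime import datetime, timedelta


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if `target` falls inside the point-in-time recovery window."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)  # oldest retained backup
    latest = now - timedelta(minutes=5)              # newest restorable time
    return earliest <= target <= latest


now = datetime(2021, 6, 10, 12, 0, 0)
print(can_restore_to(datetime(2021, 6, 9, 8, 30, 15), now, retention_days=7))  # True
print(can_restore_to(datetime(2021, 6, 10, 11, 58, 0), now, retention_days=7)) # False
```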
Encryption
- At rest is supported for MySQL, Oracle, SQL Server, PostgreSQL & MariaDB
- Done using AWS KMS
- Once your RDS instance is encrypted at rest – underlying storage, backups, read replicas and snaps are also encrypted
- Turning on encryption for an existing instance isn’t supported… create a new encrypted instance & migrate data to it
Multi-AZ
- Primary RDS instance uses synchronous replication to an RDS in a diff AZ.
- Automatic failover, same DNS point, AWS handles replication
- Disaster Recovery only, not performance improvement
- Only in:
- SQL Server
- Oracle
- MySQL
- PostgreSQL
- MariaDB
Read Replica
- Uses asynchronous replication to create up to 5 read-only DB copies
- Used for performance improvement & Scaling, not DR:
- Write to prod, read from read replicas
- Must have automatic backups turned on
- You can have read replicas OF read replicas, but watch out for latency if you do this.
- Each read replica will have its own DNS end point.
- Cannot have read replicas that have Multi-AZ but you CAN create read replicas of Multi-AZ source DBs
- Can break replication & turn a read replica into its own source DB
- Only in:
- MySQL
- PostgreSQL
- MariaDB
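"Write to prod, read from read replicas" usually means the app picks an endpoint per query. A minimal sketch of that idea, assuming reads can be recognized by a leading SELECT and spread round-robin; the class name and all hostnames are hypothetical.

```python
from itertools import cycle


class EndpointRouter:
    """Sends writes to the primary endpoint and spreads reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Each read replica has its own DNS endpoint, so rotate through them.
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary   # writes (and reads with no replicas) go to prod


router = EndpointRouter(
    "primary.db.example.internal",
    ["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))        # replica-1...
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # primary...
```

Because replication is asynchronous, a read sent to a replica right after a write may not see that write yet.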
DynamoDB vs RDS
- DynamoDB offers “push button” scaling -> scale DB on the fly with no downtime
- RDS isn’t as easy -> usually need to create bigger instance size manually or add a read replica
DynamoDB
- Fast, flexible NoSQL DB service.
- Used for apps that need consistent, single-digit millisecond latency at any scale
- Fully managed & supports document and key/value data models
- Stored on SSD storage
- Spread across 3 “geographically distinct” data centers
- Multiple consistency models:
- Eventually consistent reads (default)
- Consistency usually reached within 1 second (best read performance)
- Strongly consistent reads
- Returns a result that reflects all writes that got a successful response prior to the read
- Use this if your app must see a write reflected in reads immediately (in less than 1 second)
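In the DynamoDB API the choice between the two read models is a per-request flag, `ConsistentRead`, on operations such as GetItem. The sketch below only builds the request parameters so it runs without any AWS SDK; the table name `Orders`, the key attribute `order_id` and the helper name are assumptions for the example.

```python
def get_item_params(table, key, strongly_consistent=False):
    """Build GetItem parameters; eventually consistent unless asked otherwise."""
    return {
        "TableName": table,
        "Key": key,
        # False = eventually consistent (default), True = strongly consistent.
        "ConsistentRead": strongly_consistent,
    }


key = {"order_id": {"S": "1001"}}   # hypothetical string key
default_read = get_item_params("Orders", key)
strong_read = get_item_params("Orders", key, strongly_consistent=True)
print(default_read["ConsistentRead"], strong_read["ConsistentRead"])  # False True
```

With boto3 installed, a dict like this could be passed along as `client.get_item(**strong_read)`.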
Redshift
- Fast (10 times faster), fully managed petabyte-scale data warehouse service
- Can start small for $0.25 per hour with no commitments & scale up to PB or more for $1,000 per TB per year.
- OLAP transactions
- Data warehousing DBs use a different type of architecture from both a DB perspective & infrastructure layer.
- 2 Configurations:
- Single node (160 GB)
- Multi-node
- Leader Node (manages client connections and receives queries)
- Compute Node (store data & perform queries and computations). Up to 128 Compute Nodes
- Columnar Data Storage – instead of rows, redshift organizes data by column
- Only columns involved in the queries are processed
- Columnar data is stored sequentially on the storage media
- Block size of 1MB for columnar storage
- Therefore requires far fewer I/Os, greatly improving performance
- Advanced Compression
- Columnar data can be compressed much better than row based data
- Redshift automatically samples data & chooses the best compression scheme
- Massively Parallel Processing (MPP):
- Automatically distributes data & query load across all nodes & newly added nodes
- Pricing:
- Compute Node Hours
- 1 unit per node per hour
- Backup
- Data Transfer
- Security
- Encrypted in transit using SSL
- At rest using AES-256
- By default Redshift does its own key mgmt.
- Can manage keys through HSM (hardware security modules) or KMS if you want
- Only available in 1 AZ
- Can restore snaps to new AZs in the event of an outage
- Good choice if mgmt. runs lots of OLAP transactions & it’s stressing the DB
- Think Business Intelligence (BI)
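The columnar storage and compression points above are easier to see with a toy example. This is an illustration of the idea only, not Redshift's storage format: the sample rows are invented, and run-length encoding is used here simply as one easy-to-read compression scheme.

```python
from itertools import groupby

# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "EU", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
    {"order_id": 4, "region": "US", "amount": 200.0},
    {"order_id": 5, "region": "US", "amount": 10.0},
]


def to_columns(rows):
    """Regroup the same data so each column's values sit together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
# An aggregate over one column touches only that column's values.
total = sum(columns["amount"])
encoded_region = run_length_encode(columns["region"])
print(total)           # 460.0
print(encoded_region)  # [('EU', 3), ('US', 2)]
```

A query like "total amount" reads one list instead of every field of every row, and a column full of similar values collapses well under compression.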
ElastiCache
- Caches things – if your app is constantly going to a DB to pull the same data over and over, you can cache it for faster performance
- Used to improve latency and throughput for read-heavy app workloads (social networks, gaming, media sharing) or compute heavy workloads (recommendation engine)
- Improves application performance by storing critical pieces of data in mem for low-latency access.
- Types of ElastiCache
- Memcached
- Widely adopted mem object caching system.
- Redis
- Open source in-mem key/value store.
- Supports master/slave replication & multi-AZ to achieve cross AZ redundancy
- Memcached
- Good choice if your DB is read heavy & not prone to frequent changing
Aurora
- MySQL compatible RDS DB engine
- Speed & availability of commercial DBs
- Simplicity & cost-effectiveness of open source DBs
- 5x better performance than MySQL @ 1/10th the price of commercial DB w/ similar performance & availability
- Big challenge to Oracle
- Scaling capabilities:
- Start w/ 10 GB, scales in 10 GB increments up to 64 TB
- Compute scales up to 32 vCPUs & 244 GB of mem
- 2 copies of DB in each AZ w/ a min of 3 AZs (6 copies of data)
- Can handle loss of 2 copies w/out affecting write availability
- Can handle loss of 3 copies w/out affecting read availability
- Storage is self-healing. Blocks & disks are constantly scanned & repaired
- Replica features:
- Aurora Replicas (currently 15)
- MySQL read replicas (currently 5)
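The copy-loss figures above (6 copies, writes survive losing 2, reads survive losing 3) amount to needing 4 surviving copies to write and 3 to read. A small sketch of that arithmetic; the constant and function names are made up for the example.

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # 6 - 2: writes tolerate the loss of 2 copies
READ_QUORUM = 3    # 6 - 3: reads tolerate the loss of 3 copies


def availability(lost_copies: int) -> dict:
    """Report whether reads and writes are still possible."""
    surviving = TOTAL_COPIES - lost_copies
    return {
        "write": surviving >= WRITE_QUORUM,
        "read": surviving >= READ_QUORUM,
    }


print(availability(2))  # {'write': True, 'read': True}   e.g. one whole AZ lost
print(availability(3))  # {'write': False, 'read': True}
print(availability(4))  # {'write': False, 'read': False}
```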
VPC = Think of it as a Virtual Datacenter
- By default you are allowed 5 VPCs per region
- Logically isolated section of AWS where you can launch AWS resources in a virtual network of your own definition
- You control the network environment: IP address range, subnets, routing tables, gateways, etc
- By default when you create a VPC it will automatically create a route table
- If you choose dedicated tenancy for your VPC, any instances you create in that VPC will also be dedicated
- 1 subnet = 1 AZ, you cannot have subnets cross AZ
- Don’t forget to add internet gateway
- 1 IGW per VPC
- Need to attach IGW after you create it
- Need to create InternetRouteTable if you want VPC to communicate in/out
Default VPC vs Custom VPC
- Default is user friendly, can deploy instances right away
- All subnets in default VPC have an internet gateway attached
- Each EC2 instance has both a public & private IP address
- If you delete default VPC, you have to call AWS to get it back
VPC Peering
- Connect 1 VPC to another VPC via direct network route using private IP addresses
- Instances behave as if they were on the same private network
- You can peer VPCs with other AWS accounts & with other VPCs in the same account within a single region
- AWS uses the existing infrastructure of a VPC to create a VPC peering connection.
- It is not a gateway or a VPN connection.
- It does not rely on a separate piece of hardware
- No SPoF for communication or bandwidth bottleneck
- Peering is done in a star configuration. VPC A <-> VPC B <-> VPC C = A cannot talk to C unless you connect them directly (no transitive peering)
- Peers cannot have matching or overlapping CIDR blocks
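Both peering rules above (no overlapping CIDR blocks, no transitive peering) can be checked with a few lines of standard-library Python. A minimal sketch; the VPC names, CIDR ranges and function names are illustrative only.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Peering requires CIDR blocks that neither match nor overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def can_talk(peerings, vpc_a: str, vpc_b: str) -> bool:
    """Only a direct peering counts; routes are not passed on through a peer."""
    return frozenset((vpc_a, vpc_b)) in peerings


# A <-> B and B <-> C are peered, A <-> C is not.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(can_peer("10.0.0.0/16", "10.1.0.0/16"))  # True
print(can_peer("10.0.0.0/16", "10.0.1.0/24"))  # False, the /24 sits inside the /16
print(can_talk(peerings, "A", "B"))            # True
print(can_talk(peerings, "A", "C"))            # False, no transitive peering
```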
Network Address Translation (NAT)
- Lets instances that have no direct internet access reach the internet via a NAT instance
- create security group
- allow inbound & outbound on HTTP and HTTPS
- provision NAT inside public subnet
- On a NAT instance, you need to change source/destination check to disabled
- Set up route on private subnet to route through NAT instance
Access Control Lists (ACLs)
- A numbered list of rules (in order, lowest applies first)
- Network ACLs apply across the entire subnet
- Overrules security groups
- Acts as a basic firewall
- VPC automatically comes with an ACL
- When you create a new ACL, by default everything is DENY
- Only one ACL per subnet, but many subnets can have the same ACL
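The "numbered list, lowest applies first" behaviour can be sketched as a first-match loop with a deny at the end. This is a simplified model (single port, source address only) with invented rule numbers and addresses; the `Rule` class and `evaluate` function are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int   # lower numbers are evaluated first
    cidr: str     # source range the rule applies to
    port: int
    allow: bool


def evaluate(rules, source_ip: str, port: int) -> str:
    """Return the decision of the lowest-numbered matching rule."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip in ipaddress.ip_network(rule.cidr):
            return "ALLOW" if rule.allow else "DENY"   # first match wins
    return "DENY"   # nothing matched: everything else is denied


rules = [
    Rule(200, "0.0.0.0/0", 80, allow=True),        # allow HTTP from anywhere
    Rule(100, "203.0.113.0/24", 80, allow=False),  # but block this range first
]
print(evaluate(rules, "203.0.113.7", 80))   # DENY, rule 100 matches before 200
print(evaluate(rules, "198.51.100.1", 80))  # ALLOW
print(evaluate(rules, "198.51.100.1", 22))  # DENY, no rule for port 22
```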