AWS Glue is a fully managed ETL (Extract, Transform, and Load) service provided by Amazon Web Services. It is designed to prepare and transform data for analytics and machine learning workflows by automating the processes of data discovery, cataloging, and preparation.
Key Features of AWS Glue
ETL (Extract, Transform, Load): Build, manage, and run ETL…
Category: aws
Analytics services
AWS Data Lake – Centralized repository that allows you to store structured, semi-structured, and unstructured data at any scale. Amazon Redshift – Cloud-based data warehousing service from AWS that is designed to handle large-scale data analytics and queries efficiently.
AWS Lake Formation
AWS Lake Formation is a managed service that simplifies the process of creating, managing, and securing a data lake on AWS. It streamlines the tasks of ingesting, cataloging, securing, and preparing data, allowing you to focus on gaining insights from your data instead of managing the infrastructure. Simplified Data Ingestion: Easily ingest data from various…
AWS Data Lake
AWS Data Lake is a centralized repository that allows you to store and manage structured, semi-structured, and unstructured data at any scale. It enables you to store raw data as-is and process it later for analytics, machine learning, or other use cases. AWS provides a suite of services to build and manage data lakes efficiently,…
Amazon Redshift
Amazon Redshift is a cloud-based data warehousing service from AWS that is designed to handle large-scale data analytics and queries efficiently. It enables organizations to perform complex analytical queries on massive datasets quickly and cost-effectively.
Redshift Architecture
Components
1. Leader Node
2. Compute Nodes
3. Node Slices
4. Cluster
5. Network Layer

Data Distribution and Processing
Data Distribution: Data is distributed across compute nodes and slices based on the distribution style:
Massively Parallel Processing (MPP): Queries are split into smaller tasks and distributed to compute nodes. Each node processes its portion of…
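The distribution and MPP ideas in this excerpt can be illustrated with a small, self-contained sketch. It assumes a key-based distribution style, in which rows sharing a key value land on the same slice, and it models the "split, process per slice, combine" flow with plain Python. The names `slice_for`, `distribute`, and `parallel_sum` are hypothetical, and the CRC32 hash is only a stand-in; this is not how Redshift hashes rows internally.

```python
import zlib
from collections import defaultdict


def slice_for(key: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice id (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def distribute(rows, key, num_slices):
    """Group rows by slice so that equal key values share a slice."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(str(row[key]), num_slices)].append(row)
    return slices


def parallel_sum(slices, column):
    """Compute one partial sum per slice, then combine the partials.

    In a real cluster each partial would be computed on a separate
    compute node; here the per-slice work simply runs in a loop.
    """
    partials = {sid: sum(r[column] for r in rows) for sid, rows in slices.items()}
    return sum(partials.values()), partials


# Hypothetical sample data: 100 orders spread over 5 customers.
orders = [{"customer_id": f"c{i % 5}", "amount": i} for i in range(100)]
by_slice = distribute(orders, "customer_id", num_slices=4)
total, per_slice = parallel_sum(by_slice, "amount")
```

Because every row for a given `customer_id` sits on one slice in this model, a per-customer aggregate would need no data movement between slices, which is the usual motivation for choosing a key that matches common joins and group-bys.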
DynamoDB Comparisons
DynamoDB DAX vs Global Tables

| Feature | DynamoDB DAX | DynamoDB Global Tables |
| --- | --- | --- |
| Purpose | In-memory caching for low-latency reads | Multi-region replication for low-latency access globally |
| Performance Focus | Speeds up read-heavy workloads | Ensures low-latency access across regions |
| Latency | Microseconds for cached reads | Milliseconds (based on network latency and consistency) |
| Data Replication | No replication (caches only in-memory near application) | … |
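To make the "in-memory caching for low-latency reads" side of the comparison concrete, here is a minimal read-through cache sketch. It is not the DAX client API: `ReadThroughCache`, its `load` callback, and the TTL value are hypothetical, and the dictionary named `table` merely stands in for a DynamoDB table.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the loader on a miss."""

    def __init__(self, load, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load          # function that reads from the backing store
        self._ttl = ttl_seconds    # how long a cached item stays valid
        self._clock = clock        # injectable clock, handy for testing
        self._items = {}           # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]          # cache hit: no trip to the backing store
        value = self._load(key)    # cache miss or expired: read through
        self._items[key] = (value, now + self._ttl)
        return value


# Hypothetical backing store standing in for a table lookup.
table = {"user#1": {"name": "Ada"}, "user#2": {"name": "Lin"}}
cache = ReadThroughCache(load=table.__getitem__, ttl_seconds=60.0)
first = cache.get("user#1")   # loaded from the backing store
second = cache.get("user#1")  # served from memory
```

The sketch also shows why a cache of this kind helps read-heavy workloads only: every item is still owned by the single backing store, so nothing here replicates data across regions.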
Amazon QLDB
Amazon QLDB is a fully managed, serverless ledger database service offered by AWS. It provides a transparent, immutable, and cryptographically verifiable transaction log. QLDB is designed for use cases where there is a need to maintain a reliable and trusted record of all changes to data over time, such as in financial transactions, supply chains,…
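The phrase "immutable and cryptographically verifiable transaction log" can be illustrated with a simple hash chain, where each entry's hash covers the previous entry's hash. This is a deliberately simplified sketch rather than QLDB's actual journal format or API; the `Ledger` class, `entry_hash` helper, and all-zero genesis value are assumptions made for illustration.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


class Ledger:
    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self.entries = []  # list of (payload, hash) tuples, append-only by convention

    def append(self, payload: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every later hash."""
        prev = self.GENESIS
        for payload, digest in self.entries:
            if entry_hash(prev, payload) != digest:
                return False
            prev = digest
        return True


# Hypothetical usage: record two transfers, then check the chain.
ledger = Ledger()
ledger.append({"account": "A", "amount": 100})
ledger.append({"account": "B", "amount": -40})
chain_ok = ledger.verify()
```

Changing any recorded payload after the fact makes `verify()` return `False`, which is the property that lets a reader trust the history without trusting whoever stores it.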
Amazon Neptune
Amazon Neptune is a fully managed graph database service by AWS designed to work with highly connected datasets. It supports graph models such as property graphs and RDF (Resource Description Framework), making it ideal for use cases that require complex, highly interrelated data, such as social networks, recommendation engines, fraud detection, and knowledge graphs. Supports…
Amazon Timestream
Amazon Timestream is a fully managed, serverless, and purpose-built time series database service offered by AWS. It is designed to handle trillions of time-stamped data points per day with minimal operational overhead. Time series databases are particularly well-suited for workloads such as IoT data, application monitoring, DevOps, and industrial telemetry, where data arrives in a…