AWS Technologies Blog

Big Data Frameworks

Posted on January 28, 2025 by wpadmin
| Framework | Description | Use Cases | Strengths |
|---|---|---|---|
| Apache Hadoop | A distributed framework for storing and processing large datasets using the Hadoop Distributed File System (HDFS) and the MapReduce programming model. | Batch processing, ETL, data storage. | Scalable and reliable storage (HDFS); proven technology. |
| Apache Spark | A unified analytics engine for large-scale data processing, offering in-memory computing and support for batch, streaming, and ML workloads. | Streaming analytics, machine learning, graph processing, batch ETL. | In-memory processing for speed; broad use case support. |
| Apache HBase | A distributed NoSQL database built on HDFS for real-time, random read/write access to large datasets. | Real-time applications, time-series data, IoT data storage. | Low-latency reads/writes; scales horizontally. |
| Apache Flink | A framework for real-time stream processing and distributed batch processing, with low-latency and high-throughput capabilities. | Real-time analytics, event processing, streaming ETL. | True real-time processing; stateful stream management. |
| Presto | A distributed SQL query engine designed for fast, interactive queries across large datasets, optimized for analytics over heterogeneous data sources. | Interactive SQL querying, federated queries, analytics on data lakes. | High performance for SQL; query federation. |
| Hive | A data warehouse tool built on Hadoop, providing SQL-like query capabilities (HiveQL) for processing and analyzing structured datasets. | Data warehousing, batch analytics, schema-on-read processing. | Familiar SQL-like interface; integrates with Hadoop. |
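
The MapReduce programming model named in the Apache Hadoop entry can be illustrated with a small word count. The sketch below is plain, single-process Python and does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three conceptual steps: map each record to key/value pairs, group the pairs by key, then reduce each group to one result.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data lakes"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many nodes and the shuffle would move data across the network; here everything runs in one process so the data flow is easy to follow.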
