


In Amazon Kinesis, shards and partitions are terms related to the way data is distributed and processed in Kinesis streams, but they refer to different concepts:
Shard:
A shard is the basic unit of capacity within an Amazon Kinesis stream. It acts as a container for the stream’s data and is responsible for:
- Storing the data records.
- Controlling the throughput (read and write operations) of the stream.
- Determining the partition key space for data records.
Each shard provides a fixed capacity:
- Write throughput: 1 MB per second or 1000 records per second.
- Read throughput: 2 MB per second for all consumers.
Shards are crucial because they determine the overall throughput capacity of the stream. If you need more throughput, you can increase the number of shards by shard splitting (for scaling up) or merging shards (for scaling down).
Partition:
A partition is the logical division of data within a shard, often determined by the partition key that is associated with each data record. A partition key is used to map data records to specific shards. Kinesis automatically distributes the records across the shards based on the partition key, and each record within a shard belongs to a specific partition.
- Partition Key: A string that is assigned to each data record. This key is hashed to determine which shard the record will go into.
- The partition key ensures that related data records are processed together, which can be important for certain use cases (e.g., processing events for the same customer or session).
Key Differences:
- Shards are physical units that provide throughput capacity and are the primary components of the stream’s data infrastructure.
- Partitions are logical groupings of data within a shard, determined by the partition key.
Summary:
- Shard is a unit of capacity that defines throughput limits and is used to scale the stream.
- Partition is a logical concept where records are grouped by their partition key, which determines where the record will go in a shard.
If you’re looking to scale your stream, you work with shards. If you’re managing or querying data within a shard, you’re working with partitions (implicitly, through partition keys).