Understanding the Cassandra Partitioner
Partitioning Data with Hashing
Cassandra uses a Partitioner to distribute data across all the nodes in a Cassandra cluster. When a row of data is written or read, the partitioner calculates the hash value of the partition key. This hash value is called a Token and is mapped to a node which owns that token value. Each node in a cluster is configured to own a primary token range unique to the cluster. Therefore, once the hash or token value has been calculated, we can determine which node the data belongs to.
In Cassandra, Replication Factor (RF) is typically greater than one, so replicas of the data are stored on multiple nodes. For the partitioner, this simply means finding the token owner and then distributing replicas among the adjacent token range owners.
A well partitioned result distributes data evenly and randomly across the nodes in a cluster. The application developer influences evenness through the definition of the partition key. Ensuring randomness requires a good hashing algorithm.
Hashing
Cassandra 1.0 used the classic 128-bit MD5 hashing algorithm for partitioning. MD5 was designed as a cryptographic hash function such that hash values must be random, evenly distributed, and in addition it must be hard or impossible to guess the original value from a hash value.
Beginning with Cassandra 1.2, Murmur3, a faster, non-cryptographic hash function, replaced MD5 as the default partition hash. The name was coined from the machine language operations multiply (MU) and rotate (R).
Murmur3Partitioner is now the default and should be used for all new clusters. It remains an option to configure the original RandomPartitioner using MD5 for compatibility with older clusters.
The published Murmur3 hash algorithm provides three versions optimized for different platforms. Murmur3Partitioner in Cassandra uses the x64 128-bit version, but it truncates the result and uses only the upper 64 bits of the hash value. The token range for Murmur3 in Cassandra is therefore -263 to +263-1.
The Murmur3Partitioner.java source code in Cassandra 3.x creates a 128 bit Murmur3 hash with the x64_128 algorithm. The first 64 bits in long[0] are returned as the token value and the lower 64 bits in long[1] are ignored.
private LongToken getToken(ByteBuffer key, long[] hash)
{
if (key.remaining() == 0)
return MINIMUM;
return new LongToken(normalize(hash[0]));
}
private long[] getHash(ByteBuffer key)
{
long[] hash = new long[2];
MurmurHash.hash3_x64_128(key, key.position(),
key.remaining(), 0, hash);
return hash;
}
Hashing Examples
Enter a string to be hashed:128 bits:
High 64 bits: as signed integer:
Low 64 bits: as signed integer:
Choosing Token Ranges
- Manually by assigning each node a starting range
- Automatically by enabling vnodes
- When adding or removing nodes from a cluster, manual rebalancing is not required.
- Faster recovery from node failures or removal. With vnodes, rebuilding can stream data from all online nodes. This compares with manual tokens which will read from at most four nodes adjacent to the node being replaced. Especially with larger clusters, this can be an important factor providing operational agility.
Your choice of a partitioner and token range scheme determines where the data resides in a cluster. Changes to either of these on a production cluster is operationally difficult and may require migrating all of your data.
Calculating Manual Token Ranges
When configuring token ranges manually, it helps to use a token range calculator.Enter the number of nodes below, and it will calculate the starting token offsets beginning with zero for the Murmur3Partitioner.
Token Calculator
Number of nodes:Initial Tokens:
Murmur3 Partitioning Example
As a practical example with real data, we can use the Murmur3 partitioner with the twelve zodiac signs to hash their values and observe how they would be partitioned if used as keys across a Cassandra cluster. I’ve chosen six nodes because it is a fairly common cluster size and an even divisor for the number of keys. A perfectly even distribution would provide two key values per node.
Step One: Calculate token ranges using the calculator above. Using n=6 we have:
Node Starting Range
1 0
2 3074457345618258602
3 6148914691236517204
4 9223372036854775806
5 -3074457345618258000
6 -6148914691236518000
Step Two: Calculate the hash value for each of the twelve zodiac sign names and match the hash with a node's token range:
Key Value Murmur3 Hash Node
Aries 6446536566984288488 3
Taurus 4155751160254564535 2
Gemini 1317029125904582964 1
Cancer -8016596991533194765 6
Leo -8583032252751962986 3
Virgo -8041781948673145583 6
Libra -2142727802591540075 4
Scorpio -5744609807935173055 5
Sagittarius -0816785684867175026 1
Capricorn -6957124044486481194 6
Aquarius -3903387275638502447 5
Pisces 7634852637572685346 3
Result: The result is fairly even for a random partitioner; two nodes have three values, two nodes have two values, and two nodes have one value.
We can now verify the Murmur3 hash values calculated above match exactly with what is seen in Cassandra.
Verify Hashing in Cassandra
We can now verify the Murmur3 hash values calculated above match exactly with what is seen in Cassandra.
CREATE TABLE test.zodiac (
sign text,
body text,
PRIMARY KEY (sign)
);
cqlsh> insert into zodiac (sign, body) values ('Aries', 'Mars');
cqlsh> insert into zodiac (sign, body) values ('Taurus', 'Earth');
cqlsh> insert into zodiac (sign, body) values ('Gemini', 'Mercury');
cqlsh> insert into zodiac (sign, body) values ('Cancer', 'Moon');
cqlsh> insert into zodiac (sign, body) values ('Leo', 'Sun');
cqlsh> insert into zodiac (sign, body) values ('Virgo', 'Mercury');
cqlsh> insert into zodiac (sign, body) values ('Libra', 'Venus');
cqlsh> insert into zodiac (sign, body) values ('Scorpio', 'Pluto');
cqlsh> insert into zodiac (sign, body) values ('Sagittarius', 'Jupiter');
cqlsh> insert into zodiac (sign, body) values ('Capricorn', 'Saturn');
cqlsh> insert into zodiac (sign, body) values ('Aquarius', 'Uranus');
cqlsh> insert into zodiac (sign, body) values ('Pisces', 'Neptune');
cqlsh> select sign, token(sign) from zodiac;
sign | system.token(sign)
-------------+----------------------
Leo | -8583032252751962986
Virgo | -8041781948673145583
Cancer | -8016596991533194765
Capricorn | -6957124044486481194
Scorpio | -5744609807935173055
Aquarius | -3903387275638502447
Libra | -2142727802591540075
Sagittarius | -816785684867175026
Sagittarius | -816785684867175026
Gemini | 1721847210301305769
Taurus | 4155751160254564535
Aries | 6446536566984288488
Pisces | 7634852637572685346
(12 rows)
No comments:
Post a Comment