Monday, February 27, 2017

Understanding the Cassandra Partitioner


Partitioning Data with Hashing


Cassandra uses a Partitioner to distribute data across all the nodes in a Cassandra cluster. When a row of data is written or read, the partitioner calculates the hash value of the partition key. This hash value is called a Token and is mapped to a node which owns that token value. Each node in a cluster is configured to own a primary token range unique to the cluster. Therefore, once the hash or token value has been calculated, we can determine which node the data belongs to.

In Cassandra, Replication Factor (RF) is typically greater than one, so replicas of the data are stored on multiple nodes. For the partitioner, this simply means finding the token owner and then distributing replicas among the adjacent token range owners.

A well partitioned result distributes data evenly and randomly across the nodes in a cluster. The application developer influences evenness through the definition of the partition key.  Ensuring randomness requires a good hashing algorithm.

Hashing


Cassandra 1.0 used the classic 128-bit MD5 hashing algorithm for partitioning. MD5 was designed as a cryptographic hash function such that hash values must be random, evenly distributed, and in addition it must be hard or impossible to guess the original value from a hash value.

Beginning with Cassandra 1.2, Murmur3, a faster, non-cryptographic hash function, replaced MD5 as the default partition hash. The name was coined from the machine language operations multiply (MU) and rotate (R).

Murmur3Partitioner is now the default and should be used for all new clusters. It remains an option to configure the original RandomPartitioner using MD5 for compatibility with older clusters.

The published Murmur3 hash algorithm provides three versions optimized for different platforms. Murmur3Partitioner in Cassandra uses the x64 128-bit version, but it truncates the result and uses only the upper 64 bits of the hash value. The token range for Murmur3 in Cassandra is therefore -263 to +263-1.

The Murmur3Partitioner.java source code in Cassandra 3.x creates a 128 bit Murmur3 hash with the x64_128 algorithm.  The first 64 bits in long[0] are returned as the token value and the lower 64 bits in long[1] are ignored.

    private LongToken getToken(ByteBuffer key, long[] hash)
    {
        if (key.remaining() == 0)
            return MINIMUM;

        return new LongToken(normalize(hash[0]));
    }

    private long[] getHash(ByteBuffer key)
    {
        long[] hash = new long[2];
        MurmurHash.hash3_x64_128(key, key.position(), 
            key.remaining(), 0, hash);
        return hash;

    }


Hashing Examples

Enter a string to be hashed:


128 bits:

High 64 bits: as signed integer:
Low 64 bits: as signed integer:

Choosing Token Ranges


Token ranges are configured in cassandra.yaml by one of the following:
  • Manually by assigning each node a starting range
  • Automatically by enabling vnodes
VNodes are administratively easier and are recommended for most new environments. The default number of vnodes is 128, which creates 128 token ranges per node. For DSE clusters using Search the recommended number of vnodes is lower, with recommended values of16 or 32. The primary advantages of vnodes are:


  • When adding or removing nodes from a cluster, manual rebalancing is not required. 
  • Faster recovery from node failures or removal.  With vnodes, rebuilding can stream data from all online nodes.  This compares with manual tokens which will read from at most four nodes adjacent to the node being replaced. Especially with larger clusters, this can be an important factor providing operational agility.
You must not mix vnodes and manual tokens within a single data center.

Your choice of a partitioner and token range scheme determines where the data resides in a cluster. Changes to either of these on a production cluster is operationally difficult and may require migrating all of your data.



Calculating Manual Token Ranges

When configuring token ranges manually, it helps to use a token range calculator. 

Enter the number of nodes below, and it will calculate the starting token offsets beginning with zero for the Murmur3Partitioner.


Token Calculator

Number of nodes:

Initial Tokens:



Murmur3 Partitioning Example



As a practical example with real data, we can use the Murmur3 partitioner with the twelve zodiac signs to hash their values and observe how they would be partitioned if used as keys across a Cassandra cluster. I’ve chosen six nodes because it is a fairly common cluster size and an even divisor for the number of keys. A perfectly even distribution would provide two key values per node.

Step One: Calculate token ranges using the calculator above.  Using n=6 we have:


Node   Starting Range
1      0
2      3074457345618258602
3      6148914691236517204
4      9223372036854775806
5      -3074457345618258000
6      -6148914691236518000

Step Two: Calculate the hash value for each of the twelve zodiac sign names and match the hash with a node's token range:


Key Value    Murmur3 Hash         Node
Aries        6446536566984288488  3
Taurus       4155751160254564535  2
Gemini       1317029125904582964  1
Cancer      -8016596991533194765  6
Leo         -8583032252751962986  3
Virgo       -8041781948673145583  6
Libra       -2142727802591540075  4
Scorpio     -5744609807935173055  5
Sagittarius -0816785684867175026  1
Capricorn   -6957124044486481194  6
Aquarius    -3903387275638502447  5
Pisces       7634852637572685346  3

Result:  The result is fairly even for a random partitioner; two nodes have three values, two nodes have two values, and two nodes have one value. 



Verify Hashing in Cassandra


We can now verify the Murmur3 hash values calculated above match exactly with what is seen in Cassandra.


CREATE TABLE test.zodiac (
    sign text,
    body text,
    PRIMARY KEY (sign)

);

cqlsh> insert into zodiac (sign, body) values ('Aries', 'Mars');
cqlsh> insert into zodiac (sign, body) values ('Taurus', 'Earth');
cqlsh> insert into zodiac (sign, body) values ('Gemini', 'Mercury');
cqlsh> insert into zodiac (sign, body) values ('Cancer', 'Moon');
cqlsh> insert into zodiac (sign, body) values ('Leo', 'Sun');
cqlsh> insert into zodiac (sign, body) values ('Virgo', 'Mercury');
cqlsh> insert into zodiac (sign, body) values ('Libra', 'Venus');
cqlsh> insert into zodiac (sign, body) values ('Scorpio', 'Pluto');
cqlsh> insert into zodiac (sign, body) values ('Sagittarius', 'Jupiter');
cqlsh> insert into zodiac (sign, body) values ('Capricorn', 'Saturn');
cqlsh> insert into zodiac (sign, body) values ('Aquarius', 'Uranus');
cqlsh> insert into zodiac (sign, body) values ('Pisces', 'Neptune');


cqlsh> select sign, token(sign) from zodiac;

 sign        | system.token(sign)
-------------+----------------------
         Leo | -8583032252751962986
       Virgo | -8041781948673145583
      Cancer | -8016596991533194765
   Capricorn | -6957124044486481194
     Scorpio | -5744609807935173055
    Aquarius | -3903387275638502447
       Libra | -2142727802591540075
 Sagittarius-816785684867175026
      Gemini1721847210301305769
      Taurus4155751160254564535
       Aries6446536566984288488 
      Pisces7634852637572685346


(12 rows)