How To Use Cache Sharding: Scale Out Neo4j


Neo4j is a high availability graph database that support ACID properties for transactions.

As per the CAP theorem, any system can provide only two of the properties from Consistency, Availability and Partition tolerance.

Just like relational databases, Neo4j has also chosen the Consistency and Availability options.

Consistency in Neo4j is guaranteed by supporting ACID properties to transaction.

Availability is achieved using high availability clustering that is supported by a master slave architecture with one master for all write operations and many slaves for read operations.

The architecture of Neo4j is fairly scalable, however it can only scale UP. For most of cases adding more nodes to the Neo4j cluster will be sufficient however if you have a really huge dataset adding node may not scale your system linearly.

In case you are looking for a scaling OUT option you may need to use your own technique to make best use of Neo4j resources. I recently attended GraphConnect developer tutorials session. I was fortunate enough to meet with the creators of Neo4j and discuss some of the options with them. One of the experts from Neo4j at GraphConnect suggested the option of Cache Sharding that many developers liked and wanted to get more understanding on it. I am trying to provide one example where we can use cache sharding and almost linearly scale a Neo4j Graph Database.

In fact, this technique of cache sharding is not really new to Neo4j. It has been used a lot with relational and other data storage as well.

Continue Reading