Hashing and Consistent Hashing : Simplifying Data Management and Distribution

Jaspreet Singh Sodhi
4 min readJul 15, 2024

--

Consitent Hashing

In the world of computer science, hashing is a common technique used to manage and store data efficiently. Let’s dive into what hashing is and why consistent hashing is so important.

What is Hashing?

Hashing is like a magical way to quickly find and store data. Think of it as putting your keys in a specific drawer so you can find them easily later.

Traditional Hashing

How it works:

  1. Input Data: You have some data (like a key).
  2. Hash Function: This data goes through a special function called a hash function.
  3. Hash Value: The hash function converts the input data into a unique number called a hash value.
  4. Storage: This hash value tells you exactly where to store or find your data in a database or memory.

Example: Imagine you have a list of names, and you want to store them in a way that you can quickly find them later. Using hashing, the name “Alice” might be turned into a number like 123. This number tells you where “Alice” is stored.

Use Cases of Hashing:

  • Fast Data Retrieval: Hashing helps you find data quickly without searching through everything.
  • Databases: Used to index and retrieve data efficiently.
  • Password Storage: Hashing stores passwords securely by converting them into hash values.

In a distributed system, the number of server nodes often changes due to scaling or unexpected failures. We can’t assume the number of nodes will always stay the same. If a node fails with a basic hashing approach, we’d need to rehash all the data because the mapping of keys depends on the number of nodes and their positions.

Issues in Hashing when no. of available locations/server nodes changes

Here comes the concept of Consistent hashing :

What is Consistent Hashing?

Consistent Hashing is a technique used in distributed systems to efficiently distribute data across multiple servers. It ensures that the system can handle changes like adding or removing servers without needing to reassign all the data.

Sure! Here’s an easy-to-understand explanation of consistent hashing:

What is Consistent Hashing?

Consistent Hashing is a technique used in distributed systems to efficiently distribute data across multiple servers. It ensures that the system can handle changes like adding or removing servers without needing to reassign all the data.

How Consistent Hashing Works:

  1. Hash Ring: Imagine a big circle (the hash ring) where both servers and data requests are placed. Each point on the circle is a possible position for a server or a data request.

2. Placing Servers on the Ring :

  • Servers are assigned positions on the circle using a hash function, which generates a unique number (position) for each server.
  • These positions are random and spread around the circle.

3. Placing Data Requests on the Ring : Data requests (like user queries or stored data) are also assigned positions on the circle using the same hash function.

4. Assigning Requests to Servers :

  • To determine which server handles a request, start at the request’s position and move clockwise around the circle.
  • The first server you encounter in this direction will handle the request.
  • If the request is beyond the last server’s position on the circle, it wraps around to the first server.
Server A,B,C & Request R1 on the HashRing
  • Server A is at position 10.
  • Server B is at position 50.
  • Server C is at position 80.
  • Request R1 is at position 5.
  • Request R1 is assigned to server A (the first server clockwise from R1’s position).

Handling Server Failures:

1. If a Server Fails:

  • Suppose server B (at position 50) fails.
  • Any requests that server B was handling will now go to the next server clockwise, which is server C.

2. Minimal Data Movement:

  • Only the data handled by the failed server (B) needs to be reassigned.
  • The rest of the data on the ring remains unaffected.

Why is Consistent Hashing Useful?

  • Scalability: Easily add or remove servers with minimal disruption.
  • Efficiency: Only a small portion of data is moved when servers are added or removed.
  • Reliability: System continues to function smoothly even if servers fail.

Conclusion

Consistent hashing is like a smart way of organizing a distributed system so that changes in the number of servers cause minimal disruption. It places both servers and data requests on a circle, ensuring that data is evenly distributed and easily reassigned when needed.

I hope this explanation helps you understand consistent hashing!

That’s it!. Feel free to follow me and share your thoughts on what else I can improve.

See you in the next part! 😊

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Jaspreet Singh Sodhi
Jaspreet Singh Sodhi

Written by Jaspreet Singh Sodhi

Full Stack Software Engineer | Curating Top-Notch Content @jaspreet.dev on Instagram ✨

No responses yet

Write a response