What is a HashMap?
A HashMap is composed of two main components: an array data structure and a hash function.
Hash Function
A hash function is a process that takes a value (plaintext) as input and outputs another value (ciphertext or hashed text). The transformation is irreversible, meaning once a value is hashed, it cannot be reverted to its original form.
Key properties of a hash function:
- Irreversibility: Hashing is not reversible, but for equality checks, you can rehash the plaintext and compare the resulting hashes.
- Consistency: A given input will always produce the same hashed output.
An ideal hash function avoids collisions, where different inputs produce the same hash output.
Mapping Indexes
To map the hash function’s output to an array index, we use the modulus
operator. For example, if the array size is 10, applying the modulus operator to the hash value using 10 ensures that the result stays within the bounds of the array.
However, the modulus operator can lead to collisions. For instance, if your hash code is 9 and the array size is 3, applying the modulus gives an output of 0, which is fine. But if another element hashes to 6, its modulus with 3 also results in 0, hence a collision occurs.
Further Reading:
Some Ideas:
- Load Balancing and Hashing
- JWT Tokens and Hashing
- Passwords
Notes from Primegen
- Load Factor: The ratio of data points to the amount of storage. For example, if we have 7 items in our map and 10 available units for storing, the load factor is 7/10 or
0.7
.
We need a hash function that will take a key and return a number. This number can be large or small, with no specific requirements. We then take this number and apply the modulus operator with the length of our array. The resulting output is the index where we store our value.
If a collision occurs, it is typically handled in modern implementations by using an ArrayList at each array location. In the event of a collision, new values are appended to the ArrayList associated with the respective array index.
Keys are also stored in the ArrayList to facilitate searches during retrieval by matching keys in the ArrayList slot.
The ideal load factor is 0.7
. Once this load factor is reached, we need to double the storage capacity.
The more collisions occur, the less efficient the map becomes in achieving O(1) complexity. However, as long as collisions don’t cluster at a single data point, the complexity will remain O(1).