What is Hashing?
Concept of building a data structure that can be searched in O(l) time is called Hashing. Hashing is the process of converting a given key into another value.
What is Hash Table
A Hash Table is a collection of items which are stored in such a way as to make it easy to find them later. This is a hash table with 4 slots:
What is Slot?
Each position of the hash table is called a slot. A slot can hold an item and is named by an integer value starting at 0.
What is Hash Function?
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are used to index a hash table.
What is Perfect Hash Function?
In computer science, a perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions. In mathematical terms, it is an injective function. Perfect hash functions may be used to implement a lookup table with constant worst-case access time.
In other words, A hash function that maps each item into a unique slot is called a Perfect Hash Function.
What is Minimal Perfect Hash Function?
A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers – usually the numbers from 0 to n − 1 or from 1 to n. In minimal perfect hash function there is no free or empty slot. The best currently known minimal perfect hashing schemes can be represented using less than 1.56 bits/key if given enough time.
What is Load Factor?
In a Hash Table ratio of occupied slots and table size is called Load Factor. It is denoted by λ. The Load factor is a measure that decides when to increase the Hash Map capacity to maintain the get() and put() operation complexity of O(1). The default load factor of Hash Map is 0.75f (75% of the map size).
What is Collision?
If we find that we need to insert two or more different items in the same slot then that is called Collision.
Widely, in computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest. The impact of collisions depends on the application.
Definition of Collision Resolution
Sometime two items hash to the same slot, there the process of placing the second item in the hash table is called Collision Resolution. There are some popular ways for Collision Resolution.
Collision Resolution Processes
- Linear Probing
- Open Addressing
- Plus 3 Probe
- Quadratic Probing
The process of looking another slot skipping one after collision is called Rehashing. That means if there is a collision at slot 3 we will try to put the second element to slot 4 in Rehashing.
In this case we will assign “Sandra Dee” to slot 03 as Rehashing.
In open addressing collision resolution process we perform an open addressing technique where if any collision occur we assign it to the first free slot of the hash table. This is called Linear Probing.
Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key.
In the above image we will assign “Sandra Dee” to slot 00 as Linear Probing.
Open Addressing Process is a method for collision resolution. In this method first we find what item is making problem. After finding that we start search from the slot it should be placed. If we find any free slot then we place the item there.
A disadvantage of open addressing is the tendency of clustering. This means if many collisions occur at the same hash value, a number of surrounding slots will be filled by the linear probing resolution. This will have an impact on other items that are being inserted.
Rehashing & Linear Probing are most common Open Addressing collision resolution method.
Plus 3 Probe
One way to deal with clustering we skip slots, thereby more evenly distributing the items that have caused collisions. This is potentially reduce the clustering. Tis process is called Plus 3 Prob. This means that once a collision occurs, we will look at every third slot until we find one that is empty.
In the above example, “Sandra Dee” will be assigned to slot 05 as Plus 3 Probe.
Quadratic probing is an open addressing scheme in computer programming for resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.
A variation of the linear probing idea is called quadratic probing. In this process instead of using a constant “skip” value, we use rehash function that increments the hash value by 1, 4, 9, 16 and so on. This means that if the first hash become h then the successive are h+1, h+4, h+9 and so on. We can skip slots using different polynomials also.
Another method for collision resolution allows each slot to hold a reference to a collection of Items. This is called Chaining. Chaining allows many items to exist at the same location in the hash table. When collision happen, the item takes placed in the proper slot in chain. As more and more the item hash to the same slot the difficulty of the collection increases.
Methods of Constructing Hash Table
- Mid Square Method
- Folding Method
- Reminder Method
Mid Square Method
In Mid-Square Method we first square the item and then extract mid portion of the resulting number and finally perform the reminder step. For Example: If the item is 44. First we calculate 442=1936. Then we will take a part of 1936. If we extract middle two digit we find 93. After performing reminder step we get 93%11=5. This will be the hash number.
Find hash value for the following items for table size 11 using Mid Square Method:
542 = 2916
262 = 676
932 = 8649
172 = 289
772 = 5929
312 = 961
If we take the mid digits (except one in both side) we get 91, 7, 64, 8, 92, 6.
After reminder step (number/size of table) we get 3, 7, 9, 8, 4, 6. Hence the Hash Values are 3, 7, 9, 8, 4, 6 and Hash Table is:
The Folding method for constructing hash function begins by dividing the item into equal size pieces. Then they are added together and being divided by the total number of slots in the hash table. For example: If we consider a phone number 436-555-4601 it will be divided as 43,65,55,46,01. Then adding these we get 210. If size of hash table be 11 then 210%11=1. Hence it will be in slot 1.
Find hash value for the following items for table size 11 using Folding Method:
54 = 5 + 4 = 9
26 = 2 + 6 = 8
93 = 9 + 3 = 13 & 13%11 = 2
17 = 1 + 7 = 8
77 = 7 + 7 = 14 & 14%11 = 3
31 = 3 + 1 = 4
Taking all reminder we get 9, 8, 2, 8, 3, 4.
Hence the Hash Values are 9, 8, 2, 8, 3, 4 and Hash Table is (Including Linear Probing):
The Reminder Method will take any item from the collection, divide by number of slots m and return the reminder in the range if slot names between 0 to m-1. Assume that we have the set of integer items 54, 26, 93, 17, 77, 31. In ‘reminder method’ simply taken an item and divided it by m and the hash function return hash value (h(item) = item % m). Where m is the size of table.
Find hash value for the following items for table size 11 using Reminder Method:
54%11 = 10
26%11 = 4
93%11 = 5
17%11 = 6
77%11 = 0
31%11 = 9
Taking all reminder we get 10, 4, 5, 6, 0, 9.
Hence the Hash Values are 10, 4, 5, 6, 0, 9 and Hash Table is:
Thanks for following this blog. Comment your expression. Thanks.