2. Collision Handling
A Collision occurs when two or more different keys produce the same hash value (index). For example, "Budi" and "Dina" are both hashed to row #5. We can't overwrite Budi's data. We must have a strategy to handle this.
2.1. Separate Chaining

This is the most common strategy to avoid collisions.
- Concept: Instead of each table row storing a single value, each row stores a pointer to another data structure, usually a Linked List.
- How it Works:
- "Budi" hashes to row 5. We create a linked list at row 5 and add "Budi".
- "Dina" hashes to row 5. We add "Dina" to the linked list that already exists at row 5.
- Row 5 now contains:
[Budi] -> [Dina] -> NULL
- Search: To find "Dina," we compute its hash (5), go to row 5, and then traverse the linked list at that row until we find "Dina." The worst-case search time becomes
O(k)wherekis the number of elements in the chain.
2.1.1. Manual Implementation: Insertion
The insert function handles adding a new key-value pair. Its logic is:
- Hash the
keyto find the correct bucketindex. - Get the linked list (the chain) at that
index. - Traverse the list:
- If the
keyis already in the list, update itsvalueand stop. - If the end of the list is reached and the
keywas not found, add the new key-value pair to the end of the list.
- If the
void insert(std::string key, int value) {
// Hash the key to get the bucket index
int index = hashFunction(key);
// Get the chain (linked list) at that index
std::list<KeyValuePair>& chain = table[index];
// Traverse the chain
for (auto& pair : chain) {
if (pair.key == key) {
// Key found: update the value and return
pair.value = value;
return;
}
}
// Key not found: add new pair to the end of the list
chain.push_back(KeyValuePair(key, value));
}
2.1.2. Manual Implementation: Searching
The search function finds the value associated with a key. Its logic is:
- Hash the
keyto find the bucketindex. - Get the linked list at that
index. - Traverse the list:
- If the
keyis found, return itsvalue. - If the end of the list is reached and the
keywas not found, return a sentinel value (like -1) or throw an exception.
- If the
int search(std::string key) {
// Hash the key to get the bucket index
int index = hashFunction(key);
// Get the chain at that index
std::list<KeyValuePair>& chain = table[index];
// Traverse the chain
for (auto& pair : chain) {
if (pair.key == key) {
// Key found: return the value
return pair.value;
}
}
// Key not found
return -1;
}
2.1.3. Manual Implementation: Deletion
The remove function deletes a key-value pair. Its logic is:
- Hash the
keyto find the bucketindex. - Get the linked list at that
index. - Traverse the list using an iterator. We must use an iterator because we need to know the position of the element to delete it.
- If the
keyis found:- Call the list's
erase()method, passing the iterator. - Stop and return.
- Call the list's
void remove(std::string key) {
// Hash the key to get the bucket index
int index = hashFunction(key);
// Get a reference to the chain
auto& chain = table[index];
// Traverse the chain using an iterator
for (auto it = chain.begin(); it != chain.end(); ++it) {
// 'it' is an iterator (position pointer)
// 'it->key' accesses the key of the element at that position
if (it->key == key) {
// Key found: erase the element at this position
chain.erase(it);
return;
}
}
// If loop finishes, key was not found. Do nothing.
}
2.2. Open Addressing (Probing)
This is an alternative strategy where all data is stored within the table itself. No linked lists are used.
- Concept: If a collision occurs (the slot is full), we "probe" for the next empty slot according to a specific rule and place the item there.
- Types of Probing:
- Linear Probing: This is the simplest rule. If slot
h(key)is full, tryh(key) + 1, thenh(key) + 2,h(key) + 3, and so on, wrapping around the table if necessary. The problem of this type is it tends to create clusters of occupied slots, which degrades performance as the table fills up. - Quadratic Probing: If slot
h(key)is full, tryh(key) + 1^2, thenh(key) + 2^2,h(key) + 3^2, and so on. This helps spread out elements better than linear probing. - Double Hashing: Uses a second hash function to determine the "step size" for probing, which is the most effective at avoiding clusters.
- Linear Probing: This is the simplest rule. If slot