1. Main Concept of Hash Map
Hashing is the process of converting data of any size (like a string, number, or object) into a fixed-length integer value. This integer value is called a "hash value," "hash code," or simply "hash."
The primary data structure that uses this concept is called a Hash Table.
1.1. Components of a Hash System
A hash-based search system consists of three main components:
- Hash Table: This is fundamentally an array (or a
std::vector). Each "row" or "slot" in this array is called a bucket. The size of this table (m) is a key factor in its performance. - Key: This is the data you want to store or look up. The key is fed into the hash function to find its proper location. For example, in a phone book, the name is the key.
- Hash Function: This is the "engine" or mathematical function that takes your key and computes an index (a row number from
0tom-1) within the array where your data should be stored.
The main goal of hashing is to achieve extremely fast data access. In the ideal case, the hash function provides a unique index for every key, allowing insertion, search, and deletion operations to be performed in constant time, or O(1).
Hash Map Analogy:
Imagine a large file cabinet (Hash Table) with 100 drawers numbered 0-99. To store a document for "Budi," you don't search one by one. You use a rule (Hash Function), for example, "Take the first letter ('B' -> 2), store it in drawer #2." When you need to find "Budi," you compute the same rule and immediately jump to drawer #2.
1.2. Good Hash Function Properties
The Hash Function is the most critical part of the Hash Table. A good function must have the following properties:
- Deterministic: The same input (key) must always produce the same output (hash value).
- Efficient: The hash function must be fast to compute (ideally
O(1)). - Uniform Distribution: This is the most important. The hash function must spread keys as evenly as possible across all slots (indices) in the table. This minimizes collisions.
If the hash function doesn't distribute data well (e.g., it hashes all names starting with 'A', 'B', and 'C' to the same slot), it will cause collisions, and the performance will degrade significantly.
1.3. Common Hash Function Methods
There are several methods to design a hash function:
1.3.1. Division Method
This is the most common and simplest method. You take the key (or a numeric representation of it) and use the modulo operator with the table size.
h(key) = key % tableSize- Pro: Very fast and simple.
- Con: Can perform poorly if
tableSizehas common factors with the keys. It's a good practice to maketableSizea prime number to help ensure better distribution.
1.3.2. Multiplication Method
This method is less sensitive to the choice of table size.
h(key) = floor( tableSize * ( (key * A) % 1 ) )- Where
Ais a constant between 0 and 1.
1.3.3. Mid-Square Method
The key is squared, and then the "middle" digits of the result are extracted as the hash index. This is good for spreading out patterns in the keys.
1.3.4. Folding Method:
The key is divided into several parts, and all parts are added or XORed together to create the hash value.