A technique which used to identify a particular object uniquely from a set of similar objects is known as Hashing. Some real-life examples of hashing implementations include:
In both the above examples, the books and students are hashed to a unique number.
Let’s assume that we have an object, and all we need is to assign a key to the object so that the searching part becomes easier. Usually, we use a basic array-like data structure for storing the key/value pair, where keys are represented by integers and can be used directly as an index for storing the values. But what if the keys are pretty larger and we can’t use them directly as an index. In such cases, the hashing comes into action.
Hashing allows the users to convert the large keys into small keys with its hash functions. A data structure, also known as the Hash Table, is used for the values assigned to the keys. The concept of Hashing is based on distributing the entries (key/value pairs) consistently across an array. A key (converted key) is assigned to each data element. We can easily access the data element with the help of that key in O(1) time. The hash function processes an index using the key and suggests where an item can be inserted or found.
Hashing Implementation can be done in two steps:
The hash is the array size-independent in the above method and can be reduced to an index (a number between 0 and arraysize – 1) using the modulo operator (%).
A data structure used for storing values with a pair of keys and values is known as the Hash Table. A hash function helps generate a unique key that is then assigned to each value.
We can access the value associated with the key using a key name. This process makes the values searching in a hash table pretty faster, irrespective of the total items available in the hast table.
A function used for mapping an arbitrary size data set to a fixed size data set, which falls into the hash table, is known as the Hash Function. The hash function returns some values called hash codes, hash sums, hash values or hashes.
A good hash function plays a significant role in processing a sound hashing mechanism. Some of the basic characteristics of a good hash function are as follows:
Note: Collisions are bound to happen regardless of how good a hash function is. Thus, to manage the hash table performance, it becomes necessary to handle collisions using numerous techniques available for collision resolution.
Apart from the characteristics stated above, the hash functions are often one-way functions. This statement implies that we can get a hash from a string but cannot get the original string from a hash due to the voluntary data loss implemented in the function. This is not a feature that required for every hash function; however, it becomes significant while securing cryptographically.
Some of the popular hash algorithms include NTLM, SHA-1, SHA-2, MD5 and many more.
Python provides a built-in function called the hash() function that produces an object’s hash. The hash() function generates the hash as an integer once it takes an object as input.
The hash() function helps invoke the input object’s __hash__() method internally. Thus, we can implement the __hash__() method for creating a custom hashable class. It returns an integer relied on the object’s internal state.
Let’s see some basic interpretation of the hash() function in Python.
From the above example, some might wonder why these hashes seem to have a different length? So, the Python hash() function provides integers objects. On a standard 64-bit Python 3 interpreter, these objects are often represented with 24 bytes.
Moreover, we can also observe that an integer’s hash value is the same as its value by default. We can also note that this function works irrespectively of the value type we are hashing, which implies that the integer ‘2’ and the float ‘2.0’ have the same hash: 2.
As we have discussed earlier, the hash functions are often one-way functions. In cases where two distinct objects have an identical hash, it is impracticable to reverse the process, starting from a hash and returning to the original object. In such cases, the detail regarding the type of the original hashed object is gone lost.
In addition, we can also observe that the decimal numbers have some different value from their value in hashes while hashing numbers. Even negative numbers have their negative hashes. But what if we try hashing the same number we got after hashing the decimal value? It is quite interesting that we will get the same hash. Let’s see the following example to understand the same:
In the above example, we can see that the integer number 461168601842738816 has the same hash as the number 0.2. It is normal, as we have discussed so far, about the hash functions. Since we cannot have a fixed-length value representing infinite values when we can hash any string or number to get a fixed-length value, this statement implies that there have to be duplicated values. And that’s what we have already discussed; they are known as collisions. The hash values are said to collide when two objects share identical hashes.
Hashing a string is quite similar to hashing a numeric value. Let’s see an example given below to understand the same.
From the above example, we can observe that a string can be hashed, producing a numeric value.
However, it is not necessary that the Python Interpreter would return the same output while processing the commands from the above example. This problem arises because, in Python 3.3 and above, the bytes objects and string values are salted with a random value before the hash processing.
This statement means that the string values are modified with some random values that change every time we start an interpreter before hashing.
However, we can set the PYTHONHASHSEED environment variable to an integer value for overriding this behavior. Note that this integer value should be greater than zero before we start the interpreter.
Let’s discuss if any of the Python types are hashable. Well, we would rather say no as only immutable types are hashable in Python by default. In such cases, the user uses an immutable container (such as a tuple), and the content should also be immutable for being hashable.
Note that Python Interpreter will return a TypeError when we try to get the hash from an unhashable type. Let’s see the following example:
However, the custom-defined objects in Python are hashable, and the hashes of these objects are derived from their IDs. This statement means that two distinct instances of the same class have different hashes by default. This can be observed in the following example:
Output:
In the above example, we have created two distinct instances in one custom object. Both instances have different hash values. However, we can implement the __hash__() method for modifying the behavior inside the custom class.
As we have discussed earlier, the hash table is one of the data structure types which generates the data element index value or the address using the hash function. This data structure provides faster access to the data because the index value acts as a key for the data value. In simpler terms, we can also say that the Hash table helps store the key-value pairs where the keys are generated using a hashing function.
So, the functions of a data element or item, including the search and insertion functions, becomes pretty much faster because the key values turn themselves into the array index, which helps store the data.
Let’s discuss how the data structure, such as Dictionary is used for implementing the hash tables. The Keys in the Dictionary meet the requirements stated below:
Thus, the data structure, such as a dictionary, can help to implement the hash table in the following ways:
We can access the elements in the dictionary using the standard square brackets along with the key for obtaining the value of elements. Here is an example as follows that illustrates the same:
We can add a new entry or a key-value pair for updating a dictionary. In addition to this, we can also modify an existing entry and even delete an existing entry to update a dictionary. Here is an example as follows that illustrates the same:
We can either delete an individual data element of the dictionary or clear the dictionary’s entire content. In addition to this, we can also remove the entire dictionary using a single operation. This single operation can remove an entire dictionary explicitly using the del command. Here are some examples as follows that illustrates the same:
The above snippet of code should produce an Output, as shown below:
Designed by Elegant Themes | Powered by WordPress