Introduction to hash functions
Hash function h()
is an arbitrary function which mapped data x ∈ X
of arbitrary size to value y ∈ Y
of fixed size: y = h(x)
. Good hash functions have follows restrictions:
 hash functions behave likes uniform distribution
 hash functions is deterministic.
h(x)
should always return the same value for a givenx
 fast calculating (has runtime O(1))
In general case size of hash function less then size of input data: y < x
. Hash functions are not reversible or in other words it may be collision: ∃ x1, x2 ∈ X, x1 ≠ x2: h(x1) = h(x2)
. X
may be finite or infinite set and Y
is finite set.
Hash functions are used in a lot of parts of computer science, for example in software engineering, cryptography, databases, networks, machine learning and so on. There are many different types of hash functions, with differing domain specific properties.
Often hash is an integer value. There are special methods in programmning languages for hash calculating. For example, in C#
GetHashCode()
method for all types returns Int32
value (32 bit integer number). In Java
every class provides hashCode()
method which return int
. Each data type has own or user defined implementations.
Hash methods
There are several approaches for determinig hash function. Without loss of generality, lets x ∈ X = {z ∈ ℤ: z ≥ 0}
are positive integer numbers. Often m
is prime (not too close to an exact power of 2).
Method  Hash function 
——  —— 
Division method  h(x) = x mod m
 is often a good choice for m 
Multiplication method  h(x) = ⌊m (xA mod 1)⌋, A ∈ {z ∈ ℝ: 0 < z < 1}

Hash table
Hash functions used in hash tables for computing index into an array of slots. Hash table is data structure for implementing dictionaries (keyvalue structure). Good implemented hash tables have O(1) time for the next operations: insert, search and delete data by key. More than one keys may hash to the same slot. There are two ways for resolving collision:
 Chaining: linked list is used for storing elements with the same hash value in slot
 Open addressing: zero or one element is stored in each slot
The next methods are used to compute the probe sequences required for open addressing
Method  Formula 
——  —— 
Linear probing  h(x, i) = (h'(x) + i) mod m

Quadratic probing  h(x, i) = (h'(x) + c1*i + c2*i^2) mod m

Double hashing  h(x, i) = (h1(x) + i*h2(x)) mod m

Where i ∈ {0, 1, ..., m1}
, h'(x), h1(x), h2(x)
are auxiliary hash functions, c1, c2
are positive auxiliary constants.
Examples
Lets x ∈ U{1, 1000}, h = x mod m
. The next table shows the hash values in case of not prime and prime. Bolded text indicates the same hash values.
x  m = 100 (not prime)  m = 101 (prime)  ––  ––  —  723  23  16  103  3  2  738  38  31  292  92  90  61  61  61  87  87  87  995  95  86  549  49  44  991  91  82  757  57  50  920  20  11  626  26  20  557  57  52  831  31  23  619  19  13 
Links
 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms.
 Overview of Hash Tables
 Wolfram MathWorld  Hash Function