Rabin Karp
The Rabin–Karp algorithm or Karp–Rabin algorithm is a string searching algorithm that uses hashing to find any one of a set of pattern strings in a text.Its average and best case running time is O(n+m) in space O(p), but its worst-case time is O(nm) where n is the length of the text and m is the length of the pattern.
Algorithm implementation in java for string matching
void RabinfindPattern(String text,String pattern){ /* q a prime number p hash value for pattern t hash value for text d is the number of unique characters in input alphabet */ int d=128; int q=100; int n=text.length(); int m=pattern.length(); int t=0,p=0; int h=1; int i,j; //hash value calculating function for (i=0;i<m-1;i++) h = (h*d)%q; for (i=0;i<m;i++){ p = (d*p + pattern.charAt(i))%q; t = (d*t + text.charAt(i))%q; } //search for the pattern for(i=0;i<end-m;i++){ if(p==t){ //if the hash value matches match them character by character for(j=0;j<m;j++) if(text.charAt(j+i)!=pattern.charAt(j)) break; if(j==m && i>=start) System.out.println("Pattern match found at index "+i); } if(i<end-m){ t =(d*(t - text.charAt(i)*h) + text.charAt(i+m))%q; if(t<0) t=t+q; } } }
While calculating hash value we are dividing it by a prime number in order to avoid collision.After dividing by prime number the chances of collision will be less, but still ther is a chance that the hash value can be same for two strings,so when we get a match we have to check it character by character to make sure that we got a proper match.
t =(d*(t - text.charAt(i)*h) + text.charAt(i+m))%q;
This is to recalculate the hash value for pattern,first by removing the left most character and then adding the new character from the text.