Polynomial Rolling Hash

Hash Function

Hash functions are used to map large data sets of elements of an arbitrary length (the keys) to smaller data sets of elements of a fixed length (the fingerprints).

The basic application of hashing is efficient testing of equality of keys by comparing their fingerprints.

A collision happens when two different keys have the same fingerprint. The way in which collisions are handled is crucial in most applications of hashing. Hashing is particularly useful in the construction of efficient practical algorithms.
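As a minimal sketch of the two ideas above, the following uses a deliberately weak fingerprint (the sum of character codes modulo 10) so that a collision is easy to produce. The names `fingerprint` and `keysEqual` are hypothetical and exist only for this illustration. A fingerprint mismatch proves that two keys differ, while a match still has to be confirmed by comparing the keys themselves.

```javascript
// Hypothetical, deliberately weak fingerprint: sum of character codes modulo a small number.
function fingerprint(key, modulus = 10) {
  let sum = 0;
  for (let i = 0; i < key.length; i += 1) {
    sum = (sum + key.charCodeAt(i)) % modulus;
  }
  return sum;
}

// Equal keys always have equal fingerprints, so a mismatch proves inequality.
// A match is only a hint and is confirmed by comparing the keys directly.
function keysEqual(a, b) {
  if (fingerprint(a) !== fingerprint(b)) return false;
  return a === b;
}

console.log(fingerprint('ab') === fingerprint('ba')); // true: a collision
console.log(keysEqual('ab', 'ba')); // false: the direct comparison resolves it
```

The collision between `'ab'` and `'ba'` happens because a plain sum ignores the order of the symbols.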

Rolling Hash

A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.

A few hash functions allow a rolling hash to be computed very quickly — the new hash value is rapidly calculated given only the following data:

  • old hash value,
  • the old value removed from the window,
  • and the new value added to the window.
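The polynomial hash described later in this document is one function with this property. The sketch below assumes a base of 31 and a modulus of 1000000007 (both chosen only for illustration) and uses hypothetical helper names. It slides a window over a string and checks that the constant-time update agrees with hashing each window from scratch.

```javascript
// Assumed parameters, chosen only for illustration.
const BASE = 31;
const MODULUS = 1000000007;

// Hash of one window, computed from scratch.
function hashWindow(text, start, length) {
  let hash = 0;
  for (let i = start; i < start + length; i += 1) {
    hash = (hash * BASE + text.charCodeAt(i)) % MODULUS;
  }
  return hash;
}

// BASE^(length - 1) mod MODULUS: the weight of the oldest character in the window.
function highestPower(length) {
  let power = 1;
  for (let i = 1; i < length; i += 1) {
    power = (power * BASE) % MODULUS;
  }
  return power;
}

// New hash from the old hash, the removed value and the added value.
function roll(oldHash, removedCode, addedCode, topPower) {
  const withoutOld = (oldHash - ((removedCode * topPower) % MODULUS) + MODULUS) % MODULUS;
  return (withoutOld * BASE + addedCode) % MODULUS;
}

const text = 'abracadabra';
const windowLength = 4;
const topPower = highestPower(windowLength);

let hash = hashWindow(text, 0, windowLength);
for (let start = 1; start + windowLength <= text.length; start += 1) {
  const removed = text.charCodeAt(start - 1);
  const added = text.charCodeAt(start + windowLength - 1);
  hash = roll(hash, removed, added, topPower);
  // Each line prints the window and whether the rolled hash matches a full recomputation.
  console.log(text.slice(start, start + windowLength), hash === hashWindow(text, start, windowLength));
}
```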

Polynomial String Hashing

An ideal hash function for strings should depend both on the multiset of the symbols present in the key and on the order of the symbols. The most common family of such hash functions treats the symbols of a string as coefficients of a polynomial with an integer variable p and computes its value modulo an integer constant M.

The Rabin–Karp string search algorithm is often explained using a very simple rolling hash function that only uses multiplications and additions, the polynomial rolling hash:

H(s_1, s_2, ..., s_k) = s_1 * p^(k-1) + s_2 * p^(k-2) + ... + s_k * p^0

where p is a constant, and (s_1, ..., s_k) are the input characters.

For example, we can convert short strings to key numbers by multiplying letter codes by powers of a constant. With a = 1, b = 2, ..., z = 26 and the constant 26, the three-letter word ace could turn into a number by calculating:

key = 1 * 26^2 + 3 * 26^1 + 5 * 26^0
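The arithmetic for ace can be checked with a short sketch. The mapping a = 1, ..., z = 26 is inferred from the coefficients 1, 3 and 5 in the example, and `wordToKey` is a hypothetical name used only here. It assumes lowercase ASCII letters as input.

```javascript
// Convert a lowercase word to a key number using letter codes a = 1, ..., z = 26.
function wordToKey(word) {
  const base = 26;
  const offset = 'a'.charCodeAt(0) - 1;

  let key = 0;
  for (let i = 0; i < word.length; i += 1) {
    const letterCode = word.charCodeAt(i) - offset;
    // The first letter gets the highest power, the last letter gets base^0.
    key += letterCode * base ** (word.length - i - 1);
  }
  return key;
}

console.log(wordToKey('ace')); // 759, that is 676 + 78 + 5
```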

In order to avoid manipulating huge H values, all math is done modulo M.

H(s_1, s_2, ..., s_k) = (s_1 * p^(k-1) + s_2 * p^(k-2) + ... + s_k * p^0) mod M

A careful choice of the parameters M, p is important to obtain “good” properties of the hash function, i.e., low collision rate.
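One way to see why the parameters matter is a deliberately bad pair. In the sketch below (hypothetical function name, illustrative parameter values), p = 26 is a multiple of M = 13, so every term except the last vanishes modulo M and only the final character influences the hash. The polynomial is evaluated incrementally, which is the Horner form covered later in this document.

```javascript
// Polynomial hash with explicit parameters, reduced modulo `modulus` at every step.
function polynomialHash(key, base, modulus) {
  let hash = 0;
  for (let i = 0; i < key.length; i += 1) {
    hash = (hash * base + key.charCodeAt(i)) % modulus;
  }
  return hash;
}

// Poor choice: base is a multiple of modulus, so only the last character counts.
console.log(polynomialHash('cat', 26, 13) === polynomialHash('hot', 26, 13)); // true

// Illustrative better choice: a large prime modulus that does not divide the base.
console.log(polynomialHash('cat', 31, 1000000007) === polynomialHash('hot', 31, 1000000007)); // false
```

This shows only one failure mode. It is not a recipe for choosing p and M in general.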

This approach has the desirable attribute of involving all the characters in the input string. The calculated key value can then be hashed into an array index in the usual way:

function hash(key, arraySize) {
  const base = 13;

  let hash = 0;
  for (let charIndex = 0; charIndex < key.length; charIndex += 1) {
    const charCode = key.charCodeAt(charIndex);
    hash += charCode * (base ** (key.length - charIndex - 1));
  }

  return hash % arraySize;
}

The hash() function is not as efficient as it might be. Other than the character conversion, each loop iteration performs an exponentiation, a multiplication and an addition. We can eliminate the exponentiation by using Horner's method:

a_4 * x^4 + a_3 * x^3 + a_2 * x^2 + a_1 * x^1 + a_0 = (((a_4 * x + a_3) * x + a_2) * x + a_1) * x + a_0

In other words:

H_0 = 0, H_i = (p * H_(i-1) + s_i) mod M for i = 1, ..., k

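The first version of hash() cannot handle long strings: the intermediate value of hash grows past Number.MAX_SAFE_INTEGER (2^53 - 1), beyond which JavaScript numbers no longer represent every integer exactly, so the final remainder becomes unreliable. Notice that the returned value always ends up being less than the array size. With Horner's method we can apply the modulo (%) operator at each step in the calculation. This gives the same result as applying the modulo operator once at the end, but keeps every intermediate value small.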

function hash(key, arraySize) {
  const base = 13;

  let hash = 0;
  for (let charIndex = 0; charIndex < key.length; charIndex += 1) {
    const charCode = key.charCodeAt(charIndex);
    hash = (hash * base + charCode) % arraySize;
  }

  return hash;
}
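The claim that reducing at every step gives the same result as reducing once at the end can be checked against exact arithmetic. The sketch below is only a verification aid with hypothetical function names. It assumes a runtime with BigInt support, re-implements the Horner loop, and compares it with a BigInt evaluation of the full polynomial for a key that is far too long for ordinary numbers.

```javascript
// Horner's method with a modulo at every step (same logic as the hash() above).
function hornerHash(key, arraySize) {
  const base = 13;
  let hash = 0;
  for (let i = 0; i < key.length; i += 1) {
    hash = (hash * base + key.charCodeAt(i)) % arraySize;
  }
  return hash;
}

// Reference: evaluate the whole polynomial exactly with BigInt, then reduce once.
function exactHash(key, arraySize) {
  const base = 13n;
  let value = 0n;
  for (let i = 0; i < key.length; i += 1) {
    value = value * base + BigInt(key.charCodeAt(i));
  }
  return Number(value % BigInt(arraySize));
}

const longKey = 'polynomial rolling hash '.repeat(20); // 480 characters
console.log(hornerHash(longKey, 101) === exactHash(longKey, 101)); // true
```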

Polynomial hashing has a rolling property: the fingerprints can be updated efficiently when symbols are added or removed at the ends of the string (provided that an array of powers of p modulo M of sufficient length is stored). The popular Rabin–Karp pattern matching algorithm is based on this property.
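To illustrate how pattern matching can build on the rolling property, here is a minimal Rabin–Karp style sketch. It is not the implementation from this repository. The function name, the base 31 and the modulus 1000000007 are assumptions made for the example. Because the window length is fixed, a single stored power of the base is enough here.

```javascript
// Return the index of the first occurrence of `pattern` in `text`, or -1.
function rabinKarp(text, pattern) {
  const base = 31; // assumed base
  const modulus = 1000000007; // assumed prime modulus
  const m = pattern.length;
  if (m === 0) return 0;
  if (m > text.length) return -1;

  // base^(m - 1) mod modulus: the weight of the character leaving the window.
  let topPower = 1;
  for (let i = 1; i < m; i += 1) topPower = (topPower * base) % modulus;

  // Hash the pattern and the first window with Horner's method.
  let patternHash = 0;
  let windowHash = 0;
  for (let i = 0; i < m; i += 1) {
    patternHash = (patternHash * base + pattern.charCodeAt(i)) % modulus;
    windowHash = (windowHash * base + text.charCodeAt(i)) % modulus;
  }

  for (let start = 0; start + m <= text.length; start += 1) {
    // Equal fingerprints may be a collision, so confirm with a direct comparison.
    if (windowHash === patternHash && text.startsWith(pattern, start)) {
      return start;
    }
    if (start + m < text.length) {
      // Roll the window: drop text[start], append text[start + m].
      const removed = (text.charCodeAt(start) * topPower) % modulus;
      windowHash = (windowHash - removed + modulus) % modulus;
      windowHash = (windowHash * base + text.charCodeAt(start + m)) % modulus;
    }
  }
  return -1;
}

console.log(rabinKarp('abracadabra', 'cad')); // 4
console.log(rabinKarp('abracadabra', 'xyz')); // -1
```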
