microhash Specification

Overview

microhash is a lightweight, non-cryptographic 64-bit hash function designed to run on a wide range of platforms — from modern desktops, servers, and mobile devices down to resource-constrained embedded systems and historically constrained 8/16-bit architectures. It prioritises simplicity, minimal memory footprint, and portability over cryptographic strength.

The algorithm is intentionally straightforward so that it can be implemented in any language and compiled for any target without requiring hardware acceleration, operating-system support, or dynamic memory allocation.

Algorithm

1. Constants

Two 32-bit state words are initialised from the fractional part of π:

state[0] = 0x243F6A88
state[1] = 0x85A308D3

Using well-known irrational-number constants (“nothing-up-my-sleeve” numbers) reduces the risk of accidentally choosing pathological initial values.

2. Padding

Before processing, the input is logically extended to a multiple of 32 bytes using the following scheme:

Position relative to original input	Value
Immediately after last data byte	`0x80`
Intermediate bytes	`0x00`
Last 4 bytes of padded message	Big-endian 32-bit encoding of the original byte length

The padded length is:

paddedLength = ceil((inputLength + 5) / 32) * 32

The +5 overhead reserves one byte for the 0x80 marker and four bytes for the length field, guaranteeing at least one padding block even for empty input.

Note: Although the length is encoded into the padding, only the first 16 bytes of every 32-byte block enter the mixing step (see §3). The length field therefore resides in the unreachable second half of the final block and does not currently influence the output. This is a known limitation of the present design.

3. Block Processing

Each 32-byte block is consumed in four 4-byte little-endian words (word[0]…word[3], covering bytes 0–15 of the block). For each word the two state registers are updated:

state[0] = RotateLeft32(state[0] XOR word, 5)  + state[1]
state[1] = RotateLeft32(state[1]  + word, 11) XOR state[0]

Where RotateLeft32(v, n) = (v << n) | (v >> (32 - n)).

The alternating XOR/ADD operations with two different rotation amounts spread bit influence across both state words and provide avalanche behaviour without requiring large lookup tables or multiple rounds.

4. Finalisation

After all blocks have been processed, the 64-bit digest is constructed as follows:

final  = state[0] XOR RotateLeft32(state[1], 3)
digest = (final << 32) | state[1]

The upper 32 bits of the digest are a mix of both state words; the lower 32 bits are state[1] unchanged. The result is returned as a single unsigned 64-bit integer.

Properties

Property	Value
Output size	64 bits
Internal state	2 × 32-bit words (64 bits)
Block size	32 bytes (16 bytes actively mixed)
Minimum memory (core)	32-byte block buffer + 8 bytes state
Cryptographic	No
Streaming	No (whole-message)
Endianness dependency	Little-endian word assembly (all implementations)

Revision note — microhash-ng. The mixing step below reads only the first 16 bytes of each 32-byte block; bytes 16–31 (including the encoded length field) never influence the digest. A revised variant, microhash-ng (src/microhash-ng/), consumes all eight 4-byte words per block (eliminating the dead zone and making the length field live), absorbs each word with two ARX rounds — the second reinjects the word rotated left by 16 — and applies four finalization rounds with π-derived constants (13198A2E 03707344 A4093822 299F31D0) before the output combiner. State constants, padding, the per-round operations, and the 64-bit output format are unchanged, but digests are not compatible with the original. microhash-ng passes the rurban/SMHasher quality battery (Avalanche, Sparse, Cyclic, TwoBytes, Zeroes, Seed, DiffDist, BIC) with zero failures; the original fails 21 sub-tests. New integrations — especially anything doing change detection — should use microhash-ng.

Implementations

C++ (`src/cpp/microhash.hpp`)

Target platforms: Any platform with a C++ compiler that supports fixed-width integer types and basic bitwise arithmetic — modern desktop/server (x86-64, ARM64), embedded microcontrollers (ARM Cortex-M, AVR, RISC-V), and, via porting (see §Porting the C++ Implementation), classic and retro architectures such as the Z80, M68K, and 6502.

Key characteristics:

Header-only single-file library; no compilation step required beyond including the header.
The core ComputeHash(const uint8_t* data, size_t length) overload uses only standard fixed-width integer types and basic bitwise operations — no heap allocation, no OS calls, no standard library beyond the integer-type headers.
The 32-byte block buffer is stack-allocated, making the function safe to call from interrupt handlers or bare-metal environments.
Words are assembled manually from individual bytes using explicit bit-shifts (block[base] | block[base+1]<<8 | …), guaranteeing identical behaviour on both little-endian and big-endian hosts.
A convenience overload accepts std::vector<uint8_t> for use in hosted environments; the core pointer/length overload can be used without the STL entirely.
main.cpp provides a CLI tool that can hash strings supplied as command-line arguments, via stdin, or run a fixed set of test vectors with --test.

C# (`src/csharp/microhash.cs`)

Target platforms: Any C# runtime — modern .NET, legacy .NET Framework, and Mono. This covers Windows, Linux, macOS, iOS, and Android. The core hash logic uses only fundamental C# language features (integer arithmetic, arrays, and bitwise operators) that have been stable since the earliest versions of the language and runtime.

Key characteristics:

Implemented as a static class (hashPipe) in a standard console project.
Word construction uses BitConverter.ToUInt32, which is correct on the little-endian architectures targeted by every common .NET runtime; on a big-endian host the result would differ from the C++ implementation.
Input is taken as a byte[]; callers convert strings via Encoding.UTF8.GetBytes, making the encoding explicit.
The hash logic itself (microhash.cs) has no dependencies beyond System and can be dropped into any C# project, including ones targeting .NET Framework 2.0 or later, without modification.
The benchmarking harness (Benchmarks.cs) depends on the BenchmarkDotNet package and a modern SDK build system. If cross-platform build tooling is unavailable (for example, when targeting .NET Framework on Windows without the .NET SDK), the benchmarks can be removed and the core logic compiled independently.
Program.cs exposes a CLI interface matching the C++ tool. When a single argument is supplied, the string is hashed and the result is printed. With no arguments, the program reads a line from standard input, hashes it, and prints the result.

Ruby (`src/ruby/microhash.rb`)

Target platforms: Any Ruby interpreter — CRuby/MRI 2.5+, JRuby, and TruffleRuby. The implementation uses only core language features (integer arithmetic, arrays, and bitwise operators), so it works unmodified inside Rails applications, plain scripts, and embedded mruby-style environments that provide Integer and Array.

Key characteristics:

Single-file, pure-Ruby module (MicroHash) with no gem dependencies; an optional native C extension (src/ruby/ext/microhash/) is auto-detected at require time and transparently replaces the hashing core with the C implementation (~5–8× faster than OpenSSL’s SHA-256 on the same input). The pure-Ruby path remains the fallback and produces identical output.
Ruby integers are arbitrary-precision, so 32-bit wrapping semantics are enforced explicitly by masking with 0xFFFFFFFF after every addition, shift, and rotate — matching the fixed-width overflow behaviour of the C++ and C# implementations exactly.
Words are assembled manually from individual bytes using explicit bit-shifts, so output is identical on little-endian and big-endian hosts (like C++, unlike C#).
MicroHash.compute_hash accepts a String (hashed as raw bytes, independent of its declared encoding) or an Array of byte values, and returns the 64-bit digest as an Integer. MicroHash.hexdigest returns the digest as a 16-character uppercase hex string.
main.rb provides a CLI tool matching the C++ and C# tools, including the --test vector runner.
An RSpec suite (tests/ruby/microhash_spec.rb) verifies the golden test vectors and Ruby-specific behaviour (byte-array input, encoding-agnosticism, block-boundary lengths).

Porting the C++ Implementation

Because the core algorithm uses only 32-bit integer arithmetic, bitwise XOR, addition, and bit-rotation, it can be translated to virtually any language or assembly dialect. The sections below describe what is required to port microhash to architectures that pre-date modern C++ compilers.

General requirements for any port

Two 32-bit unsigned accumulators (state[0] and state[1]). On 8-bit or 16-bit CPUs these must be synthesised from multiple narrower registers or memory locations.
A 32-byte scratch buffer for the current block. On very constrained systems (e.g. 6502 with 256 bytes of zero-page) this buffer can be placed in RAM outside zero-page; only 16 bytes are read per block so it is also possible to process four bytes at a time without a full block buffer.
32-bit rotate-left by 5, 11, and 3 bits. On CPUs without a barrel shifter this is implemented as a sequence of single-bit shifts and rotates.
32-bit addition — straightforward with carry-propagation on any architecture.
Padding logic — a simple byte counter that appends 0x80, then zeros, then a big-endian 32-bit length in the last four bytes of the final block.

Z80 / CP/M

The Z80 is an 8-bit processor. All 32-bit quantities must be maintained in pairs of 16-bit register pairs (or in memory). The standard approach:

Represent each state word as a four-byte region in memory (e.g. in the BSS segment of a CP/M .COM file).
The rotate-left by 5 is equivalent to rotate-left by 8 (a byte swap, which is free) followed by rotate-left by 5 minus 8 = rotate-right by 3, both achievable with RLA/RRA instructions and carry manipulation.
Alternatively: rotate-left by 5 = shift-left-by-4 (four ADD HL,HL / ADC sequences) then rotate-left-by-1; rotate-left by 11 = rotate-right by 21.
32-bit addition is done in two 16-bit ADD HL,DE + ADC HL,BC steps.
Under CP/M, the input string can be read from the default file control block (FCB) or the command tail at 0x0080, and the hash can be printed with BDOS call 2 or 9.
A complete .COM implementation requires approximately 100–200 Z80 instructions and fits comfortably in the 64 KB TPA.

Motorola 68000 (M68K)

The M680X0 is a 16/32-bit processor with eight data registers (D0–D7), making it a natural fit:

Each state word occupies a single data register; the rotate-left instructions (ROL.L #5,Dn and ROL.L #11,Dn) are single native instructions.
32-bit XOR and ADD are single instructions (EOR.L, ADD.L).
The block buffer can live on the stack (SUBQ.L #32,SP).
On Amiga, Atari ST, or a bare 68k SBC the function can be a position-independent relocatable module with no OS dependencies.
A straightforward assembly port is around 30–50 instructions for the inner loop.

MOS 6502

The 6502 is an 8-bit processor with no native 32-bit arithmetic:

State words are stored as four consecutive bytes in zero-page (8 addresses each) for fast access.
32-bit XOR is four EOR zpg instructions.
32-bit addition uses CLC + four ADC instructions with carry.
Rotate-left by 5: easiest as rotate-right by 3 (three ROR instructions cycling through the four bytes and carry). Alternatively shift-left-by-5 using a loop.
Rotate-left by 11: rotate-right by 21, or equivalently rotate-left by 8 (a byte rotation — three LDA/STA moves) then rotate-left by 3.
On Apple II or Commodore 64 the 256-byte zero-page makes it practical to keep both state words and the 32-byte block buffer in zero-page; on a bare 6502 any RAM location works.
Performance will be slow (each 32-bit rotate is 20–40 cycles), but correctness is achievable.

Other languages / platforms

The algorithm translates directly to any language that provides unsigned 32-bit integers and bitwise operators — C, Pascal, BASIC (with UINT32 or library support), Forth, assembly, Lua, Python, Rust, Go, and so on. The minimum requirements are:

An unsigned (or wrapping) 32-bit add.
Bitwise XOR.
Left and right bit-shift by a constant amount.
A byte array or equivalent for the 32-byte block buffer.

Reduced-Output and Efficiency Considerations for Constrained Platforms

The full microhash output is 64 bits, but many embedded and retro applications do not need that much output. Equally, the cost of 32-bit arithmetic on 8-bit CPUs means it is worth considering how to minimise work while still producing a useful digest. Nothing in the algorithm forbids using a narrower slice of the output, and a few structural choices can reduce the per-byte cycle cost considerably.

Truncated output widths

The finalisation step produces a 64-bit value whose halves have different mixing properties:

Variant	How to obtain it	Notes
64-bit (full)	`(final << 32) \\| state[1]`	Both halves mixed; maximum collision resistance
32-bit	Use `state[1]` alone, or `final` alone	`state[1]` is the most-mixed accumulator; `final` adds `state[0]` contribution. Either is a valid 32-bit digest
16-bit	Upper or lower 16 bits of `state[1]`	Sufficient for small hash tables on 8-bit systems (e.g. 256- or 512-entry tables)
8-bit	Low byte of `state[1]`	Useful only for tiny lookup structures; collision probability is high for inputs beyond a few dozen

Truncation is free — it requires no extra computation and produces a deterministic sub-digest fully derived from the same mixing steps. The internal state remains 2 × 32-bit throughout; narrowing the output only affects what is read from the state at the end.

Reducing memory pressure: eliminating the block buffer

The reference implementation copies each input block into a 32-byte stack buffer before processing. On severely RAM-constrained systems (e.g. a 6502 with only zero-page available, or a microcontroller with a few hundred bytes of RAM total) this buffer can be eliminated by processing the input stream four bytes at a time directly:

Keep a 4-byte (one word) accumulation register and a byte counter.
As each byte arrives, store it in the accumulation register at the correct byte position.
When four bytes are collected (one word complete), apply the mixing step immediately and reset the accumulation register.
Track the total byte count for padding.
At the end of input, pad out the current partial word with the 0x80 marker and zero bytes, apply the mix, then flush any remaining words up to the 16-byte (four-word) boundary before finalising.

This reduces the working RAM requirement to approximately 4 bytes (accumulation word) + 8 bytes (state) + 4 bytes (byte counter) = 16 bytes, at the cost of slightly more complex control flow.

Cycle cost on 8-bit platforms

On 8-bit architectures such as the Z80 or 6502, each 32-bit rotate-left is the most expensive primitive in the inner loop. Approximate costs:

Operation	Z80 (cycles)	6502 (cycles)
32-bit XOR (4 bytes)	~16	~16
32-bit ADD with carry	~24	~16
32-bit rotate-left by 5	~60–80	~60–80
32-bit rotate-left by 11	~60–80	~60–80
Full inner-loop iteration (one word)	~200–240	~200–240
Per 16-byte block (4 words)	~800–960	~800–960

These are rough estimates; hand-optimised assembly using byte-rotation tricks (e.g. rotate-left by 8 is a free byte-lane rotate; rotate-left by 16 is two byte swaps) can cut these figures by 30–50 %.

To reduce cycle cost further while keeping the same algorithm structure:

Process only the first word per block (4 bytes mixed per 32-byte block boundary). This is a significant loss in mixing quality but reduces inner-loop work by 4×.
Use a smaller block size — if the application controls the input framing, aligning on 4-byte boundaries and skipping the 32-byte padding entirely reduces overhead to the bare minimum of four mix iterations per 16 input bytes.
Prefer the 32-bit output variant — it saves the finalisation XOR-and-shift step, which on a 6502 involves a 32-bit rotate and 32-bit OR (another ~40–60 cycles).

Summary of trade-offs

Constraint	Recommended adaptation
RAM < 32 bytes	Stream one word at a time; eliminate block buffer
Output storage limited	Truncate to 32 or 16 bits post-finalisation
Cycle budget tight (8-bit CPU)	Byte-rotation tricks for ROL-by-8/16; use 32-bit output (skip upper-half finalisation)
Hash table ≤ 256 entries	8-bit truncation is sufficient; only low byte of `state[1]` is needed
Hash table ≤ 65536 entries	16-bit truncation; upper 16 bits of `state[1]` tend to have better distribution

None of these adaptations require changing the core mixing logic — they are all post-processing choices or buffer-management strategies that any port can adopt independently.

Cross-Implementation Compatibility

All implementations produce identical output for the same byte sequence on little-endian hosts. The only observable difference is the word-loading path:

Aspect	C++	C#	Ruby
Word loading	Manual byte shifts	`BitConverter.ToUInt32`	Manual byte shifts
Length field type	`size_t` (≥32-bit)	`int` (32-bit signed)	`Integer` (arbitrary precision)
Input interface	`const uint8_t*` + length, or `std::vector<uint8_t>`	`byte[]`	`String` (raw bytes) or `Array` of bytes
Endianness safe	Yes (explicit shifts)	Only on little-endian hosts	Yes (explicit shifts)

Design Goals and Trade-offs

Goal	Approach
Minimal footprint	Fixed 32-byte stack buffer; 8 bytes of state
Portability	Standard integer types; no SIMD, no hardware instructions
Simplicity	~30 lines of logic; no lookup tables
Speed	Single-pass; rotate-XOR-ADD mixing without multi-round expansion
Not a goal	Cryptographic security — no resistance to preimage, collision, or length-extension attacks is claimed

microhash is suitable for hash tables, checksums, data fingerprinting, and similar non-security-sensitive applications where a fast, deterministic, platform-independent digest is required.

microhash Specification

Overview

Algorithm

1. Constants

2. Padding

3. Block Processing

4. Finalisation

Properties

Implementations

C++ (src/cpp/microhash.hpp)

C# (src/csharp/microhash.cs)

Ruby (src/ruby/microhash.rb)

Porting the C++ Implementation

General requirements for any port

Z80 / CP/M

Motorola 68000 (M68K)

MOS 6502

Other languages / platforms

Reduced-Output and Efficiency Considerations for Constrained Platforms

Truncated output widths

Reducing memory pressure: eliminating the block buffer

Cycle cost on 8-bit platforms

Summary of trade-offs

Cross-Implementation Compatibility

Design Goals and Trade-offs

C++ (`src/cpp/microhash.hpp`)

C# (`src/csharp/microhash.cs`)

Ruby (`src/ruby/microhash.rb`)