microhash Specification

Overview

microhash is a lightweight, non-cryptographic 64-bit hash function designed to run on a wide range of platforms — from modern desktops, servers, and mobile devices down to resource-constrained embedded systems and historically constrained 8/16-bit architectures. It prioritises simplicity, minimal memory footprint, and portability over cryptographic strength.

The algorithm is intentionally straightforward so that it can be implemented in any language and compiled for any target without requiring hardware acceleration, operating-system support, or dynamic memory allocation.


Algorithm

1. Constants

Two 32-bit state words are initialised from the fractional part of π:

state[0] = 0x243F6A88
state[1] = 0x85A308D3

Using well-known irrational-number constants (“nothing-up-my-sleeve” numbers) reduces the risk of accidentally choosing pathological initial values.

2. Padding

Before processing, the input is logically extended to a multiple of 32 bytes using the following scheme:

Position relative to original input Value
Immediately after last data byte 0x80
Intermediate bytes 0x00
Last 4 bytes of padded message Big-endian 32-bit encoding of the original byte length

The padded length is:

paddedLength = ceil((inputLength + 5) / 32) * 32

The +5 overhead reserves one byte for the 0x80 marker and four bytes for the length field, guaranteeing at least one padding block even for empty input.

Note: Although the length is encoded into the padding, only the first 16 bytes of every 32-byte block enter the mixing step (see §3). The length field therefore resides in the unreachable second half of the final block and does not currently influence the output. This is a known limitation of the present design.

3. Block Processing

Each 32-byte block is consumed in four 4-byte little-endian words (word[0]word[3], covering bytes 0–15 of the block). For each word the two state registers are updated:

state[0] = RotateLeft32(state[0] XOR word, 5)  + state[1]
state[1] = RotateLeft32(state[1]  + word, 11) XOR state[0]

Where RotateLeft32(v, n) = (v << n) | (v >> (32 - n)).

The alternating XOR/ADD operations with two different rotation amounts spread bit influence across both state words and provide avalanche behaviour without requiring large lookup tables or multiple rounds.

4. Finalisation

After all blocks have been processed, the 64-bit digest is constructed as follows:

final  = state[0] XOR RotateLeft32(state[1], 3)
digest = (final << 32) | state[1]

The upper 32 bits of the digest are a mix of both state words; the lower 32 bits are state[1] unchanged. The result is returned as a single unsigned 64-bit integer.


Properties

Property Value
Output size 64 bits
Internal state 2 × 32-bit words (64 bits)
Block size 32 bytes (16 bytes actively mixed)
Minimum memory (core) 32-byte block buffer + 8 bytes state
Cryptographic No
Streaming No (whole-message)
Endianness dependency Little-endian word assembly (both implementations)

Implementations

C++ (src/cpp/microhash.hpp)

Target platforms: Any platform with a C++ compiler that supports fixed-width integer types and basic bitwise arithmetic — modern desktop/server (x86-64, ARM64), embedded microcontrollers (ARM Cortex-M, AVR, RISC-V), and, via porting (see §Porting the C++ Implementation), classic and retro architectures such as the Z80, M68K, and 6502.

Key characteristics:

  • Header-only single-file library; no compilation step required beyond including the header.
  • The core ComputeHash(const uint8_t* data, size_t length) overload uses only standard fixed-width integer types and basic bitwise operations — no heap allocation, no OS calls, no standard library beyond the integer-type headers.
  • The 32-byte block buffer is stack-allocated, making the function safe to call from interrupt handlers or bare-metal environments.
  • Words are assembled manually from individual bytes using explicit bit-shifts (block[base] | block[base+1]<<8 | …), guaranteeing identical behaviour on both little-endian and big-endian hosts.
  • A convenience overload accepts std::vector<uint8_t> for use in hosted environments; the core pointer/length overload can be used without the STL entirely.
  • main.cpp provides a CLI tool that can hash strings supplied as command-line arguments, via stdin, or run a fixed set of test vectors with --test.

C# (src/csharp/microhash.cs)

Target platforms: Any C# runtime — modern .NET, legacy .NET Framework, and Mono. This covers Windows, Linux, macOS, iOS, and Android. The core hash logic uses only fundamental C# language features (integer arithmetic, arrays, and bitwise operators) that have been stable since the earliest versions of the language and runtime.

Key characteristics:

  • Implemented as a static class (hashPipe) in a standard console project.
  • Word construction uses BitConverter.ToUInt32, which is correct on the little-endian architectures targeted by every common .NET runtime; on a big-endian host the result would differ from the C++ implementation.
  • Input is taken as a byte[]; callers convert strings via Encoding.UTF8.GetBytes, making the encoding explicit.
  • The hash logic itself (microhash.cs) has no dependencies beyond System and can be dropped into any C# project, including ones targeting .NET Framework 2.0 or later, without modification.
  • The benchmarking harness (Benchmarks.cs) depends on the BenchmarkDotNet package and a modern SDK build system. If cross-platform build tooling is unavailable (for example, when targeting .NET Framework on Windows without the .NET SDK), the benchmarks can be removed and the core logic compiled independently.
  • Program.cs exposes a CLI interface matching the C++ tool. When a single argument is supplied, the string is hashed and the result is printed. With no arguments, the program reads a line from standard input, hashes it, and prints the result.

Porting the C++ Implementation

Because the core algorithm uses only 32-bit integer arithmetic, bitwise XOR, addition, and bit-rotation, it can be translated to virtually any language or assembly dialect. The sections below describe what is required to port microhash to architectures that pre-date modern C++ compilers.

General requirements for any port

  1. Two 32-bit unsigned accumulators (state[0] and state[1]). On 8-bit or 16-bit CPUs these must be synthesised from multiple narrower registers or memory locations.
  2. A 32-byte scratch buffer for the current block. On very constrained systems (e.g. 6502 with 256 bytes of zero-page) this buffer can be placed in RAM outside zero-page; only 16 bytes are read per block so it is also possible to process four bytes at a time without a full block buffer.
  3. 32-bit rotate-left by 5, 11, and 3 bits. On CPUs without a barrel shifter this is implemented as a sequence of single-bit shifts and rotates.
  4. 32-bit addition — straightforward with carry-propagation on any architecture.
  5. Padding logic — a simple byte counter that appends 0x80, then zeros, then a big-endian 32-bit length in the last four bytes of the final block.

Z80 / CP/M

The Z80 is an 8-bit processor. All 32-bit quantities must be maintained in pairs of 16-bit register pairs (or in memory). The standard approach:

  • Represent each state word as a four-byte region in memory (e.g. in the BSS segment of a CP/M .COM file).
  • The rotate-left by 5 is equivalent to rotate-left by 8 (a byte swap, which is free) followed by rotate-left by 5 minus 8 = rotate-right by 3, both achievable with RLA/RRA instructions and carry manipulation.
  • Alternatively: rotate-left by 5 = shift-left-by-4 (four ADD HL,HL / ADC sequences) then rotate-left-by-1; rotate-left by 11 = rotate-right by 21.
  • 32-bit addition is done in two 16-bit ADD HL,DE + ADC HL,BC steps.
  • Under CP/M, the input string can be read from the default file control block (FCB) or the command tail at 0x0080, and the hash can be printed with BDOS call 2 or 9.
  • A complete .COM implementation requires approximately 100–200 Z80 instructions and fits comfortably in the 64 KB TPA.

Motorola 68000 (M68K)

The M680X0 is a 16/32-bit processor with eight data registers (D0D7), making it a natural fit:

  • Each state word occupies a single data register; the rotate-left instructions (ROL.L #5,Dn and ROL.L #11,Dn) are single native instructions.
  • 32-bit XOR and ADD are single instructions (EOR.L, ADD.L).
  • The block buffer can live on the stack (SUBQ.L #32,SP).
  • On Amiga, Atari ST, or a bare 68k SBC the function can be a position-independent relocatable module with no OS dependencies.
  • A straightforward assembly port is around 30–50 instructions for the inner loop.

MOS 6502

The 6502 is an 8-bit processor with no native 32-bit arithmetic:

  • State words are stored as four consecutive bytes in zero-page (8 addresses each) for fast access.
  • 32-bit XOR is four EOR zpg instructions.
  • 32-bit addition uses CLC + four ADC instructions with carry.
  • Rotate-left by 5: easiest as rotate-right by 3 (three ROR instructions cycling through the four bytes and carry). Alternatively shift-left-by-5 using a loop.
  • Rotate-left by 11: rotate-right by 21, or equivalently rotate-left by 8 (a byte rotation — three LDA/STA moves) then rotate-left by 3.
  • On Apple II or Commodore 64 the 256-byte zero-page makes it practical to keep both state words and the 32-byte block buffer in zero-page; on a bare 6502 any RAM location works.
  • Performance will be slow (each 32-bit rotate is 20–40 cycles), but correctness is achievable.

Other languages / platforms

The algorithm translates directly to any language that provides unsigned 32-bit integers and bitwise operators — C, Pascal, BASIC (with UINT32 or library support), Forth, assembly, Lua, Python, Rust, Go, and so on. The minimum requirements are:

  • An unsigned (or wrapping) 32-bit add.
  • Bitwise XOR.
  • Left and right bit-shift by a constant amount.
  • A byte array or equivalent for the 32-byte block buffer.

Reduced-Output and Efficiency Considerations for Constrained Platforms

The full microhash output is 64 bits, but many embedded and retro applications do not need that much output. Equally, the cost of 32-bit arithmetic on 8-bit CPUs means it is worth considering how to minimise work while still producing a useful digest. Nothing in the algorithm forbids using a narrower slice of the output, and a few structural choices can reduce the per-byte cycle cost considerably.

Truncated output widths

The finalisation step produces a 64-bit value whose halves have different mixing properties:

Variant How to obtain it Notes
64-bit (full) (final << 32) \| state[1] Both halves mixed; maximum collision resistance
32-bit Use state[1] alone, or final alone state[1] is the most-mixed accumulator; final adds state[0] contribution. Either is a valid 32-bit digest
16-bit Upper or lower 16 bits of state[1] Sufficient for small hash tables on 8-bit systems (e.g. 256- or 512-entry tables)
8-bit Low byte of state[1] Useful only for tiny lookup structures; collision probability is high for inputs beyond a few dozen

Truncation is free — it requires no extra computation and produces a deterministic sub-digest fully derived from the same mixing steps. The internal state remains 2 × 32-bit throughout; narrowing the output only affects what is read from the state at the end.

Reducing memory pressure: eliminating the block buffer

The reference implementation copies each input block into a 32-byte stack buffer before processing. On severely RAM-constrained systems (e.g. a 6502 with only zero-page available, or a microcontroller with a few hundred bytes of RAM total) this buffer can be eliminated by processing the input stream four bytes at a time directly:

  • Keep a 4-byte (one word) accumulation register and a byte counter.
  • As each byte arrives, store it in the accumulation register at the correct byte position.
  • When four bytes are collected (one word complete), apply the mixing step immediately and reset the accumulation register.
  • Track the total byte count for padding.
  • At the end of input, pad out the current partial word with the 0x80 marker and zero bytes, apply the mix, then flush any remaining words up to the 16-byte (four-word) boundary before finalising.

This reduces the working RAM requirement to approximately 4 bytes (accumulation word) + 8 bytes (state) + 4 bytes (byte counter) = 16 bytes, at the cost of slightly more complex control flow.

Cycle cost on 8-bit platforms

On 8-bit architectures such as the Z80 or 6502, each 32-bit rotate-left is the most expensive primitive in the inner loop. Approximate costs:

Operation Z80 (cycles) 6502 (cycles)
32-bit XOR (4 bytes) ~16 ~16
32-bit ADD with carry ~24 ~16
32-bit rotate-left by 5 ~60–80 ~60–80
32-bit rotate-left by 11 ~60–80 ~60–80
Full inner-loop iteration (one word) ~200–240 ~200–240
Per 16-byte block (4 words) ~800–960 ~800–960

These are rough estimates; hand-optimised assembly using byte-rotation tricks (e.g. rotate-left by 8 is a free byte-lane rotate; rotate-left by 16 is two byte swaps) can cut these figures by 30–50 %.

To reduce cycle cost further while keeping the same algorithm structure:

  • Process only the first word per block (4 bytes mixed per 32-byte block boundary). This is a significant loss in mixing quality but reduces inner-loop work by 4×.
  • Use a smaller block size — if the application controls the input framing, aligning on 4-byte boundaries and skipping the 32-byte padding entirely reduces overhead to the bare minimum of four mix iterations per 16 input bytes.
  • Prefer the 32-bit output variant — it saves the finalisation XOR-and-shift step, which on a 6502 involves a 32-bit rotate and 32-bit OR (another ~40–60 cycles).

Summary of trade-offs

Constraint Recommended adaptation
RAM < 32 bytes Stream one word at a time; eliminate block buffer
Output storage limited Truncate to 32 or 16 bits post-finalisation
Cycle budget tight (8-bit CPU) Byte-rotation tricks for ROL-by-8/16; use 32-bit output (skip upper-half finalisation)
Hash table ≤ 256 entries 8-bit truncation is sufficient; only low byte of state[1] is needed
Hash table ≤ 65536 entries 16-bit truncation; upper 16 bits of state[1] tend to have better distribution

None of these adaptations require changing the core mixing logic — they are all post-processing choices or buffer-management strategies that any port can adopt independently.


Cross-Implementation Compatibility

Both implementations produce identical output for the same byte sequence on little-endian hosts. The only observable difference is the word-loading path:

Aspect C++ C#
Word loading Manual byte shifts BitConverter.ToUInt32
Length field type size_t (≥32-bit) int (32-bit signed)
Input interface const uint8_t* + length, or std::vector<uint8_t> byte[]
Endianness safe Yes (explicit shifts) Only on little-endian hosts

Design Goals and Trade-offs

Goal Approach
Minimal footprint Fixed 32-byte stack buffer; 8 bytes of state
Portability Standard integer types; no SIMD, no hardware instructions
Simplicity ~30 lines of logic; no lookup tables
Speed Single-pass; rotate-XOR-ADD mixing without multi-round expansion
Not a goal Cryptographic security — no resistance to preimage, collision, or length-extension attacks is claimed

microhash is suitable for hash tables, checksums, data fingerprinting, and similar non-security-sensitive applications where a fast, deterministic, platform-independent digest is required.