paging - Paging, page faults, and user virtual memory
Header: kernel/include/kernel/paging.h
Source: kernel/arch/i386/mm/paging.c
Enables the x86 32-bit paging unit, maintains the early identity map used by
the kernel, and participates in the user virtual-memory model used by ELF
loading, fork() copy-on-write, and anonymous mmap.
The paging subsystem is split across several files:
| File | Role |
|---|---|
| arch/i386/mm/paging.c | Boot page directory, PSE identity map, CR0/CR3/CR4 setup. |
| arch/i386/mm/vmm.c | 4 KiB page-directory helpers for user address spaces and COW clones. |
| arch/i386/mm/pmm.c | Physical frame allocation and per-frame refcounts. |
| arch/i386/debug/debug.c | Page-fault diagnostics and COW fault resolution. |
| arch/i386/proc/syscall.c | mmap2/munmap syscall implementation. |
Boot-time identity map
paging_init() identity-maps the first 256 MiB of physical memory
(physical address = virtual address) using 4 MiB PSE large pages.
Before loading CR3, CR4.PSE (bit 4) is set so the processor honours the
PS bit in PDE entries. Each of the 64 large-page PDE entries maps one
aligned 4 MiB region directly - no intermediate page table is needed.
This is the 32-bit equivalent of the 2 MiB large pages used by x86-64
long-mode kernels.
All pages in this range are supervisor-only and writable. User mappings are created later as 4 KiB PTEs in per-task page directories.
This window covers:
- The kernel image (loaded at 1 MiB by GRUB).
- The VGA text buffer (0xB8000).
- The PMM bitmap, page directory, and other BSS/data.
- ACPI tables placed by firmware anywhere below 256 MiB.
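The identity map described above can be sketched as a loop over the first 64 page-directory entries. This is a minimal illustration, not the code in paging.c: the helper name fill_boot_identity_map and the macro names are hypothetical, and clearing the remaining entries is an assumption. The flag values themselves are the architectural x86 PDE bits.

```c
#include <stdint.h>

/* Architectural x86 page-directory-entry flag bits. */
#define PDE_PRESENT   0x001u
#define PDE_WRITABLE  0x002u
#define PDE_PS        0x080u   /* maps a 4 MiB page when CR4.PSE is set */

#define LARGE_PAGE_SHIFT  22u  /* 4 MiB = 1 << 22 */
#define BOOT_LARGE_PAGES  64u  /* 64 * 4 MiB = 256 MiB */

/* Hypothetical helper: identity-map 0-256 MiB with supervisor-only,
 * writable 4 MiB pages. The user bit (0x004) is deliberately left clear.
 * Zeroing the other entries is an assumption of this sketch. */
static void fill_boot_identity_map(uint32_t pd[1024])
{
    for (uint32_t i = 0; i < 1024u; i++) {
        if (i < BOOT_LARGE_PAGES)
            pd[i] = (i << LARGE_PAGE_SHIFT)
                  | PDE_PRESENT | PDE_WRITABLE | PDE_PS;
        else
            pd[i] = 0;
    }
}
```

Because virtual and physical addresses are equal, the frame address in entry i is simply i shifted left by 22 bits.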
Kernel region mapping
A static pool of 128 extra 4 KiB page tables (extra_page_tables) handles
post-boot mapping requests for addresses above 256 MiB. Each page table
covers 4 MiB of virtual address space, so the pool can map up to 512 MiB of
additional regions. The pool was grown from 32 to 128 so a high-resolution
framebuffer, the SVGA II command FIFO, and the ACPI tables can all be mapped
without silent truncation. A partial framebuffer map previously caused a #PF
on the first present at 1080p.
Requests that fall within the 256 MiB large-page window are detected by
checking the PAGE_LARGE flag in the relevant PDE and silently skipped.
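The pool and the large-page skip can be modelled on a host machine as follows. This is a simplified simulation under stated assumptions: map_one_page and the result enum are hypothetical names, and the PDE stores a pool index in its address field instead of the physical address of the page table that a real kernel would store.

```c
#include <stdint.h>

#define PAGE_PRESENT   0x001u
#define PAGE_WRITABLE  0x002u
#define PAGE_LARGE     0x080u
#define POOL_TABLES    128u    /* 128 tables * 4 MiB each = 512 MiB */

static uint32_t page_directory[1024];
static uint32_t extra_page_tables[POOL_TABLES][1024];
static uint32_t pool_used;

enum map_result { MAP_OK, MAP_SKIPPED_LARGE, MAP_POOL_EXHAUSTED };

/* Hypothetical per-page step of a region-mapping loop. */
static enum map_result map_one_page(uint32_t phys)
{
    uint32_t pdi = phys >> 22;             /* page-directory index */
    uint32_t pti = (phys >> 12) & 0x3FFu;  /* page-table index */
    uint32_t pde = page_directory[pdi];

    /* Already covered by a 4 MiB boot mapping: nothing to do. */
    if ((pde & PAGE_PRESENT) && (pde & PAGE_LARGE))
        return MAP_SKIPPED_LARGE;

    if (!(pde & PAGE_PRESENT)) {
        if (pool_used == POOL_TABLES)
            return MAP_POOL_EXHAUSTED;     /* caller stops mapping here */
        /* Simulation only: the address field holds the pool index. */
        pde = (pool_used++ << 12) | PAGE_PRESENT | PAGE_WRITABLE;
        page_directory[pdi] = pde;
    }

    extra_page_tables[pde >> 12][pti] =
        (phys & ~0xFFFu) | PAGE_PRESENT | PAGE_WRITABLE;
    return MAP_OK;
}
```

Two pages that share a 4 MiB-aligned region consume a single pool table, which is why a 128-entry pool bounds the total at 512 MiB of distinct regions.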
Functions
paging_init
void paging_init(void);
- Set CR4.PSE (bit 4) to enable 4 MiB large pages.
- Fill 64 PDE entries with identity-map entries for 0–256 MiB (PS bit set).
- Register page_fault_handler on ISR 14.
- Load CR3 with the physical address of the page directory.
- Set CR0 bit 31 (PG) to enable paging.
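The control-register bits involved in these steps can be written out as plain bit arithmetic. The sketch below only computes the new register values; the privileged mov instructions that load CR0, CR3, and CR4 are omitted because they cannot run outside ring 0. The helper names are hypothetical, and CR0.WP is included because the Copy-on-write section states that paging_init() enables it.

```c
#include <stdint.h>

/* Architectural bit positions. */
#define CR4_PSE  (1u << 4)    /* honour the PS bit in PDEs (4 MiB pages) */
#define CR0_WP   (1u << 16)   /* supervisor writes respect read-only PTEs */
#define CR0_PG   (1u << 31)   /* enable paging */

/* Hypothetical helper: value to write to CR4 before loading CR3. */
static uint32_t cr4_with_pse(uint32_t cr4)
{
    return cr4 | CR4_PSE;
}

/* Hypothetical helper: value to write to CR0 as the final step, once
 * CR3 already points at a valid page directory. */
static uint32_t cr0_with_paging(uint32_t cr0)
{
    return cr0 | CR0_PG | CR0_WP;
}
```

The ordering matters: PG must be set last, since the processor starts translating addresses through CR3 as soon as that bit is written.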
paging_map_region
void paging_map_region(uint32_t phys_start, uint32_t size);
Identity-map the physical address range [phys_start, phys_start + size).
Pages that fall within a large-page PDE entry (already covered by the
256 MiB boot window) are silently skipped. If the page-table pool is
exhausted the function returns without mapping anything further.
| Parameter | Description |
|---|---|
| phys_start | Physical start address. Need not be page-aligned; the function rounds down. |
| size | Length in bytes. Need not be page-aligned; the function rounds up. |
Called by:
- heap_init() - to map the 16 MiB heap region.
- vesa_tty_init() - to map the VESA linear framebuffer before the first pixel write.
- acpi_init() (via acpi_map_table()) - to map RSDT, FADT, and DSDT.
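The rounding rules for phys_start and size can be illustrated with two small helpers. This is one plausible reading, assuming the end of the range (phys_start + size) is what gets rounded up to a page boundary; first_page and page_count are hypothetical names, not functions from paging.c.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Round the start address down to its 4 KiB page boundary. */
static uint32_t first_page(uint32_t phys_start)
{
    return phys_start & ~(PAGE_SIZE - 1u);
}

/* Number of 4 KiB pages needed to cover [phys_start, phys_start + size).
 * The end is computed in 64 bits so a range ending at 4 GiB does not wrap. */
static uint32_t page_count(uint32_t phys_start, uint32_t size)
{
    if (size == 0u)
        return 0u;

    uint64_t end = (uint64_t)phys_start + size;              /* exclusive */
    uint64_t end_up = (end + PAGE_SIZE - 1u) & ~(uint64_t)(PAGE_SIZE - 1u);
    return (uint32_t)((end_up - first_page(phys_start)) / PAGE_SIZE);
}
```

An unaligned request can therefore touch one more page than size / 4096 suggests: 4096 bytes starting at offset 0x123 within a page span two pages.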
User address spaces
User tasks run in per-task page directories created by the VMM layer. The kernel identity map remains present so kernel code can continue to access its own text, data, heap, device buffers, and low physical memory after switching CR3. User program pages are installed as 4 KiB user PTEs.
Important user mapping sources:
| Source | Mapping behavior |
|---|---|
| ELF loader | Maps PT_LOAD segments at the addresses requested by the executable. |
| User stack setup | Allocates and maps a ring-3 stack for process startup. |
| fork() | Clones the parent’s address space with copy-on-write PTEs. |
| mmap2(MAP_ANONYMOUS) | Allocates zero-filled pages in a per-task anonymous-mmap window. |
Makar does not use a higher-half kernel layout. Kernel and user virtual addresses share one 32-bit address space, and paging permissions enforce which pages ring 3 may touch.
Copy-on-write
fork() uses vmm_clone_pd_cow(parent_pd) instead of eagerly copying every
user page. Writable user PTEs in the parent and child are changed to:
- present;
- user-accessible;
- read-only;
- tagged with software bit VMM_PTE_COW (0x200);
- backed by the same physical frame with an incremented PMM refcount.
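The PTE transformation listed above amounts to clearing the writable bit and setting the software COW bit. The sketch below shows that step for a single entry; cow_share_pte is a hypothetical helper, the refcount is passed in as a plain counter instead of going through the PMM, and pages that are not writable user pages are returned unchanged because the text does not describe how the real clone treats them.

```c
#include <stdint.h>

#define PTE_PRESENT   0x001u
#define PTE_WRITABLE  0x002u
#define PTE_USER      0x004u
#define VMM_PTE_COW   0x200u   /* software-available bit 9 */

/* Hypothetical helper: returns the PTE value that both parent and child
 * would hold after the clone, and bumps the shared frame's refcount. */
static uint32_t cow_share_pte(uint32_t pte, uint32_t *frame_refcount)
{
    const uint32_t need = PTE_PRESENT | PTE_USER | PTE_WRITABLE;

    /* Assumption of this sketch: only writable user pages are converted. */
    if ((pte & need) != need)
        return pte;

    (*frame_refcount)++;                          /* one more owner */
    return (pte & ~PTE_WRITABLE) | VMM_PTE_COW;   /* read-only + tagged */
}
```

The frame address in the upper 20 bits is untouched, so both address spaces keep pointing at the same physical page until one of them writes.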
When either process writes to such a page, x86 raises a page fault because the PTE is read-only. The fault path checks:
- the fault was a write;
- the PTE is present;
- the PTE has VMM_PTE_COW;
- the access belongs to a user mapping that the COW resolver understands.
If the frame refcount is 1, the resolver can clear the COW bit and restore writability in place. If the refcount is greater than 1, it allocates a fresh frame, copies the old page, maps the fresh frame writable for the faulting process, and decrements the old frame’s refcount.
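The fault checks and the two resolution paths can be simulated on a host with a tiny frame pool. This is a sketch under assumptions, not the resolver in debug.c: cow_resolve, alloc_frame, and the frames and refcount arrays are hypothetical stand-ins for the VMM and PMM, frame numbers are indexes into the simulated pool, and the fourth check (whether the mapping is one the resolver understands) is reduced to a bounds check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PF_WRITE      0x2u     /* page-fault error code bit 1 */
#define PTE_PRESENT   0x001u
#define PTE_WRITABLE  0x002u
#define VMM_PTE_COW   0x200u
#define PAGE_SIZE     4096u
#define NFRAMES       4u

static uint8_t  frames[NFRAMES][PAGE_SIZE];  /* simulated physical memory */
static uint32_t refcount[NFRAMES];           /* simulated PMM refcounts */

static int alloc_frame(void)
{
    for (uint32_t i = 0; i < NFRAMES; i++)
        if (refcount[i] == 0u) { refcount[i] = 1u; return (int)i; }
    return -1;
}

/* Returns true if the fault was a COW write fault and was resolved. */
static bool cow_resolve(uint32_t err, uint32_t *pte)
{
    if (!(err & PF_WRITE) || !(*pte & PTE_PRESENT) || !(*pte & VMM_PTE_COW))
        return false;

    uint32_t old = *pte >> 12;
    if (old >= NFRAMES)
        return false;

    /* Same flags as before, minus the COW tag, plus writability. */
    uint32_t flags = ((*pte & 0xFFFu) & ~VMM_PTE_COW) | PTE_WRITABLE;

    if (refcount[old] == 1u) {               /* last owner: fix up in place */
        *pte = (old << 12) | flags;
        return true;
    }

    int fresh = alloc_frame();
    if (fresh < 0)
        return false;                        /* out of memory */
    memcpy(frames[fresh], frames[old], PAGE_SIZE);
    refcount[old]--;
    *pte = ((uint32_t)fresh << 12) | flags;
    return true;
}
```

On real hardware the resolver would also need to invalidate the stale TLB entry for the faulting address (for example with invlpg) before returning, a step the simulation has no equivalent for.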
paging_init() enables CR0.WP so supervisor writes also honour read-only PTEs.
That matters because tests and some kernel paths may write through user virtual
addresses while the page directory is active; without WP, kernel-mode writes
would bypass the COW fault and corrupt the shared frame.
Anonymous mmap
SYS_MMAP2 implements the Linux i386 syscall number 192 for anonymous
mappings. The current implementation is intentionally narrow:
- MAP_ANONYMOUS is supported;
- file-backed mappings are rejected;
- each mapping receives fresh zero-filled pages;
- protection is limited to the page flags Makar currently enforces;
- mappings are assigned from a per-task bump pointer starting at USER_MMAP_BASE;
- address reuse is not attempted when a mapping is unmapped.
SYS_MUNMAP unmaps the requested page range and frees the frames. It does not
rewind the task’s mmap_next pointer, so a later anonymous mapping receives a
new virtual range even if the old range was freed.
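The bump-pointer behaviour can be shown with a small model of the per-task state. Everything here is illustrative: the value chosen for USER_MMAP_BASE is a placeholder, struct task and the two helpers are hypothetical, frame allocation is omitted, and returning 0 on failure stands in for whatever error value the real syscall returns.

```c
#include <stdint.h>

#define PAGE_SIZE       4096u
#define USER_MMAP_BASE  0x40000000u   /* placeholder value for this sketch */

struct task {
    uint32_t mmap_next;   /* next unused address in the mmap window */
};

/* Hypothetical model of an anonymous mapping: hand out the next range. */
static uint32_t mmap_anon(struct task *t, uint32_t len)
{
    /* Round up to whole pages; a wrapped sum rounds to 0 and is rejected. */
    uint32_t bytes = (len + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
    if (len == 0u || bytes == 0u)
        return 0u;

    uint32_t addr = t->mmap_next;
    if (bytes > UINT32_MAX - addr)
        return 0u;                    /* would run past the address space */

    t->mmap_next = addr + bytes;      /* bump; never moves backwards */
    return addr;
}

/* Hypothetical model of unmapping: frames would be freed here, but
 * mmap_next is intentionally left alone. */
static void munmap_anon(struct task *t, uint32_t addr, uint32_t len)
{
    (void)t; (void)addr; (void)len;
}
```

Mapping 5000 bytes, unmapping them, and then mapping one more page yields a second address two pages above the first, which matches the no-reuse behaviour described above.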
This is enough for allocator tests and hosted-libc bring-up paths that need anonymous heap-like memory, while keeping file mapping and demand paging out of scope.
Page-fault handler
The page-fault handler is installed on ISR 14. It reads the faulting address
from CR2, prints it alongside the error code when the fault cannot be resolved,
and calls PANIC("Page fault") for unhandled cases.
Before panicking, it gives the COW resolver a chance to handle write faults on COW-tagged user pages. A successful COW resolution returns to the faulting instruction, which then retries the write against a private writable page.
The error code bits indicate:
- Bit 0: 0 = not-present, 1 = protection violation.
- Bit 1: 0 = read, 1 = write.
- Bit 2: 0 = supervisor, 1 = user mode.
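The three bits listed above can be decoded with a few masks. The struct and function names below are hypothetical and only cover the bits this section documents; the hardware defines further error-code bits that are ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the low three page-fault error-code bits. */
struct pf_info {
    bool protection;   /* bit 0: 1 = protection violation, 0 = not-present */
    bool write;        /* bit 1: 1 = write access, 0 = read */
    bool user;         /* bit 2: 1 = fault raised in user mode */
};

static struct pf_info pf_decode(uint32_t err)
{
    struct pf_info info = {
        (err & 0x1u) != 0u,
        (err & 0x2u) != 0u,
        (err & 0x4u) != 0u,
    };
    return info;
}
```

A user-mode write to a COW page arrives with error code 0x7 (present, write, user), while a kernel read of an unmapped address arrives with 0x0.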
See the Page Fault OSDev article for the full error-code description.
Current limitations and future work
- No file-backed mmap.
- No demand paging from disk or swap.
- No address-space layout randomisation.
- No guard pages around every kernel stack.
- No SMP TLB shootdown mechanism; Makar is currently uniprocessor.
- No higher-half kernel split.
Those are deliberate constraints, not missing pieces for the current hosted
libc milestone. The active baseline is per-task address spaces, COW fork(),
anonymous mmap2, and deterministic in-guest tests for those mechanisms.