Tuesday, August 25, 2009

Using Variable Page Sizes without revamping UVM

Here's an idea for a simple change to UVM to allow pmaps to take advantage of larger mappings.

The first part is to add "special" aligned free and inactive page lists apart from the normal free list. These lists contain pages whose physical addresses are aligned to one of a set of power-of-2 boundaries: 16KB, 64KB, 256KB, 1MB, and 4MB should be enough.
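A minimal sketch of what those extra lists might look like; the names and the fixed set of orders here are assumptions for illustration, not existing UVM code:

/* Hypothetical per-alignment lists; struct pglist and paddr_t are the usual UVM types. */
#define VM_NALIGNED	5		/* 16KB, 64KB, 256KB, 1MB, 4MB */

struct vm_aligned_pglist {
	paddr_t		apl_align;	/* alignment in bytes, e.g. 16 * 1024 */
	struct pglist	apl_free;	/* free pages with (pa & (apl_align - 1)) == 0 */
	struct pglist	apl_inactive;	/* inactive pages with the same property */
};

struct vm_aligned_pglist vm_aligned_lists[VM_NALIGNED];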

The second part is that when uvm_map maps a page for a map entry, it tries to allocate a physical page that conforms to va & (the map entry's alignment mask). If this is the first page of the entry, it tries to allocate from the aligned page lists. If that succeeds, simply use that page; if not, reset the alignment mask to 0.

When the page is other than the first page, and a first page exists, see whether the page at (first page pa + (new va - first page va)) can be used, and use it if so. If that fails and this page's (va & alignment mask) == 0, act as if it were a first page.

Otherwise, if the page belonging to (va ^ PAGE_SIZE) is mapped and (its pa ^ its va) & PAGE_SIZE == 0, try the physical page at (its pa ^ PAGE_SIZE).

Next, if the page is even, try the pa of the previous page; if odd, the pa of the next page...

This should, in most instances, give you contiguous runs of 2 pages, and hopefully more.
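Roughly, the candidate selection could look like the sketch below. vaddr_t, paddr_t and struct vm_page are the usual UVM types; every helper here (alloc_aligned_page, alloc_any_page, page_if_free, mapped_pa) is made up to keep the decision order readable.

/* Hypothetical candidate-selection order for a physical page for 'va'. */
extern struct vm_page *alloc_aligned_page(paddr_t align_mask);	/* from the aligned lists */
extern struct vm_page *alloc_any_page(void);			/* normal free list */
extern struct vm_page *page_if_free(paddr_t pa);		/* grab a specific free page */
extern bool mapped_pa(vaddr_t va, paddr_t *pap);		/* pa of a mapped va, if any */

struct vm_page *
pick_page(vaddr_t va, paddr_t align_mask,
    bool have_first, vaddr_t first_va, paddr_t first_pa)
{
	struct vm_page *pg;
	paddr_t npa;

	if (!have_first) {
		/* first page of the entry: try the aligned free lists */
		if ((pg = alloc_aligned_page(align_mask)) != NULL)
			return pg;
		return alloc_any_page();	/* and reset the alignment mask to 0 */
	}

	/* stay contiguous with the first page if that physical page is free */
	if ((pg = page_if_free(first_pa + (va - first_va))) != NULL)
		return pg;

	/* this va is itself on the alignment boundary: act like a first page */
	if ((va & align_mask) == 0 &&
	    (pg = alloc_aligned_page(align_mask)) != NULL)
		return pg;

	/* pair with the mapped neighbour at (va ^ PAGE_SIZE), if its pa/va
	   parities agree, by trying (its pa ^ PAGE_SIZE) */
	if (mapped_pa(va ^ PAGE_SIZE, &npa) &&
	    ((npa ^ (va ^ PAGE_SIZE)) & PAGE_SIZE) == 0 &&
	    (pg = page_if_free(npa ^ PAGE_SIZE)) != NULL)
		return pg;

	/* remaining previous/next-page heuristics, then the generic allocator */
	return alloc_any_page();
}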

Dumping swap partitions

About the only reason to use a swap partition these days is so there is someplace to store crash dumps. NetBSD will happily swap to files on a filesystem, but it won't dump to a file.

Even 30 years ago, VMS could dump to a file. It did so by using a very simple technique: when you specified the dumpfile, the kernel mapped it with a "cathedral window". This was a term for a file mapping containing all the extents of the file, basically a list of the starting sector and sector count for each and every extent used by that file.

There is no reason why NetBSD couldn't do the same thing. When a swapfile is added, simply record all of its extents. Of course, if the swapfile is a sparse file this won't work, so rejecting sparse files as swapfiles might be acceptable. This also avoids the problem of needing to find a buffer in low-memory situations to read the swapfile's extents (since a complete mapping is not stored anywhere). A VFS hook will be needed to prevent the file from being deleted or truncated.

To the dump code, the change is trivial. Instead of just a dev_t, it will take a dev_t and a list of extents. Simply fill up one extent and move to the next until all have been exhausted. For the swap-partition case, a single extent is supplied, starting at 0 and with the length of the partition.
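A minimal sketch of such an extent list and dump loop follows; struct dump_extent, dump_to_extents() and dump_blocks() are made-up names for illustration, not existing NetBSD interfaces.

#include <sys/types.h>	/* daddr_t, dev_t */
#include <errno.h>

/* Hypothetical extent: a run of contiguous sectors on the dump device. */
struct dump_extent {
	daddr_t	de_start;	/* starting sector on the device */
	daddr_t	de_count;	/* number of sectors in this extent */
};

extern int dump_blocks(dev_t dev, daddr_t start, daddr_t count);	/* hypothetical writer */

int
dump_to_extents(dev_t dev, const struct dump_extent *de, int nextents, daddr_t nblks)
{
	for (int i = 0; i < nextents && nblks > 0; i++) {
		daddr_t n = (nblks < de[i].de_count) ? nblks : de[i].de_count;
		int error = dump_blocks(dev, de[i].de_start, n);

		if (error != 0)
			return error;
		nblks -= n;
	}
	return nblks == 0 ? 0 : ENOSPC;	/* ran out of extents */
}

/* For a plain swap partition this degenerates to one extent:
   de_start = 0, de_count = size of the partition in sectors. */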

Sunday, August 23, 2009

mips64 multilib support

My current plan is to have the toolchain default to the N32 (ILP32 LL64) ABI for most programs but allow N64 programs to be built. The N64 libraries would not be required to live under an emulation but would be integrated with the N32 libraries under /usr/lib.

But first is the ld.elf_so problem. If you are going to run O32 binaries, they will expect to execute ld.elf_so and find their libraries in /lib or /usr/lib. Now we could tweak ld.elf_so to prefer libraries in ${LIBDIR}/${abi}, but should N32/N64 binaries use ld-${ABI}.elf_so to preserve the use of O32 binaries? The kernel could also be changed to rewrite ld.elf_so to ld-${ABI}.elf_so and use that in preference. Got to ponder on this problem.

Let's say we have /usr/lib/libc.so and /usr/lib/n64/libc.so. Now, for maximum performance, do you want to support ${LIBDIR}/${ABI}/${ARCH}/libc.so, where the dynamic loader looks at the architecture embedded in the ELF flags and tries to find a library there first, then falls back to ${LIBDIR}/${ABI}? It seems to be a win in that you can have a system with tuned libraries for mips1, mips3, mips32, mips64 without being tied to that hardware.
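A rough sketch of that search order; mapping the ELF flags to an architecture name would happen elsewhere, and build_search_paths() is an illustrative helper, not how ld.elf_so actually constructs its path list.

#include <stdio.h>

/* Hypothetical search-order construction: most specific directory first. */
int
build_search_paths(const char *libdir, const char *abi, const char *arch,
    char paths[][256], int maxpaths)
{
	int n = 0;

	/* tuned library for this ABI and architecture, e.g. /usr/lib/n64/mips64 */
	if (abi != NULL && arch != NULL && n < maxpaths)
		snprintf(paths[n++], 256, "%s/%s/%s", libdir, abi, arch);

	/* plain per-ABI directory, e.g. /usr/lib/n64 */
	if (abi != NULL && n < maxpaths)
		snprintf(paths[n++], 256, "%s/%s", libdir, abi);

	/* finally the default library directory, e.g. /usr/lib */
	if (n < maxpaths)
		snprintf(paths[n++], 256, "%s", libdir);

	return n;
}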

mips32 is only of use for O32 binaries; any platform running N32 or N64 is going to have to be 64-bit in nature.

Friday, August 21, 2009

FP Emulation

MIPS processors require anything from a little help with denormalized numbers and related matters to complete emulation of all FP instructions. Combine that with the emulation required for instructions in branch delay slots and you can get a pretty large amount of emulation code in your kernel. While the latter is pretty much required to be in the kernel, the former could almost as easily live in userspace.

A simple signal-like mechanism could quickly dispatch the Coprocessor Not Available exception back to user space using a siginfo and a ucontext_t with the state of the lwp at the time of the exception. The FP user code can then do the emulation, fix up the ucontext_t contents as needed, and then clean up with a setcontext(ucp) to resume execution. This does mean that setcontext will need to be able to restore the entire context, not just the callee-saved registers.
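A userland sketch of that flow, written as an ordinary SA_SIGINFO handler purely for illustration; the actual delivery mechanism, the signal number, and emulate_fp_insn() are all assumptions, since the dispatch described above need not be a normal signal.

#include <signal.h>
#include <string.h>
#include <ucontext.h>

extern void emulate_fp_insn(mcontext_t *mc);	/* hypothetical emulator: decodes the
						   faulting instruction, updates the FP
						   state held in userspace, advances PC */

static void
fp_trap_handler(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *ucp = ctx;

	(void)sig;
	(void)si;

	/* Emulate the instruction and patch the saved register state. */
	emulate_fp_insn(&ucp->uc_mcontext);

	/* Returning from the handler (or an explicit setcontext(ucp)) resumes
	   the lwp with the fixed-up context, provided the whole context can
	   be restored, not just the callee-saved registers. */
}

void
install_fp_trap_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_sigaction = fp_trap_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGFPE, &sa, NULL);	/* signal number chosen for illustration */
}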

This also means you can try different emulators without having to recompile your kernel or reboot. You've already taken the exception into the kernel for Coprocessor Not Available, and you still need to return to userspace; the only difference is where you do the emulation. The flexibility you get by letting usermode do the emulation seems to be a clear winner.

One thing to note in this model is that the kernel never really has a copy of the FP state; that state is contained solely in userspace somewhere, probably in a per-thread context area.

Saturday, August 15, 2009

Include Machinations

For several releases I've wanted to make a radical change to how machine-specific include files are handled in NetBSD. Currently, a machine-specific include grabs the platform's file by following the symbolic link that points to the platform-specific directory (pmax, mac68k, macppc, shark) instead of the architecture-specific directory (mips, m68k, powerpc, arm).

This has two advantages: the first is that the headers would look identical across all platforms of that architecture; the second is that existing platform headers that just #include the corresponding architecture header can go away, since they are no longer needed.

The disadvantage is that files that are currently platform specific would need to be made truly platform independent, which wouldn't be a bad thing.

Monday, August 3, 2009

TLB Miss (& Mod) Lookup

One critical thing for good performance on MIPS is the speed of the TLB miss handler. Since a 64-entry TLB with an 8KB page size can address a maximum of 1MB of address space, you really want this to be as efficient as possible, since you will likely be reloading the TLB quite often.

My current idea is to have per-cpu 32K-entry caches of PVP tables. As the number of pages that are mapped grows, so will the number of allocated PVP tables. If all 32K are allocated, that will mean a total of 128MB will be used to look up between 16 and 32 GB of virtual address space. Given that ASIDs are per-cpu, the ASID of the PV entry will be placed in the upper bits of the PV address and masked out before use. As the system starts, all 32K entries will point to a common PVP table, and as entries are added, additional PVP pages will be allocated and PVs distributed among them.

To look up a VPN one would do (via xkseg or kseg0):
randomizer = (vpn <> pm_randomizer);
idx = ((vpn ^ randomizer) >> 10) & 0x3fff;
pvp = pvps[idx];
idx = (vpn ^ randomizer) & 0x3ff;
pv = pvp[idx];

Note that ASIDs are not used to compute the PVP or PV index, since they are fleeting and would require updating of the PVPs. It could be done but might incur more overhead than I'd like.

If the pv doesn't match the asid/vpn, a look at a secondary location in the PVP, at a TBD relative offset, will be done.
If that fails, then fall back to a pmap_lookup or uvm_fault.
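Putting the lookup and the fallbacks together, a rough C rendering of the miss path might look like the following. pvps[], pv_matches(), SECONDARY_OFFSET and pmap_lookup_slow() are illustrative names only; pv_matches() would also strip the ASID stored in the upper bits of the PV address before use, and the exact mixing of the vpn with the per-pmap randomizer is left open above.

#define SECONDARY_OFFSET	1	/* placeholder; the real offset is TBD */

extern struct pv_entry **pvps[];	/* hypothetical per-cpu array of PVP pages */
extern bool pv_matches(const struct pv_entry *pv, unsigned int asid, vaddr_t vpn);
extern struct pv_entry *pmap_lookup_slow(struct pmap *pm, vaddr_t vpn, unsigned int asid);

struct pv_entry *
tlb_miss_lookup(struct pmap *pm, vaddr_t vpn, unsigned int asid)
{
	/* one possible mixing of the vpn with the per-pmap randomizer */
	vaddr_t hash = vpn ^ pm->pm_randomizer;
	struct pv_entry **pvp = pvps[(hash >> 10) & 0x3fff];
	unsigned int idx = hash & 0x3ff;
	struct pv_entry *pv = pvp[idx];

	if (pv_matches(pv, asid, vpn))
		return pv;

	/* secondary location within the same PVP page, at a TBD relative offset */
	pv = pvp[(idx + SECONDARY_OFFSET) & 0x3ff];
	if (pv_matches(pv, asid, vpn))
		return pv;

	/* slow path: full pmap lookup, or let uvm_fault handle it */
	return pmap_lookup_slow(pm, vpn, asid);
}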

This is less an inverted page table than a "global" page table.