Wednesday, August 19, 2009

Vnode cache design

I found a development quest in Jari OS. It all starts with the design of the device driver services: some drivers are actually libraries that get linked into the service supporting a given piece of hardware, so driver developers need dlopen()-family functions to link these libraries at runtime. For example, when the block service finds a specific host controller, its driver should be linked at runtime, not statically. The OS has no dlopen() support yet; worse, dynamically linked ELF support is still very early (it works, but it works slowly and eats many service time slices). The reason is poor file mapping support: you can map a file, but once you close the file descriptor you get a segfault on the next page fault in the mapped area. This is a complex problem: to support dlopen(), ELF handling must be improved, i.e. work with libraries and binaries should move out of the process manager service and stop going through read()/write() calls. One of the benefits will be dlopen() support.
I know that one of the weak spots of the file subsystem layer is libv2. (It's a VFS-like layer, but implemented as a library linked into each file system; don't confuse it with the VFS service, which is the file system manager, resource storage, redirection mechanism and so on.) libv2 already supports many POSIX features and operates on an abstract vnode_t (in Linux this is called an inode), but its vnode cache is very poor: every mapping is tied to a vnode, and if the cache decides to destroy that vnode, the mapping can be lost. One solution is to track a map count, but that is very simplistic; the more complete solution is a more sophisticated cache.
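The simple map-count fix can be sketched in a few lines. This is a hypothetical illustration, not the actual libv2 API: the names vnode_t, vnode_map(), vnode_unmap() and vnode_evictable() are mine.

```c
#include <assert.h>

/* Hypothetical sketch: a vnode with separate open and map counters.
 * Names and layout are illustrative, not the real libv2 structures. */
typedef struct vnode {
    unsigned long id;   /* file system node id */
    int open_count;     /* open file descriptors referencing this vnode */
    int map_count;      /* live memory mappings referencing this vnode */
} vnode_t;

/* Bump the counter when a region of the file is mapped into an
 * address space, drop it when the region is unmapped. */
static void vnode_map(vnode_t *vn)   { vn->map_count++; }
static void vnode_unmap(vnode_t *vn) { vn->map_count--; }

/* The cache may only destroy a vnode when nothing references it:
 * evicting a vnode with a positive map_count is exactly the bug that
 * turns a later page fault in the mapped area into a segfault. */
static int vnode_evictable(const vnode_t *vn)
{
    return vn->open_count == 0 && vn->map_count == 0;
}
```

With this, closing the file descriptor only drops open_count; the vnode survives as long as a mapping still points at it.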
Assume a file system under high load: lookups, maps, reads and writes are performed constantly. There are situations where a vnode has neither a positive map count nor a positive open count for a short period of time, and there are rarely used vnodes, i.e. a vnode may be mapped once and then not touched for a day; it wastes a data structure and memory, yet you cannot delete it at all. On the other hand, you cannot rely on the metadata of the real file system node, because lookup calls do not update its access time. So what could the solution be? First, I defined several vnode stages, each with its own properties; second, I designed a table for vnodes that are mapped but not accessed: such an entry takes far fewer resources than a regular vnode, but the vnode remains virtually present.
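The point of the virtual vnode table is that an entry in it is much smaller than a full vnode. A minimal sketch, assuming made-up names and field layouts (the real libv2 structures will differ):

```c
/* Hypothetical full vnode: besides the counters it drags around cached
 * metadata, locks, list linkage and so on (collapsed into one array here). */
typedef struct vnode {
    unsigned long id;       /* file system node id */
    int open_count;
    int map_count;
    char meta[128];         /* stand-in for cached fs metadata and cache machinery */
} vnode_t;

/* A virtual vnode keeps only what is needed to revive the real one
 * later: the fs node id and the outstanding map count. */
typedef struct virt_vnode {
    unsigned long id;
    int map_count;
} virt_vnode_t;

/* Demote a mapped-but-idle vnode into the virtual vnode table,
 * letting the cache reclaim the heavyweight structure. */
static virt_vnode_t vnode_demote(const vnode_t *vn)
{
    virt_vnode_t vv = { vn->id, vn->map_count };
    return vv;
}
```

The mapping stays valid because the map count survives the demotion; a later access can rebuild the full vnode from the id.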
The vnode stages are:
  • strong vnodes: not candidates for the virtual vnode list; they have positive open and map counts and were recently used
  • potential oldie vnodes: only the map count or only the open count is positive, and they were not recently used
  • oldie vnodes: vnodes that live in the virtual vnode list
  • dead vnodes: vnodes that were removed via an unlink() call
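The four stages above can be expressed as a classifier that the cache sweeper would run over each vnode. This is a sketch under my own assumptions (names and the exact precedence of the checks are hypothetical):

```c
/* Hypothetical vnode stages, mirroring the list above. */
enum vnode_state {
    VN_STRONG,     /* positive open and map counts, recently used */
    VN_POTENTIAL,  /* still referenced, but not recently used */
    VN_OLDIE,      /* lives in the virtual vnode list */
    VN_DEAD        /* removed via unlink() while still referenced */
};

/* Classify one vnode from its counters and flags. Unlink wins over
 * everything; membership in the virtual list wins over the counters. */
static enum vnode_state
vnode_classify(int open_count, int map_count,
               int recently_used, int unlinked, int on_virtual_list)
{
    if (unlinked)
        return VN_DEAD;
    if (on_virtual_list)
        return VN_OLDIE;
    if ((open_count > 0 || map_count > 0) && !recently_used)
        return VN_POTENTIAL;
    return VN_STRONG;
}
```

Potential oldies found by the sweep would then be demoted into the virtual vnode table.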
At a defined interval a thread will walk the cache and decide what to do with each vnode; the interval is derived from file system activity (calls during the last period). But what about dead vnodes? Assume you mapped a file, somebody removed it, and the vnode id was then taken by a root-only file... The dead vnodes list is a low-cost list that stores vnodes of this kind. When you try to read from, write to, or unmap such a vnode, or take a page fault on it, the information is updated: you get an error (or a segfault), the file system decrements the counters, and once the counters reach zero the dead vnode goes to heaven and frees its id ;)
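The dead-vnode lifetime can be sketched as follows. Everything here is illustrative: the record layout, dead_vnode_access() and the choice of -ENOENT as the error are my assumptions, not the libv2 design.

```c
#include <errno.h>

/* Hypothetical low-cost record on the dead vnodes list. */
typedef struct dead_vnode {
    unsigned long id;   /* fs node id, still reserved until the last reference dies */
    int open_count;     /* descriptors still open on the unlinked file */
    int map_count;      /* mappings still live on the unlinked file */
    int id_freed;       /* set once the id may be reused */
} dead_vnode_t;

/* Any access (read, write, unmap, page fault) to a dead vnode fails
 * and drops one outstanding reference; when the last reference goes,
 * the vnode "goes to heaven" and its id is released for reuse. */
static int dead_vnode_access(dead_vnode_t *dv)
{
    if (dv->open_count > 0)
        dv->open_count--;
    else if (dv->map_count > 0)
        dv->map_count--;
    if (dv->open_count == 0 && dv->map_count == 0)
        dv->id_freed = 1;
    return -ENOENT;     /* the caller sees an error (or a segfault on a fault) */
}
```

Keeping the id reserved until then is what prevents a stale mapping from silently landing on an unrelated (here, root-only) file that reused the id.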
I'm not sure I will implement all of these features in the near future, but this functionality will be added to libv2 anyway, and I will continue my quest: improve the page cache, improve the memory events protocol, and design and implement a linker with dlopen() ;)