
Thursday, July 26, 2012

New vfs for Jari OS (in general)

Hi all,
After spending some time on testing, prototype implementation, investigation and so on, a new vfs layer was born. It actually grew out of my last prototype, but there is still plenty to implement (a huge set of small things). Still, it's already safe to say: it is completely new and completely different from the old vfs.
The obvious question is why the project needed a new vfs layer at all, so let me explain.
The first reason is a set of architectural changes that kill the separate vfs service (it moves into the domain holder). The vfs service kept a search tree and per-task resource storage, and it was a dirty job to keep them in sync with events from the name service and the procmgr service. The service itself was really small, but the design was unclear and every extension was a headache. The old architecture also required an additional fslink service, plus a special tool to deal with it, instead of simply running the filesystem service itself. And as a last argument: it was too hard to collect statistics about vfs resources.
The second reason is the move from unirpc to the rpcv7 object model, which changes the whole concept of the rpc layer's design and implementation; adopting the old vfs layer would mean completely rewriting all of its rpc-related parts.
The third reason is the bulky design of the old implementation: it grew from a very limited design and was never ready to be extended in a sane way. Implementing advanced features such as async i/o, sockets and other IPC objects was difficult. The old vfs layer was also not optimized by design, because the vfs layer had low priority when the project began. It had many problems: a really slow cache, memory mapping bolted on in an ugly way after the general parts were implemented, and name resolving that was written quickly and is a potential hole for DDoSing the vfs service. Many more bad things could be listed here.
The conclusion: I didn't find any reason to adapt the old vfs layer, so I decided to design and implement a completely new one.

The design revolves around the principles explained below.
Optimization via architectural design, and extensive use of all the advanced features provided by the microkernel. µString has a useful IPC feature - message forwarding. It was implemented after the old vfs architecture was already complete, and using it made dramatic changes to the old vfs implementation possible. VFSv2 (as I will call the old vfs layer implementation) uses a generic IPC scheme, i.e. receive - resolve - reply, where resolve creates a new IPC cycle (send - wait for reply) for each filesystem (only in the case of symlinks, of course). That forced a vfs service thread to work on a single request through resolving, opening and so on - simply put, the vfs thread was blocked by the client request until that request was done.

VFSx (as I will call the new vfs layer from here on) is built on a transaction model: each client request is a transaction with a unique ID. A vfsx thread creates a transaction, resolves the name, appends a special header to the message and forwards it to the filesystem; when the filesystem hands it back, the vfsx thread updates the special header, resolves again and forwards again. Filesystems take care of transactions too: if the request is fully processed, the filesystem commits or declines the transaction and returns data to the client; if resolving should continue (in case of a symlink, for example), the filesystem modifies the special header with the symlink data and forwards the message back to vfsx. This approach keeps vfsx threads from blocking, because transaction operations and resolving are very fast, especially compared to VFSv2.

Another point is double buffering of the client i/o metadata used to communicate with opened resources, which previously required a huge set of syncing operations. VFSx was merged into the domain holder service. This service is a task manager as well as a name resolver; the domain holder keeps all special entries for each task as well as the task tree (µString provides only very basic functionality and knows nothing about terminals and other service-side OS features), and one piece of that per-task information is the task's iolinks. An iolink is an abstraction that connects a client to a resolved resource; to manage iolinks properly the domain holder must know all client connections, so it can submit events to the services holding a client's iolinks (for example, when a client forks you need to clone all of its iolinks). To avoid extra copying and IPC cycles, each task shares pages with the domain holder (done in a secure way); this area is served by both the client and the domain holder.

Finally, this modification makes a failsafe feature easy to implement. VFSx is designed to tolerate filesystem crashes: on such an event the domain holder marks the affected iolinks dead; when the filesystem is restarted, its old entry in the domain holder is marked restored, and if a client then tries to interact with a dead iolink, the iolink is restored and pointed at the recently restarted filesystem instead of the crashed one. Going deeper, vfsx allows an async i/o mode and provides events on vfs objects - something totally absent in VFSv2 (well, excluding the network socket implementation, which was ugly-bolted onto the VFSv2 layer).
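To make the transaction model concrete, here is a minimal, self-contained C sketch of the commit/continue cycle described above. All names and fields are my own illustrations - the real VFSx structures and µString IPC calls aren't public, so message forwarding is simulated with plain function calls:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    enum txn_state { TXN_CONTINUE, TXN_COMMIT, TXN_DECLINE };

    struct txn_hdr {              /* header vfsx appends to the message */
        uint64_t id;              /* unique transaction ID */
        unsigned hops;            /* resolve hops so far (symlink loops) */
        char     path[128];       /* unresolved remainder of the name */
    };

    /* Filesystem side: either finish the transaction or rewrite the
     * path with the symlink target and hand the message back to vfsx. */
    static enum txn_state fs_handle(struct txn_hdr *h)
    {
        if (strcmp(h->path, "/lnk") == 0) {  /* pretend /lnk -> /file */
            strcpy(h->path, "/file");
            return TXN_CONTINUE;             /* forward back to vfsx */
        }
        return TXN_COMMIT;                   /* reply goes to the client */
    }

    /* vfsx side: in the real design each hop is a forwarded message,
     * so no vfsx thread ever blocks waiting for a filesystem reply. */
    static void vfsx_handle(struct txn_hdr *h)
    {
        while (fs_handle(h) == TXN_CONTINUE && ++h->hops < 8)
            ;                                /* re-resolve and forward */
        printf("txn %llu done at %s after %u hops\n",
               (unsigned long long)h->id, h->path, h->hops);
    }

    int main(void)
    {
        struct txn_hdr h = { .id = 1, .hops = 0, .path = "/lnk" };
        vfsx_handle(&h);
        return 0;
    }

The hop counter stands in for whatever loop protection the real resolver uses; without a bound, a symlink cycle would bounce the transaction between vfsx and the filesystem forever.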

Tolerance to µString (it's a microkernel, if you don't know) modifications: yes, I decided to implement a new IPC type that avoids copy_from/copy_to and double/triple buffering. This feature has its roots in the vfsx block backend and the memory mapping implementation. Briefly: the filesystem doesn't send a buffer with data copied from the page cache; instead it just sends an event with information about the page location to the block device service, and the block device service simply "maps" those pages and does all the read/write operations in place. Later (in newer releases) Jari OS will provide additional read/write functions for libc that work as described, extending the POSIX read/write functions. As for mappings, µString will now send all the required events on mmap()/msync()/munmap()/etc.
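A rough sketch of what such a zero-copy event might carry, assuming the message only describes where the page-cache pages live rather than carrying the data itself (field names are illustrative, not the actual Jari OS definitions):

    #include <stdint.h>

    /* Hypothetical zero-copy block i/o event: the block device service
     * maps the referenced pages and performs the transfer in place,
     * so no data is copied through the IPC path. */
    struct blkio_event {
        uint64_t phys_page;   /* physical address of the first page-cache page */
        uint32_t page_count;  /* number of contiguous pages */
        uint64_t dev_offset;  /* byte offset on the block device */
        uint8_t  write;       /* 0 = read into pages, 1 = write from pages */
    };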

Provide a clean and easily customizable API - it allows filesystem ports to be done much faster than on VFSv2. This also affects the well-known vnode cache: in VFSv2 the vnode and dentry caches were united into one, while VFSx keeps a separate dir_entry cache and fnode cache, the way it is done in many systems (Linux, for example).
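For illustration, a minimal sketch of the two-cache split, loosely modeled on Linux's dentry/inode pair; the names and fields are mine, not the real VFSx structures:

    #include <stdint.h>

    struct fnode {                    /* per-file object (like an inode) */
        uint64_t  id;                 /* filesystem-unique file ID */
        uint64_t  size;
        uint32_t  refcnt;
        void     *fs_private;         /* filesystem driver's own data */
    };

    struct dir_entry {                /* name -> fnode binding (like a dentry) */
        char              name[64];
        struct fnode     *node;       /* NULL for a negative entry */
        struct dir_entry *parent;     /* enables fast path walking */
    };

Splitting the caches this way lets name lookups and file-object lifetime be managed independently, which is exactly what made the united cache in VFSv2 hard to extend.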

Overall, VFSx is faster, cleaner and more feature-rich than VFSv2. I can't describe all the changes and implementation specifics here, but I will try to explain more in the next posts and in the documentation.

See you,

Friday, June 20, 2008

AMD64 #1 - Long Mode

Back in the old days, the first tree of the microkernel targeted x86 (32-bit); going further back, the parent of all later projects (ilix) targeted embedded hardware, i.e. ARM and one internal architecture that was 24-bit.
Nowadays we have a publicly accessible and cheap 64-bit architecture: x86 continues, but with a 64-bit extension - though in reality it looks rather different.
AMD64 allows many good extended features compared with the basic 32-bit x86 design. To use AMD64's features to the fullest we must operate in long mode (the AMD64-specific mode).
In truth, long mode is a mixed mode, i.e. it offers a 64-bit mode and a compatibility mode at the same time. It has its own downsides - it forces a flat memory model. But it gives us 64-bit addressing, so we don't need Intel's extension tricks (PAE in 32-bit mode) to address more than 4GB of address space - tricks that are relatively ugly and look like a hack.
As a microkernel developer and a low-level developer at the same time, the first problem I face is initialization.
I've read the AMD64 documentation directly from AMD, but there is nothing there that really helps with this - well, not nothing at all, but if you want to make sense of it quickly, it's not usable.
I've worked out my own recipe for it; simply put, it consists of the following steps (a minimal C sketch follows the list):
32bit code:
  • initialize the stack pointer (depends on your boot method and loader; I'm using GRUB and multiboot)
  • init a bootstrap GDT
  • jump to a 'meet point' within the existing GDT
  • save the parameters from GRUB (you will need them later if you're using GRUB)
  • at this point we've landed in 32-bit protected mode (legacy mode on AMD64)
  • check for various CPU features (this must be done if you want things done the way good kernels do them)
  • check that AMD64 long mode is really supported (you should do it; otherwise you can expect obscure bugs - I don't know exactly why, so verify this yourself)
  • if all is ok, enable 64-bit page translations (per the documentation set: cr4.pae=1)
  • set up the page tables
  • enable long mode (via the EFER register - setting LME to 1)
  • enable paging in long mode (this activates long mode and drops us into compatibility mode)
  • just jump to your 64-bit code
64bit code:
  • do your stuff, have a lot of fun ... ;)
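To make the sequence concrete, here is a minimal C sketch (GCC inline assembly, 32-bit freestanding code) of the long mode check and the switch steps above. The function and macro names are mine; the register and MSR bits are from the AMD64 architecture manuals, and the final far jump through a 64-bit GDT code selector still has to be done in assembly:

    #include <stdint.h>

    #define CR4_PAE   (1u << 5)     /* CR4 bit 5: Physical Address Extension */
    #define CR0_PG    (1u << 31)    /* CR0 bit 31: enable paging */
    #define MSR_EFER  0xC0000080u   /* Extended Feature Enable Register */
    #define EFER_LME  (1u << 8)     /* EFER bit 8: Long Mode Enable */

    /* CPUID leaf 0x80000001, EDX bit 29 indicates long mode support. */
    static int cpu_has_long_mode(void)
    {
        uint32_t eax, ebx, ecx, edx;
        /* First make sure the extended CPUID leaves exist at all. */
        __asm__ volatile("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(0x80000000u));
        if (eax < 0x80000001u)
            return 0;
        __asm__ volatile("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(0x80000001u));
        return (edx >> 29) & 1;     /* LM bit */
    }

    /* Runs in 32-bit protected mode; pml4_phys is the physical address
     * of the prepared top-level page table. After this returns, a far
     * jump into a 64-bit code segment completes the switch. */
    static void enter_long_mode(uint32_t pml4_phys)
    {
        uint32_t cr0, cr4, lo, hi;

        /* 1. cr4.pae = 1: enable 64-bit page translations */
        __asm__ volatile("mov %%cr4, %0" : "=r"(cr4));
        cr4 |= CR4_PAE;
        __asm__ volatile("mov %0, %%cr4" :: "r"(cr4));

        /* 2. point cr3 at the page tables set up earlier */
        __asm__ volatile("mov %0, %%cr3" :: "r"(pml4_phys));

        /* 3. EFER.LME = 1: enable long mode (read-modify-write the MSR) */
        __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(MSR_EFER));
        lo |= EFER_LME;
        __asm__ volatile("wrmsr" :: "c"(MSR_EFER), "a"(lo), "d"(hi));

        /* 4. cr0.pg = 1: turning paging on activates long mode and
         *    drops us into compatibility mode */
        __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
        cr0 |= CR0_PG;
        __asm__ volatile("mov %0, %%cr0" :: "r"(cr0));
    }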
I'm sure there is a more direct way to switch into long mode, but for me it was quicker to do it as I've described.