Thursday, July 26, 2012

New vfs for Jari OS (in general)

Hi all,
After some time spent on testing, prototype implementations, investigation and so on, a new vfs layer was born. It grew out of my last prototype, and there is still a lot to implement (a huge set of small things), but from now on it is fair to say: it is completely new and completely different from the old vfs.
The obvious question is why the project needed a new vfs layer at all, so let me explain.
The first reason is an architectural change that kills the separate vfs service (it moves into the domain holder). The old vfs service kept a search tree and per-task resource storage, and it was a dirty job to keep them in sync with events from the name service and the procmgr service. The code was really small, but the design was not clear and every extension was a headache. The old architecture also required an additional fslink service and a special tool to deal with it, instead of simply running the filesystem service itself. And as the last argument: it was too hard to collect statistics about vfs resources.
The second reason is the move from unirpc to the rpcv7 object model, which changes the whole concept of the rpc layer design and implementation; adopting the old vfs layer would mean completely rewriting all of its rpc-related parts anyway.
The third reason is the bulky design of the old implementation: it grew out of a very limited design and was not ready to be extended in a sane way. Implementing advanced things such as async i/o, sockets and other IPC objects was difficult. The old vfs layer was also not optimized by design, because the vfs layer had a low priority when the project began. It has many problems: a really slow cache, memory mapping bolted on in an ugly way after the general parts were implemented, and name resolving written in a hurry, which is a potential denial-of-service hole in the vfs service. Many more bad things could be listed here.
Conclusion: I didn't find any reason to adapt the old vfs layer, so I decided to design and implement a completely new one.

The design revolves around the principles explained below.
Optimization via architecture design and extensive use of all the advanced features provided by the microkernel. µString has a very useful IPC feature: message forwarding. It was implemented after the old vfs architecture was already finished, and using it makes a dramatic change compared to the old vfs implementation. VFSv2 (this is how I will refer to the old vfs layer) uses a generic IPC scheme, i.e. receive - resolve - reply, where resolving starts a new send - wait for reply cycle for each filesystem (in the case of symlinks, of course). That made a vfs service thread work on a single request while resolving, opening and so on; simply put, the vfs thread was blocked by the client request until that request was done. VFSx (this is how I will refer to the new vfs layer from here on) is built on a transaction model: each client request is a transaction with a unique ID. A vfsx thread creates the transaction, resolves the name, appends a special header to the message and forwards it to the filesystem; if the filesystem returns the request, the vfsx thread updates the special header, resolves again and forwards again. Filesystems take care of transactions too: if the request has been processed, the filesystem commits or declines the transaction and returns the data to the client; if resolving should continue (in the case of a symlink, for example), the filesystem rewrites the special header with the symlink data and forwards the message back to vfsx. This approach keeps vfsx threads from blocking, because the transaction operations and resolving are very fast, especially compared to VFSv2. A toy sketch of this flow follows at the end of this section.

Another point is double buffering of the client i/o metadata used to communicate with opened resources, and the huge set of syncing operations. VFSx is included in the domain holder service; this service is the task manager as well as the name resolver. The domain holder keeps all the special entries for each task as well as the task tree (µString provides only very basic functionality and doesn't know anything about terminals and other service-side OS features), and one part of that information is the task's iolinks. An iolink is an abstraction representing a client connection to a resolved resource; to keep it consistent, the domain holder must know all client connections, i.e. all iolinks, so that it can submit events to the services holding a client's iolinks (for example, when a client forks you need to clone all of its iolinks). To avoid additional copying and extra IPC cycles, each task shares pages with the domain holder (this is done in a secure way); this area is served by the client and by the domain holder. Finally, this modification makes it easy to implement a failsafe feature: VFSx is designed to tolerate a filesystem crash. In that case the domain holder marks the iolinks dead; when the filesystem has been restarted, the old filesystem entry in the domain holder is marked restored, and if a client tries to interact with a dead iolink, it is restored and pointed at the recently restarted filesystem instead of the old crashed one. Going deeper, vfsx allows an async i/o mode and provides events on vfs objects, which was totally absent in vfsv2 (yep, excluding the network socket implementation, which was bolted onto the vfsv2 layer in an ugly way).
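To make the transaction flow more concrete, here is a tiny self-contained toy in C. Everything in it - the header layout, the field names, the in-process "forwarding" loop - is a hypothetical illustration of the commit-or-bounce idea, not the real VFSx or µString API.

/* Toy simulation of the VFSx transaction model; all names and the
 * header layout are hypothetical, for illustration only.          */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum txn_state { TXN_RESOLVE, TXN_FORWARDED, TXN_DONE };

struct txn_hdr {
    uint64_t id;         /* unique per client request           */
    int      state;      /* enum txn_state                      */
    int      depth;      /* symlink hops, guards against loops  */
    char     path[128];  /* remaining path to resolve           */
};

/* Stand-in for a filesystem: "/data" is a symlink to "/disk0". */
static int fs_handle(struct txn_hdr *t)
{
    if (strncmp(t->path, "/data", 5) == 0 && t->depth < 8) {
        /* Symlink hit: rewrite the path in the header and bounce
         * the message back to vfsx for another resolve pass.     */
        char rewritten[sizeof t->path];
        snprintf(rewritten, sizeof rewritten, "/disk0%s", t->path + 5);
        snprintf(t->path, sizeof t->path, "%s", rewritten);
        t->depth++;
        t->state = TXN_RESOLVE;
        return 0;                 /* not finished yet            */
    }
    t->state = TXN_DONE;          /* commit and reply to client  */
    return 1;
}

/* Stand-in for a vfsx thread: stamp the transaction, resolve and
 * forward; it never blocks waiting for the filesystem.           */
static void vfsx_handle(struct txn_hdr *t)
{
    static uint64_t next_id = 1;
    if (t->id == 0)
        t->id = next_id++;        /* fresh client request        */
    t->state = TXN_FORWARDED;
    printf("txn %llu: forward \"%s\" (depth %d)\n",
           (unsigned long long)t->id, t->path, t->depth);
}

int main(void)
{
    struct txn_hdr t = { .path = "/data/readme.txt" };
    do {
        vfsx_handle(&t);          /* resolve + forward            */
    } while (!fs_handle(&t));     /* fs commits or bounces back   */
    printf("txn %llu: committed at \"%s\"\n",
           (unsigned long long)t.id, t.path);
    return 0;
}

In the real system the do/while loop is replaced by µString message forwarding, so the vfsx thread goes back to its request queue right after forwarding instead of waiting for the filesystem to answer.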

Tolerance for µString modifications (it's a microkernel, if you don't know): yes, I decided to implement a new IPC type that avoids copy_from/copy_to and double/triple buffering. This feature has its roots in the vfsx block backend and the memory mapping implementation. Briefly, the filesystem does not send a buffer with data copied from the page cache; instead it just sends an event with information about the page location to the block device service, and the block device service simply "maps" that page and does all the read/write operations on it. Later (in newer releases) Jari OS will provide additional read/write functions in libc that work as described, extending the POSIX read/write functions. As for mappings, µString will now send all the required events on mmap()/msync()/munmap()/etc. A rough sketch of such a page-location event follows below.
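To illustrate the direction (not the real interface), here is a toy sketch of what such a page-location event might carry; the struct name and all of its fields are my own assumptions. The point is only that the event describes where the page-cache page lives, so the block device service can map it and transfer the data without an intermediate copy.

/* Hypothetical page-location event; illustrative only, not the
 * real µString/Jari OS interface.                               */
#include <stdint.h>
#include <stdio.h>

struct blk_io_event {
    uint64_t frame;      /* physical frame of the page-cache page */
    uint64_t lba;        /* starting block on the device          */
    uint32_t nblocks;    /* how many blocks to transfer           */
    uint32_t write;      /* 0 = read into the page, 1 = write out */
};

/* Stand-in for the block device service: in the real design it
 * would map `frame` into its own address space and run the I/O
 * against it directly, with no buffer copied from the filesystem. */
static void blk_service_handle(const struct blk_io_event *ev)
{
    printf("%s %u block(s) at lba %llu %s page frame %#llx (no copy)\n",
           ev->write ? "write" : "read",
           (unsigned)ev->nblocks,
           (unsigned long long)ev->lba,
           ev->write ? "from" : "into",
           (unsigned long long)ev->frame);
}

int main(void)
{
    /* Filesystem side: describe the page instead of copying it. */
    struct blk_io_event ev = { .frame = 0x1f3000, .lba = 2048,
                               .nblocks = 8, .write = 0 };
    blk_service_handle(&ev);
    return 0;
}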

Provide a clean and easily customizable API - it allows a filesystem port to be done much faster than on vfsv2. This also affects the well-known vnode cache: in vfsv2 the vnode and dentry caches were united together, while vfsx has a separate dir_entry cache and fnode cache, the way it is implemented in many systems (Linux, for example). A toy sketch of the split caches follows below.
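As a rough illustration of that split (the field names and layout are my own guesses, not the real VFSx structures), the two caches could look something like this:

/* Illustrative sketch of separate dir_entry and fnode caches,
 * similar in spirit to Linux's dentry and inode caches; every
 * name and field here is hypothetical.                          */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct fnode {
    uint64_t fnode_id;   /* filesystem-wide object id            */
    uint32_t fs_id;      /* which filesystem service owns it     */
    uint32_t refcount;
    uint64_t size;
    /* ... attributes, locks, links into the fnode cache ...     */
};

struct dir_entry {
    struct dir_entry *parent;    /* parent directory entry       */
    const char       *name;      /* path component               */
    struct fnode     *fnode;     /* NULL would mean "negative"   */
    struct dir_entry *hash_next; /* bucket chain in the cache    */
};

/* Toy lookup over one hash bucket: a hit answers the component
 * from the cache, a miss means vfsx has to ask the owning
 * filesystem service and insert the result.                     */
static struct dir_entry *de_lookup(struct dir_entry *bucket,
                                   const struct dir_entry *parent,
                                   const char *name)
{
    for (struct dir_entry *de = bucket; de != NULL; de = de->hash_next)
        if (de->parent == parent && strcmp(de->name, name) == 0)
            return de;
    return NULL;
}

int main(void)
{
    struct fnode f = { .fnode_id = 42, .fs_id = 1, .refcount = 1, .size = 4096 };
    struct dir_entry root = { .parent = NULL,  .name = "/",   .fnode = &f };
    struct dir_entry etc  = { .parent = &root, .name = "etc", .fnode = &f };
    /* One-entry, one-bucket "cache", just to exercise the walk.  */
    return de_lookup(&etc, &root, "etc") ? 0 : 1;
}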

Overall, vfsx is faster, cleaner and more feature-rich than vfsv2. I cannot describe all the changes and implementation specifics here, but I will try to explain more in the next posts and in the documentation.

See you,

2 comments:

Admin said...

Hi Alfeiks,
Long time no see!
Good luck with your project!

Supervisor said...

Thank you!