Deeper than deep: Jari OS


Friday, September 14, 2012

A small step in documenting is a major step in the fight against my laziness

Hello all,
Well, today I forgot some details of libvfs and spent about 20 minutes refreshing them in my head.
Conclusion: documenting has started. Check it out here - libvfs documentation embryos

See you,

Thursday, July 26, 2012

New vfs for Jari OS (in general)

Hi all,
After spending some time on testing, prototype implementation, investigation and so on, a new vfs layer was born. It actually grows out of my last prototype, and there is still a lot to implement (a huge set of small things). But it is already possible to say that it is completely new and completely different from the old vfs.
The question is why the project needed a new vfs layer. Let me explain.
The first reason is the change in architecture design that kills the separate vfs service (it moves into the domain holder). The vfs service had a search tree and task resource storage, and it was a dirty job to keep them in sync with name service and procmgr service events. The service was really small, but its design was unclear and each extension was a headache. Also, the old architecture required an additional fslink service and a special tool to deal with it, instead of just running the filesystem service itself. As the last argument, it was too hard to collect stats about vfs resources.
The second reason is the move from unirpc to the rpcv7 object model, which changes the whole concept of the rpc layer design and implementation, i.e. to adopt the old vfs layer you would need to completely rewrite all rpc related parts.
The third reason is the bulky design of the old implementation. It grew out of a very limited design and wasn't ready to be extended in a normal way. I mean that implementing extended things such as async i/o, sockets and other IPC objects was difficult. Also, the old vfs layer was not optimized by design, because the vfs layer had a low priority when the project began. There are many problems with it: a really slow cache, memory mapping that was bolted on in an ugly way after the general parts were implemented, and name resolving that was done quickly and is a potential hole for a DDoS attack on the vfs service. Well, many more bad things could be listed here.
Conclusion - I didn't find any reason to adopt the old vfs layer; instead I decided to design and implement a completely new one.

The design follows the principles explained below.
Optimization via architecture design and extensive use of all the advanced features provided by the microkernel. µString has a good IPC feature - message forwarding. It was implemented when the old vfs architecture design was already complete, and using it would have required dramatic changes in the old vfs implementation. VFSv2 (this is how I will call the old vfs layer implementation) uses a generic IPC scheme, i.e. receive - resolve - reply, where resolve creates a new IPC cycle of send - wait for reply for each filesystem (sure, in the case of symlinks only). That made a vfs service thread work with one request while resolving, opening and so on. Simply said, the vfs thread was blocked by the client request until this request was done. VFSx (I will call the new vfs layer VFSx from here on) is built on a transaction model, which means each client request is a transaction with a unique ID. A vfsx thread creates a transaction, resolves the name, appends a special header to the message and forwards it to the file system. If the file system returns the message, the vfsx thread updates the special header, resolves again and forwards again. The file system takes care of transactions too: if the request is processed, the filesystem commits or declines the transaction and returns data to the client; if the resolve should continue (in the case of a symlink, for example), the filesystem modifies the special header with the symlink data and forwards the message back to vfsx. This approach keeps vfsx threads from blocking, because transaction operations and resolving are very fast, especially compared to vfsv2. Another point is the double buffering of client i/o metadata needed to communicate with opened resources, and the huge set of syncing operations.
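The transaction flow above can be sketched in a few lines of C. This is only an illustration: every name here (txn_hdr, fs_handle, vfsx_request, the toy paths) is hypothetical and not taken from the Jari OS sources, and message forwarding is simulated with plain function calls instead of µString IPC.

```c
#include <assert.h>
#include <string.h>

#define PATH_MAX_LEN 64
#define MAX_HOPS 8   /* hypothetical guard against symlink loops */

enum txn_state { TXN_OPEN, TXN_COMMITTED, TXN_DECLINED };

/* Hypothetical "special header" appended to each forwarded request. */
struct txn_hdr {
    unsigned id;               /* unique transaction ID */
    enum txn_state state;
    int hops;                  /* how many times vfsx resolved and forwarded */
    char path[PATH_MAX_LEN];   /* name that still has to be resolved */
};

enum fs_verdict { FS_DONE, FS_CONTINUE };

/* Toy file system side: "/link" is a symlink to "/data/file",
 * anything under "/data/" exists, everything else is missing. */
static enum fs_verdict fs_handle(struct txn_hdr *h)
{
    if (strcmp(h->path, "/link") == 0) {
        /* Symlink: put the target into the header, hand it back to vfsx. */
        strcpy(h->path, "/data/file");
        return FS_CONTINUE;
    }
    /* Request processed: commit or decline the transaction. */
    h->state = (strncmp(h->path, "/data/", 6) == 0) ? TXN_COMMITTED
                                                    : TXN_DECLINED;
    return FS_DONE;
}

static unsigned next_txn_id = 1;

/* vfsx side: create a transaction, then resolve and forward until the
 * file system finishes it. Each loop iteration stands for one forwarded
 * message; a real vfsx thread would not wait here between iterations. */
static struct txn_hdr vfsx_request(const char *path)
{
    struct txn_hdr h;

    h.id = next_txn_id++;
    h.state = TXN_OPEN;
    h.hops = 0;
    strncpy(h.path, path, PATH_MAX_LEN - 1);
    h.path[PATH_MAX_LEN - 1] = '\0';

    while (h.state == TXN_OPEN) {
        if (++h.hops > MAX_HOPS) {
            h.state = TXN_DECLINED;
            break;
        }
        if (fs_handle(&h) == FS_DONE)
            break;
    }
    return h;
}
```

Calling vfsx_request("/link") in this toy setup takes two hops: the first forward hits the symlink, the second one is committed for "/data/file".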
VFSx is included in the domain holder service. This service is a task manager as well as a name resolver. The domain holder contains all special entries for each task as well as the task tree (µString provides very basic functionality and doesn't know anything about terminals and other service-side OS implementations), and one piece of this information is the task's iolinks. An iolink is an abstraction that represents a client connection to a resolved resource. To maintain it properly, the domain holder must know all client connections, i.e. iolinks, in order to submit events to the services that hold client iolinks (for example, when a client forks you need to clone all of its iolinks). To avoid additional copying and IPC cycles, each task has pages shared with the domain holder (done in a secure way); this area is served by both the client and the domain holder. Finally, this modification makes it easy to implement a failsafe feature. VFSx is designed to tolerate a file system crash: in this case the domain holder marks the iolinks dead; when the filesystem is rebooted, the old file system entry on the domain holder is marked restored, and if a client tries to interact with an iolink, it will be restored and pointed to the freshly rebooted filesystem instead of the old crashed one. Going deeper, vfsx allows an async i/o mode and provides events on vfs objects, which were totally absent in vfsv2 (yep, excluding the network socket implementation that was bolted onto the vfsv2 layer in an ugly way).
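A minimal sketch of the failsafe idea in C, assuming a lazy restore on the next client interaction. The structures and function names (fs_entry, iolink, fs_crashed, fs_rebooted, iolink_use) are hypothetical and only illustrate the state changes described above.

```c
#include <assert.h>
#include <stddef.h>

enum iolink_state { IOLINK_ALIVE, IOLINK_DEAD };

/* Hypothetical file system entry kept by the domain holder. */
struct fs_entry {
    int server_id;   /* ID of the currently running fs instance */
    int restored;    /* set once a rebooted instance has taken over */
};

/* Hypothetical client connection to a resolved resource. */
struct iolink {
    struct fs_entry *fs;
    int server_id;   /* instance this link currently points to */
    enum iolink_state state;
};

/* File system crashed: mark every iolink that belongs to it as dead. */
static void fs_crashed(struct fs_entry *fs, struct iolink *links, size_t n)
{
    size_t i;

    fs->restored = 0;
    for (i = 0; i < n; i++)
        if (links[i].fs == fs)
            links[i].state = IOLINK_DEAD;
}

/* File system rebooted: the old entry is marked restored. */
static void fs_rebooted(struct fs_entry *fs, int new_server_id)
{
    fs->server_id = new_server_id;
    fs->restored = 1;
}

/* Called when a client touches an iolink. A dead link is re-pointed to
 * the rebooted instance if there is one. Returns 0 if usable, -1 if not. */
static int iolink_use(struct iolink *l)
{
    if (l->state == IOLINK_DEAD) {
        if (!l->fs->restored)
            return -1;
        l->server_id = l->fs->server_id;
        l->state = IOLINK_ALIVE;
    }
    return 0;
}
```

In this sketch only the links that are actually used after the reboot get restored; untouched links simply stay marked dead.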

Tolerance to µString modifications (it's a microkernel, if you don't know): yes, I decided to implement a new IPC type that avoids copy_from/copy_to and double/triple buffering. This feature has its roots in the vfsx block backend and memory mapping implementation. Briefly, the filesystem doesn't send a buffer with data copied from the page cache; instead it just sends an event with information about the page location to the block device service, and the block device service just "maps" it and does all read/write operations. Later (in newer releases) Jari OS will provide additional read/write functions in libc that work as described, to extend the POSIX read/write functions. As for mappings, µString will now send all required events on mmap()/msync()/munmap()/etc.

Provide a clean and easily customizable API - it will make porting filesystems much faster than on vfsv2. This also affects the old well-known vnode cache: in vfsv2 the vnode and dentry caches were united together, while vfsx has a dir_entry cache and an fnode cache, the way it is implemented in many systems (Linux, for example).

Overall, vfsx is faster, cleaner and more feature-rich than vfsv2. I cannot describe all the changes and implementation specifics here. However, I will try to explain more in the next posts and in the documentation.

See you,

Wednesday, June 20, 2012

Is Jari OS progress stalled?

Generally not. But I haven't done the planned things yet.
Well, I will try to explain why.
First reason - I spent about one year on development, research, testing and investigation of an IDL design that might be used system-wide. It was the wrong way; however, Jari OS has an IDL now.
Second reason - developers lost interest. I tried to find new mad ones, but without success.
Third reason - I spent a lot of time on other activities, such as non-IT hobbies, a contract job at Nokia and other things.
So there was a break in Jari OS development; I needed to take a long vacation from my position in Jari OS.
Now it's time to work hard on the project. Over the past two years a lot of good ideas have accumulated, and it's time to implement many of them.
Also, a few applications have been found that could be powered by the project.
Well, see you later.

Tuesday, October 18, 2011

Something more: about general system high-level design

Well, I want to repeat something again, but with more words.

Today I want to introduce the intercommunication design within the
whole system.
Firstly, I want to say that IDL is a good idea in theory and a bad idea
in practice.
The main reason is its implementation. An IDL usually describes interfaces and,
in addition, generates some code that packs the request, receives it, calls the function and, finally,
replies. I.e. receive, call and reply to the message, nothing more.
But in our system we have something more complex. Let's list it:

IPC message forwarding
Postponed calls (blocking operations)
IPC message modification operations

Well, at that point you need something more featured than a receive - call - reply
cycle: you need to determine when to forward a message, at which
point to do it, which message parts are immutable and which parts
should be modified and/or cut off.

The other "funny" thing is postponed messages. In this case the IDL-generated code
must contain a lot of extra stuff, like quick memory allocation.
Finally, the generated code becomes huge, and in addition you need to modify
it by hand in some cases.

Another problem is with the interfaces themselves; I will try to describe it briefly.
For each instance you have an interface with a predefined set of functions.
Many functions are similar or identical: for example, a device has a read() function
and a file has a read() function, but you have one set of interfaces for file systems
and another set of interfaces for devices. That means you need to develop some
generic interface and extend it every time with your own functions, again and again.
In practice you will have a big set of libraries with interfaces, and it will be a real headache.

From another point of view, you can solve this problem on the libc client side:
just determine the resource and call a specific function, i.e. POSIX read() in our case
will call read_file() or read_device() depending on the opened resource.
I think this last idea is very, very ugly. I don't think it's a good idea to change libc
every time you add something new to the system. Also, we should be modular:
in some cases I may turn off support for pipes or sockets, and I don't want to recompile
libc and all the other system parts for it.

That's why IDL sucks: I spent a lot of time solving problems with nontrivial things like
advanced message control, and I don't see the sense in it. But the old one, uni_rpc_t, sucks too;
in practice uni_rpc_t creates more problems than it should solve.

And I found a pretty solution by approaching the problem from another side.
Each time you design RPC/IDL/something_else_... you should think more closely
about the system and its objects. My error was in the way I tried to solve this problem: I thought
about RPC/IDL/etc. only, but you need to think about the system too.
What I mean: the system can be presented as a set of objects hosted on different servers and
interacting with each other, and in this case there is no difference between
a file and a task, i.e. a file, task, device or ipc object is a node with a predefined set of functions.
Well, yes, a task and a file have different operations, but all possible operations can
be represented via one set.
In any case you will have very similar operations. For example, changing a file's owner uid and changing
a task's effective uid are very similar; on the RPC level there is no difference between them.
What I decided: to represent all objects (or their high-level representations) as nodes.
Each node has 2 groups of operations: the first one to control the node and manage its attributes,
the second one for data i/o (yep, a task will not have a data i/o interface, but we can add it if needed).
In that case RPC becomes simple: you have an RPC signature in every message,
and this signature points to the following things:

Function group
Function within group
IDs to select right node for operation

Everything else, after the RPC signature, goes to the implementation.
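To make the three signature fields concrete, here is a minimal C sketch of a node table with a signature-based dispatcher. The layout and names (rpc_sig_sketch, node, rpc_dispatch, op_set_uid) are assumptions made for illustration; they are not the real Jari OS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The two operation groups: node control and data i/o. */
enum fn_group { GRP_CONTROL = 0, GRP_DATA = 1, GRP_COUNT };

/* Hypothetical signature layout carried in every message. */
struct rpc_sig_sketch {
    uint8_t  group;     /* function group */
    uint8_t  func;      /* function within the group */
    uint32_t node_id;   /* ID that selects the node for the operation */
};

struct node;
typedef int (*node_op)(struct node *n, void *payload);

#define OPS_PER_GROUP 4

/* A file, task, device or ipc object looks the same at this level. */
struct node {
    uint32_t id;
    uint32_t uid;
    node_op  ops[GRP_COUNT][OPS_PER_GROUP];   /* NULL: not supported */
};

/* One control operation shared by "file" and "task" nodes:
 * changing an owner uid and an effective uid look identical here. */
static int op_set_uid(struct node *n, void *payload)
{
    n->uid = *(uint32_t *)payload;
    return 0;
}

/* Find the node by ID and call the operation the signature points to.
 * Returns the operation result, or -1 if nothing matches. */
static int rpc_dispatch(struct node *nodes, size_t count,
                        const struct rpc_sig_sketch *sig, void *payload)
{
    size_t i;

    if (sig->group >= GRP_COUNT || sig->func >= OPS_PER_GROUP)
        return -1;
    for (i = 0; i < count; i++) {
        if (nodes[i].id == sig->node_id) {
            node_op op = nodes[i].ops[sig->group][sig->func];
            return op ? op(&nodes[i], payload) : -1;
        }
    }
    return -1;
}
```

A node that lacks an operation just leaves the slot empty, so the dispatcher rejects the request without any per-type interface code.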

How are the ipcbox and sbuf abstractions used?
It's a good question with a brief and simple answer: ipcbox is used by the
rpc library routines to operate with IPC, and sbuf is used for data. The implementation
gets the sbuf as its data; the rpc code doesn't allocate anything for the implementation,
just the sbuf.
Why is it pretty?
Because it's simple, and this solution doesn't require implementing a huge set of
APIs for each task, i.e. good old getvfspid(), getsomeothercrap(), bla bla bla.
Each task has a special system-reserved iolink, and depending on the task role (file system,
regular task, translator, resource carrier, etc.) its node has a set of operations.
For example, to make a fork() you just need to make a control request via the system
iolink to your node with fork() related data; to change your effective uid you just
send a stat request to your node representation. If you are a regular task (i.e. you don't have the right
to link a file system onto the namespace tree), your node doesn't have an operation of that kind;
otherwise you will be able to make a special call to your system iolink to do it.
Is it simple? I guess it's a very simple solution.

Friday, October 07, 2011

Jari OS RPCv7 and Vnode

I wrote before why IDL failed. And I know that the old uni_rpc_t RPC has failed too: it's inflexible, ugly and slow.

I have been working on this system for about 5-6 years, and there have been 6 attempts to create a good IPC/RPC chain:

- v1 create a lot of structures and calls and implement it on the microkernel side

- v2 create one generic RPC in microkernel side

- v3 remove it from microkernel

- v4/v5 uni_rpc_t and related, try to assign all operation to the standard POSIX calls

- v6 IDL

None of these approaches worked.

I guess it's all about the way of solving the problem. Each time I tried to solve one problem, not the whole set of them.

But let's get back to my favorite method of problem solving: divide and solve.

When you have one big problem, it's better to divide it into many little ones and solve them one by one, and finally your big problem will be solved. Simple, isn't it?

And now we have a system with many objects and many problems, but ... take a guess: all operations within this complex system are requests to objects.

In that case, you just need to create one generic object that can represent anything you want, and provide a generic interface to this object.

Well, welcome to RPCv7.

Usually you can divide all your operations into several groups: to get data, to control the object and so on. Also, in Jari OS you need to have a session with this object. That's all.

All your requests can be described with one generic header. Not the data of the requests, not some specific flags like it was done in the v4/v5 versions: the header describes the request only.

Checkout this http://jarios.org/gitweb/?p=jari_os.git;a=blob;f=libs/uclibc/libc/sysdeps/jarios/common/sys/rpcv7.h;h=578b978d69fd2d2b6d0f0b990b231114d1058dc2;hb=HEAD

struct rpc_signature.

Everything else is not RPCv7 stuff; it might be your data, headers and so on.

That was the first small problem solved. The second one is object representation.

And I got it: vnode. All things can be represented as a vnode. It can have a memory mapping, a set of many specific operations, many IDs, relations ... and RPC callbacks, the standard set of them.

Here you can look at them: http://jarios.org/gitweb/?p=jari_os.git;a=blob;f=libs/uclibc/libc/sysdeps/jarios/common/sys/vnode.h;h=f466d9b1e8542ab7e4e54dff8ada5a24890c0761;hb=HEAD

All you need is to implement these functions and forget about the IPC/RPC headache.

Here is an example of fork() - http://jarios.org/gitweb/?p=jari_os.git;a=blob;f=libs/uclibc/libc/sysdeps/jarios/common/fork.c;h=b18fd40e95f8a1016736cf83aada8689e9432375;hb=HEAD

And the carrier function:

http://jarios.org/gitweb/?p=jari_os.git;a=blob;f=services/nscarrier/vnode_task.c;h=006d0f7573b7771b7c140e1262c2f39d16c5a671;hb=HEAD

All data goes via sbuf; just forget about the IPC/RPC chain.

Well, I will write more about this implementation later, but you can always check out the fresh sources.

Tuesday, December 08, 2009

The venture should always have a logical end.

Recent months have been a complete disappointment in my life. My life is not that interesting, but it does feed back into my projects.
I'm a little upset, to be honest. I feel defeated by the project development, and mostly by its goals.
I'm afraid that it may not be possible to complete Jari OS as a platform, but the hope remains.
I looked at the project from the other side, from the side of an engineer who needs to choose a base platform for a final solution. Jari OS isn't ready, and will not be ready for several years (I don't expect good news), and it's very sad. Anyway, I'm learning again, and I see a couple of real problems in the design architecture, management and so on. I know that a lot has been written and works now. I cannot say that the project failed in its basic concepts or for other deeper reasons, and I cannot say that the project is successful - this venture hasn't got a logical end, but it should have one.
Well, I decided to reorganize my work on the project and try to give new life to Jari OS as soon as possible.
First, I will write documentation about the platform, fix the design and so on.
Second, I will try to take a wide view of the future requirements.

When Jari OS is finished (it might fail, it might not), this will be the logical end of the venture; if it has a continuation, it will become a serious project.

Tuesday, November 03, 2009

Interest and finance always diverge.

I have always tried to combine working on an interesting project (even better, my own) with making a living from it.
I was never interested in large amounts of money.
I can say that it worked out several times, with varying degrees of success. And I can draw some conclusions from these cases, which may be useful in the future.
First, if you managed to establish your own small company with money from a sponsoring organization, you should not relax. The financing institution may devour you together with your project. In such cases it makes sense to have several interested organizations with different equity stakes.
In general, your own company is a separate conversation. In most cases you will not be able to form a small company and then develop it further. I'm not a businessman, and therefore can't discuss other ways of creating a company. From the viewpoint of an engineer it is better to go a different way. For example, you already have a project that is being developed. It's no secret that developing a project is a rather resource-intensive exercise, and intensive development without funding is unlikely. In this case it's necessary to find a company that is interested in this kind of project. If such a company is found, you get hired and assemble a team of specialists. But you shouldn't think that life is a success at this point: you will have little time and the insatiable wishes of management, and these wishes will tend to grow.
Keeping the project closed makes no sense in this case - your exciting project will turn into just another ugly product, and work on it will become the usual routine. License your project under a free software license (GNU GPL/GNU LGPL); in most cases companies sell a product, not software. Also, take existing free and open source code into your project - no one is interested in developing everything from scratch.
But don't forget that this symbiosis is not eternal. Ultimately, when your design leaves the research department, the company will not expand the work on your project; instead, your work will be directed to the specific requirements of company management. At this stage it is necessary to separate the project from the company and develop it in an open community of developers. It follows that the work on the project should be open, and wider in scope than your employers require.
Well, Jari OS is at this stage now, and I hope that all work on this project will be helpful for the free software community. Also, I will continue working on the project with a smaller team.
In the case of Jari OS, I made several mistakes: a small open community and undocumented internals.

Tuesday, September 01, 2009

Second stage of Jari OS development is one year old!

Hi readers, I cannot keep silent about it. Here is the full story - http://jarios.org/node/35.

Thanks.

Wednesday, August 19, 2009

Vnode cache design

I found a development quest in Jari OS. It all begins with the design of device driver services: some drivers are actually libraries that are linked to a service supporting some piece of hardware. In this case driver developers need the dlopen() family of functions to link these libraries at runtime; for example, if the block service finds a specific host, its driver should be linked at runtime, not statically. Well, the OS has no dlopen-family functions. Moreover, dynamically linked ELF support is at a very early stage (it works, but slowly, and takes a lot of the service's time slice). The reason is poor file mapping support: you can map a file, but when you close the file descriptor you will get a segfault on a page fault in the mapped area. This is a complex problem. To support dlopen(), ELF support should be improved, i.e. work with libraries and binaries should move out of the process manager service and should not be done via read/write calls - one of the benefits will be dlopen() support.
I know that one of the weak places of the file subsystem layer is libv2 (it's a VFS-like layer, but implemented as a library that is linked to each file system; don't confuse it with the VFS service, which is a filesystem manager, resource storage, redirection mechanism and so on). Libv2 already supports many POSIX features and operates via an abstract vnode_t (in Linux it's called an inode), but its vnode cache is too poor: every mapping is connected to a vnode, and if the cache decides to destroy the vnode, the mapping can be lost. One solution is map count handling, but it's very simple; a more complete solution is to create a more sophisticated cache.
Assume a file system under high load: you always perform lookups, maps, read and write operations. It is possible that a vnode has no positive map or open counters for a short amount of time, and it is also possible to have rarely used vnodes, i.e. some vnode may be mapped at some point but not accessed for a day - it wastes a data structure and memory, and you cannot delete it at all. On the other side, you cannot use the vnode metadata connected to the real file system node, because lookup calls don't modify its access time. Well, what could the solution be? First, I defined several vnode stages, each with its own features; second, I designed a table for vnodes that are mapped but not accessed - such a vnode will not take as many resources as a regular vnode, but it will be virtually present.
The vnode stages might be:

strong vnode: not a candidate to move to the virtual vnode list; has positive open and mapped counts and was recently used
potential oldies vnode: has only the mapped count or only the opened count positive and wasn't recently used
oldies vnode: a vnode in the virtual vnodes list
died vnode: a vnode that was removed via an unlink call

At a defined period some thread will scan the cache and decide what to do with each vnode; this rate will be calculated from the file system activity (calls per last period). But what about died vnodes? Assume that you mapped a file, somebody removed it, and the vnode id was taken by a root-only file ... well, the died vnodes list is a low-cost list that stores vnodes of this type. When you try to read or write to such a vnode, or unmap it, or get a page fault, the information will be updated: you will get an error (or a segfault), the file system will decrement the counters, and when these counters reach zero the died vnode will go to heaven and free its id ;)
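The stages and the died-vnode rule can be sketched in C as follows. This is a rough sketch under stated assumptions: the field names, the RECENT_TICKS threshold and the rule that everything which is not strong counts as a potential oldie are mine, not part of libv2.

```c
#include <assert.h>

enum vnode_stage { VN_STRONG, VN_POTENTIAL_OLDIE, VN_OLDIE, VN_DIED };

/* Hypothetical bookkeeping the periodic cache thread would look at. */
struct vnode_info {
    unsigned      open_count;
    unsigned      map_count;
    unsigned long last_used;        /* tick of the last access */
    int           unlinked;         /* removed via unlink */
    int           in_virtual_list;  /* already moved to the virtual list */
};

#define RECENT_TICKS 100UL   /* assumed "recently used" threshold */

/* Decide which stage a vnode is in at time `now` (now >= last_used). */
static enum vnode_stage vnode_classify(const struct vnode_info *v,
                                       unsigned long now)
{
    int recent = (now - v->last_used) <= RECENT_TICKS;

    if (v->unlinked)
        return VN_DIED;
    if (v->in_virtual_list)
        return VN_OLDIE;
    if (v->open_count > 0 && v->map_count > 0 && recent)
        return VN_STRONG;
    return VN_POTENTIAL_OLDIE;
}

/* A client touched a died vnode (read, write, unmap or page fault) and
 * got an error: drop its reference. Returns 1 once both counters are
 * zero, i.e. the vnode can be destroyed and its id freed. */
static int died_vnode_put(struct vnode_info *v, int was_mapping)
{
    if (was_mapping) {
        if (v->map_count > 0)
            v->map_count--;
    } else {
        if (v->open_count > 0)
            v->open_count--;
    }
    return v->open_count == 0 && v->map_count == 0;
}
```

The scanning thread would call vnode_classify() for each cached vnode on every pass and move vnodes between lists according to the result.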
I'm not sure that I will implement all the features in the near future, but anyway this functionality will be added to libv2, and I will continue my quest: I will improve the page cache, improve the memory events protocol, and design and implement a linker with dlopen ;)

Saturday, May 09, 2009

SDK tools for OS development

New ideas came to my head, all about operating system features and development.

Last Friday I decided to make some modifications to the autotools stuff in GNU sed and try to compile it with the Jari OS libc headers and functions. Yep - it works! This means that our libc is ready for most cases, and many of them work now. This event gave me some new ideas about development tools for Jari OS: firstly, I will port GNU gcc to Jari OS (I mean that gcc will handle our ld scripts and libraries correctly, without many flags and other things); secondly, I will construct an SDK template (already done, but not published yet).
Also, to have better support and a big community, an additional feature is needed - an IDL integrated into the SDK. It means that you have a set of templates for each type of service, i.e. block device driver library, char device driver, file system and so on ...
In most cases you shouldn't need to know about specific internals of the system libraries, and you shouldn't care about interface changes (header renaming, new functions, a new order of calls and so on). This feature will generate library-specific code at compile time and link it with your specific implementation. I.e. what is written once will live forever ;)
For this feature it's better to use Scheme (for templates, configs and other stuff), and some core of it will be written traditionally in C.

Wednesday, April 22, 2009

Jari OS is going to be a complete OS

My last task in the Jari OS project was symlink support; now I'm working on postponed logic in libv2 (the general library for file system services).
I can hardly keep track of the changes ;) The project now has POSIX support, fork(), execve(), file systems and initial networking support.
In the background I'm porting GNU coreutils to Jari OS. Progress is fast, and I think the alpha release will have a lot of interesting stuff.
Not many features are needed to become a fully featured OS; our ext2fs support and IDE drivers are 80% complete.
I think that the year of active development that finishes on 1-sep-2009 will produce a first beta version with a few drivers and networking support.
Now the project supports one network device (e1000), and the Jari OS host (network service) replies to ICMP requests.
As for ELF support, we're now working on dynamic library support - it's coming soon; currently only static ones are supported.
File i/o is ready, but maybe some functions (rarely used ones) don't work properly. To make a fully featured shell, pipes are required, and I'm working on them now.
I'm planning to implement a shared memory fs and the POSIX functions for it; in addition I will add a Jari OS specific API for it, for some internal purposes, for GIX (the GUI service of Jari), for example.
After the alpha release (1-jun-2009) I will take time for the GIX implementation. Also, there are many porting tasks.
I hope the community of open-source developers will be interested in a new microkernel OS - it's a good field for research and development, and it's a good way to keep skills up to date.
So,

Thursday, March 12, 2009

Jari OS: fork() in microkernel based OS

Well, this day made huge parts of my brain work.
Designing microkernel systems gives you a good skills base; designing a microkernel and multiservice system gives you an excellent one.
Today was the day of the fork() call - there are many things to discuss, think over, imagine and so on.
At the end of the day, after all the problems were detected, I found a solution for the fork() call.
Let's begin, and I will explain the nature of fork() as it is...
Fork is a POSIX call that creates a clone of the calling process with one thread, but the clone should have the same resources, i.e. memory (not all of it), opened file descriptors, IPC stuff.
In Jari OS resources are isolated - VFS knows about files, the kernel knows about IPC and so on - so you will have a headache with synchronization and performance throughout the architecture design. Welcome to microkernel-based OSes.
In this case you need to develop a non-trivial trick for things that monolithic kernels do easily, but ... microkernel-based OSes are cleverer and better overall.
Well, you have a task, you have resources and you need the fork() POSIX call. The first solution is simple: just delegate this job to the process server and the deal is done, within the server's timeslice - and this is bad (but QNX does it this way). The other solution is more sophisticated, and you need to split fork() into stages:

Request that a fork() be delegated to me
Make a fork() skel within my own timeslice
Send a request saying that my fork() call has been made

This big deal involves the microkernel, the process service, the virtual file system service and your task.
Going deeper, I will explain each stage of fork() and then how it works on the different sides.
Your first request goes to the process service, where you tell it that you want to make a fork(). The process service will create a mould of your resources on the VFS, toggle a bit in the kernel that allows you to make the sys_fork() syscall, and if all is ok it will reply with an ok message :)
After this you make the fork syscall - be warned! The kernel toggles off the bit that allows you to make this call, and you cannot set it yourself - only the trusted process service can.
The call is done and you have a skel. You make a second call to the process service, i.e. "fork is done - please make my child runnable". The process service tells the VFS to make the mould active and working, changes the status of the child, and replies to you with "OK all is done".
This scheme allows a task to make one fork() call at a time (i.e. you are able to make another child only after the previous child is created). I don't think this is bad; a different architecture leads to a different application design.
At this point I want to explain some details.
The "forkable bit" can be toggled on only by a trusted process, and it's switched off when fork is called.
A task has a timestamp, and both the kernel and the process service know it from the very start.
A VFS mould has a parent, and while the parent exists the mould will not be deleted; an orphaned mould will be deleted after a timeout.
A child pid is validated not only against the parent pid, but also against the timestamp.
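The three stages and the forkable bit can be modelled with a small C sketch. Everything here is hypothetical (struct task, struct mould, procsrv_fork_prepare, kernel_sys_fork, procsrv_fork_commit); the real services talk over IPC, while this sketch uses direct calls only to show the ordering and the checks.

```c
#include <assert.h>

/* Hypothetical view of a task shared by kernel and process service. */
struct task {
    int           pid;
    unsigned long timestamp;   /* known to both sides from task start */
    int           forkable;    /* the "forkable bit", kept by the kernel */
};

/* Hypothetical VFS-side mould of the parent's resources. */
struct mould {
    int           parent_pid;
    unsigned long parent_ts;
    int           active;      /* set when the child is made runnable */
};

/* Stage 1 (process service): create the mould and set the forkable bit.
 * Fails if a fork is already in flight for this task. */
static int procsrv_fork_prepare(struct task *t, struct mould *m)
{
    if (t->forkable)
        return -1;
    m->parent_pid = t->pid;
    m->parent_ts = t->timestamp;
    m->active = 0;
    t->forkable = 1;
    return 0;
}

/* Stage 2 (kernel): sys_fork is allowed only while the bit is set, and
 * the kernel clears the bit itself. Returns the child pid or -1. */
static int kernel_sys_fork(struct task *t, int new_pid)
{
    if (!t->forkable)
        return -1;
    t->forkable = 0;
    return new_pid;
}

/* Stage 3 (process service): the parent reports that the fork is done.
 * The mould is activated only if both pid and timestamp match. */
static int procsrv_fork_commit(const struct task *t, struct mould *m)
{
    if (m->parent_pid != t->pid || m->parent_ts != t->timestamp)
        return -1;
    m->active = 1;
    return 0;
}
```

Because the bit is cleared inside kernel_sys_fork(), a task cannot fork twice from a single prepare request, which matches the one-fork-at-a-time rule above.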

Uff, I can explain other details in the comments.

Monday, March 09, 2009

File systems on Jari OS

A great job was done over the last two weeks. The file subsystem in Jari OS is designed. Since we have a system with separated services, libraries are a good approach to avoid a huge amount of code and the errors that come with changing it.
The file subsystem is divided into several general parts:

VFSv2 service
set of libraries for each filesystem

I don't want to put everything for a filesystem in one library, because it's a bad idea; for example, tmpfs doesn't have a backend to a block device, a page cache, postponed calls and so on. But this set of libraries can be complemented by one generic library that should be included in each file system - libv2.
In this case we have the following list:

libv2 (general library)
libv2backend (backend to the block device layer)
libv2pgcache (page cache library)
libv2ppcall (library serves postponed calls)

Each file system architecture should determine what will be used. Generally, a regular file system will always use the whole set of libraries, i.e. it works with a block device and uses the page cache, and it will not reply immediately in all cases - it will have a list of postponed calls to reply to.
For libv2 there is no difference - it will always support everything on the logic layer.
I can't see other solutions - otherwise our VFS service would be overloaded, so many calls go directly to the file system service.
So, the implementation of tmpfs and initfs didn't have any complexities.
