Deeper than deep: July 2008

Thursday, July 31, 2008

ReadLeaf: research pane

Today, how I've wrote later, readleaf 0.1 was released and finished lazy stage - now it's a platform for research and implementation.
Both, MuiString and ReadLeaf has a one general idea, but MuiString is more deeper in implementation and some decisions need to be tested on other layer - quick and fast before this innovation will be implemented within MuiString and Jari overally.
ReadLeaf is a good start point for this type of research, I will use new way in development primary here - in area of the project.
I'm too lazy for repeations - you can see the general overview here - http://readleaf.berlios.de/concept.html
Currently already implemented:

redleafd
internal mm
basic research

Version 0.2 will be already with new features, now it's on the testing and development stage, but not so long from finish.

Monday, July 21, 2008

MuiString: Introduction to the architecture #1

Today we're have a many implementations of the microkernel architectures, but there are too similar and have many problems.
The first problem is a speed, the second one is a security.
Generally microkernel more secure than monolithic-based operating system, but with some restrictions, if you grow up speed, you are loosing in security.
But it's a general problems - more complicated thing - we're using C/C++/ObjC to implement architecture, and there are language takes a many time and limits in the implementing.
My general idea is to use a functional programming language - and I've choose scheme - it's better, simply.
Look for the block diagram:

MuiString separated to the three logical parts, the first low two layers implemented on C (or on Cext in future), the third layer implemented on C/Cext and on scheme itself.
All hardware specific things are hidden in the MuiString.
All services should be implemented on scheme, yes, we're can use C-written servers and libs (it should be done for liblinux/libposix), but in general case all other will be implemented on scheme and ran on scheme VM.

All IPC will be operate via scheme forms - in this case all security models will be implemented without IPC calls restrictions, and it can be made on the VM (or language) level.
There are several security models and several objects on the higher level:

Trusted servers
Key signed
Group politics

Trusted servers can allow secure transmit via determined servers, for example - you trust to the TCP/IP server, LAN server also trust it, but there are no directions from you to the LAN server, but LAN server and you can get access to execute scheme forms from/on LAN server, because there are trusted via TCP/IP server.

I will write more researching results in the next post.

Friday, July 18, 2008

Redleaf: new plans and ways

As of redleafd 0.1beta release (now it can be used for some purposes with sure) I'm take an idea, I want to implement some features for my work - it's a time sharing, job statistics, projects states etc ...
I've decided to make it via web, yes, in case of public (but restricted use) using.
What the deal I'm talking about ?
There are a complete service for each one who cares and working on open-source project, scheduling community events, working time and want to make a proof of participating and real developer state.
For example - I have a many mails from people who wants me to take they like a developers to open source project, in the past I've including everybody, but there are no effect on this - nobody works, people was enlisted and nothing else.
On real practice - I want to take a view to each of candidate to participate in project via this service. There are no service for it yet.
Also, there are a good thing for employer to avoid time loose with non right mans.
eh, you can told me about linkedin - linkedin is not really make a proof of your skills and jobs - it's suck.
So, it was a some non-technical introduction ...
I'm reserving 4-5 hours per week for redleaf, and I want to introduce technologies will used for this project:

- httpd - redleafd (small and fast)
- scheme_modula (module for the redleaf that suppose to run scheme code as quick as it possible)
- web ui software written on scheme
- parsing/collecting server side daemons on python || scheme (it depends on my research with libs)

In this case I will release in near time redleaf 0.1 (beta version can be found here - http://prdownload.berlios.de/readleaf/redleaf-0.1beta.tar.gz ), that will supports basic http stuff only.
After those I'm planning to reorganize architecture and rewrite some code, to support dynamic modules and so on for flexibility, the next stage will be a set of modules. After this stage I will implement scheme_modula and starts to write web ui and server side daemons.
In this case I'm need to anybody who knows any of the following - or all of this stuff:

- C and UNIX IPC
- Data structures and memory management techniques
- Scheme
- any of - cvs/git/svn/arch/etc ... on really good level
- Python
- html/css and optionally javascript
- Gimp and some creative techniques for image/icons creating

Benefits you will get working on this project: you can become famous (maybe, who knows?), you can update your skills on programming and especially on scheme programming - on real practice, you will have a short and cool name for imap account - and invitation to the newly created service ... hmm like authors.
I'm looking for you - contact me via jabber - tirra at njs.netlab.cz

Thank you.

Tuesday, July 15, 2008

Tips'n'Tricks: GCC useful macros for debugging

Continuing my series for newbies/beginners I want to told about gcc macros useful for debugging.
Everytime on C you have deal with memory, and very big part of bugs are memory managing related.
You must know differents ways to decide question with it, wrappers to general mmap/malloc/munmap/free functions, implement your own allocator (if you really know what you're doing).
Anyway, it's better to know without debugger (just use stdout to make this deal) when you are allocate and freed memory. GCC has a several really useful macros for this.
For example let's write a simple example with free() function:


void *__my_free(char *file,int line,char *function,void *p)
{
 if(p)    free(p);
 else     {
   fprintf(stderr,"Trying to free nil pointer.\n");
   fprintf(stderr,"At `%s:%d' via '%s'\n",file,line,function);
 }

 return NULL;
}

It's really a big shit deal to point everywhere info about function, line and file ;)
Just use this macro:


#define my_free(p)  __my_free(__FILE__,__LINE__,(char *)__FUNCTION__,p)

This is all, for more detailed documentation see gcc manual.

Wednesday, July 02, 2008

Research: new ideas to system development

Doing some research on operating systems design you can choose the kernel model and in this case you can expect that microkernel is more stable, powerful and flexible compairing with monolithic design, exokernels are too platform specific and more complicable for application level platforms (i.e. embedded systems with too limited functionality). But like a bonus you will got the new area of research in this area - choosing and developing an instruments for implementation.
On this area, in my opinion, are old and good instrument is a pure C language for microkernel, for low level. But in case of many reasons C development has a long time for something bigger than microkernel, for microkernel you can extend C with macros or write your own preproccessor to extend C (like my project extC). This is good and right, but for microkernel servers and user level is better to use something else, more powerful with paradigms and more easy to develop.
It's not C++, not only in case of my hate to pluses, there are no reason to use C++ in any kind of development, C++ contain many ugly stuff and it has an ugly design overally. There are ObjectiveC, but this language will not decide our wishes.
I'm looking to functional programming languages - this way is more powerfull, well-designed lisp-like language is really simple to implement, they all has a simple syntax - and learning didn't takes a long time - it's intended to be meta-level development.
With its powerful I'm thinking about crazy things - what about to design a several DSL that can be simply implemented on Lisp/Scheme/Haskell , for different types of microkernel servers.
In example, you can research for often used language features and constructions that required for implement a filesystem, or block device driver, or network device driver etc...
I'm thinking to go deeper in this area after MuiString will ran. I promise to make a deep research, implement expiremental DSLs for this purposes and test it for perfomance and other things.
Be updated, take a look to functional programming of your research - it's a wonderful world of new features and paradigms.

Memory management - buddy system trick.

How I wrote before, on low level we're need a simple and fast allocator with minimal external fragmentation. It was buddy system. I've used a binary buddy in case of simple and clean code and perfomance.
But we're need to store a free buddies list, but we haven't any allocators lower and we're cannot allocate something for it, and we're cannot create linked list of pointers to the blocks and chunks - and it's not so clever - the really big overdraft on it.
Theory:
I've decide that I can use a bitmap in this case, for grow up speed I will use a bitmap for each buddy list level in this case for first level we're need just 1 bit per attribute, for second 2 bits per attribute ... and so on. Also we're need to make a simple initialization and keep information about separating. In this case we're using 2 bits per buddy block, I've called them - the first one is negative bitmap, the second one is positive bitmap.
What are they means? The first bit attribute (negative) points that if the bit is set it's a free block, if not it's used (or unmapped) block. Second bit attribute (positive) points that if the bit set it's not separated block, otherwise is. In this model we're simply full negative bitmap with 0x0and positive bitmap with 0xffffffff, and how you can understand we will use unsigned integer 32bit sized ones for bitmap.
Make a simple calculations and you will get a good news - for map buddy with 16-parts you need 64bits or 8 bytes, isn't pretty?
Implementation:
So, the first I will declare the general structure that will describe buddy system:


typedef struct __bbuddy_type {
  uint32_t *pbmp;
  uint32_t *nbmp;
  uint32_t pn;
} bbuddy_t;

How you can see there are two pointers and what the neck ?
So, I'll explain it - on low level you need to take a pointer (determining the size you need) and use it for your first buddy - it's simply than something else. Also, I store buddy_t first and after it I'm storing uint32_t data.
Also we're need an macro that will calculate the overall size to store the buddy system, let's take it:


#define bbuddy_size(n)  (((((n/32)==0) ? 1 : ((n/32>1) ? (n/32)+2 : n/32+1))*(2*sizeof(uint32_t)))+\
                         sizeof(bbuddy_t))

Explaining it, for parts that bigger than 1/32 of buddy mapped area we're need just one uint32_t , for 1/32 yet another one, and for smaller parts working a general rule to calculate size. Like in squares roots calculation ;)
Also you will need for several useful macros that will calculates index of pointer and bit position within bitmap area:


#define bbuddy_indexp(p,n)  ((p/32 > 1) ? ((p/32)+(n/32)) : p/32)
#define bbuddy_indexbn(n)   ((n/32>0) ? n%32 : n)

The first macro will calculate index to choose correct uint32_t data, and second one to choose a bit position, be careful using this macros as is not a good practice, I will show examples of using its.
In the implementation explanation I don't want to show you all the source code, it's a simple - you can implement it yourself without any kind of problems.
But there are some useful code snippets I'll show.
The first point on implementation is initialization of buddy system, assign a correct pointers for bitmaps, init first block free. Here be careful with pointers shifting deals, I will show you the simple one, that works correctly, but don't forgot that C operator '+' on pointers shifts pointers with the size of pointer type.
I've used char* - it's a clean and transparent to understand:


uint8_t bbuddy_init(bbuddy_t *b,uint32_t max_part)
{
  int yy=(max_part/32)*2,i;
  char *ptr=(char*)b;

  if(b && max_part) {
    ptr+=sizeof(bbuddy_t);
    b->pbmp=(uint32_t*)ptr;
    ptr+=(sizeof(uint32_t)*((max_part/32 > 1) ? (max_part/32)+2 : (max_part/32)+1));
    b->nbmp=(uint32_t*)ptr;
  } else
    return 1;

  b->pn=max_part;

  max_part/=32;

  for(i=0;i<yy;i++) {
    b->nbmp[i]=nil;
    b->pbmp[i]=fil;
  }

  /* init first */
  b->nbmp[0] |= (1 << 0);
  b->nbmp[0] |= (1 << 1);

  return 0;
}

NOTE: nil it's a macro defined to 0x0 and fil is a macro defined to 0xffffffff.
Here we go ... we're a correctly init buddy system structure.
Now, I want to wrote about allocations/splitting/freeing specifics of this implementation.
I've implement an allocation function with mind that I've giving a part number of buddy system and it's trying to return me a number of avialable and allocated part. Look below:


static uint32_t __bbuddy_block_alloc(bbuddy_t *b,uint32_t align,uint8_t m_flag)
{
  uint32_t p_indx=align/32;
  uint32_t n_indx=0,i=0,o=0;

  if(p_indx) {
    for(i=0;i<p_indx;i++) {
      if(b->nbmp[p_indx+i]!=0x0)
        goto __is_free_long;
    }
    m_flag++;
    return __bbuddy_block_alloc(b,align/2,m_flag);
  }

  if(!p_indx) { /* checking small layers, big blocks */
    o=align*2;
    for(i=align;i<o;i++) 
      if((b->nbmp[p_indx] & (1 << i)) && !m_flag) {
        b->nbmp[p_indx] &= ~(1 << i); /* used */
        b->pbmp[p_indx] |= (1 << i); /* not separated */
        return i-align;
      } else if((b->nbmp[p_indx] & (1 << i)) && m_flag) {
        b->nbmp[p_indx] &= ~(1 << i); /* used */
        b->pbmp[p_indx] &= ~(1 << i); /* separated */
        /* mark childs free */
        align*=2;         i*=2;
        p_indx=align/32;
        n_indx=bbuddy_indexbn(i);
        b->nbmp[p_indx] |= (1 << n_indx);
        b->nbmp[p_indx] |= (1 << (n_indx+1));
        b->pbmp[p_indx] |= (1 << n_indx);
        b->pbmp[p_indx] |= (1 << (n_indx+1));

        m_flag--;
        return __bbuddy_block_alloc(b,align,m_flag);
      }
    if(i<2)
      return ENOBLOCK;
    else {
      m_flag++;
      return __bbuddy_block_alloc(b,align/2,m_flag);
    }
  }

 __is_free_long: 
  while(o<32) {
    if((b->nbmp[p_indx+i] & (1 << o)) && !m_flag) { /* yep, found */
      b->nbmp[p_indx+i] &= ~(1 << o); /* used */
      b->pbmp[p_indx+i] |= (1 << o); /* not separated */
      return (p_indx > 1) ? (((p_indx+i)-2)*32)+o : o;
    }
    else if((b->nbmp[p_indx+i] & (1 << o)) && m_flag) { /* make sep */
      b->nbmp[p_indx+i] &= ~(1 << o); /* used */
      b->pbmp[p_indx+i] &= ~(1 << o); /* separated */
      /* mark childs free */
      align*=2;        p_indx=align/32; o*=2;
      p_indx=bbuddy_indexp(align,o);
      n_indx=bbuddy_indexbn(o);
      b->nbmp[p_indx] |= (1 << n_indx);
      b->nbmp[p_indx] |= (1 << (n_indx+1));
      b->pbmp[p_indx] |= (1 << n_indx);
      b->pbmp[p_indx] |= (1 << (n_indx+1));

      m_flag--;
      return __bbuddy_block_alloc(b,align,m_flag);
    }
    o++;
  }


  return 0;
}

How you can see there are difference between 1/16 and bigger parts - I've showed it, it's separated via source code implementation in case of making additional checking and different calculation in both situations, you can split there parts into one to make code lines counter smaller , but it doesn't give any perfomance effect (tested).
Freeing buddy block must be assigned with splitting buddy into bigger block if possible, I've implement this via recursive separate function to make a lightweight to read release function, also I will point to indexes calculations here.
And I want you to understand, I'm giving to release function number of smaller blocks like offset to release, after those I'm calculating possible variants of blocks can be and if found I'm releasing it, and only after it I'm calls my splitter function. Also, I don't include additonal checking code, make it yourself ;), look below:


static uint32_t __bbuddy_block_release(bbuddy_t *b,uint32_t num,uint32_t i)
{
  uint32_t p_indx,n_indx;
  uint32_t layer=b->pn,ls=layer,a=0;

  /* first look up on higher possible layer
   * if there are now, look deeper
   */
  if(num>b->pn)
    return EINVALIDINDX; /* error encount */

  while(ls) {    ls/=2; a++;  } 
  if(i==0) {
    for(i=(a-1);i>0;i--) {
      if(num%2) break;
      else num/=2;
    }
  } 

  layer=(1 << i);

  n_indx=(layer<32) ? (num+layer) : bbuddy_indexbn(num);
  p_indx=bbuddy_indexp(layer,num);
  if(!(b->nbmp[p_indx] & (1 << n_indx)) && (b->pbmp[p_indx] & (1 << n_indx))) { 
    b->nbmp[p_indx] |= (1 << n_indx); /* mark free */
    return __bbuddy_split_up(b,layer,num);
  } else if(!(b->nbmp[p_indx] & (1 << n_indx)) && !(b->pbmp[p_indx] & (1 << n_indx))) { /* going deep */
    i++;
    if(i>=a)
      return EBUDDYCORRUPED;
    else
      return __bbuddy_block_release(b,num*2,i);

  }

  return 0;
}

In case of my initialization and implementation of bitmap, I'm starting to look up for the block from the biggest blocks, because in othercase there are will be errors on cleanly initied bitmap and buddy will corruped.
Also, I don't want to describe my splitter function and I will not show it's source code - it's a trivial too like others.
Be patient, on my code (I wrote it and tested within 1-2 hours) some checks is absent, and some code I've specially removed - take your brains on working way ;)
Like API functions I've made a wrapper to hide all recursion in static functions, and with assign of initial counters values.
On next post I will try to explain how we're can use this abstraction to allocate real pointers, or you can think yourself and assume this method - it's a too trivial ;)

Deeper than deep

Pages