The performance of dynamic strings implementations

Let’s say you want to create a nice string abstract data type to use in your C code. You can end with a structure like this:

typedef struct mstring mstring;

struct mstring {
      
int buflen;
      
int len;
      
char *buf;
};

mstring *mstring_new() {
      
mstring *result;

// insert memory allocation errors check here
      
result = (mstring*)malloc(sizeof(struct mstring));
      
result->len = 0;
      
result->buflen = 128;
      
result->buf = (char *)malloc(result->buflen);

return result;
}

void mstring_append(mstring *dest, const char *src) {
      
…
}

It sound nice and you can also have manipulation functions that reallocate the string when necessary. They will be really useful! Let’s try:

mstring *m = mstring_new();

mstring_append(m, "Hi this is me! I’m ");
mstring_append(m, myNameHere);
printf("%s", m->buf);

mstring_delete(m); m=NULL;

This is nice indeed! But is really not that efficient from a memory point of view. In your heap you will have:

Your implementation of the holy tuple malloc and free must keep track of the string descriptor and of the strings contents, which can be (and will be) non-contiguous in memory.

You can do really better allocating the string descriptor directly before the string contents and having mstring_new return the address of the string contents instead of the string descriptor. Your strings will be C-compatible and more efficient. Let’s try again:

typedef char *fstring;

struct fstring_header {
      
int buflen;
      
int len;
};

fstring fstring_new() {
      
struct fstring_header *header;
      
fstring result;

header = (struct fstring_header*) malloc(128);
      
header->buflen=128-sizeof(struct fstring_header);
      
header->len=0;

result = ((char*)header)+sizeof(struct fstring_header);
      
result[0]=0;

return result;
}

But now your fstring_append function can’t simply change the buf field when a buffer reallocation is needed. You must to reallocate the entire string, descriptor included.

fstring* fstring_append(fstring *dest, const char *src) {
      
….
}

So your code will change like this:

fstring *m = fstring_new();

m = fstring_append(m, "Hi this is me! I’m ");
m = fstring_append(m, myNameHere);
printf("%s", m);

fstring_delete(m); m=NULL;

You may say that this is premature optimization but wait! You saved:

and now the string descriptor is also attached to the string, and your cache will be happy. Really more efficient. In my tests this string implementation is two times more efficient that the previous.

Want to try a really good dynamic string implementation? Try SDS. This is used in Redis. You can also try my C all-purpose library here: CommonLib.