IOWOW

C11 key/value database engine

https://github.com/Softmotions/iowow

IOWOW - The C11 persistent key/value storage based on skip list data structure

Features

  • Support of multiple key-value databases within a single file.
  • Native support of integer keys
  • Support of record values represented as sorted array of integers
  • Write Ahead Logging (WAL)
  • Ultra-fast sequential traversal of database records
  • Good performance comparing its main competitors: lmdb, leveldb, kyoto cabinet
  • Tiny C11 library (only 150Kb) can be easily embedded into any software

Limitations

  • Maximum iwkv storage file size: 512 GB (0x7fffffff80) (since v1.0.6)
  • Total size of a single key+value record must not be greater than 255Mb (0xfffffff)
  • In-memory cache for every opened database takes ~130Kb. Cache can be disposed by iwkv_db_cache_release()

Key components API

  • iwkv.h Persistent key/value database engine: iwkv
  • iwfsmfile.h File blocks allocation manager like malloc() on files

Supported platforms

Linux

Ubuntu/Debian

umask 022
sudo add-apt-repository ppa:adamansky/iowow
sudo apt-get update
sudo apt-get install iowow

Building debian packages

umask 022
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DPACKAGE_DEB=ON
make package

RPM based Linux distributions

umask 022
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DPACKAGE_RPM=ON
make package

FreeBSD

Successfully tested on FreeBSD 10/11

https://www.freshports.org/databases/iowow/

OSX

Successfully tested on OSX 10.12/10.13

Windows

Cross-compilation for Windows x86-64 platform

iowow-1.2.9-RelWithDebInfo-Windows-x86_64.zip

iowow-1.2.9_MSVC_2017_sample_solution.zip

MIPS,PowerPC (big-endian)

Successfully tested on

  • Debian 9.4, MIPS 32, gcc 6.x compiler.
  • PowerPC FreeBSD

Benchmarks

It is quite hard to provide fair database benchmarks since benchmarks are highly dependent on hardware configuration, database parameters, compiler, extra libraries, build flags, file system fragmentation and so on. So it will be good to build and run benchmarks by self and test all of interesting cases. IOWOW is also available in ioarena benchmarking suite https://github.com/pmwkaa/ioarena

How to build and run benchmarks

Reuirements: python3 bokeh parse

pip3 install bokeh parse
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON
make

...

cd ./src/kv/benchmark/
ls -1  *benchmark
 iwkv_benchmark
 kyc_benchmark
 leveldb_benchmark
 lmdb_benchmark

Every benchmark executable have broad range of run options:

Usage ./iwkv_benchmark [options]

Options:
  -h Show this help
  -n <num> Number of stored records
  -r <num> Number of records to read (equals to number of stored records 
           if not specified. Not for readseq,readreverse)
  -vz <size> Size of a single record value in bytes
  -b <comma separated benchmarks to run>
  -db <file/dir> Database file/directory
  -rs <random seed> Random seed used for iwu random generator

Available benchmarks:
  fillseq        write N fixed length values in sequential key order in async mode
  fillseq2       write N random length values in sequential key order in async mode
  fillrandom     write N fixed length values in random key order in async mode
  fillrandom2    write N random length values in random key order in async mode
  overwrite      overwrite N values in random key order in async mode
  fillsync       write N/100 values in random key order in sync mode
  fill100K       write N/100 100K values in random order in async mode
  deleteseq      delete N keys in sequential order
  deleterandom   delete N keys in random order
  readseq        read N times sequentially
  readreverse    read N times in reverse order
  readrandom     read N times in random order
  readmissing    read N missing keys in random order
  readhot        read N times in random order from 1% section of DB
  seekrandom     N random seeks

You can run a bunch of benchmarks then show results as funny plots:

 cd ./src/kv/benchmark/
 python3 ./runbench.py


As demonstration, there is a set of benchmarks on the following hardware

  • CPU 8x Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  • RAM 16320MB
  • SSD Micron C400 RealSSD 256GB
  • Filesystem ext4
  • OS Ubuntu 16.04.4 LTS
Database size MB after random insertion 1M records, key=16 bytes value=[random 0-1000] bytes
Database size MB after random insertion 1M records, key=16 bytes value=[random 0-1000] bytes
Mixed operations over 1M records, value=[random 0-400] bytes
Mixed operations over 1M records, value=[random 0-400] bytes
Sequential write/overwite/delete of 1M records
Sequential write/overwite/delete of 1M records
Write random of 10M records then read random 10M times
Write random of 10M records then read random 10M times
Read sequentially of 10M records
Read sequentially of 10M records
Write 10K random [0 - 20Kb] values
Write 10K random [0 - 20Kb] values

Example

#include <iowow/iwkv.h>
#include <string.h>
#include <stdlib.h>

int main() {
  IWKV_OPTS opts = {
    .path = "example1.db",
    .oflags = IWKV_TRUNC // Cleanup database before open
  };
  IWKV iwkv;
  IWDB mydb;
  iwrc rc = iwkv_open(&opts, &iwkv);
  if (rc) {
    iwlog_ecode_error3(rc);
    return 1;
  }
  // Now open mydb
  // - Database id: 1
  rc = iwkv_db(iwkv, 1, 0, &mydb);
  if (rc) {
    iwlog_ecode_error2(rc, "Failed to open mydb");
    return 1;
  }
  // Work with db: put/get value
  IWKV_val key, val;
  key.data = "foo";
  key.size = strlen(key.data);
  val.data = "bar";
  val.size = strlen(val.data);
  
  fprintf(stdout, "put: %.*s => %.*s\n",
          (int) key.size, (char *) key.data,
          (int) val.size, (char *) val.data);
                  
  rc = iwkv_put(mydb, &key, &val, 0);
  if (rc) {
    iwlog_ecode_error3(rc);
    return rc;
  }
  // Retrive value associated with `foo` key
  val.data = 0;
  val.size = 0;
  rc = iwkv_get(mydb, &key, &val);
  if (rc) {
    iwlog_ecode_error3(rc);
    return rc;
  }
  
  fprintf(stdout, "get: %.*s => %.*s\n",
          (int) key.size, (char *) key.data,
          (int) val.size, (char *) val.data);
                    
  iwkv_val_dispose(&val); 
  iwkv_close(&iwkv);
  return 0;
}

Compile and run

 gcc -std=gnu11 -Wall -pedantic -c -o example1.o example1.c
 gcc -o example1 example1.o -liowow

 ./example1 
  put: foo => bar
  get: foo => bar

Internals

High level overview

  • IWKV databases stored in a single file. Free space managed by bitmap index (iwfsmfile.h).
  • All Free space chunks stored in memory BTree index. BTree index constructed from bitmap after iwkv file opened.
  • Minimal allocation amount: 128 bytes
  • Max number of skip list levels: 24
  • Write Ahead Log (WAL)
Free space map file (FSM)
Free space map file (FSM)
  • H FSM file header custom user header together with auxiliary system data.
  • D Data block. Data block is allocated chunk of file address space whose allocation state stored in free space bitmap
  • F Free space map tree

IWKV storage pool consists mostly of two kinds of blocks:

  • SBLK Skip list node with pointers to next nodes and pointer to KVBLK (key/value pairs block). SBLK has fixed size (256 bytes). SBLK file position (block address) within a file is fixed and cannot be changed.
  • KVBLK Storage block for a set of key value pairs. Up to 32 adjacent KV pairs can be stored within a single KVBLK block.
IWKV skip list + key value blocks
IWKV skip list + key value blocks

SBLK

 
 [flags:u1,lvl:u1,lkl:u1,pnum:u1,p0:u4,kblk:u4,pi:u1[32],n:u4[24],lk:u116]:u256
                                         \
                                        KVBLK
  1. flags Persistent block flags (1 byte)
  2. lvl Skiplist level of this block (1 byte)
  3. lkl Length of the lowest key prefix in this block (1 byte)
  4. kblk Block number of associated KVBLK (4 bytes)
  5. pi[32] Array of key/value pair indexes in KVBLK block. Indexes are sorted by keys. (32 bytes)
  6. n[24] Pointers to next SBLK blocks in skiplist (96 bytes)
  7. lk Buffer for the lowest key prefix among all key/value pairs stored in KVBLK

KVBLK

 
 [szpow:u1,idxsz:u2,KVI[32] ___free space___ [[KV],...]]


  1. szpow KVBLK length as power of 2
  2. idxsz Length of KVI array in bytes
  3. KVI[32] Key/value pair index [ps:vn,pl:vn]
      ps Key/value pair block offset on i-th place variable length encoded number. This offset is relative to end of KVBLK block
      pl Key/value pair block length on i-th place variable length encoded number
  4. KV Key/value pair [klen:vn,key,value]
      klen Key length as variable length encoded number
      key Key data buffer
      value Value data buffer

MIT License


Copyright (c) 2012-2018 Softmotions Ltd <info@softmotions.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.