C11 key/value database engine
https://github.com/Softmotions/iowow
IOWOW - The C11 persistent key/value storage based on skip list data structure
Features
- Support of multiple key-value databases within a single file.
- Native support of integer keys
- Support of record values represented as sorted array of integers
- Write Ahead Logging (WAL)
- Ultra-fast sequential traversal of database records
- Good performance comparing its main competitors: lmdb, leveldb, kyoto cabinet
- Tiny C11 library (only 150Kb) can be easily embedded into any software
Limitations
- Maximum iwkv storage file size:
512 GB (0x7fffffff80)
(since v1.0.6) - Total size of a single key+value record must not be greater than
255Mb (0xfffffff)
- In-memory cache for every opened database takes
~130Kb
. Cache can be disposed byiwkv_db_cache_release()
Key components API
- iwkv.h Persistent key/value database engine:
iwkv
- iwfsmfile.h File blocks allocation manager like
malloc()
on files
Supported platforms
Linux
Ubuntu/Debian
umask 022 sudo add-apt-repository ppa:adamansky/iowow sudo apt-get update sudo apt-get install iowow
Building debian packages
umask 022 mkdir build && cd build cmake .. -DCMAKE_BUILD_TYPE=Release -DPACKAGE_DEB=ON make package
RPM based Linux distributions
umask 022 mkdir build && cd build cmake .. -DCMAKE_BUILD_TYPE=Release -DPACKAGE_RPM=ON make package
FreeBSD
Successfully tested on FreeBSD 10/11
https://www.freshports.org/databases/iowow/
OSX
Successfully tested on OSX 10.12/10.13
Windows
Cross-compilation for Windows x86-64 platform
iowow-1.2.9-RelWithDebInfo-Windows-x86_64.zip
iowow-1.2.9_MSVC_2017_sample_solution.zip
MIPS,PowerPC (big-endian)
Successfully tested on
- Debian 9.4, MIPS 32, gcc 6.x compiler.
- PowerPC FreeBSD
Benchmarks
It is quite hard to provide fair database benchmarks since benchmarks are highly dependent on hardware configuration, database parameters, compiler, extra libraries, build flags, file system fragmentation and so on. So it will be good to build and run benchmarks by self and test all of interesting cases. IOWOW is also available in ioarena benchmarking suite https://github.com/pmwkaa/ioarena
How to build and run benchmarks
Reuirements: python3
bokeh
parse
pip3 install bokeh parse
mkdir build && cd build cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON make ... cd ./src/kv/benchmark/ ls -1 *benchmark iwkv_benchmark kyc_benchmark leveldb_benchmark lmdb_benchmark
Every benchmark executable have broad range of run options:
Usage ./iwkv_benchmark [options] Options: -h Show this help -n <num> Number of stored records -r <num> Number of records to read (equals to number of stored records if not specified. Not for readseq,readreverse) -vz <size> Size of a single record value in bytes -b <comma separated benchmarks to run> -db <file/dir> Database file/directory -rs <random seed> Random seed used for iwu random generator Available benchmarks: fillseq write N fixed length values in sequential key order in async mode fillseq2 write N random length values in sequential key order in async mode fillrandom write N fixed length values in random key order in async mode fillrandom2 write N random length values in random key order in async mode overwrite overwrite N values in random key order in async mode fillsync write N/100 values in random key order in sync mode fill100K write N/100 100K values in random order in async mode deleteseq delete N keys in sequential order deleterandom delete N keys in random order readseq read N times sequentially readreverse read N times in reverse order readrandom read N times in random order readmissing read N missing keys in random order readhot read N times in random order from 1% section of DB seekrandom N random seeks
You can run a bunch of benchmarks then show results as funny plots:
cd ./src/kv/benchmark/ python3 ./runbench.py
As demonstration, there is a set of benchmarks on the following hardware
- CPU 8x Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
- RAM 16320MB
- SSD Micron C400 RealSSD 256GB
- Filesystem ext4
- OS Ubuntu 16.04.4 LTS
Example
#include <iowow/iwkv.h> #include <string.h> #include <stdlib.h> int main() { IWKV_OPTS opts = { .path = "example1.db", .oflags = IWKV_TRUNC // Cleanup database before open }; IWKV iwkv; IWDB mydb; iwrc rc = iwkv_open(&opts, &iwkv); if (rc) { iwlog_ecode_error3(rc); return 1; } // Now open mydb // - Database id: 1 rc = iwkv_db(iwkv, 1, 0, &mydb); if (rc) { iwlog_ecode_error2(rc, "Failed to open mydb"); return 1; } // Work with db: put/get value IWKV_val key, val; key.data = "foo"; key.size = strlen(key.data); val.data = "bar"; val.size = strlen(val.data); fprintf(stdout, "put: %.*s => %.*s\n", (int) key.size, (char *) key.data, (int) val.size, (char *) val.data); rc = iwkv_put(mydb, &key, &val, 0); if (rc) { iwlog_ecode_error3(rc); return rc; } // Retrive value associated with `foo` key val.data = 0; val.size = 0; rc = iwkv_get(mydb, &key, &val); if (rc) { iwlog_ecode_error3(rc); return rc; } fprintf(stdout, "get: %.*s => %.*s\n", (int) key.size, (char *) key.data, (int) val.size, (char *) val.data); iwkv_val_dispose(&val); iwkv_close(&iwkv); return 0; }
Compile and run
gcc -std=gnu11 -Wall -pedantic -c -o example1.o example1.c gcc -o example1 example1.o -liowow ./example1 put: foo => bar get: foo => bar
Internals
High level overview
- IWKV databases stored in a single file. Free space managed by bitmap index (iwfsmfile.h).
- All Free space chunks stored in memory BTree index. BTree index constructed from bitmap after iwkv file opened.
- Minimal allocation amount:
128 bytes
- Max number of skip list levels:
24
- Write Ahead Log (WAL)
H
FSM file header custom user header together with auxiliary system data.D
Data block. Data block is allocated chunk of file address space whose allocation state stored in free space bitmapF
Free space map tree
IWKV storage pool consists mostly of two kinds of blocks:
SBLK
Skip list node with pointers to next nodes and pointer toKVBLK
(key/value pairs block). SBLK has fixed size (256 bytes). SBLK file position (block address) within a file is fixed and cannot be changed.KVBLK
Storage block for a set of key value pairs. Up to 32 adjacent KV pairs can be stored within a single KVBLK block.
SBLK
[flags:u1,lvl:u1,lkl:u1,pnum:u1,p0:u4,kblk:u4,pi:u1[32],n:u4[24],lk:u116]:u256 \ KVBLK
flags
Persistent block flags (1 byte)lvl
Skiplist level of this block (1 byte)lkl
Length of the lowest key prefix in this block (1 byte)kblk
Block number of associatedKVBLK
(4 bytes)pi[32]
Array of key/value pair indexes inKVBLK
block. Indexes are sorted by keys. (32 bytes)n[24]
Pointers to nextSBLK
blocks in skiplist (96 bytes)lk
Buffer for the lowest key prefix among all key/value pairs stored inKVBLK
KVBLK
[szpow:u1,idxsz:u2,KVI[32] ___free space___ [[KV],...]]
szpow
KVBLK length as power of 2idxsz
Length of KVI array in bytesKVI[32]
Key/value pair index[ps:vn,pl:vn]
-
ps
Key/value pair block offset on i-th place variable length encoded number. This offset is relative to end of KVBLK block -
pl
Key/value pair block length on i-th place variable length encoded number
-
KV
Key/value pair[klen:vn,key,value]
-
klen
Key length as variable length encoded number -
key
Key data buffer -
value
Value data buffer
-
MIT License
Copyright (c) 2012-2018 Softmotions Ltd <info@softmotions.com> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.