Database Backend Redesign - Phase 4


Date: 4 March 2021

Status: Draft



See backend redesign (initial phases)

High level summary

This document continues Ludwig’s work on Berkeley database removal: backend redesign (initial phases)

It focuses on:

Current state

Design in progress (in other words, a very unstable draft)


- plugin directory: …/back-ldbm/db-mdb
- source files are prefixed with mdb_
- lmdb libraries come from the packages: lmdb-devel, lmdb-doc, lmdb-libs

Note: lmdb documentation is in file:///usr/share/doc/lmdb-doc/html/index.html once lmdb-doc is installed.


The plan for this phase is to minimize the impact outside of the mdb plugin, so we do not expose pointers into the db mmap outside of the plugin. The overall plugin design is similar to that of the bdb plugin.

Architecture choices

creating a single MDB_env versus one MDB_env per backend

- Max dbs question (txn cost depends linearly on the max number of dbs):
  if we split per suffix we can keep it smaller
- Consistency of txn across suffixes:
  the question is important for write txns as there is no way to commit them
  in a single step (are there write txns across different suffixes?)
- Consistency of changelog (not an issue as the changelog is already per suffix)
- Consistency with the existing bdb model (today bdb_make_env is called once,
  with the <INSTALLDIR>/var/lib/dirsrv/slapd-<INSTANCE>/db path)

==> I suspect that we will have to go for a single MDB_env in the first phase

db filename list

The whole db is a single mmap file and lmdb does not provide any interface to list the db names. Two solutions:

==> This will impact the dbstat tool too, as we cannot look for files anymore and we need a way to list the existing suffixes and the existing files in each suffix.

mdb specific config parameters

- MAXDBS (cf mdb_env_set_maxdbs)
- DBMAXSIZE (cf mdb_env_set_mapsize)
- mdb_env_set_maxreaders: should be around 1 per working thread +
  1 per agmt,
  so we could have an auto-tuning value that uses the number of
  working threads + 30

Note: changing these parameters requires closing the db env (i.e. restarting the instance in the first implementation)
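
The auto-tuning rule for the reader count can be sketched as below. This is only an illustration of the rule stated above: the function name and the constant are hypothetical, and the value would in practice be handed to mdb_env_set_maxreaders before the environment is opened.

```python
# Hypothetical constant: the "+ 30" margin mentioned in the list above.
DEFAULT_READER_MARGIN = 30


def effective_max_readers(configured: int, working_threads: int) -> int:
    """Return the reader slot count to request from lmdb.

    A configured value of 0 means "auto-tune": one slot per working thread
    plus a fixed margin meant to cover the replication agreements.
    """
    if configured < 0 or working_threads < 1:
        raise ValueError("configured must be >= 0 and working_threads >= 1")
    if configured == 0:
        return working_threads + DEFAULT_READER_MARGIN
    return configured
```

With 16 working threads, a configured value of 0 yields 46 reader slots, while an explicit value is used unchanged.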

mdb limitations

Here are the limits that I measured in my tests.

| Database type  | Key max | Data max |
|----------------|---------|----------|
| No dup support | 511     | > 6 GB   |
| Dup support    | 511     | 511      |

511 is the hardcoded limit returned by mdb_env_get_maxkeysize(env). I got a bit more than 6 GB of data in a db with size = 10 GB.

**Note from Thierry:** Pierre: regarding the LMDB key length, IPA uses ‘eq’ indexes on attributes (usercertificate, publickey, ...) with keys longer than 500 bytes.
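
To make the concern concrete, here is a small sketch of the length check. It assumes that an equality index key is the ‘=’ prefix concatenated with the attribute value (the PrefixIndex form described in the db format section) and uses the 511 byte limit measured above. The helper names are hypothetical and the 1200 byte value is only a stand-in for a certificate.

```python
MDB_MAX_KEY_SIZE = 511  # limit measured above (mdb_env_get_maxkeysize)


def build_index_key(prefix: str, value: bytes) -> bytes:
    """Concatenate the index type prefix with the key value."""
    return prefix.encode("utf-8") + value


def fits_in_lmdb_key(key: bytes, max_key_size: int = MDB_MAX_KEY_SIZE) -> bool:
    """Tell whether the key can be stored directly as an lmdb key."""
    return len(key) <= max_key_size


short_key = build_index_key("=", b"jdoe")
cert_key = build_index_key("=", bytes(1200))  # stand-in for a certificate value

print(fits_in_lmdb_key(short_key))  # True
print(fits_in_lmdb_key(cert_key))   # False
```

Keys that fail this check are the ones that need special handling in the db format.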

mdb_env_open flags

- db2ldif/db2bak: MDB_RDONLY
- ns-slapd: 0
- offline bak2db/ldif2db: MDB_NOSYNC; use mdb_env_sync() and fflush(mdb_env_get_fd()) before closing the env (depending on the ldif file or the mmap size we may also use the MDB_WRITEMAP flag)
- online bak2db (and maybe online db2ldif): duplicate the environment
- reindex: should probably be 0 to avoid breaking the whole db in case of error

Note: we may be in trouble if ldif2db fails, there are multiple backends, and the single env strategy is used ...
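
The list above can be read as a lookup table from operation to environment flags. The sketch below keeps the flag names as strings so that it runs without the lmdb library; the mode names and both helpers are hypothetical, and only the flag names (MDB_RDONLY, MDB_NOSYNC, MDB_WRITEMAP) come from lmdb itself.

```python
# Hypothetical mode names mapped to the lmdb flag names listed above.
# An empty set stands for flags = 0.
ENV_FLAGS_BY_MODE = {
    "db2ldif": {"MDB_RDONLY"},
    "db2bak": {"MDB_RDONLY"},
    "ns-slapd": set(),
    "offline-bak2db": {"MDB_NOSYNC"},
    "offline-ldif2db": {"MDB_NOSYNC"},
    "reindex": set(),
}


def env_open_flags(mode: str, use_writemap: bool = False) -> set:
    """Return the flag names to use when opening the env for this mode."""
    flags = set(ENV_FLAGS_BY_MODE[mode])
    if use_writemap:
        # The text only considers MDB_WRITEMAP for the offline imports.
        if "MDB_NOSYNC" not in flags:
            raise ValueError("MDB_WRITEMAP is only considered for offline imports")
        flags.add("MDB_WRITEMAP")
    return flags


def needs_explicit_sync(flags: set) -> bool:
    """With MDB_NOSYNC the env must be synced explicitly before closing."""
    return "MDB_NOSYNC" in flags
```

Online bak2db is left out on purpose because it duplicates the environment instead of opening it with specific flags.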

db format

| db | open flags | key | value |
|----|------------|-----|-------|
| id2entry | MDB_INTEGERKEY + MDB_CREATE | entryId | entry or ‘HUGE entryId nbParts’ |
| entrydn | MDB_DUPSORT | normalized dn | Flag.entryId |
| index | MDB_DUPSORT + MDB_DUPFIXED + MDB_CREATE | PrefixIndex | Flag.entryId |
| vlvindex | MDB_DUPSORT + MDB_DUPFIXED + MDB_CREATE | PrefixIndex | Flag.entryId |
| changelog/retrochangelog | MDB_CREATE | csn | change |
| #dbname | MDB_CREATE | bename/dbfilename | openFlags |
| #huge | MDB_CREATE | bename’/’dbname’:’ContKeyId’.’n | EntryId.complete Key value |
|  |  | #maxKeyId | max ContKeyId value |

PrefixIndex is the usual index type prefix (i.e. ‘=’ or ‘*’ or ‘?’ or ‘:MatchingRuleOID:’) concatenated with the key value.

Flag.entryId is:

Type Mapping

TXN and value Handling

If a txn is provided by dblayer, it is a write txn and should be used for the operation. In the other case:

In all cases, data read from the db is copied (strdup or equivalent) into a dbi_val_t
(in phase 4a no pointers into the mmap are kept outside of the db-mdb plugin).
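
The reason for copying can be illustrated without lmdb. In the sketch below a bytearray plays the role of the memory map, a memoryview plays the role of the pointer that lmdb returns, and a bytes copy plays the role of the dbi_val_t content. FakeMap and read_copy are hypothetical names used only for this illustration.

```python
class FakeMap:
    """Stand-in for the lmdb memory map: a mutable buffer of page data."""

    def __init__(self, data: bytes):
        self.buf = bytearray(data)

    def view(self, offset: int, length: int) -> memoryview:
        # Zero-copy reference, analogous to a pointer into the mmap.
        return memoryview(self.buf)[offset:offset + length]


def read_copy(m: FakeMap, offset: int, length: int) -> bytes:
    """Copy the value so that it stays valid once the txn is over."""
    return bytes(m.view(offset, length))


m = FakeMap(b"uid=jdoe,ou=people")
borrowed = m.view(0, 8)            # what must not leave the plugin
copied = read_copy(m, 0, 8)        # what is handed to the caller

# Simulate the page being reused after the txn ends.
m.view(0, 8)[:] = b"XXXXXXXX"

print(bytes(borrowed))  # b'XXXXXXXX'
print(copied)           # b'uid=jdoe'
```

The borrowed reference silently changes when the underlying page is rewritten, while the copy keeps the value that was read.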


lmdb does not implement a recno lookup as bdb does, so we cannot use that. We will have to use a cursor starting from the first record and count the records. We could have a cache that stores count -> key every thousand entries or so.
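
A minimal sketch of the count -> key cache idea follows. A sorted list of unique keys stands in for the db in cursor order, and bisect stands in for positioning a cursor on a known key. The class name, the step value and the steps_walked counter are hypothetical and only serve the illustration.

```python
import bisect


class RecnoIndex:
    """Record number lookup by cursor walk, helped by a sparse cache."""

    def __init__(self, sorted_keys, step=1000):
        self.keys = sorted_keys   # stand-in for the db content, unique keys
        self.step = step          # one checkpoint every `step` records
        self.cache = {}           # record number -> key
        self.steps_walked = 0     # number of simulated "next" cursor moves

    def key_at(self, recno):
        """Return the key of the record at position recno (0-based)."""
        if not 0 <= recno < len(self.keys):
            raise IndexError(recno)
        # Find the closest cached checkpoint at or below recno.
        count = (recno // self.step) * self.step
        while count > 0 and count not in self.cache:
            count -= self.step
        if count > 0:
            # Equivalent of positioning the cursor on the cached key.
            pos = bisect.bisect_left(self.keys, self.cache[count])
        else:
            pos = 0  # equivalent of moving the cursor to the first record
        # Walk forward one record at a time, recording checkpoints.
        while count < recno:
            pos += 1
            count += 1
            self.steps_walked += 1
            if count % self.step == 0:
                self.cache[count] = self.keys[pos]
        return self.keys[pos]
```

A first lookup of record 2500 walks 2500 records and leaves checkpoints at 1000 and 2000, so a later lookup of record 2600 only walks 600 records. Such a cache would have to be invalidated or adjusted when records are added or removed, which this sketch does not address.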

bulk operation

There is no support for bulk read operations (not very surprising because read operations are pretty fast anyway). The interesting point is that we could avoid the copy overhead for bulk operations (because in bdb the returned data is stored in a local buffer and is no longer used once the cursor is released), so:


Here are the available values

Here is what OpenLDAP monitors:

| Attribute | Description | IMHO Notes |
|-----------|-------------|------------|
| olmDbDirectory | Path name of the directory where the database environment resides | should not be a monitored value but a config one |
| olmMDBPagesMax | Maximum number of pages | |
| olmMDBPagesUsed | Number of pages in use | |
| olmMDBPagesFree | Number of free pages | |
| olmMDBReadersMax | Maximum number of readers | Is also a config attribute |
| olmMDBReadersUsed | Number of readers in use | |
| olmMDBEntries | Number of entries in DB | |

Config entry

Entry: cn=bdb,cn=config,cn=ldbm database,cn=plugins,cn=config

Parameters similar to the bdb ones:

| Name | Default Value | Comment |
|------|---------------|---------|
| nsslapd-db-home-directory | <INSTALLDIR>/var/lib/dirsrv/slapd-<INSTANCE>/db | |
| nsslapd-search-bypass-filter-test | on | More a backend parameter than a bdb one |
| nsslapd-serial-lock | on | More a backend parameter than a bdb one |

mdb specific parameters:

| Name | Default Value | Comment |
|------|---------------|---------|
| nsslapd-mdb-max-size | 0 | 0 means disk remaining size (when creating the db). Supported value: a number followed by a suffix (typically M/G/T). Note: the value is rounded down to a multiple of 10 MB |
| nsslapd-mdb-max-readers | 0 | 0 means number of working threads + 30 |
| nsslapd-mdb-max-dbs | 128 | |
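
The handling of nsslapd-mdb-max-size described above can be sketched as follows. Both function names are hypothetical, and the sketch assumes binary multipliers (1M = 1024 * 1024 bytes), which the text does not specify.

```python
import shutil

# Assumption: binary units, so "10 MB" is taken as 10 * 1024 * 1024 bytes.
ROUND_UNIT = 10 * 1024 * 1024
SUFFIXES = {"M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse values such as '0', '2048', '500M', '20G' or '1T' into bytes."""
    text = text.strip().upper()
    if not text:
        raise ValueError("empty size")
    multiplier = 1
    if text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    if not text.isdigit():
        raise ValueError(f"invalid size: {text!r}")
    return int(text) * multiplier


def effective_max_size(configured: str, db_dir: str = ".") -> int:
    """Apply the '0 means remaining disk size' and rounding rules."""
    size = parse_size(configured)
    if size == 0:
        # 0 means: use what is left on the disk holding the db directory.
        size = shutil.disk_usage(db_dir).free
    # Round down to a multiple of 10 MB.
    return (size // ROUND_UNIT) * ROUND_UNIT
```

For example, a configured value of 25M is rounded down to 20 MB, and the result would then be used as the lmdb map size.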


Ideas about future improvements

These are raw ideas (that would need some benefit/cost evaluation)

Last modified on 6 April 2021