The Berkley Database: Difference between revisions

Content deleted Content added

Inline

Latest revision as of 23:05, 5 May 2009

Introduction to Berkeley DB

Berkeley DB is an open source library which provides high performance embedded data management with various programming languages. This library provides a simple function-call API for data access and management. The Berkeley DB originated at the University of California, Berkeley. From the beginning it has been distributed under an open source licensing agreement; the complete source code for is freely available for download and use.

Berkeley DB provides an embedded database management because it directly links into the application itself and runs in the same address space. This eliminates the need to inter-process communications in the internal machine and between external machines over the network. The cost of maintaining a communications between processes is replaced with the cost of making function calls since the latter one is much less costly. Once Berkeley DB is linked to the application the end user generally does not know that there's a database present at all.

Berkeley DB provides a simple function-call API for the programming languages like C, C++, Java, Perl, Tcl, Python, and PHP. For that any system built in these language can easily reference the Berkeley DB library manipulate the database. The library handles all of the database operations, including low-level services like locking, transaction logging, shared buffer management and memory management. This enables multiple processes or threads to use the database at the same time.

Data Management with Berkeley DB

Berkeley DB provides a relatively simple data management when compared to the commonly used modern database management software products.

All the records in Berkeley DB is as treated as key-value pairs in which value part is simply the payload of the key. Berkeley DB only operates on the key part of the record. There are only a few logical record-operations can be performed in Berkeley DB. These are:

   * Inserting a record
   * Deleting a record
   * Finding a record by its key
   * Updating a record

There is no specific record format in Berkeley DB. Key-value pairs are byte strings that can be either in fixed or variable length. Database developers can put data structures into the database before converting them to any record format. However, in order to perform storage and retrieval operations applications should know what the structure of key and value is. That is because Berkeley DB does not holds this information and should always be fed by the application. Even if Berkeley DB has this disadvantage of not providing the programmer the information on the contents or the structure of the values, it literally is limitless on the data types that can be stored in a Berkeley DB database. Berkeley is able to work on any data type determined by the programmer no matter how complex it is.

In Berkeley DB the size of the keys and values can be up to four gigabytes for that a single record can store images, audio/video streams or other large data types. Management of this larger values requires no specific management. They are simply broken into page-sized chunks, and reassembled on demand when needed.

What Berkeley DB is not

Berkeley DB is not a relational database it does not support SQL queries. Data access can only be managed by the Berkeley DB API function calls.

In relational databases, users simply can write queries in a high level language to reach the data stored in the database since database knows everything about the content and the structure of the data. This makes database management simpler and eliminates to need of programming. However, if the programmer can be supplied with enough information about how an application will access data, writing a program to manage the database considerably fastens the operations, since the overhead of query parsing, optimization, and execution is removed. This puts the workload on the programmer however when the program is written, application performs much faster.

Berkeley DB does not holds a schema in the way that relational databases do. In relational databases, schema provides data about the tables and the relationships between tables. However in Berkeley DB, there is no such a storage since every single table is treated as a database and relations between tables should be maintained by the programmer.

Berkeley DB does not know about the structure of the value part of the record for that cannot divide the value part into its constituent parts. In order to use the data stored in different parts of the value an application, which knows the data structures, should be provided. Unlike relational databases, Berkeley DB does not support indexing on the tables or any automatic management is not provided. If the programmer needs indexing s/he should implement the routines responsible for index management.

Relational databases high-level database access on the other hand Berkeley DB is a high-performance, transactional library for data storage. It is possible for the programmer to built a relational system on the top of Berkeley after creating routines managing all the relations between different record types.

Berkeley DB is not a standalone database server, it is a library running in the same address space of the application that it is called by. However, this does not prevents different applications using the same database at the same time since the library itself handles all the threads and coordination among different applications. Berkeley DB guarantees that different applications linked with the same database do not interfere each others work.

Berkeley DB can be used to build a data management server application. For example, many Lightweight Directory Access Protocol (LDAP) servers uses Berkeley DB for record storage. When LDAP clients connect to these servers to ask for records servers make Berkeley DB API calls to find records and return them to the clients. However, again it is programmer to implement the code to perform these server operations.

Access Methods in Berkeley DB

Berkeley DB provides four access methods which are Btree, Hash, Queue and Recno.

* Btree: This is a method of sorted, balances tree structure. All the operations in the tree take O(log base_b N) time where base_b is the average number of keys per page, and N is the total number of keys stored.

* Hash: This access method is an implementation of Extended Linear Hashing.

* Queue: This access method stores fixed-length records where the logical record numbers are the keys.

* Recno: This access method stores both fixed and variable-length records where the logical record numbers are the keys.

Selecting a proper access method is an important part of database management in Berkeley DB since different access methods may have different efficiency results among different applications.

Most applications using Berkeley DB chooses between Btree or Hash access methods or between Queue and Recno. These two pairs mostly provides similar functionality.

The Btree and Hash access methods should be used when logical record numbers are not the primary key used for data access. Btrees store keys in sorted manner for that there is a relationship determined by that sort order. For that, the Btree access method should be used when there is any local relation among keys.

The performance of the Hash and Btree access methods are mostly similar on small data sets, However when a data set becomes larger, the Hash access method can perform better since it contains less metadata pages than Btree databases. In Btree the metadata pages can begin to dominate the cache.

The Queue or Recno access methods are prefered when logical record numbers are the primary key used for data access. Queue access method provides record level locking and so that it supports considerably higher levels of concurrency the Recno. On the other, hand Recno access method supports variable record length and databases having a permanent flat text file storage.

Berkeley DB Databases

In Berkeley DB a database is a collection of records(key-value) pairs. Unlike high level database management systems Berkeley DB handles each collection of records as a single database. For that, when we say database we are referring to a single table of records. A database is like a table having two columns one for key one for value. Both of these columns are are managed by DBT structures. These structures are used to putting, getting and deleting records from the database they are assigned.

Opening and Closing Databases

In order to open a database, first a DB handle should be initialized with the function db_create(). After that you can open the database by calling the method open(). DB does not creates databases if they do not already exist in order to specify this behavior you should use the flag DB_CREATE flag on the open() method.

Below is a an example code on opening a Berkeley DB database:

    DB *dbp;			/* Database structure handle */
    u_int32_t flags;		/* Database open flags */
    int ret			/* Return value to check error */

    /*Initialize the DB */
    ret = db_create(&dbp, NULL, 0);
    if(ret != 0)
    { 
       /* if creating db handle failed print an error message */    
    }

    /* if database does not exist create it */
    flags = DB_CREATE;
    /* Open the database */    
    ret = dbp->open(dbp,     	/* DB structure pointer */
                       NULL,           /* Transaction pointer */
                       DBDBCELLMBR,	/* File that holds the database. */
                       NULL,           /* Optional logical database name */
                       DBTREE,         /* Database access method */
                       flags,          /* Open flags */
                       0);             /* Default file mode (using defaults) */
       if(ret != 0)
       {
           printf(MSG05);       /* if opening database failed print an error message */
           return 5;
       }

In order to close a database the DB handle close method should be called:

       ret = dbp->close(dbp, 0);
       if (ret != 0)
       {
         printf(MSG01);	/* if closing dbp failed print an error message */
         return 1;			/* Return error value of the message*/
       }

Database Open Flags

Below is a list of some of the flags that you may want to use at database open:

   * DB_CREATE: If the database does not currently exists, create it.
   * DB_EXCL: Create database exclusively. If the database the database already exists open fails. This flag is only meaningful when used with DB_CREATE.
   * DB_RDONLY: Opens the database only for read operations. Any database write attempt fails.
   * DB_TRUNCATE: Empty the on-disk file that contains the database. All databases physically contained in that file are deleted.

Databases in Environments

Environments are widely used in many Berkeley DB applications. Since all the tables are treated as different databases in Berkeley DB a different structure is needed to encapsulate multiple databases. When the databases are linked with an environment they are created in the environment's home directory. It is important to note that an environment directory must be created before any open attempts. Below is an example of a database management in environments:

       DB_ENV *db_env;        /* Environment structure handle */
       DB *dbp;               /* Database structure handle */
       u_int32_t flags;       /* Database open flags */
       u_int32_t env_flags    /* Env open flags */

       /* Create environment */
       ret = db_env_create(&db_env, 0);
       if(ret != 0)
       {
           printf(MSG02);	/* If creating environment failed print an error message */
           return 2;		/* Return error value of the message*/
       }

       /* set environment flags */
       env_flags = DB_CREATE |				/* If the environment does not exist, create it. */
                   DB_INIT_MPOOL;			/* Initialize the in-memory cache. */

       /* Open environment */
       ret = db_env->open(db_env,	/* DB_ENV ptr */
                          DBNOMENV,	/* env home directory */
                          env_flags,	/* Open flags */
                          0);          /* File mode (default) */
       if((ret != 0)
       {
           printf(MSG03);      /* If opening environment failed print an error message */
           return 3;
       }
       /* Create database handler */
       ret = db_create(&dbp, db_env, 0);
       if(ret != 0)
       {
         printf(MSG04);	/* if creating db handle failed print an error message */
         return 4;				/* Return error value of the message*/
       }
       
       flags = DB_CREATE;		/* if database does not exist create it*/
       ret = dbp->open(dbp,     	/* DB structure pointer */
                       NULL,           /* Transaction pointer */
                       DBDBCELLMBR,	/* File that holds the database. */
                       NULL,           /* Optional logical database name */
                       DBTREE,         /* Database access method */
                       flags,          /* Open flags */
                       0);             /* Default file mode (using defaults) */
       if(ret != 0)
       {
           printf(MSG05);       /* if opening database failed print an error message */
           return 5;
       }

Berkeley DB Database Records

In Berkeley DB each record is composed of key and value pairs. Both of these pairs requires DBT structures to enable the reach to data. Certain operations -using DBT structures- on database records are detailed in the following sections.

Writing Records to Database

Berkeley DB is able to store any kind of data structure in a key-value format. For that, it is up to the programmer to determine the record structures. However the common usage is to use structures to clarify the fields of records. The examples used in this document are all in C language for that we will keep on our examples using C structures.

First we need to define the structures that will form our records in the database. Below is a structure that holds information about function records:

       typedef struct _function
              {
              int	f_id;		/* Uniuqe function ID */	
              char	name[LNAME];	/* Name of the function */ 
              char	alias[LALIAS];	/* Alias for the function */
              char	description[LDESC];	/* Description of the function */ 
              int	nparams;	/* Number of parameters of the function */ 
              }s_function;

Below is the example code that can be used to write this record. Note data before performing any operations on records database should be opened successfully.

         DB_ENV *db_env_f;
         DB *dbp_f;
         DBT key, data;    /* DBT structures*/

         /* Open environment and database */
         
         /* Zero out the DBTs before using them. */
         memset(&key, 0, sizeof(DBT));
         memset(&data, 0, sizeof(DBT));
         
         key.data =&(func.name);  /* set the key of the record in this case function name 
         key.size = LNAME;
         
         data.data = &func;  /*set the value part of the record in this case the function itself */
         data.size = sizeof(s_function);

         /* Put record to the database */ 
         ret = dbp_f->put(dbp_f, NULL, &key, &data, DB_NOOVERWRITE); 
         if(ret != 0)
         {
               printf(MSG01);	/* If writing record failed print an error message */ 
               return 1;		/* Return error value of the message*/
         }
         /* Close database and environment */

Reading Records from Database

One way to retrieve records from a database is to use DB->get(). If the database supports duplicate records this method, by default, only returns the first record with the key match. Retrieving multiple records in duplicate databases will be explained in the section "Reading Records with Cursors". Below is an example code to read the record that we wrote to the database in the previous section.

    /* Declare structures holdig the records to be add */
    s_function	 func;	/* The structure holding function record */
    /* DB Structures */
    DB_ENV *db_env_f;
    DB *dbp_f;
    DBT key, data;
    char *fname = "Boltzman";  /* The key that is stored for that record */	
   
    /* Open environment and database */

    /* Initialize the C structure */
    memset(&func, 0, sizeof(s_function));
    /* Zero out the DBTs before using them. */
    memset(&key, 0, sizeof(DBT)); 
    memset(&data, 0, sizeof(DBT));

    strncpy(func.name, fname, strlen(fname)+1); /* initialize the record key */
    key.data = &func.name;     /* record key is function name */
    key.size = LNAME;
    data.data = &func;         /* record data is the structure itself */
    data.ulen = sizeof(s_function); /* data size should be set to the size of the retrieved data*/
    data.flags = DB_DBT_USERMEM;

    /* Read the function from the database */
    ret = dbp_f->get(dbp_f, NULL, &key, &data,0);  
    if (ret != 0){
        printf(MSG01);	        /* If reading record failed print an error message */
        return 1;		/* Return error value of the message*
    }
    /* Close database and environment */

If data with the specified key is found this method returns 0, if not it returns DB_NOTFOUND.

Deleting Records from the Database

You can use the method DB->del() to delete records from the database. If your database supports duplicates then this method will delete all the records with the key match. The more efficient way of deleting duplicate records is to use cursor. The section Deleting Records with Cursors will explain the cursor behavior on this issue. Below is an example code to delete the record that we previously read and write.

    /* DB Structures */
    DB_ENV *db_env;
    DB *dbp;
    DBT key, data;
    char *fname = "Boltzman";  /* The key that is stored for that record */	
   
    /* Open environment and database */

    memset(&key, 0, sizeof(DBT));
    key.data =&fname;
    key.size = LNAME;  /*note that LNAME is a predefined structure holding the length of name string*/ 
    ret = dbptr->del(dbptr, NULL, &key, 0);
    if (ret != 0){
               printf(MSG01);	/* deleting record from database failed print error message */
               return 1;		/* return error value */
    }
    /* Close database and environment */

Duplicate Record Management and Cursors

Cursor are the structures that enables you to iterate over the records in a database. In databases allowing duplicity the regular put, get methods only allows access to the first record with matching key however cursors provide an easy way to iterate over duplicate records in. This section describes the use of cursor in duplicate databases. It is also important to note that duplicate records are supported only with Btree or Hash access methods.

Allowing Duplicates in a Database

In Berkeley DB cursors are managed using DBC structures and to get a cursor for the DB handle DB->cursor() method should be called. Below is an example code on opening cursors.

Say we have a new record type parameter and it has a field function ID which is the key parameter records. Since a function may have multiple parameters the database of parameters should allow duplicate records.

       typedef struct _param{
               int 	f_id;		/* Function ID that parameter belongs to*/
               char	name[LNAME];	/* Name of the parameter */	
               Treal 	def_val;	/* Default value of the parameter */
               char	unit[LUNIT];	/* Units of the parameter */
               }s_param;

A database should be configured to support duplicates only at the database creation time. And this is done by specifying the duplicate flags to DB->set_flags() before the database is opened for the first time. Below is a list of flags that can be used to set database to support duplicates:

     * DB_DUP: The database supports non-sorted duplicate records.
     * DB_DUPSORT: The database supports sorted duplicate records.

Below is a code sample of a database configuration which supports sorted duplicate records:

       DB_ENV *db_env;        /* Environment structure handle */
       DB *kdbp;               /* Database structure handle */
       u_int32_t flags;       /* Database open flags */
       u_int32_t env_flags    /* Env open flags */

       /* Create environment */
       ret = db_env_create(&db_env, 0);
       if(ret != 0)
       {
           printf(MSG02);	/* If creating environment failed print an error message */
           return 2;		/* Return error value of the message*/
       }

       /* set environment flags */
       env_flags = DB_CREATE |			/* If the environment does not exist, create it. */
                   DB_INIT_MPOOL;		/* Initialize the in-memory cache. */

       /* Open environment */
       ret = db_env->open(db_env,	/* DB_ENV ptr */
                          DBNOMENV,	/* env home directory */
                          env_flags,	/* Open flags */
                          0);          /* File mode (default) */
       if((ret != 0)
       {
           printf(MSG03);      /* If opening environment failed print an error message */
           return 3;
       }
       ret = db_create(&kdbp, db_env, 0);
       if(ret != 0){ 
           printf(MSG04); /* if creating db handle failed print an error message */
           return 4;	  /* Return error value of the message*/
           }
       /* Configure the database for sorted duplicates */
       ret = kdbp->set_flags(kdbp, DB_DUPSORT);
       if(ret != 0){
          printf(MSG06);	/* if setting duplicate records failed print an error message */
          kdbp->close(kdbp, 0);
          return 6;	     /* Return error value of the message*/
          }
      /* Open the database determined  */
      ret = kdbp->open(kdbp,    /* DB structure pointer */
                       NULL,	 /* Transaction pointer */
                       DBPARAM, /* File that holds the database. */
                       NULL,	 /* Optional logical database name */
                       DB_BTREE,	/* Database access method */
                       flags,   	/* Open flags */
                       0);		/* Default file mode (using defaults) */

Opening and Closing Cursors

In order to use a cursor DB->cursor() method should be called. Below is an example code on how to open cursors and close cursors. before a opening cursor the database should be opened first and cursors should always be closed before closing the database.

     /* open the environment and the database */

     DBC *cursor_p;   /* Cursor needed to write duplicate parameter records */
     /* Get a cursor */
     ret = dbp_p->cursor(dbp_p, NULL, &cursor_p, 0);
     if(ret != 0){
          printf(MSG07);	/* getting cursor failed print error message */
          return 7;		/* return error value */
     }

     /* Close the cursor */
     if(cursor_p != NULL)
         cursor_p->close(cursor_p);
     /* Close the database and the environment*/

Writing Records with Cursors

Cursors provide a way of iterating over data in duplicate and non-duplicate databases. Moreover, using cursors are the easiest solution to the data management for duplicate records.The behavior of the cursor in writing is determined by the flags used in the put() method. Below is a list of flags that can be used for the DBC->put() method.

   * DB_NODUPDATA: This is for data sets supporting sorted duplicates. 
     if the key already exists this the put() returns DB_KEYEXIST. 
     If not not the record is written with the provided sort method. 
     If there is no sorting supplied to the database lexicographical sorting is used.
     This flag can only be used for the BTree and Hash access methods
   * DB_KEYFIRST: In the database supporting sorted duplicates, the record is added 
    to the location determined by the sorting method. 
     If no sort function specified then the record is added as the first of the records 
    sharing the same key.
   * DB_KEYLAST: Behaves just like DB_KEYFIRST. However, this time if no sorting function
    specified then the record is added as the last of the records sharing the same key.

When adding records into the database the cursor is the cursor is positioned at the record being inserted. Below is a simple code can be used to write duplicate records using cursors. Note that, there is no sorting method supplied for that the Berkeley DB will use lexicographical sorting as we set our Parameters database to support duplicate records in the section Allowing Duplicates in a Database.

     DB_ENV *db_env_p;
     DB *dbp_p;
     DBC *cursor_p;

     /* Open the environment and the database */
     
     /* Get a cursor */

     /* write each parameter  into the database */
     for(i = 0; i<nparams ; i++)
         {
              /* Zero out the DBTs before using them. */
              memset(&key, 0, sizeof(DBT));	
              memset(&data, 0, sizeof(DBT));
              
              key.data =&(params[i].f_id);	/* parameter key is the function ID */
              key.size = sizeof(int);	
              data.data = &params[i];
              data.size = sizeof(s_param);

              /* Put record to the database */
              ret = cursor_p->put(cursor_p, &key, &data, DB_KEYFIRST); 
              if(ret != 0){
                  printf(MSG04);   /* If writing record failed print an error message */
                  return 4;	   /* Return error value of the message*/
                } 
          }

Reading Records with Cursors

Berkeley DB cursor provide to iterate over records and read them with the matching key.Cursor are the solution in duplicate databases to read the records sharing the same key. Using cursor, you can perform a search based on key, or key and data or partial matches with key in duplicate data sets. Again, there are several flags determining the behavior of the cursor over the records. Below is a list of some of them.

   * DB_SET: Moves the cursor to the first record in the database with the key specified.
   * DB_SET_RANGE: Moves the cursor to the first record in the database whose key is greater 
   than or equal to the key specified. The comparison of keys is done with lexicographical sorting
   if any other sorting method is not provided.
   * DB_GET_BOTH: Moves the cursor to the first record in the database that uses the   
   specified key and data.
   * DB_GET_BOTH_RANGE: Moves the cursor to the first record in the database whose key     
   matches the specified key and whose data is greater than or equal to the specified data. In 
   duplicate data sets the cursor is moved to the duplicate record with the smallest data that is 
   greater than or equal to the data specified.

In order to read duplicate records from a Berkeley DB database again DBC->get() method should be called however there are several flags that can provide the desired behavior on duplicate data sets. Below is a list of some of them.

   * DB_NEXT, DB_PREV: Shows the next/previous record in the database, regardless of         
   whether it is a duplicate or not.
   * DB_NEXT_NODUP, DB_PREV_NODUP: Gets the next/previous non-duplicate record in the 
   database. So that you can skip the duplicate records.
   * DB_NEXT_DUP: Gets the next duplicate record. If the cursor is positioned at the 
   duplicate record then the get method returns DB_NOTFOUND.

Below is an example code on reading duplicate records. Here, we aim to read the duplicate parameter records that we write into the database in the previous section.

  include <db.h>
  /* include other header files required */

      DB_ENV *db_env_p;
      DB *dbp_p;
      DBT key, data;
      DBC *cursor_p;
      int f_id, int ret, int nparams;
      s_param	params[NPARAMS];	/* The list holding steady state  function parameters */

     /* initialize a value to the f_id that is the key of the parameters you are searching for*/
     /* initialize a value to the nparams for number of parameters you want to read */
 
     /* open the environment and the database */
     /* get a cursor */

     /* Set up our DBTs */
     key.data = &f_id;
     key.size = sizeof(int);
     
     /* Position the cursor to the first record with the key match*/
     ret = cursor_p->get(cursor_p, &key, &data, DB_SET);

     for(i=0; i<nparams ; i++ )
     {
          /* Zero out the DBTs before using them. */	
          memset(&data, 0, sizeof(DBT));
          data.data = &params[i];
          data.ulen = sizeof(s_param);
          data.flags = DB_DBT_USERMEM;
          /* read  record */
          ret = cursor_p->get(cursor_p, &key, &data, DB_NEXT_DUP);
          if (ret != 0){
               printf(MSG02,i);	/* If reading record failed print an error message */
               return 2;		/* Return error value of the message*/
          }  
          printf("Name: %s\n",params[i].name);
          printf("Default Value: %f\n",params[i].def_val);
          printf("Unit: %s\n",params[i].unit);
     }

Deleting Records with Cursors

In order to delete a record using a cursors just position the cursor to the record that you want to delete and call DBC->del().

  include <db.h>
  /* include other header files required */

      DB_ENV *db_env_p;
      DB *dbp_p;
      DBT key, data;
      DBC *cursor_p;
      int f_id, int ret;
      s_param	params[NPARAMS];	/* The list holding steady state  function parameters */

     /* initialize a value to the f_id that is the key of the parameters you want to delete */
     /* open the environment and the database */
     /* get a cursor */

     /* Zero out the DBTs before using them. */	
     memset(&key, 0, sizeof(DBT));
     memset(&data, 0, sizeof(DBT));

     /* initialize DBTs */
     key.data = &f_id;
     key.size = sizeof(int);

     /* Iterate over the database and deleting each record */
    while ((ret = cursor_p->get(cursor_p, &key, &data, DB_SET)) == 0){
          cursor_p->del(cursor_p, 0);
    }
    /* Close the cursor */
    /* close the database and environment */