The Berkley Database
Revision as of 02:30, 4 May 2009
Introduction to Berkeley DB
Berkeley DB is an open source library that provides high-performance embedded data management for a variety of programming languages. The library offers a simple function-call API for data access and management. Berkeley DB originated at the University of California, Berkeley, and has been distributed under an open source license from the beginning, so the complete source code is freely available for download and use.
Berkeley DB is called an embedded database because it links directly into the application and runs in the same address space. This eliminates the need for inter-process communication, both within a single machine and between machines over a network. The cost of communicating between processes is replaced by the much lower cost of making function calls. Once Berkeley DB is linked into the application, the end user generally does not know that a database is present at all.
Berkeley DB provides a simple function-call API for programming languages such as C, C++, Java, Perl, Tcl, Python, and PHP, so any system built in these languages can link against the Berkeley DB library to manage its data. The library handles all database operations internally, including low-level services such as locking, transaction logging, shared buffer management, and memory management. This lets multiple processes or threads use the database at the same time.
Data Management with Berkeley DB
Berkeley DB provides relatively simple data management compared with commonly used modern database management products.
All records in Berkeley DB are treated as key-value pairs, in which the value is simply the payload of the key. Berkeley DB operates only on the key part of the record. Only a few logical record operations can be performed:
* Inserting a record
* Deleting a record
* Finding a record by its key
* Updating a record
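The four operations map naturally onto a dictionary-like interface. The sketch below is plain Python and is not the Berkeley DB API; the class `TinyStore` and its method names are hypothetical and only illustrate the key-to-payload model with byte strings.

```python
class TinyStore:
    """In-memory stand-in for a key-value store; keys and values are bytes."""

    def __init__(self):
        self._records = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Insert a new record, or update it if the key already exists.
        self._records[key] = value

    def get(self, key: bytes):
        # Find a record by its key; returns None when the key is absent.
        return self._records.get(key)

    def delete(self, key: bytes) -> bool:
        # Delete a record; returns True if something was removed.
        return self._records.pop(key, None) is not None


store = TinyStore()
store.put(b"user:1", b"Ada")        # insert
store.put(b"user:1", b"Ada L.")     # update
found = store.get(b"user:1")        # find by key
removed = store.delete(b"user:1")   # delete
```

Note that the store never inspects the value: it is an opaque payload attached to the key.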
There is no specific record format in Berkeley DB. Keys and values are byte strings of either fixed or variable length. Developers can store data structures in the database without first converting them to a particular record format. However, to perform storage and retrieval operations, the application must know the structure of the key and the value, because Berkeley DB does not hold this information and must always be given it by the application. Although Berkeley DB cannot tell the programmer anything about the contents or structure of the values, it places no limit on the data types that can be stored in a database. Berkeley DB can work with any data type the programmer defines, no matter how complex.
In Berkeley DB, keys and values can each be up to four gigabytes in size, so a single record can store images, audio or video streams, or other large data types. Large values need no special handling. They are simply broken into page-sized chunks and reassembled on demand.
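As an illustration of the application owning the record layout, the following Python sketch packs a structure into a byte string and unpacks it again with the standard `struct` module. The layout `RECORD_FORMAT` and the functions `encode_account` and `decode_account` are hypothetical names chosen for this example; nothing here is part of Berkeley DB.

```python
import struct

# Hypothetical layout chosen by the application (little-endian, no padding):
# 4-byte unsigned id, 8-byte float balance, 16-byte null-padded name.
RECORD_FORMAT = "<Id16s"


def encode_account(account_id: int, balance: float, name: str) -> bytes:
    # The store would only ever see the resulting opaque byte string.
    return struct.pack(RECORD_FORMAT, account_id, balance, name.encode("utf-8"))


def decode_account(payload: bytes):
    # Only code that knows RECORD_FORMAT can recover the fields.
    account_id, balance, raw_name = struct.unpack(RECORD_FORMAT, payload)
    return account_id, balance, raw_name.rstrip(b"\x00").decode("utf-8")


payload = encode_account(7, 12.5, "Ada")
decoded = decode_account(payload)
```

If the layout changes, every reader and writer of the value has to change with it, since the database itself keeps no description of the fields.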
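The idea of splitting a large value into page-sized pieces and joining them on demand can be sketched in a few lines of Python. This is only a conceptual illustration, not a description of how Berkeley DB lays out its pages; the 4096-byte `PAGE_SIZE` and both function names are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_into_chunks(value: bytes, page_size: int = PAGE_SIZE):
    # Cut the value into consecutive slices of at most page_size bytes.
    return [value[i:i + page_size] for i in range(0, len(value), page_size)]


def reassemble(chunks):
    # Concatenate the slices in order to rebuild the original value.
    return b"".join(chunks)


big_value = bytes(10_000)              # 10,000 zero bytes standing in for an image
chunks = split_into_chunks(big_value)  # two full pages and one partial page
restored = reassemble(chunks)
```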
What Berkeley DB is not
Berkeley DB is not a relational database and does not support SQL queries. Data can be accessed only through Berkeley DB API function calls.
In relational databases, users can write queries in a high-level language to reach the stored data, because the database knows the content and structure of the data. This makes database management simpler and removes the need for programming. However, if the programmer knows enough about how an application will access its data, writing a program to manage the database makes operations considerably faster, since the overhead of query parsing, optimization, and execution is removed. This puts more work on the programmer, but once the program is written, the application runs much faster.
Berkeley DB does not hold a schema in the way that relational databases do. In relational databases, the schema describes the tables and the relationships between them. Berkeley DB has no such store: every table is treated as a separate database, and relationships between tables must be maintained by the programmer.
Berkeley DB does not know the structure of the value part of a record, so it cannot divide the value into its constituent parts. To use the data stored in different parts of the value, the application must know the data structures involved. Unlike relational databases, Berkeley DB does not automatically index the contents of values. If the programmer needs an index on part of the value, he or she must implement the routines that extract the indexed field and maintain the index.
Relational databases provide high-level database access; Berkeley DB, on the other hand, is a high-performance, transactional library for data storage. A programmer can build a relational system on top of Berkeley DB by writing routines that manage the relationships between different record types.
Berkeley DB is not a standalone database server. It is a library that runs in the address space of the application that calls it. This does not prevent different applications from using the same database at the same time, since the library itself coordinates the threads and processes involved. Berkeley DB guarantees that different applications sharing the same database do not interfere with each other's work.
Berkeley DB can be used to build a data management server application. For example, many Lightweight Directory Access Protocol (LDAP) servers use Berkeley DB for record storage. When LDAP clients connect to these servers to ask for records, the servers make Berkeley DB API calls to find the records and return them to the clients. Again, it is up to the programmer to implement the code that performs these server operations.
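An application-maintained index of this kind might look like the following Python sketch, which uses two plain dictionaries in place of two databases. The record layout `LAYOUT`, the dictionaries `primary` and `by_department`, and both functions are hypothetical; the point is only that the application, which knows where the department field sits inside the value, keeps the index consistent on every write.

```python
import struct

LAYOUT = "<8s8s"      # hypothetical value layout: 8-byte name, 8-byte department

primary = {}          # employee id -> packed record
by_department = {}    # department -> set of employee ids (the index)


def delete_employee(emp_id: bytes) -> None:
    payload = primary.pop(emp_id)
    # Decode the value to learn which index entry must be removed.
    _, raw_dept = struct.unpack(LAYOUT, payload)
    department = raw_dept.rstrip(b"\x00").decode()
    by_department[department].discard(emp_id)


def put_employee(emp_id: bytes, name: str, department: str) -> None:
    if emp_id in primary:
        delete_employee(emp_id)   # drop the stale index entry on update
    primary[emp_id] = struct.pack(LAYOUT, name.encode(), department.encode())
    by_department.setdefault(department, set()).add(emp_id)


put_employee(b"e1", "Ada", "sales")
put_employee(b"e2", "Bob", "sales")
put_employee(b"e3", "Cy", "ops")
put_employee(b"e1", "Ada", "ops")   # update moves e1 from sales to ops
```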
Access Methods in Berkeley DB
Berkeley DB provides four access methods: Btree, Hash, Queue, and Recno.
* Btree: This access method is a sorted, balanced tree structure. Operations on the tree take O(log_b N) time, where b is the average number of keys per page and N is the total number of keys stored.
* Hash: This access method is an implementation of Extended Linear Hashing.
* Queue: This access method stores fixed-length records where the logical record numbers are the keys.
* Recno: This access method stores both fixed and variable-length records where the logical record numbers are the keys.
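To get a feel for the logarithmic bound quoted for Btree, the following Python sketch estimates how many levels a tree needs under the idealized assumption that every page holds exactly the same number of keys. The function `estimated_levels` is a hypothetical helper for illustration and does not model a real Btree implementation.

```python
def estimated_levels(total_keys: int, keys_per_page: int) -> int:
    """Smallest h such that keys_per_page ** h >= total_keys."""
    levels = 1
    capacity = keys_per_page
    # Each extra level multiplies the number of reachable keys by keys_per_page.
    while capacity < total_keys:
        capacity *= keys_per_page
        levels += 1
    return levels


small = estimated_levels(1_000_000, 100)
larger = estimated_levels(1_000_001, 100)
```

With 100 keys per page, growing the data set a hundredfold adds only one level, which is why lookups stay cheap as the database grows.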
Selecting a suitable access method is an important part of database management in Berkeley DB, since access methods differ in efficiency from one application to another.
Most applications using Berkeley DB choose between the Btree and Hash access methods, or between Queue and Recno. The two methods in each pair provide broadly similar functionality.
The Btree and Hash access methods should be used when logical record numbers are not the primary key used for data access. Btrees store keys in sorted order, so keys that are close in that order are stored close together. For this reason, the Btree access method should be used when there is any locality of reference among keys.
The performance of the Hash and Btree access methods is mostly similar on small data sets. However, as a data set grows, the Hash access method can perform better because it contains fewer metadata pages than a Btree database. In a Btree, the metadata pages can begin to dominate the cache.
The Queue and Recno access methods are preferred when logical record numbers are the primary key used for data access. The Queue access method provides record-level locking and therefore supports considerably higher levels of concurrency than Recno. On the other hand, the Recno access method supports variable-length records and databases backed by a permanent flat text file.
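Why sorted order matters can be seen in a small Python sketch: when keys are kept sorted, all keys in a range sit next to each other and can be found with two binary searches. The sample keys and the function `range_scan` are assumptions for illustration, and a sorted list only stands in for the ordering a Btree maintains.

```python
import bisect

sorted_keys = sorted([b"banana", b"apple", b"cherry", b"apricot", b"blueberry"])


def range_scan(keys, low: bytes, high: bytes):
    """Return the keys k with low <= k < high from an already sorted list."""
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_left(keys, high)
    return keys[start:stop]


a_keys = range_scan(sorted_keys, b"a", b"b")
```

A hash-based layout scatters neighbouring keys, so the same query would have to examine every key.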
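Access by logical record number over fixed-length records can be illustrated with a short Python sketch in which a record's position follows directly from its number. This is a simplified model and not Berkeley DB's on-disk format; the class `FixedLengthRecords`, the 8-byte `RECORD_LENGTH`, and the null-byte padding are assumptions.

```python
RECORD_LENGTH = 8  # assumed fixed record length in bytes


class FixedLengthRecords:
    """Append-only store of fixed-length records keyed by 1-based record number."""

    def __init__(self, record_length: int = RECORD_LENGTH):
        self.record_length = record_length
        self._data = bytearray()

    def append(self, value: bytes) -> int:
        if len(value) > self.record_length:
            raise ValueError("value longer than the fixed record length")
        # Pad short values so every record occupies the same number of bytes.
        self._data += value.ljust(self.record_length, b"\x00")
        return len(self._data) // self.record_length

    def get(self, recno: int) -> bytes:
        # The record number alone determines where the record starts.
        offset = (recno - 1) * self.record_length
        if recno < 1 or offset >= len(self._data):
            raise KeyError(recno)
        return bytes(self._data[offset:offset + self.record_length])


records = FixedLengthRecords()
first = records.append(b"first")
second = records.append(b"second")
fetched = records.get(2)
```

Variable-length records, as supported by Recno, cannot be located by this arithmetic alone, which is one reason the two methods make different trade-offs.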