Database
A database is an integrated collection of logically-related records or files consolidated into a common pool that provides data for one or more multiple uses. One way of classifying databases involves the type of content, for example: bibliographic, full-text, numeric, image. Other classification methods start from examining database models or database architectures: see below. Software organizes the data in a database according to a database model. As of 2010 the relational model occurs most commonly. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.
Architecture
A number of database architectures exist. Many databases use a combination of strategies.
Databases consist of software-based "containers" that are structured to collect and store information so users can retrieve, add, update or remove such information in an automatic fashion. Database programs are designed for users so that they can add or delete any information needed. The structure of a database is tabular, consisting of rows and columns of information.
Online Transaction Processing systems (OLTP) often use a "row oriented" or an "object oriented" data store architecture, whereas data-warehouse and other retrieval focused applications like Google's BigTable, or bibliographic database (library catalog) systems may use a Column oriented DBMS architecture.
Document-Oriented, XML, knowledgebases, as well as frame databases and RDF-stores (also known as triple stores), may also use a combination of these architectures in their implementation.
Not all databases have or need a database schema ("schema-less databases").
Over many years general-purpose database systems have dominated the database industry. These offer a wide range of functions, applicable to many, if not most circumstances in modern data processing. These have been enhanced with extensible datatypes (pioneered in the PostgreSQL project) to allow development of a very wide range of applications.
There are also other types of databases which cannot be classified as relational databases. Most notable is the object database management system, which stores language objects natively without using a separate data definition language and without translating into a separate storage schema. Unlike relational systems, these object databases store the relationship between complex data types as part of their storage model in a way that does not require runtime calculation of related data using relational algebra execution algorithms.
Database management systems
Main article: Database management system
A database management system (DBMS) consists of software that organizes the storage of data. A DBMS controls the creation, maintenance, and use of the database storage structures of social organizations and of their users. It allows organizations to place control of organization wide database development in the hands of Database Administrators (DBAs) and other specialists. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.
Database management systems are usually categorized according to the database model that they support, such as the network, relational or object model. The model tends to determine the query languages that are available to access the database. One commonly used query language for the relational database is SQL, although SQL syntax and function can vary from one DBMS to another. A common query language for the object database is OQL, although not all vendors of object databases implement this, majority of them do implement this method. A great deal of the internal engineering of a DBMS is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between the products.
A relational database management system (RDBMS) implements features of the relational model. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables." This contrasts with the object database management system (ODBMS), which does store explicit pointers between related types.
Components of DBMS
According to the wikibooks open-content textbooks, "Design of Main Memory Database System/Overview of DBMS", most DBMS as of 2009 implement a relational model. Other less-used DBMS systems, such as the object DBMS, generally operate in areas of application-specific data management where performance and scalability take higher priority than the flexibility of ad hoc query capabilities provided via the relational-algebra execution algorithms of a relational DBMS.
RDBMS components
• Interface drivers - A user or application program initiates either schema modification or content modification. These drivers[which?] are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc. Examples include DDL, DCL, DML, ODBC, and JDBC. Some vendors provide language-specific proprietary interfaces. For example MySQL provides drivers for PHP, Python, etc.
• SQL engine - This component interprets and executes the SQL query. It comprises three major components (compiler, optimizer, and execution engine).
• Transaction engine - Transactions are sequences of operations that read or write database elements, which are grouped together.
• Relational engine - Relational objects such as Table, Index, and Referential integrity constraints are implemented in this component.
• Storage engine - This component stores and retrieves data records. It also provides a mechanism to store metadata and control information such as undo logs, redo logs, lock tables, etc.
[edit] ODBMS components
• Language drivers - A user or application program initiates either schema modification or content modification via the chosen programming language. The drivers then provide the mechanism to manage object lifecycle coupling of the application memory space with the underlying persistent storage. Examples include C++, Java, .NET, and Ruby.
• Query engine - This component interprets and executes language-specific query commands in the form of OQL, LINQ, JDOQL, JPAQL, others. The query engine returns language specific collections of objects which satisfy a query predicate expressed as logical operators e.g. >, <, >=, <=, AND, OR, NOT, GroupBY, etc.
• Transaction engine - Transactions are sequences of operations that read or write database elements, which are grouped together. The transaction engine is concerned with such things as data isolation and consistency in the driver cache and data volumes by coordinating with the storage engine.
• Storage engine - This component stores and retrieves objects in an arbitrarily complex model. It also provides a mechanism to manage and store metadata and control information such as undo logs, redo logs, lock graphs,
Primary tasks of DBMS packages
• Database Development: used to define and organize the content, relationships, and structure of the data needed to build a database.
• Database Interrogation: can access the data in a database for information retrieval and report generation. End users can selectively retrieve and display information and produce printed reports and documents.
• Database Maintenance: used to add, delete, update, correct, and protect the data in a database.
• Application Development: used to develop prototypes of data entry screens, queries, forms, reports, tables, and labels for a prototyped application. Or use 4GL or 4th Generation Language or application generator to develop program codes.
Types
Operational database
These databases store detailed data needed to support the operations of an entire organization. They are also called subject-area databases (SADB), transaction databases, and production databases. For example:
• customer databases
• personal database
• inventory databases
• accounting databases
Analytical database
These databases store data and information extracted from selected operational and external databases. They consist of summarized data and information most needed by an organization's management and other[which?] end-users. Some people refer to analytical databases as multidimensional databases, management databases, or information databases.
Data warehouse
A data warehouse stores data from current and previous years — data extracted from the various operational databases of an organization. It becomes the central source of data that has been screened, edited, standardized and integrated so that it can be used by managers and other end-user professionals throughout an organization. Data warehouses are characterized by being slow to insert into but fast to retrieve from. Recent developments in data warehousing have led to the use of a Shared nothing architecture to facilitate extreme scaling.
Distributed database
These are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include segments of both common operational and common user databases, as well as data generated and used only at a user’s own site.
End-user database
These databases consist of a variety of data files developed by end-users at their workstations. Examples of these are collections of documents in spreadsheets, word processing and even downloaded files.
External database
These databases provide access to external, privately-owned data online — available for a fee to end-users and organizations from commercial services. Access to a wealth of information from external database is available for a fee from commercial online services and with or without charge from many sources in the Internet.
Hypermedia databases on the web
These are a set of interconnected multimedia pages at a web-site. They consist of a home page and other hyperlinked pages[citation needed] of multimedia or mixed media such as text, graphic, photographic images, video clips, audio etc.
Navigational database
In navigational databases, queries find objects primarily by following references from other objects. Traditionally navigational interfaces are procedural, though one could characterize some modern systems like XPath as being simultaneously navigational and declarative.
In-memory databases
In-memory databases primarily rely on main memory for computer data storage. This contrasts with database management systems which employ a disk-based storage mechanism. Main memory databases are faster than disk-optimized databases since[citation needed] the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory provides faster and more predictable performance than disk. In applications where response time is critical, such as telecommunications network equipment that operates emergency systems, main memory databases are often used.
Document-oriented databases
Main article: Document-oriented database
Document-oriented databases are computer programs designed for document-oriented applications. These systems may be implemented as a layer above a relational database or an object database. As opposed to relational databases, document-based databases do not store data in tables with uniform sized fields for each record. Instead, they store each record as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data.
Real-time databases
A real-time database is a processing system designed to handle workloads whose state may change constantly. This differs from traditional databases containing persistent data, mostly unaffected by time. For example, a stock market changes rapidly and dynamically. Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away. Real-time databases are useful for accounting, banking, law, medical records, multi-media, process control, reservation systems, and scientific data analysis. As computers increase in power and can store more data, real-time databases become integrated into society and are employed in many applications.
A database is an integrated collection of logically-related records or files consolidated into a common pool that provides data for one or more multiple uses. One way of classifying databases involves the type of content, for example: bibliographic, full-text, numeric, image. Other classification methods start from examining database models or database architectures: see below. Software organizes the data in a database according to a database model. As of 2010 the relational model occurs most commonly. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.
Architecture
A number of database architectures exist. Many databases use a combination of strategies.
Databases consist of software-based "containers" that are structured to collect and store information so users can retrieve, add, update or remove such information in an automatic fashion. Database programs are designed for users so that they can add or delete any information needed. The structure of a database is tabular, consisting of rows and columns of information.
Online Transaction Processing systems (OLTP) often use a "row oriented" or an "object oriented" data store architecture, whereas data-warehouse and other retrieval focused applications like Google's BigTable, or bibliographic database (library catalog) systems may use a Column oriented DBMS architecture.
Document-Oriented, XML, knowledgebases, as well as frame databases and RDF-stores (also known as triple stores), may also use a combination of these architectures in their implementation.
Not all databases have or need a database schema ("schema-less databases").
Over many years general-purpose database systems have dominated the database industry. These offer a wide range of functions, applicable to many, if not most circumstances in modern data processing. These have been enhanced with extensible datatypes (pioneered in the PostgreSQL project) to allow development of a very wide range of applications.
There are also other types of databases which cannot be classified as relational databases. Most notable is the object database management system, which stores language objects natively without using a separate data definition language and without translating into a separate storage schema. Unlike relational systems, these object databases store the relationship between complex data types as part of their storage model in a way that does not require runtime calculation of related data using relational algebra execution algorithms.
Database management systems
Main article: Database management system
A database management system (DBMS) consists of software that organizes the storage of data. A DBMS controls the creation, maintenance, and use of the database storage structures of social organizations and of their users. It allows organizations to place control of organization wide database development in the hands of Database Administrators (DBAs) and other specialists. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.
Database management systems are usually categorized according to the database model that they support, such as the network, relational or object model. The model tends to determine the query languages that are available to access the database. One commonly used query language for the relational database is SQL, although SQL syntax and function can vary from one DBMS to another. A common query language for the object database is OQL, although not all vendors of object databases implement this, majority of them do implement this method. A great deal of the internal engineering of a DBMS is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between the products.
A relational database management system (RDBMS) implements features of the relational model. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables." This contrasts with the object database management system (ODBMS), which does store explicit pointers between related types.
Components of DBMS
According to the wikibooks open-content textbooks, "Design of Main Memory Database System/Overview of DBMS", most DBMS as of 2009 implement a relational model. Other less-used DBMS systems, such as the object DBMS, generally operate in areas of application-specific data management where performance and scalability take higher priority than the flexibility of ad hoc query capabilities provided via the relational-algebra execution algorithms of a relational DBMS.
RDBMS components
• Interface drivers - A user or application program initiates either schema modification or content modification. These drivers[which?] are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc. Examples include DDL, DCL, DML, ODBC, and JDBC. Some vendors provide language-specific proprietary interfaces. For example MySQL provides drivers for PHP, Python, etc.
• SQL engine - This component interprets and executes the SQL query. It comprises three major components (compiler, optimizer, and execution engine).
• Transaction engine - Transactions are sequences of operations that read or write database elements, which are grouped together.
• Relational engine - Relational objects such as Table, Index, and Referential integrity constraints are implemented in this component.
• Storage engine - This component stores and retrieves data records. It also provides a mechanism to store metadata and control information such as undo logs, redo logs, lock tables, etc.
[edit] ODBMS components
• Language drivers - A user or application program initiates either schema modification or content modification via the chosen programming language. The drivers then provide the mechanism to manage object lifecycle coupling of the application memory space with the underlying persistent storage. Examples include C++, Java, .NET, and Ruby.
• Query engine - This component interprets and executes language-specific query commands in the form of OQL, LINQ, JDOQL, JPAQL, others. The query engine returns language specific collections of objects which satisfy a query predicate expressed as logical operators e.g. >, <, >=, <=, AND, OR, NOT, GroupBY, etc.
• Transaction engine - Transactions are sequences of operations that read or write database elements, which are grouped together. The transaction engine is concerned with such things as data isolation and consistency in the driver cache and data volumes by coordinating with the storage engine.
• Storage engine - This component stores and retrieves objects in an arbitrarily complex model. It also provides a mechanism to manage and store metadata and control information such as undo logs, redo logs, lock graphs,
Primary tasks of DBMS packages
• Database Development: used to define and organize the content, relationships, and structure of the data needed to build a database.
• Database Interrogation: can access the data in a database for information retrieval and report generation. End users can selectively retrieve and display information and produce printed reports and documents.
• Database Maintenance: used to add, delete, update, correct, and protect the data in a database.
• Application Development: used to develop prototypes of data entry screens, queries, forms, reports, tables, and labels for a prototyped application. Or use 4GL or 4th Generation Language or application generator to develop program codes.
Types
Operational database
These databases store detailed data needed to support the operations of an entire organization. They are also called subject-area databases (SADB), transaction databases, and production databases. For example:
• customer databases
• personal database
• inventory databases
• accounting databases
Analytical database
These databases store data and information extracted from selected operational and external databases. They consist of summarized data and information most needed by an organization's management and other[which?] end-users. Some people refer to analytical databases as multidimensional databases, management databases, or information databases.
Data warehouse
A data warehouse stores data from current and previous years — data extracted from the various operational databases of an organization. It becomes the central source of data that has been screened, edited, standardized and integrated so that it can be used by managers and other end-user professionals throughout an organization. Data warehouses are characterized by being slow to insert into but fast to retrieve from. Recent developments in data warehousing have led to the use of a Shared nothing architecture to facilitate extreme scaling.
Distributed database
These are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include segments of both common operational and common user databases, as well as data generated and used only at a user’s own site.
End-user database
These databases consist of a variety of data files developed by end-users at their workstations. Examples of these are collections of documents in spreadsheets, word processing and even downloaded files.
External database
These databases provide access to external, privately-owned data online — available for a fee to end-users and organizations from commercial services. Access to a wealth of information from external database is available for a fee from commercial online services and with or without charge from many sources in the Internet.
Hypermedia databases on the web
These are a set of interconnected multimedia pages at a web-site. They consist of a home page and other hyperlinked pages[citation needed] of multimedia or mixed media such as text, graphic, photographic images, video clips, audio etc.
Navigational database
In navigational databases, queries find objects primarily by following references from other objects. Traditionally navigational interfaces are procedural, though one could characterize some modern systems like XPath as being simultaneously navigational and declarative.
In-memory databases
In-memory databases primarily rely on main memory for computer data storage. This contrasts with database management systems which employ a disk-based storage mechanism. Main memory databases are faster than disk-optimized databases since[citation needed] the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory provides faster and more predictable performance than disk. In applications where response time is critical, such as telecommunications network equipment that operates emergency systems, main memory databases are often used.
Document-oriented databases
Main article: Document-oriented database
Document-oriented databases are computer programs designed for document-oriented applications. These systems may be implemented as a layer above a relational database or an object database. As opposed to relational databases, document-based databases do not store data in tables with uniform sized fields for each record. Instead, they store each record as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data.
Real-time databases
A real-time database is a processing system designed to handle workloads whose state may change constantly. This differs from traditional databases containing persistent data, mostly unaffected by time. For example, a stock market changes rapidly and dynamically. Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away. Real-time databases are useful for accounting, banking, law, medical records, multi-media, process control, reservation systems, and scientific data analysis. As computers increase in power and can store more data, real-time databases become integrated into society and are employed in many applications.