Types of database models
File systems of varying degrees of sophistication satisfied the need for information storage and processing for several years. However, large enterprises tended to build many independent files containing related and even overlapping data, and data-processing activities frequently required the linking of data from several files. It was natural, then, to design data structures and database management systems that supported the automatic linkage of files. Three database models were developed to support the linkage of records of different types. These are: (1) the hierarchical model, in which record types are linked in a treelike structure (e.g., employee records might be grouped under a record describing the departments in which employees work); (2) the network model, in which arbitrary linkages of record types may be created (e.g., employee records might be linked on one hand to employees’ departments and on the other hand to their supervisors—that is, other employees); and (3) the relational model, in which all data are represented in simple tabular form.
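As a rough sketch (the department and employee names below are invented), the same facts might be arranged as follows under each of the three models, with ordinary Python structures standing in for the record types that an actual DBMS would manage:

```python
# Hierarchical model: employee records grouped under a parent department record.
hierarchical = {
    "Sales":       {"employees": [{"id": 101, "name": "Ada"},
                                  {"id": 102, "name": "Boris"}]},
    "Engineering": {"employees": [{"id": 103, "name": "Chen"}]},
}

# Network model: a record may carry arbitrary linkages, here to a department
# and to a supervisor (another employee record).
network = {
    101: {"name": "Ada",   "dept": "Sales",       "supervisor": None},
    102: {"name": "Boris", "dept": "Sales",       "supervisor": 101},
    103: {"name": "Chen",  "dept": "Engineering", "supervisor": 101},
}

# Relational model: all data represented as simple tables of rows (tuples).
employee_table = [
    (101, "Ada",   "Sales"),
    (102, "Boris", "Sales"),
    (103, "Chen",  "Engineering"),
]
```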
In the relational model, the description of a particular entity is provided by the set of its attribute values, stored as one row of the table, or relation. This linkage of n attribute values to provide a meaningful description of a real-world entity or a relationship among such entities forms a mathematical n-tuple; in database terminology, it is simply called a tuple. The relational approach also supports queries (requests for information) that involve several tables by providing automatic linkage across tables by means of a “join” operation that combines records with identical values of common attributes. Payroll data, for example, could be stored in one table and personnel benefits data in another; complete information on an employee could be obtained by joining the tables on the employee’s identification number. To support any of these database structures, a large piece of software known as a database management system (DBMS) is required to handle the storage and retrieval of data (via the file management system, since the data are physically stored as files on magnetic disk) and to provide the user with commands to query and update the database. The relational approach is currently the most popular, as older hierarchical data management systems, such as IMS, the information management system produced by IBM, are being replaced by relational database management systems such as IBM’s large mainframe system DB2 or the Oracle Corporation’s DBMS, which runs on large servers. Relational DBMS software is also available for workstations and personal computers.
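A minimal sketch of the payroll-and-benefits join described above, using Python's built-in sqlite3 module (the table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")      # a throwaway in-memory database
conn.executescript("""
    CREATE TABLE payroll  (emp_id INTEGER, salary INTEGER);
    CREATE TABLE benefits (emp_id INTEGER, plan TEXT);
    INSERT INTO payroll  VALUES (101, 52000), (102, 61000);
    INSERT INTO benefits VALUES (101, 'HMO'), (102, 'PPO');
""")

# The join combines rows whose common attribute (emp_id) has identical values,
# giving complete information about each employee from both tables.
rows = conn.execute("""
    SELECT payroll.emp_id, salary, plan
    FROM payroll JOIN benefits ON payroll.emp_id = benefits.emp_id
""").fetchall()
print(rows)   # [(101, 52000, 'HMO'), (102, 61000, 'PPO')]
```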
The need for more powerful and flexible data models to support nonbusiness applications (e.g., scientific or engineering applications) has led to extended relational data models in which table entries need not be simple values but can be programs, text, unstructured data in the form of binary large objects (BLOBs), or any other format the user requires. Another development has been the incorporation of the object concept that has become significant in programming languages. In object-oriented databases, all data are objects. Objects may be linked together by an “is-part-of” relationship to represent larger, composite objects. Data describing a truck, for instance, may be stored as a composite of a particular engine, chassis, drive train, and so forth. Classes of objects may form a hierarchy in which individual objects may inherit properties from objects farther up in the hierarchy. For example, objects of the class “motorized vehicle” all have an engine; members of subclasses such as “truck” or “airplane” will then also have an engine. Furthermore, engines are also data objects, and the engine attribute of a particular vehicle will be a link to a specific engine object. Multimedia databases, in which voice, music, and video are stored along with the traditional textual information, are becoming increasingly important and also are providing an impetus toward viewing data as objects, as are databases of pictorial images such as photographs or maps. The future of database technology is generally perceived to be a merging of the relational and object-oriented views.
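A sketch of the composite-object and inheritance ideas, again with invented classes; in an object-oriented database these objects would be stored and linked by the DBMS rather than merely built in memory:

```python
class Engine:
    def __init__(self, horsepower: int):
        self.horsepower = horsepower

class MotorizedVehicle:
    # Every motorized vehicle has an engine; the attribute is a link to a
    # separate Engine object (an "is-part-of" relationship), not an embedded value.
    def __init__(self, engine: Engine):
        self.engine = engine

class Truck(MotorizedVehicle):
    # A subclass inherits the engine property and adds parts of its own.
    def __init__(self, engine: Engine, chassis: str):
        super().__init__(engine)
        self.chassis = chassis

diesel = Engine(horsepower=400)
rig = Truck(diesel, chassis="flatbed")
print(rig.engine.horsepower)   # 400, reached through the is-part-of link
```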
Data integrity
Integrity is a major database issue. In general, integrity refers to maintaining the correctness and consistency of the data. Some integrity checking is made possible by specifying the data type of an item. For example, if an identification number is specified to be nine digits, the DBMS may reject an update attempting to assign a value with more or fewer digits or one including an alphabetic character. Another type of integrity, known as referential integrity, requires that an entity referenced by the data for some other entity must itself exist in the database. For example, if an airline reservation is requested for a particular flight number, then the flight referenced by that number must actually exist. Integrity constraints may also limit the values of data items to specified ranges (to prevent the famous “computer errors” of the type in which a $10 check is accidentally issued as $10,000); such range checks were long left to the application program, although modern database management systems typically allow them to be declared and enforced in the database itself.
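The flight-reservation and range examples above can be expressed directly as declared constraints; a sketch using Python's sqlite3 module (the table and column names are invented, and note that SQLite enforces foreign keys only when asked to):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces these only on request
conn.executescript("""
    CREATE TABLE flight (flight_no TEXT PRIMARY KEY);
    CREATE TABLE reservation (
        res_id    INTEGER PRIMARY KEY,
        flight_no TEXT REFERENCES flight(flight_no),        -- referential integrity
        fare      INTEGER CHECK (fare BETWEEN 1 AND 10000)  -- range constraint
    );
    INSERT INTO flight VALUES ('BA117');
""")

conn.execute("INSERT INTO reservation VALUES (1, 'BA117', 450)")    # accepted

try:
    # Rejected: the referenced flight does not exist in the database.
    conn.execute("INSERT INTO reservation VALUES (2, 'ZZ999', 450)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```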
Access to a database by multiple simultaneous users requires that the DBMS include a concurrency control mechanism to maintain the consistency of the data in spite of the possibility that one user's updates may interfere with another's. For example, two travel agents may try to book the last seat on a plane at more or less the same time. Without concurrency control, both may think they have succeeded, while only one booking is actually entered into the database. A key concept in studying concurrency control and the maintenance of database correctness is the transaction, defined as a sequence of operations on the data that transform the database from one consistent state into another. To illustrate the importance of this concept, consider the simple example of an electronic transfer of funds (say $5) from bank account A to account B. The operation that deducts $5 from account A leaves the database inconsistent in that the total over all accounts is $5 short. Similarly, the operation that adds $5 to account B, taken by itself, makes the total $5 too much. Combining the two operations, however, yields a valid transaction. The key to maintaining database correctness is therefore to ensure that only complete transactions are applied to the data and that multiple concurrent transactions are executed (under a concurrency control mechanism) in such a way that some serial order of the same transactions would produce the same results. A transaction-oriented control mechanism for database access becomes difficult in the case of so-called long transactions—for example, when several engineers are working, perhaps over the course of several days, on a product design that may not reach a consistent state until the project is complete. The best approach to handling long transactions is a current area of database research.
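A sketch of the funds-transfer transaction using sqlite3, whose connection object commits both updates together or rolls both back (the account names and balances are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

def transfer(amount: int) -> None:
    # Both updates are applied together or not at all, so the database moves
    # from one consistent state to another; the total is never left $5 short.
    try:
        with conn:   # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                         (amount,))
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                         (amount,))
    except sqlite3.Error:
        pass   # the rollback leaves the database in its prior consistent state

transfer(5)
print(conn.execute("SELECT SUM(balance) FROM account").fetchone())   # (200,)
```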
As discussed above, databases may be distributed, in the sense that data reside at different host computers on a network. Distributed data may or may not be replicated, but in any case the concurrency-control problem is magnified. Distributed databases must have a distributed database management system to provide overall control of queries and updates in a manner that ideally does not require that the user know the location of the data. This ideal situation, in which various databases fall under the unified control of a distributed DBMS, has been slow to arrive, held back both by technical problems and by such practical obstacles as heterogeneous hardware and software and database owners who desire local autonomy. Attention has therefore turned increasingly to more loosely linked collections of data, known by such names as multidatabases or federated databases. A closely related concept is interoperability: the ability of a user of one member of a group of disparate systems (all providing similar functionality) to work with any system in the group with equal ease and through the same interface. For database management systems, interoperability means that users can formulate queries to any one of a group of independent, autonomous database management systems in the same language, are given a unified view of the contents of all the individual databases, can pose queries that require fetching data through more than one of the systems, and can update data stored under any member of the group. Many of the problems of distributed databases are the problems of distributed systems in general. Thus distributed databases may be designed as client-server systems, with middleware easing the heterogeneity problems.
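As a loose, single-machine analogy of that unified view (not a true distributed DBMS), SQLite can attach a second, separately stored database so that one query, in one language, fetches data held under both; the table names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS personnel")   # a second, separate database

conn.execute("CREATE TABLE payroll (emp_id INTEGER, salary INTEGER)")
conn.execute("CREATE TABLE personnel.staff (emp_id INTEGER, name TEXT)")
conn.execute("INSERT INTO payroll VALUES (101, 52000)")
conn.execute("INSERT INTO personnel.staff VALUES (101, 'Ada')")

# One query spans both databases; the user need not know which database
# holds which table.
print(conn.execute("""
    SELECT staff.name, payroll.salary
    FROM personnel.staff AS staff
    JOIN payroll ON payroll.emp_id = staff.emp_id
""").fetchall())   # [('Ada', 52000)]
```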
Database security
Security is another important database issue. Data residing on a computer is under threat of being stolen, destroyed, or modified maliciously. This is true whenever the computer is accessible to multiple users but is particularly significant when the computer is accessible over a network. The first line of defense is to allow access to a computer only to authorized, trusted users and to authenticate those users by a password or similar mechanism. But clever programmers have learned how to evade such mechanisms, designing, for example, so-called computer viruses—programs that replicate themselves and spread among the computers in a network, “infecting” systems and potentially destroying files. Data can be stolen by devices such as “Trojan horses”—programs that carry out some useful task but contain hidden malicious code—or by simply eavesdropping on network communications. The need to protect sensitive data (e.g., for national security) has led to extensive research in cryptography and the development of encryption standards for providing a high level of confidence that the data is safe from decoding by even the most powerful computer attacks. The term computer theft, however, usually refers not to theft of information from a computer but rather to theft by use of a computer, typically by modifying data. If a bank’s records are not adequately secure, for example, someone could set up a false account and transfer money into it from valid accounts for later withdrawal.
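A minimal sketch of the password-based first line of defense, using Python's standard hashlib; the helper names are invented, and a real system would add many further safeguards:

```python
import hashlib, hmac, os

def make_password_record(password: str) -> tuple[bytes, bytes]:
    # Store only a salted hash, never the password itself, so a stolen
    # record does not reveal the password.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def authenticate(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

salt, digest = make_password_record("correct horse battery staple")
print(authenticate("correct horse battery staple", salt, digest))   # True
print(authenticate("letmein", salt, digest))                        # False
```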