Information systems and databases
File storage
Computers have been used since the 1950s for the storage and processing of data. An important point to note is that the main memory of a computer provides only temporary storage; any data stored in main memory is lost when the power is turned off. For the permanent storage of data, one must turn to auxiliary storage, primarily magnetic and optical media such as tapes, disks, and CDs. Data is stored on such media but must be read into main memory for processing. A major goal of information-system designers has been to develop software to locate specific data on auxiliary storage and read it efficiently into main memory for processing. The underlying structure of an information system is a set of files stored permanently on some secondary storage device. The software that comprises a file management system supports the logical breakdown of a file into records. Each record describes some thing (or entity) and consists of a number of fields, where each field gives the value of some property (or attribute) of the entity. A simple file of records is adequate for uncomplicated business data, such as an inventory of a grocery store or a collection of customer accounts.
Early file systems were always sequential, meaning that the successive records had to be processed in the order in which they were stored, starting from the beginning and proceeding down to the end. This file structure was appropriate and was in fact the only one possible when files were stored solely on large reels of magnetic tape and skipping around to access random data was not feasible. Sequential files are generally stored in some sorted order (e.g., alphabetic) for printing of reports (e.g., a telephone directory) and for efficient processing of batches of transactions. Banking transactions (deposits and withdrawals), for instance, might be sorted in the same order as the accounts file, so that as each transaction is read the system need only scan ahead (never backward) to find the accounts record to which it applies.
When so-called direct-access storage devices (DASDs; primarily magnetic disks) were developed, it became possible to access a random data block on the disk. (A data block is the unit of transfer between main memory and auxiliary storage and usually consists of several records.) Files can then be indexed so that an arbitrary record can be located and fetched (loaded into the main memory). An index of a file is much like an index of a book; it consists of a listing of identifiers that distinguish the records (e.g., names might be used to identify personnel records), along with the records’ locations. Since indexes might be long, they are usually structured in some hierarchical fashion and are navigated by using pointers, which are identifiers that contain the address (location in memory) of some item. The top level of an index, for example, might contain locations of (point to) indexes to items beginning with the letters A, B, etc. The A index itself may contain not locations of data items but pointers to indexes of items beginning with the letters Ab, Ac, and so on. Reaching the final pointer to the desired record by traversing such a treelike structure is quite rapid. File systems making use of indexes can be either purely indexed, in which case the records need be in no particular order and every individual record must have an index entry that points to the record’s location, or they can be “indexed-sequential.” In this case a sort order of the records as well as of the indexes is maintained, and index entries need only give the location of a block of sequentially ordered records. Searching for a particular record in a file is aided by maintaining secondary indexes on arbitrary attributes as well as by maintaining a primary index on the same attribute on which the file is sorted. For example, a personnel file may be sorted on (and maintain a primary index on) employee identification numbers, but it might also maintain indexes on names and departments. An indexed-sequential file system supports not only file search and manipulation commands of both a sequential and index-based nature but also the automatic creation of indexes.