Computer Programming - Overview
Why Programming?
You may already have used software, perhaps for word processing or spreadsheets, to solve problems. Perhaps now you are curious to learn how programmers write software. A program is a set of step-by-step instructions that directs the computer to do the tasks you want it to do and produce the results you want.
There are at least three good reasons for learning programming:
A set of rules that provides a way of telling a computer what operations to perform is called a programming language. There is not, however, just one programming language; there are many. In this chapter you will learn about controlling a computer through the process of programming. You may even discover that you want to become a programmer.
An important point before we proceed: You will not be a programmer when you finish reading this chapter or even when you finish reading the final chapter. Programming proficiency takes practice and training beyond the scope of this book. However, you will become acquainted with how programmers develop solutions to a variety of problems.
What Programmers Do
In general, the programmer's job is to convert problem solutions into instructions for the computer. That is, the programmer prepares the instructions of a computer program and runs those instructions on the computer, tests the program to see if it is working properly, and makes corrections to the program. The programmer also writes a report on the program. These activities are all done for the purpose of helping a user fill a need, such as paying employees, billing customers, or admitting students to college.
The programming activities just described could be done, perhaps, as solo activities, but a programmer typically interacts with a variety of people. For example, if a program is part of a system of several programs, the programmer coordinates with other programmers to make sure that the programs fit together well. If you were a programmer, you might also have coordination meetings with users, managers, systems analysts, and with peers who evaluate your work, just as you evaluate theirs.
Let us turn to the programming process.
The Programming Process
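Developing a program involves steps similar to any problem-solving task. There are five main ingredients in the programming process: defining the problem, planning the solution, coding the program, testing the program, and documenting the program.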
Let us discuss each of these in turn.
Defining the Problem
Planning the Solution
Coding the Program
Testing the Program
Documenting the Program
Programming as a Career
There is a shortage of qualified personnel in the computer field. Before you join its ranks, consider the advantages of the computer field and what it takes to succeed in it.
The Joys of the Field
Although many people make career changes into the computer field, few choose to leave it. In fact, surveys of computer professionals, especially programmers, consistently report a high level of job satisfaction. There are several reasons for this contentment. One is the challenge: most jobs in the computer industry are not routine. Another is security, since established computer professionals can usually find work. And that work pays well; you will probably not be rich, but you should be comfortable. The computer industry has historically been a rewarding place for women and minorities. And, finally, the industry holds endless fascination since it is always changing.
What It Takes
You need, of course, some credentials, most often a two- or four-year degree in computer information systems or computer science. The requirements and salaries vary by the organization and the region, so we will not dwell on these here. Beyond that, the person most likely to land a job and move up the career ladder is the one with excellent communication skills, both oral and written. These are also the qualities that can be observed by potential employers in an interview. Promotions are sometimes tied to advanced degrees (an M.B.A. or an M.S. in computer science).
Open Doors
The overall outlook for the computer field is promising. The Bureau of Labor Statistics shows, through the year 2010, a 72 percent increase in programmers and a 69 percent increase in system use today, and we will discuss the most popular ones later in the chapter. Before we turn to specific languages, however, we need to discuss levels of language.
Levels of Language
Programming languages are said to be "lower" or "higher," depending on how close they are to the language the computer itself uses (0s and 1s: low) or to the language people use (more English-like: high). We will consider five levels of language. They are numbered 1 through 5 to correspond to levels, or generations. In terms of ease of use and capabilities, each generation is an improvement over its predecessors. The five generations of languages are machine languages, assembly languages, high-level languages, very high-level languages, and natural languages.
Let us look at each of these categories.
Machine Language
Humans do not like to deal in numbers alone; they prefer letters and words. But, strictly speaking, machine language consists of nothing but numbers. This lowest level of language, machine language, represents data and program instructions as 1s and 0s, binary digits corresponding to the on and off electrical states in the computer. Each type of computer has its own machine language. In the early days of computing, programmers had rudimentary systems for combining numbers to represent instructions such as add and compare. Primitive by today's standards, the programs were not convenient for people to read and use. The computer industry quickly moved to develop assembly languages.
Assembly Languages
Today, assembly languages are considered very low level; that is, they are not as convenient for people to use as more recent languages. At the time they were developed, however, they were considered a great leap forward. To replace the 1s and 0s used in machine language, assembly languages use mnemonic codes, abbreviations that are easy to remember: A for Add, C for Compare, MP for Multiply, STO for storing information in memory, and so on. Although these codes are not English words, they are still, from the standpoint of human convenience, preferable to numbers (0s and 1s) alone. Furthermore, assembly languages permit the use of names, perhaps RATE or TOTAL, for memory locations instead of actual address numbers. As with machine language, each type of computer has its own assembly language.
The programmer who uses an assembly language requires a translator to convert the assembly language program into machine language. A translator is needed because machine language is the only language the computer can actually execute. The translator is an assembler program, also referred to as an assembler. It takes the programs written in assembly language and turns them into machine language. Programmers need not worry about the translating aspect; they need only write programs in assembly language. The translation is taken care of by the assembler.
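The assembler's job can be illustrated with a toy. The Python sketch below translates the mnemonics A, C, MP, and STO into binary words and replaces names such as RATE and TOTAL with address numbers. The 4-bit operation codes, the 8-bit word format, and the memory layout are all hypothetical, invented for this example; every real machine defines its own.

```python
# Toy assembler. The operation codes and the word format are hypothetical,
# invented only to illustrate mnemonic-to-binary translation.
OPCODES = {
    "A": 0b0001,    # Add a memory word to the accumulator
    "C": 0b0010,    # Compare
    "MP": 0b0011,   # Multiply
    "STO": 0b0100,  # Store the accumulator into memory
}


def assemble(source, symbols):
    """Translate lines such as 'MP RATE' into 8-bit machine words.

    Each word is a 4-bit opcode followed by a 4-bit memory address;
    `symbols` maps names such as RATE or TOTAL to address numbers.
    """
    words = []
    for line in source.strip().splitlines():
        mnemonic, operand = line.split()
        opcode = OPCODES[mnemonic]   # mnemonic -> numeric operation code
        address = symbols[operand]   # symbolic name -> address number
        words.append(f"{opcode:04b}{address:04b}")
    return words


# TOTAL = HOURS * RATE, assuming the accumulator starts at zero.
program = """
A   HOURS
MP  RATE
STO TOTAL
"""
symbols = {"HOURS": 1, "RATE": 2, "TOTAL": 3}  # hypothetical memory layout

for word in assemble(program, symbols):
    print(word)
```

A different type of computer would need a different OPCODES table and word format, which is why each machine has its own assembly language and assembler.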
Although assembly languages represent a step forward, they still have many disadvantages. A key disadvantage is that assembly language is detailed in the extreme, making assembly programming repetitive, tedious, and error prone. This drawback is apparent in the program in Figure 2. Assembly language may be easier to read than machine language, but it is still tedious.
High-Level Languages
The first widespread use of high-level languages in the early 1960s transformed programming into something quite different from what it had been. Programs were written in an English-like manner, thus making them more convenient to use. As a result, a programmer could accomplish more with less effort, and programs could now direct much more complex tasks.
These so-called third-generation languages spurred the great increase in data processing that characterized the 1960s and 1970s. During that time the number of mainframes in use increased from hundreds to tens of thousands. The impact of third-generation languages on our society has been enormous.
Of course, a translator is needed to translate the symbolic statements of a high-level language into computer-executable machine language; this translator is usually a compiler. Each language has many compilers, one for each type of computer. Since the machine language generated by one computer's COBOL compiler, for instance, is not the machine language of some other computer, it is necessary to have a COBOL compiler for each type of computer on which COBOL programs are to be run. Keep in mind, however, that even though a given program would be compiled to different machine language versions on different machines, the source program itself (the COBOL version) can be essentially identical on each machine.
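The idea that one source program yields different machine language on different computers can be sketched with a toy "compiler" in Python. It handles only statements of the form NAME = NAME op NAME, and both target instruction sets (an accumulator machine and a stack machine) are hypothetical stand-ins for two types of computer.

```python
# Toy compiler for one statement form: NAME = NAME op NAME.
# Both target instruction sets are hypothetical.
OPS = {"+": "ADD", "*": "MUL"}


def parse(statement):
    """Split 'TOTAL = RATE * HOURS' into target, left, operation, right."""
    target, expr = (part.strip() for part in statement.split("="))
    left, op, right = expr.split()
    return target, left, OPS[op], right


def compile_accumulator(statement):
    # Machine 1: a single accumulator; instructions name a memory operand.
    target, left, op, right = parse(statement)
    return [f"LOAD {left}", f"{op} {right}", f"STORE {target}"]


def compile_stack(statement):
    # Machine 2: operands are pushed on a stack; arithmetic pops two values.
    target, left, op, right = parse(statement)
    return [f"PUSH {left}", f"PUSH {right}", op, f"POP {target}"]


source = "TOTAL = RATE * HOURS"  # the same source program for both machines
print(compile_accumulator(source))
print(compile_stack(source))
```

The source line never changes; only the translator differs, which mirrors the need for a separate COBOL compiler on each type of computer.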
Some languages are created to serve a specific purpose, such as controlling industrial robots or creating graphics. Many languages, however, are extraordinarily flexible and are considered to be general-purpose. In the past the majority of programming applications were written in BASIC, FORTRAN, or COBOL, all general-purpose languages. In addition to these three, another popular high-level language is C, which we will discuss later.
Very High-Level Languages
Very high-level languages are often known by their generation number; that is, they are called fourth-generation languages or, more simply, 4GLs.
Definition
Will the real fourth-generation languages please stand up? There is no consensus about what constitutes a fourth-generation language. The 4GLs are essentially shorthand programming languages. An operation that requires hundreds of lines in a third-generation language such as COBOL typically requires only five to ten lines in a 4GL. However, beyond the basic criterion of conciseness, 4GLs are difficult to describe.
Characteristics
Fourth-generation languages share some characteristics. The first is that they make a true break with the prior generation: they are basically nonprocedural. A procedural language tells the computer how a task is done: add this, compare that, do this if something is true, and so forth. It is a very specific step-by-step process. The first three generations of languages are all procedural. In a nonprocedural language, the concept changes. Here, users define only what they want the computer to do; the user does not provide the details of just how it is to be done. Obviously, it is a lot easier and faster just to say what you want rather than how to get it. This leads us to the issue of productivity, a key characteristic of fourth-generation languages.
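Python is itself a procedural language, so the following is only an analogy, but it shows the difference in spirit. The first function spells out how to put records in order (an insertion sort); the second states only what is wanted, records ordered by amount, and leaves the method to Python's built-in sorted function. The data and function names are hypothetical.

```python
# Hypothetical sales records: (customer, amount).
sales = [("Brown", 300), ("Adams", 150), ("Cole", 225)]


def sort_procedural(records):
    """Say HOW: an insertion sort, one step at a time."""
    result = list(records)  # work on a copy; leave the input unchanged
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift records with larger amounts one place to the right.
        while j >= 0 and result[j][1] > current[1]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def sort_declarative(records):
    """Say WHAT: 'ordered by amount'; the library decides how."""
    return sorted(records, key=lambda record: record[1])


print(sort_procedural(sales))
print(sort_declarative(sales))
```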
Productivity
Folklore has it that fourth-generation languages can improve productivity by a factor of 5 to 50. The folklore is true. Most experts say the average improvement factor is about 10-that is, you can be ten times more productive in a fourth-generation language than in a third-generation language. Consider this request: Produce a report showing the total units sold for each product, by customer, in each month and year, and with a subtotal for each customer. In addition, each new customer must start on a new page. A 4GL request looks something like this:
TABLE FILE SALES
SUM UNITS BY MONTH BY CUSTOMER BY PRODUCT
ON CUSTOMER SUBTOTAL PAGE BREAK
END
Even though some training is required to do even this much, you can see that it is pretty simple. The third-generation language COBOL, however, typically requires over 500 statements to fulfill the same request. If we define productivity as producing equivalent results in less time, then fourth-generation languages clearly increase productivity.
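To get a feel for the extra work a procedural language demands, here is a minimal Python sketch of one plausible reading of the same request: units totaled by customer, month and year, and product, with a subtotal and a page break for each customer. The record layout and sample data are hypothetical, and the sketch does not try to reproduce the 4GL's exact output format.

```python
from collections import defaultdict

PAGE_BREAK = "\f"  # form feed: starts a new page on many printers


def sales_report(records):
    """Build report lines from hypothetical (customer, product, year, month, units) records."""
    # Step 1: total the units for each (customer, year, month, product).
    totals = defaultdict(int)
    for customer, product, year, month, units in records:
        totals[(customer, year, month, product)] += units

    # Step 2: walk the totals in sorted order, emitting a subtotal and a
    # page break whenever the customer changes.
    lines = []
    current_customer = None
    subtotal = 0
    for (customer, year, month, product), units in sorted(totals.items()):
        if customer != current_customer:
            if current_customer is not None:
                lines.append(f"  SUBTOTAL {current_customer}: {subtotal}")
                lines.append(PAGE_BREAK)
            current_customer, subtotal = customer, 0
            lines.append(f"CUSTOMER {customer}")
        lines.append(f"  {year}-{month:02d} {product}: {units}")
        subtotal += units
    if current_customer is not None:
        lines.append(f"  SUBTOTAL {current_customer}: {subtotal}")
    return lines


sales = [
    ("Acme", "Widget", 2001, 1, 5),
    ("Acme", "Widget", 2001, 1, 3),
    ("Acme", "Gadget", 2001, 2, 4),
    ("Bolt", "Widget", 2001, 1, 7),
]
print("\n".join(sales_report(sales)))
```

Even this simplified version, which skips column headings, alignment, and file handling, takes many times more lines than the four-line 4GL request.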
Downside
Fourth-generation languages are not all peaches and cream and productivity. The 4GLs are still evolving, and that which is still evolving cannot be fully defined or standardized. What is more, since many 4GLs are easy to use, they attract a large number of new users, who may then overcrowd the computer system. One of the main criticisms is that the new languages lack the necessary control and flexibility when it comes to planning how you want the output to look. A common perception of 4GLs is that they do not make efficient use of machine resources; however, the benefits of getting a program finished more quickly can far outweigh the extra costs of running it.
Benefits
Fourth-generation languages are beneficial because
Not long ago, few people believed that 4GLs would ever be able to replace third-generation languages. Today 4GLs are being used, but only in a limited way.
Query Languages
Query languages, a variation on fourth-generation languages, can be used to retrieve information from databases. Data is usually added to databases according to a plan, and planned reports may also be produced. But what about a user who needs an unscheduled report or a report that differs somehow from the standard reports? A user can learn a query language fairly easily and then be able to input a request and receive the resulting report right on his or her own terminal or personal computer. A standardized query language, which can be used with several different commercial database programs, is Structured Query Language, popularly known as SQL. Other popular query languages are Query-by-Example, known as QBE, and Intellect.
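As an illustration of an ad hoc SQL request, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The orders table, its columns, its rows, and the function name regional_totals are hypothetical. The query says what result is wanted, totals by region, and leaves the database engine to work out how to compute them.

```python
import sqlite3


def regional_totals(rows):
    """Load hypothetical order rows into SQLite and run one ad hoc query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (region TEXT, product TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        # Nonprocedural request: no loops, no explicit adding or sorting.
        return conn.execute(
            "SELECT region, SUM(amount) AS total FROM orders "
            "GROUP BY region ORDER BY total DESC"
        ).fetchall()
    finally:
        conn.close()


orders = [
    ("East", "Racket", 120.0),
    ("East", "Balls", 30.0),
    ("West", "Racket", 200.0),
]
for region, total in regional_totals(orders):
    print(region, total)
```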
Natural Languages
The word "natural" has become almost as popular in computing circles as it has in the supermarket. Fifth-generation languages are, as you may guess, even more ill-defined than fourth-generation languages. They are most often called natural languages because of their resemblance to the "natural" spoken English language. And, to the managers new to computers at whom these languages are now aimed, natural means human-like. Instead of being forced to key correct commands and data names in correct order, a manager tells the computer what to do by keying in his or her own words.
A manager can say the same thing any number of ways. For example, "Get me tennis racket sales for January" works just as well as "I want January tennis racket revenues." Such a request may contain misspelled words, lack articles and verbs, and even use slang. The natural language translates human instructions-bad grammar, slang, and all-into code the computer understands. If it is not sure what the user has in mind, it politely asks for further explanation.
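Real natural language systems are far more sophisticated, but a toy keyword matcher can show the basic idea of mapping differently worded requests onto one structured request. In this Python sketch the vocabulary, the structured output format, and the function name interpret are all hypothetical, and unlike the systems described above it does not cope with misspellings or slang.

```python
import re

# Hypothetical vocabulary; a real system would know far more words.
MEASURES = {"sales": "SALES", "revenue": "SALES", "revenues": "SALES"}
PRODUCTS = {"tennis racket": "TENNIS RACKET"}
MONTHS = {"january": 1, "february": 2, "march": 3}  # a few months, for brevity


def interpret(request):
    """Map a free-form request onto one hypothetical structured query."""
    text = request.lower()
    words = re.findall(r"[a-z]+", text)
    # Word order does not matter: each part is found wherever it appears.
    measure = next((MEASURES[w] for w in words if w in MEASURES), None)
    month = next((MONTHS[w] for w in words if w in MONTHS), None)
    product = next(
        (code for phrase, code in PRODUCTS.items() if phrase in text), None
    )
    if measure is None or month is None or product is None:
        # When unsure, ask for clarification instead of guessing.
        return "Please tell me more about what you want."
    return f"SUM {measure} WHERE PRODUCT = '{product}' AND MONTH = {month}"


print(interpret("Get me tennis racket sales for January"))
print(interpret("I want January tennis racket revenues"))
```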
Natural languages are sometimes referred to as knowledge-based languages, because natural languages are used to interact with a base of knowledge on some subject. A system that uses a natural language to access a knowledge base is called a knowledge-based system.
Consider this request that could be given in the 4GL Focus: "SUM ORDERS BY DATE BY REGION." If we alter the request and, still in Focus, say something like "Give me the dates and the regions after you've added up the orders," the computer will spit back the user-friendly version of "You've got to be kidding" and give up. But some natural languages can handle such a request. Users can relax the structure of their requests and increase the freedom of their interaction with the data.
Here is a typical natural language request:
REPORT THE BASE SALARY, COMMISSIONS AND YEARS OF
SERVICE BROKEN DOWN BY STATE AND CITY FOR SALESCLERKS
IN NEW JERSEY AND MASSACHUSETTS.
You can hardly get closer to conversational English than that.
An example of a natural language is shown in Figure 3. Natural languages excel at easy data access. Indeed, the most common application for natural languages is interacting with databases.
Choosing a Language
How do you choose the language with which to write your program?
There are several possibilities:
Major Programming Languages
The following sections on individual languages will give you an overview of the third-generation languages in common use today: FORTRAN (a scientific language), COBOL (a business language), BASIC (a simple language used for education and business), Pascal (education), Ada (military), and C (general-purpose).
This chapter will present programs written in some of these languages. You will also see output produced by each program. Each program is designed to find the average of three numbers; the resulting average is shown in the sample output matching each program. Since all programs perform the same task, you will see some of the differences and similarities among the languages. We do not expect you to understand these programs; they are here merely to let you glimpse each language. Figure 4 presents the flowchart and pseudocode for the task of averaging numbers. As we discuss each language, we will provide a program for averaging numbers that follows the logic shown in this figure.
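Since Figure 4 is not reproduced here, the following Python sketch assumes the usual logic for the task: add the three numbers, divide the sum by 3, and display the result. The function name and the sample values are hypothetical; an interactive version would read the numbers with input() instead.

```python
def average_of_three(first, second, third):
    """Add the three numbers, then divide the sum by 3."""
    total = first + second + third
    return total / 3


# Hypothetical sample values; a real program might read them with input().
result = average_of_three(4.0, 5.0, 9.0)
print("The average is", result)
```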
FORTRAN: The First High-Level Language
Developed by IBM and introduced in 1954, FORTRAN (for FORmula TRANslator) was the first high-level language. FORTRAN is a scientifically oriented language; in the early days, use of the computer was primarily associated with engineering, mathematical, and scientific research tasks.
FORTRAN is noted for its brevity, and this characteristic is part of the reason why it remains popular. This language is very good at serving its primary purpose, which is execution of complex formulas such as those used in economic analysis and engineering. Although in the past it was considered limited in regard to file processing or data processing, its capabilities have been greatly improved.
Not all programs are organized in the same way. Organization varies according to the language used. In many languages (such as COBOL), programs are divided into a series of parts. FORTRAN programs are not composed of different parts (although it is possible to link FORTRAN programs together); a FORTRAN program consists of statements one after the other. Different types of data are identified as the data is used. Descriptions for data records appear in format statements that accompany the READ and WRITE statements. Figure 5 shows a FORTRAN program and a sample output from the program.
COBOL: The Language of Business
In the 1950s FORTRAN had been developed, but there was still no accepted high-level programming language appropriate for business. The U.S. Department of Defense in particular was interested in creating such a standardized language, and so it called together representatives from government and various industries, including the computer industry. These representatives formed CODASYL (COnference of DAta SYstem Languages). In 1959 CODASYL introduced COBOL (COmmon Business-Oriented Language).
The U.S. government offered encouragement by insisting that anyone attempting to win government contracts for computer-related projects had to use COBOL. The American National Standards Institute first standardized COBOL in 1968 and, in 1974, issued standards for another version known as ANSI-COBOL. After more than seven controversial years of industry debate, the standard known as COBOL 85 was approved, making COBOL a more usable modern-day software tool. The principal benefit of standardization is that COBOL is relatively machine independent; that is, a program written for one type of computer can be run with only slight modifications on another type for which a COBOL compiler has been developed.
COBOL is very good for processing large files and performing relatively simple business calculations, such as payroll or interest. A noteworthy feature of COBOL is that it is English-like, far more so than FORTRAN or BASIC. The variable names are set up in such a way that, even if you know nothing about programming, you can still understand what the program does. For example:
IF SALES-AMOUNT IS GREATER THAN SALES-QUOTA
COMPUTE COMMISSION = MAX-RATE * SALES-AMOUNT
ELSE
COMPUTE COMMISSION = MIN-RATE * SALES-AMOUNT.
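For comparison, here is the same commission rule as a small Python function. It uses Python's decimal module for exact decimal arithmetic on money amounts, which is closer in spirit to COBOL's decimal data than binary floating point. The rates and the quota are hypothetical, since the COBOL fragment does not give values for MAX-RATE, MIN-RATE, or SALES-QUOTA.

```python
from decimal import Decimal

# Hypothetical values; the COBOL fragment does not define them.
MAX_RATE = Decimal("0.10")
MIN_RATE = Decimal("0.05")
SALES_QUOTA = Decimal("10000")


def commission(sales_amount, sales_quota=SALES_QUOTA,
               max_rate=MAX_RATE, min_rate=MIN_RATE):
    """Pay the higher rate only when sales are greater than the quota."""
    if sales_amount > sales_quota:
        return max_rate * sales_amount
    return min_rate * sales_amount


print(commission(Decimal("12000")))  # above quota: higher rate
print(commission(Decimal("8000")))   # below quota: lower rate
```

The Python version is shorter, but the COBOL version reads more like an English sentence, which is exactly the trade-off discussed next.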
Once you understand programming principles, it is not too difficult to add COBOL to your repertoire. COBOL can be used for just about any task related to business programming; indeed, it is especially suited to processing alphanumeric data such as street addresses, purchased items, and dollar amounts: the data of business. However, the feature that makes COBOL so useful, its English-like appearance and easy readability, is also a weakness because a COBOL program can be incredibly verbose. A programmer seldom knocks out a quick COBOL program. In fact, there is hardly such a thing as a quick COBOL program; there are just too many program lines to write, even to accomplish a simple task. For speed and simplicity, BASIC, FORTRAN, and Pascal are probably better bets.
As you can see in Figure 6, a COBOL program is divided into four parts called divisions. The identification division identifies the program by name and often contains helpful comments as well. The environment division describes the computer on which the program will be compiled and executed. It also relates each file of the program to the specific physical device, such as the tape drive or printer, that will read or write the file. The data division contains details about the data processed by the program, such as type of characters (whether numeric or alphanumeric), number of characters, and placement of decimal points. The procedure division contains the statements that give the computer specific instructions to carry out the logic of the program.
It has been fashionable for some time to criticize COBOL: It is old-fashioned, cumbersome, and inelegant. In fact, some companies, devoted to fast, nimble program development, are converting to the more trendy language C. But COBOL, with more than 30 years of staying power, is still famous for its clear code, which is easy to read and debug.
BASIC: For Beginners and Others
BASIC (Beginners' All-purpose Symbolic Instruction Code) is a common language that is easy to learn. Developed at Dartmouth College, BASIC was introduced by John Kemeny and Thomas Kurtz in 1965 and was originally intended for use by students in an academic environment. In the late 1960s it became widely used in interactive time-sharing environments in universities and colleges. The use of BASIC has extended to business and personal computer systems.
The primary feature of BASIC is one that may be of interest to many readers of this book: BASIC is easy to learn, even for a person who has never programmed before. Thus, the language is used often to train students in the classroom. BASIC is also used by non-programming people, such as engineers, who find it useful in problem solving. For many years, BASIC was looked down on by "real programmers," who complained that it had too many limitations and was not suitable for complex tasks. Newer versions, such as Microsoft's QuickBASIC, include substantial improvements. An example of a BASIC program and its output is shown in Figure 7.
Pascal: The Language of Simplicity
Named for Blaise Pascal, the seventeenth-century French mathematician, Pascal was developed as a teaching language by a Swiss computer scientist, Niklaus Wirth, and first became available in 1971. Since that time it has become quite popular, first in Europe and now in the United States, particularly in universities and colleges offering computer science programs.
The foremost feature of Pascal is that it is simpler than other languages: it has fewer features and is less wordy than most. In addition to the popularity of Pascal in college computer science departments, the language has also made large inroads in the personal computer market as a simple yet sophisticated alternative to BASIC. Over the years new versions have improved on the original capabilities of Pascal. Today, Borland's Turbo Pascal leads the Pascal world because its designers eliminated most of the drawbacks of the original Pascal. Turbo Pascal is used by the business community and is often the choice of nonprofessional programmers who need to write their own programs.
Ada: Named for the Countess
Is any software worth over $25 billion? Not any more, according to Defense Department experts. In 1974 the U.S. Department of Defense had spent that amount on software written in a hodgepodge of languages. The answer to this problem turned out to be a new language called Ada, named for Countess Ada Lovelace, "the first programmer" (see Appendix B). Sponsored by the Pentagon, Ada was originally intended to be a standard language for weapons systems, but it has also been used successfully for commercial applications. Introduced in 1980, Ada has the support not only of the defense establishment but also of such industry heavyweights as IBM and Intel, and Ada is even available for some personal computers. Although some experts have said Ada is too complex, others say that it is easy to learn and that it will increase productivity. Indeed, some experts believe that it is by far a superior commercial language to such standbys as COBOL and FORTRAN.
Widespread use of Ada is considered unlikely by many experts. Although there are many reasons for this (the military services, for instance, have different levels of enthusiasm for it), probably its size, which may hinder its use on personal computers, and its complexity are the greatest barriers. Although the Department of Defense is a market in itself, Ada has not caught on to the extent that Pascal and C have, especially in the business community.
C, C++, Java, and JavaScript
A language invented by Dennis Ritchie at Bell Labs in 1972, C produces code that approaches assembly language in efficiency while still offering high-level language features. C was originally designed to write systems software but is now considered a general-purpose language. C contains some of the best features from other languages, including Pascal. C compilers are simple and compact. A key attraction is that it is independent of the architecture of any particular machine, a fact that contributes to the portability of C programs. That is, a C program can be run on more than one type of computer after it has been compiled for that machine.
Although C is simple and elegant, it is not simple to learn. It was developed for gifted programmers, and the learning curve may be steep. Straightforward tasks may be solved easily in C, but complex problems require mastery of the language.
An interesting side note is that the availability of C on personal computers has greatly enhanced the value of personal computers for budding software entrepreneurs. A cottage software industry can use the same basic tool, the language C, used by established software companies such as Microsoft and Borland. Today C has largely been replaced by its enhanced cousin, C++. C++ in turn is being challenged by web-aware languages like Java and JavaScript, which look and act a lot like C++ but add features to support working with networked computers, among other things.