Testing Basics

Basics: In this chapter we will learn the details of testing an IC during and after fabrication. We will also learn that chips sold with different speed ratings, such as 2.6 GHz and 2.8 GHz, may come from the same wafer. At the end of the manufacturing line, each chip is thoroughly tested. There is no single test that says whether a chip is functioning or not. A chip contains millions of transistors, and these transistors are not tested one by one. Instead, the chip is divided into blocks and each block is tested thoroughly. For example, a processor chip may be divided into a few blocks such as memory, logic, and timer (or clock). Some chips also contain an analog block. The logic block or the memory block can be further subdivided into smaller blocks that are tested separately.

An example helps to explain the testing and sorting (or separating) process. Consider a facility that manufactures T-shirts of a particular design in large, medium and small sizes. The large size needs the most cloth, the medium less, and the small the least. Hence the large is sold at the highest price, the medium at an intermediate price and the small at the lowest price. If thousands of large T-shirts are made, all of them are checked for quality. The shirts that pass all the checks (i.e. without even the slightest damage) are sold at full price, and the ones with slight damage are sold at a discounted price. The same quality check is done for the small and the medium T-shirts. If any shirt has a lot of damage, it has to be thrown away. For example, a severely damaged large T-shirt cannot be sold as a medium T-shirt, even though the medium is lower priced: a “poor quality large T-shirt” cannot be sold as an “acceptable quality medium T-shirt”.

Next, if we find that many T-shirts are damaged on a particular day, we have to find out the cause. We can then take the necessary action to eliminate, or at least minimize, the damage.

Now let us consider testing in the context of IC manufacturing. A company such as Intel or AMD makes processor chips, which are sold with specifications such as a speed of 3 GHz, 3.2 GHz or 2.8 GHz. A chip may be designed to run at 3.2 GHz. During testing, if it runs as per the design, it will be sold as a 3.2 GHz chip. If it does not run well at 3.2 GHz but is able to run at 3 GHz, the same chip will be sold as a 3 GHz chip. This is equivalent to selling a slightly damaged T-shirt, still of acceptable quality, at a discounted price. But remember that one cannot sell a severely damaged T-shirt. Similarly, if the 3.2 GHz chip does not work at either 3.2 GHz or 3 GHz, the manufacturing company cannot sell it as a 2.8 GHz chip. It has to be thrown away. The company will have a separate design for a chip running at 2.8 GHz. There, the good chips will be sold as 2.8 GHz chips, those of slightly poorer quality will be sold as 2.6 GHz chips, and anything that does not satisfy either speed will be thrown away. Please note that the speed values in GHz are given only as examples; the manufacturing companies may not reveal the details.

Now the good chips can be packaged and sold. What is the use of a severely damaged chip? While one cannot use it as a processor, one can analyze it and find the cause of the failure. In the T-shirt analogy, if we find that the damage occurs mainly in one area of the shirt, we can go to the section where that area is manufactured, monitor it and find the source of the problem. Similarly, for the failed chips, we have to find out whether any particular block is failing and whether it is a design problem or a manufacturing problem (i.e. some process is not run correctly).
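The speed-sorting rule described above can be written as a small sketch. The function name, its arguments and the idea of a single "highest passing speed" per chip are assumptions made for illustration; the GHz values are the same example numbers used in the text, not real product data.

```python
from typing import Optional


def speed_grade(max_stable_ghz: float,
                design_ghz: float,
                fallback_ghz: float) -> Optional[float]:
    """Return the speed (GHz) at which a chip is sold, or None if it is discarded.

    max_stable_ghz is a hypothetical test result: the highest clock speed at
    which the chip passed all of its tests.
    """
    if max_stable_ghz >= design_ghz:
        return design_ghz      # runs as per design
    if max_stable_ghz >= fallback_ghz:
        return fallback_ghz    # slightly poorer chip, sold at the lower grade
    return None                # cannot be sold as a chip of a different design


# One product designed for 3.2 GHz, with 3.0 GHz as its lower grade.
for measured in (3.3, 3.1, 2.9):
    print(measured, "->", speed_grade(measured, design_ghz=3.2, fallback_ghz=3.0))
```

Note that the last chip is discarded even though it might run at 2.8 GHz, because the 2.8 GHz product is a separate design with its own grades.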

Failure Analysis:  
In an IC manufacturing facility, there is a separate section called the “failure analysis” (FA) section. There, very sophisticated instruments are used to remove material from the failed chips layer by layer and find the exact location of the failure. FA may take one or two days for a single chip. Thus, only a few chips can be analyzed by this method, and the major cause of failure has to be found from the analysis of those few chips.

Scribe line (Parametric testing)
Will all the wafers and chips go through the complete testing? The answer is “no”. Here again an example will help explain the logic behind this answer. Assume that you go to a shop to buy a computer. Let us assume that all the computers are on a table and switched off, and that you do not have a firm idea of the model that you want to buy. How would you decide what to purchase? First you would switch on a computer and perhaps run an application, play a movie, or connect to the internet and open a website. (You would also look at the specifications before deciding.) We can call these initial checks superficial testing. If the computer, the monitor or the keyboard looks broken or old, you will not test it at all, because you will assume that if it is broken on the outside, it is probably not in good condition. If all the computers on display look old or broken, you will not evaluate any of them, even if they are functioning. You will move on to the next shop.

In the same way, for microelectronic chips, superficial testing is done at the wafer level. For this purpose, a few transistors and a few resistors are created at the sides of the chips (i.e. between the chips). In the lithography section, we learnt that a mask contains many chips (Fig 2.18) and that alignment marks are placed in the gaps between the chips and at the corners. Test structures are placed in the remaining space between the chips. This space is sometimes called the kerf or the scribe line, and the tests done on these structures are called parametric tests or process tests. These structures are tested frequently during the process. If these structures fail, the complete test on the chips will not be done. A ‘batch of wafers’ may contain 25, 13 or 10 wafers, depending on the size of the wafer: for 200 mm wafers, a batch is 25 wafers, and for 300 mm wafers, a batch is 10 or 13 wafers. If the first two or three wafers are tested parametrically and all of them fail, the remaining wafers will not be tested at all, either parametrically or completely.

The testing equipment is extremely expensive, and only a limited number of testers is available in each fab. Testing each chip on each wafer completely takes a significant amount of time. Thus, if superficial testing (parametric testing) shows that the structures are likely to be severely damaged, the company will not spend expensive tester time on every individual chip, because the chips are likely to fail. Parametric testing is done frequently (e.g. after the transistors are made, again after M1 is made, perhaps after M2 is made, and at the end of the manufacturing steps, just before the complete test).
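The batch-level decision described above can be sketched as follows. This is only an illustration: the function name, the wafer labels and the `parametric_pass` callback (which stands in for a real parametric measurement) are all hypothetical, and the sample size of three is taken from the "first two or three wafers" mentioned in the text.

```python
from typing import Callable, List


def screen_batch(wafer_ids: List[str],
                 parametric_pass: Callable[[str], bool],
                 n_sample: int = 3) -> List[str]:
    """Return the wafers that should go on to the complete test."""
    sample = wafer_ids[:n_sample]
    sample_results = {w: parametric_pass(w) for w in sample}
    if sample and not any(sample_results.values()):
        # Every sampled wafer failed: the remaining wafers are not tested at all.
        return []
    passed = [w for w in sample if sample_results[w]]
    # Otherwise the rest of the batch is tested parametrically as well.
    passed += [w for w in wafer_ids[n_sample:] if parametric_pass(w)]
    return passed


# A batch of 25 wafers (the 200 mm case), labelled W01 ... W25.
wafers = [f"W{i:02d}" for i in range(1, 26)]
print(len(screen_batch(wafers, lambda w: False)))        # whole batch rejected
print(len(screen_batch(wafers, lambda w: w != "W02")))   # one wafer screened out
```

The point of the early return is tester time: when the first few wafers all fail, no measurement is spent on the rest of the batch.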
Shorts and opens:
The basic tests on the scribe line structures are checks for shorts and opens. A “short” means that two metal lines which are supposed to be electrically separated are actually short circuited. An “open” means that lines which are supposed to be electrically connected are actually disconnected. Fig 10.1 shows examples of the electrical structures used to check for these. They are commonly called snakes and combs, and they are made at each conducting layer (e.g. polysilicon snakes, M1 snakes or combs), but not at the via or contact levels. Only a few such structures are made, which means that their ability to predict failures is limited: the process may be able to make these simple structures easily and still not be able to make all the complicated structures in the actual chips. But since this is only a superficial test, the estimate is sufficient.

A snake structure consists of a single metal line which winds like a snake, and it is used to test for opens (Fig 10.1 (a)). If the line is not continuous because of some problem in the process, its resistance will be too high, or even infinite if the line is completely broken. There might be tens or hundreds of such test structures. If 99 out of 100 pass, the quality is considered ‘acceptable’. If 10 or more out of 100 fail, the conclusion is that the process is not good. The logic is that if 10 out of 100 of even these simple structures fail, then most of the chips, which contain far more complicated structures, will most likely fail.
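As a rough sketch of the open test and the pass/fail logic above: the resistance limit `max_ohm` and the "needs review" label for results between the two thresholds are assumptions added for illustration. The text only states the 99-out-of-100 and 10-out-of-100 cases.

```python
def snake_is_open(resistance_ohm: float, max_ohm: float) -> bool:
    """A snake fails for opens if its resistance exceeds a limit.

    A completely broken line reads as infinite resistance, which also fails.
    max_ohm is a hypothetical limit chosen by the process engineer.
    """
    return resistance_ohm > max_ohm


def process_verdict(n_failed: int, n_tested: int) -> str:
    """Judge the process from the number of failing test structures."""
    if n_failed * 100 <= n_tested:     # at most 1 in 100 fail
        return "acceptable"
    if n_failed * 10 >= n_tested:      # 10 or more in 100 fail
        return "not good"
    return "needs review"              # assumption: the text leaves this range open


readings_ohm = [950.0] * 99 + [float("inf")]   # illustrative values only
failed = sum(snake_is_open(r, max_ohm=1500.0) for r in readings_ohm)
print(failed, "of", len(readings_ohm), "failed:",
      process_verdict(failed, len(readings_ohm)))
```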

https://nptel.ac.in/courses/103106075/Courses/images/10.1.webp

Fig. 10.1. Test structures: (a) snake, (b) comb

To test for shorts, the structure shown in Fig 10.1 (b) is used. It looks like a comb kept next to another comb, and the whole structure is usually called a comb structure. If the processes are run correctly, the structure will be made properly and there will not be any short (current leakage) between the two combs. If there is a short circuit, there will be a leakage current between the combs. In a typical fab, these tests are conducted and the results are plotted in a particular format, for example as a cumulative distribution function (CDF) plot. These plots help in identifying even a small number of failures. A logarithmic scale is used to resolve very small defect levels, such as ppm or ppb. When these analyses are performed on test results from individual modules (such as M1 or M2), they also help in identifying where a problem lies.
https://nptel.ac.in/courses/103106075/Courses/images/10.2.webp
Fig. 10.2. Sample CDF plot of the leakage current of a test structure. If 100 structures are tested, a few may fail (i.e. exhibit high current).
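A CDF of this kind can be built from the raw leakage readings with a few lines of code. The sketch below computes an empirical CDF by sorting the readings; the leakage values and the 1e-9 A failure limit are made-up numbers for illustration, not data from the figure.

```python
from typing import List, Tuple


def empirical_cdf(values: List[float]) -> List[Tuple[float, float]]:
    """Return (value, cumulative fraction) pairs, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def fraction_above(values: List[float], limit: float) -> float:
    """Fraction of structures whose leakage exceeds the limit."""
    return sum(1 for v in values if v > limit) / len(values)


# 100 comb structures: 97 with low leakage and 3 leaky ones (illustrative).
leakage_amps = [1e-12] * 97 + [2e-6, 5e-6, 1e-5]
cdf = empirical_cdf(leakage_amps)
for current, fraction in cdf[-4:]:
    print(f"{current:.1e} A  cumulative fraction {fraction:.2f}")
print("failing fraction:", fraction_above(leakage_amps, limit=1e-9))
```

Plotting the current on a logarithmic axis against the cumulative fraction makes the few leaky structures stand out as a tail at the top of the curve.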

Knowing only that a chip fails is not sufficiently detailed information. If we know that the chip fails at the M1 level for shorts, it helps in identifying the likely sources of the problem and in taking remedial action. A few other structures are also used in the parametric (process) test. One is called VDP, which stands for van der Pauw. Using this structure, the thickness or the sheet resistance of a deposited material can be determined. Using additional structures, even the width of a line can be estimated.

The resistance of a metal line is normally expressed in terms of the resistivity of the material and the length and cross-sectional area of the wire. However, as we learnt earlier, sheet resistance is used more frequently in semiconductor processing. At M1, the metal thickness is the same for all the lines, but the lengths and widths vary. Similarly, the M2 lines have a different thickness from the M1 lines, but all the M2 lines share the same thickness, while their widths and lengths vary. Thus, in semiconductor processing, the thickness is fixed for a given layer and only the width and the length vary, so it is convenient to combine the resistivity and the thickness into a single quantity, the sheet resistance of the layer.
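The relation between the two descriptions can be checked numerically. For a line of length L, width W and thickness t made of a material with resistivity ρ, the resistance is R = ρL/(W·t). Writing the sheet resistance as R_s = ρ/t gives R = R_s·(L/W), where L/W is the number of "squares" in the line. The sketch below is a minimal illustration; the resistivity (close to that of bulk copper), the 200 nm thickness and the line dimensions are assumed values.

```python
def number_of_squares(length: float, width: float) -> float:
    """Length-to-width ratio of a line (same units for both arguments)."""
    return length / width


def line_resistance(sheet_resistance_ohm_sq: float,
                    length: float, width: float) -> float:
    """R = R_s * (L / W)."""
    return sheet_resistance_ohm_sq * number_of_squares(length, width)


def line_resistance_from_resistivity(resistivity_ohm_m: float, length_m: float,
                                     width_m: float, thickness_m: float) -> float:
    """R = rho * L / (W * t), the usual bulk formula."""
    return resistivity_ohm_m * length_m / (width_m * thickness_m)


rho = 1.7e-8          # ohm*m, illustrative value
thickness = 200e-9    # m, assumed to be the same for every line in the layer
rs = rho / thickness  # sheet resistance of the layer, in ohm per square

r_sheet = line_resistance(rs, length=100e-6, width=0.2e-6)
r_bulk = line_resistance_from_resistivity(rho, 100e-6, 0.2e-6, thickness)
print(rs, r_sheet, r_bulk)
```

Both routes give the same resistance (roughly 42.5 Ω for these values), but the sheet resistance form needs only one number per layer plus the drawn length and width of each line.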
https://nptel.ac.in/courses/103106075/Courses/images/10.3.webp
Fig. 10.3. Schematic of a Greek cross (a van der Pauw structure).
A VDP structure called a Greek cross is shown in Fig. 10.3. The current is passed between terminals A and B, while the potential is measured between terminals C and D. The sheet resistance is calculated as R_s = (π / ln 2) × (V_CD / I_AB), where V_CD is the potential measured between C and D and I_AB is the current passed between A and B.
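For a symmetric Greek cross, the standard van der Pauw result is R_s = (π / ln 2)·(V_CD / I_AB), i.e. about 4.53 times the measured V/I ratio. The sketch below applies it and, assuming the resistivity of the film is known, converts the sheet resistance into a thickness. The function names and the measurement values are hypothetical.

```python
import math


def greek_cross_sheet_resistance(v_cd_volts: float, i_ab_amps: float) -> float:
    """Sheet resistance (ohm per square) of a symmetric Greek cross.

    Current is forced between terminals A and B; the potential is measured
    between terminals C and D.
    """
    if i_ab_amps == 0:
        raise ValueError("the forced current must be non-zero")
    return (math.pi / math.log(2)) * (v_cd_volts / i_ab_amps)


def thickness_from_sheet_resistance(resistivity_ohm_m: float,
                                    sheet_resistance_ohm_sq: float) -> float:
    """t = rho / R_s, valid only if the film resistivity is known."""
    return resistivity_ohm_m / sheet_resistance_ohm_sq


# Illustrative measurement: 1 mA forced, 18.75 microvolts measured.
rs = greek_cross_sheet_resistance(v_cd_volts=18.75e-6, i_ab_amps=1e-3)
t = thickness_from_sheet_resistance(1.7e-8, rs)
print(f"R_s = {rs:.4f} ohm/sq, thickness = {t * 1e9:.0f} nm")
```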
Complete test: The complete testing program looks similar to a C program or a script. Testing programs are written in their own languages, but most of them are easy to read. They are also normally well commented and contain information such as the name of the test, the voltage applied, and the device under test (DUT). Apart from the test information, they also contain information on the alignment and the x-y movement of the tester. The test pins (the probes that apply potential or current to the chip) must be properly aligned. After a chip is tested, the wafer must be moved slightly so that the next chip is placed correctly below the pins. Thus the alignment and x-y movement information is also important. Some of the tests may be capacitance-voltage (CV) tests for the oxide or current-voltage (IV) tests. These types of tests are very time consuming and are not done frequently. Most of the tests done on modern chips are digital tests. Some tests are usually referred to by acronyms, such as TDDB (time-dependent dielectric breakdown), NBTI (negative bias temperature instability) and IDDQ (quiescent supply current).

Binning (Sort test): The tested chips (also called dies) are sorted, or separated, based on their failure modes. This process is called binning or sorting. Remember that a die (i.e. chip) has blocks such as memory, scan, clock etc. In the production stage, the block which tends to fail most often is always tested first. If it fails, there is no use in testing the chip further. Hence, once a chip fails, it is not tested further. The chip is marked according to the block where the failure occurs and the type of failure. One can imagine that it is thrown into a waste bin or bucket, with a different bucket for each block and failure mode. There may be one bucket for chips failing in memory, a second for chips failing in the scan region, a third for chips having opens in the analog region, and a fourth for chips failing in the clock region. Even within a bucket, there may be different compartments. For a failure in the scan region, a chip that draws too much current may go to one compartment, and a chip that draws too little current may go to another. If a chip passes all the tests, it is thrown into the ‘good chip’ bucket (i.e. bin).

https://nptel.ac.in/courses/103106075/Courses/images/10.4-falling-chips.webp
Fig. 10.4. Bin test (or sort test): a simplified illustration.


However, in practice, none of the chips is physically thrown into a bucket. Instead, a computer program keeps track of the failure mode of each chip. The process is called binning because one can imagine the chips being thrown into different buckets or bins. If we say that a chip fails in the scan region, we know that, overall, it has gone into the scan bucket. If we also know the compartment within the bucket, we have more detailed data. The bucket is called the hard bin, and the bucket together with the compartment information is called the soft bin. The hard bin gives the overall reason for the failure. The soft bin also gives the compartment in which the chip has been placed (i.e. the detailed information on the failure). Binning the failing chips helps to determine which block fails most frequently. Based on the data, the engineer will try to determine the cause of the issue and make the necessary improvements. The actual troubleshooting requires in-depth analysis and a lot of experience. Typically, a fab has a team of experts who perform the ‘yield analysis’ and suggest improvements.
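The bookkeeping for hard and soft bins can be illustrated with a short sketch. Everything specific here is hypothetical: the die fields, the two checks, the current limits and the bin names are invented for the example. What it keeps from the description above is the structure: tests run block by block, testing stops at the first failure, the block gives the hard bin, and the block plus failure mode gives the soft bin.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Die = Dict[str, float]
Check = Callable[[Die], Optional[str]]   # returns a failure mode, or None on pass


def check_memory(die: Die) -> Optional[str]:
    return None if die["failing_bits"] == 0 else "bit_fail"


def check_scan(die: Die) -> Optional[str]:
    if die["scan_current_ma"] > 50.0:    # hypothetical upper limit
        return "current_high"
    if die["scan_current_ma"] < 5.0:     # hypothetical lower limit
        return "current_low"
    return None


def bin_die(die: Die, tests: List[Tuple[str, Check]]) -> Tuple[str, str]:
    """Return (hard bin, soft bin); testing stops at the first failure."""
    for block, check in tests:
        mode = check(die)
        if mode is not None:
            return block, f"{block}/{mode}"
    return "good", "good"


# The block assumed to fail most often is listed, and so tested, first.
tests = [("memory", check_memory), ("scan", check_scan)]
dies = [
    {"failing_bits": 0, "scan_current_ma": 20.0},
    {"failing_bits": 3, "scan_current_ma": 20.0},
    {"failing_bits": 0, "scan_current_ma": 80.0},
    {"failing_bits": 0, "scan_current_ma": 1.0},
]
results = [bin_die(d, tests) for d in dies]
hard_bins = Counter(hard for hard, _ in results)
soft_bins = Counter(soft for _, soft in results)
print(hard_bins)
print(soft_bins)
```

Counting the hard bins shows which block fails most often, while the soft bin counts separate, for example, the high-current scan failures from the low-current ones.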

Typical test sequences: Normally the very first tests are continuity tests for shorts and opens, which catch the problem if there is a severe short or a severe open. The next test is the built-in self-test, or BIST. Third is typically the sequence of tests which examines the logic area (also called the functional area); these tests are usually called scan. Most chips now have built-in, or embedded, memory. Testing memory differs from testing logic in one respect: the memory can be repaired if there is a failure, and the repair improves the yield dramatically. The memory contains a lot of identical circuits called bits. If many bits fail in a memory part, the locations of all the failing bits can be marked as a pattern. This is called a failed bit map (FBM). An FBM can help identify a problem in the process without actually doing failure analysis.
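A failed bit map can be pictured as a grid with one mark per bit. The following sketch builds such a map by comparing the data written to a small memory array with the data read back, and then summarizes the pattern. The 4 x 4 array, the function names and the idea of flagging fully failing rows and columns are illustrative assumptions, not a description of any particular tester.

```python
from typing import Dict, List


def failed_bit_map(written: List[List[int]],
                   read_back: List[List[int]]) -> List[str]:
    """Mark each bit as 'X' (fail) or '.' (pass), one string per row."""
    return ["".join("X" if w != r else "." for w, r in zip(w_row, r_row))
            for w_row, r_row in zip(written, read_back)]


def summarize(fbm: List[str]) -> Dict[str, object]:
    """Count failing bits and list rows or columns that fail completely."""
    n_cols = len(fbm[0])
    return {
        "failing_bits": sum(row.count("X") for row in fbm),
        "full_rows": [i for i, row in enumerate(fbm) if row == "X" * n_cols],
        "full_columns": [j for j in range(n_cols)
                         if all(row[j] == "X" for row in fbm)],
    }


written = [[1, 1, 1, 1] for _ in range(4)]
read_back = [
    [1, 1, 1, 0],   # a single failing bit
    [1, 1, 1, 1],
    [0, 0, 0, 0],   # an entire row fails
    [1, 1, 1, 1],
]
fbm = failed_bit_map(written, read_back)
print("\n".join(fbm))
print(summarize(fbm))
```

The shape of the pattern is the useful part: an isolated bit, a whole row and a whole column each point to different kinds of process problems, which is why the map can guide the engineer before any physical failure analysis is done.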