6. What is interrupt latency? How do you measure interrupt latency? How can you reduce interrupt latency?

Answer: Interrupt latency is the time between the assertion of an interrupt request and the execution of the first instruction of the interrupt service routine (ISR).

To measure it, you need an oscilloscope or a logic analyzer. On entering the interrupt service routine (ISR), activate an available output port on your hardware (an LED port, for example) and deactivate it just before returning from the ISR. You can do that by adding the appropriate code at the start and end of the ISR.

By connecting one channel of the oscilloscope (or logic analyzer) to the INTR pin of the microprocessor and the other to the port you activate/deactivate, you can measure both the latency and the duration of the ISR: the delay between the INTR edge and the rising edge on the port is the latency, and the width of the pulse is the ISR duration.
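A minimal sketch of the ISR instrumentation, assuming a hypothetical memory-mapped GPIO output register (GPIO_OUT, at a made-up address) and pin mask (PIN_MEASURE); substitute the register and pin definitions from your microcontroller's datasheet:

    #include <stdint.h>

    /* Hypothetical memory-mapped GPIO output register and pin mask;
       replace the address and bit with values from your MCU's datasheet. */
    #define GPIO_OUT    (*(volatile uint32_t *)0x40020014u)
    #define PIN_MEASURE (1u << 5)

    void timer_isr(void)
    {
        GPIO_OUT |= PIN_MEASURE;    /* first thing: raise the probe pin */

        /* ... actual interrupt handling work ... */

        GPIO_OUT &= ~PIN_MEASURE;   /* lower the pin just before returning */
    }

Note that the read-modify-write of GPIO_OUT itself takes a few cycles, so the measured latency is slightly pessimistic.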

Causes of interrupt latency

·         The first delay is typically in the hardware: the interrupt request signal needs to be synchronized to the CPU clock. Depending on the synchronization logic, typically up to 3 CPU cycles are lost before the interrupt request reaches the CPU core.

·         The CPU will typically complete the current instruction first. This instruction can take many cycles; on most systems, divide, push-multiple, or memory-copy instructions require the most clock cycles. On top of the cycles required by the CPU, most systems need additional cycles for memory access. In an ARM7 system, the instruction STMDB SP!,{R0-R11,LR} (push parameters and permanent registers) is typically the worst case: it stores 13 32-bit registers on the stack and requires 15 clock cycles in the CPU, and the memory system may add wait states.

·         After completing the current instruction, the CPU performs a mode switch or pushes registers (typically the PC and flag registers) onto the stack. In general, modern CPUs (such as ARM) perform a mode switch, which requires fewer CPU cycles than saving registers.

·         Pipeline fill: most modern CPUs are pipelined, so an instruction only takes effect once it has reached the final stage of the pipeline. Since the mode switch has flushed the pipeline, a few extra cycles are required to refill it before the first ISR instruction completes.
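Putting the causes above together gives a rough worst-case estimate for the ARM7 example (the mode-switch and pipeline-refill cycle counts here are illustrative assumptions, not datasheet figures):

    latency ≈  3 cycles  (synchronizing the request to the CPU clock)
            + 15 cycles  (completing the worst-case STMDB instruction)
            +  3 cycles  (mode switch; assumed)
            +  2 cycles  (pipeline refill; assumed)
            ≈ 23 CPU cycles, plus any memory wait states

Each term also suggests how to reduce interrupt latency in practice: keep the sections where interrupts are disabled as short as possible, avoid long uninterruptible instructions in time-critical code, run the vector table and ISR from fast (zero-wait-state) memory, and use prioritized or nested interrupts so that high-priority requests are not delayed by lower-priority ISRs.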

7.  Von Neumann and Harvard architecture differences?

Answer: The name Harvard architecture comes from the Harvard Mark I relay-based computer. The most obvious characteristic of the Harvard architecture is that it has physically separate signals and storage for code and data memory, so it is possible to access program memory and data memory simultaneously. Typically, code (or program) memory is read-only and data memory is read-write; in that case, it is impossible for the program to modify its own contents.

The von Neumann architecture is named after the mathematician and early computer scientist John von Neumann. Von Neumann machines have shared signals and memory for code and data. Thus, the program can easily modify itself, since it is stored in read-write memory.

The Harvard architecture has separate data and instruction buses, allowing transfers to be performed simultaneously on both. The von Neumann architecture has only one bus, which is used for both data transfers and instruction fetches; data transfers and instruction fetches must therefore be scheduled, as they cannot be performed at the same time.

It is possible to have two separate memory systems for a Harvard architecture. As long as data and instructions can be fed in at the same time, it does not matter whether they come from a cache or from memory. But there are problems with this. Compilers generally embed data (literal pools) within the code, and it is often also necessary to write to the instruction memory space, for example for self-modifying code or, if an ARM debugger is used, to set software breakpoints in memory. If there are two completely separate, isolated memory systems, this is not possible; there must be some kind of bridge between the memory systems to allow it.
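As a concrete illustration of literal pools, consider the C function below. Compilers targeting ARM typically place the 32-bit constant in a literal pool next to the function's code and load it PC-relative, so the data travels over the instruction side of the memory system (the assembly in the comment is indicative, not exact):

    #include <stdint.h>

    uint32_t get_magic(void)
    {
        /* A full 32-bit constant cannot be encoded in a single ARM
           instruction, so the compiler stores 0xDEADBEEF in a literal
           pool next to this function's code and emits something like
           LDR r0, [pc, #offset], i.e. a data read from code memory.  */
        return 0xDEADBEEFu;
    }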

Using a simple, unified memory system together with a Harvard architecture is highly inefficient. Unless it is possible to feed data into both buses at the same time, it might be better to use a von Neumann architecture processor.

Use of caches

At higher clock speeds, caches become useful because memory speed is proportionally slower. Harvard architectures tend to be targeted at higher-performance systems, so caches are nearly always used in such systems.

Von Neumann architectures usually have a single unified cache, which stores both instructions and data. The proportion of each in the cache is variable and adapts to the workload, which may be a good thing. It would in principle be possible to have separate instruction and data caches, but this would not be very useful, since only one cache could ever be accessed at a time over the single bus.

Caches for Harvard architectures are very useful. Such a system has a separate cache for each bus. Trying to use a shared cache on a Harvard architecture would be very inefficient, since only one bus could be fed at a time. Having two caches means both buses can be fed simultaneously, which is exactly what a Harvard architecture requires.

This also allows a very simple unified memory system behind the caches, using the same address space for both instructions and data, which gets around the problem of literal pools and self-modifying code. It does mean, however, that when starting with empty caches, instructions and data must both be fetched from the single memory system, so two memory accesses are needed before the core has everything it requires; performance is then no better than a von Neumann architecture. As the caches fill up, though, it becomes much more likely that the instruction or data value is already cached, so only one of the two has to be fetched from memory while the other is supplied directly from its cache with no additional delay. The best performance is achieved when both instructions and data are supplied by the caches, with no need to access external memory at all.

This is the most sensible compromise and the approach used by ARM's Harvard processor cores. Two completely separate memory systems could perform better, but would be difficult to implement.

8.     RISC and CISC differences?

Answer:

CISC: (Complex Instruction Set Computer)

E.g., Intel and AMD x86 CPUs

·         CISC chips have a large number of different, complex instructions.

·         CISC chips are relatively slow per instruction (compared to RISC chips), but need fewer instructions to complete a task.

·         The goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations as a single instruction.

·         In CISC, the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store the instructions. The emphasis is on building complex instructions directly into the hardware.

·         For example, a CISC processor may provide a single instruction (call it "MULT") that multiplies two values stored in memory. When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register.

RISC: (Reduced Instruction Set Computer)

E.g., Apple and ARM processors

·         The RISC philosophy is that fewer, simpler, faster instructions are better than the large, complex, slower CISC instructions, even though more instructions are needed to accomplish a task.

·         RISC chips require fewer transistors, which makes them easier to design and cheaper to produce.

·         It is easier to write powerful optimizing compilers, since fewer instructions exist.

·         RISC chips are consequently cheaper and faster.

·         RISC puts a greater burden on the software: software developers need to write more lines of code for the same task, and the software becomes more complex. Critics therefore argue that RISC is not the architecture of the future, since conventional CISC chips keep becoming faster and cheaper.

·         RISC uses simple instructions that can be executed within one clock cycle.

·         The "MULT" instruction described above would be divided into three separate instructions: "LOAD", which moves data from the memory bank to a register; "PROD", which finds the product of two operands located within the registers; and "STORE", which moves data from a register back to the memory bank (see the sketch after this list).

·         At first, this may seem like a much less efficient way of completing the operation: because there are more lines of code, more RAM is needed to store the assembly-level instructions, and the compiler must perform more work to convert a high-level language statement into code of this form.

·         However, separating the "LOAD" and "STORE" instructions actually reduces the amount of work the computer must perform: after a "LOAD", the operand remains in a register and can be reused by later instructions without being reloaded from memory.

·         The major problem with RISC chips is that they do not offer the widespread software compatibility that x86 chips do.
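A toy model of the "MULT" example above, sketched in C (the mnemonics, memory layout, and register file are purely illustrative, not a real instruction set):

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t mem[16];   /* toy "memory banks"      */
    static uint32_t reg[4];    /* toy general registers   */

    /* CISC style: one memory-to-memory instruction does the whole job.
       Internally it still loads, multiplies, and stores. */
    static void MULT(int a, int b)
    {
        reg[0] = mem[a];            /* implicit load    */
        reg[1] = mem[b];            /* implicit load    */
        reg[0] = reg[0] * reg[1];   /* execution unit   */
        mem[a] = reg[0];            /* implicit store   */
    }

    /* RISC style: the same work split into explicit single-cycle steps. */
    static void LOAD(int r, int a)   { reg[r] = mem[a]; }
    static void PROD(int r1, int r2) { reg[r1] = reg[r1] * reg[r2]; }
    static void STORE(int a, int r)  { mem[a] = reg[r]; }

    int main(void)
    {
        mem[2] = 6; mem[5] = 7;
        MULT(2, 5);                     /* CISC: one instruction   */
        printf("CISC-style result: %u\n", (unsigned)mem[2]);

        mem[2] = 6; mem[5] = 7;
        LOAD(0, 2);                     /* RISC: four instructions */
        LOAD(1, 5);
        PROD(0, 1);
        STORE(2, 0);
        printf("RISC-style result: %u\n", (unsigned)mem[2]);
        return 0;
    }

The CISC-style MULT hides the load/compute/store steps inside one instruction, while the RISC-style sequence exposes them, which is what lets a compiler keep operands in registers across instructions.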

 

CISC                                                RISC
----                                                ----
Emphasis on hardware                                Emphasis on software
Includes multi-clock complex instructions           Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE"                Register-to-register: "LOAD" and "STORE"
incorporated in instructions                        are independent instructions
Small code size, high cycles per instruction        Large code size, low cycles per instruction
Transistors used for storing complex instructions   More transistors spent on memory registers
Complex instructions require multiple cycles        Reduced instructions take one cycle
Many instructions can reference memory              Only LOAD and STORE can reference memory
Instructions are executed one at a time             Pipelining is used to execute instructions
Few general-purpose registers                       Many general-purpose registers

9.     What are the startup code steps?

Answer:

1.      Disable all interrupts.

2.      Copy the initialized data from ROM to RAM.

3.      Zero the uninitialized data area (.bss).

4.      Allocate space for and initialize the stack.

5.      Initialize the processor's stack pointer.

6.      Call main() (see the sketch below).
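A minimal sketch of such a reset handler in C. The section-boundary symbols (_etext, _sdata, _edata, _sbss, _ebss) are assumptions in the style of common linker scripts; the actual names come from your toolchain, and steps 1 and 5 (disabling interrupts, setting the initial stack pointer) are usually done by hardware or a few assembly instructions before any C code runs:

    #include <stdint.h>

    /* Assumed linker-script symbols marking section boundaries. */
    extern uint32_t _etext;           /* load address of .data in ROM */
    extern uint32_t _sdata, _edata;   /* .data bounds in RAM          */
    extern uint32_t _sbss, _ebss;     /* .bss bounds in RAM           */

    extern int main(void);

    void reset_handler(void)
    {
        uint32_t *src = &_etext;
        uint32_t *dst;

        /* Step 2: copy initialized data from ROM to RAM. */
        for (dst = &_sdata; dst < &_edata; )
            *dst++ = *src++;

        /* Step 3: zero the uninitialized (.bss) data area. */
        for (dst = &_sbss; dst < &_ebss; )
            *dst++ = 0;

        /* Step 6: call main. */
        main();

        for (;;) ;   /* trap here if main ever returns */
    }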

10.     What are the booting steps for a CPU?

Answer:

·         The power supply does a self-check and sends a power-good signal to the CPU.

·         The CPU starts executing the code stored in ROM on the motherboard, beginning at address 0xFFFF0.

·         The routines in ROM test the central hardware, search for video ROM, perform a checksum on the video ROM, and execute the routines in the video ROM.

·         The routines in the motherboard ROM then continue searching for any other adapter ROMs, verify their checksums, and execute their routines.

·         After the POST (Power-On Self-Test) completes, the system searches for a boot device.

·         Assuming a valid boot device is found, IO.SYS is loaded into memory and executed. IO.SYS consists primarily of initialization code and extensions to the motherboard ROM BIOS.

·         MSDOS.SYS is loaded into memory and executed. MSDOS.SYS contains the DOS routines.

·         Finally, three files are processed: CONFIG.SYS (created and modified by the user; it loads additional device drivers for peripheral devices), COMMAND.COM (the command interpreter; it translates the commands entered by the user, contains the internal DOS commands, and executes AUTOEXEC.BAT), and AUTOEXEC.BAT (it contains the commands that the user wants executed automatically every time the computer is started).