Computer Architecture

Practice

Ch1&AppA

Q1: (§1-6 Cost)

Info

Many factors are involved in the price of a computer chip. Intel is spending $7 billion to complete its Fab 42 fabrication facility for 7 nm technology. In this case study, we explore a hypothetical company in the same situation and how different design decisions involving fabrication technology, area, and redundancy affect the cost of chips. The company will sell a range of chips from that factory, and it needs to decide how much capacity to dedicate to each chip. Imagine that it will sell two chips. Phoenix is a completely new architecture designed with 7 nm technology in mind, whereas RedDragon is the same architecture as their 10 nm BlueDragon. Imagine that RedDragon will make a profit of $15 per defect-free chip. Phoenix will make a profit of $30 per defect-free chip. Each wafer has a 450 mm diameter. Assume that the wafer yield is 0.9 and refer to Figure 1.26 for more detailed information.

Question (a)

How much profit do you make on each wafer of Phoenix chips?

Answer (a)

\[\begin{aligned} DieYield = \frac{0.9}{(1+0.04 \times 2)^{14}} = 0.306 \end{aligned} \]

\[\begin{aligned} DiesPerWafer = \frac{\pi \times 225^{2}}{200} - \frac{\pi \times 450}{\sqrt{2 \times 200}} = 724.53 \approx 724 \end{aligned} \]

\[\begin{aligned} GoodDiesPerWafer = 724 \times 0.306 = 221.544 \approx 221 \end{aligned} \]

\[\begin{aligned} Profit = 30 \times 221 = 6630(\$) \end{aligned} \]

Question (b)

How much profit do you make on each wafer of RedDragon chips?

Answer (b)

\[\begin{aligned} DieYield = \frac{0.9}{(1+0.04 \times 1.2)^{14}} = 0.467 \end{aligned} \]

\[\begin{aligned} DiesPerWafer = \frac{\pi \times 225^{2}}{120} - \frac{\pi \times 450}{\sqrt{2 \times 120}} = 1234.1 \approx 1234 \end{aligned} \]

\[\begin{aligned} GoodDiesPerWafer = 1234 \times 0.467 = 576.278 \approx 576 \end{aligned} \]

\[\begin{aligned} Profit = 15 \times 576 = 8640(\$) \end{aligned} \]
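The calculations in answers (a) and (b) can be checked with a short Python sketch. The function names are hypothetical, and the die areas (200 mm² and 120 mm²), the defect density (0.04 per cm²), and N = 14 are simply the values that appear in the equations above.

```python
import math

def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, n):
    """Die yield = wafer yield / (1 + defect density * die area)^N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Wafer area over die area, minus the edge-loss term, rounded down."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)

def profit_per_wafer(die_area_mm2, profit_per_chip, wafer_diameter_mm=450,
                     wafer_yield=0.9, defects_per_cm2=0.04, n=14):
    """Profit from the defect-free dies on one wafer."""
    dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    # Convert mm^2 to cm^2 because the defect density is given per cm^2.
    y = die_yield(wafer_yield, defects_per_cm2, die_area_mm2 / 100, n)
    good_dies = math.floor(dies * y)
    return good_dies * profit_per_chip

print(profit_per_wafer(200, 30))  # Phoenix: 6630
print(profit_per_wafer(120, 15))  # RedDragon: 8640
```

Rounding the number of good dies down mirrors the rounding used in the worked answers.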

Question (c)

If your demand is 50,000 RedDragon chips per month and 25,000 Phoenix chips per month, and your facility can fabricate 100 wafers a month, how many wafers should you make of each chip? Why?

Answer (c)

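From answers (a) and (b), each RedDragon wafer yields a profit of $8640, which is higher than the $6630 from each Phoenix wafer, so RedDragon chips take priority. The number of wafers needed to cover the RedDragon demand is:

\[\begin{aligned} RedDragonWafers = \frac{50000}{576} = 86.8 \approx 87 \end{aligned} \]

\[\begin{aligned} PhoenixWafers = 100 - 87 = 13 \end{aligned} \]

Therefore, 87 wafers are used for RedDragon chips and the remaining 13 wafers for Phoenix chips. The 13 Phoenix wafers produce about 13 × 221 = 2873 chips, well below the demand of 25,000, so every Phoenix chip can still be sold.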
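The allocation can be sketched in Python as follows. The helper `allocate_wafers` is a hypothetical name, and the good-die counts (576 and 221) are taken from answers (a) and (b); the sketch assumes the more profitable chip is served first and the rest of the capacity goes to the other chip.

```python
import math

def allocate_wafers(total_wafers, priority_demand, priority_good_dies):
    """Give the priority chip enough wafers to cover its demand;
    the remaining wafers go to the other chip."""
    priority_wafers = min(total_wafers,
                          math.ceil(priority_demand / priority_good_dies))
    return priority_wafers, total_wafers - priority_wafers

red_dragon_wafers, phoenix_wafers = allocate_wafers(100, 50_000, 576)
print(red_dragon_wafers, phoenix_wafers)  # 87 13

# Phoenix output stays far below its 25,000-chip demand.
print(phoenix_wafers * 221)  # 2873
```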

Q2: (§1-5 Power and Energy)

Info

A cell phone performs very different tasks, including streaming music, streaming video, and reading email. These tasks perform very different computing tasks. Battery life and overheating are two common problems for cell phones, so reducing power and energy consumption is critical. In this problem, we consider what to do when the user is not using the phone to its full computing capacity. For these problems, we will evaluate an unrealistic scenario in which the cell phone has no specialized processing units. Instead, it has a quad-core, general-purpose processing unit.

Question (a)

How much dynamic energy and power are required compared to running at full power? First, suppose that the quad-core operates for 1/8 of the time and is idle for the rest of the time. That is, the clock is disabled for 7/8 of the time, with no leakage occurring during that time. Compare total dynamic energy as well as dynamic power while the core is running.

Answer (a)

Since the capacitance is constant and the operating voltage is unaffected, the power of the system while it is running stays the same. However, the clock runs for only an eighth of the time, so the energy consumption drops to an eighth.

Question (b)

How much dynamic energy and power are required using frequency and voltage scaling? Assume frequency and voltage are both reduced to 1/8 the entire time.

Answer (b)

\[\begin{aligned} Energy_{dynamic} \propto CV^{2} \end{aligned} \]

\[\begin{aligned} Power_{dynamic} \propto \frac{1}{2}CV^{2}F \end{aligned} \]

\[\begin{aligned} \frac{Energy_{new}}{Energy_{old}} = \frac{1}{8^{2}} = \frac{1}{64} \end{aligned} \]

\[\begin{aligned} \frac{Power_{new}}{Power_{old}} = \frac{1}{8^{3}} = \frac{1}{512} \end{aligned} \]

Question (c)

Now assume the voltage may not decrease below 50% of the original voltage. This voltage is referred to as the voltage floor, and any voltage lower than that will lose the state. Therefore, while the frequency can keep decreasing, the voltage cannot. Assume that the frequency is reduced to 1/8 of the original. What are the dynamic energy and power savings in this case?

Answer (c)

\[\begin{aligned} Energy_{dynamic} \propto CV^{2} \end{aligned} \]

\[\begin{aligned} Power_{dynamic} \propto \frac{1}{2}CV^{2}F \end{aligned} \]

\[\begin{aligned} \frac{Energy_{new}}{Energy_{old}} = \frac{1}{2^{2}} = \frac{1}{4} \end{aligned} \]

\[\begin{aligned} \frac{Power_{new}}{Power_{old}} = \frac{1}{2^{2} \times 8} = \frac{1}{32} \end{aligned} \]
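The ratios in answers (b) and (c) follow directly from the two proportionalities. A minimal Python sketch, assuming constant capacitance and using the hypothetical helper `dynamic_scaling`, reproduces them with exact fractions:

```python
from fractions import Fraction

def dynamic_scaling(voltage_ratio, frequency_ratio):
    """Return (energy ratio, power ratio) relative to the original design.

    Energy is proportional to C * V^2 and power to 1/2 * C * V^2 * f,
    with the capacitance C assumed constant.
    """
    energy = voltage_ratio ** 2
    power = voltage_ratio ** 2 * frequency_ratio
    return energy, power

# Part (b): voltage and frequency both scaled to 1/8.
print(dynamic_scaling(Fraction(1, 8), Fraction(1, 8)))  # energy 1/64, power 1/512

# Part (c): voltage stops at the 50% floor, frequency scaled to 1/8.
print(dynamic_scaling(Fraction(1, 2), Fraction(1, 8)))  # energy 1/4, power 1/32
```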

Question (d)

How much energy is used with a dark silicon approach? This involves creating specialized ASIC hardware for each major task and power gating those elements when not in use. Only one general-purpose core would be provided, and the rest of the chip would be filled with specialized units. For email, the one core would operate for 25% the time and be turned completely off with power gating for the other 75% of the time. During the other 75% of the time, a specialized ASIC unit that requires 20% of the energy of a core would be running. Please compare the energy requirement of this dark silicon approach against the original quad-core general-purpose processor.

Answer (d)

\[\begin{aligned} Energy_{DarkSilicon} &= Energy_{Core} + Energy_{ASIC} \\&= 0.25 \times 0.25 \times Energy_{old} + 0.75 \times 0.25 \times 0.2 \times Energy_{old} \\&= 0.1Energy_{old} \end{aligned} \]
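The same arithmetic as a small Python sketch. The parameter names are hypothetical; the sketch assumes, as the equation above does, that one core accounts for a quarter of the quad-core energy.

```python
def dark_silicon_energy(core_share, core_active, asic_active, asic_vs_core):
    """Energy of the dark-silicon design relative to the original quad-core.

    core_share   : fraction of the quad-core energy used by one core
    core_active  : fraction of time the single core runs
    asic_active  : fraction of time the ASIC runs
    asic_vs_core : ASIC energy relative to one core
    """
    core_energy = core_active * core_share
    asic_energy = asic_active * asic_vs_core * core_share
    return core_energy + asic_energy

ratio = dark_silicon_energy(core_share=0.25, core_active=0.25,
                            asic_active=0.75, asic_vs_core=0.2)
print(round(ratio, 3))  # 0.1
```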

Q3: (§1-7 Dependability)

Info

Availability is the most important consideration for designing servers, followed closely by scalability and throughput.

Question (a)

We have a single processor with a failure in time (FIT) of 200. What is the mean time to failure (MTTF) for this system?

Answer (a)

Failures in time (FIT) is traditionally reported as the number of failures per billion ($1 \times 10^{9}$) hours of operation.

\[\begin{aligned} MTTF = \frac{10^{9}}{200} = 5 \times 10^{6}(hours) \end{aligned} \]

Question (b)

If it takes one day to get the system running again, what is the availability of the system?

Answer (b)

\[\begin{aligned} MTTR = 24(hours) \end{aligned} \]

\[\begin{aligned} AvailabilityOfTheSystem = \frac{5 \times 10^{6}}{24+ 5 \times 10^{6}} = 0.999995 \end{aligned} \]

Question (c)

Imagine that the government, to cut costs, is going to build a supercomputer out of inexpensive computers rather than expensive, reliable computers. What is the MTTF for a system with 500 processors? Assume that if one fails, they all fail.

Answer (c)

\[\begin{aligned} FIT_{500processors} = 500 \times 200 = 10^{5} \end{aligned} \]

\[\begin{aligned} MTTF = \frac{10^{9}}{10^{5}} = 10^{4}(hours) \end{aligned} \]
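Parts (a) through (c) reduce to three small formulas, sketched below in Python. The function names are hypothetical; the sketch assumes the FIT rates of the processors simply add up when any single failure brings the whole system down.

```python
HOURS_PER_BILLION = 10 ** 9  # FIT counts failures per 10^9 hours

def mttf_hours(fit):
    """Mean time to failure in hours for a given FIT rate."""
    return HOURS_PER_BILLION / fit

def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def system_fit(num_processors, fit_each):
    """If any one failure takes the system down, the FIT rates add up."""
    return num_processors * fit_each

single = mttf_hours(200)
print(single)                               # 5,000,000 hours
print(round(availability(single, 24), 6))   # 0.999995
print(mttf_hours(system_fit(500, 200)))     # 10,000 hours
```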

Q4: (§1-9 Processor Performance & Amdahl’s Law)

Info

A 2-GHz processor was used to execute a benchmark program with the instruction mix and clock cycle counts as shown in the following table:

Question (a)

Assume that the total number of instructions executed is $6 \times 10^{10}$. Determine the effective CPI and the execution time of this program.

Answer (a)

\[\begin{aligned} EffectiveCPI = 1 \times 15\% + 17 \times 30\% + 5 \times 45\% + 3 \times 10\% = 7.8(cycle) \end{aligned} \]

\[\begin{aligned} ExecutionTime_{program} &= NumOfInstructions \times EffectiveCPI \times \frac{1}{Frequency} \\&= 6 \times 10^{10} \times 7.8 \times \frac{1}{2 \times 10^{9}} = 234(s) \end{aligned} \]

Question (b)

Assume that a design enhancement is to reduce the CPI of the FP operations to 5 with 20% lengthening of the clock cycle time. Determine the effective CPI of the enhancement and the speedup of the enhancement to the original design.

Answer (b)

\[\begin{aligned} EffectiveCPI = 1 \times 15\% + 5 \times 30\% + 5 \times 45\% + 3 \times 10\% = 4.2(cycle) \end{aligned} \]

\[\begin{aligned} ExecutionTime_{program} &= NumOfInstructions \times EffectiveCPI \times \frac{1}{Frequency} \\&= 6 \times 10^{10} \times 4.2 \times \frac{1.2}{2 \times 10^{9}} = 151.2(s) \end{aligned} \]

\[\begin{aligned} SpeedUp = \frac{234}{151.2} = 1.548 \end{aligned} \]
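The CPI and execution-time calculations for parts (a) and (b) can be sketched in Python. The (CPI, fraction) pairs are read from the equations above, and the instruction count is the $6 \times 10^{10}$ used in those calculations. Only the FP and Load/Store classes are named in the text, so `class_a` and `class_d` are placeholder labels, not the table's real class names.

```python
def effective_cpi(mix):
    """Weighted average CPI over (cpi, fraction) pairs."""
    return sum(cpi * fraction for cpi, fraction in mix.values())

def execution_time(instructions, cpi, cycle_time):
    """Execution time = instruction count * CPI * clock cycle time."""
    return instructions * cpi * cycle_time

original = {
    "class_a": (1, 0.15),      # placeholder label
    "fp": (17, 0.30),
    "load_store": (5, 0.45),
    "class_d": (3, 0.10),      # placeholder label
}
enhanced = dict(original, fp=(5, 0.30))  # FP CPI reduced to 5

instructions = 6e10
cycle_time = 1 / 2e9                     # 2 GHz clock

t_old = execution_time(instructions, effective_cpi(original), cycle_time)
# The enhancement lengthens the clock cycle by 20%.
t_new = execution_time(instructions, effective_cpi(enhanced), 1.2 * cycle_time)

print(round(effective_cpi(original), 2), round(t_old, 1))  # 7.8 234.0
print(round(effective_cpi(enhanced), 2), round(t_new, 1))  # 4.2 151.2
print(round(t_old / t_new, 3))                             # 1.548
```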

Question (c)

Assume that we build an optimizing compiler to discard 2/3 of the Load/Store operations from the original instruction mix. Determine the fraction of the enhancement, and calculate the speedup of the enhancement to the original design using Amdahl’s law.

Answer (c)

\[\begin{aligned} Fraction_{Enhancement} = \frac{5 \times 45\%}{1 \times 15\% + 17 \times 30\% + 5 \times 45\% + 3 \times 10\%} = 0.288 \end{aligned} \]

\[\begin{aligned} SpeedUp &= \frac{1}{(1 - 0.288) + \frac{0.288}{3}} \\&= \frac{1}{0.808} = 1.238 \end{aligned} \]
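Amdahl's law for part (c) as a minimal Python sketch. The function name is hypothetical; discarding 2/3 of the Load/Store operations is treated as speeding up the Load/Store portion by a factor of 3, as in the equation above.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Share of the original execution time spent in Load/Store operations.
fraction = (5 * 0.45) / 7.8
print(round(fraction, 3))                     # 0.288
print(round(amdahl_speedup(fraction, 3), 3))  # 1.238
```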

Q5: (§A-7 Instruction Encoding)

Info

For the following, we consider instruction encoding for instruction set architectures.

Question (a)

Consider the case of a processor with an instruction length of 14 bits and with 64 general-purpose registers so the size of the address fields is 6 bits. Is it possible to have instruction encodings for the following?

3 two-address instructions
63 one-address instructions
45 zero-address instructions

Answer (a)

The problem states that every address is encoded with 6 bits.

\[\begin{aligned} 3TwoAddressInstructions &= \lceil \log_2 3 \rceil + 6 \times 2 = 14(bits) \\ 63OneAddressInstructions &= \lceil \log_2 3 \rceil + \lceil \log_2 63 \rceil + 6 \times 1 = 14(bits) \\ 45ZeroAddressInstructions &= \lceil \log_2 3 \rceil + \lceil \log_2 63 \rceil +\lceil \log_2 45 \rceil + 6 \times 0 = 14(bits) \end{aligned} \]

The calculation shows that the shortest encoding for each of the three groups does not exceed 14 bits, so this encoding is possible.

Question (b)

Assuming the same instruction length and address field sizes as above, determine if it is possible to have

3 two-address instructions
65 one-address instructions
35 zero-address instructions

Explain your answer.

Answer (b)

The problem states that every address is encoded with 6 bits.

\[\begin{aligned} 3TwoAddressInstructions &= \lceil \log_2 3 \rceil + 6 \times 2 = 14(bits) \\ 65OneAddressInstructions &= \lceil \log_2 3 \rceil + \lceil \log_2 65 \rceil + 6 \times 1 = 15(bits) \end{aligned} \]

The calculation shows that the one-address instructions would need more than 14 bits, so this encoding is not possible.
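Parts (a) and (b) can also be cross-checked by counting free bit patterns, which is the usual expanding-opcode argument. This is a different route from the bit-count used above, and it reaches the same conclusions. The function name and parameter names in this Python sketch are hypothetical.

```python
def encoding_fits(n_two, n_one, n_zero, instr_bits=14, addr_bits=6):
    """Check whether the instruction counts fit using expanding opcodes."""
    # Bits left for the opcode of a two-address instruction: 14 - 2*6 = 2.
    two_slots = 2 ** (instr_bits - 2 * addr_bits)
    if n_two > two_slots:
        return False
    # Each unused two-address opcode frees one address field (2^6 patterns).
    one_slots = (two_slots - n_two) * 2 ** addr_bits
    if n_one > one_slots:
        return False
    # Each unused one-address opcode frees another address field.
    zero_slots = (one_slots - n_one) * 2 ** addr_bits
    return n_zero <= zero_slots

print(encoding_fits(3, 63, 45))  # part (a): True
print(encoding_fits(3, 65, 35))  # part (b): False
```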

Question (c)

Assume the same instruction length and address field sizes as above. Further assume there are already 3 two-address and 24 zero-address instructions. What is the maximum number of one-address instructions that can be encoded for this processor?

Answer (c)

\[\begin{aligned} min(14 - 2 - 5 , 6) &= 6 \\ 2^{6} - 1 &= 63(instructions) \end{aligned} \]

Question (d)

Assume the same instruction length and address field sizes as above. Further assume there are already 3 two-address and 65 zero-address instructions. What is the maximum number of one-address instructions that can be encoded for this processor?

Answer (d)

\[\begin{aligned} \lceil \log_2 65 \rceil - 6 &= 1 \\ 2^6 - 2^1 &= 62(Instructions) \end{aligned} \]
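The results for parts (c) and (d) can be reproduced by counting how many one-address opcode patterns must be given up as prefixes for the zero-address instructions. This Python sketch uses a hypothetical function name and assumes the same expanding-opcode scheme: one unused two-address opcode leaves 64 patterns, and each pattern reserved as a prefix covers up to 64 zero-address instructions.

```python
import math

def max_one_address(n_two, n_zero, instr_bits=14, addr_bits=6):
    """Largest number of one-address instructions that still fits."""
    two_slots = 2 ** (instr_bits - 2 * addr_bits)
    # Patterns available below the unused two-address opcodes.
    one_slots = (two_slots - n_two) * 2 ** addr_bits
    # Patterns that must be reserved as prefixes for zero-address instructions.
    zero_prefixes = math.ceil(n_zero / 2 ** addr_bits)
    return one_slots - zero_prefixes

print(max_one_address(3, 24))  # part (c): 63
print(max_one_address(3, 65))  # part (d): 62
```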

Q6: (§A-2 ISA classes)

Info

Your task is to compare the memory efficiency of four different styles of instruction set architectures. The architecture styles are:

Accumulator: All operations occur between a single register and a memory location.
Memory-memory: All instruction addresses reference only memory locations.
Stack: All operations occur on top of the stack. Push and pop are the only instructions that access memory; all others remove their operands from the stack and replace them with the result. The implementation uses a hardwired stack for only the top two stack entries, which keeps the processor circuit very small and low in cost. Additional stack positions are kept in memory locations, and accesses to these stack positions require memory references.
Load-store: All operations occur in registers, and register-to-register instructions have three register names per instruction.

To measure memory efficiency, make the following assumptions about all four instruction sets:

All instructions are an integral number of bytes in length.
The opcode is always one byte (8 bits).
Memory accesses use direct, or absolute, addressing.
The variables A, B, C, and D are initially in memory.

Question (a)

Invent your own assembly language mnemonics (Figure A.2 provides a useful sample to generalize), and for each architecture write the best equivalent assembly language code for this high-level language code sequence:

A = B + C;
B = A + C;
D = A - B;

Answer (a)

Question (b)

In your assembly codes for part (a), point out the following information.

1. Where a value is loaded from memory after having been loaded once?
2. Where the result of one instruction is passed to another instruction as an operand?
3. Are the above events in 2. involving the storage within the processor or in memory?

Answer (b)

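1. Accumulator: register; Stack: stack; Memory-memory: no loads occur; Register-register: register
2. Accumulator: memory; Stack: memory; Memory-memory: memory; Register-register: register
3. Accumulator: computation happens in the accumulator (processor), and values are passed through memory; Stack: computation happens on the stack (processor), and values are passed through memory; Memory-memory: everything stays in memory; Register-register: everything stays in registers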

Question (c)

Assume that the given code sequence is from a small, embedded computer application that uses a 16-bit memory address and data operands. If a load-store architecture is used, assume it has 16 general-purpose registers. For each architecture answer the following questions: How many instruction bytes are fetched? How many bytes of data are transferred from/to memory? Which architecture is most efficient as measured by total memory traffic (code+data)?

Answer (c)
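As a starting point for the byte counts, the size of each instruction format follows from the stated assumptions: an 8-bit opcode, 16-bit addresses, 4-bit register specifiers for 16 registers, and lengths rounded up to whole bytes. The Python sketch below uses a hypothetical helper, and the three-operand memory-memory format is an assumption, not something the problem fixes.

```python
import math

def instruction_bytes(register_fields=0, address_fields=0,
                      opcode_bits=8, register_bits=4, address_bits=16):
    """Instruction length in bytes, rounded up to a whole byte."""
    bits = (opcode_bits
            + register_fields * register_bits
            + address_fields * address_bits)
    return math.ceil(bits / 8)

print(instruction_bytes(address_fields=1))                     # accumulator memory op, stack push/pop: 3
print(instruction_bytes())                                     # stack ALU op with no operands: 1
print(instruction_bytes(address_fields=3))                     # memory-memory, three addresses (assumed): 7
print(instruction_bytes(register_fields=1, address_fields=1))  # load-store load/store: 4
print(instruction_bytes(register_fields=3))                    # load-store ALU op: 3
```

Multiplying these sizes by the number of instructions of each kind in the sequences from part (a) gives the instruction bytes fetched. Each 16-bit operand read from or written to memory adds 2 bytes of data traffic.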