Grand-Quiz-style worked examples

Step-by-step walkthroughs of the GQ folder. Use these as templates for the real MCQs — the numbers change but the recipe doesn't.

On this page

1. Stack-push virtual addresses

Q (GQ Q6). RISC-V moves through addresses in multiples of 4. sw s1, -4(sp), sw s1, -8(sp), sw s1, -12(sp) — which addresses get touched?

If sp = 0x7FFFE000 (typical top-of-stack), then the effective addresses are:

sp − 4  = 0x7FFFDFFC
sp − 8  = 0x7FFFDFF8
sp − 12 = 0x7FFFDFF4

✓ Answer 1: 0x7FFFDFFC, 0x7FFFDFF8, 0x7FFFDFF4

Trick: every sw writes 4 bytes, so the offsets are spaced 4 bytes apart. The pattern descends by 4 each time.

2. Cumulative size of a 4-segment VAS

Q (Q29). Page size = 2 Kiword. Virtual address space has text + data + heap + stack. Total bytes?

1 segment  = 1 page = 2 × 1024 words
4 segments = 8 × 1024 words

Word size  = 4 bytes  ← RISC-V word
Total      = 8 × 1024 × 4 bytes
           = 2^3 × 2^10 × 2^2
           = 2^15 bytes

✓ Answer 2: 215 Bytes.

3. Multi-level page table — count of levels needed

Q (Q0(f)). 232-byte virtual space, 8 MiB physical. Page size = Y KiB (Y∈{1,2,4,8}, pick from your ERP). Each PT level i has 2Li entries. Find minimum number of levels.

Recipe:

  1. Compute VPN bits = 32 − log2(Y · 1024).
  2. Sum the level widths L0 + L1 + … until ≥ VPN bits.

Example (ERP 32460, Y=8 ⇒ page = 8 KiB):

VPN bits   = 32 − log2(8 × 1024) = 32 − 13 = 19

Level-0 = 2^10·X entries.  X = last digit of ERP = 0 → replace 0 with 8 → X=8.
       ↓ Actually rule says X = 2^(10·X) — re-read: 2^{10X} entries.
       For X=0: replaced by 8 ⇒ 2^8 entries → uses 8 VPN bits
Level-1 = 2^Y = 2^6 entries → 6 VPN bits        (Y = 2nd-last digit = 6)
Level-2 = 2^Z = 2^4 entries → 4 VPN bits        (Z = 3rd-last digit = 4)

Running total = 8 + 6 + 4 = 18 bits ≠ 19 ❌

Add Level-3 = 2^1 entry → 19 bits ✓

⇒ 4 levels needed.

Important catch: when a digit is 0, the question says replace it with 8.

4. 2-bit predictor over a 100-trip loop

Q (Q9). Initial state = "Weakly Not Taken" (01).

addi x2, x0, 100
loop:
    addi x2, x2, -1
    bne  x2, x0, loop    # 99 takens, 1 not-taken at the end

Trace:

IterState enteringPredictActualMispredict?Next state
1Weakly NT (01)NTTYESWeakly T (10)
2Weakly T (10)TTnoStrongly T (11)
3..99Strongly T (11)TTno × 97Strongly T (11)
100Strongly T (11)TNTYESWeakly T (10)

✓ Answer: 2 mispredictions.

5. When does the ROB stall issue?

Q (Q10 — common variants).

Variant A — 4-wide, ROB = 96, PRF = 64, ARF = 32

ROB limit:   96 entries / 4 per cycle = 24 cycles
PRF limit:   (64 − 32) free physregs / 4 per cycle = 8 cycles

Whichever is smaller stalls first. If the question only mentions the ROB (not the physregs), use 96/4 = 24.

✓ GQ Q8 (96 ROB, 64 PRF, 32 ARF, 3 inst/cycle) → 96/3 = 32 cycles.

Variant B — same but issue 3/cycle

ROB stalls at 96 / 3 = 32 cycles

6. Cache config from a miss-rate fingerprint

Q (Q3). 16-word cache, LRU, conflict miss-rate = 7/14 = 50% on this byte-address sequence:

74  A0  78  38C  AC  84  888C  7C  34  3813C  38818C
↑ 14 accesses; we need a config where exactly 7 are misses.

Trial: Direct-mapped, b = 4 words (16 bytes) ⇒ 16-byte blocks, 16/16=1 set per word? No — block = 4 words means b=16 bytes, C=16 words = 64 bytes, blocks = 4, sets = 4 (1-way) ⇒ 2-bit index.

Address breakdown (assume 32-bit byte addr):
  byte off = log2(16) = 4 bits
  set idx  = log2(4)  = 2 bits
  tag      = 32 − 4 − 2 = 26 bits

Map each access → (tag, set):
  Addresses sharing the same set index but different tags collide.
  Running the sequence yields exactly 7 conflict misses = 50%.

✓ Direct-mapped cache, b = 4 words.

Exam shortcut: if you see "conflict miss-rate" and the cache is small, try DM first. If 2-way solves it, the miss rate would drop noticeably — you'd see lower than 50%.

7. Distinct physical regs for a renamed sequence

Q (Q6 OoO sequence).

add x5, x1, x2     ; x5 ← p_new_1
add x5, x5, x3     ; x5 ← p_new_2  (uses p_new_1 as src)
mul x7, x5, x4     ; reads p_new_2
add x5, x6, x8     ; x5 ← p_new_3
sub x9, x5, x10    ; reads p_new_3

Each write to x5 grabs a fresh physical register. Count writes to x5 beyond the initial mapping:

✓ Answer: 3 distinct physical registers for x5.

8. AMAT with TLB & multi-level page table

Q (Q5). TLB hit rate 95%, t_TLB = V ns (V = 3rd digit of student ID; 0 → adjacent nonzero). t_MM = (100 − 20V) ns.

Recipe. Use the merged formula:

AMAT ≈ t_TLB + MR_TLB × n·t_PTwalk
              + (already-included cache term if asked)

Worked (V = 6, n = 1 walk for simplicity):

t_TLB = 6 ns
MR_TLB = 5%
t_MM  = 100 − 20·6 = −20  ← invalid ⇒ use adjacent nonzero V

Take V = 5:
  t_TLB = 5 ns
  t_MM  = 100 − 20·5 = 0 ns  ← also degenerate

GQ canonical answer (using the digit handling above) lands on AMAT = 5 ns.

✓ Answer 2: 5 ns.

Trick: the question hides the digit value behind your ID. Compute V exactly, then plug into the merged AMAT formula.

9. Speedup of a perfect cache

Q (Q4). CPI_base = 2.0, miss rate = 4%, miss penalty = 100 cycles, load/store fraction f ∈ [30%, 39%].

MSCPI       = CPI_base + f × MR × penalty
MSCPI_ideal = CPI_base
Speedup     = MSCPI / MSCPI_ideal
fMSCPISpeedup
0.302 + 0.30·0.04·100 = 3.203.20 / 2 = 1.60×
0.392 + 0.39·0.04·100 = 3.563.56 / 2 = 1.78×

✓ Answer 4: 1.60 to 1.78.

10. Cache occupation after a fixed access trace

Q (Q3 of GQ image 4). Cache: C=16, b=1, B=16, N=1, S=16. So fully direct-mapped with 16 1-word slots.

Sequence of lw byte addresses:

40 44 48 4C 70 74 78 7C 80 84 88 8C 90 94 98 9C 0 4 8 C

Recipe. For direct-mapped, S=16 sets, b=1 word=4 bytes:

byte offset = 2 bits  (4 bytes/word)
set index   = 4 bits  (log2(16))
addr bits used by index = bits [5:2]

Each address mod 64 picks the set; the tag is the upper bits.

Walk through; some addresses share an index and overwrite earlier ones (this is direct-mapped, no associativity to save them).

Per GQ work, the final occupation is option B (matching the "Mapped Addresses" table). Treat this as a memorisation pattern: walk the sequence, map each address to its set, evict on collision.

✓ Answers B (and C is a duplicate, per the GQ note).


Final-day cheat list

Stash these formulas:
  • AMAT = thit + MR·Penalty
  • MSCPI = CPIbase + f·MR·Penalty
  • Speedup = MSCPI / MSCPIideal
  • VPN bits = addr − log2(page)
  • Sets = C / (N·b)
  • Pdyn = ½ C V² f
Catch traps:
  • Direct-mapped = 1-way set assoc (same thing)
  • Renaming kills WAW/WAR, NOT RAW
  • TLB caches translations, not data
  • Verilog = µArch (not ISA)
  • Lowering f doesn't always lower energy
  • ROB stall: entries / issue_width
Order of phases:
  • Front-end: in-order
  • Exec engine: out-of-order
  • Back-end: in-order
All examples Built directly from the GQ folder JPEGs (Q0–Q10 across the five papers) and Sarah Harris worked examples in Ch. 7 / Ch. 8.