Step-by-step Dynamic Scheduling design and verification - A

1. Start with a functional Simple RISC pipeline (Simple RISC HW Specifications v.1.0).

2. Replace the single general-purpose execution unit with three specialized execution units: one for the integer arithmetic, logic, branch and control instructions, another for all store/load instructions, and a third for the floating point arithmetic instructions.

3. For this stage of the design, treat the operands and the result of any floating point instruction as integers, and its operation as a normal integer operation. The design of the floating point execution unit will be done in the next semester.

4. Increase the delay of the floating point execution unit up to four cycles, emulating a real multicycle execution unit (propagate the result through a shift register):

   always @(posedge clk) begin
       r1 <= r;
       r2 <= r1;
       r3 <= r2;
       ...
   end

5. Modify the read stage module so that it redirects (dispatches) each instruction to the appropriate execution unit, according to the instruction type, provided that the target execution unit is able to receive it.

6. Modify the write back module so that it takes the result from the execution unit that finishes its operation. If more than one execution unit finishes at the same time, only one result is propagated to the register set. Because the floating point execution unit is the slowest, its result has priority over the other results. If the result must be selected between the other two execution units, pick the older one. The execution units whose results cannot be written back immediately are stalled, and some other pipeline units may be stalled too, if necessary. A stalled execution unit signals a not-ready bit to the read/dispatch stage. Some NOPs may be overwritten with useful instructions during this stall, in order to keep the number of stalled stages as small as possible.

7. For verification, use sequences of independent instructions, mixing different instruction types.
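
The write-back selection rule from step 6 can be sketched as a small combinational arbiter. This is only an illustrative sketch, not part of the Simple RISC specification: the module name wb_arbiter, all port names, the 32-bit result width, the 5-bit register index, and the int_older age flag (assumed to be supplied by the dispatch stage) are hypothetical. It also assumes that ls_valid is raised only for results that need a register write, such as loads.

```verilog
// Hypothetical write-back arbiter; every name and width here is an assumption.
module wb_arbiter #(
    parameter DATA_W = 32,  // assumed result width
    parameter REG_W  = 5    // assumed register-index width
) (
    input  wire              fp_valid,   // floating point unit has a result ready
    input  wire              int_valid,  // integer unit has a result ready
    input  wire              ls_valid,   // load/store unit has a result ready
    input  wire              int_older,  // 1 if the integer instruction was dispatched first
    input  wire [DATA_W-1:0] fp_data,
    input  wire [DATA_W-1:0] int_data,
    input  wire [DATA_W-1:0] ls_data,
    input  wire [REG_W-1:0]  fp_dest,
    input  wire [REG_W-1:0]  int_dest,
    input  wire [REG_W-1:0]  ls_dest,
    output reg               wb_en,      // register-set write enable
    output reg  [DATA_W-1:0] wb_data,
    output reg  [REG_W-1:0]  wb_dest,
    output reg               int_stall,  // integer unit must hold its result
    output reg               ls_stall    // load/store unit must hold its result
);
    always @(*) begin
        // Defaults: nothing written, nobody stalled.
        wb_en     = 1'b0;
        wb_data   = {DATA_W{1'b0}};
        wb_dest   = {REG_W{1'b0}};
        int_stall = 1'b0;
        ls_stall  = 1'b0;

        if (fp_valid) begin
            // The floating point result always wins; any other ready unit waits.
            wb_en     = 1'b1;
            wb_data   = fp_data;
            wb_dest   = fp_dest;
            int_stall = int_valid;
            ls_stall  = ls_valid;
        end else if (int_valid && (!ls_valid || int_older)) begin
            // Integer result is alone, or older than the load/store result.
            wb_en    = 1'b1;
            wb_data  = int_data;
            wb_dest  = int_dest;
            ls_stall = ls_valid;
        end else if (ls_valid) begin
            // Load/store result is alone, or older than the integer result.
            wb_en     = 1'b1;
            wb_data   = ls_data;
            wb_dest   = ls_dest;
            int_stall = int_valid;
        end
    end
endmodule
```

In this sketch the floating point unit never needs a stall output, because its result is always accepted in the cycle it becomes valid. The int_stall and ls_stall outputs are the signals that would drive the not-ready bits seen by the read/dispatch stage.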