Abstract:
A method of providing history based done logic includes receiving a cache line in a L2 cache; determining if the cache line has a history of access at least three times on a previous call into the L2 cache; providing the cache line directly to a processor if the history of access was less then the at least three times; and loading the cache line into an L1 cache if the history of access was the at least three times.
Abstract:
A method of providing history based done logic for a D-cache includes receiving a D-cache line in an L2 cache; determining if the D-cache line is unprefetchable; aging the D-cache line without a delay if the D-cache line is prefetchable; and aging the D-cache line with a delay if the D-cache line is unprefetchable.
Abstract:
One embodiment of the invention provides a processor. The processor generally includes a first and second processor core, each having a plurality of pipelined execution units for executing an issue group of multiple instructions and scheduling logic configured to issue a first issue group of instructions to the first processor core for execution and a second issue group of instructions to the second processor core for execution when the processor is in a first mode of operation and configured to issue one or more vector instructions for concurrent execution on the first and second processor cores when the processor is in a second mode of operation.
Abstract:
Embodiments of the invention provide a look-aside-look-aside buffer (LLB) configured to retain a portion of the real addresses in a translation look-aside (TLB) buffer to allow prefetching of data from a cache. A subset of real address bits associated with an effective address may be retrieved relatively quickly from the LLB, thereby allowing access to the cache before the complete address translation is available and reducing cache access latency.
Abstract:
A method of providing history based done logic for a I-cache includes receiving an I-cache line in an L2 cache; determining if the I-cache line is unprefetchable; aging the I-cache line without a delay if the I-cache line is prefetchable; and aging the I-cache line with a delay is the I-cache line is unprefetchable.
Abstract:
Embodiments of the invention provide a processor for executing instructions. In one embodiment, the processor includes circuitry to receive a load instruction and a store instruction to be executed in the processor and detect a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The processor further includes circuitry to schedule execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict.
Abstract:
Embodiments of the invention provide an apparatus of storing branch prediction information. In one embodiment, an integrated circuit device includes a first table for storing local branch prediction information, a second table for storing global branch prediction information, and circuitry. The circuitry is configured to receive a branch instruction and store local branch prediction information for the branch instruction in the first table. The local branch prediction information includes a local predictability value for the local branch prediction information. The circuitry is further configured to store global branch prediction information for the branch instruction in the second table only if the local predictability value is below a threshold value of predictability.
Abstract:
Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times.
Abstract:
Embodiments of the invention provide a method and processor for executing instructions. In one embodiment, the method includes receiving a load instruction and a store instruction to be executed in the processor and detecting a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The method further includes scheduling execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict.
Abstract:
Methods and systems for repairing ports are disclosed. Embodiments may detect a hard failure of a port, select an alternative port from existing ports in use within an array, and share the alternative port to route operands bound for the first port and the alternative port, to transmit operands associated with the failed port to the corresponding destination unit. Embodiments include an additional wire, or an alternative port path, that couples the alternative port to the destination unit that is associated with the first port. For instance, in a multi-pipeline processor, an operand of an instruction that is bound for the failed read port may be routed via an alternative read port to the corresponding execution unit. Similarly, data bound for failed write ports may be, e.g., written back to a register file by routing the data via an alternative write port of the register file.