Processors, methods, systems, and instructions to store consecutive source elements to unmasked result elements with propagation to masked result elements

    Publication No.: US10223113B2

    Publication date: 2019-03-05

    Application No.: US15122005

    Filing date: 2014-03-27

    Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
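The behaviour described in this abstract can be modelled in a few lines of scalar Python. This is a minimal sketch, not the patented implementation: the function name `expand_with_propagation` and the `fill` parameter are hypothetical, the "first end" is assumed to be element 0, and the handling of masked elements before the first unmasked one (filled with `fill`) and after the last one (propagated) is an assumption, since the abstract only covers masked elements that lie between two unmasked ones.

```python
def expand_with_propagation(source, mask, fill=0):
    """Scalar model: consecutive source elements go to unmasked result
    positions; each masked position repeats the nearest unmasked value
    on the low-index side (assumed to be the 'first end')."""
    result = []
    next_src = 0   # index of the next consecutive source element to store
    last = fill    # value to propagate into masked positions (assumption)
    for bit in mask:
        if bit:    # unmasked position: take the next source element
            last = source[next_src]
            next_src += 1
        result.append(last)
    return result


source = [10, 20, 30, 40, 50, 60, 70, 80]
mask = [0, 1, 0, 0, 1, 1, 0, 1]
print(expand_with_propagation(source, mask))
```

With the mask above, only the first four source elements are consumed, one per unmasked position, and every masked position repeats the value stored most recently.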

    Instruction for element offset calculation in a multi-dimensional array

    Publication No.: US10025591B2

    Publication date: 2018-07-17

    Application No.: US15363785

    Filing date: 2016-11-29

    Abstract: An apparatus is described having functional unit logic circuitry. The functional unit logic circuitry has a first register to store a first input vector operand having an element for each dimension of a multi-dimensional data structure. Each element of the first vector operand specifies the size of its respective dimension. The functional unit has a second register to store a second input vector operand specifying coordinates of a particular segment of the multi-dimensional structure. The functional unit also has logic circuitry to calculate an address offset for the particular segment relative to an address of an origin segment of the multi-dimensional structure.
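The offset calculation can be illustrated with a short scalar sketch. The abstract does not state the memory layout, so the code assumes the first dimension varies fastest; the function name `element_offset` and the `element_size` parameter are hypothetical and exist only for illustration.

```python
def element_offset(dim_sizes, coords, element_size=1):
    """Offset of the segment at `coords` relative to the origin segment.

    dim_sizes[i] is the size of dimension i and coords[i] the coordinate
    along it. Dimension 0 is assumed to vary fastest (an assumption; the
    abstract does not specify the layout).
    """
    if len(dim_sizes) != len(coords):
        raise ValueError("one coordinate is required per dimension")
    offset = 0
    stride = 1  # number of elements spanned by one step along the dimension
    for size, coord in zip(dim_sizes, coords):
        if not 0 <= coord < size:
            raise IndexError("coordinate outside its dimension")
        offset += coord * stride
        stride *= size
    return offset * element_size


# A 4 x 3 x 2 structure: coordinate (1, 2, 1) lies 1 + 2*4 + 1*12 elements
# past the origin.
print(element_offset([4, 3, 2], [1, 2, 1]))
```

Passing `element_size=4` would scale the same result to a byte offset for 4-byte elements.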

    Methods, apparatus, instructions, and logic to provide permute controls with leading zero count functionality
    Type: granted patent. Legal status: in force.

    Publication No.: US09372692B2

    Publication date: 2016-06-21

    Application No.: US13731008

    Filing date: 2012-12-29

    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
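The per-field counting step can be sketched as follows. This is a scalar model only: the function name `vector_lzcnt` and the default 32-bit field width are assumptions for illustration, and the sketch does not cover how the counts are turned into permute controls or completion masks.

```python
def vector_lzcnt(elements, width=32):
    """Per-field leading zero count: for each data field, count the most
    significant contiguous bits that are zero. `width` is the assumed
    field size in bits."""
    field_mask = (1 << width) - 1
    counts = []
    for value in elements:
        value &= field_mask                # keep only the bits of the field
        counts.append(width - value.bit_length())
    return counts


print(vector_lzcnt([0, 1, 0x80000000, 0x00F0]))
```

A field that is entirely zero yields a count equal to the field width, and a field with its top bit set yields zero.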

    Instruction for implementing vector loops of iterations having an iteration dependent condition
    Type: patent application. Legal status: in force.

    Publication No.: US20160011873A1

    Publication date: 2016-01-14

    Application No.: US14327527

    Filing date: 2014-07-09

    Abstract: A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction identifies an input vector operand whose input elements specify one or the other of two states. The instruction execution pipeline also includes an instruction decoder to decode the instruction. The instruction execution pipeline also includes a functional unit to execute the instruction and provide a resultant output vector. The functional unit includes logic circuitry to produce an element in a specific element position of the resultant output vector by performing an operation on a value derived from a base value using a stride in response to one but not the other of the two states being present in a corresponding element position of the input vector operand.
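One way to read this abstract is as a vectorised form of a loop such as `if cond[i]: j += stride; out[i] = j`. The sketch below models that reading in scalar Python. It is an interpretation, not the claimed design: the function name `conditional_stride` is hypothetical, and both the choice to advance the value before writing it and the choice to repeat the running value at positions in the other state are assumptions.

```python
def conditional_stride(base, stride, condition):
    """Scalar model of a loop whose induction value advances only in
    iterations where the condition holds.

    Assumptions: the value is advanced before it is written, and positions
    where the condition is false repeat the current running value.
    """
    out = []
    value = base
    for flag in condition:
        if flag:            # one of the two states: advance by the stride
            value += stride
        out.append(value)
    return out


print(conditional_stride(100, 4, [1, 0, 1, 1, 0]))
```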

    Collapsing of multiple nested loops, methods, and instructions

    Publication No.: US11042377B2

    Publication date: 2021-06-22

    Application No.: US16233955

    Filing date: 2018-12-27

    Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
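The idea of updating several loop counters in one step, so that a loop nest collapses into a single loop, can be sketched as below. The names `update_loop_counters` and `collapsed_iterations`, the convention that index 0 is the innermost loop, and the returned completion flag are all assumptions made for illustration; the abstract does not specify operand layout or how completion is signalled.

```python
def update_loop_counters(counters, limits, amount=1):
    """Advance a multi-dimensional loop counter by `amount` iterations.

    counters[0] is assumed to be the innermost loop; limits[i] is the
    (positive) trip count of loop i. Returns the new counters and a flag
    that is True once the whole nest has been exhausted.
    """
    updated = list(counters)
    carry = amount
    for i, limit in enumerate(limits):
        total = updated[i] + carry
        updated[i] = total % limit   # wrap this counter
        carry = total // limit       # and carry into the next outer loop
        if carry == 0:
            break
    return updated, carry != 0


def collapsed_iterations(limits):
    """Visit every iteration of a loop nest using one collapsed loop."""
    counters = [0] * len(limits)
    visited = []
    done = False
    while not done:
        visited.append(tuple(counters))
        counters, done = update_loop_counters(counters, limits)
    return visited


print(collapsed_iterations([3, 2]))
```

The single `while` loop in `collapsed_iterations` replaces a two-level nest with trip counts 3 (inner) and 2 (outer), visiting all six iterations in the same order as the nested form.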

    Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality

    Publication No.: US10452398B2

    Publication date: 2019-10-22

    Application No.: US16228529

    Filing date: 2018-12-20

    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

    Systems, apparatuses, and methods for broadcast compare addition

    Publication No.: US10268479B2

    Publication date: 2019-04-23

    Application No.: US15396199

    Filing date: 2016-12-30

    Abstract: Systems, apparatuses, and methods for executing an instruction. The instruction includes fields for a first source operand, a second source operand, and a destination operand. The decoded instruction causes broadcasted packed data elements of a first packed data source to be reduced with a reduction operation, and a result of each of the reductions to be stored in a packed data destination. The packed data elements of the first packed data source to be used in each reduction are dictated by a result of a comparison of broadcasted values of packed data elements stored in a second packed data source to the packed data elements stored in the second packed data source without broadcasting.
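A plausible scalar reading of this abstract is sketched below. It is an interpretation under stated assumptions: the comparison is taken to be equality, the reduction is taken to be addition (as the title suggests), and the names `broadcast_compare_add`, `values` (first source), and `keys` (second source) are hypothetical. For each position, the key at that position is broadcast and compared against every key; the values at matching positions are summed into that position of the destination.

```python
def broadcast_compare_add(values, keys):
    """For each position i, sum the values whose key equals keys[i].

    `values` models the first packed data source and `keys` the second.
    Equality comparison and additive reduction are assumptions.
    """
    if len(values) != len(keys):
        raise ValueError("both sources need the same number of elements")
    result = []
    for broadcast_key in keys:   # broadcast one key across all lanes
        total = sum(v for v, k in zip(values, keys) if k == broadcast_key)
        result.append(total)
    return result


print(broadcast_compare_add([1, 2, 3, 4], [7, 8, 7, 9]))
```

Positions 0 and 2 share the key 7, so both receive the sum of their two values, while positions with a unique key receive only their own value.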
