Abstract:
One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.
Abstract:
A basic block within a thread program is characterized for convergence based on mapping the basic block to an indicator subnet within a corresponding Petri net generated to model the thread program. Each block within the thread program may be similarly characterized. Each corresponding Petri net is enumerated to generate a corresponding state space graph. If the state space graph includes an exit node with an odd execution count attribute, such as by Petri net coloring, then the corresponding basic block is divergent. The corresponding basic block is convergent otherwise. Using this characterization technique, a thread program compiler may advantageously identify all convergent blocks within a thread program and apply appropriate optimizations to the convergent blocks.
Abstract:
A system and method are provided for inserting synchronization statements into a program file to mitigate race conditions. The method includes reading a program file and determining one or more convergent statements in the program file. The method also includes inserting one or more synchronization statements in the program file between the determined convergent statements. The method further includes removing one or more of the inserted synchronization statements and writing the modified program file. The method may include, after removing the inserted synchronization statements, identifying to a user any remaining inserted synchronization statements.
Abstract:
A system, method, and computer program product are provided for transposing a matrix. In use, a matrix is identified. Additionally, the matrix is transposed utilizing row-wise operations and column-wise operations, where the row-wise operations and the column-wise operations are performed independently.
Abstract:
A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria is applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.
Abstract:
One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.
Abstract:
A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria is applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.
Abstract:
A basic block within a thread program is characterized for convergence based on mapping the basic block to an indicator subnet within a corresponding Petri net generated to model the thread program. Each block within the thread program may be similarly characterized. Each corresponding Petri net is enumerated to generate a corresponding state space graph. If the state space graph includes an exit node with an odd execution count attribute, such as by Petri net coloring, then the corresponding basic block is divergent. The corresponding basic block is convergent otherwise. Using this characterization technique, a thread program compiler may advantageously identify all convergent blocks within a thread program and apply appropriate optimizations to the convergent blocks.
Abstract:
A system and method for detecting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises an initialization bit as well as access type information, collectively called the state tracking bits for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. The second access is identified based on a status of the state tracking bits. The method also includes determining a hazard based on a first type of access and a second type of access to the shared memory location. Information related to the first access is provided in the table.