Multi-mode low-precision inner-product computation circuits for massively parallel neural inference engine
Abstract:
Neural inference chips for computing neural activations are provided. In various embodiments, the neural inference chip is adapted to: receive an input activation tensor comprising a plurality of input activations; receive a weight tensor comprising a plurality of weights; Booth recode each of the plurality of weights into a plurality of Booth-coded weights, each Booth coded value having an order; multiply the input activation tensor by the Booth coded weights, yielding a plurality of results for each input activation, each of the plurality of results corresponding to the orders of the Booth-coded weights; for each order of the Booth-coded weights, sum the corresponding results, yielding a plurality of partial sums, one for each order; and compute a neural activation from a sum of the plurality of partial sums.
Information query
Patent Agency Ranking
0/0