Systems and methods for efficient data searching, storage and reduction
Abstract:
A computer-implemented method, according to one embodiment includes, for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs). Each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk, applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk, applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes, and defining the second subset of hashes as the set of RDCs.
Information query
Patent Agency Ranking
0/0