Invention Grant
- Patent Title: Deduplication using nearest neighbor cluster
- Application No.: US16412946
- Application Date: 2019-05-15
- Publication No.: US11029871B2
- Publication Date: 2021-06-08
- Inventor: Jonathan Krasner, Sweetesh Singh, Steven Chalmer
- Applicant: EMC IP Holding Company LLC
- Applicant Address: US MA Hopkinton
- Assignee: EMC IP Holding Company LLC
- Current Assignee: EMC IP Holding Company LLC
- Current Assignee Address: US MA Hopkinton
- Agent: Krishnendu Gupta; Anne-Marie Dinius
- Main IPC: G06F3/06
- IPC: G06F3/06; G06N20/00; H04L9/06

Abstract:
Disclosed are techniques for data deduplication, including methods, systems, or computer products for reducing data redundancy in a data storage system. Before a data block is written, a cluster of nearest neighbors, created using a locality sensitive hashing algorithm, is searched to determine whether the data block has already been stored in the data storage system. In alternate embodiments, the nearest neighbor clusters could be created using one or more of the following algorithms: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm.
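
The write path described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `LshDedupStore`, the byte-histogram features, the 16-bit random-hyperplane signature, and the SHA-256 check inside each cluster are all assumptions chosen for illustration. The abstract states only that a cluster built with locality sensitive hashing is searched before a block is written.

```python
import hashlib
import random
from collections import defaultdict

BLOCK_DIM = 256   # one feature per possible byte value (assumed feature choice)
NUM_PLANES = 16   # signature length in bits (assumed)


class LshDedupStore:
    """Toy block store: an LSH signature selects a cluster of similar
    blocks, and an exact digest lookup inside that cluster decides
    whether the incoming block is a duplicate. Hypothetical design."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Random hyperplanes for sign-based (SimHash-style) LSH.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(BLOCK_DIM)]
                       for _ in range(NUM_PLANES)]
        self.clusters = defaultdict(dict)  # signature -> {digest: block}
        self.writes = 0                    # blocks physically stored

    def _features(self, block):
        # Byte-frequency histogram, centered so the signs of the
        # projections are informative.
        counts = [0] * BLOCK_DIM
        for byte in block:
            counts[byte] += 1
        mean = len(block) / BLOCK_DIM
        return [c - mean for c in counts]

    def signature(self, block):
        # One bit per hyperplane: which side of the plane the block lies on.
        feats = self._features(block)
        bits = 0
        for plane in self.planes:
            dot = sum(p * f for p, f in zip(plane, feats))
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        return bits

    def write(self, block):
        """Return True if the block was newly stored, False if an
        identical block was already present in its cluster."""
        cluster = self.clusters[self.signature(block)]
        digest = hashlib.sha256(block).hexdigest()
        if digest in cluster:
            return False       # duplicate found: skip the write
        cluster[digest] = block
        self.writes += 1
        return True
```

In this sketch, identical blocks always produce the same signature, so a repeated `write` of the same bytes lands in the same cluster and returns `False` without storing anything. Only the candidates in one cluster are examined, rather than every block in the store.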