Deduplication using nearest neighbor cluster

Invention Grant

US11029871B2 Deduplication using nearest neighbor cluster (Active)


Patent Title: Deduplication using nearest neighbor cluster
Application No.: US16412946

Application Date: 2019-05-15
Publication No.: US11029871B2

Publication Date: 2021-06-08
Inventor: Jonathan Krasner, Sweetesh Singh, Steven Chalmer
Applicant: EMC IP Holding Company LLC
Applicant Address: US MA Hopkinton
Assignee: EMC IP Holding Company LLC
Current Assignee: EMC IP Holding Company LLC
Current Assignee Address: US MA Hopkinton
Agent: Krishnendu Gupta; Anne-Marie Dinius
Main IPC: G06F3/06
IPC: G06F3/06; G06N20/00; H04L9/06


Abstract:

Disclosed are techniques for data deduplication, including methods, systems, and computer products for reducing data redundancy in a data storage system. Before a data block is written, a cluster of nearest neighbors, created using a locality-sensitive hashing algorithm, is searched to determine whether the data block has already been stored in the data storage system. In alternate embodiments, the nearest neighbor clusters could be created using one or more of the following algorithms: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density-based spatial clustering of applications with noise (DBSCAN) algorithm.
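
The write path described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say which locality-sensitive hash is used, so the MinHash signature over byte shingles below is an assumption, as are the SHA-256 exact-match check inside each cluster and the hypothetical names `lsh_key`, `DedupStore`, `NUM_HASHES`, and `SHINGLE`.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 4  # MinHash signature length (illustrative assumption)
SHINGLE = 8     # bytes per shingle (illustrative assumption)


def lsh_key(block: bytes) -> tuple:
    """Return a MinHash signature; similar blocks tend to share the same key."""
    count = max(1, len(block) - SHINGLE + 1)
    shingles = {block[i:i + SHINGLE] for i in range(count)}
    signature = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(2, "big")  # a different salt per hash function
        signature.append(
            min(hashlib.blake2b(salt + s, digest_size=8).digest() for s in shingles)
        )
    return tuple(signature)


class DedupStore:
    """Hypothetical store that searches one LSH cluster before each write."""

    def __init__(self):
        # LSH key -> {SHA-256 digest: block id}; each dict is one cluster
        self.clusters = defaultdict(dict)
        self.blocks = []  # stand-in for physical storage

    def write(self, block: bytes) -> tuple:
        """Return (block_id, was_duplicate)."""
        cluster = self.clusters[lsh_key(block)]
        digest = hashlib.sha256(block).digest()
        if digest in cluster:
            # Block already stored: return a reference instead of writing again.
            return cluster[digest], True
        self.blocks.append(block)
        block_id = len(self.blocks) - 1
        cluster[digest] = block_id
        return block_id, False
```

In this sketch only the cluster selected by the block's own signature is searched, so the lookup cost depends on the cluster size rather than on the total number of stored blocks. Identical blocks always produce the same signature, so an exact duplicate is always found in its cluster.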


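IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F3/00	Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
G06F3/06	. Digital input from, or digital output to, record carriers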