Abstract:
A computer-implemented method for neutralizing file-format-specific exploits contained within electronic communications may include (1) identifying an electronic communication, (2) identifying at least one file contained within the electronic communication, and then (3) neutralizing any file-format-specific exploits contained within the file. In one example, neutralizing any file-format-specific exploits contained within the file may include applying at least one file-format-conversion operation to the file. Additionally or alternatively, neutralizing any file-format-specific exploits contained within the file may include constructing a sterile version of the file that selectively omits at least a portion of any exploitable content contained within the file. Various other methods, systems, and computer-readable media are also disclosed.
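As a rough illustration of the "sterile version" idea, the sketch below rebuilds a ZIP-based file while leaving out entries that might carry exploitable content. The function name `sterilize_zip` and the `BLOCKED_SUFFIXES` list are assumptions made for this example; the abstract does not say which content is omitted or how the sterile file is built.

```python
import io
import zipfile

# Hypothetical block list; the abstract does not name specific content types.
BLOCKED_SUFFIXES = (".bin", ".js", ".exe")


def sterilize_zip(raw: bytes) -> bytes:
    """Return a new ZIP archive that omits entries with blocked suffixes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(raw)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename.lower().endswith(BLOCKED_SUFFIXES):
                continue  # selectively omit potentially exploitable content
            dst.writestr(info.filename, src.read(info.filename))
    return out.getvalue()
```

Writing a fresh archive rather than editing the original in place means that nothing from the source file survives unless it is copied explicitly.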
Abstract:
A computer-implemented method for validating ownership of deduplicated data may include (1) identifying a request from a remote client to store a data object in a data store that already includes an instance of the data object, (2) in response to the request, verifying that the remote client possesses the data object by (i) issuing a randomized challenge to the remote client, the randomized challenge including a random value which, when combined with at least a portion of the data object, produces an authentication token demonstrating possession of the data object and, in response to the randomized challenge, (ii) receiving the authentication token from the remote client; and, in response to receiving the authentication token from the remote client, (3) storing the data object in the data store on behalf of the remote client. Various other methods and systems are also disclosed.
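The challenge-and-token exchange can be sketched as follows. Using HMAC-SHA256 keyed by the random value is an assumption; the abstract only says that the random value is combined with at least a portion of the data object. The function names are likewise illustrative.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: generate a fresh random value for one challenge."""
    return secrets.token_bytes(32)


def make_token(nonce: bytes, data: bytes) -> bytes:
    """Client side: combine the random value with the data object (assumed: HMAC-SHA256)."""
    return hmac.new(nonce, data, hashlib.sha256).digest()


def verify_token(nonce: bytes, stored_data: bytes, token: bytes) -> bool:
    """Server side: recompute the token from the stored instance and compare."""
    return hmac.compare_digest(make_token(nonce, stored_data), token)


stored = b"example data object already in the store"
nonce = make_challenge()
# A client holding the full object passes the check.
owner_ok = verify_token(nonce, stored, make_token(nonce, stored))
# A client that only knows the object's hash cannot produce a valid token.
hash_only = hashlib.sha256(stored).digest()
imposter_ok = verify_token(nonce, stored, make_token(nonce, hash_only))
```

Because the random value changes with every challenge, a token captured from an earlier exchange cannot be replayed.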
Abstract:
Various embodiments of a network protocol that utilizes a congestion control algorithm that distinguishes between congestion loss and damage loss are described. In response to a packet loss on a network, a delay-based detection algorithm may be performed based on RTT (Round-Trip Time) information to determine whether the network is congested. If the delay-based detection algorithm does not determine that the network is congested, then a consistency-based detection algorithm may be performed based on packet loss rate information. If either the delay-based detection algorithm or the consistency-based detection algorithm determines that the network is congested, then the rate of data transmission may be reduced, e.g., by reducing a congestion window size.
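The two-stage decision can be sketched as below. The abstract gives no thresholds and does not define the consistency test, so both detection rules here are assumptions: the delay rule flags congestion when the RTT is well above the smallest RTT seen, and the consistency rule flags congestion when the recent loss rate fluctuates rather than staying steady. All names and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LossContext:
    base_rtt: float          # smallest RTT observed so far, in seconds
    current_rtt: float       # RTT measured around the loss, in seconds
    recent_loss_rates: list  # loss rate per recent measurement interval


def delay_based_congested(ctx, rtt_factor=1.5):
    """Assumed rule: an inflated RTT suggests queues are building up."""
    return ctx.current_rtt > rtt_factor * ctx.base_rtt


def consistency_based_congested(ctx, tolerance=0.01):
    """Assumed rule: a fluctuating loss rate suggests congestion, a steady one damage."""
    rates = ctx.recent_loss_rates
    if len(rates) < 2:
        return False
    return max(rates) - min(rates) > tolerance


def on_packet_loss(ctx, cwnd):
    """Return the new congestion window size after a packet loss."""
    congested = delay_based_congested(ctx)
    if not congested:
        # The second test runs only when the delay test is inconclusive.
        congested = consistency_based_congested(ctx)
    return max(1, cwnd // 2) if congested else cwnd
```

A loss classified as damage leaves the window unchanged, so a lossy but uncongested link does not have its sending rate cut needlessly.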
Abstract:
A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independently of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on fewer than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
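A minimal sketch of the threshold-triggered partitioning follows. The class name, the use of entry count as the size-related characteristic, and the CRC32-based assignment of keys to sub-databases are all assumptions for illustration.

```python
import zlib


class ReferenceDatabase:
    def __init__(self, threshold=8, fanout=4):
        self.threshold = threshold  # assumed size characteristic: entry count
        self.fanout = fanout        # number of sub-databases after the split
        self.partitions = [{}]      # starts as a single database

    def _index(self, key):
        # CRC32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def size(self):
        return sum(len(p) for p in self.partitions)

    def maybe_partition(self):
        """Split into sub-databases once the size threshold is reached."""
        if len(self.partitions) > 1 or self.size() < self.threshold:
            return False
        old = self.partitions[0]
        self.partitions = [{} for _ in range(self.fanout)]
        for key, refs in old.items():
            self.partitions[self._index(key)][key] = refs
        return True

    def update(self, changes):
        """Apply {key: referrer} changes; return the sub-databases touched."""
        touched = set()
        for key, referrer in changes.items():
            i = self._index(key)
            self.partitions[i].setdefault(key, set()).add(referrer)
            touched.add(i)
        return touched
```

After the split, an update to a single reference object touches exactly one sub-database, which is the cost saving the abstract describes.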
Abstract:
A system and method for managing a resource reclamation reference list at a coarse level. A storage device is configured to store a plurality of storage objects in a plurality of storage containers, each of said storage containers being configured to store a plurality of said storage objects. A storage container reference list is maintained, wherein for each of the storage containers the storage container reference list identifies which files of a plurality of files reference a storage object within a given storage container. In response to detecting deletion of a given file that references an object within a particular storage container of the storage containers, a server is configured to update the storage container reference list by removing from the storage container reference list an identification of the given file. A reference list associating segment objects with files that reference those segment objects may not be updated in response to the deletion.
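The container-level bookkeeping can be sketched as a mapping from container to the set of files that reference anything inside it. The class and method names are hypothetical, and reporting containers that become unreferenced is an assumption about how the list would be used for reclamation.

```python
class ContainerReferenceList:
    def __init__(self):
        self.refs = {}  # container id -> set of file ids referencing it

    def add_file(self, file_id, container_ids):
        """Record that a file references objects in the given containers."""
        for cid in container_ids:
            self.refs.setdefault(cid, set()).add(file_id)

    def delete_file(self, file_id):
        """Remove the file from every container entry.

        Returns the containers left with no referencing files (assumed to be
        candidates for reclamation). No per-segment list is touched here.
        """
        emptied = []
        for cid, files in self.refs.items():
            if file_id in files:
                files.remove(file_id)
                if not files:
                    emptied.append(cid)
        return sorted(emptied)
```

The cost of a deletion depends on the number of containers rather than the number of segments, which is what makes the list coarse.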
Abstract:
A de-duplication storage system which uses multiple indices is described. A first group of one or more indices may be stored in random access memory (RAM) or another type of fast storage. A second group of one or more indices may be stored on one or more disk drives or another type of storage where large amounts of data can be stored inexpensively. The first group of indices may be used when adding new files to the de-duplication storage system in order to determine whether the file segments of the new files are already stored. The second group of indices may be used when restoring files in order to look up the segments of the files.
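The division of labor between the two index groups can be sketched as follows. Both indices are plain dictionaries here; the second merely stands in for a disk-resident index. Fixed-size segmentation, SHA-256 fingerprints, and all names are assumptions for illustration.

```python
import hashlib


class DedupStore:
    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.fingerprint_index = {}  # first group (fast): fingerprint -> segment id
        self.file_index = {}         # second group (disk stand-in): file -> segment ids
        self.segments = []           # segment storage

    def add_file(self, name, data):
        """Consult only the fingerprint index to decide what must be stored."""
        ids = []
        for off in range(0, len(data), self.segment_size):
            seg = data[off:off + self.segment_size]
            fp = hashlib.sha256(seg).hexdigest()
            if fp not in self.fingerprint_index:
                self.fingerprint_index[fp] = len(self.segments)
                self.segments.append(seg)
            ids.append(self.fingerprint_index[fp])
        self.file_index[name] = ids

    def restore_file(self, name):
        """Consult only the file index to find a file's segments."""
        return b"".join(self.segments[i] for i in self.file_index[name])
```

The write path needs a fast membership test for every segment, while the restore path needs only one lookup per file, so the slower index is acceptable there.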
Abstract:
A computer-implemented method for performing lookups on distributed deduplicated data systems may include (1) identifying a collection of deduplicated data stored within a plurality of nodes, (2) identifying a request to locate a deduplicated object of the collection within the plurality of nodes, (3) identifying a fingerprint of the deduplicated object, the fingerprint being generated using an algorithm that maps deduplicated objects onto a fingerprint space, (4) directing the request, based on a partitioning scheme that divides the fingerprint space among the plurality of nodes, to a first node within the plurality of nodes that is responsible for forwarding requests pertaining to a partition of the fingerprint space that includes the fingerprint, and (5) forwarding the request from the first node to a second node identified by the first node as corresponding to the fingerprint. Various other methods, systems, and computer-readable media are also disclosed.
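The two-hop lookup can be sketched with in-memory nodes. The partitioning rule (first fingerprint byte modulo the node count), the use of SHA-256, and all names are assumptions; the abstract requires only that the fingerprint space be divided among the nodes.

```python
import hashlib


class Node:
    def __init__(self, name):
        self.name = name
        self.locations = {}  # for this node's partition: fingerprint -> holder name
        self.objects = {}    # objects physically stored on this node


def fingerprint(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def forwarding_node(fp: bytes, nodes):
    """Assumed partitioning scheme: first fingerprint byte modulo node count."""
    return nodes[fp[0] % len(nodes)]


def store(data: bytes, holder: Node, nodes) -> bytes:
    """Store the object on `holder` and register it with the forwarding node."""
    fp = fingerprint(data)
    holder.objects[fp] = data
    forwarding_node(fp, nodes).locations[fp] = holder.name
    return fp


def lookup(fp: bytes, nodes):
    """Hop 1: the node owning the partition. Hop 2: the node it points to."""
    first = forwarding_node(fp, nodes)
    holder_name = first.locations.get(fp)
    if holder_name is None:
        return None
    second = next(n for n in nodes if n.name == holder_name)
    return second.objects[fp]
```

Any client can compute the first hop from the fingerprint alone, so no central directory is needed to locate an object.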
Abstract:
A system and method for backing up files to a single-instance storage system are disclosed. The files may be split into segments, and the file data may be stored in the single-instance storage system as individual segments. The single-instance storage system uses the concept of a file region which covers multiple segments of the file. If a region of a file is unchanged from one backup to the next, the system may use a region object to refer to the unchanged region. This avoids the need to update the reference information for each of the segments within the region, thus increasing the efficiency of backing up the new version of the file.
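The saving from region objects can be sketched by counting per-segment reference updates across two backups. Detecting an unchanged region by hashing its contents is an assumption, as are the fixed segment and region sizes and all names used here.

```python
import hashlib

SEGMENT_SIZE = 4         # bytes per segment (illustrative)
SEGMENTS_PER_REGION = 2  # segments covered by one region (illustrative)


def split(data: bytes, size: int):
    return [data[i:i + size] for i in range(0, len(data), size)]


class RegionStore:
    def __init__(self):
        self.regions = {}             # region fingerprint -> segment fingerprints
        self.segment_ref_updates = 0  # per-segment reference updates performed

    def backup(self, data: bytes):
        """Return the list of region fingerprints describing this file version."""
        manifest = []
        for region in split(data, SEGMENT_SIZE * SEGMENTS_PER_REGION):
            rfp = hashlib.sha256(region).hexdigest()
            if rfp not in self.regions:
                # New or changed region: process each of its segments.
                seg_fps = [hashlib.sha256(s).hexdigest()
                           for s in split(region, SEGMENT_SIZE)]
                self.segment_ref_updates += len(seg_fps)
                self.regions[rfp] = seg_fps
            # An unchanged region is referenced as a whole, with no segment work.
            manifest.append(rfp)
        return manifest
```

Backing up a second version in which only the last segment differs triggers segment-level updates for one region only; the other region is reused by reference.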
Abstract:
A computer-implemented method for removing unreferenced data segments from deduplicated data systems may include: 1) identifying a deduplicated data system that contains a plurality of data segments, 2) identifying a plurality of containers within the deduplicated data system, with each container containing a subset of the data segments within the deduplicated data system, 3) identifying at least one container within the plurality of containers that is likely to include a large proportion of data segments that are not referenced by data objects within the deduplicated data system, and then, for each identified container, 4) searching for unreferenced data segments within the identified container and 5) removing the unreferenced data segments from the identified container. Various other methods, systems, and computer-readable media are also disclosed.
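The container-selection step can be sketched as below. The abstract does not say how likely candidates are identified, so the per-container `dereference_count` hint and the `min_ratio` cutoff are assumptions, as are the other names.

```python
class Container:
    def __init__(self, segments, dereference_count=0):
        self.segments = set(segments)
        # Assumed hint: how many times a segment in this container lost a reference.
        self.dereference_count = dereference_count


def select_candidates(containers, min_ratio=0.5):
    """Pick containers likely to hold a large share of unreferenced segments."""
    return [c for c in containers
            if c.segments and c.dereference_count / len(c.segments) >= min_ratio]


def collect(containers, referenced, min_ratio=0.5):
    """Search only the candidate containers and remove their unreferenced segments."""
    removed = set()
    for c in select_candidates(containers, min_ratio):
        dead = c.segments - referenced
        c.segments -= dead
        removed |= dead
    return removed
```

Containers below the cutoff are skipped entirely, so a few unreferenced segments may stay in place until a later pass; the sketch trades completeness for a cheaper scan.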
Abstract:
A computer-implemented method for garbage collection in deduplicated data systems may include: 1) identifying a deduplicated data system, 2) identifying at least one segment object added to the deduplicated data system during a garbage-collection process of the deduplicated data system, 3) locking the segment object to prevent removal of the segment object by the garbage-collection process, and 4) unlocking the segment object after the garbage-collection process. The method may allow a small possibility of incorrectly removing useful segment objects. The method may also verify data objects during the garbage-collection process and recover incorrectly removed segment objects. Various other methods, systems, and computer-readable media are also disclosed.
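The lock-during-collection rule can be sketched as follows. The class and method names are hypothetical, and the verification and recovery steps mentioned in the abstract are not modeled.

```python
class DedupSystem:
    def __init__(self):
        self.segments = set()
        self.locked = set()
        self.gc_running = False

    def begin_gc(self):
        self.gc_running = True

    def add_segment(self, seg):
        """Segments added while collection is running are locked against removal."""
        self.segments.add(seg)
        if self.gc_running:
            self.locked.add(seg)

    def sweep(self, referenced):
        """Remove segments that are neither referenced nor locked."""
        removable = {s for s in self.segments
                     if s not in referenced and s not in self.locked}
        self.segments -= removable
        return removable

    def end_gc(self):
        """Unlock every segment once the collection pass has finished."""
        self.gc_running = False
        self.locked.clear()
```

The lock covers the window in which a newly added segment exists but the collector's view of references, taken earlier, does not yet include it.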