Patent Issued for Data-defined architecture for network data management (USPTO 11368476): Helios Data Inc.
2022 JUL 12 (NewsRx) -- By a
The patent’s assignee for patent number 11368476 is Helios Data Inc.
News editors obtained the following quote from the background information supplied by the inventors:
“Technical Field
“This disclosure relates generally to a data management system.
“Description of the Related Art
“The amount of data used by and accessible to computer systems is extremely large, and growing quickly. One estimate is that in 2016, there were approximately 11,000 exabytes of such information, which was expected to climb to around 52,000 exabytes in 2020. Rizzatti,
“That data often includes certain classes of information that is either legally required to be treated in a particular manner (as in the case of government regulation) or is desired to be treated in some fashion (as in the case with an enterprise data management policy). But since data is often scattered and unstructured, protecting that data or even ensuring that it is properly handled is impractical as the necessary understanding of what data is stored, how it is stored, where it is stored, and/or how it is used is simply not there or severely limited.
“In some cases, companies are legally mandated to handle certain data in a particular way. For example, in the health care context, health care enterprises commonly store records for their patients that identify personal health information (PHI) such as demographic information, medical history information, insurance information, etc. These records are usually in an unstructured format (e.g., photos, videos, e-mail messages, WORD documents, portable document format (PDF), etc.), making it easy for an employee to store those records with other types of information. Since PHI may be mixed with other types of information, it is difficult to identify all locations where PHI is stored in an on-premise file system or in cloud storage. Health care enterprises often need to exchange records that do not include PHI with other enterprises, while also complying with Health Insurance Portability and Accountability Act (HIPAA) provisions that set out requirements for protecting PHI. Since a health care enterprise does not know where all its PHI is stored, that health care enterprise may unknowingly provide another enterprise with access to a database that stores records with PHI that should not be accessed by the other enterprise. Thus, data can be a liability for companies that do not have the mechanisms in place to properly manage it.
“Even aside from legal mandates, data security management is of paramount importance in ensuring proper internal usage of data. Data security management is normally performed by controlling access on the boundaries of a computer network. But once the network’s perimeter defenses (e.g., firewalls) are breached, there is normally little (if any) interior defense to prevent malware (e.g., a virus) from roaming and attacking the network by damaging or stealing sensitive data. In some cases, an “interior defense” strategy may involve an agent-based defense that requires every susceptible device in the network to run a localized security process. But each device in this approach represents a point of weakness within the network because if a single local process is out of date, disabled by a user, or has already been compromised, then this could lead to a significant data breach.
“This disclosure includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
“Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation, “[entity] configured to [perform one or more tasks],” is used herein to refer to structure (i.e., something physical, such as an electronic circuit). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “network interface configured to communicate over a network” is intended to cover, for example, an integrated circuit that has circuitry that performs this function during operation, even if the integrated circuit in question is not currently being used (e.g., a power supply is not connected to it). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible. Thus, the “configured to” construct is not used herein to refer to a software entity such as an application programming interface (API).
“The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform some specific function, although it may be “configurable to” perform that function and may be “configured to” perform the function after programming.
“Reciting in the appended claims that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. 112(f) for that claim element. Accordingly, none of the claims in this application as filed are intended to be interpreted as having means-plus-function elements. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for” [performing a function] construct.
“As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless specifically stated. For example, in a data structure that has multiple classes, the terms “first” class and “second” class can be used to refer to any class of the data structure. In other words, the first and second classes are not limited to the initial two classes of a data structure.
“As used herein, the term “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect a determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is thus synonymous with the phrase “based at least in part on.””
As a supplement to the background information on this patent, NewsRx correspondents also obtained the inventors’ summary information for this patent: “Managing data from the vantage point of the network perimeter is increasingly challenging, particularly with the current and expected further proliferation in governmental data usage regulations worldwide. To address such problems, the present disclosure sets forth a “data-defined” approach to data management. In this approach, data management problems can largely be seen as anomalous behavior of data, which can be addressed by classifying data in a network, defining “normal behavior” (or “anomalous behavior,” which refers to any improper use of data relative to some standard or data policy, whether or not that use is malicious), and then instituting an enforcement mechanism that ensures that anomalous data usage is controlled.
“The current content and nature of data within a given computer network is typically poorly understood. Conventional infrastructure-driven approaches to network organization and data management are not concerned with what different types of data are present within a network and how that data normally behaves (whether that data is in use, at rest, or in transit), which puts such data management paradigms at a severe disadvantage when dealing with novel threats.
“Data management broadly refers to the concept of ensuring that data is used in accordance with a policy objective. (Such “use” of the data includes the manner in which the data is stored, accessed, or moved.) The concept of data management thus includes data security (e.g., protecting data from malware attacks), data compliance (e.g., ensuring control of personal data is managed in accordance with a policy that may be established by a governmental organization), as well as permissioning that enforces entity-specific policies (e.g., certain groups in a company can access certain projects). The present disclosure describes a “data-defined” approach to data management, resulting in what is described as a “data-defined network” (DDN), that is, a network (or portion of a network) that implements this data-defined approach.
“Broadly speaking, a DDN stores one or more DDN data structures through which data in a network is organized and managed on the basis of observed attributes of the data and data usage policies, rather than infrastructure-driven factors, such as the particular physical devices or locations where that data is stored. In this manner, a set of DDN data structures may form the building block of a DDN and incorporate multiple dimensions of relevant data attributes to facilitate capturing the commonality of data in a network. In some embodiments, a given one of the set of DDN data structures in a particular network may correspond to a set of data objects (e.g., files) that have similar content (e.g., as defined by reference to some similarity metric) and may indicate a baseline behavior for that set of objects. As used herein, the term “observed behavior” refers to how data objects are observed to be used within a network; observed behavior may be determined through a learning or training phase as described in this disclosure. For example, if a document is exchanged between two computer systems, then exchanging that document between those two systems is said to be an example of observed behavior for that document.
“When describing the behavior of data, the term “behavior” refers to actions performed on data, characteristics of those actions, and characteristics of those entities involved in the actions. Actions performed on the data may include without limitation reading, writing, deleting, transmitting, etc. Characteristics of those actions refer to properties of the actions being performed beyond the types of actions being performed on the data. Such characteristics may include without limitation the protocols used in those actions, the time when the action was initiated, the specific data involved in the action, parameters passed as part of the action, etc. Finally, data behavior also includes the identity and/or characteristics of the entities involved in the actions. Thus, if observed data behavior includes the transmission of data from user A to user B from a software entity C, data behavior can include information about user A, user B, and software entity C. Characteristics of the entities involved in the actions may include without limitation type of application transmitting the data, the type of system (e.g., client, server, etc.) running the application, etc. Accordingly, data behavior is intended to broadly encompass any information that can be extracted by a computer system when an operation is performed on a data object.”
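As an illustration only, the summary's idea of a DDN data structure (a content class paired with baseline behavior) and its notion of "behavior" (an action, characteristics of that action, and the entities involved) might be sketched in Python as follows. The names BehaviorRecord, DDNStructure, observe, and is_anomalous are hypothetical and do not come from the patent. The patent describes behavioral classes learned by machine learning; the plain set-membership test here is a simplified stand-in for that step.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class BehaviorRecord:
    """One observed operation on a data object (field names are illustrative)."""
    action: str        # e.g. "read", "write", "delete", "transmit"
    protocol: str      # a characteristic of the action, e.g. "https"
    source: str        # entity initiating the action
    destination: str   # entity receiving the data
    application: str   # software entity performing the action


@dataclass
class DDNStructure:
    """Groups data objects by content class, not by where they are stored."""
    content_class: str
    baseline: Set[BehaviorRecord] = field(default_factory=set)

    def observe(self, record: BehaviorRecord) -> None:
        """Learning phase: every behavior seen becomes part of the baseline."""
        self.baseline.add(record)

    def is_anomalous(self, record: BehaviorRecord) -> bool:
        """Enforcement phase: behavior outside the baseline is anomalous.

        Assumption: exact set membership replaces the learned behavioral
        classification described in the patent.
        """
        return record not in self.baseline


# Hypothetical example: user A sends a patient record to user B via app C.
phi = DDNStructure(content_class="patient-records")
phi.observe(BehaviorRecord("transmit", "https", "user_a", "user_b", "ehr_client"))

seen = BehaviorRecord("transmit", "https", "user_a", "user_b", "ehr_client")
novel = BehaviorRecord("transmit", "ftp", "user_a", "external_host", "unknown_tool")

print(phi.is_anomalous(seen))   # False
print(phi.is_anomalous(novel))  # True
```

Keying the structure on a content class rather than on a host or file path mirrors the summary's point that data is organized by observed attributes instead of infrastructure.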
The claims supplied by the inventors are:
“1. A computer-implemented method of controlling data within a computing network, the method comprising: evaluating network traffic to extract and group data objects based on their content and a set of similarity criteria, and to identify baseline data behavior with respect to accesses of the data objects; generating a set of data-defined network (DDN) data structures that logically group the data objects independent of physical infrastructure via which those data objects are stored, communicated, or utilized, wherein a given one of the set of DDN data structures includes a content class and one or more behavioral classes; wherein the content class is indicative of one or more of the data objects that have been grouped based on the one or more data objects sharing similar content according to a machine learning classification model; and wherein the one or more behavioral classes indicate baseline behavior of the one or more data objects as determined from evaluation of the network traffic; detecting anomalous data behavior within the network traffic, wherein the detecting includes: extracting a particular data object from the network traffic; classifying, based on the machine learning classification model, the particular data object into a particular content class; and in response to the particular content class corresponding to a particular one of the set of DDN data structures, determining whether baseline data behavior of the particular data object classifies into one of one or more behavioral classes of the particular DDN data structure, wherein detection of the anomalous data behavior is based on the determining; and in response to detecting the anomalous data behavior, preventing network traffic corresponding to the anomalous data behavior from being communicated via the computing network.
“2. The computer-implemented method of claim 1, further comprising: receiving one or more user-provided data samples; generating respective root hash values corresponding to the one or more user-provided data samples; and storing the root hash values in a database.
“3. The computer-implemented method of claim 2, wherein the evaluating includes: determining that a given one of the data objects satisfies the set of similarity criteria, including by: generating a data object hash value of the given data object; and determining that the data object hash value matches a given one of the root hash values stored in the database.
“4. The computer-implemented method of claim 3, further comprising: subsequent to determining that the given data object satisfies the set of similarity criteria, storing a record of behavioral features associated with the given data object.
“5. The computer-implemented method of claim 1, wherein the one or more behavioral classes of the particular DDN data structure are based upon a machine learning behavioral classification, and wherein the determining of whether the baseline data behavior of the particular data object classifies into one of the one or more behavioral classes of the particular DDN data structure includes performing the machine learning behavioral classification upon a record of behavioral features associated with the particular data object.
“6. The computer-implemented method of claim 5, wherein the machine learning behavioral classification is based upon a set of convolutional neural networks (CNN) and recurrent neural networks (RNN).
“7. A non-transitory computer-readable medium having program instructions stored thereon that are executable by a computer system to perform operations comprising: evaluating network traffic within a computing network to group data objects based on their content and a set of similarity criteria, and to identify baseline network behavior with respect to accesses of the data objects; generating a data structure that includes a content class corresponding to a machine learning content classification model and one or more behavioral classes corresponding to a machine learning behavioral classification model; wherein the content class is indicative of one or more of the data objects that have been grouped based on the one or more data objects having a set of similar content; wherein the one or more behavioral classes indicate baseline network behavior of the one or more data objects as determined from evaluation of the network traffic; and detecting anomalous data behavior within the network traffic utilizing the data structure, wherein the detecting includes: extracting a particular data object from the network traffic; classifying, based on the machine learning content classification model, content of the particular data object; and in response to determining that the particular data object classifies into the content class of the data structure, making a determination on whether baseline network behavior of the particular data object classifies into the one or more behavioral classes of the data structure, wherein detection of the anomalous data behavior is based on the determination; and in response to detecting the anomalous data behavior, preventing the network traffic corresponding to the anomalous data behavior from being communicated via the computing network.
“8. The computer-readable medium of claim 7, wherein the making of the determination includes: determining, based upon the machine learning behavioral classification model, that the particular data object does not classify into the one or more behavior classes and thus does not exhibit expected behavior; and indicating that the particular data object exhibits anomalous behavior based upon the particular data object failing to exhibit the expected behavior.
“9. The computer-readable medium of claim 7, wherein the operations further comprise: obtaining one or more user-defined rules regarding content or behavior of data objects; and storing the one or more user-defined rules in association with the data structure.
“10. The computer-readable medium of claim 9, wherein the detecting further includes: in response to determining that the particular data object exhibits expected behavior according to the machine learning behavioral classification model, determining that the particular data object fails to satisfy the one or more user-defined rules included in the data structure; and indicating that the particular data object exhibits anomalous behavior based upon the particular data object failing to satisfy the one or more user-defined rules.
“11. The computer-readable medium of claim 7, wherein the operations further comprise: retrieving a plurality of data samples from one or more storage devices; generating a respective plurality of root hash values using the plurality of data samples; and storing the plurality of root hash values within a database.
“12. The computer-readable medium of claim 11, wherein determining that content of a given one of the data objects satisfies the set of similarity criteria comprises: generating a data object hash value of the given data object; and determining that the data object hash value matches a given one of the root hash values stored in the database.
“13. A network device, comprising: a plurality of network ports configured to communicate packetized network traffic; one or more processors configured to route the packetized network traffic among the plurality of network ports; and a memory that stores program instructions executable by the one or more processors to perform operations comprising: evaluating the packetized network traffic to identify data objects that satisfy a set of similarity criteria with respect to one or more user-provided data samples; in response to identifying a set of data objects that satisfy the set of similarity criteria, storing content and behavioral features associated with the set of data objects in a database; generating a plurality of data-defined network (DDN) data structures based on the stored content and behavioral features associated with the set of data objects, wherein a given one of the plurality of DDN data structures includes a content class and one or more behavioral classes; wherein the content class is indicative of one or more of the set of data objects that have been grouped based on the one or more data objects sharing similar content according to a machine learning classification model; wherein the one or more behavioral classes indicate baseline network behavior of the one or more data objects as determined from evaluation of the packetized network traffic; and detecting anomalous data behavior within the packetized network traffic, wherein the detecting includes: extracting a particular data object from the packetized network traffic; classifying, based on the machine learning classification model, the particular data object into a particular content class; and in response to the particular content class corresponding to a particular one of the set of DDN data structures, determining whether baseline data behavior of the particular data object classifies into one of one or more behavioral classes of the particular DDN data structure, wherein detection of the anomalous data behavior is based on the determining; and preventing the packetized network traffic corresponding to the anomalous data behavior from being transmitted to a device coupled to the packetized network device.
“14. The network device of claim 13, wherein identifying that a given one of the set of data objects satisfies the set of similarity criteria comprises: generating a data object hash value of the given data object; and determining that the data object hash value matches a given root hash value stored in a database, wherein the database stores one or more root hash values respectively generated from the one or more user-provided data samples.
“15. The network device of claim 14, wherein the determining is based upon a machine learning behavioral classification model.”
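Two mechanisms recited in the claims above lend themselves to a short sketch: matching a data object's hash against stored root hash values (claims 2-3, 11-12, and 14), and the two-stage anomaly decision in which a behavioral classification is followed by user-defined rules (claims 8-10). The Python below is a minimal illustration under stated assumptions, not the patented implementation. RootHashDatabase, digest, is_anomalous, and the example classifier and rule are all hypothetical names. SHA-256 is used purely as a stand-in: it matches only byte-identical content, and the claims do not specify which hash function is used.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Set

Behavior = Dict[str, object]
Rule = Callable[[Behavior], bool]


def digest(data: bytes) -> str:
    """Stand-in hash (assumption): SHA-256 matches only identical bytes."""
    return hashlib.sha256(data).hexdigest()


class RootHashDatabase:
    """Stores root hash values generated from user-provided data samples."""

    def __init__(self) -> None:
        self._roots: Set[str] = set()

    def add_sample(self, sample: bytes) -> str:
        root = digest(sample)
        self._roots.add(root)
        return root

    def matches(self, data_object: bytes) -> bool:
        """True if the object's hash equals a stored root hash value."""
        return digest(data_object) in self._roots


def is_anomalous(
    behavior: Behavior,
    classify_behavior: Callable[[Behavior], Optional[str]],
    user_rules: List[Rule],
) -> bool:
    """Two-stage decision: behavioral class first, then user-defined rules."""
    if classify_behavior(behavior) is None:
        return True  # fits no behavioral class, so not expected behavior
    # Expected behavior can still be anomalous if any user rule fails.
    return not all(rule(behavior) for rule in user_rules)


# Hypothetical usage with a toy classifier and one toy rule.
db = RootHashDatabase()
db.add_sample(b"patient: Jane Doe, dob 1980-01-01")
print(db.matches(b"patient: Jane Doe, dob 1980-01-01"))  # True
print(db.matches(b"cafeteria lunch menu"))               # False


def toy_classifier(b: Behavior) -> Optional[str]:
    # Assumption: a trained model would replace this hard-coded check.
    dest = str(b["destination"])
    return "internal-transfer" if dest.endswith(".corp.example") else None


def business_hours_only(b: Behavior) -> bool:
    return 8 <= int(b["hour"]) < 18


rules = [business_hours_only]
ok = {"destination": "files.corp.example", "hour": 10}
late = {"destination": "files.corp.example", "hour": 23}
external = {"destination": "198.51.100.7", "hour": 10}

print(is_anomalous(ok, toy_classifier, rules))        # False
print(is_anomalous(late, toy_classifier, rules))      # True
print(is_anomalous(external, toy_classifier, rules))  # True
```

The "late" case shows the layering described in claim 10: the behavior classifies as expected, yet it is still flagged because it fails a user-defined rule.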
There are additional claims. Please visit the full patent to read further.
For additional information on this patent, see: Zou, Fei. Data-defined architecture for network data management.