Patent Issued for Data segmentation (USPTO 11381587): Helios Data Inc.

2022 JUL 22 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- According to news reporting originating from Alexandria, Virginia, by NewsRx journalists, a patent by the inventors Sun, Yi (Mountain View, CA, US), Zhang, Huiyu (Menlo Park, CA, US), Zou, Fei (Menlo Park, CA, US), filed on January 20, 2020, was published online on July 5, 2022.

The assignee for this patent, patent number 11381587, is Helios Data Inc. (Mountain View, California, United States).

Reporters obtained the following quote from the background information supplied by the inventors:

“Technical Field

“This disclosure relates generally to a data management system.

“Description of the Related Art

“The amount of data used by and accessible to computer systems is extremely large, and growing quickly. One estimate is that in 2016, there were approximately 11,000 exabytes of such information, which was expected to climb to around 52,000 exabytes in 2020. Rizzatti, Dr. Lauro. “Digital Data Storage Is Undergoing Mind-Boggling Growth.” EETimes, 14 Sep. 2016. This article states that unstructured data (e.g., documents, photos, videos, etc.) accounts for most of the available data. In addition to typically being unstructured, data is often scattered around layers of a network (e.g., a cloud network, a data center network, a corporate network, etc.) with poor structuring and visibility. Because data is often scattered and unstructured, ensuring proper handling of the data is quite difficult.

“That data often includes certain classes of information that is either legally required to be treated in a particular manner (as in the case of government regulation) or is desired to be treated in some fashion (as in the case with an enterprise data management policy). But since data is often scattered and unstructured, protecting that data or even ensuring that it is properly handled is impractical as the necessary understanding of what data is stored, how it is stored, where it is stored, and/or how it is used is simply not there or severely limited.

“In some cases, such as the health care context, certain data handling is legally mandated. Health care enterprises commonly store records for their patients that identify personal health information (PHI) such as demographic information, medical histories, insurance information, etc. These health care enterprises often need to exchange records with other enterprises, while also complying with Health Insurance Portability and Accountability Act (HIPAA) provisions that set out requirements for protecting that health information. These records, however, are usually in an unstructured format (e.g., photos, videos, e-mail messages, WORD documents, portable document format (PDF), etc.), making it easy for an employee to store those records without identifying what content they store. Thus, all possible locations where PHI might be stored in an on-premise file system or in cloud storage are not known. As an example, a PDF document that is incorrectly named might include PHI, but an enterprise may be unaware that the document does, in fact, store PHI. In such a scenario, a health care enterprise may unknowingly provide another enterprise with access to a database that includes records (e.g., PDFs) with PHI that should not be accessed by that other enterprise simply because the health care enterprise lacks an understanding of its data.

“Even aside from legal mandates, data security is also of paramount importance. Data security management is normally performed by controlling access on the boundaries of a network. But once the network’s perimeter defenses such as firewalls are breached, there may be little (if any) interior defense to prevent malware (e.g., a virus) from roaming and attacking the network by damaging or stealing sensitive data. In some cases, an “interior defense” strategy may involve an agent-based defense that requires every susceptible device in the network to run a local security process. But this approach presents multiple points of weakness within the network. Thus, if a single local process is out of date, disabled by a user, or has already been compromised, this could lead to a significant data breach.

“This disclosure includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.

“Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation, [entity] configured to [perform one or more tasks], is used herein to refer to structure (i.e., something physical, such as an electronic circuit). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “network interface configured to communicate over a network” is intended to cover, for example, an integrated circuit that has circuitry that performs this function during operation, even if the integrated circuit in question is not currently being used (e.g., a power supply is not connected to it). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible. Thus, the “configured to” construct is not used herein to refer to a software entity such as an application programming interface (API).

“The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform some specific function, although it may be “configurable to” perform that function and may be “configured to” perform the function after programming.

“Reciting in the appended claims that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. 112(f) for that claim element. Accordingly, none of the claims in this application as filed are intended to be interpreted as having means-plus-function elements. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for” [performing a function] construct.

“As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless specifically stated. For example, in a data structure that has multiple classes, the terms “first” class and “second” class can be used to refer to any class of the data structure. In other words, the first and second classes are not limited to the initial two classes of a data structure.

“As used herein, the term “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect a determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is thus synonymous with the phrase “based at least in part on.””

In addition to obtaining background information on this patent, NewsRx editors also obtained the inventors’ summary information for this patent: “Managing data from the vantage point of the network perimeter is increasingly challenging, particularly with the current and expected further proliferation in governmental data usage regulations worldwide. To address such problems, the present disclosure sets forth a “data-defined” approach to data management. In this approach, data management problems can largely be seen as anomalous behavior of data, which can be addressed by classifying data in a network, defining “normal behavior” (or “anomalous behavior,” which refers to any improper use of data relative to some standard or data policy, whether or not that use is malicious), and then instituting an enforcement mechanism that ensures that anomalous data usage is controlled.

“The current content and nature of data within a given computer network is typically poorly understood. Conventional infrastructure-driven approaches to network organization and data management are not concerned with what different types of data are present within a network and how that data normally behaves (whether that data is in use, at rest, or in transit), which puts such data management paradigms at a severe disadvantage when dealing with novel threats.

“Data management broadly refers to the concept of ensuring that data is used in accordance with a policy objective. (Such “use” of the data includes the manner in which the data is stored, accessed, or moved.) The concept of data management thus includes data security (e.g., protecting data from malware attacks), data compliance (e.g., ensuring control of personal data is managed in accordance with a policy that may be established by a governmental organization), as well as permissioning that enforces entity-specific policies (e.g., certain groups in a company can access certain projects). The present disclosure describes a “data-defined” approach to data management, resulting in what is described as a “data-defined network” (DDN), that is, a network (or portion of a network) that implements this data-defined approach.

“Broadly speaking, a DDN stores one or more data structures in which data in a network is organized and managed on the basis of observed attributes of the data, rather than infrastructure-driven factors, such as the particular physical devices or locations where that data is stored. In this manner, a group of DDN data structures may form the building block of a DDN and incorporate multiple dimensions of relevant data attributes to facilitate capturing the commonality of data in a network. In some embodiments, a given one of the group of DDN data structures in a particular network may correspond to a set of data objects that have similar content (e.g., as defined by reference to some similarity metric) and indicate baseline behavior for that set of objects. As used herein, the term “observed behavior” refers to how data objects are observed to be used within a network; observed behavior may be determined through a learning or training phase as described in this disclosure. For example, if a document is exchanged between two computer systems, then exchanging that document between those two systems is said to be an example of observed behavior for that document.

“When describing the behavior of data, the term “behavior” refers to actions performed on data, characteristics of those actions, and characteristics of those entities involved in the actions. Actions performed on the data may include without limitation reading, writing, deleting, transmitting, etc. Characteristics of those actions refers to properties of the actions being performed beyond the types of actions being performed on the data. Such characteristics may include without limitation the protocols used in those actions, the time when the action was initiated, the specific data involved in the action, parameters passed as part of the action, etc. Finally, data behavior also includes the identity and/or characteristics of the entities involved in the actions. Thus, if observed data behavior includes the transmission of data from user A to user B from a software entity C, data behavior can include information about user A, user B, and software entity C. Characteristics of the entities involved in the actions may include without limitation type of application transmitting the data, the type of system (e.g., client, server, etc.) running the application, etc. Accordingly, data behavior is intended to broadly encompass any information that can be extracted by a computer system when an operation is performed on a data object.”
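
The summary above describes a DDN data structure as a set of data objects with similar content plus the baseline behavior observed for that set. One way to picture it is the minimal Python sketch below. It is an illustration only, not the patented implementation: the names `BehaviorEvent`, `DDNStructure`, and `classify`, the token-overlap (Jaccard) similarity metric, and the 0.5 threshold are all assumptions introduced here, since the summary refers only to "some similarity metric."

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BehaviorEvent:
    """One observed operation on a data object (all field names are hypothetical)."""
    action: str        # the action itself, e.g. "read", "write", "transmit"
    protocol: str      # a characteristic of the action
    source: str        # entity that initiated the action
    destination: str   # entity that received the data
    application: str   # software entity involved in the action


@dataclass
class DDNStructure:
    """Hypothetical DDN data structure: similar objects plus their baseline behavior."""
    content_class: str
    exemplar_tokens: frozenset
    object_ids: set = field(default_factory=set)
    baseline: set = field(default_factory=set)

    def similarity(self, text: str) -> float:
        # Assumed metric: Jaccard overlap between lowercase whitespace tokens.
        tokens = frozenset(text.lower().split())
        union = tokens | self.exemplar_tokens
        return len(tokens & self.exemplar_tokens) / len(union) if union else 0.0

    def observe(self, object_id: str, event: BehaviorEvent) -> None:
        # Learning phase: record the object and how it was seen to be used.
        self.object_ids.add(object_id)
        self.baseline.add(event)


def classify(text: str, structures: list, threshold: float = 0.5) -> Optional[DDNStructure]:
    """Return the best-matching structure, or None if nothing is similar enough."""
    best = max(structures, key=lambda s: s.similarity(text), default=None)
    if best is not None and best.similarity(text) >= threshold:
        return best
    return None
```

In this sketch, grouping depends only on content and observed use, never on the device or location where an object is stored, which mirrors the data-defined (rather than infrastructure-driven) organization the summary describes.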

The claims supplied by the inventors are:

“1. A method of controlling data within a computing network, the method comprising: generating, by a computer system, a set of data-defined network (DDN) data structures that define a set of content classes that can be used to logically group data objects; using the set of DDN data structures, the computer system identifying a plurality of data objects to be grouped into a data segmentation, wherein the plurality of data objects are stored in a subset of a plurality of data stores that are in different portions of the computing network and are coupled to data managers that monitor network traffic directed at the plurality of data stores for compliance with protection policies, and wherein the data segmentation is independent of physical locations of the plurality of data objects; determining, by the computer system, a baseline behavior associated with accesses of the plurality of data objects of the data segmentation; generating, by the computer system, a set of protection policies for the data segmentation based on the baseline behavior, wherein the set of protection policies defines permissible types of access to data objects within the data segmentation; sending, by the computer system, identifying information for the data segmentation and the set of protection policies to those data managers that monitor network traffic directed at the subset of data stores; determining, by the computer system, that an attempt to access one or more data objects located within the data segmentation is inconsistent with the set of protection policies; and based at least in part on the determining, the computer system preventing the attempt to access the one or more data objects.

“2. The method of claim 1, wherein generating the set of DDN data structures is performed based on one or more machine learning models.

“3. The method of claim 2, wherein the one or more machine learning models are capable of performing content classification of data objects to classify ones of those data objects into the set of content classes.

“4. The method of claim 2, wherein the one or more machine learning models are capable of performing behavior classification of data objects.

“5. The method of claim 2, wherein sending the identifying information for the data segmentation includes sending the set of DDN data structures and the one or more machine learning models.

“6. The method of claim 1, wherein a given data store of the subset of data stores includes a database and a respective database server that manages the database.

“7. The method of claim 1, wherein identifying the plurality of data objects to be grouped into the data segmentation is further based on user-supplied samples distinct from the DDN data structures.

“8. The method of claim 1, wherein the baseline behavior is based at least in part on data behavior reported by the data managers that monitor network traffic directed at the subset of data stores.

“9. The method of claim 1, wherein generating the set of DDN data structures is performed at a different level of data object abstraction than identifying the plurality of data objects to be grouped into the data segmentation.

“10. A non-transitory computer-readable medium having program instructions stored thereon that are executable to cause a computer system to perform operations comprising: generating a set of data-defined network (DDN) data structures that define a set of content classes that can be used to logically group data objects; using the set of DDN data structures, identifying a plurality of data objects to be grouped into a data segmentation, wherein the plurality of data objects are stored in a subset of a plurality of data stores that are in different portions of a computing network and are coupled to data managers that monitor network traffic directed at the plurality of data stores for compliance with protection policies, and wherein the data segmentation is independent of physical locations of the plurality of data objects; determining a baseline behavior associated with accesses of the plurality of data objects of the data segmentation; generating a set of protection policies for the data segmentation based on the baseline behavior, wherein the set of protection policies defines permissible types of access to data objects within the data segmentation; sending identifying information for the data segmentation and the set of protection policies to those data managers that monitor network traffic directed at the subset of data stores; determining that an attempt to access one or more data objects located within the data segmentation is inconsistent with the set of protection policies; and based at least in part on the determining, preventing the attempt to access the one or more data objects.

“11. The non-transitory computer-readable medium of claim 10, wherein sending the identifying information for the data segmentation includes sending the set of DDN data structures and one or more machine learning models.

“12. The non-transitory computer-readable medium of claim 10, wherein generating the set of DDN data structures includes: evaluating network traffic to extract and group data objects based on their content satisfying a set of similarity criteria.

“13. The non-transitory computer-readable medium of claim 10, wherein the operations further comprise: sending, to a user device of a user associated with the computer system, a notification indicating that there was an attempt to access one or more data objects within the data segmentation that was inconsistent with the set of protection policies.

“14. A method of controlling data within a computing network, the method comprising: receiving, by a subset of a plurality of data managers that monitor network traffic that is directed at a plurality of data stores located within different portions of the computing network, information identifying a data segmentation and a set of protection policies that is derived from baseline behavior associated with accesses of a plurality of data objects included in the data segmentation, wherein the data segmentation is independent of physical locations of the plurality of data objects; determining, by the subset of data managers, that a set of attempts to access one or more data objects within the data segmentation is inconsistent with the set of protection policies; and based at least in part on the determining, the subset of data managers preventing the set of attempts to access the one or more data objects within the data segmentation.

“15. The method of claim 14, wherein receiving the identifying information for the data segmentation includes receiving a set of data-defined network (DDN) data structures, wherein the DDN data structures logically group data objects independent of physical infrastructure via which those data objects are stored, communicated, or utilized.

“16. The method of claim 15, wherein receiving the identifying information for the data segmentation includes receiving one or more machine learning models used in the generation of the set of DDN data structures.

“17. The method of claim 16, wherein the one or more machine learning models are configured to perform content classification of data objects.

“18. The method of claim 16, wherein the one or more machine learning models are configured to perform behavior classification of data objects.”
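
The flow recited in claims 1 and 14 (derive protection policies from baseline behavior, send them to data managers, and block inconsistent access attempts) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the classes `Access`, `Segmentation`, and `DataManager`, the function `derive_policies`, and the choice to represent a policy as a set of permitted (user, action) pairs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Access:
    """An observed or attempted access (field names are hypothetical)."""
    object_id: str
    user: str
    action: str


@dataclass(frozen=True)
class Segmentation:
    """A logical group of data objects, independent of where they are stored."""
    name: str
    object_ids: frozenset


def derive_policies(segmentation: Segmentation, observed: list) -> set:
    # Assumed policy form: every (user, action) pair seen in the baseline
    # for objects inside the segmentation is treated as permissible.
    return {(a.user, a.action) for a in observed
            if a.object_id in segmentation.object_ids}


@dataclass
class DataManager:
    """Hypothetical monitor for traffic directed at one data store."""
    store: str
    segmentation: Optional[Segmentation] = None
    policies: set = field(default_factory=set)
    blocked: list = field(default_factory=list)

    def receive(self, segmentation: Segmentation, policies: set) -> None:
        # Corresponds to receiving identifying information and policies.
        self.segmentation = segmentation
        self.policies = set(policies)

    def allow(self, attempt: Access) -> bool:
        seg = self.segmentation
        if seg is None or attempt.object_id not in seg.object_ids:
            return True  # outside the segmentation: not governed by these policies
        if (attempt.user, attempt.action) in self.policies:
            return True  # consistent with the baseline-derived policies
        self.blocked.append(attempt)  # inconsistent: prevent and record it
        return False
```

A blocked attempt is kept in `blocked` so that a caller could, for example, raise the kind of notification described in claim 13; how such a notification would actually be delivered is outside this sketch.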

For more information, see this patent: Sun, Yi. Data segmentation. U.S. Patent Number 11381587, filed January 20, 2020, and published online on July 5, 2022. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=11381587.PN.&OS=PN/11381587RS=PN/11381587

(Our reports deliver fact-based news of research and discoveries from around the world.)
