Patent Issued for Data security classification sampling and labeling (USPTO 11704431): Microsoft Technology Licensing LLC
2023 AUG 09 (NewsRx) -- By a
Patent number 11704431 is assigned to
The following quote was obtained by the news editors from the background information supplied by the inventors: “A major goal of information assurance is to provide confidence that information systems will perform as desired, and that information will be available only to authorized users. Cybersecurity is viewed by some people as a specialization within the realm of information assurance, while other people take a broader view of cybersecurity and may even consider cybersecurity and information assurance to be essentially the same as one another. A broad view, which treats “information assurance” and “cybersecurity” as interchangeable, applies in this document.
“Regardless of the terminology used, however, various classifications of data may be employed to help make data available, to keep data confidential, and to maintain the integrity of data. In the present document, “data classification”, “data security classification”, and “data categorization” all mean the same thing, as opposed to other contexts in which “classification” more narrowly means an official action taken by a government or a military to restrict access to data based on national security concerns, or a result of such official action.
“Data classification activities recognize that data in one category can, or should, or in some cases must, be treated differently and protected differently from data in another category, according to the respective categorizations. Many laws, regulations, guidelines, standards, and policies define different categories of data, and describe category-dependent criteria for protecting or using data. Some of the many examples include the General Data Protection Regulation (GDPR) in
In addition to the background information obtained for this patent, NewsRx journalists also obtained the inventors’ summary information for this patent: “Some embodiments taught herein use or perform operations that enhance cybersecurity and data categorization efficiency by providing reliable statistics about the number and location of sensitive data of different categories. These data sensitivity statistics are computed while iteratively sampling a collection of items that hold data. Efficient sampling algorithms are described. Data sensitivity statistic gathering or updating that is based on the sampling activity ends when a specified threshold has been reached, e.g., a certain number of items have been sampled, a certain amount of data has been sampled, sampling has used a certain amount of power or CPU cycles or another computational resource, or the sensitivity statistics have stabilized to a certain extent. The resulting statistics about data sensitivity can be utilized for regulatory compliance, policy formulation or enforcement, data protection, forensic investigation, risk management, evidence production, or another classification-dependent or classification-enhanced activity.
“Some embodiments repeat iterations of a data sampling sequence until an iterations-complete-condition is met. The data sampling sequence of a current iteration includes: selecting a current iteration scan-set of stored items from a group of stored items, the selecting based at least partially on a current iteration sampling allotment; when a scanning-condition is met then in response scanning data of the current iteration scan-set for sensitive data which meets a predefined sensitivity criterion which defines a sensitivity type; when scanned data of a particular stored item of the current iteration scan-set includes sensitive data which meets the predefined sensitivity criterion, then in response updating a data security classification statistical measure; calculating a next iteration sampling allotment which is based at least partially on the current iteration sampling allotment and the data security classification statistical measure; and when the iterations-complete-condition is not met, then in response using the next iteration sampling allotment as the current iteration sampling allotment of a next iteration of the data sampling sequence.
“Some embodiments of teachings presented herein include or communicate with data security classification sampling functionality that includes digital hardware that is configured to perform certain operations. These operations may provide data security classification statistics by (a) getting an iterations-complete-condition, and (b) iteratively repeating a data sampling sequence until the iterations-complete-condition is met, wherein the data sampling sequence of a current iteration includes (b1) selecting a current iteration scan-set of stored items from a group of stored items, the selecting based at least partially on a current iteration sampling allotment, (b2) when a scanning-condition is met then in response scanning data of the current iteration scan-set for sensitive data which meets a predefined sensitivity criterion which defines a sensitivity type, (b3) when scanned data of a particular stored item of the current iteration scan-set includes sensitive data which meets the predefined sensitivity criterion, then in response labeling the particular stored item with a predefined sensitivity label which corresponds to the predefined sensitivity criterion, and when the scanned data of a particular stored item does not include data which meets the predefined sensitivity criterion, then in response avoiding labeling the particular stored item with the predefined sensitivity label, (b4) updating a data security classification statistical measure in response to the labeling or the avoiding labeling, (b5) calculating a next iteration sampling allotment which is based at least partially on the current iteration sampling allotment and the data security classification statistical measure, and (b6) when the iterations-complete-condition is not met, then in response using the next iteration sampling allotment as the current iteration sampling allotment of a next iteration of the data sampling sequence.
“Some embodiments can provide a data sensitivity result which is suitable for beneficial use by at least one of the following: a data privacy tool, a data security tool, a data loss prevention tool, a risk management tool, a regulatory compliance tool, a forensics tool, a computational resource administration tool, or a litigation evidence production tool. The data sensitivity result includes at least one data sensitivity statistic based on the sampling. The data sensitivity result optionally includes sampling metadata such as time expended, resources used, items scanned, items labeled, or the like, which are not necessarily part of the iterations-complete-condition.
“Other technical activities pertinent to teachings herein will also become apparent to those of skill in the art. The examples given are merely illustrative. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Rather, this Summary is provided to introduce, in a simplified form, some technical concepts that are further described below in the Detailed Description. The innovation is defined with claims, and to the extent this Summary conflicts with the claims, the claims should prevail.”
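The data sampling sequence described in the summary can be illustrated with a short Python sketch. It is not the patented implementation: the function name sample_and_label, the SSN-like regular expression used as the sensitivity criterion, the use of labeled-over-scanned as the statistical measure, the rule that grows the allotment when sensitive data is found, and the tolerance-based stopping test are all assumptions chosen for illustration.

```python
import random
import re
from dataclasses import dataclass, field

# Hypothetical sensitivity criterion (an assumption): text that looks like a US SSN.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class SamplingResult:
    scanned: int = 0
    labeled: dict = field(default_factory=dict)  # item name -> sensitivity label
    iterations: int = 0

    @property
    def presence(self):
        # Statistical measure (an assumption): labeled items relative to scanned items.
        return len(self.labeled) / self.scanned if self.scanned else 0.0


def sample_and_label(items, allotment=2, max_iterations=10, tolerance=0.01, seed=0):
    """Iteratively sample `items` (name -> text) instead of scanning everything."""
    rng = random.Random(seed)
    unscanned = sorted(items)
    result = SamplingResult()
    previous = None
    # Iterations-complete-condition, part 1: iteration cap or nothing left to scan.
    while unscanned and result.iterations < max_iterations:
        # Select the current scan-set, sized by the current sampling allotment.
        scan_set = rng.sample(unscanned, min(allotment, len(unscanned)))
        for name in scan_set:
            unscanned.remove(name)
            result.scanned += 1
            # Label the item only when its data meets the sensitivity criterion.
            if SSN_LIKE.search(items[name]):
                result.labeled[name] = "ssn-like"
        result.iterations += 1
        current = result.presence
        # Iterations-complete-condition, part 2: the statistic has stabilized.
        if previous is not None and abs(current - previous) <= tolerance:
            break
        # Next allotment depends on the current allotment and the statistic
        # (hypothetical rule: sample more when more sensitive data is found).
        allotment = max(1, round(allotment * (1 + current)))
        previous = current
    return result
```

Calling sample_and_label on a dictionary that maps item names to text returns the labels assigned so far together with the running statistic. A negative tolerance disables the stability test, so the loop runs until every item is scanned or the iteration cap is reached.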
The claims supplied by the inventors are:
“1. A system to improve power management in a computer network, comprising: a memory; a processor which is in operable communication with the memory, the processor configured to configure the memory with instructions and data and perform steps which include providing data security classification statistics by (a) getting an iterations-complete-condition, and (b) iteratively repeating a data sampling sequence until the iterations-complete-condition is met, wherein the data sampling sequence of a current iteration includes (b1) selecting a current iteration scan-set of stored items from a group of stored items in the computer network, the selecting based at least partially on a current iteration power consumption budget representing an amount of electric power consumption in the computer network, (b2) when a scanning-condition is met then in response scanning data of the current iteration scan-set for sensitive data which meets a predefined sensitivity criterion which defines a sensitivity type, (b3) when scanned data of a particular stored item of the current iteration scan-set includes sensitive data which meets the predefined sensitivity criterion, then in response labeling the particular stored item with a predefined sensitivity label which corresponds to the predefined sensitivity criterion, and when the scanned data of a particular stored item does not include data which meets the predefined sensitivity criterion, then in response avoiding labeling the particular stored item with the predefined sensitivity label, (b4) updating a data security classification statistical measure in response to the labeling or the avoiding labeling, (b5) calculating a next iteration power consumption budget which is based at least partially on the current iteration power consumption budget and the data security classification statistical measure, and (b6) when the iterations-complete-condition is not met, then in response using the next iteration power consumption budget as the current iteration power consumption budget of a next iteration of the data sampling sequence; whereby the system manages power consumption in the computer network during data security classification by selectively limiting which stored items are scanned for data that meets the predefined sensitivity criterion instead of scanning all stored items.
“2. The system of claim 1, wherein the system comprises multiple data scanners which are configured to perform scanning for sensitive data which meets a respective predefined sensitivity criterion, and wherein the processor is configured to set the scanning-condition to enable zero or more scanners for a particular iteration based on at least one of the following: which sensitivity type or combination of sensitivity types have been found by previous scanning, metadata of the group of stored items, the data security classification statistical measure, an iteration number which indicates how many iterations of the data sampling sequence have been performed, or a computational cost that is associated with a particular scanner.
“3. The system of claim 1, wherein the current iteration power consumption budget for a first iteration is based on at least one of the following: an amount of time, an amount of a computational resource, an amount of power consumption, a number of stored items, or an amount of stored item data.
“4. The system of claim 1, wherein the iterations-complete-condition comprises at least one of the following: a maximum number of iterations, a minimum number of iterations, a maximum time expended during iterations, a minimum time expended during iterations, a maximum computational resource used during iterations, a minimum computational resource used during iterations, a maximum power consumption during iterations, a minimum power consumption during iterations, a maximum number of stored items scanned during iterations, a minimum number of stored items scanned during iterations, a maximum number of stored items labeled during iterations, a minimum number of stored items labeled during iterations, a maximum amount of data scanned during iterations, a minimum amount of data scanned during iterations, or a specified stability of the data security classification statistical measure during iterations.
“5. The system of claim 1, wherein the current iteration scan-set includes stored items from a plurality of groups of stored items, and a portion of the current iteration power consumption budget is allocated to each of the groups.
“6. The system of claim 1, wherein the data security classification statistical measure comprises at least one of the following: a sensitivity-presence value which measures stored items which have been labeled during iterations performed so far, relative to a measure of all stored items of the group; a sensitivity-diversity value which measures an amount of sensitivity types of stored items which have been labeled during iterations performed so far, relative to a measure of all defined sensitivity types.
“7. A method to improve power management in a computer network, comprising performing programmed operations as follows: allocating an initial power consumption budget among m groups of stored data items in the computer network, m being an integer greater than one, the power consumption budget representing an amount of electric power consumption in the computer network; for each iteration i until an iterations-complete-condition is met: for each group group-j of stored data items, j ranging from one to m: selecting a scan-set scan-set-i-j of stored items from within group group-j, the selecting based at least partially on a power consumption budget allotment-i-j which is based at least partially on a data security classification statistical measure score-i-j, wherein score-i-j is based at least partially on sensitive data identified so far by scanning data of stored items; when a scanning-condition is met then in response scanning data of the scan-set scan-set-i-j of stored items for sensitive data, wherein sensitive data is data that meets a predefined sensitivity criterion which defines a sensitivity type; when a scanned particular stored item of the current iteration scan-set includes sensitive data, then in response updating score-i-j; and providing a data sensitivity result to at least one of the following: a data privacy tool, a data security tool, a data loss prevention tool, a risk management tool, a regulatory compliance tool, a forensics tool, computational resource administration tool, or a litigation evidence production tool.
“8. The method of claim 7, wherein providing data sensitivity results comprises providing at least one of the following: each score-i-j; a per-group data security classification statistical measure score-j which is based on score-i-j values for group-j over multiple iterations; a per-group sensitivity-presence value sensitivity-presence-j which measures sensitive data identified in group group-j relative to a measure of all data of group-j; a per-group sensitivity-diversity value sensitivity-diversity-j which measures an amount of sensitivity types of data identified in group group-j relative to a measure of all defined sensitivity types; an overall data security classification statistical measure score which is based on score-i-j values for all groups over all iterations; an overall sensitivity-presence value which measures sensitive data identified in all groups over all iterations relative to a measure of all data in all groups; or an overall sensitivity-diversity value which measures an amount of sensitivity types of data identified in all groups over all iterations relative to a measure of all defined sensitivity types.
“9. The method of claim 7, further comprising at least one of the following: choosing on a per-group basis which zero or more sensitivity types to scan data for in a particular group-j; choosing on a per-iteration basis which zero or more sensitivity types to scan data for during a particular iteration i; or scanning data for different sensitivity types at different times during the method.
“10. The method of claim 7, further comprising labeling sensitive data during the operations with at least one predefined sensitivity label which corresponds to the predefined sensitivity criterion satisfied by the sensitive data, after the sensitive data is identified during the operations.
“11. The method of claim 7, wherein the method comprises meeting the iterations-complete-condition by discerning a specified level of stability of the data security classification statistical measure over at least two iterations.
“12. The method of claim 7, wherein: selecting a scan-set of stored items includes selecting at least one of the following stored items: blobs, files, tables, records, objects, email messages, email attachments; and selecting a scan-set of stored items from within a group includes selecting stored items from within at least one of the following stored item groups: a container, a directory, a database, a list, a tree, an account, a repository.”
There are additional claims. Please visit the full patent to read further.
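The multi-group method of claim 7, together with the sensitivity-presence and sensitivity-diversity values named in claims 6 and 8, can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed method itself: the budget is modeled as abstract units where one unit pays for scanning one item (the claims speak of electric power consumption), the two regular expressions stand in for real sensitivity criteria, the smoothed score formula is hypothetical, and the names scan, allocate, and classify_groups are invented for this example.

```python
import re

# Hypothetical sensitivity criteria (assumptions), keyed by sensitivity type.
CRITERIA = {
    "ssn-like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def scan(text):
    """Return the set of sensitivity types whose criterion the text meets."""
    return {kind for kind, pattern in CRITERIA.items() if pattern.search(text)}


def allocate(budget, scores):
    """Split an integer budget among groups in proportion to their scores.

    Shares are rounded down, so the total handed out never exceeds the budget.
    """
    total = sum(scores.values())
    return {group: int(budget * score / total) for group, score in scores.items()}


def classify_groups(groups, budget_per_iteration, max_iterations=5):
    """`groups` maps group name -> {item name -> text}."""
    pending = {g: sorted(items) for g, items in groups.items()}
    scanned = {g: 0 for g in groups}
    sensitive = {g: 0 for g in groups}
    types_found = {g: set() for g in groups}
    scores = {g: 1.0 for g in groups}  # equal scores give an equal initial split
    for _ in range(max_iterations):  # iterations-complete-condition: iteration cap
        if not any(pending.values()):  # ...or nothing is left to scan
            break
        allotments = allocate(budget_per_iteration, scores)
        for g, allotment in allotments.items():
            # Select this group's scan-set from its not-yet-scanned items.
            scan_set, pending[g] = pending[g][:allotment], pending[g][allotment:]
            for name in scan_set:
                found = scan(groups[g][name])
                scanned[g] += 1
                if found:
                    sensitive[g] += 1
                    types_found[g] |= found
            # Per-group score (hypothetical formula): smoothed share of scanned
            # items that held sensitive data; it steers the next allocation.
            scores[g] = (sensitive[g] + 1) / (scanned[g] + 2)
    return {
        g: {
            "scanned": scanned[g],
            "sensitivity_presence": sensitive[g] / scanned[g] if scanned[g] else 0.0,
            "sensitivity_diversity": len(types_found[g]) / len(CRITERIA),
        }
        for g in groups
    }
```

In this sketch, groups where sensitive data has already turned up receive a larger share of the next iteration's budget, while the smoothing term keeps every group's score above zero so that no group is starved permanently. The per-group dictionary it returns is the kind of data sensitivity result that could be handed to a compliance or data loss prevention tool.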
For more information on this patent, see: Bashir, Salam. Data security classification sampling and labeling.
(Our reports deliver fact-based news of research and discoveries from around the world.)