Researchers Submit Patent Application, “Coordinated De-Identification Of A Dataset Across A Network”, for Approval (USPTO 20190266352)
2019 SEP 18 (NewsRx) -- By a
The patent’s assignee is
News editors obtained the following quote from the background information supplied by the inventors:
“Technical Field
“Present invention embodiments relate to methods, systems and computer program products for receiving at a network device a dataset with masked direct identifiers from a client’s site and performing further data de-identification of the dataset to protect indirect (or quasi) identifiers and sensitive attributes. In particular, a server receives from a customer site a person-specific dataset with masked direct identifiers, discovers indirect/quasi identifiers and sensitive attributes within the dataset, and performs further compatible data de-identification techniques to protect the indirect identifiers and the sensitive attributes of the dataset.
“Discussion of the Related Art
“Data anonymization is a data sanitization process for protecting personally identifiable information in datasets, including both direct identifiers that can directly identify individuals such as, for example, full names of individuals, social security numbers, customer numbers, patient identifiers, phone numbers, credit card numbers, etc., as well as indirect identifiers, which are non-direct identifier attribute values in a dataset, a combination of which may be unique for some individuals and could be used to re-identify these individuals. For example, a five-digit zip code of a home address, a gender, and a date of birth of individuals are well-known quasi-identifiers because a combination of their values has been shown to be unique for a large number of
“A third type of identifier in a dataset is sensitive attributes, which are non-direct, non-quasi-identifier attributes having values that are sensitive and should therefore not be linked to specific individuals. As an example, individuals may not want to be linked with disease, salary, or sensitive location information in a dataset (e.g., church, hospital, etc.). Preventing linkage of individuals to their sensitive attribute values blocks sensitive information disclosure attacks and goes beyond protection against subject re-identification. However, preventing sensitive information disclosure is usually part of data de-identification efforts.
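As a rough illustration of the distinction drawn above, the following minimal Python sketch counts how many records remain unique on a combination of quasi-identifiers even though no direct identifier is present. The toy records, the attribute names, and the helper `unique_on` are assumptions made for this example and do not come from the patent application.

```python
from collections import Counter

# Hypothetical toy records; direct identifiers are assumed to be masked already.
records = [
    {"zip": "10001", "gender": "F", "dob": "1980-02-14", "disease": "asthma"},
    {"zip": "10001", "gender": "F", "dob": "1980-02-14", "disease": "flu"},
    {"zip": "10001", "gender": "M", "dob": "1975-07-01", "disease": "diabetes"},
    {"zip": "10002", "gender": "F", "dob": "1990-11-30", "disease": "asthma"},
]


def unique_on(rows, attrs):
    """Return the rows whose values on `attrs` occur exactly once in `rows`."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return [r for r in rows if counts[tuple(r[a] for a in attrs)] == 1]


# Records that could be singled out by zip code, gender, and date of birth,
# and therefore linked to their sensitive "disease" value.
exposed = unique_on(records, ["zip", "gender", "dob"])
print(len(exposed), "of", len(records), "records are unique on the combination")
```

In this toy data, two of the four records are unique on the three-attribute combination, so their sensitive attribute values could be linked to one individual each unless the quasi-identifiers are also protected.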
“Personal data that have been ‘sufficiently anonymized’ such as, for example, anonymized data that satisfies the Health Insurance Portability and Accountability Act (HIPAA) requirements in
“Data owners are hesitant to allow highly sensitive personal data such as, for example, customers’ transactions, purchase records, healthcare information, etc., to leave their premises (even in encrypted form using state-of-the-art encryption algorithms) for uploading to a cloud platform for de-identification and additional processing to support business use cases, analytics and other uses. Before allowing highly sensitive personal data to leave their premises, data owners are increasingly using existing in-house solutions for performing data de-identification, which are limited to the support of data masking algorithms and in most cases are unable to adequately protect data to meet legal requirements.”
As a supplement to the background information on this patent application, NewsRx correspondents also obtained the inventors’ summary information for this patent application: “According to a first aspect of embodiments of the invention, a method of de-identifying a dataset is provided. A network device receives information from a client device, wherein the information includes a list of at least one group of techniques selected from groups consisting of a group of data masking techniques and a group of pseudonymization techniques, associated configuration options that are supported by the client device and a description of a dataset to be de-identified. The network device determines a first technique from the at least one group of techniques and associated configuration options supported by the client device and the network device. The network device receives a dataset from the client device, wherein the dataset is produced at the client device by applying the determined first technique and the associated configuration options to corresponding attributes. A de-identification technique is applied to the dataset at the network device to produce a resulting set of de-identified data, wherein the data de-identification technique is coordinated with the first technique and configuration options to further de-identify the dataset.
“According to a second aspect of embodiments of the invention, a system for de-identifying data of a dataset is provided. The system includes at least one processor and at least one memory having instructions embodied therein such that the at least one processor is configured to perform: receiving information from a client device, wherein the information includes a list of at least one group of techniques selected from groups consisting of a group of data masking techniques and a group of data pseudonymization techniques, and associated configuration options that are supported by the client device and a description of a dataset to be de-identified; determining a first technique from the at least one group of techniques and configuration options that are supported by the client device and the system; receiving a dataset from the client device, wherein the dataset is produced at the client device by applying the determined first technique and the associated configuration options to corresponding data attributes; and applying a de-identification technique to the dataset to produce a resulting set of de-identified data, wherein the de-identification technique is coordinated with the first technique and the associated configuration options to de-identify the masked dataset.
“According to a third aspect of embodiments of the invention, a computer program product including at least one computer readable storage medium having computer readable program code embodied therewith for execution on at least one processor is provided. The computer readable program code is configured to be executed by the at least one processor to perform: receiving information from a client device, wherein the information includes a list of at least one group of techniques selected from groups consisting of a group of data masking techniques and a group of data pseudonymization techniques, and associated configuration options that are supported by the client device and a description of a dataset to be de-identified; determining a first technique from the at least one group of techniques, associated configuration options supported by the client device and a system including the at least one processor; receiving a dataset from the client device, wherein the dataset is produced at the client device by applying the determined first technique and the associated configuration options to corresponding data attributes; and applying a de-identification technique to the dataset to produce a resulting set of de-identified data, wherein the de-identification technique is coordinated with the first technique and the configuration options to de-identify the dataset.”
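The flow described in the three aspects above can be sketched in Python as follows. This is only one possible reading under stated assumptions: the capability lists, the technique names (`redact`, `truncate`, `hmac_sha256`, `tokenize`), the preference order, and all function names are hypothetical, and keyed hashing and prefix truncation are stand-ins chosen for the example rather than techniques named in the patent application.

```python
import hashlib
import hmac

# Hypothetical capability lists exchanged between the two sides.
CLIENT_SUPPORTED = {
    "masking": {"redact", "truncate"},
    "pseudonymization": {"hmac_sha256"},
}
SERVER_SUPPORTED = {
    "masking": {"truncate"},
    "pseudonymization": {"hmac_sha256", "tokenize"},
}


def choose_first_technique(client, server, preference):
    """Pick the first preferred (group, technique) that both sides support."""
    for group, technique in preference:
        if technique in client.get(group, set()) and technique in server.get(group, set()):
            return group, technique
    raise ValueError("no technique supported by both client and server")


def client_mask(rows, direct_ids, key):
    """Client side: replace direct identifiers with truncated keyed hashes."""
    out = []
    for row in rows:
        masked = dict(row)
        for attr in direct_ids:
            digest = hmac.new(key, str(row[attr]).encode(), hashlib.sha256)
            masked[attr] = digest.hexdigest()[:16]
        out.append(masked)
    return out


def server_generalize(rows):
    """Server side: coarsen quasi-identifiers the client left untouched."""
    return [dict(r, zip=r["zip"][:3] + "**", dob=r["dob"][:4]) for r in rows]


preference = [("pseudonymization", "hmac_sha256"), ("masking", "truncate")]
group, technique = choose_first_technique(CLIENT_SUPPORTED, SERVER_SUPPORTED, preference)

rows = [{"name": "Alice Smith", "zip": "10001", "dob": "1980-02-14"}]
masked = client_mask(rows, ["name"], key=b"demo-key")  # stays on the client's premises
result = server_generalize(masked)                      # runs on the network device
print(group, technique, result)
```

The point of the sketch is the ordering: the raw direct identifier never leaves the client, and the server-side step only touches attributes that the agreed first technique did not already transform.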
The claims supplied by the inventors are:
“1-7. (canceled)
“8. A system de-identifying data of a dataset, the system comprising: at least one processor; and at least one memory having instructions embodied therein, the at least one processor being configured to perform: receiving information from a client device, wherein the information includes a list of at least one group of techniques selected from groups consisting of a group of data masking techniques and a group of data pseudonymization techniques, and associated configuration options supported by the client device and a description of a dataset to be de-identified; determining a first technique from the at least one group of techniques and associated configuration options supported by the client device and the system; receiving a dataset from the client device, wherein the dataset is produced at the client device by applying the determined first technique and the associated configuration options to corresponding data attributes; and applying a de-identification technique to the dataset to produce a resulting set of de-identified data, wherein the de-identification technique is coordinated with the first technique and the associated configuration options to de-identify the dataset.
“9. The system of claim 8, wherein the system resides within a cloud computing environment.
“10. The system of claim 8, wherein the description of the dataset includes a data dictionary that provides a list of data attributes appearing in the dataset, their corresponding data types and associated metadata.
“11. The system of claim 8, wherein the attributes include one or more direct identifiers.
“12. The system of claim 11, wherein applying the de-identification technique further comprises: identifying one or more sets of quasi-identifiers within the dataset; and applying the de-identification technique to the identified one or more sets of quasi-identifiers to produce the resulting set of de-identified data.
“13. The system of claim 12, wherein identifying the one or more sets of quasi-identifiers comprises: analyzing values of attributes of each record to find unique combinations of the values; and identifying attributes of the unique combinations of the values as the one or more sets of quasi-identifiers.
“14. The system of claim 8, wherein the at least one processor is further configured to perform: applying further protection to the resulting set of de-identified data to improve a privacy level by extending the first technique applied at the client device using compatible techniques supported by the system; identifying at least one sensitive attribute within the dataset; and applying the de-identification technique to the at least one identified sensitive attribute to produce the resulting set of de-identified data.
“15. A computer program product comprising at least one computer readable storage medium having computer readable program code embodied therewith for execution on at least one processor, the computer readable program code being configured to be executed by the at least one processor to perform: receiving information from a client device, wherein the information includes a list of at least one group of techniques selected from groups consisting of a group of data masking techniques and a group of data pseudonymization techniques, and associated configuration options supported by the client device and a description of a dataset to be de-identified; determining a first technique from the at least one group of techniques and associated configuration options supported by the client device and a system including the at least one processor; receiving a dataset from the client device, wherein the dataset is produced at the client device by applying the determined first technique and the associated configuration options to corresponding data attributes; and applying a de-identification technique to the dataset to produce a resulting set of de-identified data, wherein the de-identification technique is coordinated with the first technique and the associated configuration options to de-identify the dataset.
“16. The computer program product of claim 15, wherein the description of the dataset includes a data dictionary that provides a list of data attributes appearing in the dataset, their corresponding data types and associated metadata.
“17. The computer program product of claim 15, wherein the attributes include one or more direct identifiers.
“18. The computer program product of claim 17, wherein applying the de-identification technique further comprises: identifying one or more sets of quasi-identifiers within the dataset; and applying the de-identification technique to the identified one or more sets of quasi-identifiers to produce the resulting set of de-identified data.
“19. The computer program product of claim 18, wherein identifying the one or more sets of quasi-identifiers comprises: analyzing values of attributes of each record to find unique combinations of the values; and identifying attributes of the unique combinations of the values of attributes as the one or more sets of quasi-identifiers.
“20. The computer program product of claim 15, wherein the computer readable program code is configured to be executed by the at least one processor to perform: applying further protection to the resulting set of de-identified data to improve a privacy level by extending the first technique applied at the client device using compatible techniques supported by the system; identifying at least one sensitive attribute within the dataset; and applying the de-identification technique to the at least one identified sensitive attribute to produce the resulting set of de-identified data.”
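Claims 13 and 19 describe finding quasi-identifiers by analyzing attribute values for unique combinations. A minimal Python sketch of one way this could be done is shown below; the brute-force search over attribute combinations, the `max_size` limit, the function name, and the sample rows are all assumptions for illustration, not the method disclosed in the application.

```python
from collections import Counter
from itertools import combinations


def find_quasi_identifier_sets(rows, candidates, max_size=3):
    """Return minimal attribute sets on which at least one record is unique."""
    found = []
    for size in range(1, max_size + 1):
        for attrs in combinations(candidates, size):
            # Skip supersets of an already reported set to keep results minimal.
            if any(set(prev) <= set(attrs) for prev in found):
                continue
            counts = Counter(tuple(r[a] for a in attrs) for r in rows)
            if any(c == 1 for c in counts.values()):
                found.append(attrs)
    return found


# Hypothetical dataset with direct identifiers already masked at the client.
rows = [
    {"zip": "10001", "gender": "F", "age": 34},
    {"zip": "10001", "gender": "M", "age": 34},
    {"zip": "10002", "gender": "F", "age": 41},
    {"zip": "10002", "gender": "M", "age": 41},
]

qi_sets = find_quasi_identifier_sets(rows, ["zip", "gender", "age"])
print(qi_sets)
```

No single attribute is unique for any record in this sample, but some pairs are, which is why the search has to consider combinations rather than individual columns. The exhaustive search grows quickly with the number of attributes, so a real system would likely need pruning or sampling.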
For additional information on this patent application, see: Gkoulalas-Divanis, Aris. Coordinated De-Identification Of A Dataset Across A Network. Filed
(Our reports deliver fact-based news of research and discoveries from around the world.)


