“Self-Contained System For De-Identifying Unstructured Data In Healthcare Records” in Patent Application Approval Process (USPTO 20190236310) - Insurance News | InsuranceNewsNet

August 21, 2019 Newswires

“Self-Contained System For De-Identifying Unstructured Data In Healthcare Records” in Patent Application Approval Process (USPTO 20190236310)

Insurance Daily News

2019 AUG 21 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- A patent application by the inventors Austin, Joseph (Sterling, MA); Kassam-Adams, Shahir (Lovingston, VA); LaBonte, Jason A. (Natick, MA); Bayless, Paul J. (Burke, VA), filed on January 23, 2019, was made available online on August 1, 2019, according to news reporting originating from Washington, D.C., by NewsRx correspondents.

This patent application is assigned to Datavant (San Francisco, California, United States).

The following quote was obtained by the news editors from the background information supplied by the inventors: “There exists a vast store of information within the unstructured data fields of healthcare records that can be critical to properly understanding the effectiveness and safety of clinical treatment. However, due to their inherent lack of structure, de-identification of these data fields is a challenge due to the lack of an automated solution, leaving much of this information absent from analytical datasets. This challenge is one that was created by the implementation of databases and other data storage and analysis technologies combined with the unstructured nature of healthcare records combined with the need for meeting privacy regulations and requirements. Conventional healthcare data systems are limited in their ability to provide information from individual records in healthcare data sets because each record contains protected health information (‘PHI’) or personal identification information (PII) (e.g., names, addresses, dates of birth, dates of death, social security numbers, etc.). It is a potential Health Insurance Portability and Accountability Act (HIPAA) violation to incorporate PHI elements into a healthcare data set. Accordingly, to be compliant with government regulations, all PHI data elements must be removed and/or de-identified before being incorporated into any healthcare data set. However, once PHI data elements are removed from record, users have no way to understand which individuals in the data set match particular structured or unstructured data relevant for analysis.

“Generally, conventional (systems, devices, methods) are unable to identify, flag, and remove protected health information (PHI) or personal identification information (PII) (e.g. names, addresses, dates of birth, dates of death, social security numbers, etc.) from unstructured data in an automated fashion. Instead, the current practice is to remove PHI or PII manually based on human determinations, or to not incorporate any unstructured data fields into healthcare data sets being used for non-clinical purposes, because of the inability to efficiently and accurately identify, flag, and remove PHI or PII using existing technologies.”

In addition to the background information obtained for this patent application, NewsRx journalists also obtained the inventors’ summary information for this patent application: “There is a need for improvements for enabling healthcare data sets within healthcare records of individuals to be accessible and useable without exposing protected healthcare information of the individual. There is a need for improvements for a system whereby unstructured data can be de-identified in an automated manner, but still be able to be matched to individual records in healthcare data sets without exposure of PHI or PII. Additionally, the de-identification process needs to be ‘tune-able’ to control redaction of sensitive information while allowing information that looks like PHI or PII, but is not, to remain in the data set. Moreover, it would be desirable to have the de-identified text to remain coherent after the personal identifying information has been removed.

“The present invention is directed toward further solutions to address this need, in addition to having other desirable characteristics. Specifically, the present invention provides an advancement made in computer technology that consists of improvements defined by logical structures and processes directed to a specific implementation of a solution to a problem in software, data structures and data management, wherein the existing data structure technology relies upon unacceptable reproduction of protected health information, personal identification information, or other private information to transmit data for data processing purposes that cannot meet or be used under current requirements of the Health Insurance Portability and Accountability Act (HIPAA) (42 U.S.C. § 1301 et seq.), and other laws, regulations, rules and standards governing privacy and data security (e.g. General Data Protection Regulation (Regulation (EU) 2016/679), Federal Trade Commission Act (15 U.S.C. §§ 41-58), Children’s Online Privacy Protection Act (COPPA) (15 U.S.C. §§ 6501-6506), Financial Services Modernization Act (Gramm-Leach-Bliley Act (GLB)) (15 U.S.C. §§ 6801-6827) and California Consumer Privacy Act of 2018), by providing a system and method in which unstructured data is processed from individual records in a healthcare data set without exposing PHI.
The present invention provides a system and method that creates a specific, non-abstract improvement to computer functionality previously incapable of merging certain data sets without exposing PHI and PII, that de-identifies data by removing protected health information and personal identification information from the record, adds a unique encrypted person token to each record, and merges the record with other healthcare data sets that have likewise been de-identified and tokenized by matching the unique encrypted person tokens in data sets to one another, thus maintaining the ability to match disparate data (e.g., unstructured data and structured healthcare data) from disparate sources for a same individual. In particular, the present invention implements a self-contained, dictionary-based system containing tunable lists of PHI or PII entities and data formats (e.g., names, birth dates, phone numbers, addresses, etc.) to be utilized to de-identify unstructured data within healthcare and other data sets. The dictionaries include blacklisted terms (e.g., first names, last names, etc.) to be redacted from the unstructured data and also include blacklisted standard number formats (e.g., social security numbers, telephone numbers, etc.) to be removed from records, a project specific whitelist of terms not to be removed (terms to be allowed to remain in the data despite being included in the blacklist), and a record-specific blacklist created from the PHI or PII present in a specific record (e.g., individual patient records). Taken together, these three dictionaries create an adjusted blacklist that can be ‘tuned’ to control the level of de-identification and scrub a set of records. The present invention uses the dictionaries to remove all elements determined to be PII or PHI, but also replaces the removed elements with a case-type tag identifying the type of information being removed (e.g., ‘first name’, ‘address’, etc.). 
The addition of the case-type tag ensures that the unstructured data are still coherent even after information has been removed/redacted.
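(Illustration, not part of the application: the way the three dictionaries combine can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The function name, the dict-of-term-to-tag layout, and the sample terms are all hypothetical, and the filing does not disclose its data structures at this level of detail.)

```python
# Hypothetical sketch: names, layout, and sample terms are assumptions,
# not taken from the patent application.

def build_adjusted_blacklist(blacklist, whitelist, record_terms):
    """Map every term to be redacted onto its case-type tag.

    blacklist:    dict of term -> tag (the general dictionary)
    whitelist:    set of terms allowed to remain despite being blacklisted
    record_terms: dict of term -> tag (known PII/PHI for one record)
    """
    # Step 1: remove whitelisted terms from the general blacklist.
    adjusted = {term: tag for term, tag in blacklist.items()
                if term not in whitelist}
    # Step 2: augment with the record-specific blacklist. These terms are
    # added after the whitelist is applied, so they are always redacted.
    adjusted.update(record_terms)
    return adjusted


blacklist = {"rose": "first name", "parkinson": "last name", "boston": "city"}
whitelist = {"parkinson"}               # keep the disease name in clinical notes
record_terms = {"okafor": "last name"}  # this patient's known surname

adjusted = build_adjusted_blacklist(blacklist, whitelist, record_terms)
print(adjusted)
```

Applying the whitelist before the record-specific terms is one reading of the order described above: a project can keep "parkinson" for clinical meaning, while a record for a patient actually named Parkinson would still add the surname back through its record-specific list.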

“Additionally, to connect the resulting de-identified data to other data sets, the invention works in a manner consistent with other de-identification systems and methods. In particular, the records are tokenized in a standardized format to include encrypted patient tokens with every record. This ‘tokenized’ data can then be merged with structured healthcare data sets that have also been de-identified and tokenized by matching the tokens in each data set against each other. In this way, users can connect individuals across healthcare data sets without ever seeing or using PHI or PII.
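(Illustration, not part of the application: token-based matching of this kind can be sketched as follows. The filing describes "encrypted patient tokens" without specifying a scheme, so this sketch swaps in a keyed hash (HMAC-SHA256) as a stand-in. The key, the field names, and the normalization rule are all assumptions.)

```python
import hashlib
import hmac

# Assumption: a secret shared by the parties creating tokens. Illustrative only.
SECRET_KEY = b"demo-key-not-for-production"


def person_token(first, last, dob):
    """Derive a deterministic token from normalized identifiers (hypothetical)."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def merge_on_token(notes, claims):
    """Join two de-identified data sets on their shared token."""
    claims_by_token = {row["token"]: row for row in claims}
    merged = []
    for note in notes:
        match = claims_by_token.get(note["token"])
        if match is not None:
            merged.append({**match, **note})
    return merged


# Tokens are computed where the identifiers live; only tokens travel onward.
token_a = person_token("Ada", "Okafor", "1970-01-31")
token_b = person_token(" ada ", "OKAFOR", "1970-01-31")

notes = [{"token": token_a, "text": "[first name] reports improved sleep."}]
claims = [
    {"token": token_b, "diagnosis_code": "G47.00"},
    {"token": person_token("Lin", "Moreau", "1982-07-04"), "diagnosis_code": "I10"},
]

merged = merge_on_token(notes, claims)
print(merged)
```

Because both sources normalize identifiers the same way before hashing, the two spellings of the same person yield the same token, and the join never touches a name or birth date.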

“In accordance with example embodiments of the present invention, a method for de-identifying unstructured data within data sets is provided. The method includes initializing a blacklist dictionary and a whitelist dictionary, modifying the blacklist dictionary by removing terms included within the whitelist dictionary to create an adjusted blacklist dictionary, and augmenting the adjusted blacklist dictionary with a record-specific blacklist for each individual record within the data sets. The method also includes scrubbing personally identifiable information (PII) and protected health information (PHI) from each individual record utilizing the record-specific adjusted blacklist dictionary. The scrubbing includes removing all elements within each individual record determined to be PII or PHI according to terms in the record-specific adjusted blacklist dictionary, replacing removed elements with a case-type tag identifying a type of information being removed according to the record-specific adjusted blacklist dictionary, and repeating the removing and replacing step for each individual record within the data sets.
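(Illustration, not part of the application: the remove-and-replace loop in this method can be sketched as below. It is a simplified, word-by-word sketch. The function names, the bracketed tag format, and the sample records are hypothetical, and multi-word terms are not handled.)

```python
import re


def scrub_record(text, adjusted_blacklist):
    """Replace each blacklisted word with a bracketed case-type tag."""
    def replace(match):
        word = match.group(0)
        tag = adjusted_blacklist.get(word.lower())
        return f"[{tag}]" if tag else word

    # \w+ visits every word; punctuation and spacing pass through untouched.
    return re.sub(r"\w+", replace, text)


def scrub_data_set(records, blacklist, whitelist):
    """Repeat the remove-and-replace step for each record in the data set."""
    scrubbed = []
    for record in records:
        # Adjusted blacklist: general blacklist minus whitelist ...
        adjusted = {t: tag for t, tag in blacklist.items() if t not in whitelist}
        # ... augmented with this record's own known PII/PHI terms.
        adjusted.update(record["known_pii"])
        scrubbed.append(scrub_record(record["text"], adjusted))
    return scrubbed


blacklist = {"rose": "first name", "boston": "city", "parkinson": "last name"}
whitelist = {"parkinson"}
records = [
    {"text": "Rose Okafor of Boston has early Parkinson disease.",
     "known_pii": {"okafor": "last name"}},
    {"text": "Seen in Boston; Mr. Parkinson denies tremor.",
     "known_pii": {"parkinson": "last name"}},
]

result = scrub_data_set(records, blacklist, whitelist)
for line in result:
    print(line)
```

In the first record the whitelisted disease name survives; in the second, the same word is redacted because it appears in that record's own blacklist. The tags keep both sentences readable after redaction.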

“In accordance with aspects of the present invention, the method can further include tokenizing and merging each individual record in the data sets. The blacklist dictionary can include standard terms and standard number formats to be removed from records within the data sets. The standard number formats can include social security numbers, telephone numbers, URLs, zip codes, email addresses, IP addresses, dates, patient IDs, record numbers, and insurance IDs. The standard terms can include cities, counties, first names, last names, prefixes, and medical terms. The whitelist dictionary can include terms allowed to remain in the data sets despite being included in the blacklist dictionary. The record-specific adjusted blacklist dictionary can include terms created from the PII or PHI present in specific individual records within the data sets.
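(Illustration, not part of the application: standard number formats lend themselves to pattern matching rather than term lists. The sketch below uses deliberately simple, US-style regular expressions for a handful of the listed formats. The patterns, tag names, and sample sentence are assumptions and would miss many real-world variants.)

```python
import re

# Hypothetical patterns, ordered so that more specific formats run first
# (for example, SSNs before the shorter zip-code pattern).
NUMBER_FORMATS = [
    ("email address", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("social security number", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("telephone number", re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")),
    ("date", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("zip code", re.compile(r"\b\d{5}(?:-\d{4})?\b")),
]


def scrub_number_formats(text):
    """Replace each matched number format with a bracketed case-type tag."""
    for tag, pattern in NUMBER_FORMATS:
        text = pattern.sub(f"[{tag}]", text)
    return text


note = ("Call 617-555-0123 or email ada@example.org; "
        "SSN 123-45-6789, DOB 01/31/1970, ZIP 02139.")
scrubbed = scrub_number_formats(note)
print(scrubbed)
```

Formats such as patient IDs, record numbers, and insurance IDs vary by institution, so they would need site-specific patterns that are not attempted here.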

“In accordance with aspects of the present invention, the method can further include tuning terms in the whitelist dictionary and the record-specific adjusted blacklist dictionary to adjust a level of de-identification of records within the data sets. The whitelist dictionary and the record-specific adjusted blacklist dictionary can include a tunable list of names, birth dates, phone numbers, addresses and other forms of PII and PHI. Augmentation can include adjusting the adjusted blacklist dictionary to include known PII and PHI terms for an individual associated with an individual record, according to the record-specific adjusted blacklist dictionary, that should be removed.”
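(Illustration, not part of the application: the effect of tuning can be shown by scrubbing the same note under two whitelist settings and counting the redactions. This is a minimal sketch. The function name, the redaction count as a measure of the "level" of de-identification, and the sample terms are assumptions.)

```python
import re


def redaction_report(text, blacklist, whitelist):
    """Return (scrubbed_text, redaction_count) for one whitelist setting."""
    active = {t: tag for t, tag in blacklist.items() if t not in whitelist}
    count = 0

    def replace(match):
        nonlocal count
        tag = active.get(match.group(0).lower())
        if tag is None:
            return match.group(0)
        count += 1
        return f"[{tag}]"

    scrubbed = re.sub(r"\w+", replace, text)
    return scrubbed, count


# Common first names that double as ordinary English words.
blacklist = {"may": "first name", "grace": "first name", "rose": "first name"}
note = "Patient may resume walking; pain rose overnight. Sister Grace will assist."

strict_text, strict_count = redaction_report(note, blacklist, whitelist=set())
tuned_text, tuned_count = redaction_report(note, blacklist, whitelist={"may", "rose"})

print(strict_count, strict_text)
print(tuned_count, tuned_text)
```

With an empty whitelist the note loses its meaning to over-redaction; whitelisting "may" and "rose" keeps the clinical content while the actual name is still removed. A stricter project could leave the whitelist empty and accept the loss of readability.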

The claims supplied by the inventors are:

“1. A method for de-identifying unstructured data within data sets, the method comprising: initializing, using a computing device comprising a processor and memory, a blacklist dictionary and a whitelist dictionary; modifying, using the processor and a register module, the blacklist dictionary by removing terms included within the whitelist dictionary to create an adjusted blacklist dictionary; augmenting, using the processor, the adjusted blacklist dictionary with a record-specific blacklist for each individual record within the data sets; scrubbing, using the processor and a de-identification engine, personally identifiable information (PII) and protected health information (PHI) from each individual record utilizing the record-specific adjusted blacklist dictionary, the scrubbing comprising: removing all elements within each individual record determined to be PII or PHI according to terms in the record-specific adjusted blacklist dictionary; replacing removed elements with a case-type tag identifying a type of information being removed according to the record-specific adjusted blacklist dictionary; and repeating, using the de-identification engine, the scrubbing personally identifiable information (PII) and protected health information (PHI) comprising removing steps and replacing steps for each individual record within the data sets.

“2. The method of claim 1, further comprising tokenizing and merging, using a merging module, each individual record in the data sets.

“3. The method of claim 1, wherein the blacklist dictionary comprises standard terms and standard number formats to be removed from records within the data sets.

“4. The method of claim 3, wherein the standard number formats comprise social security numbers, telephone numbers, URLs, zip codes, email addresses, IP addresses, dates, patient IDs, record numbers, and insurance IDs.

“5. The method of claim 3, wherein the standard terms comprise cities, counties, first names, last names, prefixes, and medical terms.

“6. The method of claim 1, wherein the whitelist dictionary comprises terms selected to remain in the data sets despite being included in the blacklist dictionary.

“7. The method of claim 1, wherein the record-specific adjusted blacklist dictionary comprises terms created from the PII or PHI present in specific individual records within the data sets.

“8. The method of claim 1, further comprising tuning terms in the whitelist dictionary and the record-specific adjusted blacklist dictionary to adjust a level of de-identification of records within the data sets.

“9. The method of claim 8, wherein the whitelist dictionary and the record-specific adjusted blacklist dictionary include a tunable list of names, birth dates, phone numbers, addresses and other forms of PII and PHI present in data store records or within the data sets.

“10. The method of claim 8, wherein augmenting comprises adjusting the adjusted blacklist dictionary to include known PII and PHI terms, for an individual associated with an individual record, according to the record-specific adjusted blacklist dictionary, that are designated to be removed.

“11. A system for de-identifying unstructured data within data sets, comprising: memory and a processor configured for accessing the data sets from data sources and parsing data within the data sets and identifying elements of unstructured data within the data sets and de-identifying unstructured data using a de-identification engine initializing a blacklist dictionary and a whitelist dictionary, and comprising: a register module configured to modify the blacklist dictionary by removing terms included within the whitelist dictionary to generate an adjusted blacklist dictionary, and augmenting the adjusted blacklist dictionary with a record-specific blacklist for each individual record within the data sets; a de-identification module configured to determine which unstructured elements are to be removed from each individual record of a data set by scrubbing personally identifiable information (PII) and protected health information (PHI) from each individual record utilizing the record-specific adjusted blacklist dictionary, the scrubbing comprising: removing all elements within each individual record determined to be PII or PHI according to terms in the record-specific adjusted blacklist dictionary; and replacing removed elements with a case-type tag identifying a type of information being removed according to the record-specific adjusted blacklist dictionary; wherein the de-identification module repeats the scrubbing personally identifiable information (PII) and protected health information (PHI) for each individual record within the data sets; and one or more user devices configured to receive input from one or more users by an input-output interface and communicate with the processor and the de-identification engine over a telecommunication network, providing functionality for the register module, de-identification module, and merging module that share a secured network connection.

“12. The system of claim 11, further comprising a merging module configured to tokenize and merge each individual record in the data sets by identifying a same token in both data sets and join together matched to individual records in healthcare data sets without exposure of PHI or PII.

“13. The system of claim 11, wherein the blacklist dictionary comprises standard terms and standard number formats to be removed from records within the data sets.

“14. The system of claim 13, wherein the standard number formats comprise social security numbers, telephone numbers, URLs, zip codes, email addresses, IP addresses, dates, patient IDs, record numbers, and insurance IDs.

“15. The system of claim 13, wherein the standard terms comprise cities, counties, first names, last names, prefixes, and medical terms.

“16. The system of claim 11, wherein the whitelist dictionary comprises terms selected to remain in the data sets despite being included in the blacklist dictionary.

“17. The system of claim 11, wherein the record-specific adjusted blacklist dictionary comprises terms created from the PII or PHI present in specific individual records within the data sets.

“18. The system of claim 11, further comprising tuning terms in the whitelist dictionary and the record-specific adjusted blacklist dictionary to adjust a level of de-identification of records within the data sets.

“19. The system of claim 18, wherein the whitelist dictionary and the record-specific adjusted blacklist dictionary include a tunable list of names, birth dates, phone numbers, addresses and other forms of PII and PHI present in data source records or within the data sets.

“20. The system of claim 18, wherein augmenting comprises adjusting the adjusted blacklist dictionary to include known PII and PHI terms for an individual associated with an individual record, according to the record-specific adjusted blacklist dictionary, that are designated to be removed.”

For the URL and more information on this patent application, see: Austin, Joseph; Kassam-Adams, Shahir; LaBonte, Jason A.; Bayless, Paul J. Self-Contained System For De-Identifying Unstructured Data In Healthcare Records. Filed January 23, 2019 and posted August 1, 2019. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220190236310%22.PGNR.&OS=DN/20190236310&RS=DN/20190236310

(Our reports deliver fact-based news of research and discoveries from around the world.)
