Patent Issued for De-identification of electronic records (USPTO 11461496): University of California

2022 OCT 20 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- University of California (Oakland, California, United States) has been issued patent number 11461496, according to news reporting originating out of Alexandria, Virginia, by NewsRx editors.

The patent’s inventors are Atul Butte (San Francisco, CA, US), Beau Norgeot (Palo Alto, CA, US), Eugenia Rutenberg (San Francisco, CA, US), and Gundolf Schenk (San Francisco, CA, US).

This patent was filed on June 9, 2020 and was published online on October 4, 2022.

From the background information supplied by the inventors, news correspondents obtained the following quote: “Electronic records may include data useful to researchers and publishers, but may also include personal information and/or other sensitive information. According to some government and/or industry regulations, such personal or sensitive information must be removed or obfuscated from the electronic records prior to use in research, publication, and/or dissemination. For example, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) requires the removal and/or obfuscation of protected health information (PHI) found in electronic records including, for example, names, addresses, any elements of dates related to an individual, telephone numbers, fax numbers, email addresses, Social Security Numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate or license numbers, vehicle or other device serial numbers, Web Uniform Resource Locators (URLs), Internet Protocol (IP) addresses, finger or voice prints, photographic images, and/or the like.”

Supplementing the background information on this patent, NewsRx reporters also obtained the inventors’ summary information for this patent: “Systems, methods, and articles of manufacture, including computer program products, are provided for de-identifying electronic records. In some example embodiments, there is provided a system that includes at least one processor and at least one memory. The at least one memory may include program code that provides operations when executed by the at least one processor. The operations may include: tokenizing an electronic record to produce a plurality of tokens including a first token; determining, whether a protected health information is included in the electronic record by at least determining whether the first token is part of one of a first plurality of expressions, each of the first plurality of expressions known to include the protected health information, and in response to determining that the first token is not part of any one of the first plurality of expressions, determining, based on a blacklist of tokens known to comprise the protected health information, whether the first token comprises the protected health information; and in response to determining that the first token comprises the protected health information, generating a de-identified electronic record by at least replacing the first token with a second token obfuscating the protected health information.

“In some variations, one or more features disclosed herein including the following features can optionally be included in any feasible combination. In response to an incorrect identification of the protected health information, the first plurality of expressions may be updated by at least adding, to the first plurality of expressions, an expression including the first token and a third token adjacent to the first token in the electronic record.

“In some variations, in response to an incorrect identification of the protected health information, the blacklist of tokens may be updated by at least adding the first token to the black list of tokens or removing the first token from the blacklist of tokens.

“In some variations, in response to an incorrect identification of the protected health information, the blacklist may be applied before applying the first plurality of expressions.

“In some variations, determining whether the first token includes the protected health information may further include assigning a part-of-speech to the first token. In response to an incorrect identification of the protected health information, the part-of-speech assigned to the first token may be modified by at least modifying a first part-of-speech tagging algorithm applied to assign the part-of-speech to the first token and/or changing the first part-of-speech tagging algorithm to a second part-of-speech tagging algorithm.”
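The summarized operations can be illustrated with a short Python sketch. It is not the inventors' implementation. The whitespace tokenizer, the `Deidentifier` class and its method names, the `[PHI]` replacement token, and the convention that an expression's first capture group marks the PHI portion are all assumptions made for illustration. Tokens are first checked against the "unsafe" expressions. Only tokens that no expression covers are checked against the blacklist. The two feedback methods mirror the update steps in the summary. They read an "incorrect identification" as either a missed PHI token or a token that was wrongly flagged.

```python
import re
from dataclasses import dataclass, field

MASK = "[PHI]"  # hypothetical replacement ("second") token


def _norm(token: str) -> str:
    # Compare tokens case-insensitively, ignoring trailing punctuation.
    return token.strip(".,;:").lower()


@dataclass
class Deidentifier:
    # Regexes assumed to match text containing PHI ("unsafe" expressions).
    # Assumption: if a pattern has a capture group, only group 1 is PHI.
    unsafe_patterns: list[str] = field(default_factory=list)
    # Normalized tokens assumed to be PHI on their own.
    blacklist: set[str] = field(default_factory=set)

    def tokenize(self, text: str) -> list[tuple[str, int, int]]:
        # Naive tokenizer: runs of non-whitespace with their character offsets.
        return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

    def _unsafe_spans(self, text: str) -> list[tuple[int, int]]:
        spans = []
        for pattern in self.unsafe_patterns:
            for m in re.finditer(pattern, text):
                spans.append(m.span(1) if m.re.groups else m.span())
        return spans

    def _is_phi(self, token: str, start: int, end: int,
                spans: list[tuple[int, int]]) -> bool:
        # Step 1: is the token part of any unsafe-expression match?
        if any(s < end and start < e for s, e in spans):
            return True
        # Step 2: only if no expression matched, fall back to the blacklist.
        return _norm(token) in self.blacklist

    def deidentify(self, text: str) -> str:
        spans = self._unsafe_spans(text)
        # Rejoined with single spaces; original whitespace is not preserved.
        return " ".join(MASK if self._is_phi(tok, s, e, spans) else tok
                        for tok, s, e in self.tokenize(text))

    def add_context_expression(self, token: str, preceding: str) -> None:
        # Feedback for a missed PHI token: add an expression made of the token
        # and the adjacent token before it, so this context is caught next time.
        self.unsafe_patterns.append(
            re.escape(preceding) + r"\s+(" + re.escape(token) + r")")

    def update_blacklist(self, token: str, add: bool) -> None:
        # Feedback: add a missed PHI token, or remove one that caused a false positive.
        if add:
            self.blacklist.add(_norm(token))
        else:
            self.blacklist.discard(_norm(token))
```

Suppose the sketch is given a phone-number expression `r"\d{3}-\d{3}-\d{4}"` and a blacklist containing `"smith"`. The sentence "Seen by Smith, call 415-555-0100 today" then becomes "Seen by [PHI] call [PHI] today". A "Jones" that follows "Dr." is missed at first. After `add_context_expression("Jones", "Dr.")`, it is masked while "Dr." is kept.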

The claims supplied by the inventors are:

“1. A system, comprising: at least one data processor; and at least one memory storing instructions, which when executed by the at least one data processor, result in operations comprising: tokenizing an electronic record to produce a plurality of tokens including a first token; determining whether a protected health information is included in the electronic record by at least: determining whether the first token is part of one of a first plurality of expressions, each of the first plurality of expressions known to include the protected health information, and in response to determining that the first token is not part of any one of the first plurality of expressions, determining, based on a blacklist of tokens known to comprise the protected health information, whether the first token comprises the protected health information; in response to determining that the first token comprises the protected health information, generating a de-identified electronic record by at least replacing the first token with a second token obfuscating the protected health information; and responding to an incorrect identification of the protected health information by at least updating the first plurality of expressions, the first plurality of expressions being updated by at least adding, to the first plurality of expressions, an expression including the first token and a third token adjacent to the first token in the electronic record.

“2. The system of claim 1, further comprising: responding to the incorrect identification of the protected health information by at least updating the blacklist of tokens, the blacklist of tokens being updated by at least adding the first token to the black list of tokens or removing the first token from the blacklist of tokens.

“3. The system of claim 1, wherein determining whether the first token comprises the protected health information further comprises assigning a part-of-speech to the first token.

“4. The system of claim 3, further comprising: responding to the incorrect identification of the protected health information by at least modifying the part-of-speech assigned to the first token, the part-of-speech assigned to the first token being modified by at least modifying a first part-of-speech tagging algorithm applied to assign the part-of-speech to the first token and/or changing the first part-of-speech tagging algorithm to a second part-of-speech tagging algorithm.

“5. The system of claim 3, further comprising: in response to the first token being assigned a first part-of-speech, determining, based on the blacklist of tokens known to comprise the protected health information, whether the first token comprises the protected health information.

“6. The system of claim 5, further comprising: responding to the incorrect identification of the protected health information by at least applying the blacklist of tokens in response to the first token being assigned a second part-of-speech instead of the first part-of-speech.

“7. The system of claim 1, further comprising: in response to determining that the first token comprises neither the protected health information nor a non-protected health information, generating the de-identified electronic record by at least replacing the first token with the second token obfuscating the protected health information.

“8. The system of claim 1, further comprising: determining whether the first token comprises a non-protected health information by at least determining whether the first token is part of one of a second plurality of expressions, each of the second plurality of expressions known to exclude the protected health information.

“9. The system of claim 1, further comprising: determining whether the first token comprises the protected health information based at least on a notes map including one or more note-specific unsafe regular expressions, one or more note-specific blacklists, and/or one or more note-specific parts of speech.

“10. A computer-implemented method, comprising: tokenizing an electronic record to produce a plurality of tokens including a first token; determining whether a protected health information is included in the electronic record by at least determining whether the first token is part of one of a first plurality of expressions, each of the first plurality of expressions known to include the protected health information, and in response to determining that the first token is not part of any one of the first plurality of expressions, determining, based on a blacklist of tokens known to comprise the protected health information, whether the first token comprises the protected health information; in response to determining that the first token comprises the protected health information, generating a de-identified electronic record by at least replacing the first token with a second token obfuscating the protected health information; and responding to an incorrect identification of the protected health information by at least updating the first plurality of expressions, the first plurality of expressions being updated by at least adding, to the first plurality of expressions, an expression including the first token and a third token adjacent to the first token in the electronic record.

“11. The method of claim 10, further comprising: responding to the incorrect identification of the protected health information by at least updating the blacklist of tokens, the blacklist of tokens being updated by at least adding the first token to the black list of tokens or removing the first token from the blacklist of tokens.

“12. The method of claim 10, wherein determining whether the first token comprises the protected health information further comprises assigning a part-of-speech to the first token.

“13. The method of claim 12, further comprising: responding to the incorrect identification of the protected health information by at least modifying the part-of-speech assigned to the first token, the part-of-speech assigned to the first token being modified by at least modifying a first part-of-speech tagging algorithm applied to assign the part-of-speech to the first token and/or changing the first part-of-speech tagging algorithm to a second part-of-speech tagging algorithm.

“14. The method of claim 12, further comprising: in response to the first token being assigned a first part-of-speech, determining, based on the blacklist of tokens known to comprise the protected health information, whether the first token comprises the protected health information.

“15. The method of claim 14, further comprising: responding to the incorrect identification of the protected health information by at least applying the blacklist of tokens in response to the first token being assigned a second part-of-speech instead of the first part-of-speech.

“16. The method of claim 10, further comprising: in response to determining that the first token comprises neither the protected health information nor a non-protected health information, generating the de-identified electronic record by at least replacing the first token with the second token obfuscating the protected health information.

“17. The method of claim 10, further comprising: determining whether the first token comprises a non-protected health information by at least determining whether the first token is part of one of a second plurality of expressions, each of the second plurality of expressions known to exclude the protected health information.

“18. The method of claim 10, further comprising determining whether the first token comprises the protected health information based at least on a notes map including one or more note-specific unsafe regular expressions, one or more note-specific blacklists, and/or one or more note-specific parts of speech.”
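Claims 5 through 9 add four elements:

- part-of-speech gating of the blacklist
- "safe" expressions known to exclude PHI
- obfuscation of tokens that are neither PHI nor known non-PHI
- a notes map of note-specific settings

The Python sketch below combines them under several assumptions. `toy_pos_tag` is a crude stand-in for a real part-of-speech tagger. Expressions are matched against single tokens to keep the example short. The order of the checks is a guess. `NoteConfig`, `classify`, the note-type keys, and the `[PHI]` token are hypothetical names, not taken from the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

MASK = "[PHI]"  # hypothetical obfuscation token


def toy_pos_tag(token: str) -> str:
    # Stand-in for a real part-of-speech tagger (an assumption, not the patent's).
    if token[:1].isupper():
        return "NNP"   # treat capitalized words as proper nouns
    if token.isdigit():
        return "CD"    # cardinal number
    return "OTHER"


@dataclass
class NoteConfig:
    # One entry of a hypothetical notes map: settings for one note type.
    unsafe_patterns: list[str] = field(default_factory=list)  # known to include PHI
    safe_patterns: list[str] = field(default_factory=list)    # known to exclude PHI
    blacklist: set[str] = field(default_factory=set)
    # The blacklist is applied only to tokens tagged with one of these POS labels.
    blacklist_pos: set[str] = field(default_factory=lambda: {"NNP"})


def classify(token: str, cfg: NoteConfig,
             tagger: Callable[[str], str] = toy_pos_tag) -> str:
    word = token.strip(".,;:")
    if any(re.fullmatch(p, word) for p in cfg.unsafe_patterns):
        return "phi"
    if tagger(word) in cfg.blacklist_pos and word.lower() in cfg.blacklist:
        return "phi"
    if any(re.fullmatch(p, word) for p in cfg.safe_patterns):
        return "safe"
    return "unknown"  # neither PHI nor known non-PHI


def deidentify(text: str, note_type: str, notes_map: dict[str, NoteConfig]) -> str:
    # Unknown note types fall back to an empty config, so every token is masked.
    cfg = notes_map.get(note_type, NoteConfig())
    # Only tokens classified as safe survive; PHI and unknown tokens are masked.
    return " ".join(tok if classify(tok, cfg) == "safe" else MASK
                    for tok in text.split())
```

Consider a `"discharge"` entry with three settings: an unsafe expression `r"\d{3}-\d{2}-\d{4}"`, safe expressions `r"[a-z]+"` and `r"\d{1,3}"`, and a blacklist containing `"smith"`. The note "Smith seen for fever SSN 123-45-6789 on day 3" then becomes "[PHI] seen for fever [PHI] [PHI] on day 3". "SSN" is masked not because it is PHI but because nothing marks it as safe, which is the conservative behavior claim 7 describes. In this sketch, responding to an error as in claim 6 would mean changing `blacklist_pos` for that note type.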

For the URL and additional information on this patent, see: Butte, Atul. De-identification of electronic records. U.S. Patent Number 11461496, filed June 9, 2020, and published online on October 4, 2022. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=11461496.PN.&OS=PN/11461496RS=PN/11461496

(Our reports deliver fact-based news of research and discoveries from around the world.)
