Patent Issued for Systems and methods for de-identifying medical and healthcare data (USPTO 11270027): Medicom Technologies Inc.

2022 MAR 28 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- From Alexandria, Virginia, NewsRx journalists report that a patent by the inventors Benitz, Malcolm (Raleigh, NC, US), Goldstein, Brent (Raleigh, NC, US), Rosenberg, Michael (Raleigh, NC, US), Suttles, Jason (Raleigh, NC, US), Woodlief, Chris (Raleigh, NC, US), filed on September 23, 2020, was published online on March 8, 2022.

The patent’s assignee for patent number 11270027 is Medicom Technologies Inc. (Raleigh, North Carolina, United States).

News editors obtained the following quote from the background information supplied by the inventors:

“Field of the Invention

“The present invention relates, in general, to systems and methods for protecting patient privacy when health care and medical information is shared between various entities and, in particular, to systems and methods that implement a multi-stage sanitizing routine for de-identifying protected health information (PHI) from medical and healthcare data, such as, for example, medical reports and diagnostic images, in order to ensure patient privacy, while preserving the ability for sanitized medical reports and diagnostic images to be re-identified.

“Description of Related Art

“The ease with which electronic data can be transmitted, together with the increasing use of health care, medical, and patient information (collectively, “medical information”) for research purposes, has raised concerns about patient confidentiality and institutional liability, as well as concerns surrounding the protection of patient privacy when such medical information is transmitted between various entities, such as, for example, a medical provider and a research institution.

“To maintain patient privacy in the context of research and various third-party uses, it must be ensured that any medical information used in aggregate is not associated with any specific patient or individual, and that only authorized entities based on a patient’s informed consent have access to such medical data.

“Such patient privacy can be maintained by disclosing only specific portions of the medical information through de-identification processes, where portions of the medical information may be classified as personally identifiable information (PII). PII can be any data that could potentially identify a specific patient or individual. Sensitive PII is information which, when disclosed, could result in harm to an individual whose privacy has been breached. Sensitive PII can include biometric information, medical information, personally identifiable financial information, and unique identifiers such as passport or Social Security numbers. PHI and PII are typically removed, deleted, masked, or replaced with non-identifiable information through such conventional de-identification processes.

“In the United States, standards such as the Health Insurance Portability and Accountability Act (HIPAA) have resulted in federal regulations that place strict requirements on the archiving and disclosure of medical information. For example, in accordance with HIPAA, federal regulations have been enacted that require healthcare organizations, physicians, and entities having access to such medical information to ensure the protection, privacy and security of the patient information, which can include PHI and PII. In particular, the “Privacy Rule” of HIPAA provides federal privacy regulations that set forth requirements for confidentiality and privacy policies and procedures, consents, authorizations and notices, which must be adopted in order to maintain, use, or disclose PHI and PII in the course of a patient’s treatment, as well as other business functions or other activities.

“The HIPAA Privacy Rule allows for entities to de-identify PHI for certain purposes so that medical information may be used and disclosed freely, without being subject to the protections afforded by the Privacy Rule. The term “de-identified data” as used by HIPAA refers to medical information from which all information, data and tags that could reasonably be used to identify the patient have been removed (such as, for example, their name, address, Social Security number, date of birth, contact information, and the like).

“Conventional methods for de-identifying medical data include simply stripping all information considered to be PHI or PII from a medical record that can be used to determine the identity of a patient, or replacing such information with something else (such as, for example, replacing the actual patient name with the string “name”). Although the medical records are de-identified with such conventional methods, there remains no mechanism by which PHI or PII can be recovered for re-identification purposes, if required.

“In addition, various general methods for de-identifying documents and metadata fields include built-in code to remove portions marked for de-identification, or utilize template-based approaches to redact information from documents. Methods of de-identification have been used for text documents and structured metadata fields, such as Digital Imaging and Communications in Medicine (DICOM) metadata, but de-identification of visual media data, when the medical information is burned into or embedded inside the media, can be difficult and time-consuming.

“Therefore, there is a need for a reliable system and method to ensure complete sanitization of both diagnostic images and associated medical reports containing text and burned-in medical information, whereby the sanitized PHI and PII can be recovered for re-identification purposes.”
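
The conventional placeholder substitution described in the background can be sketched in a few lines of Python. This is an illustration only, not code from the patent: the "Patient Name: " label, the two regular expressions, and the function name strip_identifiers are hypothetical. The sketch shows why such an approach leaves no route to re-identification: two different records collapse to the same text.

```python
import re

# Hypothetical patterns for two kinds of identifier; a real record would need more.
PATTERNS = {
    "name": re.compile(r"(?<=Patient Name: )[^\n]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def strip_identifiers(report: str) -> str:
    """Replace each matched identifier with a fixed placeholder string."""
    for placeholder, pattern in PATTERNS.items():
        report = pattern.sub(placeholder, report)
    return report


first = strip_identifiers("Patient Name: Jane Doe\nSSN: 123-45-6789\nFindings: normal")
second = strip_identifiers("Patient Name: John Roe\nSSN: 987-65-4321\nFindings: normal")

# Both records now read identically, so the original values cannot be recovered
# from the sanitized text alone.
records_are_indistinguishable = first == second
```

Because nothing links the placeholder back to the original value, recovery would require a separately stored mapping, which is the gap the inventors say they address.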

As a supplement to the background information on this patent, NewsRx correspondents also obtained the inventors’ summary information for this patent:

“In one embodiment, the invention relates to a method for de-identifying medical data, comprising: receiving, at a server, a selection of at least one medical record to be de-identified of patient information, wherein the medical record includes a diagnostic image and a medical report; determining, by the server, a modality associated with the diagnostic image; retrieving, by the server, a de-identification profile for the modality, wherein the de-identification profile specifies at least one area of the diagnostic image that contains patient information; applying, by a sanitizing engine coupled to the server, a blackout zone over the area of the diagnostic image specified in the de-identification profile, wherein the blackout zone prevents the patient information in the area from being visible; performing, by the sanitizing engine, an optical character recognition operation in the area after the blackout zone has been applied; determining, by the sanitizing engine, if any characters are detected in the area after the blackout zone has been applied; detecting, by the sanitizing engine, a boundary for a region of interest of the diagnostic image if no characters are detected in the area after the blackout zone has been applied; detecting, by the sanitizing engine, if non-black pixels are present outside of the boundary for the region of interest; and performing a first operation by the sanitizing engine to convert any non-black pixels detected outside of the boundary for the region of interest to black pixels, or performing a second operation by the sanitizing engine to encapsulate the diagnostic image into a DICOM format if non-black pixels are not detected outside of the boundary for the region of interest.

“In another embodiment, the invention relates to a method for de-identifying medical data, comprising: receiving, at a server, a selection of at least one medical record to be de-identified of patient information, wherein the medical record includes a diagnostic image and a medical report; applying, by a sanitizing engine coupled to the server, a sanitizing process in an area of the diagnostic image determined by a pre-stored de-identification profile, wherein the de-identification profile specifies the area of the diagnostic image containing patient information, and wherein the sanitizing process prevents the patient information in the area from being visible; detecting, by the sanitizing engine, if any characters are present in the area after the sanitizing process has been applied; detecting, by the sanitizing engine, a gradient boundary for a region of interest of the diagnostic image if no characters are detected in the area after the sanitizing process has been applied; and converting, by the sanitizing engine, any non-black pixels detected outside of the gradient boundary to black pixels.

“In another embodiment, the invention relates to a system for de-identifying medical data, comprising: a database configured to store at least one medical record, wherein the medical record includes a diagnostic image and a medical report; a sanitizing engine communicatively coupled to the database, the sanitizing engine configured to import the medical record from the database, the sanitizing engine further configured to apply a blackout zone in an area of the diagnostic report that contains patient information, the sanitizing engine further configured to detect if characters exist in the area that the blackout zone was applied to, the sanitizing engine further configured to detect a gradient boundary for a region of interest on the diagnostic image, and the sanitizing engine further configured to convert non-black pixels that exist outside of the gradient boundary to black pixels; the sanitizing engine further configured to encapsulate the diagnostic image into a DICOM file; and a server communicatively coupled to the database and the sanitizing engine, the server configured to transmit the DICOM file to a remote computing system.”
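
Two of the image steps named in the summary, applying a blackout zone and converting stray non-black pixels outside the region of interest to black, can be illustrated with a short Python sketch. It rests on several assumptions that are not in the patent text: the image is a grayscale grid held as nested lists with 0 meaning black, both the blackout zone and the region of interest are rectangles given as (top, left, bottom, right), and the function names are hypothetical. The optical character recognition check and the gradient-based boundary detection are left out; the region of interest is simply supplied by the caller.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows; 0 is black
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def apply_blackout(image: Image, zone: Box) -> None:
    """Overwrite every pixel inside the zone with black, in place."""
    top, left, bottom, right = zone
    for r in range(top, bottom):
        for c in range(left, right):
            image[r][c] = 0


def clear_outside(image: Image, roi: Box) -> int:
    """Set non-black pixels outside the region of interest to black.

    Returns the number of pixels changed; 0 means the image was already clean.
    """
    top, left, bottom, right = roi
    changed = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = top <= r < bottom and left <= c < right
            if not inside and value != 0:
                row[c] = 0
                changed += 1
    return changed


# A 4x6 toy image: burned-in text at the top left, the diagnostic
# content in the middle, and one stray bright pixel at the right edge.
image = [
    [9, 9, 0, 0, 0, 0],
    [0, 0, 5, 5, 0, 7],
    [0, 0, 5, 5, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
apply_blackout(image, (0, 0, 1, 2))           # zone taken from a hypothetical profile
stray_pixels = clear_outside(image, (1, 2, 3, 4))
```

In the summary's terms, a nonzero count corresponds to the first operation (conversion to black), while a count of zero would let the image proceed directly to DICOM encapsulation.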

The claims supplied by the inventors are:

“1. A method for de-identifying medical data, comprising: retrieving, by a server, a medical image to be de-identified of patient information; determining, by the server, a modality associated with the medical image; retrieving, by the server, a de-identification profile for the modality, wherein the de-identification profile specifies at least one area of the medical image that contains patient information; applying, by a sanitizing engine coupled to the server, a blackout zone over the area of the medical image specified in the de-identification profile, wherein the patient information within the blackout zone is deleted by the sanitizing engine; performing, by the sanitizing engine, an optical character recognition operation in the area after the blackout zone has been applied; determining, by the sanitizing engine, if any characters are detected in the area after the blackout zone has been applied; detecting, by the sanitizing engine, a boundary for a diagnostic portion of the medical image if no characters are detected in the area after the blackout zone has been applied; detecting, by the sanitizing engine, if non-black pixels are present outside of the boundary for the diagnostic portion; and performing a first operation by the sanitizing engine to convert any non-black pixels detected outside of the boundary for the diagnostic portion to black pixels, or performing a second operation by the sanitizing engine to encapsulate the medical image into a DICOM format if non-black pixels are not detected outside of the boundary for the diagnostic portion.

“2. The method of claim 1, further comprising extracting, by the sanitizing engine, text from a medical report associated with the medical image.

“3. The method of claim 2, further comprising, determining, by the sanitizing engine, if the text contains any identifying information, and replacing the identifying information with randomly generated characters or a pre-determined character string.

“4. The method of claim 1, wherein the server utilizes machine learning to analyze sanitized medical images over time.

“5. The method of claim 1, wherein the blackout zone has a free-form shape.

“6. The method of claim 1, wherein the boundary for the diagnostic portion is detected by analysis of a directional change in the intensity or color of the region of interest.

“7. The method of claim 1, wherein the boundary for the diagnostic portion is a gradient boundary.

“8. The method of claim 1, further comprising, storing, by the server, header data from the medical image, where the header data is used to re-identify the medical image with patient information.

“9. A method for de-identifying medical data, comprising: retrieving, by a server, a medical image to be de-identified of patient information; applying, by a sanitizing engine coupled to the server, a sanitizing process in an area of the medical image determined by a previously generated de-identification profile, wherein the de-identification profile specifies the area of the medical image containing patient information, and wherein the sanitizing process deletes the patient information in the area; detecting, by the sanitizing engine, if any characters are present in the area after the sanitizing process has been applied; detecting, by the sanitizing engine, a gradient boundary for a diagnostic portion of the medical image if no characters are detected in the area after the sanitizing process has been applied; and converting, by the sanitizing engine, pixels of any characters detected outside of the gradient boundary for the diagnostic portion to an inverse color.

“10. The method of claim 9, wherein the sanitizing engine performs an optical character recognition operation to detect if any characters are present in the area after the sanitizing process has been applied.

“11. The method of claim 9, wherein the server utilizes machine learning to analyze sanitized medical images over time.

“12. The method of claim 9, wherein the sanitizing process is selected from a group consisting of deleting, obfuscating, and cropping the patient information within the area.

“13. The method of claim 9, further comprising extracting, by the sanitizing engine, text from a medical report associated with the medical image.

“14. The method of claim 13, further comprising, replacing, by the sanitizing engine, any text that contains identifying information with randomly generated characters or a pre-determined character string.

“15. A system for de-identifying medical data, comprising: a database configured to store at least one medical record, wherein the medical record includes a medical image and a medical report; a sanitizing engine communicatively coupled to the database, the sanitizing engine configured to import the medical record from the database, the sanitizing engine further configured to apply a blackout zone in an area of the medical image that contains patient information, the sanitizing engine further configured to delete the patient information within the blackout zone, the sanitizing engine further configured to detect if characters exist in the area that the blackout zone was applied, the sanitizing engine further configured to detect a gradient boundary for a diagnostic portion on the medical image, and the sanitizing engine further configured to convert non-white pixels that exist outside of the gradient boundary for the diagnostic portion to white pixels, the sanitizing engine further configured to encapsulate the medical image into a DICOM file; and a server communicatively coupled to the database and the sanitizing engine, the server configured to transmit the DICOM file to a remote computing system.

“16. The system of claim 15, wherein the sanitizing engine utilizes an optical character recognition operation to detect if characters exist in the area that the blackout zone was applied to.

“17. The system of claim 15, wherein the database is configured to store DICOM header data, wherein the DICOM header data is associated with the medical image using a unique identifier generated by a hexadecimal salt hashing mechanism, wherein the DICOM header data is utilized to re-identify the medical image with patient information.

“18. The system of claim 15, wherein the sanitizing engine is further configured to replace any identifying information contained in the medical report with randomly generated characters or a pre-determined character string.

“19. The system of claim 18, wherein the randomly generated characters have no relation to the patient information being replaced.

“20. The system of claim 15, wherein the server utilizes machine learning to analyze sanitized medical images over time.”
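
The re-identification idea in claims 8 and 17, and the random replacement in claims 3, 14, 18 and 19, can be sketched as follows. The claims do not name a hash algorithm, a salt length, or a storage layout, so everything specific here is an assumption made for illustration: SHA-256, a 16-byte random salt rendered as hexadecimal, an in-memory dictionary standing in for the database, and all function names.

```python
import hashlib
import secrets
import string
from typing import Dict

# Stand-in for the database of claim 17: identifier -> stored header data.
header_store: Dict[str, Dict[str, str]] = {}


def make_identifier(header: Dict[str, str], salt_hex: str) -> str:
    """Hash a hexadecimal salt together with the header fields (SHA-256 assumed)."""
    material = salt_hex + "|".join(f"{key}={header[key]}" for key in sorted(header))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def de_identify(header: Dict[str, str]) -> str:
    """Store the header under a salted-hash identifier and return that identifier."""
    salt_hex = secrets.token_hex(16)  # fresh random salt, 32 hex characters
    uid = make_identifier(header, salt_hex)
    header_store[uid] = dict(header)
    return uid


def re_identify(uid: str) -> Dict[str, str]:
    """Look up the original header data for a sanitized image."""
    return header_store[uid]


def random_replacement(length: int) -> str:
    """Random characters that bear no relation to the value being replaced."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


header = {"PatientName": "DOE^JANE", "PatientID": "12345"}
uid = de_identify(header)          # travels with the sanitized image
restored = re_identify(uid)        # available only to whoever holds the store
masked_name = random_replacement(len(header["PatientName"]))
```

Under these assumptions the sanitized image carries only the identifier, and the salt keeps that identifier from being reproduced by hashing the patient fields alone.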

For additional information on this patent, see: Benitz, Malcolm. Systems and methods for de-identifying medical and healthcare data. U.S. Patent Number 11270027, filed September 23, 2020, and published online on March 8, 2022. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=11270027.PN.&OS=PN/11270027RS=PN/11270027

