Patent Issued for Anonymizing genetic datasets in a disparate computing environment (USPTO 11501880): Massachusetts Mutual Life Insurance Company
2022 DEC 06 (NewsRx) -- By a
The patent’s inventors are Ross, Gareth (
This patent was filed on
From the background information supplied by the inventors, news correspondents obtained the following quote: “Recent scientific improvements have made genome sequencing more accurate and easier to obtain. Scientists are now able to provide accurate data and insights as to genetic predispositions to different health conditions and forecast possible future health risks. Many institutions providing health-related services, however, are legally and technically prevented from using genetic data to determine a user’s eligibility to enroll or determine a price structure for said health-related services. Because the human mind is incapable of decoupling having access to certain data and anonymizing said data, many institutions have attempted to anonymize genetic (and other health-derived) data associated with their customers using computers. As the processing power of computers allows for greater computer functionality and the Internet technology era allows for interconnectivity between computing systems, many institutions utilize computer infrastructures to maintain/store genetic data associated with their customers. However, since the implementation of these more sophisticated computer infrastructures, several shortcomings in these technologies have been identified and have created a new set of challenges.
“Existing and conventional methods, systems, and software solutions fail to provide fast and efficient anonymization due to a high volume of customer information existing on different networks and computing infrastructures. Managing such information on different platforms is difficult due to number, size, content, or relationships of the data associated with the customers. Therefore, there is a desire for a computing technology to address these challenges.”
Supplementing the background information on this patent, NewsRx reporters also obtained the inventors’ summary information for this patent: “For the aforementioned reasons, there is a need for a more efficient and faster system and method for processing large user datasets and generating anonymized datasets, which would allow institutions to anonymize genetically driven datasets in a more efficient manner than possible with conventional computer data-driven analysis. There is a need for a network and computer-specific set of rules to produce efficient and accurate results when facing a high number of datasets.
“Disclosed herein are systems and methods capable of addressing the above-described technical shortcomings. In an embodiment, a method comprises receiving, by a server from a client computing device, a request to generate an anonymized pool dataset, wherein the request comprises a number of users, a selection of at least one genetic condition category, a threshold corresponding to a genetic attribute associated with each genetic condition category, and a percentage of users associated with each genetic condition category; querying, by the server, a second server to receive a set of datasets corresponding to a plurality of users stored onto a database associated with the second server, wherein each dataset comprises data associated with each user comprising at least each respective user’s genetic data; upon querying the second server, receiving, by the server, the set of datasets from the second server; generating, by the server, an anonymized set of datasets, wherein each dataset within the anonymized set of datasets corresponds to each user from the plurality of users, and wherein the anonymized set of datasets does not contain any identifiable information corresponding to any of the users; determining, by the server, which user within the anonymized set of datasets satisfies the threshold corresponding to the genetic attribute associated with each genetic condition category from the at least one genetic condition category; and generating, by the server, the anonymized pool dataset from the anonymized set of datasets, wherein the anonymized pool dataset comprises one or more users from users who satisfy the threshold, wherein a number of one or more users within the anonymized pool dataset corresponds to the number of users received from the client computing device, wherein each user from the one or more users is placed in the selected at least one genetic condition category that corresponds to each user’s genetic data received from the client computing device, and wherein the number of users within each genetic condition category corresponds to the percentage of users associated with each genetic condition category received from the client computing device.
“In another embodiment, a computer system comprises a first server in communication with a client computing device via a graphical user interface displayed on the client computing device, the graphical user interface generated by the first server; and a second server in communication only with the first server, wherein the first server is configured to: receive, from a client computing device, a request to generate an anonymized pool dataset, wherein the request comprises a number of users, a selection of at least one genetic condition category, a threshold corresponding to a genetic attribute associated with each genetic condition category, and a percentage of users associated with each genetic condition category; query a second server to receive a set of datasets corresponding to a plurality of users stored onto a database associated with the second server, wherein each dataset comprises data associated with each user comprising at least each respective user’s genetic data; upon querying the second server, receive the set of datasets from the second server; generate an anonymized set of datasets, wherein each dataset within the anonymized set of datasets corresponds to each user from the plurality of users, and wherein the anonymized set of datasets does not contain any identifiable information corresponding to any of the users; determine which user within the anonymized set of datasets satisfies the threshold corresponding to the genetic attribute associated with each category from the at least one genetic condition category; and generate the anonymized pool dataset from the anonymized set of datasets, wherein the anonymized pool dataset comprises one or more users from users who satisfy the threshold, wherein a number of one or more users within the anonymized pool dataset corresponds to the number of users received from the client computing device, wherein each user from the one or more users is placed in the selected at least one genetic condition category that corresponds to each user’s genetic data received from the client computing device, and wherein the number of users within each genetic condition category corresponds to the percentage of users associated with each genetic condition category received from the client computing device.
“It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of this disclosure and claims.”
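To make the summarized method easier to follow, here is a minimal Python sketch of its main steps: strip identifying fields, keep the users who meet each category's threshold, and fill each category up to its requested share of the pool. It is an illustration only, not the patented implementation. The record layout, the field names in IDENTIFYING_FIELDS, the numeric "scores" per category, and the rounding of quotas are all assumptions made for the example; the patent text quoted above does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical identifying fields; the quoted text does not list them.
IDENTIFYING_FIELDS = {"name", "email", "policy_number"}


@dataclass(frozen=True)
class PoolRequest:
    """Request sent by the client device (field names are assumptions)."""
    number_of_users: int
    thresholds: Dict[str, float]   # category -> minimum attribute score
    percentages: Dict[str, float]  # category -> share of the pool, in percent


def anonymize(record: dict) -> dict:
    """Return a copy of the record without the identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def build_pool(request: PoolRequest, records: List[dict]) -> Dict[str, List[dict]]:
    """Group anonymized records by category, honoring thresholds and shares."""
    anonymized = [anonymize(r) for r in records]
    pool: Dict[str, List[dict]] = {}
    used = set()  # indices already placed, so no user is counted twice
    for category, threshold in request.thresholds.items():
        # Assumption: the quota is the requested share of the pool, rounded.
        quota = round(request.number_of_users * request.percentages[category] / 100)
        eligible = [
            (i, r) for i, r in enumerate(anonymized)
            if i not in used and r["scores"].get(category, 0.0) >= threshold
        ]
        chosen = eligible[:quota]
        used.update(i for i, _ in chosen)
        pool[category] = [r for _, r in chosen]
    return pool
```

In this sketch a user is placed in at most one category, and users are taken in the order received; both are simplifying choices. If too few users meet a threshold, the category simply ends up smaller than its quota.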
The claims supplied by the inventors are:
“1. A method comprising: receiving, by a server from a client computing device, a request to generate an anonymized pool dataset, wherein the request comprises a number of a first set of users from which to generate the anonymized pool dataset, a selection of at least one genetic condition category, a threshold corresponding to a genetic attribute associated with each genetic condition category, and a percentage of the first set of users associated with each genetic condition category; querying, by the server, a second server to receive a set of datasets corresponding to the first set of users stored onto a database associated with the second server, wherein each dataset comprises data associated with each user comprising at least each respective user’s genetic data, wherein the client computing device is prevented from communicating with the second server; upon querying the second server, receiving, by the server, the set of datasets from the second server; generating, by the server, an anonymized set of datasets, wherein each dataset within the anonymized set of datasets corresponds to each user from the first set of users, and wherein the anonymized set of datasets does not contain any identifiable information corresponding to any of the first set of users; executing, by the server, a trained machine-learning model to identify each dataset within the anonymized set of datasets that satisfies the threshold corresponding to the genetic attribute associated with each genetic condition category from the at least one genetic condition category; generating, by the server, the anonymized pool dataset from the anonymized set of datasets, wherein the anonymized pool dataset comprises a second set of users from the first set of users who satisfy the threshold, wherein a number of the second set of users within the anonymized pool dataset corresponds to the number of the first set of users from which to construct the anonymized pool dataset received from the client computing device, wherein each user from the second set of users is placed in the selected at least one genetic condition category that corresponds to each user’s genetic data received from the client computing device, and wherein the number of the first set of users within each genetic condition category corresponds to the percentage of the first set of users associated with each genetic condition category received from the client computing device; and transmitting, by the server, the anonymized pool dataset to a third server configured to determine a mortality value associated with each user within the anonymized pool dataset and associated with the genetic attribute of each genetic condition category.
“2. The method of claim 1 further comprising, upon transmitting the anonymized set of datasets to the third server, receiving, by the server, a set of mortality values corresponding to each user within the anonymized pool dataset.
“3. The method of claim 2, further comprising identifying, by the server, one or more common identifiers within the anonymized pool dataset.
“4. The method of claim 2, wherein the second server and the third server do not communicate without an intermediary processor.
“5. The method of claim 1, wherein the genetic attribute is associated with at least one of BRCA1, BRCA2, and OCA2.
“6. The method of claim 2, wherein the request further comprises an attribute associated with the mortality value.
“7. The method of claim 6, wherein the anonymized pool dataset ranks each user of the anonymized pool dataset based on their respective mortality value.
“8. The method of claim 1, wherein the identifiable information to be removed from the anonymized set of datasets is received from the client computing device.
“9. The method of claim 1, wherein the genetic attribute comprises an attribute of a SNP of each user.
“10. A computer system comprising: a first server communicatively coupled with a client computing device via a graphical user interface displayed on the client computing device, the graphical user interface generated by the first server; and a second server communicatively coupled only with the first server, wherein the first server comprises a non-transitory storage device having machine-executable instructions embodied thereon, wherein the machine-executable instructions, when executed by the first server, cause the first server to: receive, from the client computing device, a request to generate an anonymized pool dataset, wherein the request comprises a number of a first set of users from which to generate the anonymized pool dataset, a selection of at least one genetic condition category, a threshold corresponding to a genetic attribute associated with each category, and a percentage of the first set of users associated with each genetic condition category; query the second server to receive a set of datasets corresponding to the first set of users stored onto a database associated with the second server, wherein each dataset comprises data associated with each user comprising at least each respective user’s genetic data, wherein the client computing device is prevented from communicating with the second server; upon querying the second server, receive the set of datasets from the second server; generate an anonymized set of datasets, wherein each dataset within the anonymized set of datasets corresponds to each user from the first set of users, and wherein the anonymized set of datasets does not contain any identifiable information corresponding to any of the first set of users; execute a trained machine-learning model to identify each dataset within the anonymized set of datasets that satisfies the threshold corresponding to the genetic attribute associated with each genetic condition category from the at least one genetic condition category; generate the anonymized pool dataset from the anonymized set of datasets, wherein the anonymized pool dataset comprises a second set of users from the first set of users who satisfy the threshold, wherein a number of the second set of users within the anonymized pool dataset corresponds to the number of the first set of users received from the client computing device, wherein each user from the second set of users is placed in the selected at least one genetic condition category that corresponds to each user’s genetic data received from the client computing device, and wherein the number of the first set of users within each genetic condition category corresponds to the percentage of the first set of users associated with each genetic condition category received from the client computing device; and transmit the anonymized pool dataset to a third server configured to determine a mortality value associated with each user within the anonymized pool dataset and associated with the genetic attribute of each category.
“11. The system of claim 10, wherein the first server is further configured to, upon transmitting the anonymized set of datasets to the third server, receive a set of mortality values corresponding to each user within the anonymized pool dataset.
“12. The system of claim 10, wherein the first server is further configured to: identify one or more common identifiers within the anonymized pool dataset.
“13. The system of claim 11, wherein the second server and the third server do not communicate without an intermediary processor.
“14. The system of claim 10, wherein the genetic attribute is associated with at least one of BRCA1, BRCA2, and OCA2.
“15. The system of claim 11, wherein the request further comprises an attribute associated with the mortality value.
“16. The system of claim 15, wherein the anonymized pool dataset ranks each user of the anonymized pool dataset based on their respective mortality value.
“17. The system of claim 10, wherein the identifiable information to be removed from the anonymized set of datasets is received from the client computing device.
“18. The system of claim 10, wherein the genetic attribute comprises an attribute of a SNP of each user.”
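The claims add two steps to the summarized method: a trained machine-learning model identifies the datasets that satisfy the threshold, and a third server returns mortality values by which the pool can be ranked (claims 2, 7, 11 and 16). The Python sketch below illustrates those two steps under stated assumptions. The model is represented by any callable that returns a score, the "pseudo_id" key is a hypothetical anonymous identifier, and ranking from highest to lowest mortality value is an arbitrary choice; none of these details come from the claims.

```python
from typing import Callable, Dict, List

# Stand-in for the trained model: any callable mapping a dataset to a score.
Model = Callable[[dict], float]


def select_with_model(model: Model, datasets: List[dict], threshold: float) -> List[dict]:
    """Keep the anonymized datasets whose model score meets the threshold."""
    return [d for d in datasets if model(d) >= threshold]


def rank_by_mortality(pool: List[dict], mortality_values: Dict[str, float]) -> List[dict]:
    """Order the pool by the values returned from the third server.

    Assumption: each dataset carries a hypothetical 'pseudo_id' key, and
    higher mortality values are listed first.
    """
    return sorted(pool, key=lambda d: mortality_values[d["pseudo_id"]], reverse=True)
```

A caller would first run select_with_model on the anonymized datasets, send the resulting pool to the mortality service, and then pass the returned values to rank_by_mortality.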
For the URL and additional information on this patent, see: Ross, Gareth. Anonymizing genetic datasets in a disparate computing environment.
(Our reports deliver fact-based news of research and discoveries from around the world.)