Patent Issued for Training machine learning algorithms with temporally variant personal data, and applications thereof (USPTO 11568302): Veda Data Solutions LLC
2023 FEB 20 (NewsRx) -- By a
The assignee for this patent, patent number 11568302, is
Reporters obtained the following quote from the background information supplied by the inventors:
“Field
“This field is generally related to processing information.
“Background
“As technology advances, an ever increasing amount of personal data is becoming digitized, and as a result, more and more personal data is becoming lawfully accessible. The increased accessibility of personal data has spawned new industries focused on lawfully mining personal data.
“A personal data record may include a number of properties. A data record representing an individual may include properties such as the name of the individual, his or her city, state, and ZIP code. In addition to demographic information, data records can include information about a person’s behavior. Data records from different sources may comprise different properties. Systems exist for collecting information describing characteristics or behavior of separate individuals. Collecting such personal information has many applications, including in national security, law enforcement, marketing, healthcare and insurance.
“In healthcare for example, a healthcare provider may have inconsistent personal information, such as address information, from a variety of data sources, including the national provider identifier registration,
“As records receive more updates from different sources, they also have a greater risk of inconsistency and errors associated with data entry. In these ways, data records all describing the same individual can be incongruous, inconsistent, and erroneous in their content. From these various sources, a single healthcare provider can have many addresses, perhaps as many as 200 addresses. The sources may disagree about what the right address is. Some healthcare providers have multiple correct addresses. For this reason, the fact that a provider may have a more recent address does not mean that older addresses are incorrect.
“Some health and dental insurance companies have staff tasked with manually calling healthcare providers in an effort to determine their correct address. However, this manual updating is expensive because a healthcare provider’s address information may change frequently. In addition to address information, similar issues are present with other demographic information relating to a healthcare provider, such as its phone number.
“In addition, fraudulent claims are enormous problems in healthcare. By some estimates, fraudulent claims may steal in excess of
“Data-directed algorithms, known as machine learning algorithms, are available to make predictions and conduct certain data analysis. Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that can be used for prediction and estimation.
“To develop such models, they first must be trained. Generally, the training involves inputting a set of parameters, called features, and known correct or incorrect values for the input features. After the model is trained, it may be applied to new features for which the appropriate solution is unknown. By applying the model in this way, the model predicts, or estimates, the solution for other cases that are unknown. These models may uncover hidden insights through learning from historical relationships and trends in the database. The quality of these machine learning models may depend on the quality and quantity of the underlying training data.
“Systems and methods are needed to improve identification and forecasting of the correct personal information, such as a healthcare provider’s demographic information and propensity for fraud, or a data source.”
In addition to obtaining background information on this patent, NewsRx editors also obtained the inventors’ summary information for this patent: “In an embodiment, a computer-implemented method trains a machine learning algorithm with temporally variant personal data. At a plurality of times, a data source is monitored to determine whether data relating to a person has updated. When data for the person has been updated, the updated data is stored in a database such that the database includes a running log specifying how the person’s data has changed over time. The person’s data includes values for a plurality of properties relating to the person. An indication is received that a value for the particular property in the person’s data was verified as accurate or inaccurate at a particular time. From the database based on the particular time, the person’s data is retrieved, including values for the plurality of properties that were up-to-date at the particular time. Using the retrieved data and the indication, a model is trained such that the model can predict whether another person’s value for the particular property is accurate. In this way, having the retrieved data be current to the particular time maintains the retrieved data’s significance in training the model.
“System and computer program product embodiments are also disclosed.
“Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to accompanying drawings.
“The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.”
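The running log and point-in-time retrieval described in the summary can be illustrated with a short sketch. This is not the patent's implementation: the `ProviderLog` class, its method names, and the sample provider data are all hypothetical, and the sketch assumes updates are recorded in chronological order.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class ProviderLog:
    """Append-only log of property values per provider (hypothetical structure)."""

    def __init__(self):
        # (provider_id, property) -> parallel lists of dates and values, oldest first
        self._dates = defaultdict(list)
        self._values = defaultdict(list)

    def record(self, provider_id, prop, value, seen_on):
        """Log a value only when it differs from the latest logged value."""
        key = (provider_id, prop)
        if self._values[key] and self._values[key][-1] == value:
            return False  # unchanged, nothing to log
        self._dates[key].append(seen_on)
        self._values[key].append(value)
        return True

    def as_of(self, provider_id, when):
        """Return each property's value that was current on the date `when`."""
        snapshot = {}
        for (pid, prop), dates in self._dates.items():
            if pid != provider_id:
                continue
            i = bisect_right(dates, when)  # number of entries dated on or before `when`
            if i:
                snapshot[prop] = self._values[(pid, prop)][i - 1]
        return snapshot


# Illustrative, made-up data for one provider.
log = ProviderLog()
log.record("prov-1", "address", "12 Oak St", date(2020, 1, 5))
log.record("prov-1", "phone", "555-0100", date(2020, 1, 5))
log.record("prov-1", "address", "12 Oak St", date(2020, 6, 1))  # no change
log.record("prov-1", "address", "9 Elm Ave", date(2021, 3, 1))

# Suppose the address was verified as accurate on 2020-09-15: pair the label
# with the data as it stood on that date, not with today's data.
features = log.as_of("prov-1", date(2020, 9, 15))
training_example = (features, True)
```

Retrieving the snapshot by verification date keeps the training label aligned with the values that were actually in effect when the verification happened, which is the point the summary makes about maintaining the data's significance.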
The claims supplied by the inventors are:
“1. A computer-implemented method for training a machine learning algorithm with temporally variant data, comprising: (a) at a plurality of times, monitoring a data source to determine whether data relating to a first healthcare provider has updated; (b) when the data for the first healthcare provider has been updated, storing the updated data in a database such that the database includes a running log specifying how the first healthcare provider’s data has changed over time, wherein the first healthcare provider’s data includes values for a plurality of properties relating to the first healthcare provider; (c) receiving an indication that a value for a particular property in the first healthcare provider’s data was verified as accurate or inaccurate at a particular time; (d) retrieving, from the database based on the particular time, the first healthcare provider’s data, including the values for the plurality of properties, that were up-to-date at the particular time; (e) training a plurality of models using the retrieved data, each model utilizing a different type of machine learning algorithm; (f) evaluating accuracy of the plurality of models using available training data; and (g) selecting a model from the plurality of models determined based on the evaluated accuracy to predict whether a second healthcare provider’s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data’s significance in training the model, and wherein the first healthcare provider and the second healthcare provider are not the same.
“2. The method of claim 1, further comprising: (h) determining, based on the first healthcare provider’s data retrieved in (d), a plurality of features, each of the plurality of features describing a fact about the first healthcare provider’s data retrieved in (d), wherein the training (e) comprises training the model using the determined features.
“3. The method of claim 2, wherein the determining (h) comprises determining the features based on which of the plurality of properties is the particular property.
“4. The method of claim 1, wherein the plurality of models comprises two or more of logistic regression, naive Bayes, elastic nets, neural networks, Bernoulli naive Bayes, multimodal naive Bayes, nearest neighbor classifiers, or support vector machines.
“5. The method of claim 1, further comprising: (h) applying the model to predict whether the second healthcare provider’s value in the plurality of properties is accurate.
“6. The method of claim 5, wherein the applying (h) comprises: (i) for respective values in a plurality of values for the particular property of the second healthcare provider, applying the model to the respective value to determine a score; and (ii) selecting at least one value from the plurality of values based on the respective scores determined in (i).
“7. The method of claim 6, wherein the monitoring (a) comprises monitoring a plurality of data sources to determine whether the data relating to the first healthcare provider has updated, and wherein the applying (h) further comprises: (iii) determining which of the plurality of data sources the at least one value selected in (ii) originated from; (iv) determining whether a client has permission to the data source determined in (iii); and (v) if the client lacks permission to the data source determined in (iii), filtering the at least one value from results before the results are presented to the client.
“8. The method of claim 1, wherein the first healthcare provider and the second healthcare provider data includes demographic information.
“9. The method of claim 1, wherein the first healthcare provider’s data includes an indication of whether the first healthcare provider has engaged in fraud.
“10. A non-transitory program storage device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform a method for training a machine learning algorithm with temporally variant data, the method comprising: (a) at a plurality of times, monitoring a data source to determine whether data relating to a first healthcare provider has updated; (b) when the data for the first healthcare provider has been updated, storing the updated data in a database such that the database includes a running log specifying how the first healthcare provider’s data has changed over time, wherein the first healthcare provider’s data includes values for a plurality of properties relating to the first healthcare provider; (c) receiving an indication that a value for a particular property in the first healthcare provider’s data was verified as accurate or inaccurate at a particular time; (d) retrieving, from the database based on the particular time, the first healthcare provider’s data, including the values for the plurality of properties, that were up-to-date at the particular time; (e) training a plurality of models using the retrieved data, each model utilizing a different type of machine learning algorithm; (f) evaluating accuracy of the plurality of models using available training data; and (g) selecting a model from the plurality of models determined based on the evaluated accuracy to predict whether a second healthcare provider’s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data’s significance in training the model, and wherein the first healthcare provider and the second healthcare provider are not the same.
“11. The program storage device of claim 10, the method further comprising: (h) determining, based on the person’s data retrieved in (d), a plurality of features, each of the plurality of features describing a fact about the first healthcare provider’s data retrieved in (d), wherein the training (e) comprises training the model using the determined features.
“12. The program storage device of claim 11, wherein the determining (h) comprises determining the features based on and which of the plurality of properties is the particular property.
“13. The program storage device of claim 10, wherein the plurality of models comprises two or more of logistic regression, naive Bayes, elastic nets, neural networks, Bernoulli naive Bayes, multimodal naive Bayes, nearest neighbor classifiers, support vector machines.
“14. The program storage device of claim 10, the method further comprising: (h) applying the model to predict whether the second healthcare provider’s value in the plurality of properties is accurate.
“15. The program storage device of claim 14, wherein the applying (h) comprises: (i) for respective values in a plurality of values for the particular property of the second healthcare provider, applying the model to the respective value to determine a score; and (ii) selecting at least one value from the plurality of values based on the respective scores determined in (i).
“16. The program storage device of claim 15, wherein the monitoring (a) comprises monitoring a plurality of data sources to determine whether the data relating to the first healthcare provider has updated, and wherein the applying (h) further comprises: (iii) determining which of the plurality of data sources the at least one value selected in (ii) originated from; (iv) determining whether a client has permission to the data source determined in (iii); and (v) if the client lacks permission to the data source determined in (iii), filtering the at least one value from results before the results are presented to the client.
“17. The program storage device of claim 10, wherein the first healthcare provider and the second healthcare provider data includes demographic information.
“18. The program storage device of claim 10, wherein the first healthcare provider’s data includes an indication of whether the first healthcare provider has engaged in fraud.
“19. A system for training a machine learning algorithm with temporally variant data, comprising: a computing device; a database that includes a running log specifying how a first healthcare provider’s data has changed over time, wherein the first healthcare provider’s data includes values for a plurality of properties relating to the first healthcare provider; a data ingestion process implemented on the computing device and configured to: (i) at a plurality of times, monitor a data source to determine whether the data relating to the first healthcare provider has updated; and (ii) when the data for the first healthcare provider has been updated, storing the updated data in the database; an API monitor implemented on the computing device and configured to receive an indication that a value for a particular property in the first healthcare provider’s data was verified as accurate or inaccurate at a particular time; a querier implemented on the computing device and configured to retrieve, from the database based on the particular time, the first healthcare provider’s data, including the values for the plurality of properties, that were up-to-date at the particular time; and a trainer implemented on the computing device and configured to: train a plurality of models using the retrieved data, each model utilizing a different type of machine learning algorithm, evaluate accuracy of the plurality of models using available training data, and select a model from the plurality of models determined based on the evaluated accuracy to predict whether a second healthcare provider’s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data’s significance in training the model, and wherein the first healthcare provider and the second healthcare provider are not the same.”
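Steps (e) through (g) of the independent claims, training several model types, evaluating each, and selecting one, can be sketched as follows. This is a minimal illustration and not the patented system: the two pure-Python classifiers stand in for the model families the claims list, the binary features and labels are made up, and scoring on a held-out split is an assumption of this sketch rather than something the claims specify.

```python
import math


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes over binary feature vectors, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed probability that feature j equals 1 within class c.
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(n_features)]
        return self

    def predict(self, x):
        def log_likelihood(c):
            return self.log_prior[c] + sum(
                math.log(p if v else 1 - p) for v, p in zip(x, self.p[c]))
        return max(self.classes, key=log_likelihood)


class NearestNeighbor:
    """1-nearest-neighbor classifier using Hamming distance."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        distances = [sum(a != b for a, b in zip(x, row)) for row in self.X]
        return self.y[distances.index(min(distances))]


def accuracy(model, X, y):
    """Fraction of examples the fitted model labels correctly."""
    return sum(model.predict(x) == label for x, label in zip(X, y)) / len(y)


def select_model(candidates, train, held_out):
    """Fit every candidate, score it on held-out data, return (best_name, scores)."""
    scores = {name: accuracy(model.fit(*train), *held_out)
              for name, model in candidates.items()}
    return max(scores, key=scores.get), scores


# Made-up binary features per verified value (for example: value confirmed by
# several sources, value recently updated, phone matches); label = verified accurate.
train = ([[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]],
         [True, True, True, False, False, False])
held_out = ([[1, 0, 0], [0, 1, 1]], [True, False])

candidates = {"naive_bayes": BernoulliNaiveBayes(),
              "nearest_neighbor": NearestNeighbor()}
best_name, scores = select_model(candidates, train, held_out)
```

The selected model would then be applied to a different provider's data, as step (g) requires. A real pipeline would likely use an established library and far more data; the sketch only shows the shape of the train, evaluate, and select loop.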
There are additional claims. Please visit the full patent to read further.
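Claims 6 and 7 (and their counterparts, claims 15 and 16) describe scoring each candidate value for a property, selecting values by score, and filtering out values that originate from data sources a client is not permitted to access. A minimal sketch of that flow follows. The `Candidate` type, the function names, the 0.5 threshold, the source names, and the scores are all hypothetical, and the scores are assumed to come from an already trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    value: str    # e.g. one of a provider's possible addresses
    source: str   # data source the value originated from
    score: float  # model's estimate that the value is accurate


def select_values(candidates, threshold=0.5):
    """Keep candidates whose score meets the threshold, best first."""
    kept = [c for c in candidates if c.score >= threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)


def filter_for_client(selected, permitted_sources):
    """Drop values that came from sources the client may not access."""
    return [c for c in selected if c.source in permitted_sources]


# Made-up candidates for a single provider's address.
candidates = [
    Candidate("12 Oak St", "source_a", 0.91),
    Candidate("9 Elm Ave", "source_b", 0.78),
    Candidate("PO Box 4", "source_a", 0.12),
]
selected = select_values(candidates)
visible = filter_for_client(selected, permitted_sources={"source_a"})
```

Carrying the originating source alongside each value is one simple way to make the permission check possible at presentation time; the claims do not say how the source is tracked.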
For more information, see this patent: Lindner,
(Our reports deliver fact-based news of research and discoveries from around the world.)


