Patent Issued for Method of controlling for undesired factors in machine learning models (USPTO 12014426): State Farm Mutual Automobile Insurance Company
2024 JUL 04 (NewsRx) -- By a
The assignee for this patent, patent number 12014426, is
Reporters obtained the following quote from the background information supplied by the inventors: “Machine learning models may be trained to analyze information for particular purposes involving identifying correlations and making predictions. During training, the models may learn to include illegitimate, non-useful, irrelevant, misleading, or otherwise undesired factors, especially if such biases are present in the training data sets. In particular, while training with structured data involves limiting the data that a model considers, training with unstructured data allows the model to consider all available data, including background information and other undesired factors. For example, a neural network trained with unstructured data including people’s appearances to make correlations and predictions about those people may consider such undesired factors as age, sex, ethnicity, and/or race in its subsequent analyses.”
In addition to obtaining background information on this patent, NewsRx editors also obtained the inventors’ summary information for this patent: “Embodiments of the present technology relate to machine learning models that control for consideration of one or more undesired factors which might otherwise be considered by the machine learning model when analyzing new data. For example, one embodiment of the present invention may be configured for training and using a neural network that controls for consideration of one or more undesired factors which might otherwise be considered by the neural network when analyzing new data as part of an underwriting process to determine an appropriate insurance premium.
“In a first aspect, a method of training and using a machine learning model that controls for consideration of one or more undesired factors which might otherwise be considered by the machine learning model may broadly comprise the following. The machine learning model may be trained using a training data set that contains information including the undesired factors. The undesired factors and one or more relevant interaction terms between the undesired factors may be identified. The machine learning model may then be caused to not consider the identified undesired factors when analyzing the new data to control for undesired prejudice or discrimination in machine learning models.
“In a second aspect, a computer-implemented method for training and using a machine learning model to evaluate an insurance applicant as part of an underwriting process to determine an appropriate insurance premium, wherein the machine learning model controls for consideration of one or more undesired factors which might otherwise be considered by the machine learning model, may broadly comprise the following. The machine learning model may be trained to probabilistically correlate an aspect of appearance with a personal and/or health-related characteristic by providing the machine learning model with a training data set of images of individuals having known personal or health-related characteristics, including the undesired factors. The undesired factors and one or more relevant interaction terms between the undesired factors may be identified. An image of the insurance applicant may be received via a communication element. The machine learning model may analyze the image of the insurance applicant to probabilistically determine the personal and/or health-related characteristics for the insurance applicant, wherein such analysis excludes the identified undesired factors. The machine learning model may then suggest the appropriate insurance premium based at least in part on the probabilistically determined personal and/or health-related characteristic but not on the undesired factors.
“Various implementations of these aspects may include any one or more of the following additional features. Identifying the undesired factors and relevant interaction terms may include training a second machine learning model using a second training data set that contains only the undesired factors and the relevant interaction terms. Further, causing the machine learning model to not consider the identified undesired factors when analyzing the new data may include combining the machine learning model and the second machine learning model to eliminate a bias created by the undesired factors from the machine learning model’s consideration prior to employing the machine learning model to analyze the new data. Alternatively or additionally, identifying the undesired factors and relevant interaction terms may include training the machine learning model to identify the undesired factors and the one or more relevant interaction terms. Further, causing the machine learning model to not consider the identified undesired factors when analyzing the new data may include instructing the machine learning model to not consider the identified undesired factors while analyzing the new data. The machine learning model may be a neural network. The second machine learning model may be a linear model. The machine learning model may be trained to analyze the new data as part of an underwriting process to determine an appropriate insurance premium, and the new data may include images of a person applying for life insurance or health insurance or images of a piece of property for which a person is applying for property insurance. The machine learning model may be further trained to analyze the new data as part of the underwriting process to determine one or more appropriate terms of coverage.
“Advantages of these and other embodiments will become more apparent to those skilled in the art from the following description of the exemplary embodiments which have been shown and described by way of illustration. As will be realized, the present embodiments described herein may be capable of other and different embodiments, and their details are capable of modification in various respects. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.”
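The summary mentions training a second, possibly linear, model on only the undesired factors and their interaction terms, then combining it with the first model to remove the bias those factors create. The patent text quoted here does not say how the two models are combined, so the following is only a minimal sketch of one plausible reading: fit a linear model that predicts the first model's scores from the undesired factors and their pairwise interactions, then subtract that fitted component. The function names, the use of ordinary least squares, and the restriction to pairwise interactions are all assumptions made for illustration.

```python
import numpy as np


def design_matrix(undesired):
    """Build columns: intercept, each undesired factor, pairwise interaction terms.

    `undesired` is an (n_samples, n_factors) array (hypothetical encoding of
    factors such as age or gender as numeric columns).
    """
    n, k = undesired.shape
    cols = [np.ones(n)]
    cols += [undesired[:, i] for i in range(k)]
    cols += [undesired[:, i] * undesired[:, j]
             for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)


def fit_second_model(undesired, first_model_scores):
    """Assumed 'second model': least-squares fit of the first model's scores
    on the undesired factors and their interactions only."""
    X = design_matrix(undesired)
    coef, *_ = np.linalg.lstsq(X, first_model_scores, rcond=None)
    return coef


def remove_undesired_component(undesired, first_model_scores, coef):
    """Assumed 'combination': subtract the part of the score explained by the
    undesired factors, keeping the intercept so the overall level is unchanged."""
    X = design_matrix(undesired)
    return first_model_scores - X @ coef + coef[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    undesired = rng.normal(size=(200, 2))      # two synthetic undesired factors
    legitimate = rng.normal(size=200)          # synthetic legitimate signal
    scores = (legitimate + 2.0 * undesired[:, 0] + 3.0 * undesired[:, 1]
              + 1.5 * undesired[:, 0] * undesired[:, 1])
    coef = fit_second_model(undesired, scores)
    adjusted = remove_undesired_component(undesired, scores, coef)
    print(np.corrcoef(adjusted, undesired[:, 0])[0, 1])
```

On the data used for the fit, the adjusted scores are linearly uncorrelated with each undesired factor and interaction term by construction. That guarantee does not extend to new data or to nonlinear dependence, which is one reason this should be read as an illustration rather than as the patented method.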
The claims supplied by the inventors are:
“1. A computer-implemented method for training and using a machine learning model comprising, via one or more processors: receiving an unstructured training data set including one or both of images or audio of a plurality of individuals, including individuals having a range of different ages, genders, ethnicities, and races; training the machine learning model using the unstructured training data set to produce a first trained machine learning model that contains at least one undesired factor; identifying one or more of the at least one undesired factor, including factors relating to one or more of age, gender, ethnicity, or race, contained in the first trained machine learning model; training the first trained machine learning model based upon the identified one or more undesired factors to produce a second trained machine learning model trained to identify the identified one or more undesired factors; and wherein the second trained machine learning model is usable to analyze one or both of images or audio of an insurance applicant for underwriting while excluding the identified one or more undesired factors.
“2. The computer-implemented method of claim 1, wherein: identifying the one or more undesired factors comprises identifying one or more relevant interaction terms between the at least one undesired factor; and training the first trained machine learning model comprises training the first trained machine learning model based upon the identified one or more undesired factors and the identified one or more relevant interaction terms.
“3. The computer-implemented method of claim 1, wherein identifying the one or more undesired factors comprises identifying factors relating to age, gender, ethnicity, and race.
“4. The computer-implemented method of claim 1, further comprising: receiving one or both of the images or the audio of the insurance applicant; and analyzing one or both of the images or the audio of the insurance applicant based upon the second trained machine learning model to produce underwriting information free from the identified one or more undesired factors.
“5. The computer-implemented method of claim 4, wherein analyzing one or both of the images or the audio comprises identifying one or both of personal or health-related characteristics of the insurance applicant free from the identified one or more undesired factors.
“6. The computer-implemented method of claim 4, further comprising generating an insurance policy quote, including one or both of a premium or terms of coverage, based upon the underwriting information.
“7. The computer-implemented method of claim 6, wherein generating the insurance policy quote includes generating a life insurance policy quote, a health insurance policy quote, or a property insurance policy quote.
“8. The computer-implemented method of claim 1, wherein identifying the one or more undesired factors comprises identifying factors relating to age, gender, ethnicity, and race.
“9. The computer-implemented method of claim 1, further comprising: receiving one or both of the images or the audio of the insurance applicant; and analyzing one or both of the images or the audio of the insurance applicant based upon the second trained machine learning model to produce underwriting information free from the identified one or more undesired factors.
“10. The computer-implemented method of claim 9, wherein analyzing one or both of the images or the audio comprises identifying one or both of personal or health-related characteristics of the insurance applicant free from the identified one or more undesired factors.
“11. A computer system configured to train a machine learning model, comprising one or more processors configured to: receive an unstructured training data set including one or both of images or audio of a plurality of individuals, including individuals having a range of different ages, genders, ethnicities, and races; train the machine learning model using the unstructured training data set to produce a first trained machine learning model that contains at least one undesired factor; identify one or more of the at least one undesired factor, including factors relating to one or more of age, gender, ethnicity, or race, contained in the first trained machine learning model; train the first trained machine learning model based upon the identified one or more undesired factors to produce a second trained machine learning model trained to identify the identified one or more undesired factors; and wherein the second trained machine learning model is usable to analyze one or both of images or audio of an insurance applicant for underwriting while excluding the identified one or more undesired factors.
“12. The computer system of claim 11, wherein the one or more processors are configured to: identify the one or more undesired factors by identifying one or more relevant interaction terms between the at least one undesired factor; and train the first trained machine learning model by training the first trained machine learning model based upon the identified one or more undesired factors and the identified one or more relevant interaction terms.
“13. The computer system of claim 11, wherein the one or more processors are configured to identify the one or more undesired factors by identifying factors relating to age, gender, ethnicity, and race.
“14. The computer system of claim 11, wherein the one or more processors are configured to: receive one or both of the images or the audio of the insurance applicant; and analyze one or both of the images or the audio of the insurance applicant based upon the second trained machine learning model to produce underwriting information free from the identified one or more undesired factors.
“15. The computer system of claim 14, wherein the one or more processors are configured to: identify one or both of personal or health-related characteristics of the insurance applicant free from the identified one or more undesired factors; and generate an insurance policy quote, including one or both of a premium or terms of coverage, based upon the underwriting information.
“16. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to: receive an unstructured training data set including one or both of images or audio of a plurality of individuals, including individuals having a range of different ages, genders, ethnicities, and races; train a machine learning model using the unstructured training data set to produce a first trained machine learning model that contains at least one undesired factor; identify one or more of the at least one undesired factor, including factors relating to one or more of age, gender, ethnicity, or race, contained in the first trained machine learning model; train the first trained machine learning model based upon the identified one or more undesired factors to produce a second trained machine learning model trained to identify the identified one or more undesired factors; and wherein the second trained machine learning model is usable to analyze one or both of images or audio of an insurance applicant for underwriting while excluding the identified one or more undesired factors.
“17. The non-transitory computer-readable medium of claim 16, wherein the instructions, when executed by the one or more processors, cause the computing device to: identify the one or more undesired factors by identifying one or more relevant interaction terms between the at least one undesired factor; and train the first trained machine learning model by training the first trained machine learning model based upon the identified one or more undesired factors and the identified one or more relevant interaction terms.
“18. The non-transitory computer-readable medium of claim 16, wherein the instructions, when executed by the one or more processors, cause the computing device to identify the one or more undesired factors by identifying factors relating to age, gender, ethnicity, and race.
“19. The non-transitory computer-readable medium of claim 16, wherein the instructions, when executed by the one or more processors, cause the computing device to: receive one or both of the images or the audio of the insurance applicant; and analyze one or both of the images or the audio of the insurance applicant based upon the second trained machine learning model to produce underwriting information free from the identified one or more undesired factors.
“20. The non-transitory computer-readable medium of claim 19, wherein the instructions, when executed by the one or more processors, cause the computing device to: identify one or both of personal or health-related characteristics of the insurance applicant free from the identified one or more undesired factors; and generate an insurance policy quote, including one or both of a premium or terms of coverage, based upon the underwriting information.”
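The independent claims include a step of identifying undesired factors "contained in the first trained machine learning model," but the claims as quoted do not describe how that identification is performed. As a hedged illustration only, one simple audit is to check how strongly a trained model's output scores correlate with each candidate factor on a labeled evaluation set and to flag those above a threshold. The function name, the use of Pearson correlation, and the threshold value are assumptions, not details from the patent.

```python
import numpy as np


def flag_undesired_factors(scores, factors, threshold=0.1):
    """Return {factor name: correlation} for factors whose absolute Pearson
    correlation with the model scores exceeds `threshold`.

    `scores` is a 1-D sequence of model outputs; `factors` maps a factor name
    (e.g. "age") to a numeric sequence of the same length. Both the encoding
    and the threshold are hypothetical choices for this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    flagged = {}
    for name, values in factors.items():
        r = np.corrcoef(scores, np.asarray(values, dtype=float))[0, 1]
        if abs(r) > threshold:
            flagged[name] = float(r)
    return flagged


if __name__ == "__main__":
    model_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    candidate_factors = {
        "age": [20, 30, 40, 50, 60, 70],
        "unrelated": [1, -1, 1, -1, 1, -1],
    }
    print(flag_undesired_factors(model_scores, candidate_factors, threshold=0.5))
```

A correlation check like this only detects linear dependence on one factor at a time; it would miss the interaction terms that claims 2, 12, and 17 refer to unless those products are passed in as additional candidate columns.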
For more information, see this patent: Bernico,
(Our reports deliver fact-based news of research and discoveries from around the world.)