December 21, 2020 Newswires

Researchers Submit Patent Application, “Systems And Methods For Machine Learning Of Voice Attributes”, for Approval (USPTO 20200380957)

Insurance Daily News

2020 DEC 21 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- From Washington, D.C., NewsRx journalists report that a patent application by the inventors Edwards, Erik (Oakland, CA); De Zilwa, Shane (Oakland, CA); Irwin, Nicholas (Hallandale Beach, FL); Poorjam, Amir (Copenhagen, DK); Avila, Flavio (Oakland, CA); Lew, Keith L. (Larchmont, NY); Sirota, Christopher (Brooklyn, NY), filed on June 1, 2020, was made available online on December 3, 2020.

The patent’s assignee is Insurance Services Office Inc. (Jersey City, New Jersey, United States).

News editors obtained the following quote from the background information supplied by the inventors: “Technical Field

“The present disclosure relates generally to the field of machine learning technology. More specifically, the present disclosure relates to systems and methods for machine learning of voice attributes.

“Related Art

“In the machine learning space, there is significant interest in developing computer-based machine learning systems which can identify various characteristics of a person’s voice. Such systems are of particular interest in the insurance industry. As the life insurance industry moves toward increased use of accelerated underwriting, a major concern is premium leakage from smokers who do not self-identify as being smokers. For example, it is estimated that a 60-year-old male smoker will pay approximately $50,000 more in premiums for a 20-year term life policy than a non-smoker. Therefore, there is clear incentive for smokers to attempt to avoid self-identifying as smokers, and it is estimated that 50% of smokers do not correctly self-identify on life insurance applications. In response, carriers are looking for solutions to identify smokers in real-time, so that those identified as having a high likelihood of smoking can be routed through a more comprehensive underwriting process.

“An extensive body of academic literature shows that smoking cigarettes leads to irritation of the vocal folds (e.g., vocal cords), which manifests itself in numerous changes to a person’s voice, such as changes to the fundamental frequency, perturbation characteristics (e.g., shimmer and jitter), and tremor characteristics. These changes make it possible to identify whether an individual speaker is a smoker or not by analysis of their voice.

“In addition to detecting voice attributes such as whether a speaker is a smoker, there is also tremendous value in being able to detect other attributes of the speaker by analysis of the speaker’s voice, as well as analysis of other attributes such as video analysis, photo analysis, etc. For example, in the medical field, it would be highly beneficial to detect whether an individual is suffering from an illness based on evaluation of the individual’s voice or other sounds emanating from the vocal tract, such as respiratory illnesses, neurological disorders, physiological disorders, and other impairment and conditions. Still further, it would be beneficial to detect the progression of the aforementioned conditions over time through periodic analysis of individuals’ voices, and to undertake various actions when conditions of interest have been detected, such as physically locating the individual, providing health alerts to one or more individuals (e.g., targeted community-based alerts, larger broadcasted alerts, etc.), initiating medical care in response to detected conditions, etc. Moreover, it would be highly beneficial to be able to remotely conduct community surveillance and detection of illnesses and other conditions using commonly-available communications devices such as cellular telephones, smart speakers, computers, etc.

“Therefore, there is a need for systems and methods for machine learning to learn voice and other attributes and to detect a wide variety of conditions and criteria relating to individuals and communities. These and other needs are addressed by the systems and methods of the present disclosure.”
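To make the perturbation measures named in the background concrete, the following is a minimal sketch of how jitter and shimmer are commonly computed: as the mean absolute difference between consecutive glottal cycles, divided by the mean. The application does not state which formulas the inventors use, so the "local" definitions below, the function name local_perturbation, and the sample cycle values are all assumptions made for illustration.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    Applied to cycle lengths this is a common "local jitter" measure;
    applied to cycle peak amplitudes it is a common "local shimmer" measure.
    (Assumed definitions; the patent application does not specify them.)
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical glottal cycle lengths (seconds) and peak amplitudes (arbitrary units).
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter = local_perturbation(periods)         # cycle-to-cycle timing variation
shimmer = local_perturbation(amplitudes)     # cycle-to-cycle amplitude variation
fundamental_frequency = 1.0 / mean(periods)  # mean pitch in Hz

print(f"F0 = {fundamental_frequency:.1f} Hz, jitter = {jitter:.2%}, shimmer = {shimmer:.2%}")
```

In practice the cycle lengths and amplitudes would come from a pitch tracker run on sustained vowel sounds; that extraction step is outside this sketch.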

As a supplement to the background information on this patent application, NewsRx correspondents also obtained the inventors’ summary information for this patent application: “The present disclosure relates to systems and methods for machine learning of voice and other attributes. The system first receives input data, which can be human speech, such as one or more recordings of a person speaking (e.g., a monologue, a speech, etc.) and/or one or more conversations between two or more speakers (e.g., a recorded conversation, a telephone conversation, a Voice over Internet Protocol ‘VoIP’ conversation, a group conversation, etc.). The system then isolates a speaker of interest by performing a speaker diarization which partitions an audio stream into homogeneous segments according to the speaker identity. Next, the system isolates predetermined sounds from the isolated speech of the speaker of interest, such as vowel sounds, to generate features. The features are mathematical variables describing the sound spectrum of the speaker’s voice over small time intervals. The system then summarizes the features to generate variables that describe the speaker. Finally, the system generates a predictive model, which can be applied to vocal data to detect a desired feature of a person (e.g., whether or not the person is a smoker). For example, the system generates a modeling dataset comprising tags together with generated functionals, where the tags indicate a speaker’s gender, age, smoker status (e.g., a smoker or a non-smoker), etc. The predictive model allows for modeling of a smoker status using smoker status tags as the target variables, and other tags (e.g., gender, age, etc.) as predictive variables.

“Also provided are systems and methods for detecting one or more attributes of a speaker based on analysis of voice samples or other types of digitally-stored information (e.g., videos, photos, etc.). An audio sample of a person is obtained from one or more sources, such as pre-recorded samples (e.g., voice mail samples) or live audio samples recorded from the speaker. Such samples could be obtained using a wide variety of devices, such as a smart speaker, a smart phone, a personal computer system, a web browser, or other device capable of recording samples of a speaker’s voice. The system processes the audio sample using a predictive voice model to detect whether a pre-determined attribute exists. If a pre-determined attribute exists, the system can indicate the attribute to the user (e.g., using the user’s smart phone, smart speaker, personal computer, or other device), and optionally, one or more additional actions can be taken. For example, the system can identify the physical location of the user (e.g., using one or more geolocation techniques), perform cluster analysis to identify whether clusters of individuals exhibiting the same (or, similar) attribute exist and are located, broadcast one or more alerts, or transmit the detected attribute to one or more third-party computer systems (e.g., via secure transmission using encryption, or through some other secure means) for further processing. Optionally, the system can obtain further voice samples from the individual (e.g., periodically over time) in order to detect and track the onset of a medical condition, or progression of such condition.”
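The summary's step of summarizing short-interval features into variables that describe the speaker, and then pairing those "functionals" with tags in a modeling dataset, can be sketched roughly as follows. This is an illustration only: the choice of mean, standard deviation, minimum and maximum as functionals, the column naming, and the sample frame values are assumptions, not details taken from the application.

```python
from statistics import mean, pstdev


def functionals(frames):
    """Collapse per-frame feature vectors into one fixed-length summary.

    `frames` is a list of equal-length feature vectors, one per short time
    interval. The statistics used here are assumed for illustration.
    """
    if not frames:
        raise ValueError("no frames to summarize")
    summary = {}
    # zip(*frames) yields one column per feature, across all frames.
    for i, column in enumerate(zip(*frames)):
        summary[f"f{i}_mean"] = mean(column)
        summary[f"f{i}_std"] = pstdev(column)
        summary[f"f{i}_min"] = min(column)
        summary[f"f{i}_max"] = max(column)
    return summary


# Hypothetical frame-level features for one speaker (three frames, two features).
frames = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]

# Hypothetical tags; "smoker" would serve as the target variable.
tags = {"gender": "F", "age": 42, "smoker": False}

# One row of the modeling dataset: tags together with generated functionals.
row = {**tags, **functionals(frames)}
print(row)
```

Because every recording collapses to the same set of columns regardless of its length, rows built this way can be stacked into a table and passed to any standard classifier.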

The claims supplied by the inventors are:

“1. A machine learning system for detecting at least one voice attribute from input data, comprising: a processor in communication with a database of input data; and a predictive voice model executed by the processor, the predictive voice model: receiving the input data from the database; processing the input data to identify a speaker of interest from the input data; isolating one or more predetermined sounds corresponding to the speaker of interest; generating a plurality of vectors from the one or more predetermined sounds; generating a plurality of features from the one or more predetermined sounds; processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and processing the plurality of variables and vectors to detect the at least one voice attribute.

“2. The system of claim 1, wherein the predictive model processes one or more of demographic data, voice data, credit data, lifestyle data, prescription data, social media data, or image data.

“3. The system of claim 1, wherein the plurality of vectors comprises a plurality of i-Vectors.

“4. The system of claim 3, where the plurality of variables comprises a plurality of functionals that describe the speaker of interest.

“5. The system of claim 4, wherein the predictive voice model processes the plurality of iVectors and the plurality of functionals to detect the at least one voice attribute.

“6. The system of claim 1, wherein the at least one voice attribute comprises one or more of frequency, perturbation characteristics, tremor characteristics, duration, or timbre.

“7. The system of claim 1, wherein the plurality of features comprise mel-frequency cepstral coefficients.

“8. The system of claim 1, wherein the at least one voice attribute comprises an indication of whether an individual is a smoker.

“9. The system of claim 1, wherein the at least one voice attribute indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.

“10. A machine learning method for detecting at least one voice attribute from input data, comprising the steps of: receiving input data from a database; processing the input data to identify a speaker of interest from the input data; isolating one or more predetermined sounds corresponding to the speaker of interest; generating a plurality of vectors from the one or more predetermined sounds; generating a plurality of features from the one or more predetermined sounds; processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and processing the plurality of variables and vectors to detect the at least one voice attribute.

“11. The method of claim 10, further comprising processing one or more of demographic data, voice data, credit data, lifestyle data, prescription data, social media data, or image data.

“12. The method of claim 10, wherein the plurality of vectors comprises a plurality of i-Vectors.

“13. The method of claim 12, where the plurality of variables comprises a plurality of functionals that describe the speaker of interest.

“14. The method of claim 13, further comprising processing the plurality of iVectors and the plurality of functionals to detect the at least one voice attribute.

“15. The method of claim 10, wherein the at least one voice attribute comprises one or more of frequency, perturbation characteristics, tremor characteristics, duration, or timbre.

“16. The method of claim 10, wherein the plurality of features comprise mel-frequency cepstral coefficients.

“17. The method of claim 10, wherein the at least one voice attribute comprises an indication of whether an individual is a smoker.

“18. The method of claim 10, wherein the at least one voice attribute indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.

“19. A machine learning system for generating one or more vocal metrics from input data, comprising: a processor receiving at least one voice signal; a perceptual subsystem executed by the processor, the perceptual subsystem processing the at least one voice signal using a human auditory perception process; a functionals subsystem executed by the processor, the functionals subsystem processing the at least one voice signal to generate derived functional from the at least one voice signal; a deep convolutional neural network (CNN) subsystem executed by the processor, the deep CNN subsystem applying one or more CNNs to the at least one voice signal; and an ensemble model executed by the processor, the ensemble model processing information generated by the perceptual subsystem, the functional subsystem, and the deep CNN subsystem to generate one or more vocal metrics based on the information.

“20. The machine learning system of claim 19, wherein the processor performs at least one of digital signal processing, audio segmentation, or speaker diarization on the at least one voice signal.

“21. The machine learning system of claim 19, wherein ensemble model processes posterior probabilities generated by the perceptual subsystem, the functional subsystem, and the deep CNN subsystem and associated confidence scores to generate a final prediction.

“22. The machine learning system of claim 19, wherein the one or more vocal metrics comprises an indication of whether an individual is a smoker.

“23. The machine learning system of claim 19, wherein the one or more vocal metrics indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.

“24. A machine learning method for generating one or more vocal metrics from input data, comprising the steps of: receiving at least one voice signal; processing the at least one voice signal using a perceptual subsystem executed by a processor, the perceptual subsystem processing the at least one voice signal using a human auditory perception process; processing the at least one voice signal using a functionals subsystem executed by the processor, the functionals subsystem processing the at least one voice signal to generate derived functional from the at least one voice signal; processing the at least one voice signal using a deep convolutional neural network (CNN) subsystem executed by the processor, the deep CNN subsystem applying one or more CNNs to the at least one voice signal; and processing information generated by the perceptual subsystem, the functional subsystem, and the deep CNN subsystem using an ensemble model to generate one or more vocal metrics based on the information.

“25. The method of claim 24, further comprising performing at least one of digital signal processing, audio segmentation, or speaker diarization on the at least one voice signal.

“26. The method of claim 24, further comprising processing posterior probabilities generated by the perceptual subsystem, the functional subsystem, and the deep CNN subsystem and associated confidence scores to generate a final prediction.

“27. The method of claim 24, wherein the one or more vocal metrics comprises an indication of whether an individual is a smoker.

“28. The method of claim 24, wherein the one or more vocal metrics indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.”
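Claims 21 and 26 describe an ensemble model that processes posterior probabilities and associated confidence scores from the three subsystems to generate a final prediction, but the claims do not say how those values are combined. One simple possibility, shown here purely as an assumption, is a confidence-weighted average followed by a threshold. The function name, the subsystem outputs, and the 0.5 threshold are all hypothetical.

```python
def ensemble_posterior(outputs):
    """Combine (posterior, confidence) pairs into one confidence-weighted posterior.

    A weighted average is an assumed combination rule; the claims only state
    that posteriors and confidence scores are processed together.
    """
    total_confidence = sum(confidence for _, confidence in outputs)
    if total_confidence <= 0:
        raise ValueError("at least one subsystem must have positive confidence")
    weighted = sum(posterior * confidence for posterior, confidence in outputs)
    return weighted / total_confidence


# Hypothetical outputs: (posterior probability of the attribute, confidence score).
subsystems = {
    "perceptual": (0.70, 0.5),
    "functionals": (0.60, 0.3),
    "deep_cnn": (0.90, 0.2),
}

score = ensemble_posterior(list(subsystems.values()))
final_prediction = score >= 0.5  # hypothetical decision threshold

print(f"combined posterior = {score:.2f}, attribute detected = {final_prediction}")
```

A learned combiner, such as a logistic regression over the subsystem outputs, would be an equally plausible reading of the claims; the weighted average is used here only because it is the shortest rule to state.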

For additional information on this patent application, see: Edwards, Erik; De Zilwa, Shane; Irwin, Nicholas; Poorjam, Amir; Avila, Flavio; Lew, Keith L.; Sirota, Christopher. Systems And Methods For Machine Learning Of Voice Attributes. Filed June 1, 2020 and posted December 3, 2020. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220200380957%22.PGNR.&OS=DN/20200380957&RS=DN/20200380957

(Our reports deliver fact-based news of research and discoveries from around the world.)
