Researchers Submit Patent Application, “Systems And Methods For Machine Learning Of Voice Attributes”, for Approval (USPTO 20200380957)
2020 DEC 21 (NewsRx) -- By a
The patent’s assignee is
News editors obtained the following quote from the background information supplied by the inventors: “Technical Field
“The present disclosure relates generally to the field of machine learning technology. More specifically, the present disclosure relates to systems and methods for machine learning of voice attributes.
“Related Art
“In the machine learning space, there is significant interest in developing computer-based machine learning systems which can identify various characteristics of a person’s voice. Such systems are of particular interest in the insurance industry. As the life insurance industry moves toward increased use of accelerated underwriting, a major concern is premium leakage from smokers who do not self-identify as being smokers. For example, it is estimated that a 60-year-old male smoker will pay approximately
“An extensive body of academic literature shows that smoking cigarettes leads to irritation of the vocal folds (e.g., vocal cords), which manifests itself in numerous changes to a person’s voice, such as changes to the fundamental frequency, perturbation characteristics (e.g., shimmer and jitter), and tremor characteristics. These changes make it possible to identify whether an individual speaker is a smoker or not by analysis of their voice.
“In addition to detecting voice attributes such as whether a speaker is a smoker, there is also tremendous value in being able to detect other attributes of the speaker by analysis of the speaker’s voice, as well as analysis of other attributes such as video analysis, photo analysis, etc. For example, in the medical field, it would be highly beneficial to detect whether an individual is suffering from an illness based on evaluation of the individual’s voice or other sounds emanating from the vocal tract, such as respiratory illnesses, neurological disorders, physiological disorders, and other impairment and conditions. Still further, it would be beneficial to detect the progression of the aforementioned conditions over time through periodic analysis of individuals’ voices, and to undertake various actions when conditions of interest have been detected, such as physically locating the individual, providing health alerts to one or more individuals (e.g., targeted community-based alerts, larger broadcasted alerts, etc.), initiating medical care in response to detected conditions, etc. Moreover, it would be highly beneficial to be able to remotely conduct community surveillance and detection of illnesses and other conditions using commonly-available communications devices such as cellular telephones, smart speakers, computers, etc.
“Therefore, there is a need for systems and methods for machine learning to learn voice and other attributes and to detect a wide variety of conditions and criteria relating to individuals and communities. These and other needs are addressed by the systems and methods of the present disclosure.”
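The perturbation characteristics named in the background (jitter and shimmer) can be illustrated with a short calculation. The sketch below uses one common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. The application as quoted does not say which definition the inventors use, so the function name `local_perturbation` and the sample measurements are assumptions made purely for illustration.

```python
from statistics import fmean


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, divided by the mean value.

    Applied to glottal cycle durations this is a "local jitter" figure;
    applied to per-cycle peak amplitudes it is a "local shimmer" figure.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return fmean(diffs) / fmean(values)


# Hypothetical per-cycle measurements (made-up numbers, not from the patent).
periods_s = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter = local_perturbation(periods_s)         # relative period variation
shimmer = local_perturbation(peak_amplitudes)  # relative amplitude variation
f0_hz = 1.0 / fmean(periods_s)                 # fundamental frequency estimate

print(f"jitter={jitter:.4f} shimmer={shimmer:.4f} f0={f0_hz:.1f} Hz")
```

Under this definition a perfectly steady voice would score zero on both measures, and larger values indicate more cycle-to-cycle irregularity.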
As a supplement to the background information on this patent application, NewsRx correspondents also obtained the inventors’ summary information for this patent application: “The present disclosure relates to systems and methods for machine learning of voice and other attributes. The system first receives input data, which can be human speech, such as one or more recordings of a person speaking (e.g., a monologue, a speech, etc.) and/or one or more conversations between two or more speakers (e.g., a recorded conversation, a telephone conversation, a Voice over Internet Protocol ‘VoIP’ conversation, a group conversation, etc.). The system then isolates a speaker of interest by performing a speaker diarization which partitions an audio stream into homogeneous segments according to the speaker identity. Next, the system isolates predetermined sounds from the isolated speech of the speaker of interest, such as vowel sounds, to generate features. The features are mathematical variables describing the sound spectrum of the speaker’s voice over small time intervals. The system then summarizes the features to generate variables that describe the speaker. Finally, the system generates a predictive model, which can be applied to vocal data to detect a desired feature of a person (e.g., whether or not the person is a smoker). For example, the system generates a modeling dataset comprising tags together with generated functionals, where the tags indicate a speaker’s gender, age, smoker status (e.g., a smoker or a non-smoker), etc. The predictive model allows for modeling of a smoker status using smoker status tags as the target variables, and other tags (e.g., gender, age, etc.) as predictive variables.
“Also provided are systems and methods for detecting one or more attributes of a speaker based on analysis of voice samples or other types of digitally-stored information (e.g., videos, photos, etc.). An audio sample of a person is obtained from one or more sources, such as pre-recorded samples (e.g., voice mail samples) or live audio samples recorded from the speaker. Such samples could be obtained using a wide variety of devices, such as a smart speaker, a smart phone, a personal computer system, a web browser, or other device capable of recording samples of a speaker’s voice. The system processes the audio sample using a predictive voice model to detect whether a pre-determined attribute exists. If a pre-determined attribute exists, the system can indicate the attribute to the user (e.g., using the user’s smart phone, smart speaker, personal computer, or other device), and optionally, one or more additional actions can be taken. For example, the system can identify the physical location of the user (e.g., using one or more geolocation techniques), perform cluster analysis to identify whether clusters of individuals exhibiting the same (or, similar) attribute exist and are located, broadcast one or more alerts, or transmit the detected attribute to one or more third-party computer systems (e.g., via secure transmission using encryption, or through some other secure means) for further processing. Optionally, the system can obtain further voice samples from the individual (e.g., periodically over time) in order to detect and track the onset of a medical condition, or progression of such condition.”
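The summary describes collapsing frame-level features into per-speaker variables ("functionals") and pairing them with tags to form a modeling dataset. A minimal sketch of that step follows. The quoted text does not list which summary statistics are used, so the choice of mean, standard deviation, minimum, and maximum, the `functionals` helper, the column names, and the sample numbers are all assumptions for illustration only.

```python
from statistics import fmean, pstdev


def functionals(frames):
    """Collapse frame-level feature vectors into one fixed-length summary.

    `frames` is a list of equal-length feature vectors, one per short time
    interval. The four statistics used here are an assumed choice.
    """
    if not frames:
        raise ValueError("no frames")
    summary = {}
    for i, column in enumerate(zip(*frames)):
        summary[f"f{i}_mean"] = fmean(column)
        summary[f"f{i}_std"] = pstdev(column)
        summary[f"f{i}_min"] = min(column)
        summary[f"f{i}_max"] = max(column)
    return summary


# Hypothetical two-dimensional features over three frames (made-up values).
frames = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]

# One row of a modeling dataset: tags plus the generated functionals.
# "smoker" would serve as the target; the other columns are predictors.
row = {"gender": "F", "age": 60, "smoker": 1, **functionals(frames)}
print(row)
```

Because every speaker yields the same set of columns regardless of recording length, rows built this way can be stacked into a table and passed to any standard classifier.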
The claims supplied by the inventors are:
“1. A machine learning system for detecting at least one voice attribute from input data, comprising: a processor in communication with a database of input data; and a predictive voice model executed by the processor, the predictive voice model: receiving the input data from the database; processing the input data to identify a speaker of interest from the input data; isolating one or more predetermined sounds corresponding to the speaker of interest; generating a plurality of vectors from the one or more predetermined sounds; generating a plurality of features from the one or more predetermined sounds; processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and processing the plurality of variables and vectors to detect the at least one voice attribute.
“2. The system of claim 1, wherein the predictive model processes one or more of demographic data, voice data, credit data, lifestyle data, prescription data, social media data, or image data.
“3. The system of claim 1, wherein the plurality of vectors comprises a plurality of i-Vectors.
“4. The system of claim 3, where the plurality of variables comprises a plurality of functionals that describe the speaker of interest.
“5. The system of claim 4, wherein the predictive voice model processes the plurality of iVectors and the plurality of functionals to detect the at least one voice attribute.
“6. The system of claim 1, wherein the at least one voice attribute comprises one or more of frequency, perturbation characteristics, tremor characteristics, duration, or timbre.
“7. The system of claim 1, wherein the plurality of features comprise mel-frequency cepstral coefficients.
“8. The system of claim 1, wherein the at least one voice attribute comprises an indication of whether an individual is a smoker.
“9. The system of claim 1, wherein the at least one voice attribute indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.
“10. A machine learning method for detecting at least one voice attribute from input data, comprising the steps of: receiving input data from a database; processing the input data to identify a speaker of interest from the input data; isolating one or more predetermined sounds corresponding to the speaker of interest; generating a plurality of vectors from the one or more predetermined sounds; generating a plurality of features from the one or more predetermined sounds; processing the plurality of features to generate a plurality of variables that describe the speaker of interest; and processing the plurality of variables and vectors to detect the at least one voice attribute.
“11. The method of claim 10, further comprising processing one or more of demographic data, voice data, credit data, lifestyle data, prescription data, social media data, or image data.
“12. The method of claim 10, wherein the plurality of vectors comprises a plurality of i-Vectors.
“13. The method of claim 12, where the plurality of variables comprises a plurality of functionals that describe the speaker of interest.
“14. The method of claim 13, further comprising processing the plurality of iVectors and the plurality of functionals to detect the at least one voice attribute.
“15. The method of claim 10, wherein the at least one voice attribute comprises one or more of frequency, perturbation characteristics, tremor characteristics, duration, or timbre.
“16. The method of claim 10, wherein the plurality of features comprise mel-frequency cepstral coefficients.
“17. The method of claim 10, wherein the at least one voice attribute comprises an indication of whether an individual is a smoker.
“18. The method of claim 10, wherein the at least one voice attribute indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.
“19. A machine learning system for generating one or more vocal metrics from input data, comprising: a processor receiving at least one voice signal; a perceptual subsystem executed by the processor, the perceptual subsystem processing the at least one voice signal using a human auditory perception process; a functionals subsystem executed by the processor, the functionals subsystem processing the at least one voice signal to generate derived functional from the at least one voice signal; a deep convolutional neural network (CNN) subsystem executed by the processor, the deep
“20. The machine learning system of claim 19, wherein the processor performs at least one of digital signal processing, audio segmentation, or speaker diarization on the at least one voice signal.
“21. The machine learning system of claim 19, wherein ensemble model processes posterior probabilities generated by the perceptual subsystem, the functional subsystem, and the deep
“22. The machine learning system of claim 19, wherein the one or more vocal metrics comprises an indication of whether an individual is a smoker.
“23. The machine learning system of claim 19, wherein the one or more vocal metrics indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.
“24. A machine learning method for generating one or more vocal metrics from input data, comprising the steps of: receiving at least one voice signal; processing the at least one voice signal using a perceptual subsystem executed by a processor, the perceptual subsystem processing the at least one voice signal using a human auditory perception process; processing the at least one voice signal using a functionals subsystem executed by the processor, the functionals subsystem processing the at least one voice signal to generate derived functional from the at least one voice signal; processing the at least one voice signal using a deep convolutional neural network (CNN) subsystem executed by the processor, the deep
“25. The method of claim 24, further comprising performing at least one of digital signal processing, audio segmentation, or speaker diarization on the at least one voice signal.
“26. The method of claim 24, further comprising processing posterior probabilities generated by the perceptual subsystem, the functional subsystem, and the deep
“27. The method of claim 24, wherein the one or more vocal metrics comprises an indication of whether an individual is a smoker.
“28. The method of claim 24, wherein the one or more vocal metrics indicates one or more of a respiratory condition, age, gender, general vocal pathology, regional accent, body size, attractiveness, sexuality, social status, personality, emotion, deception, sleepiness, hydration, stress, Sjogren’s syndrome, arthritis, dementia, Parkinson’s disease, schizophrenia, reflux, alcohol intoxication, epidemiology, cannabis intoxication, blood oxygen levels, a medical condition, a respiratory symptom, a respiratory ailment, an illness, a neurological illness, a neurological disorder, a mood, a physiological characteristic, or an attribute that manifests through perceptible changes in the person’s voice.”
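Claims 21 and 26 refer to an ensemble model that processes posterior probabilities from the perceptual, functionals, and CNN subsystems. The claim text as reproduced here is cut off and does not state how those probabilities are combined, so the sketch below substitutes a plain weighted average as a stand-in. The function name `ensemble_posterior`, the weights, and the example probabilities are hypothetical.

```python
def ensemble_posterior(posteriors, weights=None):
    """Combine per-subsystem posterior probabilities by weighted averaging.

    Weighted averaging is an assumed combination rule, chosen for
    simplicity; it is not taken from the patent application.
    """
    if not posteriors:
        raise ValueError("need at least one posterior")
    if weights is None:
        weights = [1.0] * len(posteriors)
    if len(weights) != len(posteriors):
        raise ValueError("one weight per posterior is required")
    if any(not 0.0 <= p <= 1.0 for p in posteriors):
        raise ValueError("posteriors must lie in [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, posteriors)) / total


# Hypothetical outputs for "speaker is a smoker" from the three subsystems.
p_perceptual, p_functionals, p_cnn = 0.2, 0.4, 0.6

combined = ensemble_posterior([p_perceptual, p_functionals, p_cnn])
weighted = ensemble_posterior([p_perceptual, p_functionals, p_cnn],
                              weights=[1.0, 1.0, 2.0])
print(f"equal weights: {combined:.2f}, CNN weighted double: {weighted:.2f}")
```

A trained meta-classifier that takes the three probabilities as inputs would be another plausible reading of "ensemble model"; the averaging rule is shown only because it is the simplest to follow.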
For additional information on this patent application, see:
(Our reports deliver fact-based news of research and discoveries from around the world.)


