Patent Issued for Transcription analysis platform (USPTO 11837214): United Services Automobile Association
2023 DEC 22 (NewsRx) -- By a
The assignee for this patent, patent number 11837214, is
Reporters obtained the following quote from the background information supplied by the inventors: “Generally, vendors providing transcription services each use their own formulas to present and market their speech to text accuracy.”
In addition to obtaining background information on this patent, NewsRx editors also obtained the inventors’ summary information for this patent: “Various embodiments of the present disclosure generally relate to a transcription analysis platform. More specifically, various embodiments of the present disclosure relate to methods and systems for analyzing and evaluating transcriptions.
“Transcriptions can have poor accuracy, particularly where the audio file includes multiple speakers and where the vendor has no prior experience with the speakers (independent speech recognition). Generally, vendors providing transcription services each use their own formulas to present and market their speech to text accuracy. Prior to the current technology, no solution existed for performing objective accuracy testing across multiple vendors.
“According to various implementations of the present disclosure, a set of baseline transcriptions are created from a set of audio files. The baseline transcriptions can be used as ground truth transcriptions of the audio files (i.e., considered as an accurate transcription of the audio file). The baseline transcriptions can be created by humans or by a machine using extremely accurate transcription technology (e.g., dependent speech recognition). The same audio files can be sent to various vendors for transcription. Upon receiving the transcriptions of the audio files from the vendors, the baseline transcriptions and the vendor transcriptions can be normalized. For example, the words that can be spelled different ways can be changed to a standardized spelling of the word in the text (e.g., “uhm” and “umm” can be changed to “um”) and spoken numbers can be written in a standardized manner (e.g., “twelve hundred” or “one thousand, two hundred” can be changed to “1200”). After the transcriptions are normalized, the system can determine error rates of each transcription by comparing the vendor transcriptions to the baseline transcriptions.
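The normalization step described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the lookup tables cover only the examples quoted above, and the function name `normalize`, the lowercasing, and the punctuation stripping are assumptions made for illustration.

```python
import re

# Hypothetical spelling table; the summary only gives "uhm"/"umm" -> "um".
SPELLING = {"uhm": "um", "umm": "um"}

# Hypothetical spoken-number table covering only the two quoted examples.
SPOKEN_NUMBERS = {
    "twelve hundred": "1200",
    "one thousand two hundred": "1200",
}


def normalize(text: str) -> str:
    """Return a lowercased, punctuation-free, standardized copy of text."""
    # Assumption: case and punctuation are not scored, so both are dropped.
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Replace multi-word spoken numbers before splitting into words.
    for phrase, digits in SPOKEN_NUMBERS.items():
        cleaned = re.sub(rf"\b{re.escape(phrase)}\b", digits, cleaned)
    # Map each remaining word to its standardized spelling, if one is listed.
    return " ".join(SPELLING.get(word, word) for word in cleaned.split())


print(normalize("Uhm, I paid twelve hundred dollars."))
```

Applying the same function to both the baseline and the vendor transcription keeps spelling variants from being counted as errors in the later comparison.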
“The system can determine various error rates. For example, the system can determine a word error rate, a phrase error rate, and speaker error rate. To determine a word error rate, each word of the baseline transcription can be put into a separate row in the same column and aligned with each word of the vendor transcription. To create/maintain alignment, the system can add a row to the vendor transcription where a word was deleted from the vendor transcription or add a row to the baseline transcription where a word was inserted in the vendor transcription. The system can assign a differentiator label to each word/space of the vendor transcription. The differentiator labels can indicate whether the word was transcribed correctly, and, if the word was not transcribed correctly, an indication of the error (e.g., inserted, deleted, or substituted).
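The row-by-row alignment and labeling described in this paragraph can be sketched as follows. The summary does not say which alignment algorithm is used; this sketch assumes a standard minimum-edit-distance alignment, and the names `align` and `Row` are hypothetical. A `None` entry stands for the blank row added to one side to keep the two columns aligned.

```python
from typing import List, Optional, Tuple

# (baseline word, vendor word, differentiator label); None marks an added blank row.
Row = Tuple[Optional[str], Optional[str], str]


def align(baseline: List[str], vendor: List[str]) -> List[Row]:
    n, m = len(baseline), len(vendor)
    # cost[i][j] = minimum number of edits turning baseline[:i] into vendor[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = baseline[i - 1] != vendor[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # correct or substituted
                cost[i - 1][j] + 1,             # word missing from vendor text
                cost[i][j - 1] + 1,             # extra word in vendor text
            )

    # Walk back from the end, emitting one labeled row per step.
    rows: List[Row] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = baseline[i - 1] != vendor[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                label = "substituted" if mismatch else "correct"
                rows.append((baseline[i - 1], vendor[j - 1], label))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            rows.append((baseline[i - 1], None, "deleted"))
            i -= 1
        else:
            rows.append((None, vendor[j - 1], "inserted"))
            j -= 1
    rows.reverse()
    return rows


for row in align("the cat sat on the mat".split(), "the cat sat on mat".split()):
    print(row)
```

When several alignments have the same cost, this sketch prefers a substitution over an insertion/deletion pair; a production system might break such ties differently.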
“The system can use the differentiator labels to calculate a word error rate. For example, a word error rate can be calculated by dividing the number of correct words by a sum of the number of substituted, deleted, inserted and correct words. The error rates for numerous transcribed audio files can be evaluated, and the results can be displayed graphically (e.g., using heat maps) and compared to error rates of other entities. A similar analysis can be done for phrases (e.g., a grouping of words) or speakers (e.g., indicating a change in speaker) in addition to or instead of analyzing the word error rate.”
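As a worked illustration of the quotient described in the summary, the Python sketch below counts differentiator labels and divides the number of correct rows by the total number of labeled rows. The function names are hypothetical, and the complementary `fraction_in_error` helper is an assumption added for comparison, not something stated in the patent text. Note that the quotient as quoted measures the share of rows that are correct; the conventional word error rate instead divides substitutions, deletions, and insertions by the number of baseline words.

```python
from collections import Counter
from typing import Iterable

LABELS = ("correct", "substituted", "deleted", "inserted")


def fraction_correct(labels: Iterable[str]) -> float:
    """Correct rows divided by all labeled rows, as the summary describes."""
    counts = Counter(labels)
    total = sum(counts[name] for name in LABELS)
    if total == 0:
        raise ValueError("no labeled rows to score")
    return counts["correct"] / total


def fraction_in_error(labels: Iterable[str]) -> float:
    """Hypothetical complement: rows with any error divided by all rows."""
    counts = Counter(labels)
    total = sum(counts[name] for name in LABELS)
    if total == 0:
        raise ValueError("no labeled rows to score")
    return (total - counts["correct"]) / total


# Hypothetical label sequence: 8 correct rows, 1 substitution, 1 deletion.
example = ["correct"] * 8 + ["substituted", "deleted"]
print(fraction_correct(example), fraction_in_error(example))
```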
The claims supplied by the inventors are:
“1. A computerized method comprising: normalizing, by one or more processors, a first transcription of an audio file and a baseline transcription of the audio file, wherein the baseline transcription is used as an accurate transcription of the audio file; vertically aligning, by the one or more processors, portions of the first transcription with a corresponding portion of the baseline transcription, by: assigning a differentiator label to each portion of the first transcription based on a comparison of the portion of the first transcription with the corresponding portion of the baseline transcription; and in response to the first transcription including an inserted portion as compared to the baseline transcription, inserting a section into the baseline transcription corresponding to the inserted portion in the first transcription, such that portions of the first transcription are aligned with the corresponding portions of the baseline transcription; determining, by the one or more processors, an error rate of the first transcription based on determinations of whether each portion, as vertically aligned between the first transcription and the baseline transcription, has a differentiator label corresponding to correct, inserted, deleted, or substituted in the differentiator labels assigned to each portion of the first transcription; and generating a graphical display of performance, wherein the graphical display includes a heat map based on A) the error rate and B) determined word lengths of the first transcription and/or the baseline transcription.
“2. The computerized method of claim 1 further comprising normalizing the first transcription and the baseline transcription by automatically changing words and/or numbers to a standardized spelling or appearance.
“3. The computerized method of claim 1, wherein the portions of the first transcription are individual words.
“4. The computerized method of claim 1, further comprising removing personally identifiable information from the first transcription and/or from the audio file.
“5. The computerized method of claim 1, wherein each of the correct, inserted, deleted, or substituted differentiator labels is used at least once in the first transcription.
“6. The computerized method of claim 1 further comprising: determining that the first transcription has a deleted word, as compared to the baseline transcription; and in response, inserting a second section into the first transcription at a location where the deleted word was deleted from the first transcription.
“7. The computerized method of claim 1, wherein determining the error rate comprises dividing A) a number of correct portions by B) a sum of at least a number of portions with a label corresponding to inserted, a number of portions with a label corresponding to deleted, and a number of portions with a label corresponding to substituted.
“8. The computerized method of claim 1, wherein the error rate is an error rate of incorrect phrases between the first transcription and the baseline transcription.
“9. The computerized method of claim 1 further comprising: determining additional error rates for additional transcriptions generated by an entity that generated the first transcription; and generating the graphical display includes a display of performance of the entity that includes an indication based on the error rate and the additional error rates.
“10. A non-transitory computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform a process comprising: receiving, by one or more processors, a first transcription of an audio file and a baseline transcription of the audio file, wherein the baseline transcription is used as an accurate transcription of the audio file; vertically aligning, by the one or more processors, portions of the first transcription with a corresponding portion of the baseline transcription, by: assigning a differentiator label to each portion of the first transcription based on a comparison of the portion of the first transcription with the corresponding portion of the baseline transcription; and in response to the first transcription including an inserted portion as compared to the baseline transcription, inserting a section into the baseline transcription corresponding to the inserted portion in the first transcription, such that portions of the first transcription are aligned with the corresponding portions of the baseline transcription; determining, by the one or more processors, an error rate of the first transcription based on determinations of whether each portion, as vertically aligned between the first transcription and the baseline transcription, has a differentiator label corresponding to correct, inserted, deleted, or substituted in the differentiator labels assigned to each portion of the first transcription; and generating a graphical display of performance, wherein the graphical display includes a heat map based on A) the error rate and B) determined word lengths of the first transcription and/or the baseline transcription.
“11. The non-transitory computer-readable storage medium of claim 10, wherein the process further comprises normalizing the first transcription and the baseline transcription by automatically changing words and/or numbers to a standardized spelling or appearance.
“12. The non-transitory computer-readable storage medium of claim 10, wherein the portions of the first transcription are individual words.
“13. The non-transitory computer-readable storage medium of claim 10, wherein the process further comprises: determining that the first transcription has a deleted word, as compared to the baseline transcription; and in response, inserting a second section into the first transcription at a location where the deleted word was deleted from the first transcription.
“14. The non-transitory computer-readable storage medium of claim 10, wherein determining the error rate comprises dividing A) a number of correct portions by B) a sum of at least a number of portions with a label corresponding to inserted, a number of portions with a label corresponding to deleted, and a number of portions with a label corresponding to substituted.
“15. The non-transitory computer-readable storage medium of claim 10, wherein the process further comprises: determining additional error rates for additional transcriptions generated by an entity that generated the first transcription; and generating the graphical display includes a display of performance of the entity that includes an indication based on the error rate and the additional error rates.
“16. A computing system comprising: one or more processors; and one or more memories storing instructions that, when executed by the one or more processors, cause the computing system to perform a process comprising: receiving a first transcription of an audio file and a baseline transcription of the audio file, wherein the baseline transcription is used as an accurate transcription of the audio file; vertically aligning portions of the first transcription with a corresponding portion of the baseline transcription, by: assigning a differentiator label to each portion of the first transcription based on a comparison of the portion of the first transcription with the corresponding portion of the baseline transcription; and in response to the first transcription including an inserted portion as compared to the baseline transcription, inserting a section into the baseline transcription corresponding to the inserted portion in the first transcription, such that portions of the first transcription are aligned with the corresponding portions of the baseline transcription; determining an error rate of the first transcription based on determinations of whether each portion, as vertically aligned between the first transcription and the baseline transcription, has a differentiator label corresponding to correct, inserted, deleted, or substituted in the differentiator labels assigned to each portion of the first transcription; and generating a graphical display of performance, wherein the graphical display includes a heat map based on A) the error rate and B) determined word lengths of the first transcription and/or the baseline transcription.
“17. The computing system of claim 16, wherein the process further comprises normalizing the first transcription and the baseline transcription by automatically changing words and/or numbers to a standardized spelling or appearance.
“18. The computing system of claim 16, wherein determining the error rate comprises dividing A) a number of correct portions by B) a sum of at least a number of portions with a label corresponding to inserted, a number of portions with a label corresponding to deleted, and a number of portions with a label corresponding to substituted.
“19. The computing system of claim 16, wherein the process further comprises: determining additional error rates for additional transcriptions generated by an entity that generated the first transcription; and generating the graphical display includes a display of performance of the entity that includes an indication based on the error rate and the additional error rates.”
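The claims describe a heat map based on the error rate and on determined word lengths, but they do not specify its layout. One possible reading, sketched below in Python, is a grid with one row per vendor and one column per word-count bucket, where each cell holds the mean error rate. The function name `heat_map_grid`, the bucket size, the interpretation of word length as a transcription's word count, and all sample values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (vendor name, word count of the transcription, error rate); values are hypothetical.
Result = Tuple[str, int, float]


def heat_map_grid(
    results: List[Result], bucket_size: int = 100
) -> Dict[str, Dict[int, float]]:
    """Return {vendor: {bucket start: mean error rate}} for plotting."""
    # sums[vendor][bucket] = [running total of error rates, number of results]
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for vendor, word_count, error_rate in results:
        bucket = (word_count // bucket_size) * bucket_size
        cell = sums[vendor][bucket]
        cell[0] += error_rate
        cell[1] += 1
    return {
        vendor: {bucket: total / n for bucket, (total, n) in sorted(cols.items())}
        for vendor, cols in sums.items()
    }


sample = [("A", 50, 0.25), ("A", 80, 0.75), ("A", 150, 0.5), ("B", 120, 0.125)]
print(heat_map_grid(sample))
```

The resulting nested dictionary can be handed to any plotting library to color each cell by its mean error rate.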
For more information, see this patent: Chavez, Carlos. Transcription analysis platform.
(Our reports deliver fact-based news of research and discoveries from around the world.)