Patent Application Titled “System And Method To Represent Conversational Flows As Graph Embeddings And To Conduct Classification And Clustering Based On Such Embeddings” Published Online (USPTO 20220269859): Patent Application
2022 SEP 13 (NewsRx) -- By a
No assignee has been named for this patent application.
Reporters obtained the following quote from the background information supplied by the inventors: “Conversational data is generated by interacting with natural language interfaces such as voice interfaces and chatbots. A designer of the natural language interface cannot easily determine how well a natural language interface will handle a conversation.
“Users of the natural language interfaces have meta data that characterize them, such as their location, age, gender, etc. Further, the natural language interface may collect other conversational meta data, such as an outcome variable for the conversation that characterizes and/or describes the conversation in some way. For example, the outcome variable may be one of: a categorical variable that describes whether the conversation was successful or not, based on some metric; a numerical variable indicative of a length of the conversation, such as a number of times the user interacted with the natural language interface; and an ordinal variable indicative of the user’s indicated satisfaction ranking of the conversation based on rating in the range of one-unsatisfied to five-very satisfied.”
In addition to obtaining background information on this patent application, NewsRx editors also obtained the inventor’s summary information for this patent application: “One aspect of the present embodiments includes the realization that there is a nascent but growing space called conversational analytics with a need for tools that facilitate development of natural language interfaces. The present embodiments solve this problem by providing a conversational analytics toolset that generates reports of summary statistics of popular intents and entities appearing in conversational transcripts and control flow diagrams that are generated to describe the conversations in a graphical representation. Advantageously, the conversational analytics toolset processes conversational data and generates summary statistics reports and graphical representations that allow the developer to see problems with the intents used by the natural language interface and learn how to adjust the intents to improve the quality of the natural language interface.
“Another aspect of the present embodiments includes the realization that a natural language interface could steer a current conversation towards a positive outcome if it knew that the current conversation was likely to have a negative outcome. The present embodiments solve this problem by using graph embedding to identify a previous conversation that is similar to the current conversation and then determining whether that previous conversation had a negative outcome. When the previous conversation had a negative outcome, the natural language interface may be controlled to steer the current conversation towards a more positive outcome.
“Another aspect of the present embodiments includes the realization that when sharing conversational datasets (potentially for research and development analysis purposes with other practitioners or researchers, or to the public for transparency purposes when possible) there are a few challenges that can arise: privacy and anonymization. Textual content of conversations typically includes personally identifying information (PII), which is sensitive private information that a person or small group of people may not wish to disclose, including information that may lead to the person or group of people being identifiable, information about a nature of the conversation, and other sensitive information about the person or group. Advantageously, the embodiments described herein solve this problem by grouping such conversations and ensuring k-anonymity.
“Another aspect of the present embodiments includes the realization that there are increasing regulatory requirements for privacy and personal information disclosure. Existing regulations include the General Data Protection Regulation (GDPR), which is a European regulation implemented in 2018 to enhance EU citizens’ control over the personal data that companies can legally hold, Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule implemented on
“In one embodiment, a method develops a natural language interface. Conversational data including user utterances is received for a plurality of conversations from a natural language interface. Each of the conversations is classified to determine one or more intents for each of the user utterances, and, for each of the conversations, a control flow diagram showing the intents and sequential flow of the conversation is generated. Each of the control flow diagrams is processed to generate a graph embedding representative of the conversation.
“In another embodiment, a method directs a current conversation with a natural language interface. A most recent utterance from a user in the current conversation is received and classified to determine an intent of the user. A current control flow diagram of the current conversation is generated and a current graph embedding is generated for the current control flow diagram. At least one previous graph embedding, nearest to the current graph embedding, is selected and corresponds to at least one previous conversation. A previous outcome of the at least one previous conversation is determined and a predicted outcome of the current conversation is determined based upon the previous outcome. When the predicted outcome is not positive, response outputs of the natural language interface for the current conversation are steered based upon the predicted outcome.
“In another embodiment, a software product has instructions, stored on computer-readable media, wherein the instructions, when executed by a computer, perform steps for natural language interface development. The software product includes instructions for receiving, from a natural language interface and for a plurality of conversations, conversational data including user utterances; instructions for classifying each of the conversations to determine intents for each of the user utterances; instructions for generating, for each of the conversations, a control flow diagram showing the intents and sequential flow of the conversation; and instructions for processing each of the control flow diagrams to generate a graph embedding representative of the conversation.
“In another embodiment, a method ensures k-anonymity in shared conversation datasets. The method includes generating graph embeddings for each of a plurality of conversations from conversational data for N different users; determining at least one cluster of the graph embeddings using a clustering algorithm; determining number K of points in the at least one cluster; and sharing at least part of the conversational data corresponding to the at least one cluster when K is greater than or equal to N.
“In another embodiment, a method provides efficient searching of conversations. The method includes generating graph embeddings for each of a plurality of conversations from conversational data for different users; determining at least one cluster of the graph embeddings using a clustering algorithm; determining a representative conversation of the at least one cluster; storing the representative conversation in a cache; and searching the cache to find the representative conversation based on input parameters.
“In another embodiment, a method identifies change in conversations at a natural language interface. The method includes generating first graph embeddings for each of a plurality of conversations from different users received at the natural language interface during a first period; determining, using a clustering algorithm, first cluster data including at least one cluster of the first graph embeddings; generating second graph embeddings for each of a plurality of conversations from different users received at the natural language interface during a second period; determining, using the clustering algorithm, second cluster data including at least one cluster of the second graph embeddings; and comparing the first cluster data and the second cluster data to detect changes in the conversations over time.”
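As an illustration only, the first two embodiments in the summary (classify utterances into intents, build a control flow diagram, embed it, then look up the nearest previous conversation and its outcome) can be sketched in a few lines of Python. The excerpt does not say which graph embedding technique the inventors use, so this sketch substitutes a deliberately simple stand-in: normalized intent-to-intent transition counts flattened into a vector. The intent names, the `INTENTS` vocabulary, and every function name here are hypothetical, and intent classification is assumed to have already happened.

```python
from collections import Counter
from math import sqrt

# Hypothetical intent vocabulary; a real system would take it from its intent classifier.
INTENTS = ["greet", "ask_balance", "transfer", "complain", "goodbye"]


def control_flow_edges(intent_sequence):
    """Count directed transitions between consecutive intents (the graph's edges)."""
    return Counter(zip(intent_sequence, intent_sequence[1:]))


def embed(intent_sequence):
    """Stand-in embedding: normalized transition counts as a fixed-length vector."""
    edges = control_flow_edges(intent_sequence)
    total = sum(edges.values()) or 1  # avoid division by zero for one-turn conversations
    return [edges[(a, b)] / total for a in INTENTS for b in INTENTS]


def distance(u, v):
    """Euclidean distance between two embeddings."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def predict_outcome(current_sequence, history):
    """Return the outcome of the previous conversation nearest to the current one."""
    current = embed(current_sequence)
    nearest = min(history, key=lambda item: distance(current, embed(item[0])))
    return nearest[1]


# Toy history of (intent sequence, outcome) pairs; the data is invented for the example.
history = [
    (["greet", "ask_balance", "goodbye"], "positive"),
    (["greet", "transfer", "complain", "complain"], "negative"),
]
predicted = predict_outcome(["greet", "transfer", "complain"], history)
print(predicted)
```

In this toy run the in-progress conversation sits closest to the earlier conversation that ended badly, which is the situation in which the summary says the interface's responses would be steered.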
The claims supplied by the inventors are:
“1. A method for ensuring k-anonymity in shared conversation datasets, comprising: generating graph embeddings for each of a plurality of conversations from conversational data for N different users; determining at least one cluster of the graph embeddings using a clustering algorithm; determining number K of points in the at least one cluster; and sharing at least part of the conversational data corresponding to the at least one cluster when K is greater than or equal to N.
“2. The method of claim 1, further comprising extracting at least one representative conversation corresponding to at least one graph embedding within the at least one cluster to form the at least part of the conversational data.
“3. The method of claim 2, the at least one graph embedding corresponding to a centroid of the at least one cluster.
“4. The method of claim 2, further comprising: storing the at least one representative conversation in a cache for fast access; and storing the conversational data in secondary storage having slower access than the cache.
“5. The method of claim 1, the clustering algorithm implementing one or both of k-means and k-medoids.
“6. The method of claim 1, further comprising: determining results by filtering the conversational data based on at least one of a metadata dimension and an outcome variable; and processing the results to generate filtered graph embeddings related to each of the at least one of a metadata dimension and an outcome variable.
“7. The method of claim 6, further comprising: clustering the filtered graph embeddings using the clustering algorithm; identifying clusters having fewer than a threshold value T of graph embeddings; and indicating that the identified clusters require collection of more conversational data corresponding to the at least one of the metadata dimension and/or the outcome variable.
“8. The method of claim 7, further comprising generating an alert to indicate that the corresponding types of conversation are under-represented in conversational data.
“9. The method of claim 1, further comprising displaying, for each cluster, a histogram indicative of at least one attribute of the cluster.
“10. The method of claim 9, the attribute being age of a person having the conversation.
“11. A method for efficient searching of conversations, comprising: generating graph embeddings for each of a plurality of conversations from conversational data for different users; determining at least one cluster of the graph embeddings using a clustering algorithm; determining a representative conversation of the at least one cluster; storing the representative conversation in a cache; and searching the cache to find the representative conversation based on input parameters.
“12. The method of claim 11, wherein the representative conversation derived from the cluster of graph embeddings removes repetitive information.
“13. The method of claim 11, the searching of the cache allowing representative searches to be performed rapidly.
“14. A method for identifying change in conversations at a natural language interface, comprising: generating first graph embeddings for each of a plurality of conversations from different users received at the natural language interface during a first period; determining, using a clustering algorithm, first cluster data including at least one cluster of the first graph embeddings; generating second graph embeddings for each of a plurality of conversations from different users received at the natural language interface during a second period; determining, using the clustering algorithm, second cluster data including at least one cluster of the second graph embeddings; and comparing the first cluster data and the second cluster data to detect changes in the conversations over time.
“15. The method of claim 14, further comprising determining a difference between a first centroid of the at least one cluster of the first cluster data and a second centroid of the at least one cluster of the second cluster data, wherein the difference indicates change in conversations.
“16. The method of claim 14, further comprising detecting new clusters in second cluster data that were not present in the first cluster data, wherein the new clusters are indicative of new types of conversation at the natural language interface.
“17. The method of claim 14, further comprising displaying, for each of the at least one cluster in the first cluster data and the at least one cluster of the second cluster data, a histogram indicative of at least one attribute of the cluster.
“18. The method of claim 17, the attribute being at least one of location, gender, and age range of a person having the conversation.”
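The cluster-size and centroid checks described in claims 1, 7 and 15 can be illustrated with a short Python sketch. It assumes the graph embeddings have already been computed and assigned cluster labels (for example by k-means), and that clusters in the two periods can be matched by label. Those assumptions, the function names, and the toy data are not taken from the application. Claim 1 compares the cluster size K with N, while the sketch takes the comparison value as a plain `threshold` parameter.

```python
from collections import defaultdict
from math import sqrt


def group_by_cluster(embeddings, labels):
    """Group embedding vectors by their (already assigned) cluster label."""
    clusters = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        clusters[label].append(vector)
    return dict(clusters)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def shareable_clusters(clusters, threshold):
    """Labels of clusters whose point count K meets the threshold (cf. claim 1)."""
    return sorted(label for label, points in clusters.items() if len(points) >= threshold)


def under_represented(clusters, t):
    """Labels of clusters with fewer than T embeddings (cf. claim 7)."""
    return sorted(label for label, points in clusters.items() if len(points) < t)


def centroid_shift(first_cluster, second_cluster):
    """Euclidean distance between the centroids of two clusters (cf. claim 15)."""
    a, b = centroid(first_cluster), centroid(second_cluster)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy two-dimensional embeddings for two periods; values are invented for the example.
first_period = group_by_cluster([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0]], ["a", "a", "b"])
second_period = group_by_cluster([[3.0, 5.0], [3.0, 5.0]], ["a", "a"])

share = shareable_clusters(first_period, threshold=2)
sparse = under_represented(first_period, t=2)
shift = centroid_shift(first_period["a"], second_period["a"])
print(share, sparse, shift)
```

A large centroid shift for a matched cluster, or a cluster label present only in the second period, would be read as a change in the kinds of conversation reaching the interface.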
For more information, see this patent application: Topol, Zvi. System And Method To Represent Conversational Flows As Graph Embeddings And To Conduct Classification And Clustering Based On Such Embeddings. Filed
(Our reports deliver fact-based news of research and discoveries from around the world.)