Embeddings

Introduction

Embeddings are numerical representations of data (vectors) that capture the underlying patterns and relationships within the input. In audio analysis, speaker embeddings are compact vectors that encode the distinguishing characteristics of a speaker’s voice, such as pitch, tone, and speaking style. Behavioral embeddings, similarly, represent behavioral traits extracted from the audio, including emotion, engagement, and politeness. Both turn a complex audio signal into a fixed-dimensional vector, which simplifies machine learning tasks such as clustering, classification, and similarity measurement. Because every input maps to the same standardized format, embeddings can also be stored, retrieved, and processed efficiently, which supports a wide range of analytical and predictive applications.
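Similarity measurement between two embeddings is commonly done with cosine similarity. The following is a minimal sketch using only the Python standard library; the function name and the toy three-dimensional vectors are illustrative assumptions, not part of the API, and cosine similarity is a common choice for embeddings rather than a metric this documentation prescribes.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate unrelated directions. Any decision threshold would need to be tuned on your own data.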

Common use cases for embeddings include speaker recognition, where embeddings help identify or verify a speaker’s identity across different recordings, even in varying acoustic environments. In customer service, embeddings can be used to analyze and improve interactions by identifying behavioral patterns and tailoring responses accordingly. For instance, recognizing when a customer is frustrated can prompt a system to escalate the call to a human representative. In media and content analysis, embeddings assist in indexing and retrieving audio segments based on speaker characteristics or emotional content, enhancing search capabilities. Additionally, embeddings enable advanced analytics, such as detecting trends in customer sentiment or engagement over time, which can be invaluable for market research and business intelligence.

How to retrieve embeddings from the API

Our API offers two types of embeddings, each calculated per utterance:

  • Speaker Embedding: A 1024-dimensional embedding that corresponds to the speaker's tone and unique characteristics of voice. It can be used for speaker identification or verification.
  • Behavioral Embedding: A 1024-dimensional embedding that encapsulates the behavioral characteristics of speech. These include the emotion, positivity, and strength of the utterance.
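Because embeddings are returned per utterance, a speaker-verification workflow may need to combine several utterance-level speaker embeddings into one profile per recording. The sketch below shows one possible approach, mean pooling followed by unit-length scaling. The function names, the way embeddings are passed in as plain lists of floats, and the 0.7 threshold are all hypothetical; none of them come from the API.

```python
import math

EMBEDDING_DIM = 1024  # dimensionality stated above for both embedding types


def speaker_profile(utterance_embeddings, dim=EMBEDDING_DIM):
    """Average per-utterance embeddings and scale the result to unit length."""
    if not utterance_embeddings:
        raise ValueError("need at least one utterance embedding")
    for emb in utterance_embeddings:
        if len(emb) != dim:
            raise ValueError(f"expected {dim} values, got {len(emb)}")
    count = len(utterance_embeddings)
    # zip(*...) groups the i-th component of every utterance together.
    mean = [sum(values) / count for values in zip(*utterance_embeddings)]
    norm = math.sqrt(sum(v * v for v in mean))
    if norm == 0.0:
        raise ValueError("mean embedding is the zero vector")
    return [v / norm for v in mean]


def same_speaker(profile_a, profile_b, threshold=0.7):
    """Compare two unit-length profiles.

    The dot product of unit-length vectors equals their cosine similarity.
    The default threshold is a placeholder, not a recommended value.
    """
    score = sum(x * y for x, y in zip(profile_a, profile_b))
    return score >= threshold, score
```

Mean pooling is only one way to aggregate. Comparing utterances individually, or weighting them by duration, are equally plausible designs depending on the application.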

For embeddings to be included in the response, set the embeddings parameter to true in the submit audio request. The example below sends it as a form field:

curl --request POST \
     --url https://api.behavioralsignals.com/clients/your-client-id/processes/audio \
     --header 'X-Auth-Token: your-api-token' \
     --header 'accept: application/json' \
     --header 'content-type: multipart/form-data' \
     --form name=my-awesome-audio \
     --form embeddings=true \
     --form 'meta={"key": "value"}'