Retrieve results
Results at the file/call level
Request
When a processing request has completed successfully, you can proceed to fetch the results. The results are provided as a JSON object and are described in more detail on this page. The endpoint to call is:
```shell
curl --request GET \
     --url https://api.behavioralsignals.com/clients/your-client-id/processes/pid/results \
     --header 'X-Auth-Token: your-api-token' \
     --header 'accept: application/json'
```
The GET results method requires the client ID (long: {cid}) and the process ID (long: {pid}) to be passed as path parameters. On invocation, it returns the result of the processing in JSON format. If the specified cid or pid is not found, or the status of the job is not yet completed, a corresponding error response is returned.
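The same call can be sketched in Python using only the standard library. The helpers below (`results_url`, `fetch_results`) are illustrative, not part of an official SDK; the ids and token are placeholders:

```python
import json
import urllib.request

BASE_URL = "https://api.behavioralsignals.com"

def results_url(cid, pid):
    """Build the GET results endpoint URL for a client/process pair."""
    return f"{BASE_URL}/clients/{cid}/processes/{pid}/results"

def fetch_results(cid, pid, token):
    """Fetch the JSON results of a completed process.

    An HTTPError is raised if cid/pid is unknown or the job is not
    yet completed, mirroring the API's error responses.
    """
    req = urllib.request.Request(
        results_url(cid, pid),
        headers={"X-Auth-Token": token, "accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (placeholders -- substitute real ids and a real token):
# data = fetch_results("your-client-id", 12345, "your-api-token")
```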
Response schema
The response is a JSON object with the following structure:
```json
{
  "pid": 0,
  "cid": 0,
  "code": 0,
  "message": "string",
  "results": [
    {
      "id": "1",
      "startTime": "0.209",
      "endTime": "7.681",
      "task": "<task>",
      "prediction": [
        {
          "label": "<label>",
          "posterior": "0.754",
          "dominantInSegments": [
            0
          ]
        },
        ...
      ],
      "finalLabel": "<label>",
      "level": "utterance",
      "embedding": "[11.614513397216797, -15.228992462158203, -4.92175817489624, ...]"
    }
  ]
}
```
`results` is an array, where each element corresponds to a prediction for a specific task and utterance/segment. The available tasks are the following:
- `diarization`: Contains the speaker label of the utterance, e.g. SPEAKER_00, SPEAKER_01, ... If the `embeddings` query param is defined, the speaker embeddings are also returned.
- `asr`: Contains the verbal content of the utterance.
- `gender`: The sex of the speaker.
- `age`: The age estimation of the speaker.
- `language`: The detected language.
- `emotion`: The detected emotion. Class labels: `happy`, `angry`, `neutral`, `sad`.
- `strength`: The detected arousal of speech. Class labels: `weak`, `neutral`, `strong`.
- `positivity`: The sentiment of speech. Class labels: `negative`, `neutral`, `positive`.
- `speaking_rate`: How fast or slow the speaker talks. Class labels: `slow`, `normal`, `fast`.
- `hesitation`: Whether there are signs of hesitation in speech. Class labels: `no`, `yes`.
- `politeness`: The politeness based on the tone of speech. Class labels: `rude`, `normal`, `polite`.
- `features`: This task is only present when the `embeddings` query param is defined. It contains the behavioral embeddings of the speaker.
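Since each element of `results` carries one task for one utterance/segment, a common first step is to group the final labels per utterance. A minimal sketch over a trimmed sample of the array (the gender label shown is hypothetical; the docs do not list gender class labels):

```python
from collections import defaultdict

# Trimmed sample entries from the "results" array of a response.
results = [
    {"id": "1", "task": "emotion", "finalLabel": "sad", "level": "utterance"},
    {"id": "1", "task": "gender", "finalLabel": "female", "level": "utterance"},
    {"id": "2", "task": "emotion", "finalLabel": "neutral", "level": "utterance"},
]

# Map each utterance id to its {task: finalLabel} pairs.
labels_by_utterance = defaultdict(dict)
for r in results:
    if r["level"] == "utterance":
        labels_by_utterance[r["id"]][r["task"]] = r["finalLabel"]

print(labels_by_utterance["1"])  # {'emotion': 'sad', 'gender': 'female'}
```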
The `id` of each result is used to indicate the utterance/segment id. The `startTime` and `endTime` indicate the start/end of the utterance/segment.
Each result has a `prediction` array. This includes the values of each class for the specific task. For example, in the case of the `emotion` task, an example `prediction` object would be:
"prediction": [
{
"label": "sad",
"posterior": "0.7969",
"dominantInSegments": [
0, 1, 2
]
},
{
"label": "neutral",
"posterior": "0.1931",
"dominantInSegments": [4]
},
{
"label": "happy",
"posterior": "0.007"
},
{
"label": "angry",
"posterior": "0.0029"
}
]
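Note that posteriors arrive as strings, so picking the dominant class out of a `prediction` array needs a float conversion. A minimal sketch over the example above:

```python
prediction = [
    {"label": "sad", "posterior": "0.7969", "dominantInSegments": [0, 1, 2]},
    {"label": "neutral", "posterior": "0.1931", "dominantInSegments": [4]},
    {"label": "happy", "posterior": "0.007"},
    {"label": "angry", "posterior": "0.0029"},
]

# The class with the highest posterior should match the result's finalLabel.
dominant = max(prediction, key=lambda p: float(p["posterior"]))
print(dominant["label"])  # sad
```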
The `posterior` indicates the probability of this class label being present in the utterance/segment. In the case of utterances, `dominantInSegments` indicates the segments in which the label was dominant. In our example, in the first three segments of the utterance the speaker was sad. The `finalLabel` in the result object indicates the dominant class label.
The `level` field indicates whether this result corresponds to a segment or an utterance. An utterance contains 1-N segments and usually corresponds to a speaker turn or sentence. The segment is the smallest unit of speech, corresponding to 2 seconds.
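Because utterance- and segment-level results are mixed in the same array, it can be convenient to partition them on the `level` field. A sketch (the segment ids shown are illustrative, not a documented format):

```python
results = [
    {"id": "1", "task": "emotion", "finalLabel": "sad", "level": "utterance"},
    {"id": "s0", "task": "emotion", "finalLabel": "sad", "level": "segment"},
    {"id": "s1", "task": "emotion", "finalLabel": "neutral", "level": "segment"},
]

# Split results into the two granularities described above.
utterances = [r for r in results if r["level"] == "utterance"]
segments = [r for r in results if r["level"] == "segment"]
```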
The `embedding` field contains the speaker or behavioral embedding. This field is empty for all tasks except two:

- `diarization`: here the `embedding` field corresponds to the speaker embedding
- `features`: the `embedding` field corresponds to the behavioral features

This field is present only when the `embeddings` query param is present.
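As the schema above shows, `embedding` is delivered as a JSON-encoded string rather than a native array, so it needs a second parse. A minimal sketch:

```python
import json

# The embedding value as it appears in the response: a string, not a list.
raw = "[11.614513397216797, -15.228992462158203, -4.92175817489624]"

vector = json.loads(raw)  # decodes to a list of floats
print(len(vector))
```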