CyberVox API
1.0.0

Cybervox is the CyberLabs voice solution platform. Using Deep Learning techniques, we develop voice solutions in Brazilian Portuguese that are state of the art in Artificial Intelligence.

For a working example, follow the links below.

Example code

This is the documentation for version 1.0.0 of the API. Last update on Apr 28, 2021.

Base URL
https://api.cybervox.ai

Authentication

Request

curl --request POST \
  --url https://api.cybervox.ai/auth \
  --header 'content-type: application/json' \
  --data '{"client_id":"(( your client id ))","client_secret":"(( your secret ))","audience":"https://api.cybervox.ai","grant_type":"client_credentials"}'

Response

{
  "access_token": "(( jwt token ))",
  "token_type": "Bearer"
}

Check the JWT's expiration time (the exp claim) and request a new token only when necessary.
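A minimal sketch of that expiration check, assuming the access_token is a standard three-segment JWT whose payload carries a numeric exp claim. The helper names jwt_expiry and needs_refresh and the 60-second safety margin are illustrative choices, not part of the API. The signature is not verified here; the code only reads the claim.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the exp claim (seconds since the epoch) from a JWT payload.

    The signature is NOT verified; this only reads the claim so the client
    can decide whether a new token is needed.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(claims["exp"])


def needs_refresh(token: str, now: Optional[float] = None, margin: int = 60) -> bool:
    """True when the token expires within `margin` seconds of `now`."""
    if now is None:
        now = time.time()
    return jwt_expiry(token) - now <= margin
```

A client would call needs_refresh(token) before each connection and go back to /auth only when it returns True.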

Call the API

curl --request GET \
  --url 'https://api.cybervox.ai/.../?access_token=(( jwt token ))'
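As a sketch of how a client might attach the token, the helper below appends access_token as a query parameter and derives the wss:// address used by the WebSocket examples in this document. The function names are hypothetical; only the base URL, the /ws path, and the access_token parameter come from this documentation.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://api.cybervox.ai"


def with_access_token(path: str, token: str, base_url: str = BASE_URL) -> str:
    """Build an API URL that carries the JWT in the access_token parameter."""
    query = urlencode({"access_token": token})
    return f"{urljoin(base_url, path)}?{query}"


def websocket_url(token: str) -> str:
    """Same URL for the /ws endpoint, using the wss:// scheme."""
    http_url = with_access_token("/ws", token)
    return "wss://" + http_url[len("https://"):]
```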

Vox_fala (tts)

This API converts a given text block (the request's payload/text field) into an audio wave file (the response's payload/audio_url field).

Text-to-speech

Note that this is a WebSocket API (wss://). After connecting to the WebSocket, send the request described below.

Request
  • payload/text (string, mandatory) the text to be converted to audio.
  • payload/voice (string, optional) voice used to generate the audio.
  • payload/timestamp (number, optional) the timestamp will be sent back in the response, if provided (for your benchmarking).
Response
  • payload/success (boolean, mandatory) if true, returns audio_url; if false, returns reason.
  • payload/reason (string, if success == false) the failure reason.
  • payload/audio_url (string, if success == true) the audio url.
  • payload/timestamp (number, if timestamp was received) the same timestamp sent in request, for benchmarking purposes.
Notes

To listen to the generated audio, use the audio_url returned in the response's payload:

  • the audio_url is the URL from which the generated speech can be downloaded
  • the audio_url is a self-contained path; concatenate it with the API base URL: https://api.cybervox.ai + (( audio_url ))
  • the audio_url uses the play API documented below (but you don't need to know about it)
  • the audio_url does not need an access token
  • the audio_url is valid for 1 minute after it is returned in the response
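The notes above can be sketched as two small helpers: one that concatenates the base URL with the returned audio_url, and one that estimates whether the 1-minute validity window has passed. The function names and the idea of tracking the receive time on the client are assumptions for illustration. The server decides when a URL actually expires, so treat the second helper as a hint only.

```python
import time
from typing import Optional

API_URL = "https://api.cybervox.ai"
AUDIO_URL_TTL_SECONDS = 60  # "valid for 1 minute" per the notes


def download_url(audio_url: str) -> str:
    """Concatenate the API base URL with the audio_url from the response."""
    return API_URL + audio_url


def is_probably_expired(received_at: float, now: Optional[float] = None) -> bool:
    """Client-side estimate: has a minute passed since the response arrived?"""
    if now is None:
        now = time.time()
    return now - received_at >= AUDIO_URL_TTL_SECONDS
```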

Request

{
  "emit": "tts",
  "payload": {
    "text": (( string: text to be converted )),
    "voice": (( string: voice used to generate the audio )),
    "timestamp": (( number: Date.now() ))
  }
}

Response

{
  "event": "tts",
  "payload": {
    "success": (( bool )),
    "reason": (( string )),
    "audio_url": (( string )),
    "timestamp": (( number ))
  }
}
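A minimal sketch of building the tts request and reading its response, based only on the field descriptions in this section. The function names and the choice to raise RuntimeError on failure are illustrative assumptions. Sending and receiving the messages is left to whatever WebSocket client you use.

```python
import json
from typing import Optional


def build_tts_request(text: str, voice: Optional[str] = None,
                      timestamp: Optional[int] = None) -> str:
    """Serialize a tts request; optional fields are omitted when not given."""
    payload = {"text": text}
    if voice is not None:
        payload["voice"] = voice
    if timestamp is not None:
        payload["timestamp"] = timestamp
    return json.dumps({"emit": "tts", "payload": payload}, ensure_ascii=False)


def parse_tts_response(raw: str) -> str:
    """Return audio_url on success, raise with the reported reason otherwise."""
    message = json.loads(raw)
    if message.get("event") != "tts":
        raise ValueError("unexpected event: %r" % message.get("event"))
    payload = message["payload"]
    if not payload["success"]:
        raise RuntimeError(payload.get("reason") or "tts failed")
    return payload["audio_url"]
```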

Example

echo '{"emit":"tts","payload":{"text":"ola mundo"}}' | \
  websocat -n1 'wss://api.cybervox.ai/ws?access_token=(( jwt token ))'

> {"event":"tts","payload":{"success":true,"reason":"","audio_url":"/play/(( upload_id ))"}}

Upload

This API stores a given binary file (the binary payload) and returns its UUID (the response's payload/upload_id field).

Upload audio file

Note that this is a WebSocket API (wss://). After connecting to the WebSocket, send the request described below.

Request
  • payload/max_uploads (number, mandatory) the number of files to be uploaded (should always be 1 for vox_txto and at least 4 for vox_id).
  • payload/timestamp (number, optional) the timestamp will be sent back in the response, if provided (for your benchmarking).
Response
  • payload/upload_id (string, mandatory) the upload_id to be used in the stt or voxid call.
  • payload/timestamp (number, if timestamp was received) the same timestamp sent in request, for benchmarking purposes.
Notes
  • Once the audio file is uploaded, you can request its transcription (stt) or a speaker comparison (voxid).

Request

{
  "emit": "upload",
  "payload": {
    "max_uploads": 1,
    "timestamp": (( number: Date.now() ))
  }
}

... followed by the audio file contents (websocket binary message)

Response

{
  "event": "upload",
  "payload": {
    "upload_id": (( string )),
    "timestamp": (( number ))
  }
}

Example

websocat, the tool used for the other examples, does not support sending a text message followed by a binary message, so no command-line example is provided here.
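The text-then-binary sequence can still be sketched in Python. The helper below takes a hypothetical send callable that is assumed to transmit str values as WebSocket text messages and bytes values as binary messages. How that callable is obtained depends on the WebSocket client library you choose, which this documentation does not prescribe.

```python
import json
from typing import Callable, Optional, Union

Frame = Union[str, bytes]


def send_upload(send: Callable[[Frame], None], audio: bytes,
                timestamp: Optional[int] = None) -> None:
    """Send the upload request followed by the audio file contents."""
    payload = {"max_uploads": 1}
    if timestamp is not None:
        payload["timestamp"] = timestamp
    # First message: the JSON request, sent as a text message.
    send(json.dumps({"emit": "upload", "payload": payload}))
    # Second message: the raw file contents, sent as a binary message.
    send(audio)
```

The server is then expected to answer with the upload event carrying the upload_id.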

Vox_txto (stt)

This API transcribes a previously uploaded audio file (referenced by the request's payload/upload_id field) and returns the transcribed text (the response's payload/text field).

Speech-to-text

Note that this is a WebSocket API (wss://). After connecting to the WebSocket, send the request described below.

Request
  • payload/upload_id (string, mandatory) the upload_id received from upload action.
  • payload/timestamp (number, optional) the timestamp will be sent back in the response, if provided (for your benchmarking).
Response
  • payload/success (boolean, mandatory) if true, returns text; if false, returns reason.
  • payload/reason (string, if success == false) the failure reason.
  • payload/text (string, if success == true) the transcribed audio.
  • payload/timestamp (number, if timestamp was received) the same timestamp sent in request, for benchmarking purposes.

Request

{
  "emit": "stt",
  "payload": {
    "upload_id": (( string: upload_id received from upload action )),
    "timestamp": (( number: Date.now() ))
  }
}

Response

{
  "event": "stt",
  "payload": {
    "success": (( bool )),
    "reason": (( string )),
    "text": (( string )),
    "timestamp": (( number ))
  }
}
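The optional timestamp field exists for benchmarking, so a client can use it to measure round-trip latency. The sketch below assumes the timestamp is sent in milliseconds, as Date.now() would produce. The function names and the returned (text, round_trip_ms) pair are illustrative and not part of the API.

```python
import json
import time
from typing import Optional, Tuple


def now_ms() -> int:
    """Milliseconds since the epoch, comparable to JavaScript's Date.now()."""
    return int(time.time() * 1000)


def build_stt_request(upload_id: str, timestamp: Optional[int] = None) -> str:
    """Serialize an stt request for a previously uploaded file."""
    payload = {"upload_id": upload_id}
    if timestamp is not None:
        payload["timestamp"] = timestamp
    return json.dumps({"emit": "stt", "payload": payload})


def read_stt_response(raw: str,
                      received_ms: Optional[int] = None) -> Tuple[str, Optional[int]]:
    """Return (text, round_trip_ms); round_trip_ms is None without a timestamp."""
    payload = json.loads(raw)["payload"]
    if not payload["success"]:
        raise RuntimeError(payload.get("reason") or "stt failed")
    sent_ms = payload.get("timestamp")
    if sent_ms is None:
        return payload["text"], None
    if received_ms is None:
        received_ms = now_ms()
    return payload["text"], received_ms - sent_ms
```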

Example

echo '{"emit":"stt","payload":{"upload_id":"..."}}' | \
  websocat -n1 'wss://api.cybervox.ai/ws?access_token=(( jwt token ))'

> {"event":"stt","payload":{"success":true,"reason":"","text":"..."}}

Vox_id (dna)

This API calculates the similarity score (the response's payload/score field) between voice files (the request's payload/upload_id field).

You should upload at least 4 audio files using the upload action.
The first audio of the upload batch will be compared with the remaining ones assuming they are from the same person.

VoxID

Note that this is a WebSocket API (wss://). After connecting to the WebSocket, send the request described below.

Request
  • payload/upload_id (string, mandatory) the upload_id received from upload action (with at least 4 audio/voice files).
  • payload/timestamp (number, optional) the timestamp will be sent back in the response, if provided (for your benchmarking).
Response
  • payload/success (boolean, mandatory) if true, returns score; if false, returns reason.
  • payload/reason (string, if success == false) the failure reason.
  • payload/score (float, if success == true) the similarity score.
  • payload/timestamp (number, if timestamp was received) the same timestamp sent in request, for benchmarking purposes.
Notes
  • TODO explain score scale

Request

{
  "emit": "voxid",
  "payload": {
    "upload_id": (( string: upload_id received from upload action )),
    "timestamp": (( number: Date.now() ))
  }
}

Response

{
  "event": "voxid",
  "payload": {
    "success": (( bool )),
    "reason": (( string )),
    "score": (( float )),
    "timestamp": (( number ))
  }
}
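A sketch of preparing a voxid batch, enforcing the minimum of 4 audio files stated above. It assumes, without confirmation from this documentation, that a single upload request with max_uploads set to the batch size is followed by one binary message per file, in order, with the file to be compared sent first. The function names are hypothetical.

```python
import json
from typing import List, Sequence, Union

MIN_VOXID_FILES = 4  # "at least 4 audio files" per this section


def voxid_upload_frames(audio_files: Sequence[bytes]) -> List[Union[str, bytes]]:
    """Return the messages to send: one text request, then each file.

    Assumption: each file travels as its own binary message, in order.
    """
    if len(audio_files) < MIN_VOXID_FILES:
        raise ValueError("vox_id needs at least %d audio files" % MIN_VOXID_FILES)
    control = json.dumps(
        {"emit": "upload", "payload": {"max_uploads": len(audio_files)}}
    )
    return [control, *audio_files]


def build_voxid_request(upload_id: str) -> str:
    """Serialize the voxid request for the upload_id returned by upload."""
    return json.dumps({"emit": "voxid", "payload": {"upload_id": upload_id}})
```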

Example

echo '{"emit":"voxid","payload":{"upload_id":"..."}}' | \
  websocat -n1 'wss://api.cybervox.ai/ws?access_token=(( jwt token ))'

> {"event":"voxid","payload":{"success":true,"reason":"","score":"..."}}

Retrieves JWT token

POST /auth

Given a client_id and a client_secret, returns the JWT access_token.

Body

The credentials to validate.

Responses
  • 200

    OK

  • 401

    Unauthorized

  • 500

    Error retrieving jwt access token

POST /auth
curl \
 -X POST https://api.cybervox.ai/auth \
 -H 'content-type: application/json' \
 -d '{"client_id":"string","client_secret":"string","audience":"string","grant_type":"string"}'
Request example
{
  "client_id": "string",
  "client_secret": "string",
  "audience": "string",
  "grant_type": "string"
}
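The same request can be sketched with Python's standard library. The audience and grant_type values are taken from the Authentication example earlier in this document. The function names are illustrative, and error handling is left to the caller: urlopen raises urllib.error.HTTPError for the 401 and 500 responses.

```python
import json
import urllib.request

AUTH_URL = "https://api.cybervox.ai/auth"
AUDIENCE = "https://api.cybervox.ai"


def build_auth_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare the POST /auth request without sending it."""
    body = json.dumps({
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": AUDIENCE,
        "grant_type": "client_credentials",
    }).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the access_token from the JSON response."""
    with urllib.request.urlopen(build_auth_request(client_id, client_secret)) as response:
        return json.load(response)["access_token"]
```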

Health check.

GET /hc

Empty 200 response to check if service is available.
This API should not be directly called.
It's meant to be used by internal tooling.

Responses
  • 200

    OK (empty body)

GET /hc
curl \
 -X GET https://api.cybervox.ai/hc

Downloads audio.

GET /play/{audio_uuid}

Given an audio uuid, returns the audio download url.

Path parameters
  • audio_uuid Required / string

    The audio uuid returned by the websocket.

Responses
  • 200 file

    OK

  • 404

    Audio not found or expired.

GET /play/{audio_uuid}
curl \
 -X GET https://api.cybervox.ai/play/{audio_uuid}
Response example (200)
"string"

Upgrades connection to websocket.

GET /ws

The WebSocket protocol is documented above.

Responses
  • 101

    Switching Protocols

  • 200

    OK

GET /ws
curl \
 -X GET https://api.cybervox.ai/ws