Making A Text2Speech Application In Android Using Amazon Web Services (AWS Polly)

A text-to-speech application converts text into lifelike speech. In this project, we will use the Android Studio IDE for our front end and Amazon Web Services (AWS) as our back end. AWS provides many services, including AWS Polly, which we will be using in this project. AWS Polly uses deep learning technologies to synthesize natural-sounding human speech.

In this article, we will be focusing on how to use AWS Polly and integrate it with our frontend.

AWS services used in this project:

AWS Polly: To convert text to life-like speech.
AWS Lambda Function: To receive text from the front end and provide it to AWS Polly.
S3 Bucket: To store the generated .mp3 file.
Amazon API Gateway: To call the lambda function with text.

Note: All of these services will be used within free-tier limits.

Here is the architecture of the project:

Architecture

Note: This tutorial assumes you have basic knowledge of AWS and know how to create and use its services.

First Step

The first step is to create an S3 bucket so that our Lambda function can store the .mp3 file generated by AWS Polly.

Once the bucket is created, we can move on to building our backend: the AWS Lambda function.

AWS Lambda Function

The AWS Lambda function is a serverless function that handles all communication between the application and the other AWS services.

The Lambda function has three jobs: it calls AWS Polly to synthesize the speech, stores the resulting .mp3 file in the S3 bucket, and returns a temporary URL for that file to our application.

It can be done like this:

import boto3
import string
import random
import json

def lambda_handler(event, context):
    # Getting the text and voice ID as input from the API call
    text = event['text']
    speaker = event['speaker']

    # Making variables to use AWS Polly and S3 Bucket
    polly = boto3.client('polly')
    s3 = boto3.client('s3')
    
    # Generating a random string to name our .mp3 file
    # Doing this so that if 2 people make simultaneous API calls, the .mp3 files will not overwrite each other and give wrong output
    random_string = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
    MP3_Name = f"{random_string}.mp3"
    print(MP3_Name)
    try:
        # Synthesize speech using Polly
        response = polly.synthesize_speech(Text=text, OutputFormat='mp3', VoiceId=speaker)

        # Store the audio in the S3 bucket
        s3.put_object(
            Bucket='Your-Bucket-Name',
            Key=MP3_Name, 
            Body=response['AudioStream'].read()
        )


        # Generate a presigned temporary URL for the audio file
        presigned_url = s3.generate_presigned_url(
            ClientMethod='get_object',
            Params={'Bucket': 'Your-Bucket-Name', 'Key': MP3_Name},
            ExpiresIn=3600 
        )

        return {
            'statusCode': 200,
            'body': presigned_url
        }

    except Exception as e:
        print(f"Error: {e}")
        return {
            'statusCode': 500,
            'body': 'Error processing text-to-speech'
        }

Remember to give your Lambda function the necessary permissions. For simplicity, we can attach the AmazonPollyFullAccess and AmazonS3FullAccess managed policies to the Lambda function's execution role.
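As an alternative to full-access permissions, the execution role can be limited to the three actions the handler actually uses. The following is a minimal sketch of such a policy document, built as a Python dictionary so it can be printed as JSON and pasted into the IAM console. The bucket name is the placeholder from the Lambda code, and the helper name build_lambda_policy is hypothetical. It assumes the role already has the usual basic execution permissions for writing logs.

```python
import json

# Placeholder bucket name taken from the Lambda code; replace with your own.
BUCKET_NAME = "Your-Bucket-Name"


def build_lambda_policy(bucket_name: str) -> dict:
    """Return an IAM policy document covering only the calls the handler makes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Needed for polly.synthesize_speech(...)
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech"],
                "Resource": "*",
            },
            {
                # PutObject stores the .mp3; GetObject is needed so the
                # presigned URL signed by this role can download it.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


print(json.dumps(build_lambda_policy(BUCKET_NAME), indent=2))
```

A presigned URL carries the permissions of whoever signed it, which is why s3:GetObject appears in the policy even though the handler never downloads the file itself.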

Time to test the Lambda function. Configure the lambda test event as shown in the image below.

test event configuration.
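Since the handler reads event['text'] and event['speaker'], a test event only needs those two keys. The sketch below prints one such event as JSON; the sample sentence is made up, and "Joanna" is one of the voices offered in the Android app later in this article.

```python
import json

# The two keys the Lambda handler reads from the incoming event.
test_event = {
    "text": "Hello from AWS Polly",  # sample sentence, chosen for illustration
    "speaker": "Joanna",             # a Polly voice ID
}

# Paste the printed JSON into the Lambda console's test event editor.
print(json.dumps(test_event, indent=2))
```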

Your Lambda Function output must look something like this:

Lambda Function Output

And if you check your S3 Bucket you will notice a .mp3 file with a random 5-character name, as we have programmed in the lambda function code.
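Five random characters make a name clash unlikely but not impossible. If that matters for your use case, one option is to derive the object key from a UUID instead. This is a sketch of an alternative, not what the Lambda code above does, and make_object_key is a hypothetical helper name.

```python
import uuid


def make_object_key(extension: str = "mp3") -> str:
    """Return an object key that is, for practical purposes, unique."""
    # uuid4() is random; .hex gives 32 hexadecimal characters without dashes.
    return f"{uuid.uuid4().hex}.{extension}"


key = make_object_key()
print(key)
```

Inside the handler, the result would take the place of MP3_Name.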

API Gateway

Now, it’s time to make an API gateway using Amazon API Gateway and direct it to the lambda function we created.

Build a new REST API and make a resource ‘/Speech’. This will be the path we need to call our API on. Enable CORS. Create a POST method and direct it to the Lambda Function you made previously.

It should look something like this.

Deploy your API: create a new stage, say ‘dev’, and deploy to it. This will generate an API URL which will be used to call the API.
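Before moving to Android, it can help to check the endpoint from a script. The sketch below builds the same POST request the app will send, using only the Python standard library. The URL is a made-up example of the usual invoke URL shape (API ID, region, stage, resource); replace it with the one API Gateway generated for you. The actual network call is left commented out so the snippet runs without a deployed API.

```python
import json
import urllib.request

# Hypothetical invoke URL: the API ID "abc123" and the region are placeholders.
API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/dev/Speech"


def build_request(text: str, speaker: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the JSON body the Lambda function expects."""
    payload = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Hello from AWS Polly", "Joanna")
print(request.get_method(), request.full_url)

# To actually call the API, uncomment the lines below:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.load(resp))
```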

Android Application

To make the frontend, we will be using Android Studio IDE and Java as the programming language.

Create a new Android project with an Empty Activity. First, we will add the required dependencies and permissions to the application. For making API calls we will be using Volley. Head to build.gradle (Module: app) and add this dependency.

implementation 'com.android.volley:volley:1.2.1'

Sync your Gradle file and then head to AndroidManifest.xml. Here we need to add the Internet permission so that the application can access the internet and call the API. Add the following line to your AndroidManifest.xml file:

<uses-permission android:name="android.permission.INTERNET"/>

In this tutorial I will only be explaining the Java code.

Summary of the Java code:

When the ‘Convert’ button is pressed, the program reads the text and the VoiceID you selected. If the text is empty or no speaker is selected, it asks the user to provide both.
If everything is fine, a new Volley request object is created with the API URL and a JSON string as input. The JSON string contains the text and VoiceID, in the same format we used to test the Lambda function.
Once the request object is created, it is added to the request queue, which calls the API. If everything goes as planned, the API returns a JSON string with a status code of 200 and a pre-signed temporary URL pointing to the .mp3 file saved in the S3 bucket.
The application receives the temporary URL and attempts to play the audio.
If everything goes fine, the audio is played.

Note: When a Lambda function is called for the first time after being idle for a while, it goes through a ‘Cold Start’, in which AWS first has to set up the execution environment and load the code, which adds to the delay. This only happens on the first call after a long idle period. After the cold start the function is already in memory, so later calls respond with much less delay.
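The response handling described in these steps can be sketched in a few lines of Python. It assumes the API returns the Lambda function's return value as-is, that is, a JSON object with statusCode and body keys, which is what the Java code in this article reads. The function name extract_audio_url and the sample URL are hypothetical.

```python
import json


def extract_audio_url(response_text: str) -> str:
    """Return the presigned URL from an API response, or raise on failure."""
    response = json.loads(response_text)
    if response.get("statusCode") != 200:
        # The Lambda function returns statusCode 500 with an error message in body.
        raise RuntimeError(f"API error: {response.get('body')}")
    return response["body"]


# Sample success response with a made-up URL.
ok = json.dumps({"statusCode": 200, "body": "https://example.com/abc12.mp3"})
print(extract_audio_url(ok))
```

The Android code treats any status other than 500 as success, while this sketch accepts only 200, which is a slightly stricter choice.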

package com.example.text2speech;

import androidx.appcompat.app.AppCompatActivity;

import android.os.Bundle;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ArrayAdapter;
import android.widget.Button;
import android.widget.EditText;
import android.media.MediaPlayer;
import android.widget.Spinner;
import android.widget.Toast;

import com.android.volley.Request;
import com.android.volley.RequestQueue;
import com.android.volley.Response;
import com.android.volley.VolleyError;
import com.android.volley.toolbox.JsonObjectRequest;
import com.android.volley.toolbox.Volley;

import org.json.JSONException;

import org.json.JSONObject;

public class MainActivity extends AppCompatActivity {
    // Declaring all the variables
    EditText text;
    Button convert;
    MediaPlayer mediaPlayer;
    private RequestQueue requestQueue;
    Spinner dropDown;
    private String Text = "";
    private String Speaker = "";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        text = findViewById(R.id.text);
        convert = findViewById(R.id.convertBTN);
        dropDown = findViewById(R.id.dropDown);
        requestQueue = Volley.newRequestQueue(this);
        
        // This is for the drop-down menu made using Spinner in XML
        String[] speaker = new String[]{"Select Speaker", "Joanna", "Joey"};
        ArrayAdapter<String> adapter = new ArrayAdapter<>(this, android.R.layout.simple_spinner_dropdown_item, speaker);
        dropDown.setAdapter(adapter);

        convert.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                Text = text.getText().toString();
                
                // If the text is empty or no speaker is selected, tell the user to add text and select a speaker first
                if(Speaker.equals("") || Text.equals("") || Speaker.equals("Select Speaker")){
                    Toast.makeText(MainActivity.this, "Please enter text and select a speaker", Toast.LENGTH_SHORT).show();
                }
                else {
                    CallAPI(Text, Speaker);
                }
            }
        });
        
        dropDown.setOnItemSelectedListener(new AdapterView.OnItemSelectedListener() {
            @Override
            public void onItemSelected(AdapterView<?> parent, View view, int position, long id) {
                Speaker = parent.getItemAtPosition(position).toString();
            }

            @Override
            public void onNothingSelected(AdapterView<?> parent) {

            }
        });
    }
    
    // Calling the API
    void CallAPI(String Text, String speaker){
        // Please paste your API URL here
        String apiUrl = "secret";
        
        // Creating a JSON String to pass into the API.
        JSONObject jsonBody = new JSONObject();
        try {
            jsonBody.put("text", Text); // Adding the text we want to convert to speech
            jsonBody.put("speaker", speaker); // Adding the Voice ID
        } catch (JSONException e) {
            e.printStackTrace();
        }
        
        // Creating a request object with JSON String as Input
        JsonObjectRequest request = new JsonObjectRequest(Request.Method.POST, apiUrl, jsonBody, new Response.Listener<JSONObject>() {
            @Override
            public void onResponse(JSONObject response) {
                try {
                    String statusCode = response.getString("statusCode");
                    if(statusCode.equals("500")){
                        Toast.makeText(MainActivity.this, "Error while using API", Toast.LENGTH_SHORT).show();
                    }
                    else {
                        // If StatusCode = 200, we will extract the temporary URL from the response body. 
                        String presignedUrl = response.getString("body");
                        // Using the temporary url to play the speech
                        playAudio(presignedUrl);
                    }
                } catch (JSONException e) {
                    e.printStackTrace();
                }
            }
        }, new Response.ErrorListener() {
                @Override
                public void onErrorResponse(VolleyError error) {
                    System.out.println(error.toString());
                }
            }
        );
        // Actually calling the API from here.
        requestQueue.add(request);
    }

    private void playAudio(String presignedUrl) {
        if (mediaPlayer != null) {
            mediaPlayer.release();
        }
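        // Creating a MediaPlayer object and assigning it to play the .mp3 file from the temporary URL.
        mediaPlayer = new MediaPlayer();
        try {
            mediaPlayer.setDataSource(presignedUrl);
            // Start playback once the stream is ready
            mediaPlayer.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {
                @Override
                public void onPrepared(MediaPlayer mp) {
                    mp.start();
                }
            });
            // prepareAsync() buffers in the background; prepare() would block the UI thread while downloading
            mediaPlayer.prepareAsync();
        } catch (Exception e) {
            e.printStackTrace();
        }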
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        if (mediaPlayer != null) {
            mediaPlayer.release();
        }
    }
}

Output

Application Interface

Conclusion

AWS Polly is a powerful deep learning tool for converting text into lifelike speech. In this article, we learned how to combine several AWS services and integrate them with a frontend technology to create a working full-stack application.

Text2Speech application in Android using AWS Polly – FAQs

Do I have to pay AWS for making this project?

Everything done in this project falls under the AWS Free Tier, which provides some services free of charge for 12 months for learning purposes. Delete the resources you created before your free tier ends to avoid being charged.

How is the Android Application able to communicate with AWS Polly?

We are using Amazon API Gateway and AWS Lambda as the means of communication between the Android application and AWS Polly.

Does AWS Polly support multiple languages?

Yes, AWS Polly supports many languages besides English. To get output in another language, the application needs to send the voice ID of a voice that speaks that language (and, for bilingual voices, a language code), and the Lambda function must pass these on to AWS Polly.
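As a rough sketch of what passing a language through could look like on the Lambda side, the helper below assembles the keyword arguments for polly.synthesize_speech. LanguageCode is an optional parameter of that call and is mainly relevant for bilingual voices such as Aditi, which speaks both Indian English (en-IN) and Hindi (hi-IN). The helper name build_synthesis_params is hypothetical, and sending a language code from the app is an assumption that goes beyond the code in this article.

```python
from typing import Optional


def build_synthesis_params(text: str, voice_id: str,
                           language_code: Optional[str] = None) -> dict:
    """Assemble keyword arguments for polly.synthesize_speech(**params)."""
    params = {"Text": text, "OutputFormat": "mp3", "VoiceId": voice_id}
    if language_code:
        # Only needed when the chosen voice supports more than one language.
        params["LanguageCode"] = language_code
    return params


# Hindi output using the bilingual voice Aditi.
params = build_synthesis_params("नमस्ते", "Aditi", language_code="hi-IN")
print(params)

# In the Lambda handler this would be used as:
# response = polly.synthesize_speech(**params)
```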
