Before we started working with React at e.GO Mobile, we used Vue.

This time I will show you how to quickly build a nice-looking Single Page Application with Vuetify that converts text to audio (in other words: reads the text aloud), using OpenAI’s text-to-speech API.

As usual, there is an example project on GitHub that you can try out:

(Screenshot of the example application)

The backend is written in Go, with support for automatic restarts on code changes.

Requirements

You need an installation of Docker with Docker Compose to run both the frontend and the backend at the same time via docker-compose.yml.

Everything else is provided and handled inside the containers.

Backend

The Go backend uses FastHTTP and the compatible fasthttprouter for handling requests, with optional CORS support.
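The exact wiring in the example repository may differ, but a minimal sketch of the server setup with a simple CORS wrapper could look like this (port 4000 matches the docker-compose mapping below):

package main

import (
	"log"

	"github.com/buaazp/fasthttprouter"
	"github.com/valyala/fasthttp"
)

// withCORS answers preflight requests and allows the frontend
// (running on another port) to call the API
func withCORS(next fasthttp.RequestHandler) fasthttp.RequestHandler {
	return func(ctx *fasthttp.RequestCtx) {
		ctx.Response.Header.Set("Access-Control-Allow-Origin", "*")
		ctx.Response.Header.Set("Access-Control-Allow-Headers", "Content-Type")
		ctx.Response.Header.Set("Access-Control-Allow-Methods", "POST, OPTIONS")

		if string(ctx.Method()) == "OPTIONS" {
			ctx.SetStatusCode(fasthttp.StatusNoContent)
			return
		}

		next(ctx)
	}
}

func main() {
	router := fasthttprouter.New()

	// the POST handler shown further below is registered here:
	// router.POST("/", ...)

	log.Fatal(fasthttp.ListenAndServe(":4000", withCORS(router.Handler)))
}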

Automatic restart of the backend on code changes is not supported out of the box. For this, I had to set up the backend service in docker-compose.yml with Air by Rick Yu:

  backend:
    image: cosmtrek/air
    working_dir: /go/src/app
    env_file:
      - ./.env
      - ./backend/.env
      - ./backend/.env.local
    environment:
      - GO111MODULE=on
    ports:
      - 4000:4000
    volumes:
      - ./backend:/go/src/app

Before anything will work, you have to update the .env.local file in the backend subfolder with a valid OpenAI API key by adding an OPENAI_API_KEY entry (OPENAI_API_KEY=<your API key>).
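Inside the Go code, this key ends up in the OPENAI_API_KEY variable that the handler below uses. A minimal sketch of loading it from the environment (the init-time check is my assumption, not necessarily part of the example project):

import (
	"log"
	"os"
)

// OPENAI_API_KEY is read from the environment, which docker-compose
// fills from the env_file entries shown above
var OPENAI_API_KEY = os.Getenv("OPENAI_API_KEY")

func init() {
	// fail early instead of sending unauthorized requests to OpenAI
	if OPENAI_API_KEY == "" {
		log.Fatal("OPENAI_API_KEY is not set - check backend/.env.local")
	}
}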

The REST service provides a POST endpoint at http://localhost:4000/, which expects the text and the voice in its JSON body.
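The body is unmarshalled into a TextToSpeechRequestBody struct. Based on the JSON the frontend sends, a matching definition could look like this sketch (the exact definition in the repository may differ):

// TextToSpeechRequestBody describes the JSON body sent by the frontend,
// e.g. {"text": "Hello world", "voice": "alloy"}
type TextToSpeechRequestBody struct {
	Text  string `json:"text"`
	Voice string `json:"voice"`
}

The handler itself then looks like this: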

// ...

	router.POST("/", func(ctx *fasthttp.RequestCtx) {
		sendError := func(err error) {
			ctx.SetStatusCode(500)
			ctx.WriteString(err.Error())
		}

		// read data from request
		postBodyData := ctx.PostBody()

		var postBody TextToSpeechRequestBody
		err := json.Unmarshal(postBodyData, &postBody)
		if err != nil {
			sendError(err) // parsing input as JSON failed
			return
		}

		// setup request for OpenAI
		requestBody := map[string]interface{}{
			"model":           "tts-1",
			"input":           postBody.Text,
			"response_format": "opus",
			"voice":           postBody.Voice,
		}

		requestBodyData, err := json.Marshal(requestBody)
		if err != nil {
			sendError(err) // could not create JSON string from `requestBody`
			return
		}

		// start the request
		request, err := http.NewRequest("POST", "https://api.openai.com/v1/audio/speech", bytes.NewBuffer(requestBodyData))
		if err != nil {
			sendError(err) // creating the request failed
			return
		}

		// setup headers like API key
		request.Header.Set("Content-Type", "application/json")
		request.Header.Set("Authorization", fmt.Sprintf("Bearer %s", OPENAI_API_KEY))

		client := &http.Client{}
		response, err := client.Do(request)
		if err != nil {
			sendError(err) // could not execute the request
			return
		}
		defer response.Body.Close() // make sure the body is always closed

		responseBodyData, err := io.ReadAll(response.Body)
		if err != nil {
			sendError(err) // could not read the response
			return
		}

		if response.StatusCode != 200 {
			sendError(fmt.Errorf("unexpected response: %v", response.StatusCode))
			return
		}

		// now prepare response data
		// for returning as data URI
		base64ResponseBodyData := base64.StdEncoding.EncodeToString(responseBodyData)
		responseBodyDataURI := fmt.Sprintf("data:audio/ogg;base64,%s", base64ResponseBodyData)
		responseBodyDataURIData := []byte(responseBodyDataURI)

		ctx.SetStatusCode(200)
		ctx.Response.Header.Add("Content-Length", fmt.Sprint(len(responseBodyDataURIData)))
		ctx.Response.Header.Add("Content-Type", "text/plain; charset=UTF-8")
		ctx.Write(responseBodyDataURIData)
	})

// ...

Frontend

The frontend is set up with Vuetify, Vite and TypeScript.

If you want to create a new Vuetify project, simply run

npm create vuetify

in your terminal and follow the instructions.

When you start the project with npm run dev, the frontend will reload on each code change.

The web app uses the previously mentioned backend endpoint and plays the audio data as a data URI using the browser’s Audio API:

<script lang="ts">
  import axios from 'axios';

  export default {
    data: () => {
      return {
        isSendingText: false,
        lastStatus: '',
        textValue: '',
        voice: 'Alloy'
      };
    },
    
    methods: {
      async SendText() {        
        const textToSend = String(this.textValue).substring(0, 4000).trim();
        if (textToSend === "") {
          return;  // not enough data
        }

        const voice = String(this.voice).toLowerCase().trim() || "alloy";

        this.isSendingText = true;

        try {
          const {
            data,
            status
          } = await axios.post('http://localhost:4000/', {
            text: textToSend,
            voice
          }, {
            responseType: 'text'
          });

          if (status !== 200) {
            throw new Error(`Unexpected response: ${data}`);
          }

          const audio = new Audio(data);
          audio.volume = 0.5;
          audio.play();

          console.log('Playing audio ...');
        } catch (error) {
          console.error('[ERROR]', 'SendText()', error)
        } finally {
          this.isSendingText = false;
        }
      }
    }
  }
</script>

Start the demo

Before you can run

docker compose up

you have to ensure that the required .env.local files exist; at minimum, the backend folder needs one with your OPENAI_API_KEY entry, as described above.

After everything has started, you should be able to open the frontend at http://localhost:3000

Conclusion

You see, once again: no big deal 😎

Vuetify is a very nice and powerful open source UI framework for SPAs.

With the information from my previous posts, you should now be able to create your own kind of Alexa 😉

Have fun while trying it out! 🎉