Getting Started
We’ll need to get an operating system onto that Pi before you can use it. Once that’s done, we can start the fun part. See below:
DOWNLOAD THE IMAGE
Official images for recommended operating systems are available to download from the Raspberry Pi website Downloads page.
Alternative distributions are available from third-party vendors.
After downloading the .zip file, unzip it to get the image file (.img) for writing to your SD card.
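If you are doing this from a Linux terminal, the unzip step might look like the sketch below. The function name `extract_image` and the archive name `raspbian.zip` are placeholders of mine, not names from the download page, so substitute whatever file you actually downloaded.

```shell
# Unpack a downloaded archive and list the .img file(s) it contained.
# "extract_image" is a hypothetical helper name used only for this sketch.
extract_image() {
    archive="$1"
    dest="${2:-.}"   # extract into the current directory unless told otherwise
    if [ ! -f "$archive" ]; then
        echo "No such archive: $archive" >&2
        return 1
    fi
    # -o overwrites existing files without prompting, -d sets the target directory
    unzip -o "$archive" -d "$dest" && ls "$dest"/*.img
}

# Example call (commented out so nothing runs until you fill in a real file name):
# extract_image raspbian.zip
```

The check at the top just gives a clearer message than unzip would if the file name is mistyped.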
WRITING AN IMAGE TO THE SD CARD
With the image file of the distribution of your choice, you need to use an image writing tool to install it on your SD card.
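On a Linux machine, one common image writing tool is `dd`. Because `dd` overwrites its target without asking, the sketch below only prints the command so you can check it before running it yourself. The helper name `dd_command`, the image name `raspbian.img` and the device `/dev/sdX` are all placeholders; find your real SD card device with `lsblk` first.

```shell
# Print (do not run) a dd command for writing an image to an SD card.
# Both arguments are placeholders supplied by you.
dd_command() {
    image="$1"
    device="$2"
    case "$device" in
        /dev/*) ;;   # looks like a device path, carry on
        *) echo "Device must be a /dev path, got: $device" >&2; return 1 ;;
    esac
    # bs=4M writes in 4 MiB blocks; conv=fsync flushes data to the card before dd exits
    echo "sudo dd if=$image of=$device bs=4M conv=fsync"
}

dd_command raspbian.img /dev/sdX
```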
See our guide for your system:
Ok, now you have an OS on your Pi. Let’s focus on the voice recognition.
Raspberry Pi Speech Recognition Introduction
This tutorial demonstrates how to use voice recognition on the Raspberry Pi. By the end of this demonstration, we should have a working application that understands and answers your spoken questions.
This is going to be a simple and easy project because there are a few free APIs available for all the goals we want to achieve. The application converts our spoken question into text, processes the query and returns the answer, and finally turns the answer from text into speech. I will divide this demonstration into four parts:
- Speech to text
- Query processing
- Text to speech
- Putting Them Together
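To picture how the four parts fit together, here is a minimal sketch of the data flow. The three function names are hypothetical and each one is only a stub that echoes text, so this shows the wiring and nothing more; the real implementations come in the individual parts.

```shell
# Placeholder stages; each one only echoes so that the data flow is visible.
speech_to_text() { echo "what time is it"; }   # stands in for recording + recognition
process_query()  { echo "answer to: $1"; }     # stands in for the query API
text_to_speech() { echo "speaking: $1"; }      # stands in for audio playback

# Putting them together: the output of each stage feeds the next one.
question=$(speech_to_text)
answer=$(process_query "$question")
text_to_speech "$answer"
```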
Hardware and Preparation
You can use a USB microphone, but I don’t have one, so I am using the built-in mic on my webcam. It worked straight away without any driver installation or configuration.
Any webcam will work.
Of course, you need the Raspberry Pi as well.
You will also need an internet connection on your Raspberry Pi.
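The recording script later in this post reads from the device `plughw:1,0`, meaning card 1, device 0. If your microphone shows up under a different number, `arecord -l` lists the capture devices. The sketch below turns a line of that listing into the matching device string. The helper name `to_plughw` is mine, and the sample line only illustrates the format; your card name will differ.

```shell
# Turn `arecord -l` output into the device string that arecord -D expects.
# Lines that do not describe a card (headers, subdevice counts) are ignored.
to_plughw() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

# On the Pi you would feed it the real listing:
#   arecord -l | to_plughw
# Here a sample line is used so the sketch runs anywhere.
echo "card 1: Webcam [USB Webcam], device 0: USB Audio [USB Audio]" | to_plughw
```

If the result is not `plughw:1,0` on your Pi, change the `-D` value in the script to whatever this prints.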
Speech To Text
Speech recognition can be achieved in many ways on Linux (and therefore on the Raspberry Pi), but personally I think the easiest way is to use Google’s voice recognition API. I have to say, the accuracy is very good, even though I have a strong accent. The script below converts the recording with ffmpeg, so first make sure ffmpeg is installed:
sudo apt-get install ffmpeg
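The script in this section also calls `arecord` and `wget`, so it can save some head-scratching to confirm that all three tools are on the PATH before going further. This is just a small convenience sketch; the helper name `missing_tools` is my own.

```shell
# Print the name of every tool from the argument list that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools arecord ffmpeg wget)
if [ -n "$missing" ]; then
    echo "Please install:" $missing
else
    echo "All tools found."
fi
```

On Raspberry Pi OS and other Debian-based systems, `arecord` comes from the alsa-utils package.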
To use Google’s voice recognition API, I use the following bash script. You can simply copy it and save it as ‘speech2text.sh’:
[sourcecode language="bash"]
#!/bin/bash
# Record from the microphone until Ctrl+C, converting the audio to 16 kHz FLAC on the fly.
echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac > /dev/null 2>&1
echo "Processing..."
# Post the FLAC file to Google and keep the 12th quote-delimited field of the reply (the recognised text).
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "https://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12 >stt.txt
echo -n "You Said: "
cat stt.txt
# Clean up the temporary recording.
rm file.flac > /dev/null 2>&1
[/sourcecode]
The script starts recording and saves the audio in a FLAC file. You can stop the recording by pressing Ctrl+C. The audio file is then sent to Google for conversion, and the returned text is saved in a file called “stt.txt”. Finally, the audio file is deleted.
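The `cut` step in the script is the least obvious part, so here it is in isolation. The reply below is a made-up example, assuming the service returns one line of JSON with the recognised text in an "utterance" field. Splitting that line on double quotes puts the text in field 12. The helper name `extract_transcript` is only for this sketch.

```shell
# Illustrative reply in the shape the script's cut command expects (made-up values).
response='{"status":0,"id":"abc123","hypotheses":[{"utterance":"hello world","confidence":0.9}]}'

# Split on double quotes and keep field 12, as speech2text.sh does.
extract_transcript() {
    cut -d'"' -f12
}

echo "$response" | extract_transcript
```

This works because the field position is fixed in a reply of this shape. If the shape of the reply differs, the field number has to change with it.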
Next, make the script executable:
chmod +x speech2text.sh
Then run it:
./speech2text.sh
The screenshot shows some tests I did.
Ok, that’s it for today. Play around with the script and try some of your own tweaks. In Part 2 we will go deeper into the voice recognition and a basic AI. See you next time.