Create a voice chatbot in Python using NLTK, SpeechRecognition, gTTS (Google Text-to-Speech) & scikit-learn

Nitin K. Chauhan
6 min read · May 21, 2020


Icon used: https://www.flaticon.com/premium-icon/chatbot_1698586

Chatbots are a class of intelligent, conversational systems that work on natural language input, which can be in the form of text, voice, or both. They provide conversational output in response, and are sometimes used for task execution. Chatbot technologies have existed since the 1960s and have influenced user interface development in games since the early 1980s.

- Radziwill, Nicole M., and Morgan C. Benton. “Evaluating quality of chatbots and intelligent conversational agents.” arXiv preprint arXiv:1704.04579 (2017).

Disclaimer:

Since this guide is aimed at beginners, this is a relatively simple implementation of NLTK and speech recognition. It does not cover heavier topics such as integrating frameworks like RASA for NLU or using a relational database to store and improve Q&A. With all this in mind, let us get started.

We will create a chatbot in Python that interacts via voice input and voice output, much like popular personal assistants such as Siri and Alexa.

Our bot uses an offline backend corpus as its knowledge base, which users can personalize simply by editing the corpus text to shape the bot's answers. It is a simple command-line implementation for beginners, but to make it more interesting we will add things like emotion detection, a greeting function, and a colour palette to distinguish questions from answers.


The key steps include:

  1. Backend corpus serving as the bot's knowledge base.
  2. User initiating verbal input via a microphone.
  3. Conversion of the input query into its text form.
  4. Classification of the input type (question, emotion, etc.) using Naive Bayes.
  5. Processing of the input using NLP and computing answers from the corpus.
  6. Computing the best possible answer via the TF-IDF cosine-similarity score between the question and sentences from the corpus.
  7. Conversion of the best answer into voice output.

Downloading and installing packages. We will install the Python libraries NLTK, NumPy, gTTS (Google Text-to-Speech), scikit-learn and SpeechRecognition using pip. In addition, we will install mpg123 (for audio playback) and portaudio (for microphone access) at the system level.

pip install nltk numpy gTTS scikit-learn SpeechRecognition
sudo apt-get install mpg123
sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
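
Before moving on, it can be worth confirming that the microphone stack is working. SpeechRecognition's Microphone class relies on PyAudio, so depending on your environment you may also need pip install pyaudio (the portaudio19-dev package above provides the headers it builds against). The short, optional check below simply lists the audio devices it can see:

import speech_recognition as sr

# List the microphones SpeechRecognition/PyAudio can see.
# An empty list (or an import error) usually means PyAudio/portaudio
# is not set up correctly.
print(sr.Microphone.list_microphone_names())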

Importing libraries. With the dependencies installed, we start our script with the import section. Note that the nltk.download commands will download the required corpora on the first run; after that, it is recommended to comment these “.download” lines out, otherwise they will repeatedly check for the corpora and add to the run time.

import os
import random
import string
import warnings

import numpy as np
import nltk
import speech_recognition as sr
from gtts import gTTS
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

warnings.filterwarnings('ignore')

# Downloading package files; can be commented out after the first run
nltk.download('popular', quiet=True)
nltk.download('nps_chat', quiet=True)
nltk.download('punkt', quiet=True)
nltk.download('wordnet', quiet=True)

We will write a function to classify user input. It uses the nps_chat corpus and a Naive Bayes classifier to categorize the input into one of the dialogue-act classes listed below; the bot will only compute corpus answers for question-type classes.

# Dialogue-act classes in nps_chat:
# "Accept", "Bye", "Clarify", "Continuer", "Emotion",
# "Emphasis", "Greet", "Reject", "Statement", "System",
# "nAnswer", "whQuestion", "yAnswer", "ynQuestion", "Other"

# To recognise the input type (e.g. question)
posts = nltk.corpus.nps_chat.xml_posts()[:10000]

def dialogue_act_features(post):
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features

featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
classifier = nltk.NaiveBayesClassifier.train(train_set)
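
As an optional sanity check, you can score the classifier on the held-out 10% split and try it on a sample utterance before wiring it into the bot; the exact numbers will vary, but this setup typically lands in the region of 0.66–0.67 accuracy.

print(nltk.classify.accuracy(classifier, test_set))  # rough accuracy on the held-out split
print(classifier.classify(dialogue_act_features("what is athlete's foot")))  # likely 'whQuestion'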

To make our chatbot more engaging and interactive, we will create a greeting function.

# Keyword matching for greetings
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    """If user's input is a greeting, return a greeting response"""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
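
A quick, optional check of the keyword matcher: it returns one of the canned responses for greeting-like input and None otherwise.

print(greeting("hey Jarvis, how are you"))       # one of GREETING_RESPONSES
print(greeting("tell me about athlete's foot"))  # None, so the corpus lookup will handle it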

We will load our chatbot's corpus and perform some NLP pre-processing on it, i.e., sentence and word tokenization, lemmatization, and normalisation. We call our corpus file intro_join. Here we have taken a Wikipedia article about the fungal disorder tinea pedis (athlete's foot) and pasted it into the text file. You can use data of your choice by simply placing it in a file named “intro_join”.

# Reading in the input corpus
with open('intro_join', 'r', encoding='utf8', errors='ignore') as fin:
    raw = fin.read().lower()

# Tokenisation
sent_tokens = nltk.sent_tokenize(raw)   # converts to list of sentences
word_tokens = nltk.word_tokenize(raw)   # converts to list of words

# Preprocessing
lemmer = WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
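
To see what the pre-processing does, you can run LemNormalize on a sample sentence (a quick illustration; the exact lemmas depend on WordNet). It lower-cases the text, strips punctuation, tokenizes, and lemmatizes each token.

print(LemNormalize("Athlete's foot affects the feet!"))
# expected output along the lines of: ['athlete', 'foot', 'affect', 'the', 'foot']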

To make it look appealing, we will add a colour palette: a set of small helper functions that use the format command and ANSI escape codes to colour the standard terminal output.

# Colour palette
def prRed(skk): print("\033[91m {}\033[00m" .format(skk))
def prGreen(skk): print("\033[92m {}\033[00m" .format(skk))
def prYellow(skk): print("\033[93m {}\033[00m" .format(skk))
def prLightPurple(skk): print("\033[94m {}\033[00m" .format(skk))
def prPurple(skk): print("\033[95m {}\033[00m" .format(skk))
def prCyan(skk): print("\033[96m {}\033[00m" .format(skk))
def prLightGray(skk): print("\033[97m {}\033[00m" .format(skk))
def prBlack(skk): print("\033[98m {}\033[00m" .format(skk))
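
These helpers simply wrap the text in ANSI escape sequences (\033[9Xm sets the foreground colour, \033[00m resets it), which most Linux and macOS terminals understand. For example:

prYellow("Jarvis: hello")    # printed in yellow (used for the bot's replies)
prRed("YOU SAID : hi")       # printed in red (used for the recognised user input)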

Now we will create a function to process the user response: it converts the corpus sentences plus the query into TF-IDF vectors and returns the corpus sentence with the highest cosine similarity to the question, or an apology if nothing matches.

# Generating and processing the response
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]   # most similar sentence (the last entry is the query itself)
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]          # its similarity score
    if req_tfidf == 0:
        robo_response = robo_response + "I am sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response
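
With the athlete's-foot text loaded into intro_join, you can test the retrieval step on its own (a rough illustration; the sentence returned depends entirely on your corpus). Note that response() appends the query to sent_tokens, so the caller is expected to remove it afterwards, as the main loop below does.

query = "what causes athlete's foot"
print(response(query))        # the corpus sentence most similar to the query
sent_tokens.remove(query)     # clean up the query that response() appended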

Now we are almost ready to print the first message from the bot. Being an Iron Man fan, I have named mine after him. 😃

# Set up text-to-speech output and the speech recognizer, then speak the first message
file = "file.mp3"
flag = True
fst = "My name is Jarvis. I will answer your queries about Science. If you want to exit, say Bye"
tts = gTTS(fst, lang='en')
tts.save(file)
os.system("mpg123 " + file)
r = sr.Recognizer()
prYellow(fst)
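
mpg123 is a Linux command-line player, so this playback step is Linux-specific. If you are on another platform, one possible alternative (an assumption on my part, not part of the original setup) is the playsound package (pip install playsound), wrapped in a small helper:

# Alternative playback without mpg123 (assumes: pip install playsound)
from playsound import playsound

def speak(text, filename="file.mp3"):
    # Synthesize `text` with gTTS and play the resulting mp3
    gTTS(text, lang='en').save(filename)
    playsound(filename)

# speak(fst)  # would replace the tts.save(...) / os.system("mpg123 ...") pair above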

The snippet below ties together all the functions we have created so far. It uses speech recognition to capture user input from the microphone, converts it into text, searches for an answer in the processed corpus, and returns the output using text-to-speech. It keeps taking user input and answering until the user says Bye/Goodbye.

while flag:
    with sr.Microphone() as source:
        audio = r.listen(source)
    try:
        user_response = r.recognize_google(audio)
        print("\033[91m {}\033[00m".format("YOU SAID : " + user_response))
    except sr.UnknownValueError:
        prYellow("Oops! Didn't catch that")
        continue

    # user_response = input()
    # user_response = user_response.lower()
    clas = classifier.classify(dialogue_act_features(user_response))
    if clas != 'Bye':
        if clas == 'Emotion':
            flag = False
            prYellow("Jarvis: You are welcome..")
        else:
            if greeting(user_response) is not None:
                print("\033[93m {}\033[00m".format("Jarvis: " + greeting(user_response)))
            else:
                print("\033[93m {}\033[00m".format("Jarvis:"), end="")
                res = response(user_response)
                prYellow(res)
                sent_tokens.remove(user_response)
                tts = gTTS(res, lang='en')
                tts.save(file)
                os.system("mpg123 " + file)
    else:
        flag = False
        prYellow("Jarvis: Bye! take care..")

The complete final script is here. Now let us run the bot; the screenshots below show what the final output looks like.

Conclusion

We have demonstrated the development of a chatbot with voice input and voice output. As mentioned in the disclaimer, this is a fairly straightforward implementation, with the interface enhanced by speech recognition.

Furthermore, if you are feeling motivated and thrilled about chatbots, you can consider integrating Natural Language Understanding (NLU) using frameworks like RASA. You may also consider a relational or cloud-based database (or an RDF store queried via SPARQL) to store the Q&A log; later on, applying some deep learning to that log will improve your bot a lot. A GUI can also be integrated later using the Django framework.

The original version of the script was sourced from here.

I hope you find this article useful! Here is the project link.

Also, thanks for the responses to the previous version of this story, which gave me the chance to voluntarily take it down for corrections and improvements.


Nitin K. Chauhan

Hello, I am a Doctoral Fellow at the School of Computational and Integrative Sciences (SC&IS), working in the field of text and data mining and NLP.