Classifier
The classifier is the part of the Chatto bot that takes the user input and decides what command
(intent) it represents. This classification is passed on to the Finite State Machine to decide what transition to execute.
The training text for the classifier is provided in the clf.yml file:
classification:
  - command: "turn_on"
    texts:
      - "turn on"
      - "on"
  - command: "turn_off"
    texts:
      - "turn off"
      - "off"
Under classification you can list the commands and their respective training data under texts.
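To make the structure concrete, the following sketch flattens the command/texts layout into (text, command) training pairs. It mirrors the example above as a Python literal rather than parsing the YAML file, and the helper name training_pairs is hypothetical, not part of Chatto.

```python
# Mirror of the clf.yml example, written as a Python literal so the
# sketch needs no YAML parser (an assumption made for illustration only).
classification = [
    {"command": "turn_on", "texts": ["turn on", "on"]},
    {"command": "turn_off", "texts": ["turn off", "off"]},
]


def training_pairs(classification):
    """Flatten the command/texts structure into (text, command) pairs."""
    return [
        (text, entry["command"])
        for entry in classification
        for text in entry["texts"]
    ]


pairs = training_pairs(classification)
for text, command in pairs:
    print(f"{text!r} -> {command}")
```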
Currently, there are two types of classifiers: Naïve-Bayes and K-Nearest Neighbors.
Naïve-Bayes
By default, Chatto uses a Naïve-Bayes classifier. This model takes the words from the texts as features for classification. The Naïve-Bayes Classifier requires at least two classes to be added.
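As a rough illustration of how words can serve as features, here is a minimal multinomial Naïve-Bayes sketch in Python. It is not Chatto's implementation: the function names, the whitespace tokenization and the Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(pairs):
    """Count words per command and training texts per command."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, command in pairs:
        class_counts[command] += 1
        word_counts[command].update(text.split())
    if len(class_counts) < 2:
        raise ValueError("at least two classes are required")
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the command with the highest log-posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for command, n in class_counts.items():
        denom = sum(word_counts[command].values()) + len(vocab)
        score = math.log(n / total)  # class prior
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[command][word] + 1) / denom)
        scores[command] = score
    return max(scores, key=scores.get)


pairs = [("turn on", "turn_on"), ("on", "turn_on"),
         ("turn off", "turn_off"), ("off", "turn_off")]
model = train_nb(pairs)
print(predict_nb(model, "please turn it on"))
```

With a single class there is nothing to discriminate between, which is why the sketch rejects training data that has fewer than two commands.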
You can optionally turn on Tf-Idf (Term frequency – Inverse document frequency) with the parameters field of the model object in the clf.yml file:
model:
  classifier: naive_bayes # this could be omitted, as naive_bayes is the default classifier
  parameters:
    tfidf: true
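The sketch below shows what a Tf-Idf weighting does to word features. The formula used (relative term frequency times log(N / document frequency)) is one common variant and an assumption here, since the exact formula Chatto applies is not stated. The three sample texts are also hypothetical.

```python
import math
from collections import Counter


def tfidf(texts):
    """Return one {word: weight} dict per text using tf * log(N / df)."""
    docs = [Counter(t.split()) for t in texts]
    n = len(docs)
    # Document frequency: in how many texts each word appears.
    df = Counter(word for doc in docs for word in doc)
    return [
        {w: (c / sum(doc.values())) * math.log(n / df[w]) for w, c in doc.items()}
        for doc in docs
    ]


texts = ["turn on", "turn off", "turn it on"]
weights = tfidf(texts)
for text, w in zip(texts, weights):
    print(text, {word: round(value, 3) for word, value in w.items()})
```

A word such as "turn", which occurs in every text, ends up with weight zero, while a word that appears in only one text ("off") gets the largest weight.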
K-Nearest Neighbors
You can choose a K-Nearest Neighbors (KNN) classifier, which uses the average of the fastText word vectors as features for classification. You can specify the number of neighbors under parameters:
model:
  classifier: knn
  parameters:
    k: 5 # by default k is set to 1
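A minimal sketch of the idea follows: each text is represented by the average of its word vectors, and the command is chosen by a vote among the k nearest training texts. The toy 2-dimensional vectors, the cosine similarity measure and all function names are assumptions for illustration; Chatto's distance metric is not specified here.

```python
import math
from collections import Counter

# Toy 2-dimensional "word vectors"; real fastText vectors are much larger.
VECTORS = {
    "turn": [0.5, 0.5],
    "on": [1.0, 0.0],
    "off": [0.0, 1.0],
}


def embed(text, vectors, dim=2):
    """Average the vectors of the known words in the text."""
    known = [vectors[w] for w in text.split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def predict_knn(pairs, text, k=1):
    """Vote among the k training texts most similar to the input."""
    query = embed(text, VECTORS)
    ranked = sorted(
        pairs,
        key=lambda pair: cosine(query, embed(pair[0], VECTORS)),
        reverse=True,
    )
    votes = Counter(command for _, command in ranked[:k])
    return votes.most_common(1)[0][0]


pairs = [("turn on", "turn_on"), ("on", "turn_on"),
         ("turn off", "turn_off"), ("off", "turn_off")]
print(predict_knn(pairs, "switch on", k=1))
print(predict_knn(pairs, "turn off", k=3))
```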
Word vectors
To use the word vectors, you must download your language's model and indicate where this file is located using file_name. If you don't want to use all the words from the file, you can indicate what fraction to load using truncate (this should be a number between 0 and 1). Lastly, you can decide with skip_oov whether or not to skip the words that are not in the vectors file. If these words are not skipped, their vector will be a zero vector.
Your model object for KNN would look like this:
model:
  classifier: knn
  parameters:
    k: 5
  word_vectors:
    file_name: ./vectors/wiki.en.vec # where the word vectors file is located
    truncate: 0.01 # only 1% of the words will be used
    skip_oov: true
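The effect of truncate and skip_oov can be sketched as follows. The example reads the fastText text format (a header line with the word count and dimension, then one word and its values per line) from an in-memory sample. It assumes that truncate keeps the first fraction of the file; the function names and the sample data are hypothetical.

```python
import io


def load_vectors(handle, truncate=1.0):
    """Read a fastText-style .vec stream, keeping the first `truncate` fraction."""
    count, dim = map(int, handle.readline().split())
    limit = int(count * truncate)
    vectors = {}
    for _, line in zip(range(limit), handle):
        word, *values = line.rstrip().split(" ")
        vectors[word] = [float(v) for v in values]
    return vectors, dim


def text_vectors(text, vectors, dim, skip_oov=True):
    """Look up each word; unknown words are skipped or mapped to a zero vector."""
    result = []
    for word in text.split():
        if word in vectors:
            result.append(vectors[word])
        elif not skip_oov:
            result.append([0.0] * dim)
    return result


SAMPLE = "4 2\nturn 0.5 0.5\non 1.0 0.0\noff 0.0 1.0\nlight 0.2 0.8\n"
vectors, dim = load_vectors(io.StringIO(SAMPLE), truncate=0.5)
print(sorted(vectors))
print(text_vectors("turn off", vectors, dim, skip_oov=True))
print(text_vectors("turn off", vectors, dim, skip_oov=False))
```

Because only half of the sample is loaded, "off" is out of vocabulary: it is dropped when skip_oov is true and contributes a zero vector otherwise, which pulls the averaged text vector toward zero.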
Model save & load
You can save your trained model and/or load your saved model by setting the save and load fields in the model object. The directory field tells Chatto where to read and write the files.
For example, you could first train and save the model:
model:
  classifier: naive_bayes
  directory: ./my_model/
  save: true # the trained model will be saved to ./my_model/
And then load it on a later run:
model:
  classifier: naive_bayes
  directory: ./my_model/
  load: true # the saved model will be loaded from ./my_model/
Both save and load default to false, in which case the classifier will only be stored in memory during the bot's execution. The default value for directory is ./model/.
Warning
If both save and load are set to true, the loaded model will be overwritten.
Pipeline
You can optionally configure the pipeline steps by adding the pipeline object to the clf.yml file:
pipeline:
  remove_symbols: true
  lower: true
  threshold: 0.3
Currently, the pipeline steps are:
- Removal of symbols (default true)
- Conversion into lowercase (default true)
- Classification threshold (default 0.1)
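The steps listed above can be sketched as follows. This is an approximation under stated assumptions: "symbols" is taken to mean anything that is not a word character or whitespace, and a prediction whose probability falls below the threshold is assumed to yield no command. Both function names are hypothetical.

```python
import re


def preprocess(text, remove_symbols=True, lower=True):
    """Apply the two text-cleaning steps before classification."""
    if remove_symbols:
        # Assumption: a symbol is any character that is neither a word
        # character nor whitespace.
        text = re.sub(r"[^\w\s]", "", text)
    if lower:
        text = text.lower()
    return text


def apply_threshold(command, probability, threshold=0.1):
    """Keep the predicted command only if its probability reaches the threshold."""
    return command if probability >= threshold else None


print(preprocess("Turn ON, please!"))
print(apply_threshold("turn_on", 0.25, threshold=0.3))
print(apply_threshold("turn_on", 0.25))
```

A higher threshold makes the bot more conservative: with threshold 0.3 the prediction at probability 0.25 is discarded, whereas the default of 0.1 would accept it.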
Test
You can generate a classification report and confusion matrix from the trained classifier by running the test command:
chatto test --path ./your/data
The output of this command will look something like this:
INFO[0000] ---- Confusion matrix ----
INFO[0000]          greet   good    bad    yes     no
INFO[0000] greet       13      0      0      0      0
INFO[0000] good         0     14      0      0      0
INFO[0000] bad          0      0     14      0      0
INFO[0000] yes          0      0      0      5      0
INFO[0000] no           0      0      0      0      5
INFO[0000] ---- Classification report ----
INFO[0000]               Precision    Recall  F1-Score   Support
INFO[0000] greet            1.0000    1.0000    1.0000        13
INFO[0000] good             1.0000    1.0000    1.0000        14
INFO[0000] bad              1.0000    1.0000    1.0000        14
INFO[0000] yes              1.0000    1.0000    1.0000         5
INFO[0000] no               1.0000    1.0000    1.0000         5
INFO[0000] Accuracy                             1.0000        51
INFO[0000] Macro Avg        1.0000    1.0000    1.0000        51
INFO[0000] Weighted Avg     1.0000    1.0000    1.0000        51
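For readers unfamiliar with these metrics, the sketch below recomputes the report from the confusion matrix shown above using the standard definitions of precision, recall and F1. It assumes rows are true classes and columns are predicted classes. The function name report is hypothetical and this is not Chatto's own code.

```python
LABELS = ["greet", "good", "bad", "yes", "no"]
# Assumption: rows are the true classes, columns the predicted classes.
MATRIX = [
    [13, 0, 0, 0, 0],
    [0, 14, 0, 0, 0],
    [0, 0, 14, 0, 0],
    [0, 0, 0, 5, 0],
    [0, 0, 0, 0, 5],
]


def report(matrix):
    """Return per-class (precision, recall, f1, support) rows plus summaries."""
    n = len(matrix)
    rows = []
    for i in range(n):
        tp = matrix[i][i]
        support = sum(matrix[i])                         # true examples of class i
        predicted = sum(matrix[r][i] for r in range(n))  # predictions of class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, support))
    total = sum(row[3] for row in rows)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    macro = tuple(sum(row[j] for row in rows) / n for j in range(3))
    weighted = tuple(sum(row[j] * row[3] for row in rows) / total for j in range(3))
    return rows, accuracy, macro, weighted


rows, accuracy, macro, weighted = report(MATRIX)
for label, (p, r, f1, support) in zip(LABELS, rows):
    print(f"{label:<13}{p:>10.4f}{r:>10.4f}{f1:>10.4f}{support:>10}")
print(f"{'Accuracy':<13}{'':>20}{accuracy:>10.4f}")
```

The macro average weighs every class equally, while the weighted average weighs each class by its support, so the two diverge when classes are imbalanced and performance differs between them.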