[Dclib-devel] CNN for Text Classification
From: Stefan S. <st...@sc...> - 2017-03-09 10:50:49
Hi dlib-users,
I guess this question is related to both MITIE and dlib: I would really
like to contribute an example for text classification with a CNN. I'm
working with the MNIST example code and I've got a few questions:
In MITIE I can use pretrained word embeddings from the English model
(total_word_feature_extractor.dat). E.g. the wordrep tool shows the
feature vector for a word with --test <Word>.
What is the correct way to get a sentence representation? At the
moment I'm using the following code:
    matrix<float,0,1> sentence_matrix;
    for (auto &word : sentence) {
        matrix<float,0,1> feats;
        fe.get_feature_vector(word,feats);
        join_rows(sentence_matrix, feats);
    }
    testing_set.emplace_back(sentence_matrix);
    testing_labels.emplace_back(1);
Of course I have to pad the sentence, but would this code create a
correct sentence representation which I could later use to train the
network? The intention is to create an n x k representation for the
sentence (n = length of the sentence, k = length of the feature vector
from the word embedding). To concatenate the word vectors I found the
join_rows function, or should I use something else?
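To make the intended n x k layout concrete, here is a minimal sketch that
does not depend on dlib or MITIE. It assumes each word has already been
turned into a k-dimensional feature vector, and it stores the sentence
row-major with one word per row, zero-padded (or truncated) to a fixed n.
The helper name build_sentence_matrix and the plain std::vector storage
are hypothetical, chosen only for illustration. As far as I can tell, two
things in the loop above would need attention: join_rows returns the
joined matrix instead of modifying its arguments, so its result is
discarded there, and matrix<float,0,1> is a column vector type, so it
cannot hold more than one column.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of dlib or MITIE): stacks per-word feature
// vectors into a row-major n x k buffer, one word per row, zero-padded or
// truncated to exactly n rows.
std::vector<float> build_sentence_matrix(
    const std::vector<std::vector<float>>& word_feats,  // one k-vector per word
    std::size_t n,                                       // fixed sentence length
    std::size_t k)                                       // embedding dimension
{
    std::vector<float> out(n * k, 0.0f);                 // padding rows stay zero
    const std::size_t rows = word_feats.size() < n ? word_feats.size() : n;
    for (std::size_t r = 0; r < rows; ++r) {
        // Copy at most k values so a short or long vector cannot overrun the row.
        const std::size_t cols =
            word_feats[r].size() < k ? word_feats[r].size() : k;
        for (std::size_t c = 0; c < cols; ++c)
            out[r * k + c] = word_feats[r][c];
    }
    return out;
}
```

With a fixed n, every sentence produces a buffer of the same size, which
is what a network with a fixed input shape would need. Element (r, c) of
the sentence matrix is out[r * k + c].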
Thanks in advance + regards,
Stefan