[Dclib-devel] CNN for Text Classification
From: Stefan S. <st...@sc...> - 2017-03-09 10:50:49
Hi dlib-users,

I guess this question relates to both MITIE and dlib: I would really like to contribute an example for text classification with a CNN. I'm working from the MNIST example code and have a few questions.

In MITIE I can use pretrained word embeddings from the English model (total_word_feature_extractor.dat). For example, the wordrep tool shows the feature vector for a word with --test <Word>.

What is the correct way to get a sentence representation? At the moment I'm using the following code:

    matrix<float,0,1> sentence_matrix;
    for (auto &word : sentence)
    {
        matrix<float,0,1> feats;
        fe.get_feature_vector(word,feats);
        join_rows(sentence_matrix, feats);
    }
    testing_set.emplace_back(sentence_matrix);
    testing_labels.emplace_back(1);

Of course I have to pad the sentence, but would this code create a correct sentence representation that I could later use to train the network? The intention is to create an n x k representation of the sentence (n = length of the sentence, k = length of the feature vector from the word embedding). To concatenate the words I found the join_rows method; or should I use something else?

Thanks in advance + regards,
Stefan
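
The n x k layout described above can be pictured with a minimal sketch in plain C++ rather than dlib types: each word's embedding fills one row, and sentences are truncated or zero-padded to a fixed length n. The names SentenceMatrix, Embedder, and build_sentence_matrix are hypothetical and not part of dlib or MITIE; the embed callback stands in for a call such as fe.get_feature_vector(word, feats). One caveat worth keeping in mind: as far as I know, dlib's join_rows returns the joined matrix instead of modifying its first argument, so its result has to be assigned for any concatenation to take effect.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical row-major n x k sentence matrix: row i holds the embedding of word i.
struct SentenceMatrix {
    std::size_t rows = 0;     // n: fixed (padded) sentence length
    std::size_t cols = 0;     // k: embedding dimension
    std::vector<float> data;  // rows * cols values, stored row by row

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Hypothetical embedding lookup: maps a word to a k-dimensional vector.
// In the setting of the post this would wrap fe.get_feature_vector(word, feats).
using Embedder = std::function<std::vector<float>(const std::string&)>;

// Build a fixed-size n x k matrix. Sentences longer than n are truncated;
// shorter ones are padded with all-zero rows.
SentenceMatrix build_sentence_matrix(const std::vector<std::string>& sentence,
                                     const Embedder& embed,
                                     std::size_t n, std::size_t k)
{
    SentenceMatrix m;
    m.rows = n;
    m.cols = k;
    m.data.assign(n * k, 0.0f);  // start from all zeros, which provides the padding

    const std::size_t used = sentence.size() < n ? sentence.size() : n;
    for (std::size_t i = 0; i < used; ++i) {
        const std::vector<float> feats = embed(sentence[i]);
        assert(feats.size() == k);  // every embedding must have the same length
        for (std::size_t j = 0; j < k; ++j)
            m.at(i, j) = feats[j];  // copy word i into row i
    }
    return m;
}
```

Fixing n up front means every sample has the same shape, which is what a CNN input layer generally expects; the choice of zero rows for padding is an assumption of this sketch, not something MITIE or dlib prescribes.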