Hi, I'd like to create a recommendation system that will use transductive
reasoning... is it possible using Waffles? By the way, where can I find good
documentation about transductive recommendation systems?
Waffles can do transduction. It won't help with collecting data, interfacing
with a database, building a web site, attracting customers, e-commerce, or any
of those other things you typically find in such a system.
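For context, transduction means predicting labels for one specific set of unlabeled points directly, using the labeled and unlabeled points together, rather than first inducing a general-purpose model. Waffles has its own transduction interface; as a library-agnostic sketch of the idea only (scikit-learn stands in for Waffles here, and the data is made up), a transductive learner looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# Hypothetical stand-in data: 100 users, 5 preference features, 2 classes.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Transductive setting: labels are known for the first 50 points only;
# -1 marks the specific unlabeled points we want predictions for.
y_partial = y.copy()
y_partial[50:] = -1

model = LabelSpreading()
model.fit(X, y_partial)              # learns from labeled and unlabeled points together
inferred = model.transduction_[50:]  # labels inferred for the unlabeled points
print("accuracy on the unlabeled points:", (inferred == y[50:]).mean())
```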
Here's how I'd approach it: Start by collecting a lot of data. (The more the
better. People are complex creatures. You can't expect to accurately predict
them if you only have a couple thousand training samples.) Next, tinker with
Waffles (or Weka, or any other machine learning package) to figure out which
model and features give good predictions. Finally, integrate the best
predictive model with your system.
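As a rough sketch of that "tinker" step (again using scikit-learn as a stand-in for Waffles or Weka; the candidate models and placeholder data are illustrative, not a recommendation), the idea is just to cross-validate several models on the same data and keep whichever predicts best:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for your real user/item table.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "naive bayes": GaussianNB(),
    "k-nn": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Cross-validate each candidate and compare mean accuracy.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```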
I have never looked into it, but I suspect that you will find a dearth of good
documentation about which features are useful for making good predictions
about people's purchasing preferences. Business-type-people are likely to
hoard such information because it makes them money, and academic-type-people
aren't likely to think such info is worthy of publishing.
...if you already have loads of data, and you're trying to enter something
akin to the Netflix competition, then that's a little bit different. The first
problem you'll probably encounter is that there is more data than you can load
into your computer's memory.
If your data comes from unstructured text, or other sparse forms of data,
Waffles enables you to encode your data as a sparse matrix and train without
ever actually expanding the matrix. This allows you to use huge tables of data
without filling up your memory. Unfortunately, only the Naive Bayes and Neural
Net models have been tested with sparse-matrix training. If one of those two
choices meets your needs, then you're all set.
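To make the memory argument concrete (this sketch uses scipy/scikit-learn rather than the Waffles sparse API, and the sizes are invented for illustration): a 10,000 x 50,000 count matrix at 0.1% density stores only the nonzero entries, and a Naive Bayes learner can train on it directly without ever expanding it:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.naive_bayes import MultinomialNB

# Placeholder sparse data: 10,000 samples x 50,000 features (e.g. word
# counts), ~0.1% nonzero; the dense equivalent would not fit comfortably
# in memory, but the CSR form is small.
rng = np.random.default_rng(0)
X = sparse_random(10_000, 50_000, density=0.001, format="csr", random_state=0)
y = rng.integers(0, 2, size=10_000)

# MultinomialNB consumes the sparse matrix directly; it is never densified.
model = MultinomialNB()
model.fit(X, y)
print(model.predict(X[:5]))
```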
If your dataset is too big, it's usually a good idea to sub-sample, and then
use PCA to reduce the dimensionality of your data to create a dataset of
reasonable size. Then, you can try cross-validation with lots of different
models and different parameters without having to wait for days for each
experiment to complete. After you find one that works reasonably well on the
smallish dataset, then try it on the large one.
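A minimal sketch of that sub-sample-then-PCA workflow (scikit-learn standing in for Waffles; the dataset sizes and the candidate model are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Placeholder "large" dataset; imagine millions of rows in practice.
X, y = make_classification(n_samples=100_000, n_features=100, random_state=0)

# Sub-sample rows so each experiment finishes in minutes, not days.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=5_000, replace=False)
X_small, y_small = X[idx], y[idx]

# Reduce dimensionality with PCA, then cross-validate a candidate model.
model = make_pipeline(PCA(n_components=10), DecisionTreeClassifier(random_state=0))
scores = cross_val_score(model, X_small, y_small, cv=5)
print(f"mean accuracy on the small set: {scores.mean():.3f}")

# Once a configuration looks promising here, re-run it on the full dataset.
```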
If you are still interested, I added a demo recommendation system to the
latest release. You might find it to be helpful.