Is Waffles thread safe?

Xinyu Zhou
  • Xinyu Zhou

    Xinyu Zhou - 2013-08-26

    I am using learners in Waffles for boosting, so it is important that these learners be thread-safe; that way I can exploit the computing power of a multi-processor machine.

    • doityth777

      doityth777 - 2013-08-28

      That is what I want to know, too!

  • Mike Gashler

    Mike Gashler - 2013-08-26

    With one exception, I am careful to implement all of my learners in a manner that never depends on any external or shared resources, so they are all implicitly thread-safe.

    Regarding the one exception, nearly all of my learners depend on a GRand object to generate random numbers. This GRand object is passed to their constructors, and it does not do any synchronization. (For most applications, it is probably not a big deal if the state of the GRand object is corrupted due to some concurrency issue, since it only generates pseudo-random numbers anyway, but this could technically result in some non-deterministic behavior, which I would consider to be a bug.) There are two solutions:
    1. Make sure that all of the learners are constructed with different GRand objects, or
    2. Add synchronization to the methods in GRand.

    Solution 1 should be pretty straightforward, but watch out if you save your models to a file. The deserialization constructors all use a common GRand object when they reconstruct the models.
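    A minimal sketch of solution 1, assuming nothing about the real GRand API: `SimpleRand` (a tiny xorshift64 generator) and `ToyLearner` (a learner that, like the Waffles learners, receives its RNG by reference in its constructor) are hypothetical stand-ins. Each learner gets its own generator with a distinct seed, so no generator state is shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for GRand: a small xorshift64 generator.
// Like the GRand described above, it has mutable state and no locking.
class SimpleRand {
public:
    explicit SimpleRand(uint64_t seed) : state_(seed ? seed : 0x9E3779B97F4A7C15ULL) {}
    uint64_t next() {
        state_ ^= state_ << 13;
        state_ ^= state_ >> 7;
        state_ ^= state_ << 17;
        return state_;
    }
private:
    uint64_t state_;  // must never be zero for xorshift
};

// Hypothetical learner that is handed an RNG reference at construction.
class ToyLearner {
public:
    explicit ToyLearner(SimpleRand& rand) : rand_(rand) {}
    uint64_t trainStep() { return rand_.next(); }  // stands in for randomized training
private:
    SimpleRand& rand_;
};

// Solution 1: one RNG per learner. The RNGs live in unique_ptrs so their
// addresses stay stable while the learners hold references to them.
struct Ensemble {
    std::vector<std::unique_ptr<SimpleRand>> rngs;
    std::vector<ToyLearner> learners;

    Ensemble(std::size_t n, uint64_t baseSeed) {
        assert(n > 0);
        rngs.reserve(n);
        learners.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            // baseSeed + i is just one simple way to get distinct seeds.
            rngs.push_back(std::make_unique<SimpleRand>(baseSeed + i));
            learners.emplace_back(*rngs.back());
        }
    }
};
```

    Because each learner's random stream depends only on its own seed, results stay reproducible no matter how the threads are scheduled. The same care would be needed after deserialization, since the loader described above shares one GRand.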

    A good way to implement solution 2 would be to make a class called GRandSynchronized that inherits from GRand, and overrides the GRand::next method with one that is synchronized. Then, everything would be thread-safe. Hmm, now that you mention it, I should probably provide such a class with Waffles.
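    A sketch of the synchronized-subclass idea from solution 2. `RandBase` and `RandSynchronized` are hypothetical stand-ins for GRand and GRandSynchronized; in particular, it is an assumption that GRand::next is (or can be made) virtual, since overriding only helps if callers dispatch through a virtual function.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical base standing in for GRand, with a virtual next().
class RandBase {
public:
    explicit RandBase(uint64_t seed) : state_(seed ? seed : 0x9E3779B97F4A7C15ULL) {}
    virtual ~RandBase() = default;
    virtual uint64_t next() {
        state_ ^= state_ << 13;
        state_ ^= state_ >> 7;
        state_ ^= state_ << 17;
        return state_;
    }
private:
    uint64_t state_;
};

// Produces the same sequence as the base class, but every call to next()
// holds a mutex, so concurrent callers cannot interleave their
// read-modify-write of the generator state.
class RandSynchronized : public RandBase {
public:
    using RandBase::RandBase;  // inherit the seeding constructor
    uint64_t next() override {
        std::lock_guard<std::mutex> lock(mutex_);
        return RandBase::next();
    }
private:
    std::mutex mutex_;
};
```

    One lock serializes every call, so when many threads draw random numbers heavily, giving each learner its own generator (solution 1) avoids that contention.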

    • Xinyu Zhou

      Xinyu Zhou - 2013-08-27
      • Regarding solution 1: do you mean that models must be deserialized sequentially?

      • I suppose each classifier should use its own RNG. If a single synchronized RNG is shared by several classifiers, I worry that efficiency may suffer from synchronization overhead when using randomized algorithms.

      • I am looking forward to thread-level parallelism in Waffles, at least for learners like RandomForest. That would be a great help and would move Waffles forward.

  • Mike Gashler

    Mike Gashler - 2013-08-27

    The deserializing constructor currently has this signature:
    GResamplingAdaBoost(GDomNode* pNode, GLearnerLoader& ll);
    If you call this constructor, the GLearnerLoader object uses the same GRand object to call the constructor for each model in the ensemble. If you then use these models in different threads, there might be a race condition in GRand::next(), because they will all have the same GRand object in common. I do not think this race condition will have any adverse effects, but it is not technically thread-safe.

    I would like to make it easier to use Waffles in parallel environments in the future, but I am uncertain which of the dominant paradigms would be best to design it for:
    1- Many threads on one machine (with or without OpenMP?),
    2- Cluster computing (with or without MPI?),
    3- GPU parallelism (with CUDA or OpenCL?).
    Which paradigm do you think would be the most productive?

  • Xinyu Zhou

    Xinyu Zhou - 2013-08-29

    That's a knotty problem...

    • Which paradigm to use depends on the scale of the problem and on the available computing environment. I used to use MPI because a single machine lacked the computing power I needed. Now that I have access to a server with a great many cores, I prefer threads for their low overhead. If I were building Waffles, I would implement a threaded version first, because it is more portable and has fewer hardware and software requirements.

    • Parallelism can also be introduced at different levels: training many classifiers at the same time, or training one classifier with a specific parallel algorithm.
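    A sketch of the first level (training many classifiers at once) with one `std::thread` per ensemble member. The "learner" is a hypothetical toy that averages a bootstrap sample; it is not a Waffles class. All threads read the same data through a const reference, each member gets its own RNG, and each thread writes only its own result slot.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Toy "training": mean of a bootstrap sample drawn with replacement.
double trainBootstrapMean(const std::vector<double>& data, std::mt19937_64& rng) {
    if (data.empty()) return 0.0;
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    double sum = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i)
        sum += data[pick(rng)];
    return sum / static_cast<double>(data.size());
}

// Trains `members` models concurrently on shared, read-only data.
std::vector<double> trainEnsembleInParallel(const std::vector<double>& data,
                                            std::size_t members, uint64_t seed) {
    std::vector<double> models(members);
    std::vector<std::thread> workers;
    workers.reserve(members);
    for (std::size_t m = 0; m < members; ++m) {
        workers.emplace_back([&data, &models, m, seed] {
            std::mt19937_64 rng(seed + m);              // private RNG per member
            models[m] = trainBootstrapMean(data, rng);  // distinct slot per thread
        });
    }
    for (auto& t : workers) t.join();
    return models;
}
```

    Seeding each member from `seed + m` rather than from a shared generator keeps the results identical from run to run, regardless of thread scheduling.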

    Another question: I've noticed that GMatrix isn't passed with a const qualifier, so I am wondering whether I can pass the same GMatrix to all the classifiers and train them in parallel. Currently I pass a separate copy of the GMatrix to each one. Sorry for not reading through the Waffles source code myself; I do not have much time.

    BTW, Waffles does not seem suitable for my task. The main problem is that it runs too slowly on large datasets. I am training on 2M instances with 200 dimensions (about 500MB in size). I've tried a single GDecisionTree, a GLinearRegressor, and a GRandomForest with random decision trees, but none of them produced a result after hours and hours. So far, the only approach that works is a linear regressor I wrote myself, trained with plain gradient descent for a reasonable number of iterations.
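    For reference, a minimal sketch of batch gradient descent for linear regression, minimizing mean squared error. This is an illustration of the technique, not the poster's actual code; `maxIters` and `learningRate` are the kind of knobs that trade training time for accuracy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct LinearModel {
    std::vector<double> w;  // one weight per feature
    double b = 0.0;         // bias
    double predict(const std::vector<double>& x) const {
        double s = b;
        for (std::size_t j = 0; j < w.size(); ++j) s += w[j] * x[j];
        return s;
    }
};

// Minimizes (1/2n) * sum_i (predict(x_i) - y_i)^2 with full-batch steps.
LinearModel fitGradientDescent(const std::vector<std::vector<double>>& X,
                               const std::vector<double>& y,
                               std::size_t maxIters, double learningRate) {
    const std::size_t n = X.size();
    const std::size_t d = n ? X[0].size() : 0;
    LinearModel m;
    m.w.assign(d, 0.0);
    if (n == 0) return m;
    std::vector<double> gradW(d);
    for (std::size_t it = 0; it < maxIters; ++it) {
        std::fill(gradW.begin(), gradW.end(), 0.0);
        double gradB = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double err = m.predict(X[i]) - y[i];
            for (std::size_t j = 0; j < d; ++j) gradW[j] += err * X[i][j];
            gradB += err;
        }
        // Average the gradient over the batch, then step downhill.
        for (std::size_t j = 0; j < d; ++j) m.w[j] -= learningRate * gradW[j] / n;
        m.b -= learningRate * gradB / n;
    }
    return m;
}
```

    Each iteration costs one pass over the data, so for millions of rows `maxIters` directly bounds the training time.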

    So I suggest exposing more tunable parameters for specific learners (such as which training implementation to use) to allow a trade-off between time and accuracy. That would also make Waffles more applicable to real tasks.

    Sorry for being so verbose.

  • Mike Gashler

    Mike Gashler - 2013-08-30

    You are right, GSupervisedLearner::train should accept a const matrix. I believe the implementation does not alter the matrix, but I should adjust the code to enforce that.
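    A sketch of what a const-correct training interface buys. `Matrix`, `SupervisedLearner`, and `MeanLearner` are hypothetical minimal types, not the Waffles API. Because `train` takes its data by const reference, the compiler rejects any attempt to modify it, and several threads can train on the same object without copies as long as nothing writes to it meanwhile.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix.
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

class SupervisedLearner {
public:
    virtual ~SupervisedLearner() = default;
    // Read-only inputs: enforced by the type system, not just by convention.
    virtual void train(const Matrix& features, const Matrix& labels) = 0;
};

// Toy learner that predicts the mean label; it only reads its inputs.
class MeanLearner : public SupervisedLearner {
public:
    void train(const Matrix& /*features*/, const Matrix& labels) override {
        double sum = 0.0;
        for (std::size_t r = 0; r < labels.rows; ++r) sum += labels.at(r, 0);
        mean_ = labels.rows ? sum / labels.rows : 0.0;
        // labels.data[0] = 0.0;  // would not compile: labels is const
    }
    double mean() const { return mean_; }
private:
    double mean_ = 0.0;
};
```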

    For a dataset this large, you might consider moving away from batch paradigms toward incremental training algorithms. For example, my GNeuralNet class supports incremental training. I would not be surprised if a neural network with one hidden layer converged to a better model after seeing only a small portion of your data than a linear regressor does after training on all of it. To get an unbiased estimate of current accuracy, you can predict each sample before you train with it. Unfortunately, I have not yet figured out how to expose this via the command-line tools, so you would have to write some C++ code to do this.
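    A sketch of the predict-before-train evaluation described above. `OnlineLinear` is a hypothetical online linear regressor updated by one stochastic gradient step per sample; it stands in for an incrementally trainable model such as GNeuralNet and does not use or imitate the Waffles API.

```cpp
#include <cstddef>
#include <vector>

class OnlineLinear {
public:
    OnlineLinear(std::size_t dims, double learningRate)
        : w_(dims, 0.0), lr_(learningRate) {}
    double predict(const std::vector<double>& x) const {
        double s = b_;
        for (std::size_t j = 0; j < w_.size(); ++j) s += w_[j] * x[j];
        return s;
    }
    // One SGD step on the squared error of a single sample.
    void trainIncremental(const std::vector<double>& x, double y) {
        const double err = predict(x) - y;
        for (std::size_t j = 0; j < w_.size(); ++j) w_[j] -= lr_ * err * x[j];
        b_ -= lr_ * err;
    }
private:
    std::vector<double> w_;
    double b_ = 0.0;
    double lr_;
};

// Predict each sample before training on it, so every error is measured on
// data the model has not yet seen. Returns the mean squared error over the
// last `window` samples of the stream.
double prequentialMse(OnlineLinear& model,
                      const std::vector<std::vector<double>>& X,
                      const std::vector<double>& y, std::size_t window) {
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double err = model.predict(X[i]) - y[i];  // evaluate first
        if (i + window >= X.size()) { sum += err * err; ++count; }
        model.trainIncremental(X[i], y[i]);             // then learn
    }
    return count ? sum / count : 0.0;
}
```

    Restricting the average to a recent window tracks how good the model is now, rather than averaging in the large errors it made early in the stream.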

    It would also be interesting to profile it to see where the bottleneck is. I have found that the tools "valgrind --tool=callgrind" and "kcachegrind" make it very easy to find performance issues. If you can identify some specific methods that are slow with your application, I would be happy to examine them and see if I could improve their performance.
