Hi,
First of all, thank you for the great library.
I want to report a bug and propose an enhancement (and help to implement it :)).
The bug: WAG model (waffles_learn)
You can train a WAG model and serialise it. However, there is no way to predict with it, since it can’t be deserialised. Looking at the code, GLearnerLoader::loadLearner() does not handle the WAG class, so an exception is always thrown.
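To illustrate the failure mode described above, here is a minimal sketch of a loader that dispatches on a serialised class name. Every name in it (Learner, Wag, DecisionTree, makeRegistry, and this loadLearner) is a hypothetical stand-in and not the actual Waffles code. It only shows why a class missing from the dispatch table can be saved but never loaded.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for learner classes; NOT the Waffles types.
struct Learner {
    virtual ~Learner() = default;
    virtual std::string name() const = 0;
};
struct DecisionTree : Learner {
    std::string name() const override { return "DecisionTree"; }
};
struct Wag : Learner {
    std::string name() const override { return "Wag"; }
};

using Factory = std::function<std::unique_ptr<Learner>()>;

// Builds the table of class names the loader recognises.
// includeWag=false models the reported bug, includeWag=true the fix.
std::map<std::string, Factory> makeRegistry(bool includeWag) {
    std::map<std::string, Factory> reg;
    reg["DecisionTree"] = [] { return std::make_unique<DecisionTree>(); };
    if (includeWag)
        reg["Wag"] = [] { return std::make_unique<Wag>(); };
    return reg;
}

// Looks up the serialised class name and constructs the learner.
// An unregistered name always ends in an exception.
std::unique_ptr<Learner> loadLearner(const std::map<std::string, Factory>& reg,
                                     const std::string& className) {
    auto it = reg.find(className);
    if (it == reg.end())
        throw std::runtime_error("Unrecognized class: " + className);
    assert(it->second && "registered factory must be callable");
    return it->second();
}
```

With makeRegistry(false), loading "Wag" throws. With makeRegistry(true), the same call returns a Wag instance.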
The Enhancement: I/O
I am scripting waffles_learn processes, and it is great that the model is written to stdout. However, waffles_learn does not support input from stdin (or a named pipe). The reason is that the GTokenizer class seeks to the end of the file in order to determine its length. The same is done in GFile, which the CSV parser uses when loading the file contents.
For my use cases it would be great if I could feed data through stdin or a named pipe. To do so, I have two options:
2.1. Write a wrapper around LearnerLib and use it instead of waffles_learn.
2.2. Patch/Reimplement parts of the I/O in order to support input from streams (like stdin and named pipes).
IMHO, the latter is better and can be useful for other developers as well. If you agree with this, I volunteer to implement it and submit patches here. I will have a few questions, of course, such as: what is the preferred way to specify the input format of the data (right now it is deduced from the file extension)?
Again, thanks for the library.
BR
Vlad
Vlad,
Thank you for pointing out these issues! I have added GWag to the GLearnerLoader::loadLearner method.
Your proposed I/O enhancement sounds like a great contribution. Changing the GTokenizer class to work without measuring the length of the file sounds like the right solution to me. I suppose it should read until it finds EOF. I did not think of this before. Your patches would be very welcome.
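As a rough illustration of the "read until EOF" idea, the sketch below drains a std::istream in fixed-size chunks without ever calling seekg or tellg, so it does not depend on the input being seekable. The function name readAll and the chunk size are assumptions made for this example. It is not the GTokenizer implementation, only one way the approach could look.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a real pipe
#include <string>

// Reads everything from `in` in fixed-size chunks until EOF.
// The length is never measured up front, so no seeking is required.
std::string readAll(std::istream& in, std::size_t chunkSize = 4096) {
    assert(chunkSize > 0);  // a zero-sized chunk would never reach EOF
    std::string out;
    std::string buf(chunkSize, '\0');
    while (in) {
        in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
        // gcount() is the number of bytes actually read, which is
        // smaller than the chunk size on the final, partial read.
        out.append(buf, 0, static_cast<std::size_t>(in.gcount()));
    }
    return out;
}
```

With <iostream> included, calling readAll(std::cin) would consume piped input in the same way as a regular file stream. A tokenizer would more likely process each chunk as it arrives instead of accumulating the whole input.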
I do not know the best way to determine the data format. Perhaps one solution would be to assume the most common format and let the user specify a flag when some other format is used.
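One way the "default format plus override flag" idea could be sketched is shown below. The flag values "arff" and "csv", the use of "-" as a path meaning stdin, and ARFF as the fallback are all assumptions made for illustration. None of them is an existing waffles_learn option.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Format { Arff, Csv };

// Returns true if `s` ends with `suffix`.
static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Resolution order: an explicit flag wins, then the file extension,
// then the fallback (used for streams, which have no extension).
Format resolveFormat(const std::string& flag, const std::string& path,
                     Format fallback = Format::Arff) {
    if (flag == "csv") return Format::Csv;
    if (flag == "arff") return Format::Arff;
    if (!flag.empty())
        throw std::invalid_argument("unknown format: " + flag);
    if (endsWith(path, ".csv")) return Format::Csv;
    if (endsWith(path, ".arff")) return Format::Arff;
    return fallback;
}
```

Keeping the extension check as the middle step preserves the current behaviour for regular files, while stdin and named pipes fall through to the default unless the flag is given.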
Mike