From: Ted P. <dul...@gm...> - 2008-05-20 01:11:58
Hi Sid and Varada,

marimba got updated today. It is now running a very current kernel (2.6.24) and is now an Ubuntu server, which should make keeping it updated with patches etc. much easier than it was before (when it was running a very old version of White Box Linux). marimba also has 12GB of RAM, so it's a fairly formidable machine.

I have installed WordNet-Similarity and brought up the web server, and that all went pretty well. A few notes from the install, as it's a different kind of setup than we used to do on White Box (which more or less followed RedHat/Fedora conventions).

I installed apache2 as follows:

    apt-get install apache2

I also needed to install the Perl CGI module from CPAN:

    cpan> install CGI

The document base on Ubuntu/apache is /var/www and the cgi-bin location is /usr/lib/cgi-bin - these are the same as Varada observed on our development machine charango.

I needed to change one of similarity.cgi and wps.cgi to have a doc_base of "../../similarity". The other already had it set to that, so it would probably be good to change both to the same value in the distribution (since normally they need to be the same).

I also noticed that there is a very small bug in the index.html file that we have in docs: it refers to similarity.cgi, but really it should refer to ../cgi-bin/similarity/similarity.cgi. This is a very tiny thing, but again probably worth fixing.

Otherwise, I don't think I needed to modify or change anything else.

Right now I have similarity_server.pl running without any options, but I'll add an error log (going to /var/log/similarity_server.log, I think) and then give it a stoplist as well (our standard one). I will also modify /etc/rc.local to start similarity_server.pl on reboot, once we get closer to making this our production server.

I'm going to hold off on making that shift until we have the new release of WordNet-Similarity out, I think, as that will make for a nice transition point. But meanwhile you should generally find the web server up and running on our newly upgraded marimba, so do give it a try and see what you think.

    http://marimba.d.umn.edu

has a pointer to the marimba web server for WordNet-Similarity. talisker remains the "production" machine for now...

I will make the small changes mentioned above to the cvs version of WordNet-Similarity in the next day or two. I might also add some info about the usual Ubuntu defaults, since I feel more comfortable with those now...

Do let me know if you notice anything strange or if there are other small changes we might want to consider making!

Thanks,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
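[Editorial sketch, not part of the original message.] The doc_base mismatch between similarity.cgi and wps.cgi described above is the kind of thing a small check can catch before a release. The following is only a sketch: it assumes each script sets the value on a line of the form `$doc_base = "../../similarity";` (the exact form of that line in the real scripts is not shown in this thread), and the function names `doc_base_of` and `same_doc_base` are made up for illustration.

```shell
#!/bin/sh
# Print the value assigned to doc_base in a CGI script.
# ASSUMPTION: the script contains a line such as
#     my $doc_base = "../../similarity";
# with the value in single or double quotes. Only the first match is used.
doc_base_of() {
    sed -n 's/.*doc_base[[:space:]]*=[[:space:]]*["'\'']\([^"'\'']*\)["'\''].*/\1/p' "$1" | head -n 1
}

# Succeed only when both scripts define the same, non-empty doc_base.
same_doc_base() {
    a=$(doc_base_of "$1")
    b=$(doc_base_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use (paths assume the scripts live in a "similarity" directory
# under the Ubuntu cgi-bin location mentioned above):
#   same_doc_base /usr/lib/cgi-bin/similarity/similarity.cgi \
#                 /usr/lib/cgi-bin/similarity/wps.cgi \
#       || echo "doc_base differs between similarity.cgi and wps.cgi"
```

The check compares the two values rather than hard-coding "../../similarity", so it keeps working if the document layout changes later.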
From: Ted P. <dul...@gm...> - 2008-05-20 01:21:22
BTW, I attempted to install WordNet using apt-get, as in

    apt-get install wordnet

and while that did install the wn command, it did not seem to install tk or tcl or any of the WordNet dict files, so it was mostly worthless for our purposes. :)

So I went back and installed WordNet by downloading the tar file from WordNet, configuring, etc. as usual. It did not find tk and tcl on my system, so I installed those via

    apt-get install tcl
    apt-get install tk

but that did not seem to satisfy WordNet, as I think it was looking for certain header files in those distributions. So I had to install Tk and Tcl as usual (by downloading, etc.).

Anyway, once I had done the usual WordNet install, I installed WordNet-Similarity via CPAN

    cpan> install WordNet-Similarity

and that worked just fine (as it usually does).

So, even in the Ubuntu world of apt-get, we'll still need to install WordNet, Tk and Tcl as before, although I don't expect we'll have to do that too often.

Thanks!
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
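[Editorial sketch, not part of the original message.] Since the message above suspects that the WordNet build was missing Tcl/Tk header files, one way to check before running the build is to look for the headers directly. This is a sketch under assumptions: tcl.h and tk.h are the main public headers of Tcl and Tk, but the thread does not say which files the WordNet build actually wanted, and the function name `find_header` is invented here.

```shell
#!/bin/sh
# Print the first file named $1 found under any of the remaining
# directory arguments; return non-zero if none of them contains it.
find_header() {
    name=$1
    shift
    for root in "$@"; do
        [ -d "$root" ] || continue
        hit=$(find "$root" -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
    done
    return 1
}

# Example use (the header names and search roots are assumptions):
#   for h in tcl.h tk.h; do
#       find_header "$h" /usr/include /usr/local/include || echo "missing: $h"
#   done
```

Searching recursively matters because distributions often place these headers in a versioned subdirectory rather than directly under the include root.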
From: Ted P. <dul...@gm...> - 2008-05-20 01:40:04
I am now running the similarity server using what will likely be the default setup on marimba. I am in fact tempted to raise maxchild to something greater than 4 (given the expanded power of marimba now), but will hold off on that for a while, I think. stoplist.txt is the stoplist that we provide in the samples directory...

This is also the command that rc.local will run upon reboot. It is exactly what talisker is doing now, so nothing has changed...

    marimba(25): similarity_server.pl --stoplist /root/stoplist.txt --logfile /var/log/similarity_server.log
    Error log = /var/log/similarity_server.log
    Stoplist = /root/stoplist.txt
    Local port = 31134
    Maxchild = 4
    Loading modules... done.
    Starting server... going into background.
    Closing output to terminal. All future messages will be routed to the log file.

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
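[Editorial sketch, not part of the original message.] For reference, a fragment along the following lines could go into /etc/rc.local to run the command shown above at boot. It is a sketch only: the options and paths are copied from the transcript, but it assumes similarity_server.pl is on the PATH that rc.local runs with (otherwise its full path would be needed), and the two guards are an addition for safety, not something described in the thread.

```shell
#!/bin/sh
# Fragment for /etc/rc.local: start the WordNet-Similarity server at boot.
# Paths and options are the ones shown in the transcript above.
SERVER=similarity_server.pl          # ASSUMPTION: found via PATH
STOPLIST=/root/stoplist.txt
LOGFILE=/var/log/similarity_server.log

# Only start the server if the script and the stoplist are actually present,
# so a missing file cannot make the boot script fail.
if command -v "$SERVER" >/dev/null 2>&1 && [ -r "$STOPLIST" ]; then
    "$SERVER" --stoplist "$STOPLIST" --logfile "$LOGFILE"
fi
```

No trailing `&` is used because, per the output above, the server puts itself into the background once it has loaded its modules.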
From: Siddharth P. <si...@cs...> - 2008-05-20 03:31:15
Hi Ted,

> BTW, I attempted to install WordNet using apt-get, as in ....
>
>     apt-get install wordnet
>
> and while that did install the wn command, it did not seem to
> install tk or tcl or any of the WordNet dict files, so it was mostly worthless
> for our purposes. :)

Actually, they've divided WordNet into a number of different packages. Doing something like:

    apt-get install wordnet wordnet-base wordnet-dev wordnet-grind wordnet-sense-index

should install all the WordNet stuff. Also, these packages install the WordNet data files in a different location than the official sources do. You should find the data files under:

    /usr/share/wordnet

But even so, it is mostly worthless for our Perl modules... not because the data files don't exist, but because of the way our modules locate these data files. We currently rely on WNHOME to locate the base directory for WordNet, and the modules then append "/dict" to this to get the data directory. But with the above installation there is no base directory with a dict directory under it. So I think the fault is with our (and QueryData's) modules, which assume that the data files always live under a "dict" directory.

If our modules don't find the data files under WNHOME, they default to looking under /usr/local/WordNet-3.0, so this default does not match Ubuntu systems either. For this reason, I was imagining creating Ubuntu packages for WordNet-QueryData and WordNet-Similarity, which could use these different defaults and depend on the existing packages... and somewhere in the future we could simply install them with:

    apt-get install libwordnet-querydata-perl libwordnet-similarity-perl

:)

> So I went back and installed WordNet by downloading the tar file
> from WordNet, and configuring, etc. as usual. It did not find tk and tcl
> on my system, so I installed those via
>
>     apt-get install tcl
>     apt-get install tk
>
> but that did not seem to satisfy WordNet, as it was looking for certain
> header files in those distributions I think....so I had to install Tk and Tcl
> as usual (by downloading, etc.)

Actually, again, they've split tcl and tk into separate packages... and I think installing tcl-dev and tk-dev:

    apt-get install tcl-dev tk-dev

should install the libraries required to install WordNet.

> Anyway, once I did the usual WordNet install, then I installed WordNet-Similarity
> via CPAN
>
>     cpan> install WordNet-Similarity
>
> and that worked just fine (as it usually does).
>
> So, even in the ubuntu world of apt-get, we'll still need to install WordNet
> and Tk, Tcl as before, although I don't expect we'll have to do that too often.

I think, for now, we do need to install WordNet from source (even though the WordNet data files do get installed using apt-get), but everything else is available through apt-get.

Also, I think it would be good to modify our modules so that they can easily be used with the non-standard location of the WordNet data files.

Thanks.

-- Sid.
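[Editorial sketch, not part of the original message.] The lookup order described above, with the Ubuntu package location added as a last resort, can be sketched in shell as follows. This illustrates the proposed behaviour and is not the code of WordNet-QueryData or WordNet-Similarity. The variables `WN_DEFAULT_HOME` and `WN_PACKAGE_DIR` and the function name are hypothetical (they exist only so the defaults can be overridden), and testing for `data.noun` is an assumed way to recognise a data directory.

```shell
#!/bin/sh
# Print the first directory that looks like a WordNet data directory.
# Order tried:
#   1. $WNHOME/dict                  (current behaviour, per the message above)
#   2. /usr/local/WordNet-3.0/dict   (current default when WNHOME does not help)
#   3. /usr/share/wordnet            (where the Ubuntu packages put the files;
#                                     note there is no "dict" level here)
# WN_DEFAULT_HOME and WN_PACKAGE_DIR are hypothetical overrides for 2 and 3.
find_wordnet_dict() {
    for dir in \
        "${WNHOME:+$WNHOME/dict}" \
        "${WN_DEFAULT_HOME:-/usr/local/WordNet-3.0}/dict" \
        "${WN_PACKAGE_DIR:-/usr/share/wordnet}"
    do
        # Skip the empty entry produced when WNHOME is unset.
        if [ -n "$dir" ] && [ -r "$dir/data.noun" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example use:
#   dict=$(find_wordnet_dict) || echo "WordNet data files not found"
```

Keeping WNHOME first means existing source installs behave as before, while a package install is found without any environment variable being set.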
From: Ted P. <dul...@gm...> - 2008-05-20 03:49:08
Thanks Sid,

As you can see, I have a lot to learn about managing an Ubuntu system. :) This is very helpful indeed, and it's good to know that I can get tk and tcl from apt-get in a form that WordNet can use. They are sometimes a little tricky to deal with for some reason.

I do agree that it would be nice to move away from relying on WNHOME and dict. I think some of the Perl testers have mentioned that about QueryData, in that it's a fairly unusual thing for a Perl module to require an environment variable to be set like that. And Ubuntu really does seem to be catching on, so I suspect more users may come to us with apt-get style installs. It really is a pretty nice way to do things, so it would be nice to support that.

So I think Ubuntu style distributions of WordNet-Similarity etc. would be quite nice. In fact, for some of our other packages I have noticed that people have put together rpms and things like that. For example, there are rpms for Text-NSP for a few different distributions, I think, which is nice.

Thanks again,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse