My company is looking into creating a clothes-shopping website built around a recommendation engine. The user criteria would be past purchases and items viewed (à la Amazon), measurements, and personal tastes (colours, clothing styles…).
Items criteria would be colours, clothing styles, etc.
We have not yet decided whether to build our own engine or whether we can benefit from an existing one.
1/ How active is Duine development? The last commit is from April, if I'm not mistaken. When was the RoadMap written? The website looks like it hasn't been updated since February 2009; has it been? If the RoadMap is not up to date, is there a list of planned improvements somewhere?
2/ I believe Duine can be used in commercial software, free of charge, thanks to the LGPL license. Am I mistaken?
3/ How long did it take to develop the Duine framework?
4/ Do you have an estimate of the limitations in terms of the amount of data? In some places in the documentation, it is written that the system does not scale very well for large numbers of items/users.
5/ How long do you reckon it would take for a programmer used to Java & database management to get used to the framework and use it effectively in a real-world app?
Sorry if these are vague questions; I'm just starting to research this subject!
Let me answer your questions one by one:
1. Activity in the project depends on whether somebody is willing to pay for the effort. The last addition was a Metadata Storage component, built as part of the realization of a recommender-as-a-service for the European iNEM4U project (www.inem4u.eu). The source code is in SVN, but there is no formal release yet. The service can work with TV-Anytime metadata; other formats can be supported as well, provided that you write the XSLT scripts to convert them to the Duine storage format.
2. LGPL means, in short, that you can use it in your commercial software, but that you are obliged to contribute changes to the library back to the source code repository when you distribute the library with those changes. A new predictor that is not in the current library is, in my eyes, not a derivative work of Duine and does not need to be disclosed on distribution.
3. The code was originally developed as part of Mark van Setten's PhD thesis, around 2002-2003. Later on we reused it in several other projects. In 2008 we had the opportunity for a redesign, resulting in the open source version in 2009. How much effort? At least a person-year. If you are an experienced Java software developer, know about machine learning and recommendation, and know beforehand what you want to create, then of course it takes less.
4. There are no hard limits; rather, performance degrades as you add more and more items and users. This means that if your non-functional requirements on scalability become more important, you have to pay attention to caching, offline calculation of profiles, etc. I don't have numbers, but you can try it out with the Movielens data set.
5. Let's assume an experienced Java developer who is familiar with Maven and Spring, machine-learning basics, and recommendation. It will take them less than a day to get the Movielens example working. For using it in a real-world app, it really depends on the requirements. The availability of data also plays an important role: you need (a lot of) data to verify that the recommendation engine works properly, and you have to think carefully about how to validate the results.
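To make the conversion step from answer 1 concrete: applying an XSLT script to source metadata can be done with the standard JAXP API. This is just a sketch; the stylesheet and the element names in it are made up for illustration, and a real script would map the actual TV-Anytime fields onto the Duine storage format.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

/**
 * Illustrative only: apply an XSLT stylesheet to an XML metadata
 * document and return the transformed result as a string.
 */
public class MetadataConverter {
    public static String convert(String xml, String xslt) throws TransformerException {
        // Compile the stylesheet and run the transformation in memory.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Toy stylesheet: extract the title of an <item> element.
        String xslt = "<xsl:stylesheet version=\"1.0\" "
                + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/item\">"
                + "<xsl:value-of select=\"title\"/>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(convert("<item><title>Some Movie</title></item>", xslt));
    }
}
```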
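The caching suggestion in answer 4 can be sketched as follows. This is not part of Duine's API; all class and method names here are hypothetical. The idea is simply to precompute predictions in an offline batch job so the online request path is a cache lookup instead of a full profile calculation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Duine code): serve recommendations from a
 * cache filled by an offline batch job, falling back to live
 * computation on a cache miss.
 */
public class PredictionCache {
    private final Map<String, Double> cache = new HashMap<>();

    /** Offline batch job: compute and store predictions for all user/item pairs. */
    public void precompute(Iterable<String> userIds, Iterable<String> itemIds) {
        for (String user : userIds) {
            for (String item : itemIds) {
                cache.put(user + ":" + item, computePrediction(user, item));
            }
        }
    }

    /** Online path: constant-time lookup, computing on a miss. */
    public double predict(String userId, String itemId) {
        return cache.computeIfAbsent(userId + ":" + itemId,
                key -> computePrediction(userId, itemId));
    }

    /** Stand-in for a call to the real recommendation engine. */
    private double computePrediction(String userId, String itemId) {
        return 3.0; // dummy value for the sketch
    }

    public static void main(String[] args) {
        PredictionCache cache = new PredictionCache();
        cache.precompute(java.util.List.of("u1"), java.util.List.of("i1", "i2"));
        System.out.println(cache.predict("u1", "i1"));
    }
}
```

In a real deployment the precomputation would run on a schedule, and the cache would be invalidated when new ratings arrive.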
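If you do try the Movielens data set mentioned above, the 100K release stores ratings in `u.data` as tab-separated lines of user id, item id, rating, and timestamp. A small parsing sketch (the `Rating` record is our own, not a Duine class):

```java
import java.util.ArrayList;
import java.util.List;

/** Parse MovieLens 100K "u.data" lines into rating records. */
public class MovielensParser {
    /** One rating: tab-separated user id, item id, rating, timestamp. */
    public record Rating(int userId, int itemId, int rating, long timestamp) {}

    public static List<Rating> parse(List<String> lines) {
        List<Rating> ratings = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\t");
            ratings.add(new Rating(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                    Integer.parseInt(f[2]), Long.parseLong(f[3])));
        }
        return ratings;
    }

    public static void main(String[] args) {
        // Example line from u.data: user 196 rated item 242 with 3 stars.
        List<Rating> r = parse(List.of("196\t242\t3\t881250949"));
        System.out.println(r.get(0).rating()); // prints 3
    }
}
```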
There are also commercial parties offering recommendation services. You might want to check them out as a third possibility (build yourself from scratch, build yourself with open source, or buy in).
Hope this helps you with your decision,