I've just added my Python module (cuecat.py) and a test program (test_cuecat.py) to CVS under python/. It would help if people started using CVS and the anon-ftp area rather than posting their code in the forum; that keeps everything in one easy-to-find place. Anyway, my module handles the base64-type encoding and is based on descriptions of the encoding I read in the forum and on my own MIME base64 code.
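For anyone who hasn't looked at the encoding yet, here's a minimal sketch of the decoding side. It is not cuecat.py. It assumes the scheme as it's commonly described: the scanner emits dot-separated fields (scanner serial, barcode type, barcode data), each field uses a base64-like alphabet of a-z, A-Z, 0-9, '+' and '-', every 4 characters pack into 3 bytes, and each byte is XORed with the character 'C'. The names decode_field, encode_field and decode_scan are made up for illustration.

```python
# Sketch only: assumes the commonly described CueCat scheme, not cuecat.py itself.
ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789+-"
)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}  # char -> 6-bit value
XOR_KEY = ord("C")  # assumed: every decoded byte is XORed with 'C'


def decode_field(field: str) -> str:
    """Decode one field; malformed input raises KeyError or UnicodeDecodeError."""
    out = bytearray()
    for start in range(0, len(field), 4):
        chunk = field[start:start + 4]
        value = 0
        for ch in chunk:
            value = (value << 6) | INDEX[ch]
        value <<= 6 * (4 - len(chunk))  # left-align a short final chunk to 24 bits
        for i in range((6 * len(chunk)) // 8):  # whole bytes carried by the chunk
            out.append(((value >> (16 - 8 * i)) & 0xFF) ^ XOR_KEY)
    return out.decode("ascii")


def encode_field(text: str) -> str:
    """Inverse of decode_field, handy for testing without a scanner."""
    data = bytes(b ^ XOR_KEY for b in text.encode("ascii"))
    chars = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        value = int.from_bytes(chunk, "big") << (8 * (3 - len(chunk)))
        for i in range((8 * len(chunk) + 5) // 6):  # enough chars to hold the bits
            chars.append(ALPHABET[(value >> (18 - 6 * i)) & 0x3F])
    return "".join(chars)


def decode_scan(raw: str) -> tuple:
    """Decode every non-empty field of a raw scan such as '.aaaa.bbbb.cccc.'."""
    return tuple(decode_field(f) for f in raw.strip().split(".") if f)
```

Under these assumptions decode_field("cGf2") gives "IB5". The encoder exists only so the decoder can be round-trip tested; the scanner itself does the encoding.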

Jack, I read your post about user applications. I couldn't agree more that user apps are the ultimate goal and the real benefit. I think, though, that people want a lot of different applications. I personally want an application I can use to check out CDs to friends, just like a library checks out books. So I guess somebody has to start by defining exactly what app they'd like to have and then start coding it.

One thing that I think would benefit any application is a database of product codes. I see this database as a hierarchy, which would allow richer data, where available, for different types of products. I think it would be a good idea to start a system that lets anyone contribute to the database. Building a system like this is not too hard. Coming up with a sufficiently rich data structure is the first hurdle, but it isn't too bad. The somewhat more important issues bound to arise are:

1. Accuracy. You just can't count on everyone to take the same care you would when entering data. Worse, some people would think it's funny to pollute the database with misinformation.

2. Commercial hijacking. Witness what happened to the CDDB. Scenario: out of the goodness of my heart, I decide to host the internet barcode database for everyone's benefit and donate my hardware and bandwidth to the cause. For a time many people contribute, the database flourishes, and many enjoy its fruits. A victim of my own success, I find myself spending large amounts of time pruning bad data from the database and upgrading the system, and my internet connection is increasingly filled with database requests. Then I come up with a great idea: hey, this database is worth money. Even though it was built by people donating their information, I feel like I've done a lot of work on it, and I've been paying for the internet connection, so I now feel justified in making the database a commercial product in some shape or form and making money off of it. Now everyone who uses and contributes to the database is quite rightfully pissed off that I took everyone else's hard work and am now trying to get rich off of it.

I think/hope we can come up with a system that won't suffer from these problems. With respect to 2), there are certainly conditions under which it's impossible to escape. You could imagine such a service becoming so widely used that its requests would easily saturate a T1 on a normal day. In that case nobody is going to provide a T1 for the purpose without some gain, direct or indirect, and it would only be fair for whoever did provide it, person or company, to get something back, if only to cover the cost of the connection. Of course, it would never be OK under any circumstance for someone to collect all the data and then one day just throw the switch and say 'hey, now this is a commercial service and you have to pay if you want the data'. If the data is provided by the public, it belongs to the public.
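To put rough numbers on the T1 scenario: a T1 carries 1.544 Mbit/s. The sketch below assumes, purely for illustration, that each lookup returns about 2,000 bytes, and it ignores protocol overhead; the real response size would depend on whatever interface we settle on.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate
SECONDS_PER_DAY = 86_400


def lookups_per_second(link_bps: float, bytes_per_response: float) -> float:
    """Responses per second that fill the link, ignoring protocol overhead."""
    return link_bps / 8 / bytes_per_response


def lookups_per_day(link_bps: float, bytes_per_response: float) -> float:
    """Same figure sustained over a full day."""
    return lookups_per_second(link_bps, bytes_per_response) * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical response size of 2,000 bytes per lookup.
    print(lookups_per_second(T1_BITS_PER_SECOND, 2000))  # 96.5
    print(lookups_per_day(T1_BITS_PER_SECOND, 2000))     # 8337600.0
```

At that assumed size, about 97 lookups per second, or roughly 8.3 million a day, keeps the line completely full. Halving the response size doubles both figures.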

To deal with accuracy, we do it like this. Each person who wants to contribute must "register" under their email address. Registration will proceed as it does with newsgroups: you put in your request to register and receive an email with a link to confirm it. Once you are registered, you can submit new data with your username (email address) and password. We also keep track of the user who submitted each entry, so if we find someone is just pumping in bogus data, we can go in and delete every entry they've submitted. Finally, we need a way for the system to attach trust to a user so that we can put quality control on the data: when a non-trusted user submits an entry, it goes to a staging area, and trusted users are permitted to verify the data and pass it on to the trusted area.
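A minimal in-memory sketch of that workflow, assuming a single server process. Every name in it (Registry, Product, CD, Submission, grant_trust and the rest) is hypothetical. It leaves out password storage and actually sending the confirmation email, and it shows the hierarchy idea from earlier as a base Product record with a richer CD subtype.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Product:
    """Base record: every barcode gets at least a code and a title."""
    code: str
    title: str


@dataclass
class CD(Product):
    """Richer subtype: CDs can also carry an artist and a track list."""
    artist: str = ""
    tracks: list = field(default_factory=list)


@dataclass
class Submission:
    product: Product
    submitter: str  # email address of the registered user who entered it


class Registry:
    def __init__(self) -> None:
        self.pending = {}     # confirmation token -> email awaiting confirmation
        self.users = set()    # confirmed contributors
        self.trusted = set()  # contributors allowed to verify staged entries
        self.staging = []     # submissions from non-trusted users
        self.accepted = {}    # product code -> verified Submission

    def request_registration(self, email: str) -> str:
        # A real system would email this token as a confirmation link.
        token = secrets.token_urlsafe(16)
        self.pending[token] = email
        return token

    def confirm(self, token: str) -> None:
        self.users.add(self.pending.pop(token))

    def grant_trust(self, email: str) -> None:
        """Mark a registered user as trusted (an admin decision in practice)."""
        if email not in self.users:
            raise KeyError(email)
        self.trusted.add(email)

    def submit(self, email: str, product: Product) -> None:
        if email not in self.users:
            raise PermissionError("register first")
        sub = Submission(product, email)
        if email in self.trusted:
            self.accepted[product.code] = sub
        else:
            self.staging.append(sub)  # non-trusted entries wait for review

    def verify(self, verifier: str, code: str) -> None:
        """A trusted user promotes staged entries for `code` to the trusted area."""
        if verifier not in self.trusted:
            raise PermissionError("only trusted users may verify")
        matches = [s for s in self.staging if s.product.code == code]
        self.staging = [s for s in self.staging if s.product.code != code]
        if matches:
            self.accepted[code] = matches[-1]  # latest submission wins

    def purge_user(self, email: str) -> None:
        """Delete everything a user submitted, staged or verified."""
        self.staging = [s for s in self.staging if s.submitter != email]
        self.accepted = {c: s for c, s in self.accepted.items()
                         if s.submitter != email}
```

Keeping the submitter on every entry, even after verification, is what lets purge_user clean up a bad actor in one pass; the same field could later feed a finer-grained trust score instead of the simple trusted set used here.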

To deal with the commercial hijacking problem, I propose that all data entered into the system be put under a kind of GPL. Anyone can download the data. If you download the data and offer it to others, you must always support a free interface (which will be specified). People are free to limit bandwidth to their server (which may make it difficult for people to download the data, but what can you do?). Also, right from the start we should devise a way for the server to replicate itself to other servers on the fly. One option I favor is providing LDAP as the search interface, since the OpenLDAP server already provides a way to replicate itself to other servers. I could envision a kind of tree of servers that replicate to each other. This is the strongest defense against a single site becoming the victim of its own success and either going down because the (unknowing) provider of its internet connection demands it be shut down, or the provider getting the bright idea to start charging money for it.
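The tree-of-servers idea can be modeled in a few lines. This is a toy in-memory model, not OpenLDAP's actual replication mechanism: it assumes writes happen only at the root, each replica pushes every change to its children, and a newly attached replica first receives a full copy. Replica, attach, apply and lookup are hypothetical names.

```python
from typing import Dict, List, Optional


class Replica:
    """One server in a tree of mirrors of the code database (must stay a tree)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, str] = {}
        self.children: List["Replica"] = []

    def attach(self, child: "Replica") -> "Replica":
        """Add a downstream mirror and bring it up to date with a full copy."""
        self.children.append(child)
        for code, data in self.entries.items():
            child.apply(code, data)
        return child

    def apply(self, code: str, data: str) -> None:
        """Store a change locally, then push it to every downstream mirror."""
        self.entries[code] = data
        for child in self.children:
            child.apply(code, data)

    def lookup(self, code: str) -> Optional[str]:
        """The free read interface every mirror is required to offer."""
        return self.entries.get(code)
```

Because every mirror ends up with the complete data set and must keep the free lookup interface, no single host is indispensable: if the root goes away or starts charging, any mirror could take over as the new root.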

Those are my thoughts. Anyone care to add or poke holes?