Welcome all. This thread is for ideas that people may have to help develop Eolas, whether they be developers or not. Please feel free to submit! Thanks.
Maybe you could start us off, Derm, with a quick rundown of what this doohickey will actually do?
Why do we need it? What problems will it solve? How will it be used (via a browser or a dedicated client)? What's going to be at the backend?
All that good stuff.
Eolas is currently an XML-based database where teachers can store details of documents, such as category, subject, path, keywords, author, date edited etc.
I provisionally call this Eolas Solo, where the user has individual XML files stored under their own login directory (C:\Documents and Settings\... under Windows).
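A minimal sketch of what one Eolas Solo metadata record might look like, using Python's standard xml.etree.ElementTree. The element names and the sample entry are made up for illustration; the real Solo prototype's XML layout may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names, loosely based on the ones listed above.
FIELDS = ("category", "subject", "path", "keywords", "author", "edited")

def make_record(**meta: str) -> ET.Element:
    """Build one <document> element from keyword metadata; missing fields stay empty."""
    doc = ET.Element("document")
    for name in FIELDS:
        ET.SubElement(doc, name).text = meta.get(name, "")
    return doc

def find_by_subject(catalogue: ET.Element, subject: str) -> list:
    """Return the stored path of every document with the given subject."""
    return [d.findtext("path") for d in catalogue.iter("document")
            if d.findtext("subject") == subject]

# Made-up sample entry, matching the geography example in the next post.
catalogue = ET.Element("eolas")
catalogue.append(make_record(category="assignment", subject="Geography",
                             path="geography_assign.doc", author="dbarry"))
print(find_by_subject(catalogue, "Geography"))  # ['geography_assign.doc']
```

Saving the catalogue would then be something like ET.ElementTree(catalogue).write("eolas.xml", encoding="utf-8", xml_declaration=True), with the file placed under the user's own login directory.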
At a later stage I think we could consider Eolas Chorus, which would incorporate an SQL database and be aimed at, for example, schools. Chorus would provide a more robust database backend for many users.
Once a teacher has added a document's details to the database (Solo or Chorus), (s)he can also create a new file from the original. This is the "why we need it" part: the new file will be a more accessible version of the original document.
For example, say we have a file called geography_assign.doc. This is a text document with a couple of pictures, not very accessible to students with dyslexia or reduced vision. There are proprietary applications that will help these students read this document, but they are usually expensive. Eolas will attempt to give the teacher the opportunity to convert this document to an MP3 or an accessible webpage. Furthermore, Eolas will provide functionality to convert the document to an OpenOffice document (and vice versa), a wiki page, a WAP page (allowing notes to be accessed via mobile phones) or PDF, and to encrypt it. Finally, Eolas aims to simplify adding an animated character to a slideshow, and may even attempt to create its own character (I was thinking along the lines of a cartoon salmon, as in the salmon of knowledge).
Ok I get what yer saying. Sounds like a neat project.
An app that centralises (sic?!) document conversion tools. You'll want to make this app easily extensible so that as new doc formats (and open source conversion tools) arise, the user can add them to the app's tool repertoire.
Presumably you don't intend on writing all of these conversion tools yourself.
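One way to get that extensibility, sketched in Python under the assumption that every tool can be reduced to "bytes in, bytes out": keep a registry keyed on (source format, target format) and let new tools register themselves. ToolRegistry and txt_to_html are hypothetical names, and the converter is a toy stand-in for a real open source tool.

```python
import html
from typing import Callable, Dict, List, Tuple

# A converter takes the bytes of a source document and returns the target's bytes.
Converter = Callable[[bytes], bytes]

class ToolRegistry:
    """Hypothetical registry keyed on (source format, target format)."""

    def __init__(self) -> None:
        self._tools: Dict[Tuple[str, str], Converter] = {}

    def register(self, src: str, dst: str) -> Callable[[Converter], Converter]:
        """Decorator, so a new tool plugs in without touching the app core."""
        def wrap(fn: Converter) -> Converter:
            self._tools[(src, dst)] = fn
            return fn
        return wrap

    def targets(self, src: str) -> List[str]:
        """Formats a document of type `src` can currently be converted to."""
        return sorted(dst for (s, dst) in self._tools if s == src)

    def convert(self, data: bytes, src: str, dst: str) -> bytes:
        tool = self._tools.get((src, dst))
        if tool is None:
            raise ValueError(f"no tool registered for {src} -> {dst}")
        return tool(data)

registry = ToolRegistry()

@registry.register("txt", "html")
def txt_to_html(data: bytes) -> bytes:
    # Toy stand-in for a real converter: escape the text and wrap it in <pre>.
    return b"<pre>" + html.escape(data.decode("utf-8")).encode("utf-8") + b"</pre>"
```

The app core only ever calls targets() and convert(), so adding a format means registering one more function rather than rebuilding the app.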
What are your thoughts on document revision history? Would you consider hooking a Subversion-type backend to the app so that all of that could be automatically taken care of for the user? I'd imagine it could be very useful for users to be able to drop back to earlier versions of a document.
In fact you could even make use of Subversion's tag system to hold all the metadata you described above (author, category, etc).
In this setup the front end would become a decent graphical front end to a doc versioning system with an extensible set of tools to convert doc formats.
Which would be really nice! :-)
And potentially useful to a lot more environments than just the area of education...
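In svn terms, that metadata would live in versioned properties (set with svn propset and read back with svn propget) rather than in tags proper, since svn tags are just copies. A rough sketch that builds the propset commands follows. The eolas: prefix is an assumed naming convention, and the commands only take effect inside a working copy, followed by an svn commit.

```python
import subprocess
from typing import Dict, List

def propset_commands(path: str, meta: Dict[str, str],
                     prefix: str = "eolas:") -> List[List[str]]:
    """One 'svn propset NAME VALUE PATH' command per metadata field."""
    return [["svn", "propset", prefix + key, value, path]
            for key, value in sorted(meta.items())]

def tag_document(path: str, meta: Dict[str, str], run: bool = False) -> List[List[str]]:
    """Build (and optionally run) the propset commands; run=True needs a working copy."""
    commands = propset_commands(path, meta)
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

for cmd in tag_document("MyFoo.rtf", {"author": "dbarry", "category": "notes"}):
    print(" ".join(cmd))
```

Because properties are versioned along with the file, changing a document's author or category later shows up in its history like any other change.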
You might consider Python to control your backend and webserver, and use a web frontend for your app?
The webserver might be a customised SourceForge engine, which already has version control hooks.
I was thinking more about this last night, and I'm convinced that a webserver plus CVS/SVN setup is the way forward.
Users just log into the site from their favourite browser and (if they have admin privileges) can upload new documents to project areas. Normal users can log in and join projects. From the project page they can then access the set of documentation associated with that project.
This documentation at the backend is actually being stored under subversion control. Meta information is actually svn tags (which are themselves also under version control - sweet!).
After selecting a document the user can then run a selection of tools on the doc (2Mp3, 2Pdf, 2Doc, 2Ppt, etc). These tools are actually scripts living on the webserver that take a doc as input and produce a doc output. This is where your tool extensibility comes in.
The results (new docs) are then automatically stored under svn and the doc versions are linked.
E.g. MyFoo.doc version 2145 and MyFoo.mp3 version 3312 both come from the same source, MyFoo.rtf version 1001.
Of course the user need only be shown a single doc from the UI - MyFoo. They use the tools to decide which one they want. This is either grabbed from svn (cos it was created earlier and stored) or on-the-fly converted, stored and presented to the user.
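The "grab it from svn or convert it on the fly" step is essentially a cache keyed on the source document, its revision and the target format. Below is an in-memory sketch of that idea. The dictionary stands in for the svn store, and ConversionCache is a hypothetical name.

```python
from typing import Callable, Dict, Tuple

# (document name, source revision, target format) -> converted bytes
Key = Tuple[str, int, str]

class ConversionCache:
    """In-memory stand-in for 'fetch from svn if made earlier, else convert and store'."""

    def __init__(self, convert: Callable[[bytes, str], bytes]) -> None:
        self._convert = convert
        self._store: Dict[Key, bytes] = {}
        self.conversions = 0  # how many times the converter actually ran

    def get(self, name: str, rev: int, source: bytes, fmt: str) -> bytes:
        key = (name, rev, fmt)
        if key not in self._store:  # not created earlier: convert and store it
            self._store[key] = self._convert(source, fmt)
            self.conversions += 1
        return self._store[key]

# Toy converter standing in for a 2Mp3 / 2Pdf script on the server.
cache = ConversionCache(lambda data, fmt: fmt.encode() + b":" + data)
cache.get("MyFoo", 1001, b"hello", "mp3")
cache.get("MyFoo", 1001, b"hello", "mp3")  # served from the store
print(cache.conversions)  # 1
```

Keying on the source revision means a new check-in of MyFoo naturally triggers a fresh conversion, while older results stay available.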
Ok I'm gonna stop now! :)
Just did a google for some text to speech (TTS) support under Python. Wouldn't ye know it, there's a lovely library (pyTTS) that takes care of it for you. Check out this wee Python program (this is all the code you need to actually synthesize the string below):
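import pyTTS
tts = pyTTS.Create()
# set the speech rate, higher value = faster
tts.Rate = 1
# set the speech volume percentage (0-100%)
tts.Volume = 90
# explicitly set a voice ('MSMary' is one of the stock Microsoft SAPI voices;
# tts.GetVoiceNames() lists the ones actually installed)
tts.SetVoiceByName('MSMary')
# say something
text = "It doesn't get much easier than this"
tts.Speak(text)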
And yer done!
You can also use tts.SpeakToWav() if you want to create wav files out of the text. Reading text from files using Python is a complete doddle...
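Reading a text file and handing it to the speech engine a paragraph at a time could look roughly like this. The speech function is passed in, so with pyTTS you would pass tts.Speak. Splitting paragraphs on blank lines is just an assumption about how the notes are laid out.

```python
from pathlib import Path
from typing import Callable, Iterator

def paragraphs(path: Path) -> Iterator[str]:
    """Yield the blank-line-separated paragraphs of a UTF-8 text file."""
    # read_text uses universal newlines, so Windows \r\n endings become \n here
    text = path.read_text(encoding="utf-8")
    for block in text.split("\n\n"):
        block = " ".join(block.split())  # fold line breaks and runs of spaces
        if block:
            yield block

def speak_file(path: Path, say: Callable[[str], None]) -> int:
    """Hand each paragraph to a speech function; returns how many were spoken."""
    count = 0
    for para in paragraphs(path):
        say(para)
        count += 1
    return count
```

For example, speak_file(Path("geography_assign.txt"), tts.Speak), where geography_assign.txt is a hypothetical plain-text export of the document.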
Promise that this is the last post (tonight).
Take a look at Jython.
Spent a few hours today playing with it.
It's the dog's doo das.
A doddle to write, with full access to Java and Python from inside a JVM.
Derm, what is your plan for holding the conversion tools?
For example, when a user wants to add a new tool (2MarsFormat) where will the tool actually be held?
Or when a new version of the 2pdf tool (with better table support) appears how will that be integrated into the existing app?
Also, how will you associate the results of a tool with the version of the tool used to generate them? Would you think it important to hold onto older versions of the tool so that you can reproduce these results if needed?
Obviously you don't want to have to rebuild the app for any of these scenarios. I was thinking that you could create a tool repository with a defined format /tools/<docFormatType>/bin/2<docFormatType>.
The app could be forced to interrogate this repository (GetNewTools) on request from the user and automagically pull in any new tools you have added in.
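A sketch of the GetNewTools scan against the proposed /tools/<docFormatType>/bin/2<docFormatType> layout, using pathlib. It only looks at the directory layout; checking that a found file is actually executable, or updating the directory from svn first, is left out.

```python
from pathlib import Path
from typing import Dict

def get_new_tools(root: Path, known: Dict[str, Path]) -> Dict[str, Path]:
    """Scan root/<docFormatType>/bin/2<docFormatType>; return tools not yet known."""
    found: Dict[str, Path] = {}
    for fmt_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        fmt = fmt_dir.name
        tool = fmt_dir / "bin" / ("2" + fmt)
        if fmt not in known and tool.is_file():
            found[fmt] = tool
    return found
```

Something like known.update(get_new_tools(Path("/tools"), known)) would then pull the new tools in on request.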
Associated with each tool in the repository is its version number (I mean the repository revision number now, not the actual app version number - that can be determined if needs be). You could also have some well-defined tags to define any params that may need to be specified for the tool, e.g. if a tag name starts with param then it's describing a parameter.
When a tool generates a result file that result file will have associated with it the tool and its version number used to generate it.
Subversion will do all this for you if you use it as your database backend.
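The param convention and the result-to-tool link could be modelled like this. tool_params and Provenance are hypothetical; the name@rev strings borrow svn's peg-revision notation, and the property names written by as_props are assumptions rather than anything svn defines.

```python
from dataclasses import dataclass, field
from typing import Dict

def tool_params(props: Dict[str, str]) -> Dict[str, str]:
    """Tool properties whose names start with 'param' describe its parameters."""
    return {name[len("param"):].lstrip(":_-"): value
            for name, value in props.items() if name.startswith("param")}

@dataclass
class Provenance:
    """What a generated file records about how it was made."""
    source: str
    source_rev: int
    tool: str
    tool_rev: int  # repository revision of the tool, not its release number
    params: Dict[str, str] = field(default_factory=dict)

    def as_props(self) -> Dict[str, str]:
        """Flatten into properties to attach to the result file."""
        props = {"source": f"{self.source}@{self.source_rev}",
                 "tool": f"{self.tool}@{self.tool_rev}"}
        props.update({"param:" + k: v for k, v in self.params.items()})
        return props
```

With the tool's exact revision recorded, reproducing an old result means checking out tool@rev and rerunning it on source@rev with the same params.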
Ok I've just been talking to a lad (Dave) here who knows way more than I about this sort of thing.
It's a very simple document database repository that you can query from his webpage. Interesting? .... :)
He also had a great idea for associating files (e.g. MyViewOfTheWorld.doc with MyViewOfTheWorld.mp3 with MyViewOfTheWorld.rtf). You could use svn copy to copy the original (to a new filetype) and then check in the new file on top of this copy. Subversion will store the fact that this new file originated as a copy of another file. You can ask svn to give you a list of associations for any file.
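Roughly, Dave's workflow would be: svn copy MyViewOfTheWorld.doc MyViewOfTheWorld.mp3, overwrite the copied file in the working copy with the converted output, then svn commit. The association can be read back from svn log --xml -v --stop-on-copy MyViewOfTheWorld.mp3, whose <path> elements carry copyfrom-path and copyfrom-rev attributes for copied files. Here is a small parser for that output, run on a trimmed, made-up sample log:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def copy_origins(log_xml: str) -> List[Tuple[str, str, int]]:
    """From 'svn log --xml -v' output, list (path, copied-from path, copied-from rev)."""
    origins = []
    for entry in ET.fromstring(log_xml).iter("path"):
        src = entry.get("copyfrom-path")
        if src is not None:
            origins.append((entry.text.strip(), src, int(entry.get("copyfrom-rev"))))
    return origins

# Trimmed, made-up example of the log of a copied file.
SAMPLE = """<?xml version="1.0"?>
<log><logentry revision="3312"><paths>
<path action="A" copyfrom-path="/trunk/MyViewOfTheWorld.doc" copyfrom-rev="2145"
 kind="file">/trunk/MyViewOfTheWorld.mp3</path>
</paths></logentry></log>"""
print(copy_origins(SAMPLE))
```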
OK I've only skimmed so sorry if I go over old ground.
svn is absolutely the way to go. It makes the versioning, storage and metadata parts easy. What do we bring to the mashup? I think the unique thing about what John has described is that it encourages -- nay requires -- you to version control not only your "source" artefact but also your "transformer" and the "target" artefact you get by applying the transformer to the source. All kinds of arbitrary sets of artefacts and source->transformer->target relationships can be tagged to your heart's content.
This blows away all those artificial distinctions between configuration management systems and content management systems and gives you a new dependency-oriented programming (DOP) paradigm to boot! Maybe it wasn't possible in the old days of expensive storage, but things have changed. An Eolas repository could be soooo much better than an RDBMS or a plain old filesystem (POF). Imagine writing a script like this:
// Find Derm's Word docs about VB (or find a list we created earlier)
const DOC_LIST_NAME = "eolas://~dbarry/vb_docs.xml";
my derms_vb_docs = EolasGet(DOC_LIST_NAME); // Did I stash this list of docs already?
if (null == derms_vb_docs) {
    derms_vb_docs = new DocSet(
        "select *.doc where subject='VB' and author='dbarry'");
}
derms_vb_docs.expires = today() + 1; // We'll keep this list for 24 hours
// Btw note that DocSet is not just *about* documents -- it is
// itself a document that we can store
my talker = new EolasTransformer("pyTTS");
talker.version = Eolas::LATEST; // We don't care what version, as long as it's new
while ( derms_vb_docs.hasNext() ) {
    my doc_as_text = derms_vb_docs.next();
    my make_rule = new EolasMakeRule();
    make_rule.source = doc_as_text;
    make_rule.transformer = talker;
    my mp3 = make_rule.make();
    // Creates and stores a new doc (an MP3) unless
    // it already exists -- or more precisely, unless a
    // doc with the correct URI already exists.
    // The system generates the URI based on the source
    // doc and the transformer e.g
    // This amounts to make-like behaviour
    // Play first 5 seconds
    // By default each result (MP3) will remain anonymously in storage
    my mp3_name = doc_as_text.replace("doc", "mp3");
    my msg = "Keep " + mp3_name + "?";
    my user_choice = (new YesNoMessageBox(msg)).display();
    if (user_choice == "no")
        EolasDelete(mp3); // deletes it from storage
}
(This isn't in any particular language but may resemble one you know, especially Perl. But it could just as well be VB -- there are various ways to make an API that binds to just about any language.)
Other points in no particular order:
-- svn comes bundled with Apache already. Hopefully it's a fairly "normal" version of Apache that won't hinder the addition of all our nice new server-side code.
-- svn has a "file portability" mechanism based on MIME that allows it to serve variations of the same content
-- the power of tagging/metadata cannot be overstated! It gives your content more structure, and that makes it more queryable and more reusable. Talk to me over a pint about SPARQL or XQuery!
-- a big thing that's probably missing from svn is indexing. Let's say an academic network builds up based on this system: it'll need to be searchable.
-- in fact we'll probably want to interact with google tech in various ways before too long
-- we should try to be flexible about what a transformer can be: a binary exe, a script/tool combo (examples: my_obfuscator.py+Python), a codec or a piped combination of all of the above.
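If every transformer is treated as "bytes in, bytes out", the binary, script+interpreter and piped cases all collapse into one shape. Here is a Python sketch; from_command and pipe are hypothetical helpers, and the wrapped programs are assumed to read stdin and write stdout.

```python
import subprocess
from typing import Callable, Sequence

# Any transformer boils down to "bytes in, bytes out".
Transformer = Callable[[bytes], bytes]

def from_command(argv: Sequence[str]) -> Transformer:
    """Wrap an external program (exe, or interpreter + script) filtering stdin to stdout."""
    def run(data: bytes) -> bytes:
        return subprocess.run(list(argv), input=data,
                              stdout=subprocess.PIPE, check=True).stdout
    return run

def pipe(*steps: Transformer) -> Transformer:
    """Chain transformers left to right, like a shell pipeline."""
    def run(data: bytes) -> bytes:
        for step in steps:
            data = step(data)
        return data
    return run
```

For instance, from_command(["python", "my_obfuscator.py"]) would wrap the obfuscator example, assuming that script filters stdin to stdout. pipe(...) can then chain it with anything else, including plain in-process Python functions standing in for codecs.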
ok you totally get it. like i knew you would! :)
you bring up an interesting point about scalability and svn's lack of native indexing.
thats certainly something to think about. nothing springs to mind as an elegant solution. sure there are ways around it. but they stink too much of shoe and horn...
svn has the concept of post-commit hooks, which are scripts that run after a check-in.
it's possible to make use of this to keep an index of data held on the repos...
another thing that could be done is a cron job recreating a light database of tags every evening. all search requests could then be sent toward this db.
not entirely happy with either of these though....hmmmmm.
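The post-commit route might look something like this in Python. svn runs the post-commit hook with the repository path and revision as its first two arguments. The script asks svnlook which paths changed and copies a chosen set of properties into a small SQLite index. INDEXED_PROPS, the eolas: property names and the table layout are assumptions, and error handling is minimal.

```python
import sqlite3
import subprocess
import sys
from typing import Dict, List, Tuple

# Hypothetical set of metadata properties worth indexing.
INDEXED_PROPS = ("eolas:author", "eolas:subject", "eolas:category")

def changed_paths(svnlook_changed: str) -> List[Tuple[str, bool]]:
    """Parse 'svnlook changed' lines like 'U   trunk/foo.doc' into (path, deleted)."""
    return [(line[4:], line[0] == "D")
            for line in svnlook_changed.splitlines() if len(line) > 4]

def index_props(db: sqlite3.Connection, rev: int, path: str, props: Dict[str, str]) -> None:
    """Replace the indexed properties of one path; empty props drops it from the index."""
    db.execute("CREATE TABLE IF NOT EXISTS props "
               "(path TEXT, name TEXT, value TEXT, rev INTEGER)")
    db.execute("DELETE FROM props WHERE path = ?", (path,))
    db.executemany("INSERT INTO props VALUES (?, ?, ?, ?)",
                   [(path, name, value, rev) for name, value in props.items()])

def svnlook(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["svnlook", *args], capture_output=True, text=True)

def main(repos: str, rev: str, db_path: str = "eolas-index.db") -> None:
    db = sqlite3.connect(db_path)
    for path, deleted in changed_paths(svnlook("changed", "-r", rev, repos).stdout):
        props: Dict[str, str] = {}
        if not deleted:
            for name in INDEXED_PROPS:
                got = svnlook("propget", "-r", rev, repos, name, path)
                if got.returncode == 0:  # skip properties svnlook couldn't return, e.g. unset
                    props[name] = got.stdout
        index_props(db, int(rev), path, props)
    db.commit()
    db.close()

if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2])  # post-commit receives REPOS-PATH and REV
```

Search requests then go to the SQLite file instead of svn, and the same index_props function could just as well be driven by the nightly cron rebuild.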
Conor I like that script!
Having that type of power would be well sweet. (feature creep? hah!)
Could you explain your thoughts on that URI a bit more? I recognise that format from my Clearcase days but it's been a while. I'm sure svn can be made do it! :)
I meant to post the URL for svn file portability:
It just means svn already directly supports the idea of different renderings of the same base content. So as long as we can arbitrarily extend the list of MIME types, this saves us some work.
We use this at work a lot. It's very handy when developers are writing code under M$ and Linux. Or just not setting up their editors correctly and adding LF all over the bloody place...
Our files are flagged as "native" eol-style which means that svn will sort out all that line ending badness for you depending on which OS you happen to check out the code into.
here's a neat little tutorial to get you started building an svn repos.
linux only i'm afraid.
it's a total doddle under windoze though - check out TortoiseSVN
Glad you like it! That's only a flavour of it; really we're talking about a network operating system with efficient make-like semantics. But it shouldn't be that hard to prototype. Also, in case ye missed the hint in "it could just as well be VB", I think this system really can have the type of community-source model Derm envisioned. Experienced coders maintain the core in SourceForge (for now!) and the Eolas system itself allows its own users to submit their own apps, scripts etc. (which are just documents like any other; some of them might be transformers). As John said, we decouple the core of the system from the things that can be plugged into it, and the versioning aspect makes this possibly the most complete plugin framework ever :-)
I'm really really stoked -- thanks a million lads!
Re. the URI formats -- they weren't consciously based on Clearcase, but it's not surprising they resemble what I'll call a Clearcase "element id".
I guess I'm proposing two ways of composing a URI:
(1) Based on location in a namespace.
(2) Based on the provenance of the document: namely what was its exact source and what exact tool transformed it?
Note that the same document can have multiple type (1) URIs but it would be unusual (though maybe not impossible) for a document to have multiple type (2) URIs.
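A sketch of the two kinds of URI, with made-up formats. A type (1) URI follows the eolas://~user/... style from the script above. A type (2) URI here is simply a SHA-1 hash of the exact source@revision and transformer@revision, so it is fully determined by provenance.

```python
import hashlib

def namespace_uri(user: str, path: str) -> str:
    """Type (1): where a document sits in someone's namespace; a doc may have several."""
    return f"eolas://~{user}/{path.lstrip('/')}"

def provenance_uri(source: str, source_rev: int, tool: str, tool_rev: int) -> str:
    """Type (2): derived only from the exact source and exact transformer that made it."""
    key = f"{source}@{source_rev}|{tool}@{tool_rev}"
    return "eolas:derived/" + hashlib.sha1(key.encode("utf-8")).hexdigest()

a = provenance_uri("/trunk/MyFoo.rtf", 1001, "/tools/mp3/bin/2mp3", 12)
b = provenance_uri("/trunk/MyFoo.rtf", 1001, "/tools/mp3/bin/2mp3", 12)
c = provenance_uri("/trunk/MyFoo.rtf", 1001, "/tools/mp3/bin/2mp3", 13)
print(a == b, a == c)  # True False
```

Because identical inputs always give the same type (2) URI, "has this already been made?" becomes a simple lookup, which is where the make-like behaviour comes from.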
Two more things:
-- Derm's Eolas Solo prototype uses XML to store the metadata. There's no reason why this shouldn't continue in Eolas Chorus with svn backend. We store documents and any XML fragment can be a document. In many ways XML is the new SQL, only better.
-- There might be merit in downloading a nice slimmed down Linux virtual machine from somewhere (like rPath) and making that your testbed for installing svn etc. That way we can share it and we put ourselves firmly on the road to building an Eolas *appliance*
Ok I'll take your million points one at a time there.
1. make-like semantics - i'm not sure what you mean by this term. i can see from your script what the make transformer is doing, but the comment on make-like semantics is lost on me and i feel it's probably important. are you talking about makefile semantics by any chance? i still don't see the resemblance - but hey! :)
2. agreed that the front end could be any language so long as the backend api is well defined. interesting.
3. decoupling the tools plugin - yes. possibly we could define transformer properties so that tools need only be dropped into the repos and some props set (input type, output type, any special params needed) and yer away.
4. the uri idea is interesting. it differs from what i had in mind - but that's not necessarily a bad thing!
i was intending on holding that sort of info (source file, source rev, transformer, transformerRev, etc) in file properties held with the file. this flattens the dir structure (no longer have dirs for tools used on a file) but it might also complicate things - need to store multiple foo.mp3 files depending on the version of 2mp3 transformer used on foo.doc - tricky if they need to reside in the same dir...
i have no idea how to implement that kind of uri in svn. but that's fine. it's do-able i'm sure.
> "Note that the same document can have multiple type (1) URIs but it would be unusual (though maybe not impossible) for a document to have multiple type (2) URIs."
now i'm confused again. my understanding of a uri is that it's a path into the svn repos that points at a file. having multiple uris pointing at the same file seems to suggest that you have other ideas! you'll have to slow down for me a bit! explain. :)
5. i have downloaded VirtualBox and keep meaning to bloody install Fedora on it. i will and shall do this soon. i'm reliably informed that it's the dog's. we could have a blast off it anyways. whatcha think?
btw, i spoke with dave about the indexing issue. we agreed that there is no easy way to do this. but if you just ask svn for a recursive tree of properties, "it's very quick". our current repository has well over 100,000 files and dave assures me that it returns props very quickly. i'll try and do some measurements on it tomorrow. but we could easily use that approach to get started. if this becomes a problem later on then a post-commit database might be the way forward.
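Dave's approach could start out as simply as parsing svn proplist -R -v --xml output and filtering it in memory. The output could be obtained with subprocess.run(["svn", "proplist", "-R", "-v", "--xml", url], capture_output=True, text=True, check=True).stdout. The sample below is made up and trimmed, and the eolas: property names are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

PropTree = Dict[str, Dict[str, str]]

def props_by_path(proplist_xml: str) -> PropTree:
    """Turn 'svn proplist -R -v --xml' output into {path: {name: value}}."""
    tree: PropTree = {}
    for target in ET.fromstring(proplist_xml).iter("target"):
        tree[target.get("path")] = {p.get("name"): p.text or ""
                                    for p in target.iter("property")}
    return tree

def query(tree: PropTree, wanted: Dict[str, str]) -> List[str]:
    """Paths whose properties match every name/value pair in `wanted`."""
    return sorted(path for path, props in tree.items()
                  if all(props.get(k) == v for k, v in wanted.items()))

SAMPLE = """<?xml version="1.0"?>
<properties>
<target path="trunk/intro.doc"><property name="eolas:subject">VB</property>
<property name="eolas:author">dbarry</property></target>
<target path="trunk/rivers.doc"><property name="eolas:subject">Geography</property></target>
</properties>"""
print(query(props_by_path(SAMPLE), {"eolas:subject": "VB", "eolas:author": "dbarry"}))
```

If the recursive proplist ever gets too slow, the same query function could run against a post-commit-maintained index instead.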