Jason, thank you for making your impressive software freely available. I am just beginning to explore its capabilities and hope to make serious use of it in future.
I wanted to know what you think about a certain way of implementing a shared database. Your FAQ section mentions a couple of possibilities, but I had something in mind that wasn't covered there. Imagine a GitHub repository that just contains the relevant XML files. Suppose I have a local clone of the repository that functions as my local Hypernomicon database. I update the XML using Hypernomicon in the standard way. Then I push my changes to the remote repository on GitHub.
There are three reasons I'm thinking of a workflow like this:
1. There doesn't seem to be a problem doing this as a single user. You could even treat the remote repository as a backup, or as an alternative to Dropbox or Google Drive. There's little reason to create multiple branches as a single user (but see below), though the versioning might conceivably come in useful.
2. It might be possible to treat such a GitHub repository as a collaborative resource. Many individuals could maintain local clones of the repository, updating as necessary and creating pull requests. I'm not sure how the XML is structured, so I'm not sure how feasible it would be to have multiple branches being 'developed' at once, and then all pulled back into master. However, I do know of other collaborative database projects that work along these lines: CHIELD is one, D-PLACE is another. Although you say in the FAQ that "different people understand philosophy in different ways", it certainly ought to be possible to maintain a core, agreed-upon dialectic for very many of the topics of analytic philosophy.
3. Finally, I've noticed at least one other forum user mention the possibility of using the XML generated by Hypernomicon as a basis for an HTML-navigable interface. Some of the other databases that work in the way I'm envisioning also run HTML interfaces directly from their GitHub-stored databases; see this example from CHIELD. What's more, CHIELD actually receives its pull requests from its HTML interface. So it could be possible to have multiple user-friendly entry-points into a shared database.
Overall, I'm wondering whether there are any aspects of the Hypernomicon database or workflow that would absolutely prohibit something like the above. I am working through your tutorial videos, learning about the XML and actually filling out an initial database on my local machine, so I'd be able to answer my own questions eventually. But I figure since you've been so helpful to the other users on this forum, I might as well let you know my thoughts. It also gives me a chance to say thank you, again, for this fantastic software!
Hi Stephen,
Thanks for sharing your thoughts on this. I think you're right: it would be possible for a single user to use git or GitHub for backup purposes, but the user would have to be careful not to commit anything less than all changes in the working tree at a time. A lot of git operations that are (more or less) safe in other contexts, such as working with source code, could easily break the database; cherry-picking is one example. This is because the contents of the files are highly dependent on each other in ways that aren't obvious just from looking at diffs. Also, the database is intended to include files of other types (images, PDFs, etc.) that can't be used effectively with git. This is why I haven't recommended using git/GitHub for this purpose. Other cloud services (e.g. Dropbox) can restore earlier versions to a limited degree in case you accidentally mess something up. For longer-term backups, my recommendation is to manually copy either the XML folder or the entire root database folder to an external hard drive or cloud storage account, or to use backup software to maintain backups of that folder. That way there's no danger of anything getting corrupted, since the folder is treated as a monolithic unit.
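As a rough illustration of treating the folder as a monolithic unit, here is a minimal Python sketch that archives the whole root database folder into a single timestamped zip file. The function name `snapshot_database`, its arguments, and the archive naming scheme are assumptions made for this example; none of it is part of Hypernomicon.

```python
import shutil
import time
from pathlib import Path


def snapshot_database(db_root, backup_dir):
    """Archive the entire database folder as one timestamped zip file.

    Hypothetical helper: db_root is the root database folder, and
    backup_dir is where backups are kept (ideally outside db_root,
    so that archives do not end up nested inside later archives).
    """
    db_root = Path(db_root).resolve()
    backup_dir = Path(backup_dir).resolve()
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"{db_root.name}-{stamp}"

    # make_archive appends ".zip" and returns the path of the file it wrote.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(db_root))
    return Path(archive)
```

Because every snapshot contains all of the files as they were at one moment, restoring one can never mix XML from one point in time with files from another.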
For similar reasons, unfortunately, I don't think it would work to use git as a way for multiple people to make edits to a database. If multiple users have made changes, there's no way to merge them that would ensure the database doesn't get corrupted. For example, if two users decide to use the same search key for two different records, that would make the database unusable and require manual fixing, and it would not be caught by git. The same applies if two users each create a new record that gets assigned the same record ID. For these reasons, if it becomes possible to merge database versions in the future, this will have to be done from within Hypernomicon, as it will require specialized functionality (reassigning record IDs, prompting the user to specify how duplicate search keys should be handled, etc.). Also, the functionality for merging entire records is currently pretty limited and would have to be significantly expanded. What makes Hypernomicon so good, in my opinion, is the fact that data can be interconnected in so many subtle ways, but the downside is that this would likely make any general-purpose, automatic merging process prohibitively complex and error-prone. An effective merging process will probably be one that involves a large amount of user interaction and is done within the UI. It might then be feasible to merge a small number of changes at a time, but probably not a larger number of changes or two databases that were built independently.
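To make the two failure cases above concrete, here is a small Python sketch that scans a merged XML document for repeated record IDs and repeated search keys, which a textual git merge would accept without complaint. The element and attribute names (`record`, `type`, `id`, `search_key`) are placeholders invented for this example and are not Hypernomicon's actual schema; the sketch also assumes that IDs only need to be unique within a record type.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicates(xml_text):
    """Return (duplicate (type, id) pairs, duplicate search keys).

    The tag and attribute names used here are hypothetical stand-ins,
    not the real database format.
    """
    root = ET.fromstring(xml_text)
    id_counts = Counter()
    key_counts = Counter()

    for rec in root.iter("record"):
        id_counts[(rec.get("type", ""), rec.get("id", ""))] += 1
        key = (rec.findtext("search_key") or "").strip()
        if key:
            key_counts[key] += 1

    dup_ids = sorted(pair for pair, n in id_counts.items() if n > 1)
    dup_keys = sorted(key for key, n in key_counts.items() if n > 1)
    return dup_ids, dup_keys
```

A check like this only detects collisions. Resolving them (reassigning IDs, choosing which record keeps a search key) still needs the kind of user interaction described above.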
Using a git repo to generate HTML is an interesting idea. I will take a look at how this is done with CHIELD.
Your points about solo backups and multiple users are well taken. I suppose I didn't appreciate the sheer interdependency of the database (which makes perfect sense in hindsight, given the interconnections in the entries). Considering a solo user, when you say:
"the user would have to be careful not to commit anything less than all changes in the working tree at a time"
do you just mean to make sure to commit all the XML files at the same time? In the database I'm working with, all I have so far are XML files and a couple of JPGs. There's nothing in the other folders (Books, Misc, etc.).
I think you're right that it wouldn't be a good idea to include PDFs in a GitHub repository. But I think images would be OK. An HTML UI would preferably include (at least) pictures of authors alongside the works, arguments, positions, etc.
As shown in the "Managing PDFs, Other Files, and Folders" video, you can associate folders and any kind of file with database records. In this way, Hypernomicon is meant to handle most of your file/folder organizing for research purposes. This means that if you committed only the XML and not the other files to a git repo, the records and the files/folders could get out of sync. There's nothing wrong with using git to manage files like images, but there isn't much benefit to it either, since git treats them as "binary" files.
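The out-of-sync problem can be sketched as a simple check: given the relative file paths that records point to, list the ones that no longer exist under the database root. How those paths would be read out of the XML is not covered here; `find_missing_files` and its arguments are hypothetical names used only for illustration.

```python
from pathlib import Path


def find_missing_files(db_root, referenced_paths):
    """Return the referenced relative paths that are not files under db_root.

    referenced_paths is assumed to be an iterable of relative path
    strings taken from the records (a hypothetical input format).
    """
    db_root = Path(db_root)
    return sorted(p for p in referenced_paths if not (db_root / p).is_file())
```

If only the XML were restored from a repository, a check along these lines would report every attached file that did not come back with it.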
It's very possible that there are special cases where a git repo would be more beneficial. This might happen if someone were using Hypernomicon in a specialized way, for instance primarily to keep track of terminology or documentation (perhaps in a git-friendly format like Markdown, rather than something like Microsoft Word), and even positions/arguments, but not to manage works/PDFs. These applications would lend themselves particularly well to generating HTML. The Hypernomicon project actually evolved out of a project I originally created back in 2005 to keep track of philosophy terminology and generate HTML glossaries (back then I called it Hyperglossary).
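As a minimal sketch of the glossary-to-HTML idea, the following Python function renders a mapping of terms to definitions as an HTML definition list. It assumes the terms have already been extracted into a plain dictionary; the function name and input shape are invented for this example and say nothing about how Hyperglossary or Hypernomicon actually work.

```python
from html import escape


def render_glossary(terms):
    """Render a {term: definition} mapping as an HTML definition list."""
    lines = ["<dl>"]
    # Sort case-insensitively so the glossary reads alphabetically.
    for term in sorted(terms, key=str.lower):
        lines.append(f"  <dt>{escape(term)}</dt>")
        lines.append(f"  <dd>{escape(terms[term])}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

Escaping both the term and the definition keeps characters such as `&` and `<` in the source text from breaking the generated page.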