BibDesk Wiki

Bibliography manager for Mac OS X

Brought to you by: amaxwell, hofman, mmcc

BibDesk_2.0_Design_Document

About this Document : It is obsolete and for the curious only!
Overall Goals
Data Model
- CoreData Model
  - Ordered relations
- Consequences of Using Core Data
User Interface
System Architecture
Contributions to this Document

About this Document : It is obsolete and for the curious only!

This document is years old and is currently for historical purposes only. The ideas are still of interest, but they don't represent project plans.

This is the BibDesk team's current thinking on how to design the next iteration of BibDesk. If you have suggestions about the design, please use this link: discuss this document.

The results of the discussions will be incorporated into future versions of the document.

Overall Goals

BibDesk, up to and including 1.0, was mostly about preparing and organizing databases of papers and local copies of them. Because it was based on a limited data model, some organizing tasks were impossible, such as easily showing relationships between papers, and between papers and authors. Other very appealing features were made harder by the fact that the data format for saving databases was strictly BibTeX. Examples of these features are paper groups, and any kind of export that requires saving state on a per-document basis.

BibDesk 1 focuses on editing and adding to BibTeX databases, and on finding and using citations in LaTeX. It also makes linking files and keeping local copies organized considerably easier. For those tasks, BibDesk 1 is a very useful tool, but it handles some more frequent tasks poorly, for various reasons. It does no online searching at all, and its support for sharing references between people is underdeveloped.

I submit that for most researchers, more time is spent discovering, searching and sharing than organizing and using references. BibDesk currently does little to help the more frequent tasks.

The emphasis for BibDesk 2 should be to continue the strengths of BibDesk 1 while improving the more frequent tasks listed above. What follows in this section is a discussion of specific goals for the 2.0 redesign.

What will BibDesk 2.0 be?

BibDesk 2.0 will not be an editor for any particular file format that is familiar to bibliography software users. It will have its own file format and support importing of many formats, as well as a flexible export system, which will let you define separate export settings for each group. Exporting in this context means a lot more than just writing to a .bib or .ris file - it could mean adding to a website, uploading to CiteULike, sending an email, posting to a weblog, updating an RSS/Atom syndication file somewhere - anything you would want to do with one or more publications to share them.

Note: This means that BibTeX (or other) files you use for typesetting should be re-generated every time you make a change to the items they correspond to. Don't worry - we'll have BibDesk do that for you - just be aware that with BibDesk 2.0, .bib files are just an output format. They will no longer be the source, and if you want to change them, you should make your edits in BibDesk.

Support Specific Functionality Previous Versions Didn't Allow

Occasionally, we've run up against feature requests that we couldn't support cleanly using BibTeX files as our file format. This section discusses a few of those.

Some of these things could be made possible by keeping separate files for metadata that we can't store in BibTeX, or by abusing BibTeX and defining BibDesk-specific fields and entry types that are used for other data. However, those approaches are not very good solutions, because separate files add bad data synchronization problems, and abusing .bib files makes the files less usable and doesn't solve our problems very well anyway.

Grouping References

While it is possible to imagine a system which would convert a "Group" field into groups when reading a file, this is neither very clean nor necessarily fast, and there are other advantages to saving groups of references explicitly, such as adding annotations about the groups themselves.

Saved Searches

Groups defined by search criteria which update 'live' are a very nice feature, as iTunes has shown. However, there is simply no way to add this to BibTeX without abusing the format.

Linking References

We should be able to represent relationships between references such as "cites" and "comments on". Many such relationships are imaginable, and an ideal design would allow us to represent any kind of relationship, even if we do not necessarily always reveal those relationships in the UI.

Better Annotation Support

It should be possible to make annotations directly on PDFs and have them travel with the PDF, while still being indexed and presentable as part of a document's metadata (for example, accessible for HTML export). PDFKit seems to provide this ability.

Richer State Saving

Using our own data file format will allow us to save state on a per-document, per-group, and even per-item basis, which allows us to add a great deal more functionality in several areas:

- File Aliases - Mac OS aliases follow files around even if you move them. Text paths break and make you sad. Storing aliases in BibTeX files would be horrible - they are large binary blobs of data.
- Pictures - Pictures can be used in various contexts to help you recall a paper. Perhaps instead of a vague icon, you'd rather see the most important graph from that paper whenever we show it to you? Can't put that in BibTeX.
- Exporters - See [What_will_BibDesk_2.0_be?] for a discussion of exporters. Many interesting exporters will need to save information such as which items they've sent out (emails), who and where to send them to (emails, web, RSS, etc.), metadata you want to attach (your name, etc.), and other settings such as which HTML template to use. These require storage somewhere.

Add Support for New, Spiffy Functionality

There are lots of new features that a thoughtful rewrite should make much simpler. The current code has too many big objects and is hard to understand and change. This section discusses some directions I'd like to take the program in.

Better UI

- Source List - The main window should have an iTunes-style source list, with groups of references and saved searches (aka smart playlists).
- Subscriptions - BibDesk should evolve into a tool for aggregating many sources of potentially interesting references. How can you keep track of your field? Use BibDesk to put all the new literature in one place for your review.
- More powerful searches - BibDesk will support powerful searching throughout your database. We won't make assumptions about what kind of searching you'll need. Every search can be saved as a live group and revisited.
- More ways to view your collections - Sometimes a table isn't the best way. How else can we visualize references? This is sure to be a hot topic, so two links are in order: first, the architecture behind modular displays, and second, a wiki page to share ideas about visualizations.
- Workflows - BibDesk should have special modes that support common tasks, such as reviewing a group of papers, or generating a C.V. Those tasks should also be saveable.

Allow easier developer contributions

This is important, because many people have reimplemented parts of BibDesk, or asked how they can help without learning about the whole mess. The people who have helped have all mentioned that it can be difficult to understand what's going on, and I appreciate their fortitude. I do claim that some of that lies in the Cocoa framework, but much of it also lies in my abuse of same.

BD 2 needs:

- Clearer APIs
- Smaller objects
- Encapsulation
- Q: Plugins? More on this in [#About_Plugins].

Keep track of which papers we have used and when

One of the challenges is keeping track of which papers you cite a lot and when you've cited them. We could have smart collections per paper: for each paper you have written you would see a collection, and each record would get a 'cited by you' count, so that you can see your most cited papers.

I'm far from sure how to pick up which papers are being cited together. BibDesk needs to detect this itself rather than be told explicitly. Perhaps a (totally optional) wrapper script around bibtex, plus a web-service listener that other systems can hit to tell BibDesk that you've cited records together. --JamesHowison
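As a rough sketch of what such a wrapper script might collect: in a classic LaTeX/BibTeX workflow, each `\cite` leaves a `\citation{...}` line in the document's .aux file, so counting those lines gives both a 'cited by you' count and a record of which keys appear in the same paper. The function names and the sample cite keys below are made up for illustration, and this assumes plain BibTeX-style .aux files rather than any BibDesk feature.

```python
import re
from collections import Counter
from itertools import combinations

# Matches the \citation{...} lines that LaTeX writes to a .aux file.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")

def cited_keys(aux_text):
    """Return the set of cite keys mentioned in one document's .aux text."""
    keys = set()
    for match in CITATION_RE.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys

def citation_stats(aux_texts):
    """Count, across documents, how often each key and each pair of keys is cited."""
    cited_by_you = Counter()
    cited_together = Counter()
    for text in aux_texts:
        keys = sorted(cited_keys(text))
        cited_by_you.update(keys)                      # one count per document
        cited_together.update(combinations(keys, 2))   # unordered pairs, sorted
    return cited_by_you, cited_together

# Two hypothetical papers, given as the text of their .aux files.
paper_a = "\\relax\n\\citation{knuth84}\n\\citation{lamport94,knuth84}\n"
paper_b = "\\citation{knuth84}\n\\citation{patashnik88}\n"
counts, pairs = citation_stats([paper_a, paper_b])
```

Here `counts` plays the role of the 'cited by you' count and `pairs` records co-citation; how BibDesk would receive this data (script, listener, or something else) is left open, as in the note above.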

Continue support for BD 1.0 BibTeX editing features

There may be people who are not happy with the new file format, and would prefer to simply edit BibTeX. We should not require you to create a library and import a BibTeX file just to make a few changes, if you don't want to.

Data Model

Although we are not writing a BibTeX editor or a MODS editor, the data model behind BibDesk 2.0 should be rich enough to support all of the representative power of MODS. Because upgrading file formats is currently not well supported by CoreData, we should try to err on the side of expressiveness if possible, and choose what to expose through the UI carefully. Some notes on versioning Core Data models here.

Bruce D'Arcus writes about mapping MODS to a relational database (which is pretty similar to Core Data's requirements)

CoreData Model

Here's an incomplete stab at some of the necessary Entities and their data. It includes only Entities descending from Item; there are also entities descending from Group. Please discuss on [Talk:DataModel] and only edit this with changes to the actual outline.

Legend: '-' is an attribute, while '->(entity)' is a relationship. '->>' means to-many, so Person->>group means a person can be related to many groups. An entity name in brackets after an entity name denotes its superentity. For example, a Publication is a subentity of TaggedItem, so it has relationships to tags and notes.

TaggedItem:

There are notes in the [#System_Architecture] section that cover points about the data model. If you were following earlier attempts at a non-CoreData BibDesk file format, note that many of the classes I wrote and proposed will be replaced by entities which are implemented using (I expect) pretty small subclasses of NSManagedObject. An intro to Core Data is available from Apple; it is also in the Tiger developer documentation.
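To make the legend's notation concrete, here is a small sketch in Python rather than in a Core Data model file. Only the facts stated in the legend are taken from this document (Publication is a subentity of TaggedItem, TaggedItem has tags and notes, Person relates to many groups); every attribute name below, such as `title` and `name`, is an assumption for illustration and not the actual outline.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Tag:
    name: str                                        # '-' attribute

@dataclass(eq=False)
class Note:
    text: str                                        # '-' attribute

@dataclass(eq=False)
class Group:
    name: str                                        # '-' attribute

@dataclass(eq=False)
class TaggedItem:
    tags: list = field(default_factory=list)         # '->>' to-many Tag
    notes: list = field(default_factory=list)        # '->>' to-many Note

@dataclass(eq=False)
class Publication(TaggedItem):                       # Publication (TaggedItem)
    title: str = ""                                  # hypothetical attribute

@dataclass(eq=False)
class Person:
    name: str = ""                                   # hypothetical attribute
    groups: list = field(default_factory=list)       # Person->>group

# A publication inherits the tag and note relationships from its superentity.
paper = Publication(title="An Example Paper")
paper.tags.append(Tag("to-read"))
paper.notes.append(Note("Check section 3."))

reader = Person(name="A. Reader")
reader.groups.extend([Group("Lab"), Group("Reading circle")])
```

In Core Data itself these would be entities in the managed object model, with the inheritance expressed as a parent entity rather than a Python base class.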

Ordered relations

We will have to work around CoreData to let us represent an author list as a relation between entities, because it doesn't currently support ordered relationships.

- One option is to use a single string within the CoreData object model to represent a newline-separated list of authors. This string can be converted to an array with NSString's componentsSeparatedByString: method. As an example, I have a demo app that uses a single CoreData string to store an ordered list of strings. There would likely be performance issues if this were used for huge lists of data, but it should work fine for author lists. I am also sure there are other approaches, but this is pretty easy to implement. -- Fletcher

- That's a good suggestion for how to store the names, but other relevant author information such as homepage, picture, list of institutions worked at, connections with research groups, etc. requires a separate Author (more appropriately named "Person") entity within Core Data, and we're back to the problem of linking two entities in an ordered way. Storing the ordering in a string is certainly possible, and we could have the publication entity reconstitute an array by splitting the stored string and searching for a person entity to match each name string. However, this doesn't seem to be much easier than storing the array as archived data, and although I'm not totally sure, I think we avoid the searching if we use an array. Of course, it might turn out to be advantageous to use an array of fetch requests anyway. I really wish I had the time to be working on this now... -Mike

- Another approach, which is the one mostly recommended on the internet, is to add an index attribute to the relationship entity. Note that in our case this is indeed an attribute of the relationship entity and not of the author entity, as a person can be related to several publications. We just have to make sure that the indexes are updated when a relationship is added or removed or reordered. -Christiaan
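The index-attribute approach from the last item can be sketched as follows. This is plain Python standing in for managed objects, and the names `Authorship`, `insert_author` and `remove_author` are hypothetical. The point it illustrates is that the index lives on the relationship entity, so one person can sit at different positions in different publications, and that the indexes are renumbered on every insert or removal.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Person:
    name: str

@dataclass(eq=False)
class Authorship:
    """Relationship entity: one person's position in one publication's author list."""
    person: Person
    index: int

@dataclass(eq=False)
class Publication:
    title: str
    # Stored without a meaningful order, like a Core Data to-many relationship.
    authorships: list = field(default_factory=list)

    def _ordered(self):
        return sorted(self.authorships, key=lambda a: a.index)

    def authors(self):
        """Rebuild the ordered author list from the stored indexes."""
        return [a.person for a in self._ordered()]

    def _renumber(self, ordered):
        for i, authorship in enumerate(ordered):
            authorship.index = i

    def insert_author(self, person, position=None):
        ordered = self._ordered()
        if position is None:
            position = len(ordered)
        new = Authorship(person, position)
        ordered.insert(position, new)
        self.authorships.append(new)
        self._renumber(ordered)          # keep indexes dense after the insert

    def remove_author(self, person):
        ordered = [a for a in self._ordered() if a.person is not person]
        self.authorships = list(ordered)
        self._renumber(ordered)          # close the gap left by the removal

knuth, lamport, patashnik = Person("Knuth"), Person("Lamport"), Person("Patashnik")
first = Publication("First paper")
first.insert_author(knuth)
first.insert_author(lamport)
first.insert_author(patashnik, position=0)   # becomes the first author
second = Publication("Second paper")
second.insert_author(lamport)                # same person, different index elsewhere
first.remove_author(knuth)
```

After these calls the first paper's authors are Patashnik then Lamport, with indexes 0 and 1, while Lamport is at index 0 in the second paper.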

Consequences of Using Core Data

All entities are NSManagedObjects. This is an important point, since using NSManagedObjects for the .bdlibrary file format means that to support the old .bib editing features, we will have to keep BibItem et al. around as well as their CoreData entity replacements, and possibly provide conversions between them. We could also reimplement BibDocument using CoreData and simply not use a PersistentStoreCoordinator, instead handling file I/O ourselves, but this is a lot of work to stand still.

User Interface

Details upcoming. A major goal is for the code to support as many configurations as possible, without forcing the user to pick among a ton of options that differ from each other no more than various 'skins' do.

Task-based interface

The interface should be task-based - the BD 1.0 main window could be seen as a GUI designed to support a citation-making task, while the edit window mixes viewing, editing and annotating tasks. Other interesting tasks are reviewing, searching, and sharing.

We should present UI options for viewing, organizing and editing by the task they were designed to support. We should discuss ideas for these task-based views in [BibDesk2.0:Workflow_UI]. The Visualizations discussed earlier will be used as components of these views.

Source List

Have an outline like the one in NNW (NetNewsWire).

Inbox

Have a single place to look for new items.

Mockups

Organizing tasks: Traditional BibDesk View

Here is how I would implement the traditional view with the new architecture, with potential improvements:

Organizing tasks: Alternate View Suggested by James Clause

Organizing tasks: Alternate fancy visual view

This is a concept-car style example, not necessarily what I think would be best, but intended to show that we can do more complex things: This displays papers with geographic info, author names, coalescing multiple papers into one stack, and showing links between the papers.

You could imagine 'exploding' the stacks and dragging between the papers to represent relationships. (Credit is due to Robb Beal and UserCreations.com for Spring, from which I stole that idea.)

Importing Multiple Items workflow: Adding text view

This explains how we could incorporate the existing 'add items from text or web' sheet as an itemdisplaycontroller and view. It also points out that reuse of display controllers is possible (and important!).

System Architecture

Anything not explicitly called a controller or otherwise noted should be read as a subclass of NSManagedObject, including Filters, Sources, Exporters, etc.

Diagram

diagram forthcoming :)

Document class

Subclass of NSPersistentDocument (the Core Data-aware NSDocument subclass). May not need much code given the data model.

DocumentController

A controller layer object that is responsible for maintaining the objects handling the display of data items. This may actually end up with little code as well, as much code previously in BibDocument will be factored out into item display controllers and filters.

This class will handle swapping display controllers in and out of view and maintaining the source list and its current selection.

Here is a diagram that illustrates how the current selection is chained through bindings between the various controller objects:

Filters

Filters are pretty simple using CoreData. We can use NSPredicate to perform the searching. That gives us pretty functional boolean searching, e.g. (firstName contains "a") OR (lastName contains "a").
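The boolean structure of that example predicate can be sketched outside Cocoa as composable functions. This is an analogy in Python, not NSPredicate itself; the helper names `contains`, `any_of` and `all_of` and the sample people are invented for the illustration. The match is case-sensitive, which is how a plain `contains` behaves in NSPredicate format strings.

```python
def contains(key, text):
    """Predicate: the value stored under `key` contains `text` (case-sensitive)."""
    return lambda item: text in str(item.get(key, ""))

def any_of(*predicates):
    """Compound OR predicate."""
    return lambda item: any(p(item) for p in predicates)

def all_of(*predicates):
    """Compound AND predicate."""
    return lambda item: all(p(item) for p in predicates)

# (firstName contains "a") OR (lastName contains "a")
name_filter = any_of(contains("firstName", "a"), contains("lastName", "a"))

people = [
    {"firstName": "Ada", "lastName": "Lovelace"},
    {"firstName": "Kurt", "lastName": "Godel"},
    {"firstName": "Grace", "lastName": "Hopper"},
    {"firstName": "Emmy", "lastName": "Noether"},
]
matches = [p["firstName"] for p in people if name_filter(p)]
```

With this sample data `matches` is `["Ada", "Grace"]`. A saved filter would store the predicate description rather than a function, which is what the predicate attribute on smart groups is for.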

Filter Controllers

Each kind of filter will require its own controller UI, which will be the hard part of writing a controller. The controller should

Item Display Controllers

The following is an edited version of an email to the developer list. It may still have some parts which are a little out of date w.r.t. the CoreData architecture.

So we have the DocumentController (call it the LibraryController for now) take care of the source list (and perhaps the search field) and other objects take care of displaying and editing the items.

We'll call the other objects Item Display Controllers.

So the LibraryController is managing what is in the current selection, and managing which ItemDisplayControllers are displaying the current selection, but it doesn't have to do any of the display work.

Likewise, the ItemDisplayControllers are super-simple. They just need to bind to the selectedItems keys from their item source, observing changes to those items, and do whatever they want to do to display the items. They have their own nibs, and they just have to provide an NSView for the LibraryController to swap into the right-hand-side of the window.

Note that the itemsource protocol that supplies the BDSKItemDisplayControllers gives them all the items as well as the selected ones, so if you want a displaycontroller that just highlights search matches instead of trimming the display down, or whatever, that's OK.

Also, if we decide we want to do this, there's no reason why the BDSKItemDisplayControllers have to be in the same window. We could have one set displaying one collection in the main window and have another set of itemdisplaycontrollers displaying another collection in another window, just like playlists in iTunes. This would require writing another simpler windowcontroller, though - since bdsklibrarycontroller only has one current collection and one selectedItems array. We could have that object keep track, but it's simpler to delegate it to another windowcontroller.

The BDSKItemDisplayControllers can edit items too if they want - no reason why not.

The BDSKItemDisplayControllers get their items to display from a generic item source, which doesn't have to be a BDSKLibraryController. You could have an itemdisplaycontroller contain others, so for instance the current tableview-above-textview layout could be implemented with two itemdisplaycontrollers, the one that displays the table being the itemsource for the other.
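The chaining described above, where a table display is itself the item source for a text display, might look roughly like this. It is a sketch with plain callbacks standing in for Cocoa bindings, and the class and method names (`ItemSource`, `TableDisplay`, `TextDisplay`, `set_selection`) are hypothetical, not the BDSK classes.

```python
class ItemSource:
    """Hypothetical item source: exposes all items plus the current selection."""

    def __init__(self, items=()):
        self.items = list(items)
        self.selected_items = []
        self._observers = []

    def add_observer(self, callback):
        self._observers.append(callback)

    def set_selection(self, selected):
        # Keep only items this source actually holds, then tell observers.
        self.selected_items = [i for i in selected if i in self.items]
        for callback in self._observers:
            callback(self)

class TableDisplay(ItemSource):
    """Lists the upstream selection as rows; its row selection feeds downstream."""

    def __init__(self, source):
        super().__init__(source.selected_items)
        source.add_observer(self._source_changed)

    def _source_changed(self, source):
        self.items = list(source.selected_items)
        self.set_selection([])      # an upstream change clears the row selection

class TextDisplay:
    """Renders whatever its item source currently has selected."""

    def __init__(self, source):
        self.text = ""
        source.add_observer(self._source_changed)
        self._source_changed(source)

    def _source_changed(self, source):
        self.text = "\n".join(source.selected_items)

library = ItemSource(["Knuth 1984", "Lamport 1994", "Patashnik 1988"])
table = TableDisplay(library)        # displays the library's selection
preview = TextDisplay(table)         # displays the table's selection
library.set_selection(["Knuth 1984", "Lamport 1994"])
table.set_selection(["Lamport 1994"])
```

Neither display knows about the library controller as such; each only sees something that exposes `items` and `selected_items`, which is what makes reuse in other windows or nested layouts possible.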

There's currently no support in bdsklibrarycontroller for changing the current display controller, and as a part of that there needs to be code to keep separate lists of display controllers for the different classes - bibnote, bibauthor, bibitem, bdskexternalsource.

Groups

These are really simple entities, and may not require much more code than NSManagedObject provides. They have relationships to items (their contents), a name, description, and exporters.

Simple groups can be hierarchical. A parent will automatically contain all the items of its children.
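A minimal sketch of that containment rule, assuming each group stores only the items added to it directly and computes the rest from its children (the `own_items` name and the sample groups are made up):

```python
class Group:
    """Hypothetical simple group with optional child groups."""

    def __init__(self, name, parent=None):
        self.name = name
        self.own_items = set()       # items added directly to this group
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def items(self):
        """All items in this group, including those of every descendant."""
        result = set(self.own_items)
        for child in self.children:
            result |= child.items()
        return result

thesis = Group("Thesis")
chapter1 = Group("Chapter 1", parent=thesis)
chapter2 = Group("Chapter 2", parent=thesis)
background = Group("Background", parent=chapter1)

thesis.own_items.add("overview2003")
chapter1.own_items.add("knuth84")
chapter2.own_items.add("lamport94")
background.own_items.add("patashnik88")
```

Computing the union on demand means adding an item to a child needs no bookkeeping in the parent; a Core Data version might instead maintain the relationship eagerly so that bindings can observe it.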

Smart Groups

The smart group entity has a predicate data attribute, which is persistent. It also has a fetch request, which is a transient attribute calculated from the predicate. The collection of items is contained in an ivar and is automatically updated.

Some other ways of implementing smart groups are discussed on [Talk:SmartGroups].

A smart group can also automatically generate children based on the values of a category property, to further filter and categorize the items. This emulates and extends the behavior of groups in BibDesk 1.
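The smart group behaviour described in this section can be sketched as a stored predicate, a cached list of matches, and children generated from the values of one category property. This is a Python stand-in: the `refresh` call is explicit here, whereas the design calls for automatic updates, and the names and sample library are assumptions.

```python
from collections import defaultdict

class SmartGroup:
    """Hypothetical smart group: a stored predicate plus a cached result list."""

    def __init__(self, name, predicate, category_key=None):
        self.name = name
        self.predicate = predicate          # the persistent part in the design
        self.category_key = category_key    # optional property used to make children
        self.items = []                     # cached matches, rebuilt by refresh()
        self.children = []

    def refresh(self, library):
        """Re-run the predicate and regenerate the per-category children."""
        self.items = [item for item in library if self.predicate(item)]
        self.children = []
        if self.category_key is None:
            return
        buckets = defaultdict(list)
        for item in self.items:
            buckets[item.get(self.category_key, "")].append(item)
        for value in sorted(buckets):
            child = SmartGroup(f"{self.name} / {value}", self._child_predicate(value))
            child.items = buckets[value]
            self.children.append(child)

    def _child_predicate(self, value):
        # A child matches the parent's predicate and one category value.
        return lambda item: (self.predicate(item)
                             and item.get(self.category_key, "") == value)

library = [
    {"title": "Paper A", "year": 2004, "journal": "Nature"},
    {"title": "Paper B", "year": 2005, "journal": "Science"},
    {"title": "Paper C", "year": 2005, "journal": "Nature"},
    {"title": "Paper D", "year": 1999, "journal": "Nature"},
]
recent = SmartGroup("Recent", lambda item: item["year"] >= 2004, category_key="journal")
recent.refresh(library)
```

After the refresh, `recent` holds papers A, B and C and has two children, one per journal; adding an item to the library and refreshing again regenerates both the matches and the children.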

Sources

Objects which update (possibly creating new pubs) from some external source in response to a timer or a manual command.

Source Controllers

Provide UI to configure Sources.

Exporters

Objects linked to groups which bind to the groups' contents and perform some export action when the contents change.
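As an illustration of an exporter that reacts to changes in a group, here is a sketch in which a group notifies its exporters and a BibTeX exporter regenerates its output each time. Plain method calls stand in for bindings, the output is kept in a string rather than written to a .bib file, and the class names, the `group_changed` hook and the simplistic entry formatting (no escaping) are all assumptions.

```python
def format_entry(item):
    """Render one item as a BibTeX entry (no escaping; illustration only)."""
    fields = ",\n".join(f"  {name} = {{{value}}}" for name, value in item["fields"].items())
    return f"@{item['type']}{{{item['key']},\n{fields}\n}}"

class BibTeXExporter:
    """Hypothetical exporter: regenerates its whole output when the group changes."""

    def __init__(self):
        self.output = ""

    def group_changed(self, group):
        self.output = "\n\n".join(format_entry(item) for item in group.items)

class Group:
    """Hypothetical group that tells its exporters whenever its contents change."""

    def __init__(self, name):
        self.name = name
        self.items = []
        self.exporters = []

    def _changed(self):
        for exporter in self.exporters:
            exporter.group_changed(self)

    def add_item(self, item):
        self.items.append(item)
        self._changed()

    def remove_item(self, item):
        self.items.remove(item)
        self._changed()

knuth = {"type": "article", "key": "knuth84",
         "fields": {"author": "Donald E. Knuth", "title": "Literate Programming"}}
lamport = {"type": "book", "key": "lamport94",
           "fields": {"author": "Leslie Lamport", "title": "LaTeX"}}

thesis = Group("Thesis")
exporter = BibTeXExporter()
thesis.exporters.append(exporter)
thesis.add_item(knuth)
thesis.add_item(lamport)
```

This matches the earlier point that .bib files become a regenerated output: nothing edits `exporter.output` directly, it is rebuilt from the group's contents on every change.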

Exporter Controllers

Provide UI to configure Exporters.

Libraries for File Import and Export

The bibutils project has produced a library that converts between MODS and many other formats. There is currently not enough documentation to tell how easy it would be to use this library within BibDesk. (As of April 2005)

About Plugins

Many people have asked about a plugin API for various parts of BibDesk. What follows is an edited excerpt from an email I sent to the developer list, which explains my misgivings about these requests in the context of an Open-Source project like BibDesk:

I am against building a lot of plugin architecture for BibDesk. I think that in general, people who are asking for plugin support really just want easier interfaces to internal classes, and wouldn't really care about dynamic loading if they had the better interfaces.

1- I don't see how it makes adding a new part to BibDesk any easier. You'll still have to understand the data model, and potentially many data classes, as well as at least a part of the overall architecture.

Making a serious plugin API could be a lot of work, and I'm not convinced it's worth it.

2- If we have plugins, we'll have other people distributing things and a version nightmare.

So does having exporters (and displayers/importers/searchers, etc) as bundles really make it easier for users to develop them? I don't think so.

The only two reasons I can think of to use dynamically loadable plugins are if you don't want to give out the source, so you just publish a plugin API (see browsers, voodoopad, etc.), or if you can avoid loading huge chunks of code by keeping them in bundles and only loading them on demand. I don't think either is the case for us.

My current question to someone who wants a plugin interface is: why not just contribute to the project itself? A definite goal is to make this easier in BibDesk 2.0, and hopefully this will become a moot point.

I remain open to having my mind changed, however.

Contributions to this Document

Revision 1: Michael McCracken

Wiki: BD2-open-questions
Wiki: Developer_Information