Road map

Current development release

0.6.0

  • Profile integration, so that users can download a complete configuration and select the profile URIs they want to use, without having to munge the configuration file or keep a list of the providers, queries, etc. that they want to avoid or utilise
    • Refactored the build process for QUEBEC/MQUTER so that it doesn't rely on replace strings, and instead sets a profile option to intelligently select providers without having to change the base global configuration
  • Configuration from one or more RDF sources, each either a URL or a file, with a configurable MIME type, so that N3, RDF/XML and other formats can be used, subject to their availability in the Sesame parser set
    • Modularised the configurations so the fact that there are a huge number of providers and namespace entries won't affect the ability to easily edit the query or profile configurations
    • Wrote an example local configuration file with a profile, provider and query, for someone wanting to override a default provider with one of their locally hosted versions
    • Added optional config API versioning so that current clients may be able to continue getting compatible configuration files in the future. The 0.6.0 configuration is declared as version 1; future versions will be integers greater than this and will be backwards-compatible as far as possible.
  • Added support for keeping track of long-term statistics using either a POST URL or a SPARQL INSERT query
  • Expose the internally constructed schemas on a web page: all of the model schemas, i.e. /ns/profile:*, /ns/provider:*, etc., resolve to a complete internal list
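
The profile-based provider selection described in the 0.6.0 notes above could look roughly like the following Python sketch. The Profile class, its include and exclude fields, the precedence rule and the provider names are all assumptions made for illustration; none of them is taken from the Bio2RDF configuration format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: names the providers to include or exclude."""
    uri: str
    include: frozenset = field(default_factory=frozenset)
    exclude: frozenset = field(default_factory=frozenset)


def select_providers(all_providers, profiles):
    """Return the providers that remain after applying the profiles in order.

    Assumed rule: the first profile that mentions a provider decides its
    fate; providers that no profile mentions are used by default.
    """
    selected = []
    for provider in all_providers:
        use_provider = True  # assumed default: keep the base configuration
        for profile in profiles:
            if provider in profile.include:
                use_provider = True
                break
            if provider in profile.exclude:
                use_provider = False
                break
        if use_provider:
            selected.append(provider)
    return selected
```

With this shape the base global configuration is never edited: a local deployment only supplies an extra profile that excludes the default provider it wants to replace.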

Current stable release

0.5.1

  • Created script to enable bulk importing of namespaces from the database.ns dumps in http://quebec.bio2rdf.org/download/n3/
    • This bulked up the number of known namespaces to over 1500, mostly due to the known namespaces in KEGG and OBO being added
  • Fix NPE bug caused by the Sogou bot's Accept: string in the Pubby content negotiation code
  • Add OBO-DBPEDIA data provider; see http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/DBpedia_to_OBO_mapping and http://www.mail-archive.com/public-semweb-lifesci@w3.org/msg05030.html
    • Includes a number of obo->bio2rdf normalisation rules
  • Unrecognised namespaces return HTTP 404 responses, while unrecognised queries return HTTP 400 responses
    • Empty results could be changed in the future to return a different response code, but there doesn't seem to be a suitable replacement for 200. The obvious candidate, 204, is not allowed to contain a message body, so an empty RDF document couldn't be sent with it.
  • Fix rule ordering so that it can be lowest-to-highest for input and highest-to-lowest for output
  • Add URLRewrite for /pmid: to /pubmed: (see /urirank/pmid:15383840), and put the pubmed2pmid normalisation rule on the appropriate endpoints
  • http://qut.bio2rdf.org/evidenceviewer:25833 is rewritten to http://qut.bio2rdf.org/evidence_viewer:25833 to standardise the namespace prefix
  • Add pubmed asn1 and medline providers
  • Add SIDER http://www4.wiwiss.fu-berlin.de/sider/ as a provider, including normalising its links to other drug-related endpoints. Credit to Anja Jentzsch for this addition
    • Using sider_drugs and sider_sideeffects as namespaces
  • Add OBO to DBpedia links using hcls.deri.org as a provider courtesy of Matthias Samwald
  • Add TCM using a downloaded copy of the dataset provided by Jun Zhao
    • Using tcm_ontology, tcm_gene, tcm_disease, tcm_medicine, tcm_ingredient, tcm_effect and tcm_interlink for the namespaces
  • Reverse construct symbol namespace from drugbank_enzymes, http://qut.bio2rdf.org/drugbank_enzymes:1
  • Make reverse construct on sparql.neurocommons.org/sparql so that pubmed records can be accessed backwards from the mesh terms, e.g. http://qut.bio2rdf.org/mesh:D051379
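
The response codes described in the 0.5.1 notes above can be summarised in a small Python sketch. The function name and its two boolean parameters are hypothetical, and checking the query before the namespace is an assumption; the notes only state which code goes with which failure.

```python
from http import HTTPStatus


def response_status(query_known: bool, namespace_known: bool) -> HTTPStatus:
    """Pick the HTTP status for a request, following the 0.5.1 rules."""
    if not query_known:
        # Unrecognised queries return HTTP 400.
        return HTTPStatus.BAD_REQUEST
    if not namespace_known:
        # Unrecognised namespaces return HTTP 404.
        return HTTPStatus.NOT_FOUND
    # Everything else, including an empty RDF document, stays at 200,
    # because a 204 response may not carry a message body.
    return HTTPStatus.OK
```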

Past releases

0.5.0

  • Unrecognised namespaces and unrecognised queries are now signalled in RDF with an rdf:type of both http://bio2rdf.org/ns/bio2rdf#Error and a more specific error that depends on what was not recognised, i.e. either the URI syntax, or just the namespace, which was not matched to any providers for any of the queries that matched the syntax
  • Add /pageoffset(\d+)/${queryString} to enable paging through the results based on the SPARQL_RESULTS_LIMIT defined in the configuration
    • Page 1 sets the SPARQL OFFSET to 0 and the LIMIT to SPARQL_RESULTS_LIMIT, page 2 sets the OFFSET to SPARQL_RESULTS_LIMIT, and so on. Really large page numbers will consistently fail and default to page 1, except for those between the highest integer value divided by SPARQL_RESULTS_LIMIT and the highest integer value, which will fail with 400 errors on the SPARQL endpoint and return no information. These cases should be rare, as low page numbers should be more than enough for the most common queries.
    • Paging doesn't give an exact number of results per page: it is performed at each endpoint, before recombination at the Bio2RDF server, where the sets are combined and duplicates removed before a new RDF document is returned to the user. It will, however, reliably page through each endpoint. If only one SPARQL endpoint provider is defined for a particular query/namespace combination, each page will return a similar number of results until the end of the real set. The approximate total to expect can be found using /counttriples/ and /countlinks/.
  • Add /multiplelabel/namespace1:identifier1/namespace2:identifier2/etc.... which performs /label/namespace:identifier operations on each of the items in the list
  • Add /multiplelinkstonamespace/namespace/namespace1:identifier1/namespace2:identifier2/etc..., which uses /linkstonamespace/namespace/* on each item
  • Add /multiplelinksns/namespace/namespace1:identifier1/namespace2:identifier2/etc... to use /linksns/namespace/* on each item
    • Note: Even though the multiple services are low latency, they still generate a large number of requests, so make sure that you use them wisely.
  • Integrated http://rdf.myexperiment.org/sparql as a source, with namespaces like myexp_TYPE for each of the types they have internally and myexp_ontology_TYPE for each of the ontologies
    • myexp_ontology_base,myexp_ontology_contributions,myexp_ontology_snarm,myexp_ontology_annotations,myexp_ontology_experiments,myexp_ontology_components,myexp_ontology_attribcredit,myexp_ontology_packs,myexp_ontology_viewdown,myexp_ontology_specific
    • myexp_downloads,myexp_workflow,myexp_user,myexp_pack,myexp_localpackentry,myexp_viewings,myexp_tagging,myexp_favourite,myexp_policy,myexp_file,myexp_creditation,myexp_group,myexp_membership,myexp_comment,myexp_groupannouncement,myexp_workflowversion,myexp_workflowversioncomponent
  • Add /queryplan/${queryString} functionality to describe, in RDF, which queries would be made for each query string that is submitted, without actually submitting the queries
    • In future add a feature so that users can take the /queryplan/ and execute it locally to reduce the load on the live servers
    • /queryplan/pageoffsetNN/${queryString} also works, as does /n3/queryplan/pageoffsetNN/${queryString}
  • Handle content-negotiation between different rdf file formats
    • Using the 2 Pubby classes [1] (Pubby released under BSD license)
    • Currently support application/rdf+xml and text/rdf+n3 which should cover the vast majority of users
    • /n3/namespace:identifier now works, along with content negotiation.
    • The configuration responds to content negotiation now so http://qut.bio2rdf.org/admin/configuration will follow the Accept: header directives if they contain a supported RDF format, otherwise it will return HTML
  • Updated urlrewrite library to version 3.2
  • Add more pipes to support different types of queries and provide examples for chaining calls to the Bio2RDF resolver using DERI pipes
  • Add DOI resolver using http://bioguid.info/
  • Add reverse construct for DOI that utilises the dc:identifier predicate when the object is a string of the form "namespace:identifier"
    • Initially created to support reverse construct of http://bio2rdf.org/doi: using the uniprot citations database
  • Add geospecies through http://species.geospecies.org/order_concept_uuid/25abd6a5-9acd-41ae-8a2e-3aae7c7b5d58/ and similar URLs. The namespaces are listed below
    • geospecies_kingdoms,geospecies_phyla,geospecies_order,geospecies_orders,geospecies_family,geospecies_families
    • geospecies_bioclass,geospecies_observation,geospecies_location,geospecies_spec,geospecies_specimen,geospecies_observations
  • As part of the geospecies integration, a fix was needed to enable endpointSpecific template elements to be used on provider endpoint URLs
  • Add http://purl.uniprot.org/core/title as a /label/ predicate
  • Add the date and time of the last server reboot to /admin/stats and /error/blacklist, based on the time the BlacklistController static class is initialised
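
The /pageoffset(\d+)/ arithmetic from the 0.5.0 notes above can be sketched in Python as follows. The function name, the treatment of page 0 as page 1 and the limit of 2000 used in the usage note are assumptions. Python integers do not overflow, so the failure modes described for really large page numbers are not reproduced here.

```python
import re

# Same shape as the /pageoffset(\d+)/${queryString} rule in the notes.
PAGE_PATTERN = re.compile(r"^/pageoffset(\d+)/(.+)$")


def paging_for(path, results_limit):
    """Return (offset, limit, query_string) for a request path."""
    match = PAGE_PATTERN.match(path)
    if match is None:
        # No paging prefix: behave like page 1.
        return 0, results_limit, path.lstrip("/")
    page = max(int(match.group(1)), 1)  # assumption: page 0 means page 1
    # Page 1 -> OFFSET 0, page 2 -> OFFSET results_limit, and so on.
    return (page - 1) * results_limit, results_limit, match.group(2)
```

For example, with a hypothetical SPARQL_RESULTS_LIMIT of 2000, paging_for("/pageoffset3/namespace:identifier", 2000) gives an OFFSET of 4000 and a LIMIT of 2000 for the query string namespace:identifier.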

0.4.1

  • Add recent rdfisers to the source distribution
  • Add permanent blocking feature to target robots that don't follow the Crawl-delay instruction in robots.txt and perform more than 200 queries in any 12-minute blocking period. (These numbers are configurable, and the blocking can be turned off completely in a private configuration)
  • Add some common mistakes into urlrewrite.xml so that atlas2rdf.jsp never sees the incorrect information and it is put to lowercase using the *FROMLOWERCASE providers
  • Add /urirank/namespace:identifier which combines /countlinks/namespace:identifier and /counttriples/namespace:identifier using a pipe (bio2rdf_uri_rank) which also filters out the zero counts so only the real information comes back.
    • Note: this can result in empty RDF documents if either countlinks or counttriples times out or returns no non-zero counts.
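
The permanent blocking rule from the 0.4.1 notes above could be approximated as in the Python sketch below. The class and method names are hypothetical, and dividing time into consecutive fixed 12-minute periods is an assumption about how "any 12 minute blocking period" is measured. The thresholds are constructor arguments because the notes say the numbers are configurable.

```python
class RobotBlocker:
    """Block a client permanently once it exceeds the query limit in a period."""

    def __init__(self, max_queries=200, period_seconds=12 * 60):
        self.max_queries = max_queries
        self.period_seconds = period_seconds
        self._current = {}     # client -> (period index, queries in that period)
        self._blocked = set()  # clients that are blocked permanently

    def allow(self, client, now):
        """Record one query at time `now` (in seconds); return True to serve it."""
        if client in self._blocked:
            return False
        period = int(now // self.period_seconds)
        last_period, count = self._current.get(client, (period, 0))
        if last_period != period:
            count = 0  # a new period starts a fresh count
        count += 1
        self._current[client] = (period, count)
        if count > self.max_queries:
            self._blocked.add(client)
            return False
        return True
```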

0.4.0

  • The current Bio2RDF databases are very inconsistent in their use of lowercase and uppercase for private identifiers. Ideally the identifiers would be the ones which were used in the database, but an experiment that converted them all to lowercase still exists for some reason, so there is a new set of queries aiming to produce information despite the inconsistent URIs in the databases. This will inevitably introduce new issues of its own, but for the moment there are some databases which require this hack in order to interoperate at all.
  • Support arbitrary namespace positions, and keep track of which match groups correspond to public identifiers and which are private and hence shouldn't be modified, except in extreme situations like the odd upper/lowercase cases
    • Implement the internal support for the arbitrary namespace matching. Need to modify the matches for namespace type methods.
  • Add timing measurements for performance diagnosis, plus server-lifetime statistics for HTTP response error codes sorted by server and for the overall requests made to particular endpoints. These complement the narrow window for complex problem analysis that was implemented for 0.3
  • Add compulsory user-agent header to any requests by the server. The structure of the user-agent is "Mozilla/5.0 (compatible; Bio2RDF/0.4.0 +http://bio2rdf.wiki.sourceforge.net/RobotHelp)"
  • Add NTriples encoded replacements for query templates. Available using ${ntriplesEncoded_normalisedStandardUri} and similar for inputs and query URIs.
  • Add additional html providers and some new query types such as /suppliers/ for restriction enzymes using the rebase database
  • Fix definitions so that /query/NAMESPACEPREFIX: and /NAMESPACEPREFIX: are no longer recognised; there must be at least one character in the private identifier section before anything happens. Someone was trying to access them, and they do actually incorrectly appear in some rdfised Bio2RDF databases, which prompted the change. The change was from (.*) to (.+) in the relevant regular expressions.
  • Support unrdfised namespaces with a reverse construct that acts in a similar way to links, but is applicable to the namespace instead of globally
  • Add /multiple/ service which fetches multiple resources at one time without the need for multiple requests to the server. It only performs simple constructs, so custom services cannot be performed using this method. Syntax is /multiple/namespace1:identifier1/namespace2:identifier2/namespace3:identifier3/etc...
  • Add /linkstonamespace/linkednamespace/namespace:identifier to pick out the RDF statements where the URI object of the statement belongs to the "linkednamespace" and the subject of the statement is http://bio2rdf.org/namespace:identifier. Example is below:
  • Add pipe to automagically compute the linkrank for a URI, i.e. the number of triples and links around an item. Two of the most interlinked namespace:identifier combinations can be seen at the following links [1] [2], but it is also useful for other types of studies like pubmed interlinks [3]
  • Make the configuration page use URIs for namespaces. The syntax for the namespaces is http://bio2rdf.org/ns:NAMESPACEPREFIX . Both of the following URLs point to the same place.
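
The change from (.*) to (.+) mentioned in the 0.4.0 notes above can be illustrated with a short Python sketch. Only the final capture group comes from the notes; the rest of each pattern, including the character class used for the namespace prefix, is an assumption made for the example.

```python
import re

# Before the fix: the private identifier group may be empty.
OLD_PATTERN = re.compile(r"^/([a-z0-9_]+):(.*)$")
# After the fix: at least one character is required after the colon.
NEW_PATTERN = re.compile(r"^/([a-z0-9_]+):(.+)$")


def is_recognised(pattern, path):
    """Return True if the path matches the namespace:identifier pattern."""
    return pattern.match(path) is not None
```

Under these patterns a bare prefix such as /geneid: matches OLD_PATTERN but not NEW_PATTERN, while /geneid:4129 matches both.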

0.3.1

  • RDF based configuration output
  • Integrate Semantic Web Pipes (ie, http://pipes.deri.org) to provide for complex transformations, and add the pipes to the configuration format so they can be called in response to related query aliases

0.3 beta

New feature list:

  • Avoid text-based output to RDF/XML files: all files are parsed after the simple regex rules have been applied, so that the results can be further modified based on RDF triples, including outputting to different formats

0.2.2

Features:

  • Distributed access to the different bio2rdf sparql mirror endpoints, including partial endpoints which each contain one or more databases
  • Normalised access to the different bio2rdf mirrors which can contain non-standard URIs internally
  • Parallel fetching on different files
  • Error detection and configurable handling of the process for ignoring specific endpoints for desired amounts of time
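
Two of the 0.2.2 features above, parallel fetching and ignoring failed endpoints for a set time, can be combined in a rough Python sketch. The EndpointPool class, the fetch callable and the default ignore time of ten minutes are hypothetical; the actual implementation is Java-based and is not shown in these notes.

```python
from concurrent.futures import ThreadPoolExecutor


class EndpointPool:
    """Fetch from several endpoints at once, skipping recently failed ones."""

    def __init__(self, endpoints, ignore_seconds=600.0):
        self.endpoints = list(endpoints)
        self.ignore_seconds = ignore_seconds  # configurable ignore time
        self._ignored_until = {}              # endpoint -> time it is usable again

    def usable(self, now):
        """Return the endpoints that are not being ignored at time `now`."""
        return [e for e in self.endpoints
                if self._ignored_until.get(e, 0.0) <= now]

    def fetch_all(self, fetch, now):
        """Call fetch(endpoint) in parallel and return {endpoint: result}."""
        targets = self.usable(now)
        results = {}
        if not targets:
            return results
        with ThreadPoolExecutor(max_workers=len(targets)) as pool:
            futures = {e: pool.submit(fetch, e) for e in targets}
        for endpoint, future in futures.items():
            try:
                results[endpoint] = future.result()
            except Exception:
                # Error detected: ignore this endpoint for the configured time.
                self._ignored_until[endpoint] = now + self.ignore_seconds
        return results
```

Passing the fetch function and the current time in as arguments keeps the sketch independent of any particular HTTP client or clock.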

0.1

Features:

  • Sesame based rdfizers with native Sesame RDF repositories