Issue/Versioning

email: Proposed approach to versioning and shared names

I have been thinking about how to handle the versioning issue in the context of shared names.

Assumptions:

  • Some databases explicitly version at the record level and have distinct identifiers different version so the same record. An example is Refseq.
  • Some databases version at the "build" level - i.e. the whole database gets a new version at once, however new accessions/identifiers are not created.
  • Most database do not explicitly version. Instead we always have the "latest" version available.
  • Previous attempts to manage versioning in similar efforts have failed because they assume too much about what the provider will do.

Therefore, I propose the following:

  • In all cases, there will be a single "about" record without commitment to encoding or version.
  • In those cases where a provider explicitly names versioned records, a uri will be allocated using the provider's identifiers, and the about record will link to it.
  • In those cases where the the database as a whole is versioned, and where the provider or a third party provides access to prior versions, the committee will determine the form of the URI for the prior versions, for example by appending the database version to the identifier. The URI will be linked to in the about record.
  • In the case of databases that do not explicitly version, third parties who archive versions and wish to make them available via shared name will be handled in the same way we handle other third party metadata, i.e. review followed by inclusion in the about record. The URI form will concatenate the identifier and the date on which the record was archived.
  • Metadata provided in the about record will record information known about the version, e.g. date, build, and links to any provider information about versioning policy, so those clients who have a particular interest in versions are able to easily determine which versions are present and whatever is known about them.

A policy along these lines give use the flexibility to provide useful naming for versions, while not committing more than we can to what, precisely, one can expect vis a vis persistence or policy.