Work packages/Tracking ids

The task is to track the full set of ids for a databank. In some cases, such as with the large providers, this set can change daily. In other cases, such as with post-doc databases, it may be static. Several approaches have worked in the past.

  • The direct method: The provider has a downloadable version of the database that can be parsed for identifiers.
  • When such a download is not available, or is not kept up to date, we can sometimes spider the web site. Two approaches:
    • There are category or index pages that can serve as the spidering root
    • There is a search box, and by using generic searches ("e", "1*", "2*", etc.) we can cover the entries
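The search-box approach can be sketched roughly as follows. This is only an illustration: the `search` callable is a hypothetical stand-in for whatever code queries a given provider's site and extracts ids from the results, and the exact query list (here "e" plus one wildcard per digit) is an assumption that would need tuning per databank.

```python
import string
from typing import Callable, Iterable, Iterator, Set


def generic_queries() -> Iterator[str]:
    """Yield broad queries intended to cover most entries (assumed list)."""
    yield "e"
    for digit in string.digits:
        yield f"{digit}*"


def collect_ids(search: Callable[[str], Iterable[str]],
                queries: Iterable[str]) -> Set[str]:
    """Run every query and return the union of the ids found.

    `search` is hypothetical: it takes a query string and returns the
    ids listed in that query's results.
    """
    seen: Set[str] = set()
    for query in queries:
        seen.update(search(query))
    return seen
```

Because results from different queries overlap, the ids are collected into a set, so duplicates cost nothing.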

We can also consider approaching the databank provider to see whether there are ways of getting the active ids that we are unaware of, or whether they might be willing to provide an easy way of doing so.

Whichever method is used, one needs software that checks the ids on a regular basis and issues events when the ids change, so that clients (our servers, etc.) can act on them.
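One minimal way to turn two successive id snapshots into events is a set difference. The sketch below assumes ids are plain strings and that an event is either "added" or "removed"; the `IdEvent` type and `diff_ids` function are illustrative names, not part of any existing system.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IdEvent:
    kind: str        # "added" or "removed"
    databank: str
    identifier: str


def diff_ids(databank: str,
             previous: Iterable[str],
             current: Iterable[str]) -> List[IdEvent]:
    """Compare two snapshots of a databank's ids and list the changes."""
    prev, curr = set(previous), set(current)
    # Sort so that the event order is reproducible between runs.
    events = [IdEvent("added", databank, i) for i in sorted(curr - prev)]
    events += [IdEvent("removed", databank, i) for i in sorted(prev - curr)]
    return events
```

An empty result means the snapshot is unchanged. How the events are then delivered to clients is left open here.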

As with other regular tasks, there need to be checks that the process itself doesn't go awry. Some things that come to mind:

  • Setting a maximum interval of time without id updates; once it has passed, someone should check that things are still working.
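The interval check could look something like this. It is a sketch that assumes the time of the last observed id update is recorded per databank; the threshold would be chosen per databank (short for large providers that change daily, long for mostly static ones), and `is_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta


def is_stale(last_update: datetime,
             now: datetime,
             max_quiet: timedelta) -> bool:
    """Return True when no id update has been seen for longer than max_quiet."""
    return now - last_update > max_quiet
```

A scheduled job could call this for each databank and notify a person when it returns True.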