Use Cases

So far, use cases have been volunteered by the BioRDF task force of the Health Care and Life Sciences Interest Group (HCLS IG), the Mayo Clinic, the National Center for Biomedical Ontology, and the OBO Foundry. Please add your use case below (in alphabetical order).

Health Care and Life Sciences Interest Group

BioRDF

[Note: Although members of another project called "Bio2RDF" participate in HCLSIG, BioRDF is not Bio2RDF!]

In ongoing work in the BioRDF task force of the W3C Semantic Web Health Care and Life Sciences Interest Group (HCLSIG), we would like to refer to gene names associated with a given microarray study as Shared Names URIs. The Microarray Experiment Context page describes other types of information that we would like to represent in RDF and its relation to query federation. We need Shared Names to tell us which URIs to use for a given set of gene names. The need is fairly urgent: we would like to work with Shared Names URIs now, without waiting for them to be supported by PURL servers, so that we do not have to recode our work later.

From Scott Marshall, Feb 12, 2010

BioRDF has several gene lists from microarray studies that we would like to represent in RDF, with the gene names as Shared Names URIs. This seems like a good opportunity for Shared Names to dictate what the gene URIs should be, as well as to serve them from the PURL servers (initial data set). Can we agree on what a Shared Gene Name should look like as a URI? How about: http://sharedname.org/gene/1234567 ?
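
To make the proposal concrete, here is a minimal sketch of how a gene list might be turned into candidate URIs. It assumes the path segment is a plain numeric gene identifier; which database would assign that number is not settled on this page. The base is the one proposed above and is not an agreed namespace. The function name and the sample identifiers are hypothetical.

```python
# Base taken from the proposal above; not an agreed or resolvable namespace.
SHARED_GENE_BASE = "http://sharedname.org/gene/"


def shared_gene_uri(gene_id):
    """Return a candidate Shared Names URI for a numeric gene identifier."""
    text = str(gene_id).strip()
    # Assumption: identifiers are plain ASCII digits, as in the proposed example.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a numeric gene identifier, got {gene_id!r}")
    return SHARED_GENE_BASE + text


# Hypothetical identifiers standing in for a microarray gene list.
gene_list = ["1234567", 7654321]
uris = [shared_gene_uri(g) for g in gene_list]
```

Rejecting non-numeric input early is one possible design choice; if gene symbols rather than numbers end up in the URI, the check would have to change.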

Mid term:

Also, HCLS and CWA are helping to set up a SPARQL endpoint for SIB. I suggest that we ask SIB to join Shared Names with the notion that Shared Names will be provided for proteins through that same endpoint. A gentle probe of the issue a few months back encountered no resistance from the head of software at SIB (Nicole Redaschi).

Mayo Clinic

The following use cases don't come directly from the Mayo Clinic, but, then again, they aren't really use cases...

  • Citation of Electronic Resources (ISO TC 37/SC4 CD 24619) - the irony is that this draft standard cannot be cited as an Electronic Resource, as ISO is a closed organization. The document does draw on a couple of issues that need addressing:
    1. Resource part and fragment identifiers - This standard differentiates these by who is responsible for resolution, with "parts" being intended for server side resolution and fragments for client side resolution. It notes that fragments are indicated by the presence of the hash (#) and terminated by the end of the URI. ISO CD 24619 also points out that resources come in a wide variety of granularities, and that resources may frequently be contained in other resources or be members of secondary resource collections. The current Shared Names examples tend to be fairly simple when it comes to referencing a resource, assuming a one-to-one mapping between a "record" and a reference. The NCBI Query URIs are described as being "error prone" in [1] because the CGI query components can be permuted. In addition, the URIs are deemed non RDF-friendly because of the CGI question marks. If, however, we are to address embedded resources (CD 24619 uses the MPEG standard as an example, where a reference might be to a particular begin and end time on a particular track), we need some way to address multiple "parts" in a canonical, RDF-friendly fashion. The mapping of the parts to the particular resource description would obviously be database specific, but I believe that we need to make some statement about how they are represented. In addition, I believe that we need to make a statement on "fragments" (which, imo, should be discouraged, as the decision on "client side" and "server side" processing is subject to change).
    2. Collections - The standard describes the notion that a resource, such as a journal paper, may be included in more than one containing resource. The example used is an image, which may be referenced as an independent resource or embedded in another document. I'm not sure whether or how this applies here, but the general notion is that it becomes relevant to the extent that "containment" is embedded in the URI, whether as a "database" or as a hierarchy of components. As usual, we have to balance URIs with embedded meaning, which are easy to construct and use, against URIs that carry more semantics than we actually intended.
  • Scope - While I agree in principle with the notion of starting with a narrow scope and expanding, I am concerned that Shared Names may be an exception. If this project is seen as applying only to "records" in the biomedical domain, it ends up being yet one more identifier space to be addressed by anyone who wants to develop a unified system. The rule stated in the meeting, "If the entity I wish to identify is about something, then it is in scope", seems useful. One of our communities of interest is the "terminology" community. We manage descriptions of external entities, and we need the ability to get from a URI to a description (or descriptions) of the thing being described. We don't care if people choose to believe that these URIs reference the things directly - that issue is outside of our scope (although we might send them in the direction of Ogden & Richards...). The aspect that we are interested in is shared names for descriptions, be they dictionary entries, thesaurus entries or entries in a formal ontology.
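
The point in item 1 about permuted CGI query components and client-side fragments can be illustrated with a short sketch. It uses Python's standard urllib.parse; the host example.org and the parameter names are placeholders, not real NCBI URIs, and sorting the parameters is only one possible canonical form, not something Shared Names has agreed on.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_query_uri(uri):
    """Sort the query parameters and drop the fragment.

    Two URIs that differ only in parameter order, or only in their
    client-side fragment, then compare equal as strings.
    """
    parts = urlsplit(uri)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), "")
    )


# Hypothetical URIs: same parameters in a different order, one with a fragment.
a = "http://example.org/query?db=gene&id=1234567#summary"
b = "http://example.org/query?id=1234567&db=gene"

same_resource = canonical_query_uri(a) == canonical_query_uri(b)
fragment_of_a = urlsplit(a).fragment  # the part a client would resolve itself
```

Without some such normalization, the two strings above would be treated as different names in RDF even though a server would answer them identically.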

National Center for Biomedical Ontology

Shared Names for Ontologies and Ontology terms

  • Develop URIs for ontologies in BioPortal that do not already have URIs, i.e., non-OBO Foundry ontologies. In the event that the ontology developer would like to develop their own scheme, these can be modified. The naming scheme will be the same as the one developed for the OBO Foundry ontologies.

Shared Names for Ontology-indexed Resources

  • Develop URIs for Resources included in BioPortal that have been indexed based on ontology terms. Currently these resources include Array Express (AE), ARRS Goldminer (GM), the Conserved Domain Database (CDD), Online Mendelian Inheritance in Man (OMIM), PharmGKB, Reactome, and UniProt.
    • JAR tells me (Dan Connolly) that this is in progress as RRO in OBO.

Mash-ups based on URIs for records from biomedical databases

The Main Page says this is the motivation for the project. Above I (Dan Connolly) see use cases involving URIs for genes, but not for gene records. Help?!?!

Resolution within an Enterprise

Consider BigBioCorp, which wants to make shared names available for use in its research without disclosing that use to other parties, in order to protect trade secrets. In this case, JoeAdmin@BigBioCorp.com should be able to bulk-download the data and operate a proxy service within the enterprise, with periodic bulk updates.

What patterns of access do they desire/require? GET/lookup? or SPARQL query? both? Evidently they require bulk-download and bulk-update.
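
As a rough sketch of the GET/lookup pattern combined with bulk download and bulk update, the following keeps an in-house table so that no lookup leaves the enterprise. Everything here is hypothetical: the tab-separated dump format (shared name, then target), the class and method names, and the example.org target. A SPARQL access pattern would need a different design.

```python
import csv
import io


class LocalResolver:
    """In-house lookup table built from a bulk download."""

    def __init__(self):
        self._targets = {}

    def bulk_update(self, tsv_text):
        """Replace the table with a fresh dump; return the number of entries."""
        rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
        # Rows with fewer than two columns are skipped rather than guessed at.
        self._targets = {row[0]: row[1] for row in rows if len(row) >= 2}
        return len(self._targets)

    def lookup(self, shared_name):
        """Return the target for a shared name, or None if it is unknown."""
        return self._targets.get(shared_name)


# Hypothetical one-line dump: shared name, tab, target.
dump = "http://sharedname.org/gene/1234567\thttp://example.org/records/1234567\n"

resolver = LocalResolver()
loaded = resolver.bulk_update(dump)
target = resolver.lookup("http://sharedname.org/gene/1234567")
```

Replacing the whole table on each update keeps the sketch simple; an incremental update would be another option if dumps are large.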

Repeatable computation

Suppose Sarah computes a result using data from records with shared names; Bob should be able to reproduce her results.

Related issues/requirements:

data availability
Is mirroring the data in the records worthwhile? (Don't forget the costs of researching and securing licenses.)
versioning
Do we expect to have distinct names for distinct versions of records?


Shared curation of dbxrefs

I'm pretty sure JAR told me about this, but now I can't find the mail message or Skype log or whatever. Dan Connolly 17:21, 30 April 2010 (UTC)

Two databases each maintain a list of dbxref prefixes. OBO and Reactome?

Shared Names becomes a coordination point to consolidate effort.

Server fail-over

The use case story is a little obvious, but worth writing down in due course...

Requirements questions:

  • In Design Notes (JAR) it says that the main functionality is coreference, not reference; so how much reliability is needed?
    • This seems to be in conflict with the Steering committee process, which says "The committee exists in order to make sure that its URIs have value to those who use them - in particular, that they resolve to information that is useful."
  • Is something that takes an hour to repair itself OK?
  • Is something that involves a system administrator getting paged and manually fixing it OK?

Dissolution of the Steering Committee

Is this use case out of scope? Mostly... but... odd... the Steering committee process says both "WE HAVE NOT DISCUSSED THIS YET. I HAVE NAMED A FEW ORGANIZATIONS HERE ONLY AS A STARTING POINT FOR DISCUSSION." in the relevant section and "This page represents a decision of the Steering committee and should not be changed except by action thereof" at the top.