Meetings/2009-04-29/ReesNotes
Jonathan Rees - Presentation and Slides
- Project started with SW focus
- If data sets rendered in RDF share URIs, those URIs become points of integration
- Community is people who have significant RDF projects in the life sciences.
- The problem for this group is specifically references to a small set of well-known databases (e.g. NCBI Taxonomy, UniProt, PubChem)
- Presented examples of non-URI and URI references
- Definitions
- Name - used to stress ontological nature of the project
- Identifier - means the short id in the databank
- URI - URIs are names, by Jon's definition
- Recommended two papers: Good and Wilkinson 2006, Page 2008
- Scope of project consciously limited - "records in public life science databases"
- Other entities are out of scope
- Biological entities are out of scope (focus is on just information artifacts, not the subjects of the records)
- Names for biological entities are out of scope
- The OBO Foundry will use the exact same technical infrastructure as is adopted here.
- Requirements (system has to have these properties…)
- High HTTP availability
- Ontological clarity
- User oriented naming system
- Persistence story
- Named basic requirements (names, services, content, administrative interface, infrastructure, responsible social entity)
- Names will be http: URIs
- Services - open source, existing tools, PURLZ in the prototype
- Lookup yields RDF from a "template" - within one databank, every ID has one document, and the documents differ only by ID
- W3C Linking Open Data (LOD) - (notion of an "about" document)
- You get metadata about the record, NOT the record itself. Shared Names returns "yes there is an article and you can find it…" - NOT the author, etc.
- Should Shared Names keep track of 404s? (keeping track of when identifiers change)
- 404 issue - is the transformation into references strictly lexical, or does it involve knowledge of the resource?
- First round (0th order implementation): 50-2000 databanks
- Keep track of partial redirects for XML and HTML (very few databanks provide more than one)
- One template for about-documents
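The lookup behaviour described in the bullets above (one template per databank, an about-document that describes the record rather than returning it, and a partial redirect to the databank) can be sketched roughly as follows. This is a minimal illustration that assumes the strictly lexical reading of the 404 question; the example.org URIs, the vocabulary terms, and the names ABOUT_TEMPLATE, DATABANKS, split_name, partial_redirect, and about_document are all hypothetical and were not presented at the meeting.

```python
from string import Template

# Hypothetical about-document template (Turtle-like). The vocabulary URIs are
# placeholders, not the project's actual terms.
ABOUT_TEMPLATE = Template(
    "<$name> a <http://example.org/vocab#DatabaseRecord> ;\n"
    '    <http://example.org/vocab#identifier> "$identifier" ;\n'
    "    <http://example.org/vocab#retrievableFrom> <$target> .\n"
)

# Hypothetical registry: shared-name prefix -> databank URL prefix.
DATABANKS = {
    "http://example.org/names/uniprot/": "http://example.org/databank/uniprot/",
}


def split_name(name):
    """Split a shared name into (prefix, identifier, target prefix)."""
    for prefix, target_prefix in DATABANKS.items():
        if name.startswith(prefix) and len(name) > len(prefix):
            return prefix, name[len(prefix):], target_prefix
    raise KeyError("no databank registered for " + name)


def partial_redirect(name):
    """Purely lexical rewrite: swap the prefix, keep the identifier.

    Nothing here checks whether the record exists, so an unknown
    identifier would only surface as a 404 at the databank itself.
    """
    _, identifier, target_prefix = split_name(name)
    return target_prefix + identifier


def about_document(name):
    """Fill in the databank's single template for one identifier."""
    _, identifier, _ = split_name(name)
    return ABOUT_TEMPLATE.substitute(
        name=name, identifier=identifier, target=partial_redirect(name)
    )
```

Because every document comes from the same template, two lookups within one databank differ only in the identifier, which fits the goal of entirely predictable metadata mentioned in the discussion.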
Alan: One of the goals is repeatable computational experiments
Alan: One of the goals is - metadata is entirely predictable
Alan: One of the goals is to accommodate "errata" in the future
Jeffry: One about-document and the space you put in an identifier? (Alan: see demo) How does it know which variant to use?
Harold: Are the goals documented? Alan: Yes, on the wiki site.
Dave: Are templates maintained by the databank over time? Jonathan: No, they are maintained by the databank committee.
David Wood: If a bug exists in a shared name system, is it up to the committee (or community) to make a change? Alan: Only responsible people.
Alan: Will discuss versioning later on. May, for example, need a specific build - metadata needs to specify this. Top level record is without commitment to encoding and version. Make sure that each name is well documented, so we can understand expectations.
- Infrastructure - replicas, encourage mirrors, automatic failover, master/slave with reassignable master
- Steering committee - populated by users, owns domain names and DNS, responsible for specification and QC. The committee does not run services (services would be run by volunteers), but it is responsible for what you get back.
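The infrastructure bullet above mentions mirrors with automatic failover. One way this might look from the client side is sketched below, assuming failover simply means trying mirrors in order until one answers. The function name resolve_with_failover and the injected fetch callable are hypothetical; the notes do not say whether failover would happen in the client, in DNS, or at the servers.

```python
def resolve_with_failover(name, mirrors, fetch):
    """Try each mirror in order and return (mirror, response) for the first success.

    `fetch(mirror, name)` is a caller-supplied callable (hypothetical here)
    that raises OSError, or a subclass such as ConnectionError, on failure.
    """
    errors = []
    for mirror in mirrors:
        try:
            return mirror, fetch(mirror, name)
        except OSError as exc:
            # Remember the failure and fall through to the next mirror.
            errors.append((mirror, exc))
    raise ConnectionError("all mirrors failed: %r" % (errors,))
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP library and lets it be exercised without a network.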
David Wood: When Larry and Bob created the Handle System, they recognized that both technical and social infrastructure change, and that a system has to be resilient to both. An organizational structure whose members can come and go at will addresses that particular problem. You've tied yourself to HTTP URIs and DNS at the moment.
- Handles and PURLZ have been invited, in part for a dialog about the issues and real problems they have faced.
