Taxonomic Intelligence Subgroup
Taxonomic Intelligence for Phyloinformatics
- 1 Members
- 2 Motivation
- 3 Activities
- 4 Namespace Resolver
- 5 Use cases
- 6 Products
- 7 Standup Reports
- 8 References
- Taxonomic names are the primary currency in phyloinformatics.
- Communication protocols need to be improved and services defined to ensure that taxonomic objects are communicated with an acceptably low level of ambiguity.
- Taxonomic concepts are inadequately informaticized, so the solution has to be an imperfect but adequate one. One solution associates a "nominal concept" with each name string in each TNS (taxonomic name service, e.g. ITIS, NCBI, Species 2000, etc.).
- The main challenge is to communicate the meaning of a name by associating it with a TNS
- An important service is translating the meaning of requests among names from different TNSes
- Expand this service for the use case that involves requesting generic phylogenetic hypotheses
- Review Available Technologies
- Expand Syntax, Protocols, and Requirements
- Build Proof-of-Concept with Dummy
A user searching for all data on taxon x could also get data on taxon y, where y is a synonym of x, even though the tree is labeled only with y.
- A search for "Pagophilus groenlandicus" should also find anything labeled "Phoca groenlandica"
- ITIS records this synonymy (Pagophilus groenlandicus - valid - Harp Seal / Phoca groenlandica - invalid).
- NCBI has no results for Pagophilus, but returns results for Phoca...
- A search for "Argopecten" should also get anything labeled "Plagioctenium".
- Paleobiology database records this synonymy (Argopecten - valid / Plagioctenium - invalid).
A user searching for all data on taxon x could get wildly divergent trees if the name "x" refers to two distinct taxa.
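The synonym use case above can be sketched as a name-expansion step that runs before the search. This is a minimal illustration only: the `SYNONYMS` table, the `TREES` labels, and both function names are hypothetical, and a real service would look synonyms up in a TNS such as ITIS or the Paleobiology Database rather than in a hard-coded dictionary.

```python
# Hypothetical in-memory synonymy table (valid name -> invalid synonyms),
# seeded with the two examples from the use cases above.
SYNONYMS = {
    "Pagophilus groenlandicus": {"Phoca groenlandica"},
    "Argopecten": {"Plagioctenium"},
}

# Made-up tree labels, for illustration only.
TREES = {
    "tree_a": ["Phoca groenlandica", "Phoca vitulina"],
    "tree_b": ["Plagioctenium", "Pecten"],
}


def expand_name(name):
    """Return `name` together with every name recorded as its synonym."""
    names = {name}
    for valid, invalid in SYNONYMS.items():
        if name == valid or name in invalid:
            names.add(valid)
            names.update(invalid)
    return names


def find_trees(name, tree_labels):
    """Return IDs of trees labeled with `name` or with any synonym of it."""
    wanted = expand_name(name)
    return sorted(
        tree_id
        for tree_id, labels in tree_labels.items()
        if wanted & set(labels)
    )


print(find_trees("Pagophilus groenlandicus", TREES))
print(find_trees("Argopecten", TREES))
```

The homonym use case (one name, two distinct taxa) is not handled here; it would need each name to be qualified by the TNS it comes from.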
PhyloWS Interface to TreeBASE Data
Basic URNs to a Tree
A pointer to a tree can use either a TreeBASE integer (e.g. "TB:2853") or a published legacy ID (e.g. "LTB:Tree3586") via the treebase.org website.
That does mean that two different URNs each point to the same thing, e.g.:
If you want a tree in NEXUS, do this:
Basic URNs to a Clade
In this case the <nodeID> is a serially-generated integer starting from the root of the tree. We may redesign this to use a unique nodeID number.
This example returns the fifth node in the tree with ID 2853.
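A client receiving the two kinds of tree identifier mentioned above ("TB:2853" and "LTB:Tree3586") might first split them into a namespace and a local ID. The sketch below assumes the `<namespace>:<id>` shape seen in those two examples and nothing more; the function name is hypothetical, and it does not attempt to build the actual URNs.

```python
def parse_tree_id(identifier):
    """Split an identifier such as "TB:2853" into (namespace, local_id)."""
    namespace, sep, local_id = identifier.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"expected '<namespace>:<id>', got {identifier!r}")
    # Only the two prefixes that appear in the text are accepted here.
    if namespace not in ("TB", "LTB"):
        raise ValueError(f"unknown namespace {namespace!r}")
    # "TB" identifiers are described as TreeBASE integers.
    if namespace == "TB" and not local_id.isdigit():
        raise ValueError(f"TB identifiers must be integers, got {local_id!r}")
    return namespace, local_id


print(parse_tree_id("TB:2853"))
print(parse_tree_id("LTB:Tree3586"))
```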
Queries using SRU/CQL Syntax
GET /phylows/find/tree/?[query=<CQL query>]&[recordSchema=<format>]&[operation=searchRetrieve]&[version=1.2]
The query statement should be written in Contextual Query Language [CQL]. The "index" keys (as they are known in CQL) were picked arbitrarily for the convenience of TreeBASE's data model and namespace.
- taxon_name is a string such as "Homo sapiens", "Homo sapiens Linnaeus, 1758", "Mammalia" etc...
- taxon_label is a string attached to the node of a tree
- ncbi_taxid is an integer used by NCBI to track its taxonomic names
- ubio_namebankid is an integer used by uBio to track taxonomic names
- taxon_id is an integer for TreeBASE's own taxonomic names
- h.ncbi_taxid is a higher classification search based on NCBI's classification and using NCBI's taxid
- h.taxon_name is a higher classification search based on NCBI's classification and using a taxon name string
- Search for all trees that have both a taxon starting with Homo sapiens and a node linked to NCBI taxid 9593 (which happens to be Gorilla gorilla): query=taxon_name+any+%22Homo+sapiens%25%22+and+ncbi_taxid+%3D+9593
- Search for all trees that have either any kind of Primates OR any kind of Aves: query=h%2Etaxon_name+any+Primates+or+h%2Etaxon_name+any+Aves
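The percent-encoded query strings in the examples above can be produced from readable CQL with the standard library. This sketch assumes only the path and parameter names shown in the GET template; the helper name is hypothetical and no host name is assumed.

```python
from urllib.parse import urlencode

# Path taken from the GET template above.
FIND_TREE_PATH = "/phylows/find/tree/"


def build_find_tree_url(cql_query, record_schema=None):
    """Build a relative search URL for a CQL query string."""
    params = {"query": cql_query}
    if record_schema is not None:
        params["recordSchema"] = record_schema
    params["operation"] = "searchRetrieve"
    params["version"] = "1.2"
    # urlencode percent-encodes quotes, '%' and '=' and turns spaces into '+'.
    return FIND_TREE_PATH + "?" + urlencode(params)


url = build_find_tree_url('taxon_name any "Homo sapiens%" and ncbi_taxid = 9593')
print(url)
```

Note that `urlencode` leaves the dot in `h.taxon_name` unescaped, whereas the second example above writes it as `%2E`; both forms decode to the same query.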
Given a name (language optional) or a number (language specified) or an LSID (no language required), return a collection of names or numbers in the other languages TNS knows about
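A minimal sketch of that resolver contract, assuming each "language" is simply a key in a per-taxon record. The `RECORDS` table and the `resolve` function are hypothetical; only two languages are shown (a name string and the NCBI taxid), and LSID input is not covered.

```python
# Illustrative table: one record per taxon, one key per "language".
# 9593 is the NCBI taxid for Gorilla gorilla (as noted above);
# 9606 is the NCBI taxid for Homo sapiens.
RECORDS = [
    {"name": "Gorilla gorilla", "ncbi_taxid": 9593},
    {"name": "Homo sapiens", "ncbi_taxid": 9606},
]


def resolve(value, language=None):
    """Return the matching taxon's identifiers in every other language.

    Names may omit `language`; numbers must specify it, because a bare
    integer is ambiguous across naming services.
    """
    if isinstance(value, int) and language is None:
        raise ValueError("a numeric identifier needs an explicit language")
    for record in RECORDS:
        for lang, known in record.items():
            if (language is None or lang == language) and known == value:
                return {k: v for k, v in record.items() if k != lang}
    return {}  # nothing known about this value


print(resolve("Gorilla gorilla"))
print(resolve(9606, language="ncbi_taxid"))
```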
- Exploring different options for cross referencing OTUs.
- use CDAO - this may not be flexible enough to point to all the kinds of resources we want to point to.
- Create resolver PURLs (persistent URLs)
- Will be looking at the NeXML and PhyloWS.
- Get a round-trip exercise going.
- Do ontology people have any comments about getting this type of information into NeXML?
- We need to parse exactly what the taxonomy group is putting in.
- Hilmar argues against putting human readable content in these XMLs. Want IDs.
- When people ask TreeBASE for a tree, it sends them that tree, ideally with as much decoration on each node as we have.
- Going to have to actively think about how we would digest the DICT elements.
- Matt K:
- We don't care how it gets in there, as long as it gets there.
- Concerned that people want to make the XML be human readable.
- Building proof of concept
- Put TreeBASE dump on DBHack repository
- Have a Perl script that takes GUID numbers and builds NeXML documents.
- Will probably build some REST services for that.
- Develop a better idea of what we want.
- Playing with uBio, chatting with Rod Page.
- What kind of data would we like, and what can't we get from uBio?
- TreeBASE REST API Documentation
- GET only so far, no POST.
- Find resources by ID #
- Get this tree from TreeBASE.
- XML currently has additional data that could be made available, buried in well-formatted comments.
- Roger: want to label what the author has told you, versus what you are saying.
- But that is there just to show you what could be rendered in the XML if people are interested.
- Clade should be a globally scoped ID. If someone asks for a clade from a tree, and the clade exists, but not in that tree, generate a 404 error.
- Allows spec to support resources that don't have global clade IDs.
- Arlin: Why do you want to reference a clade in a tree?
- Hilmar/Roger: How do you tell what is the same thing?
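The 404 rule for clade requests discussed above could look roughly like this. It is a sketch only: the `CLADE_TO_TREE` store, the clade IDs in it, and the function name are made up for illustration and do not reflect how TreeBASE stores clades.

```python
# Hypothetical store: globally scoped clade ID -> ID of the tree containing it.
CLADE_TO_TREE = {
    "clade-1": "2853",
    "clade-2": "9999",
}


def clade_status(tree_id, clade_id):
    """Return the HTTP status for a request for `clade_id` within `tree_id`."""
    owner = CLADE_TO_TREE.get(clade_id)
    # Unknown clade, or a clade that exists but belongs to another tree:
    # both cases answer 404, per the rule above.
    if owner is None or owner != tree_id:
        return 404
    return 200


print(clade_status("2853", "clade-1"))
print(clade_status("2853", "clade-2"))
```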