Taxon Name Service I/O

From Evolutionary Informatics Working Group

This is a sub-section of the Taxonomic Intelligence Subgroup

Functions of a Taxon Name Service (TNS)

  • Takes a name (language optional), a number (language required), or an LSID (no language required); here a "language" is a naming source such as ITIS or NCBI
  • Returns the corresponding names or numbers in the other languages it knows about
    • Examples:
    • Given "Aves", returns ITIS=174371, NCBI=8782
    • Given ITIS=174371, returns ITIS=174371, NCBI=8782
  • In the context of this subgroup, the TNS is a black box (we don't specify how it does the name resolution)
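A minimal sketch of the black-box contract above. It assumes a hard-coded table in place of real lookups, treats a query without a language as a name and a query with a language as an identifier, and leaves LSIDs out. `tns_resolve()` and `TNS_RECORDS` are hypothetical names.

```php
<?php
// Hypothetical in-memory TNS. A real service would query external
// databases; this table only holds the Aves example from this page.
const TNS_RECORDS = [
    ['name' => 'Aves', 'ids' => ['ITIS' => '174371', 'NCBI' => '8782']],
];

/**
 * Resolve a name (no language given) or an identifier (language given)
 * to every language => identifier pair the TNS knows for that taxon.
 * Returns [] when nothing matches.
 */
function tns_resolve(string $query, ?string $language = null): array
{
    foreach (TNS_RECORDS as $record) {
        // Name lookup: case-insensitive comparison against the record's name.
        $byName = ($language === null) && strcasecmp($record['name'], $query) === 0;
        // Identifier lookup: the record must have an ID in that language.
        $byId = ($language !== null)
            && isset($record['ids'][$language])
            && $record['ids'][$language] === $query;
        if ($byName || $byId) {
            return $record['ids'];
        }
    }
    return [];
}
```

Both `tns_resolve('Aves')` and `tns_resolve('174371', 'ITIS')` return the same ITIS/NCBI pair, matching the two examples in the list.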

Other wish list items

  • Also return a classification path, which would allow us to:
    • annotate internal nodes in a tree
    • potentially resolve homonyms given a phylogenetic context
  • Flexible name search options
    • exact name searches, e.g. so that "Gallus gallus" doesn't return Alcides gallus or Anthrax gallus
    • support CQL queries
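One way a classification path could annotate an internal node is to label it with the deepest taxon shared by the paths of all tips below it. The sketch assumes each path is a root-first list of taxon names; `deepest_shared_taxon()` is a hypothetical helper, not part of any existing service.

```php
<?php
// Return the deepest taxon common to every root-first classification
// path, or null if the paths share nothing (or no paths are given).
function deepest_shared_taxon(array $paths): ?string
{
    if ($paths === []) {
        return null;
    }
    $shared = null;
    // Only compare as deep as the shortest path goes.
    $depth = min(array_map('count', $paths));
    for ($i = 0; $i < $depth; $i++) {
        $taxon = $paths[0][$i];
        foreach ($paths as $path) {
            if ($path[$i] !== $taxon) {
                return $shared; // the paths diverge at this level
            }
        }
        $shared = $taxon; // all paths agree down to this level
    }
    return $shared;
}
```

For a Gallus tip and a Passer tip whose paths both pass through Aves, the node joining them would be labelled "Aves".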

Sample Code

uBioFunctionsXML.php

function uBioNamesXML($searchString)
  • input: free form taxon name
  • returns: array
    • array key is the namebankID
    • array value is the namebank name
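A sketch of producing the namebankID => name array from an XML response. The XML layout is an assumption modelled on the namebank object snippet under "uBio return object" (a `namebankID` element next to a base64-encoded `nameString`); a real uBio search response should be checked before relying on it. `names_from_xml()` is hypothetical.

```php
<?php
// Build namebankID => decoded name from an XML document in which every
// result element has <namebankID> and <nameString> children (ASSUMED layout).
function names_from_xml(string $xmlText): array
{
    $xml = new SimpleXMLElement($xmlText);
    $names = [];
    // Select the parent of every <namebankID>, i.e. one element per result.
    foreach ($xml->xpath('//namebankID/..') ?: [] as $result) {
        $id = (string) $result->namebankID;
        // uBio base64-encodes name strings.
        $names[$id] = base64_decode((string) $result->nameString);
    }
    return $names;
}
```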

uBioFunctionsSOAP.php

function uBioObjectSOAP($namebankID)
  • input: namebankID
  • returns: namebankObject.
function uBioTNSviaSOAP($namebankID)
  • input: namebankID
  • returns: array of other taxon numbers
    • array key contains the provider
    • array value is the provider id value

uBioSearchForm.php

  • works as a demo if all of these files are in the same directory.
  • point your web browser at this page to demo the search refinement.
  • tweak line 46 to point the resolved search somewhere useful.

The first proof-of-concept search form (image: SmallForm.jpg).

After the first form is submitted, you should get the second proof-of-concept search page.

  • this page splits the search string on spaces and then takes the intersection (overlapping set) of the results for each word.
  • this does not work well for tautonyms such as Gallus gallus: both words return the same set, so the intersection does not narrow the results.
  • the original namebank ID comes from the XML web service.
  • the other database numbers are looked up using the SOAP services.
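The split-and-intersect refinement can be sketched as follows. `$lookup` stands in for a per-word search such as uBioNamesXML() and is assumed to return namebankID => name; `refine_search()` is a hypothetical name.

```php
<?php
// Split the query on whitespace, search each word, and keep only the
// namebankIDs present in every word's result set.
function refine_search(string $query, callable $lookup): array
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $sets = array_map($lookup, $words);
    if ($sets === []) {
        return [];
    }
    // Intersect on array keys (namebankIDs); values come from the first set.
    return count($sets) === 1 ? $sets[0] : array_intersect_key(...$sets);
}
```

This also shows the tautonym problem: for "Gallus gallus" both words produce the same set, so every "gallus" hit survives the intersection.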

LargeForm.jpg

uBioFunctionsDISP.php

  • these functions are used to demonstrate the proof of concept...
function checkbox_from_array($namebankData,$select_name,$selected)
  • builds the form's checkboxes from the parameters as a single block of HTML.
function checkbox_from_array_table($namebankData,$select_name,$selected)
  • builds the form's checkboxes as an HTML table that includes the SOAP TNS lookup.
function addquotes($s)
  • wraps a variable in quotes
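The markup that checkbox_from_array() actually emits is not shown on this page, so the following is only a guess at its behaviour. It assumes `$namebankData` maps namebankID => name, `$select_name` is the form field name, and `$selected` is an array of IDs to pre-check; `checkbox_from_array_sketch()` is hypothetical.

```php
<?php
// Emit one labelled checkbox per namebank entry, escaping everything
// that goes into the HTML and pre-checking the selected IDs.
function checkbox_from_array_sketch(array $namebankData, string $select_name, array $selected): string
{
    $selectedIds = array_map('strval', $selected);
    $html = '';
    foreach ($namebankData as $id => $name) {
        $checked = in_array((string) $id, $selectedIds, true) ? ' checked' : '';
        $html .= sprintf(
            '<label><input type="checkbox" name="%s[]" value="%s"%s> %s</label>' . "\n",
            htmlspecialchars($select_name, ENT_QUOTES),
            htmlspecialchars((string) $id, ENT_QUOTES),
            $checked,
            htmlspecialchars($name, ENT_QUOTES)
        );
    }
    return $html;
}
```

The `[]` suffix on the field name lets PHP receive all ticked IDs as an array on submit.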

uBioTest.php

  • a collection of code written while I was figuring out how the uBio SOAP service works.
  • this is not required for the proof of concept or the demo.

Existing TNS examples

Glasgow Taxonomic Name Server (GTNS)

  • GTNS is not presently being updated or supported.
  • GTNS takes "taxon_name" via POST to find_name_result.php
  • in: taxon_name="Aves"
  • returns:

<xml> <Taxon namespace="GNS/TaxonName" ID="21753">

 <CrossReferences>
   <Object namespace="ITIS/id" ID="174371"/>
   <Object namespace="NCBI/id" ID="8782"/>
 </CrossReferences>

</Taxon> </xml>
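A sketch of reading a GTNS response like the one above into namespace => ID pairs. The `<xml>` wrapper on this page is wiki markup, so the parsed text is assumed to start at `<Taxon>`; `gtns_cross_references()` is a hypothetical name.

```php
<?php
// Collect the GTNS record's own namespace/ID plus each cross-reference.
function gtns_cross_references(string $xmlText): array
{
    $taxon = new SimpleXMLElement($xmlText);
    $refs = [(string) $taxon['namespace'] => (string) $taxon['ID']];
    foreach ($taxon->CrossReferences->Object as $object) {
        $refs[(string) $object['namespace']] = (string) $object['ID'];
    }
    return $refs;
}
```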

uBio WebServices

  • web services via SOAP (some functions broken; poorly documented)
  • XML based web services (but multiple cross-referenced queries required to get all desired information)
  • we have corresponded with uBio, and Anthony was quite helpful, although the actual services are not well suited to our purposes.
  • Anthony told me that "all function parameters must be specified".
  • because the web page descriptions are not always right, check the WSDL file for the parameters that need to be passed.
  • in our experiments, calls work when they use the WSDL file's parameters rather than the web page's parameters.
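Since every parameter must be sent, a small helper can start from the full WSDL parameter list and fill in whatever the caller leaves out. The list below is copied from the namebank_search example on this page; defaulting missing values to empty strings and the helper name `complete_params()` are assumptions.

```php
<?php
// Parameter names for namebank_search, as used in the example on this page.
const NAMEBANK_SEARCH_PARAMS = [
    'searchName', 'searchAuth', 'searchYear', 'order', 'rank',
    'sci', 'linkedVern', 'vern', 'keyCode',
];

// Return every required parameter in WSDL order, defaulting missing ones
// to '' and rejecting names the WSDL does not list.
function complete_params(array $required, array $given): array
{
    $unknown = array_diff(array_keys($given), $required);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Not in WSDL: ' . implode(', ', $unknown));
    }
    $params = [];
    foreach ($required as $name) {
        $params[$name] = $given[$name] ?? '';
    }
    return $params;
}
```

The result can then be passed as the argument array of `$uBioClient->__soapCall('namebank_search', ...)`.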

TNS version 1

TNS version 2

  • TNS 2 is also seriously broken.
  • http://www.ubio.org/soap/ has a WSDL file.
  • a quick read of the function list reveals some serious issues with the documentation.
    • e.g., classificationbank_search describes namebankID, clientVersion, requestorIP, and classificationTitleID all as "NameBank identifier you wish to search for in ClassificationBank", which is almost certainly wrong.
    • e.g., the parameters for namebank_search on the main web service page differ from those on the detailed namebank_search page. The WSDL file at http://ubio.org/soap/ matches the main page.
  • we have successfully retrieved namebank objects by namebankID:

<php>
$uBioClient = new SoapClient('http://www.ubio.org/soap/');
$nameBankObject = $uBioClient->__soapCall("namebank_object", Array(
    "namebankID" => "21646",
    "keyCode"    => "f74cc909af5e2aa4f4f6a78fc3795aae6bcedabb"
));
</php>

  • we have not been able to search names; namebank_search only returns the service status:

<php>
$uBioClient = new SoapClient('http://www.ubio.org/soap/');
// All WSDL parameters are passed; the unused ones are sent as empty strings.
$nameBankObject = $uBioClient->__soapCall("namebank_search", Array(
    "searchName" => "Homo sapiens",
    "searchAuth" => '',
    "searchYear" => '',
    "order"      => '',
    "rank"       => '',
    "sci"        => '1',
    "linkedVern" => '0',
    "vern"       => '0',
    "keyCode"    => "f74cc909af5e2aa4f4f6a78fc3795aae6bcedabb"
));
</php>

  • we can get a list of classifications containing this taxon...

<php>
$uBioClient = new SoapClient('http://www.ubio.org/soap/');
// Unused parameters are sent as empty strings, since all must be specified.
$listOfClassificationsObject = $uBioClient->__soapCall("classificationbank_search", Array(
    "namebankID"            => 21646,
    "clientVersion"         => '',
    "requestorIP"           => '',
    "classificationTitleID" => '',
    "keyCode"               => "f74cc909af5e2aa4f4f6a78fc3795aae6bcedabb"
));
</php>

  • we can get classification hierarchies:

<php>
$uBioClient = new SoapClient('http://www.ubio.org/soap/');
// $answerFromPreviousQuery holds the classificationBankID returned by the
// classificationbank_search call above.
$classificationHierarchyObject = $uBioClient->__soapCall("classificationbank_object", Array(
    "hierarchiesID"      => $answerFromPreviousQuery["classificationBankID"],
    "childrenFlag"       => '1',
    "ancestryFlag"       => '1',
    "justificationsFlag" => '0',
    "synonymsFlag"       => '1',
    "clientVersion"      => '',
    "requestorIP"        => '',
    "keyCode"            => "f74cc909af5e2aa4f4f6a78fc3795aae6bcedabb"
));
</php>

XML web services

namebank objects by namebankID:

namebank objects by name search:

  • note that the name search above returns an orchid, "Pleurothallis aves-seriales Luer & R. Escobar", as well as class Aves. There does not appear to be a way to make the uBio service require exact matches.
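Until the service supports exact matching, one client-side workaround is to post-filter the results. This sketch assumes results arrive as namebankID => name and compares names case-insensitively after trimming whitespace; `exact_matches()` is a hypothetical helper. Names that carry authorship (like the orchid above) would never equal a bare search string, which is the intended effect here.

```php
<?php
// Keep only results whose name equals the search string, ignoring case
// and surrounding whitespace. Keys (namebankIDs) are preserved.
function exact_matches(array $results, string $searchName): array
{
    $target = strtolower(trim($searchName));
    return array_filter(
        $results,
        fn (string $name): bool => strtolower(trim($name)) === $target
    );
}
```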

uBio return object

A small snippet from the XML file containing the namebank object, showing its syntax and tags. Name strings are base64 encoded; for example, QXZlcw== decodes to "Aves".

<xml>
<namebankID>21646</namebankID>
<nameString>QXZlcw==</nameString>
<fullNameString>QXZlcw==</fullNameString>
<languageName>Scientific Name</languageName>
<rankName>Class</rankName>

<mappings>
  <value>
    <collectionsID>1</collectionsID>
    <foreignKey>174371</foreignKey>
    <collectionsTitle>ITIS</collectionsTitle>
    <collectionsURL>http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=174371</collectionsURL>
    <logoFile>itis.png</logoFile>
    <logoFileLinkIT>http://www.ubio.org/tools/image/itis.gif</logoFileLinkIT>
  </value>
  <value>
    <collectionsID>15</collectionsID>
    <foreignKey>8782</foreignKey>
    <collectionsTitle>NCBI</collectionsTitle>
    <collectionsURL>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=8782&lvl=3&lin=f&keep=1&srchmode=1&unlock</collectionsURL>
    <logoFile>ncbi.png</logoFile>
    <logoFileLinkIT>http://www.ubio.org/tools/image/ncbi.gif</logoFileLinkIT>
  </value>
</mappings>
</xml>
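A sketch of reading the fields above, including the provider => id mapping that uBioTNSviaSOAP is described as returning. The snippet has no single root element, so the sample wraps it in a hypothetical `<namebankObject>` root. It also omits the URL fields: as shown, they contain raw `&` characters, which an XML parser rejects unless escaped as `&amp;`. `read_namebank_object()` is a hypothetical name.

```php
<?php
// Decode the name and collect collectionsTitle => foreignKey pairs.
function read_namebank_object(string $xmlText): array
{
    $obj = new SimpleXMLElement($xmlText);
    $mappings = [];
    foreach ($obj->mappings->value as $value) {
        // collectionsTitle names the provider; foreignKey is its identifier.
        $mappings[(string) $value->collectionsTitle] = (string) $value->foreignKey;
    }
    return [
        'namebankID' => (string) $obj->namebankID,
        'name'       => base64_decode((string) $obj->nameString),
        'rank'       => (string) $obj->rankName,
        'mappings'   => $mappings,
    ];
}
```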

What a TNS "should" return

Perhaps a name service should return:

Here the name is given explicitly for every entry. In this trivial example that makes no difference, but it would matter when a particular database uses a different name (for example, a synonym). There seems to be little justification for base64-encoding names.

<xml> <Taxon namespace="GNS/id" id="21753" name="Aves" >

 <CrossReferences>
   <Object namespace="ITIS/id" id="174371" name="Aves" />
   <Object namespace="NCBI/id" id="8782" name="Aves" />
 </CrossReferences>

</Taxon> </xml>
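A sketch of emitting this proposed format with PHP's DOM extension. The input shape (namespace => [id, name], with the first entry becoming the `<Taxon>` element) and the name `build_tns_response()` are assumptions; element and attribute names follow the example above.

```php
<?php
// Build <Taxon> for the first record and an <Object> under
// <CrossReferences> for each remaining one; names stay plain text.
function build_tns_response(array $records): string
{
    if ($records === []) {
        return '';
    }
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $taxon = null;
    $crossRefs = null;
    foreach ($records as $namespace => [$id, $name]) {
        $el = $doc->createElement($taxon === null ? 'Taxon' : 'Object');
        $el->setAttribute('namespace', $namespace);
        $el->setAttribute('id', $id);
        $el->setAttribute('name', $name);
        if ($taxon === null) {
            $taxon = $doc->appendChild($el);
            $crossRefs = $taxon->appendChild($doc->createElement('CrossReferences'));
        } else {
            $crossRefs->appendChild($el);
        }
    }
    // Serialize just the <Taxon> element (no XML declaration).
    return $doc->saveXML($taxon);
}
```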

Perhaps even better a name service would return:

Here the name is again given explicitly, and every reference, including the GNS record itself, uses the same slightly more parseable structure.

<xml> <CrossReferences>

  <CrossReference>
      <string id="namespace">GNS</string>
      <string id="id">21753</string>
      <string id="name">Aves</string>
  </CrossReference>
  <CrossReference>
      <string id="namespace">ITIS</string>
      <string id="id">174371</string>
      <string id="name">Aves</string>
  </CrossReference>
  <CrossReference>
      <string id="namespace">NCBI</string>
      <string id="id">8782</string>
      <string id="name">Aves</string>
  </CrossReference>
  <CrossReference>
      <string id="namespace">PaleoDB</string>
      <string id="id">36616</string>
      <string id="name">Aves</string>
  </CrossReference>
  <CrossReference>
      <string id="namespace">COL</string>
      <string id="id">1854</string>
      <string id="name">Aves</string>
  </CrossReference>

</CrossReferences> </xml>
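Because every `<CrossReference>` has the same shape, a single loop can read them all. This sketch returns namespace => ['id' => ..., 'name' => ...]; `parse_cross_references()` is a hypothetical name.

```php
<?php
// Read each <CrossReference>'s <string id="..."> fields into a keyed array.
function parse_cross_references(string $xmlText): array
{
    $root = new SimpleXMLElement($xmlText);
    $refs = [];
    foreach ($root->CrossReference as $ref) {
        $fields = [];
        foreach ($ref->string as $field) {
            $fields[(string) $field['id']] = (string) $field;
        }
        $refs[$fields['namespace'] ?? ''] = [
            'id'   => $fields['id'] ?? '',
            'name' => $fields['name'] ?? '',
        ];
    }
    return $refs;
}
```

Adding a new source such as PaleoDB or COL then needs no parser changes, only another `<CrossReference>` entry.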