Meeting 1 notes

From Evolutionary Informatics Working Group

Day 1, Monday

The day began with brief talks from Todd Vision (NESCent intro) and Jory Weintraub (outreach), then "lightning talks" from participants. This was followed by a "visions of interoperability" discussion that sometimes strayed from the big picture. Arlin began with his vision of evolutionary methods being integrated transparently into workflows in genomics, including the problem of "functional inference" recently given an explicitly evolutionary treatment by Engelhardt [1]. Sudhir had a pragmatic vision of getting the top phylogenetic analysis programs or packages to talk to each other. In contrast to the utopian vision in which interoperability is uniformly positive, David's dystopian vision was that if methods are automated and become easy to use interoperably, then they will be misused by people who don't know what they are doing.

After lunch, Arlin presented the results of the tentative prioritization exercise that was carried out via email earlier this spring. The top area for attention was "supporting current standards". This in turn suggests several task areas, such as writing formal specifications for popular software, assessing user practices, etc. Arlin suggested that we could decide on a scheme such as this and then break out into groups. In general, considering the following stages of possible activities:

  1. Brainstorming
  2. Analysis (getting relevant information and analyzing it)
  3. Planning (design)
  4. Implementation
  5. Evaluation (how well does it work?)
  6. Dissemination (how do we educate or convince others?)

we want to get beyond brainstorming.

However, there was an objection to the presumption that we were ready to define specific tasks, so Arlin's plan was abandoned and we went back to general discussion.

After the break, the discussion kept coming back to the idea of a generalized character-state data model, formalized (for instance) as an ontology.

At the end of the day Arlin summarized the discussion and proposed that we begin Tuesday with a set of more focused presentations. The following people volunteered to give a brief presentation advocating some particular project:

  1. (Aaron) Unifying character data analysis formalization
  2. (Mark) Pragmatic gap analysis and remediation
  3. (Sergei) Substitution model language as a token problem
  4. (Weigang) Dissemination and advocacy of standards and best practices

Day 2, Tuesday

  • Summary (Arlin)
  • Question: what does "formalization" imply? (Aaron)
  • discussion about dangers of making analyses too easy for users

Summary: two extremes, from format pathologies to next-generation artefact

Lunch

From here we moved on to an actual discussion of the data model.

Aaron: starting question is what are taxa?

Taxa

  • "taxa" terminology out-dated; OTU unpopular
  • just a reference, a label, a unique ID?
  • or a container, possibly empty, or containing data, possibly linked to a tree
  • call it a "node", treat it as a container, possibly empty
  • ancestral vs. observed

Characters

  • not primary, result from homologizing (hypothesis of homology)
  • difference between observed and inferred

AnalysisSet - what is the larger unit

  • decided on union of all relevant nodes (taxa)
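The data model discussed above can be sketched roughly as follows. This is only an illustration of the notes, not a design the group agreed on: "AnalysisSet" and "node" come from the discussion, while the class layout, field names (node_id, observed, parent, states) and the union method are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Character:
    """A character, i.e. the result of a hypothesis of homology (hypothetical class)."""
    label: str


@dataclass
class Node:
    """A 'taxon' treated as a container that may be empty."""
    node_id: str                     # a reference, label, or unique ID
    observed: bool = True            # False for an ancestral (inferred) node
    parent: Optional["Node"] = None  # optional link into a tree
    states: Dict[Character, str] = field(default_factory=dict)

    def is_empty(self) -> bool:
        # A node holding no character data is still a valid node.
        return not self.states


@dataclass
class AnalysisSet:
    """The larger unit: the union of all relevant nodes."""
    nodes: Dict[str, Node] = field(default_factory=dict)

    def add(self, *nodes: Node) -> None:
        for node in nodes:
            self.nodes[node.node_id] = node

    def union(self, other: "AnalysisSet") -> "AnalysisSet":
        # Nodes sharing an ID are kept once (the copy from `other` wins).
        merged = AnalysisSet()
        merged.add(*self.nodes.values(), *other.nodes.values())
        return merged


# Usage: two observed nodes (one empty) linked to an inferred ancestor.
ancestor = Node("anc_1", observed=False)
human = Node("human", parent=ancestor, states={Character("site_1"): "A"})
chimp = Node("chimp", parent=ancestor)

first = AnalysisSet()
first.add(human, ancestor)
second = AnalysisSet()
second.add(chimp, ancestor)
combined = first.union(second)
```

Keying nodes by ID makes the union a plain dictionary merge, which is one simple way to read "union of all relevant nodes"; a real model would also need a rule for reconciling conflicting data under the same ID.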

Break

Summary of progress. Supermatrix issue (David S.).

Trees (Rutger)

Day 3, Wednesday

Summary:

  • Themes to which we keep returning
    • NEXUS flavors, conflicts in interpretation, ambiguities
    • the next data exchange standard format
    • maintaining quality, avoiding errors
    • a language for transition models
    • conventions for managing mixed data sets
  • Some tentative conclusions
    • We seem to value the idea of developing a unifying ontology (or other artefact)
    • We do not find barriers or acute drivers for interoperability, rather we see opportunities
    • We recognize an overlap with community needs in the area of a transition model language
    • We value the 80:20 rule: the simple solution covers 80 % of user needs, while far greater effort is required to get the remaining 20 %.

This summary was followed by a discussion to identify possible deliverables, which are listed in the next section.

Followups and deliverables

Documentation

  1. re-factor wiki (Arlin)
    1. topic for each deliverable
    2. topic for file format examples
  2. add meeting notes (Arlin)
  3. upload relevant papers (Gopal, others)
  4. generate and add UML from Tuesday session (Hilmar)

Substitution (transition) model language

  1. clarify problem, identify stakeholders, suggest evaluation scheme
  2. propose work session under short-term visiting scientist program (Mark)

Ontology version 0.1

  1. file format examples (wiki topic), include documentation, pathological examples
    1. clustalw (Sudhir)
    2. various (BioPerl test suite)
    3. MEGA (Sudhir)
    4. NEXUS flavors (Mark)
    5. NEXUS pathologies from DataMonkey (Sergei)
    6. NEXUS Mesquite extensions (Rutger, [Sergei - I am going to put some of them in HyPhy so will also assess generalizability and utility])
    7. NEXUS Bio::NEXUS extensions (Aaron)
  2. formalization of MEGA, PHYLIP
  3. validation of chosen file formats, possibly via a NESCent-hosted server
  4. (possible) circumscribed demonstration of syntax highlighting to allow identification of errors
  5. staged evaluation strategy for ontology

Outreach, dissemination, partnering

  1. identify possible partners and funding opportunities for ontology development
    1. National Center for Biomedical Ontology (Hilmar)
    2. pPOD http://phylodata.seas.upenn.edu/cgi-bin/wiki/pmwiki.php
    3. TreeBASE
    4. Adam Goldstein (the philosopher, not the disk jockey); see his blog [2]
    5. PosSUM (European) data standard initiative http://www.possum-datastandard.org/possum/index.php/Main_Page (Rutger)
    6. INTEROP: a new NSF Program Solicitation: Community-based Data Interoperability Networks[3]
  2. Analysis (wiki report) of interest of journals in databasing alignments and trees (Weigang)
  3. Other institutional centers of influence or possible interest
    1. NCBI (pop set), ask Lipman (Arlin)
    2. EBI, ask Ewan Birney (Aaron)
  4. Request for comments on a Future Data Exchange Standard (Rutger)

Some future items to consider

  • recommendation regarding an evoinfo standards organization
  • white papers on topics of interest
  • projects for students a la summer of code