Database Interop Hackathon/Teleconferences

From Evolutionary Informatics Working Group

Teleconference June 29, 2009

The teleconference is planned for 1:00 pm EDT. Connection info will be distributed on the day of the event.

Agenda

Notes

Participating:

Karla, Enrico, Ryan, Sheldon, Todd, Karen, Greg, Arlin, Mark, and Nico (who joined toward the end).

Arlin proceeds:

  • Interop proposal well suited to present state of this group
  • Specifics of the RFP important to keep in mind:
  • multidisciplinary (relative to divisions at NSF -- BioSci, CompSci, PhysSci)
  • the proposed project is not just to continue sci/tech work; it needs a community-building component (decisions on standards/conventions) plus a tech component where the collaborative provides technology (e.g., some $$ for programmers). We are well positioned for this as a group; a "perfect fit in that sense".
  • $$ for meetings, virtual or physical. The key task is to figure out specifically what we're going to do (workshops, hackathons, satellite meetings, etc.). This is "the hard part".
  • needs to be one PI; can have many co-PIs and letters of support. We don't have to identify all possible community members a priori.
Enrico: aren't there only 4 official co-PIs on an NSF proposal (as opposed to senior personnel)?
Karen: the RFP explicitly says there is no limit.
Todd: there is not much to gain from having many co-PIs.

Arlin's rhetorical questions:

  • Could NESCent provide meeting space?
  • Todd: yes, there is already a policy for hosted meetings (free to groups that pay their own expenses and handle logistics); would need staff support
  • Karen: there is a set-up at the Field for this, and travel may be cheaper (Chicago being a hub)
  • Arlin: could also have regional meetings, where more remote participants call in.
  • Who will be PI?
  • Arlin: This is very important to work out very soon.
  • Karen: would love to be co-PI, but is not senior enough to be PI
  • Arlin: could do it, but personal plans make it tricky

NESCent person?

  • Todd: NESCent tries to avoid being PI on these kinds of collab ventures.
  • Objectives? Community building?
  • Karen: Interact with existing large collaborative projects: EOL, the iPlant ToL workshop, the BioSynC center at the Field, particularly the TreeViz group. All treated interop as a black box in their discussions, but no one in the room was willing or able to take it on.

The groups certainly may sign on to our good idea. Would these large groups also commit to material contributions/collaborations?

  • Todd: a letter needs to be a commitment, not just 'good idea'. Ideally, obtain documentation of cost-sharing in a letter, estimates of person-hours, etc.

Note the RFP states explicitly: "general letters of endorsement may not be included"

  • Karla: Steve Goff informed about the interop proposal, is on his radar; interop organizer needs to interact with him.
  • Sheldon: EOL/iPlant are going to be drivers of interop -- important to have their contributions and buy-in
  • Mark: PI from one of these?
  • Enrico: want to recruit a PI from a contributor to the Stack
  • Karla or Karen (sorry!): talk to Mark Westnutt (lead on TreeViz working group) regarding getting EOL on board; Michael Sanderson (ToL/iPlant)
  • Sheldon: any commitment of iPlant time/material resources would be handled by Steve Goff


  • What are the technical objectives of the proposal?
  • Arlin: (1) ability to annotate trees with metadata, (2) ability to transfer/store/query/display the annotations
  • This is a sizable, important problem that we have the technology to address; could it be a core technical goal for the proposal? [general assent]
  • Other things on that scale of interest?
  • Sheldon: The taxonomic intelligence problem is one, but may be out of the scope of this proposal. Difficult to proceed on this, since Bill P. unavailable.
  • Karen: EOL proposal in now to address this issue
  • Mark: There is really a dual effort involved here: both technical and "diplomatic" interoperability. Selling the proposal may benefit from a visible emphasis on the technical side as well as the diplomatic one.
  • What is the "community thing"?
  • Arlin: The objective we go to the groups with, "this is what we want to work on":
  • Why would they work with us?
  • How would we work with them?
  • Enrico: dedicate a hackathon to each provider project?
  • Karen: should workshops have a focus on the data providers/groups, or on the data types (morphological, genomic, etc). Would build in valuable overlap. "There are a whole bunch of kinds of data less easy to categorize than a tree or molecular matrix."
  • Arlin: Should this be another major topic in the list? E.g., the Phenoscape project has put a large effort into annotating morphological characters in ontological terms and storing them in NeXML. Perhaps this is a bigger issue, and we should stick with the trees.
  • Karen: So, how is "metadata" defined?
  • Arlin: thinking of taxonomic identifiers, char matrices
  • Mark: then this ultimately overlaps with Karen's conception
  • What is our practical plan?
  • Arlin: Practical plan:
  • 3-5 years, operational limit 3yr
  • $250K total (direct + indirect), ~$180K net. Would pay for a programmer, a grad student, a hackathon, and a workshop or two; unlimited virtual meetings; pays for 1 FTE of programming
  • So, what are we doing in year 1, year 2, year 3? Filling out a table like:
Year | Tech Support | Community Support
1 | |
2 | |
3 | |
  • Sheldon: early focus on community building and identifying use cases; then develop technology, interleaved with further iteration; later, emphasis on outreach
  • Arlin: activities - hackathon, training via workshop (separate, or at conferences)
  • Sheldon: similar to the GMOD paradigm, tech building/input seeking (year 1); training/hacking (years 2, 3)
  • Karen: in moving forward to finalize the Stack, we need to bring in the data providers, e.g., does the type of data they have fit with what we're providing?
  • Karen: the first exercise should be more targeted: contact data providers; the workshop activities come later
  • Sheldon: has served as an instructor in GMOD outreach; will ask Dave Clements for advice

Sheldon volunteers to find out how to run a workshop, determine the best conferences to target.

Karen volunteers to get cost estimates at NESCent and the Field for hosting a hackathon-type event.

What other things are there to do?

  • Arlin: the first year is about collaboration: mailing lists, help desks, a unified web presence

Mark volunteered to write a paragraph about setting this stuff up.

TDWG contact/collaboration

  • Karen: contact/collaboration with TDWG? The focus here is the US, but there should be overlap

Hilmar volunteers in absentia to explore contact; Roger H. as well should be approached-- are you listening, Roger?

  • Arlin: good to have an explicit reference to TDWG in the proposal
  • Greg: EBI, his institution (particularly the Ensembl group, which has its own outreach department), has a strong background in running workshops; will keep in touch with Sheldon.

Nico Cellinese (UF/TOLKIN project) joined call.

  • Nico: brief bio: co-developer of TOLKIN; currently managing a large amount of systematics data, moving beyond storage to embedding workflows within existing infrastructure
  • suggests a reference implementation of the EvoInfo Stack (TOLKIN currently just imports/exports NEXUS files); CDAO could be really beneficial: enable export of an entire project as a CDAO ontology, which would guarantee the permanence of the data.
  • strongly suggests broadcasting these standards through TDWG (NESCent is still a national organization).
  • There appears to be overlap between CDAO and Darwin Core; this should be hammered out.
  • Arlin: points very relevant to the interop proposal: disseminating standards through TDWG, and the reference implementation. It would be nice if the testbed had workflows; repositories that don't do analysis often lack them.


Summary

Major goals
  • Technical Objectives
  • To deliver the necessary software standards and tools to annotate phylogenetic trees with the data and metadata used to create them;
  • To create the necessary tools to convert existing data into the EvoInfo framework on a medium- to large scale;
  • To create the software infrastructure necessary to be able to transparently use EvoInfo compliant data in web services and workflows.
  • Community Objectives
  • To provide the foundation for an interactive virtual user community around the EvoInfo Stack and its components;
  • To reach out to large data providers and users, and impart a sense of participation and ownership of the EvoInfo standards through active needs assessment and training.
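The first technical objective, annotating trees with the data and metadata used to create them, can be illustrated with a minimal sketch. It assumes a simplified NeXML-like layout (`tree`, `node`, and `edge` elements) and attaches RDFa-style `meta` elements carrying `property`/`content` pairs, in the spirit of the RDFa-based annotation approach adopted at the hackathon. The element layout is simplified and is not the published NeXML schema; the property names (`dwc:scientificName` and the hypothetical `ex:taxonID`) are illustrative placeholders.

```python
import xml.etree.ElementTree as ET


def build_tree() -> ET.Element:
    """Build a tiny three-node tree: root -> A, root -> B (simplified NeXML-like layout)."""
    tree = ET.Element("tree", id="tree1")
    for node_id in ("root", "A", "B"):
        ET.SubElement(tree, "node", id=node_id)
    ET.SubElement(tree, "edge", id="e1", source="root", target="A")
    ET.SubElement(tree, "edge", id="e2", source="root", target="B")
    return tree


def annotate_node(tree: ET.Element, node_id: str, prop: str, value: str) -> ET.Element:
    """Attach one RDFa-style (property, content) pair to the node with the given id."""
    node = tree.find(f"node[@id='{node_id}']")
    if node is None:
        raise KeyError(f"no node with id {node_id!r}")
    return ET.SubElement(node, "meta", property=prop, content=value)


def annotations(tree: ET.Element, node_id: str) -> dict:
    """Read back all property -> content annotations on one node."""
    node = tree.find(f"node[@id='{node_id}']")
    if node is None:
        raise KeyError(f"no node with id {node_id!r}")
    return {m.get("property"): m.get("content") for m in node.findall("meta")}


if __name__ == "__main__":
    tree = build_tree()
    # Illustrative properties: a Darwin Core name term and a hypothetical taxon-ID property.
    annotate_node(tree, "A", "dwc:scientificName", "Homo sapiens")
    annotate_node(tree, "A", "ex:taxonID", "9606")
    print(ET.tostring(tree, encoding="unicode"))
```

Because the annotations are generic property/value pairs rather than format-specific fields, the same node can carry a taxon identifier, specimen data, or a link to the alignment it was inferred from without changing the tree syntax.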
Action Items (Person)
  • Find a PI (Arlin)
  • Cost out workshops (Karen)
  • Workshop logistics (Sheldon, Greg)
  • Web presence/Mailing List/HelpDesk (Mark)

  • Working Group proposal
Tabled till next telecon.
  • Follow-up Hackathon
Tabled till next telecon.

Teleconference June 23, 2009

The teleconference is planned for 1:00 pm EDT.

The purpose of this meeting is to plan follow-ups to the successful hackathon in March.

Agenda

tentative draft of agenda

  1. Report from the hackathon
    • what worked
    • what didn't work
    • subsequent work on the projects started at the hackathon
    • your comments
  2. Publicity - how to publicize our work (manuscripts, posters, others)
  3. discussion of the big picture. What's the most that this group can do for interoperability (interoperability means that operations-- get, process, store, query, etc-- can be controlled and combined and integrated and automated in a hands-off way, without expert intervention, i.e., without doing things like manually editing files and manually operating a software interface).
    • what is the most we can do for phylogenetics-systematics-diversity studies in the next 2 to 5 years?
    • what is the most we can do for comparative-genomics and molecular-evolution studies in the next 2 to 5 years?
  4. Possible followups to consider:
    1. another hackathon
    2. another working group (NESCent deadlines July 10 and December 1)
    3. an NSF interop proposal (deadline July 23). Note the RFP text:
Responsibilities of the Networks: Each Network holds dual responsibilities for: (1) enabling broad community engagement in the development of consensus and agreement on strategies, priorities, and best approaches for achieving broad interoperability; and (2) providing the technical expertise necessary to turn consensus and agreement into robust interoperability frameworks along with the appropriate tools and resources for their broad use and implementation. Proposals for activities not based on significant community engagement and consensus-building activities are not responsive to this solicitation and will be returned without review.

Notes

  1. participants: Sam, Sheldon, Karen, Rutger, Jim, Karla, Greg, Mark, Ryan, Enrico, Todd, Arlin, Hilmar, Peter Midford
  2. Participants' feedback and evaluation of the hackathon: (A) hackathon, (B) followup activities, (C) incomplete or future plans
    • Sam: (A) useful java API for NeXML; meeting/working with people; (C) rough plans to incorporate api into pPOD
    • Sheldon: (A) pushing NeXML as standard; improvements in visualization and components integration; web services; (B) code clean-up; presentation to iPlant and NSF prog officers; starting phylowidget package w/Greg; (C) roll phylowidget/vis tools into TreeBASE frontend
    • Karen: (A) h'thon met need for more [specific, actionable] information about the standards; got a good start on a names-resolution component for phylota; made decisions concerning the allowable scope of queries; (C) needed more info on stds technologies; sub-group bit off "too much" technically (but came to understand the real scope of the names-resolution issue only by getting together with the h'thon participants).
    • Jim: (A) moving forward with nexml; started the key discussions leading to NeXML incorporation of metadata; (B) continued to flesh out the NeXML metadata annotation standard needed; now "have got it", using Java API with RDFa in phenex in test-research (not production) environment; (C) bring the api up to production grade, migrate to RDFa syntax;
      • "Didn't solve the issue at the h'thon, but formed a foundation for a solution" --Arlin
    • Karla: (A) diversity of the participants, coalescing around five different ideas, and producing results; (B,C) would like to try Open Space technology ideas within the iPlant collaborative;
      • Sheldon - interoperability concepts very important in the crafting of challenge grants.
    • Greg: (A,B) reiterates Sheldon's comments on PhyloWidget; (C) more progress needed in getting data 'out there' into these standard [or accessible?] formats: e.g., protein alignments/gene trees (i.e. Pfam)
      • Karen: possible action item for next hackathon?
    • Mark: (A) opportunity to get plugged in; important for professional development, particularly in learning and finding places to contribute; (B) created XML schemas, NeXML formatting for LANL HIV data and was able to deliver data from mash-up (wants someone like LANL to bite, hasn't happened yet); (C) summer-of-code student (bioperl native modules to deal with nexml);
    • Ryan: (A) prof. networking opportunities; familiarization with standards; (B,C) continuing to hammer on PhyloWS, with some issues still to resolve. The need going in to build a generalized architecture forced a focus on resolving PhyloWS; the h'thon provided the catalyst for this effort.
    • Enrico: (A) chance to apply CDAO practically; developed tools on the spot; clarification of metadata description; got perspective on what is missing in CDAO with respect to community needs (this was the most useful thing); (B) filling in CDAO gaps based on what was learned at the h'thon.
    • Peter: (A) resolved the "competition" between the two Java APIs through consensus and agreement; agreed on importance of representing metadata; (B) continuing to flesh out the API, including the metadata support; summer-of-code student, project: display of metadata in Mesquite (at level of the char data matrix interface [?]).
    • Rutger: (A) a great success; expanded Java API to include annotations; v. interested in summer-of-code project; (B) follow-ups include additions to Perl API; a JSON API mapping NeXML to JavaScript Object Notation.
      • Hilmar: RDFa standard important, developed at hackathon; important to make data accessible. In particular, need to make it a top priority to update the schema at www.nexml.org/nexml/1.0 to comply with the RDF/a-based standard. Two SoC projects depend on it.
    • Hilmar: (A) h'thon a "huge success", impressed by the diversity of people coalescing around a few shared objectives; acceptance of a RDF/a-compliant standard for representing phylogenetically rich metadata, which "de-silos" phylogenetic data, making it accessible to off-the-shelf tools and non-specialist brains.
    • Todd: (A) always a question of how much will the participants work together after the meeting-- This time, this aspect panned out very well.
    • Arlin: (A) we've hit on a successful formula: diverse group of people + open space approach; it started out rough but it worked out. Reiterates Hilmar's point that we showed how to "de-silo" phylo data with semantics-based methods to make "insider" knowledge accessible to the computing world
  3. Publicity - how to publicize our work (manuscripts, posters, others)
    • documentation support for nexml; stabilize schema; evangelize;
    • nexml manuscript needed; bioinfo application notes?
    • Hilmar's Evolution meetings poster: basis for a paper?
      • Rutger: doc support is more needed now than code support in the drive for NeXML uptake; suggestions: push for TDWG adoption, fill out and improve wiki, get out "mini-papers" and app notes. Probably requires a concerted effort (rather than an ad-hoc, free-time one).
      • Hilmar: Now approaching a point where there is a standards narrative, rather than just the pieces of the puzzle. This group is at the forefront of this; the cohesion between the three legs (NeXML/CDAO/PhyloWS) makes the story--validatable syntax, rich computable semantics, and a consistent, predictably programmable interface. The technical groundwork is laid, some polishing is required, more complete doc necessary, and some compelling biological examples.
  4. Discussion of the big picture - improving interop in 1) phylo-systematics-diversity or 2) comparative-genomics-and-molevol
    • annotating trees, decorating trees is an important use-case in data integration
      • Karen: The 'big picture' in systematics/diversity: What do people in "the big projects" want to be able to do? Open up a tree from a web server/drive, add pictures, sequences, their own annotations to that tree. We are much closer to the glue to make that happen, but not ready to do that in a large scale way.
      • Hilmar - The "decorating trees" use-case is very, very significant; this is why all these processes need to be online and talking seamlessly to one another. In the future, what will matter is the data on the web, not the "sites".
  5. A list of ideas or challenges that arose during the discussion
    • larger effort needed to pursue taxonomic resolution service
    • getting data into accessible formats
    • C or C++ interface to nexml, natively or via swig
    • importance of validating metadata-containing files
    • polishing and documenting examples, using this as basis for interop strategy presentation
  6. Opportunities:
    • Another working group: (see draft proposal Son of Evoinfo) July 10 NESCent proposal deadline, is doable, but need another leader, as Arlin needs to move out of the leadership role.
    • NSF INTEROP grant: July 23 deadline, $250K/3 yr, supporting a network of researchers. Requires a community cohesion component-- needs a systematic effort to promote and penetrate, bringing others in. Proposal due within next 30 days, difficult but possible, may be the last opportunity to apply. Discuss further with others who may be interested.
  7. Next steps: Arlin will arrange a follow-up telecon to this one.

Summary: We now have feedback on the hackathon: we hit on a successful formula, people are happy about what happened, and they continue to be happy.

organizer follow-up to telecon

The organizers (Todd, Hilmar, Rutger, Arlin) talked for 10 minutes after the telecon in order to make a plan to proceed.

To do:

  1. for NSF interop
    • it's important to move quickly (e.g., requests for letters of support should go out very soon)
    • start wiki page for NSF proposal (Arlin)
    • start list of collaborators
  2. for a possible follow-up hackathon
    • Todd will ask whether NESCent has funds for a hackathon on the scale of $20K (i.e., 15 to 18 people instead of 25)
  3. for a possible NESCent working group

Teleconference Feb 26, 2009

Agenda

Thursday, February 26, 2009, at 1pm Eastern US. You should have received a phone number and access code from Hilmar.

Agenda:

  1. Report from pre-meeting
  2. Introduction to participating standards (Arlin)
  3. Information gathering
  4. Use case gathering (Dave Clements)
  5. Setting the agenda for the hackathon
    • Activities and plan for first day
    • Development targets
    • General routine for days 2-5
    • Wrap-up on last day
    • Whole-group brainstorming about future activities
    • Brainstorming MIAPA
  6. Participant questions

Notes

  1. Introductions
    • present: Sheldon M., Matt Y., Rutger V., Mark J., Hilmar L., Vivek G., Ryan S., Karla G., Roger H., Todd V., Karen C., Greg J., Arlin S. (recording), Dave C.
  2. Pre-meeting report (Rutger)
    • telecon
    • use cases
      • need to gather information on inputs and outputs from participants
      • developed form for participants to fill out
    • PhyloWS, using SRU syntax (RESTful)
    • Combine CDAO with nexml to represent metadata
      • how to attach a GO term, taxon identifier, specimen-collection info, phenotype information
      • corrected mistakes
    • questions
      • use of nexml-CDAO
      • how to use ontology? don't need it to parse file, only to do reasoning
      • syntax for expressing statements, e.g., RDF triples
      • how to express semantics in nexml should be explored further
  3. Information gathering (Arlin)
    • coding (Mark)
    • use cases (Dave)
  4. Agenda for hackathon (Hilmar)
    • principles: self-organize; do it quickly; match interests with projects so as to get energy & commitment
    • need for people to communicate, get involved, in order for self-organization to work
    • more discussion of use case list
      • ok to add uses at any level of completeness or technical detail
      • 'targets' page also has space for ideas
    • mixer activity
    • supporting MIAPA (not important?)
  5. Open question period
    • bootcamps: nexml, cdao, syntax
  6. organizer follow-up
    • need pedagogic materials on
      • syntax and semantics
      • bootcamp about reasoning
      • example xml file with data and ontology links, used for reasoning example
        • tree-taxonomy correspondence (Rutger)
      • web services
      • integration (mash-up)
      • cdao
      • nexml
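The pre-meeting questions above (statements expressed as RDF triples; the ontology is not needed to parse a file, only to reason over it) can be illustrated with a minimal sketch. Instance triples are readable on their own; only when ontology axioms are added can a reasoner derive new facts. The class and property names below (`ex:Tree`, `ex:RootedTree`, `ex:BinaryRootedTree`, `ex:representsTaxon`) are hypothetical stand-ins, not terms copied from CDAO, and the "reasoner" is a toy forward-chainer implementing just two RDFS rules, not an OWL reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical class hierarchy standing in for ontology axioms (not copied from CDAO).
ONTOLOGY: Set[Triple] = {
    ("ex:RootedTree", "rdfs:subClassOf", "ex:Tree"),
    ("ex:BinaryRootedTree", "rdfs:subClassOf", "ex:RootedTree"),
}

# Instance data as it might be extracted from an annotated file; no ontology needed to read it.
DATA: Set[Triple] = {
    ("ex:tree1", "rdf:type", "ex:BinaryRootedTree"),
    ("ex:node7", "ex:representsTaxon", "ex:taxon9606"),
}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS entailment rules until nothing new is derived:
    subClassOf is transitive (rdfs11), and instances inherit superclass types (rdfs9)."""
    closure = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closure if p == "rdfs:subClassOf"]
        derived = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    derived.add((a, "rdfs:subClassOf", d))
        for s, p, o in closure:
            if p == "rdf:type":
                for a, b in sub:
                    if o == a:
                        derived.add((s, "rdf:type", b))
        if derived <= closure:
            return closure
        closure |= derived


def types_of(triples: Set[Triple], subject: str) -> Set[str]:
    """All rdf:type values asserted (or inferred) for one subject."""
    return {o for s, p, o in triples if s == subject and p == "rdf:type"}


if __name__ == "__main__":
    print(types_of(DATA, "ex:tree1"))                    # asserted type only
    print(types_of(infer(DATA | ONTOLOGY), "ex:tree1"))  # plus inferred supertypes
```

A tool that only needs to read `ex:tree1` can stop at the raw triples; a tool asking "give me every rooted tree" needs the ontology and the inference step.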

Teleconference Feb 20, 2009

Agenda

Friday, February 20, 2009, at 3pm Eastern US. You should have received a phone number and access code from Hilmar.

Agenda:

  1. Welcome & Kick-off (Arlin)
  2. Introductions (all)
  3. Roadmap until event (Hilmar)
    • Teleconferences
    • Pre-meeting
    • Information gathering
  4. Introduction to participating standards
  5. Taking inventory (technical, semantics, purpose)
    • Vignette about the "network" (Arlin)
    • Spreadsheet requesting information input (Arlin)
  6. Use case gathering (Dave Clements)
  7. Participant questions

Minutes

PARTICIPANTS
  • Brandon Chisham, NMSU, CDAO Project
  • Jim Balhoff, NESCent, Phenoscape
  • Enrico Pontelli, NMSU, CDAO Project
  • Ryan Scherle, NESCent, Dryad
  • Rutger Vos, University of British Columbia, NeXML
  • Hilmar Lapp, NESCent, Co-organizer, designer PhyloWS
  • Arlin Stoltzfus, NIST, Co-organizer, CDAO project
  • Jeet Sukumaran, University of Kansas, NEXUS
  • Peter Midford, University of Kansas, Mesquite
  • Karla Gendler, University of Arizona, iPlant
  • Sam Donnelly, U. Pennsylvania, pPOD
  • Sheldon McKay, modENCODE, iPlant
  • Mark Jensen, Fortinbras Research, clinical analysis of sequence data from pathogens
  • Bill Piel, Yale, TreeBASE, iPlant
  • Lucie Chan, San Diego Supercomputer Center, MorphoBank
  • Vivek Gopalan, NCBI
WELCOME MESSAGE (Arlin Stoltzfus)
  • Agenda has been sent out
  • Notes will be sent out after the call
INTRODUCTIONS (All Participants)
ROADMAP (Hilmar Lapp)
  • Kickoff teleconference,
  • Overview standards
  • Need to gather some information, important activity over the next 2 weeks
    • Data providers?
    • Use cases to guide?
  • Dave Clements has a page developed with Karla
  • Premeeting is taking place to prepare standards for the hackathon
    • More messages over weekend as the premeeting develops
  • 2 more telecons; next one does not need to be a replication, instead discuss more technical issues and gather info for use cases (same for third)
  • MORNING time for next one (for UK folks)
  • QUESTIONS/SUGGESTIONS?
    • No questions
STANDARDS
  • 3 technologies (phyloWS, NEXML, CDAO); all are outcomes of the evoinfo working group
  • The working group has run for 2 years; it started by addressing interoperability issues through brainstorming for ideas (e.g., an integrated data resource), and we settled on specific technologies to facilitate interop: one data standard, one ontology, one interface for web services. The hackathon is the last meeting of the working group. Thus, this is the time and place to put the technology to the test.
  • NEXML: (Rutger Vos)
    • New XML standard, inspired by the NEXUS format, which lots of applications use; many data resources also use NEXUS (as data input or as a serialization format)
    • NEXUS has issues: dialects, incompatibilities. We want a new standard, one that is formally specified and can be validated.
    • There is a NEXML.org website. It contains the XML schema and some I/O libraries (Java, Python, JavaScript; C++ is still in progress, although there does not seem to be strong interest in C++).
    • It sounds like a useful technology, more reliable exchange of data, we can use it for data exchange for web services; some advantages over previous standards.
    • QUESTIONS?
      • What is the current level of support? There are some libraries provided: Perl, included in BioPerl, so BioPerl supports it; Java is used by Mesquite; Phenoscape uses its own; Jeet is working on a library for Python.

  • CDAO (Arlin Stoltzfus)
    • Ontology that addresses the application area of comparative data analysis; implemented in OWL
    • OWL offers good control and formal structuring for the ontology
    • CDAO formalizes knowledge/semantics; it is useful for interoperability, to resolve ambiguities using semantics. For example, the Sequence Ontology has been used with similar objectives in the case of sequence data. Different sequence databases use the Generic Feature Format (GFF), but with a focus on syntax; this led to incompatible definitions of certain terms (e.g., an open reading frame was viewed in some instances as including the stop codon and in others as excluding it; the Sequence Ontology resolved this ambiguity by creating two separate concepts).
    • Similar benefits can be gained in phylogenetic analysis, for example in the problem of tree reconciliation. There are many tools, each imposing different requirements on the input tree (e.g., whether it must be completely resolved). These distinctions on the inputs are often semantic, not syntactic.
    • A formal ontology also gives access to reasoners, which can be used to validate concepts
    • Note that formal ontologies are meant to be machine understandable, not necessarily to be used manually.
    • QUESTIONS?
      • Are there tools to generate it? Are there tools to formalize description of an analysis? Yes, there are formalisms to formally describe a biomedical analysis or protocol, and they can be instantiated using a domain specific ontology. This is the case of OBI or FUGO (as general ontologies for describing protocols) and BioMoby (as a domain specific ontology)
      • Comment on workflow languages: there are systems that support phylogenetic workflows; in Kepler there are mechanisms to introduce annotations (e.g., on the inputs and outputs) and these will be used to type check the workflow. But they are not widely used.
      • I am new to all these ontologies; how does one connect different ontologies together in the same application? That can be done: ontologies can import other ontologies. CDAO includes an external ontology for amino acids and allows external ontologies to describe different types of characters.
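The tree-reconciliation point can be made concrete: whether a tool accepts a tree depends on a property, full resolution, that is not visible in the file syntax. Below is a minimal sketch of the check such a tool would otherwise apply implicitly, assuming a hypothetical nested-tuple representation of a rooted tree (a leaf is a label string; an internal node is a tuple of children). An ontology such as CDAO lets that input requirement be stated explicitly rather than buried in code like this.

```python
from typing import List, Tuple, Union

# Hypothetical in-memory representation: a leaf is a taxon label (str);
# an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def is_fully_resolved(tree: Tree) -> bool:
    """True if every internal node has exactly two children (strictly bifurcating)."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_fully_resolved(child) for child in tree)


def polytomies(tree: Tree) -> List[Tree]:
    """Return every internal node with more than two children."""
    if isinstance(tree, str):
        return []
    found: List[Tree] = [tree] if len(tree) > 2 else []
    for child in tree:
        found.extend(polytomies(child))
    return found


if __name__ == "__main__":
    resolved = (("A", "B"), ("C", "D"))
    unresolved = (("A", "B", "C"), "D")
    print(is_fully_resolved(resolved), is_fully_resolved(unresolved))
    print(polytomies(unresolved))
```

Both trees would serialize to equally valid syntax; only the semantic check distinguishes which one a strictly bifurcating reconciliation tool can use.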
  • PhyloWS (Hilmar Lapp)
    • It is the youngest of the three standards; one year old
    • Developed at the BioHackathon in Japan
    • Focused on web services
    • Obstacle: a rich diversity of (digital) data resources accessible online, yet designed for human consumers; the metadata could be valuable but are not machine accessible
    • Some people are forced to do complex tasks to extract knowledge (e.g., HTML screen-scraping)
    • There is a lack of programmable interfaces, and this is an obstacle to interoperability
    • A programmable interface is aimed at predictability and interpretability, and these two aspects build on the two previously proposed standards (NEXML and CDAO)
    • Predictability: how to access data holdings, search data holdings, query interfaces, how to access individual items and resources (e.g., one tree in TreeBASE, one alignment in an Alignment database) and how are these data returned. NEXML provides a solution to some of these issues by offering a standard interchange format.
    • Interpretability: how do I use the data returned? What is the meaning? CDAO represents a solution to this aspect.
    • If all these online data resources implement a standard web interface, these tasks become easy, it is simple to write widgets to embed in other web pages or applications, or create large systems (e.g., in Kepler, Mesquite) that can pull data from resources and they know what to do with them.
    • QUESTIONS?
      • Is PhyloWS implemented? Or can it be implemented, but something is missing? Yes and no. It is partially implemented: there is a prototype for Tree of Life; you can obtain ToL trees through PhyloWS and a REST interface. However, parts of the specification still need to be fleshed out (and we will work on this at the pre-meeting)
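PhyloWS is described in these notes as a RESTful interface that uses SRU query syntax, so that clients can search a resource predictably and receive results in a standard format such as NEXML. Below is a minimal client-side sketch of building such a search URL. `query` (a CQL string) and `maximumRecords` are standard SRU parameter names; the base URL, the `format` parameter, and the CQL index name `tree.title` are hypothetical placeholders, not taken from the PhyloWS specification.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint for illustration only; not a real PhyloWS service.
BASE_URL = "https://example.org/phylows/tree/find"


def build_find_url(cql_query: str, fmt: str = "nexml", max_records: int = 10) -> str:
    """Build a RESTful search URL carrying an SRU-style CQL query."""
    params = {
        "query": cql_query,             # CQL query string, as in SRU searchRetrieve
        "maximumRecords": max_records,  # SRU paging parameter
        "format": fmt,                  # hypothetical: requested serialization (e.g., NeXML)
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The index name "tree.title" is a hypothetical placeholder.
    url = build_find_url('tree.title = "Primates"')
    print(url)
    print(parse_qs(urlparse(url).query))
```

The design point is predictability: if every resource accepts the same query parameters and returns the same format, a widget or workflow engine can query any of them without resource-specific scraping code.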
INVENTORY (Arlin Stoltzfus)
  • We would like to think about possibilities and prelude to data collection and capabilities collection
  • Putting together data standards and web services, we can connect data resources that are now disconnected. For example, TreeBASE may want to pull in other data; if there are semantic mappings between schemas, this becomes possible, possibly through web services, with data transmitted in NEXML. Or we can describe web services to provide access to TreeFam data sets from EMBL. Or we can enable existing tools to access sophisticated data matrix viewers, like mx or Nexplorer, just by producing data in some standard format. We can integrate resources; e.g., Rutger used ToL queries and then went to TimeTree to get dates for trees (this is an interactive user interface), and a service combines and integrates them.
  • We need to think about this; we need an inventory of the inputs and outputs supported by the different data resources (represented by you, the participants). We want to create a network, where nodes are data resources and links are shared data types. If you export a character matrix and someone imports a character matrix, then there is a potential link in the graph and an opportunity for interoperation. The link may be possible in theory but not in practice (there may be format compatibility issues, or the lack of a robust interface). We want to propose solutions for this. Please help us to create this graph.
  • After the telecon we will create a form or a shared spreadsheet on googledocs, and we will summarize that in a graph. Your data will become a part of a network of data resources. You will get this after the telecon and we invite you to fill the required information.
  • Another thing we want to hear about is use cases or wish lists. We have a use case wiki, and Dave Clements and Karla Gendler have set up a template to fill in. Please suggest use cases that would make the hackathon more concrete.
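The inventory described above (nodes are data resources, links are shared data types) can be sketched as a small graph construction. This is a minimal sketch under stated assumptions: the inventory entries and resource names below are hypothetical placeholders, not answers from the actual participant spreadsheet. Each directed link means "A exports a data type that B imports", i.e., only a *potential* interoperation, as the notes stress.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical inventory: resource -> (data types it exports, data types it imports).
Inventory = Dict[str, Tuple[Set[str], Set[str]]]

INVENTORY: Inventory = {
    "tree_repository": ({"tree", "character matrix"}, {"tree"}),
    "matrix_viewer": (set(), {"character matrix"}),
    "dating_service": ({"tree"}, {"tree"}),
}


def build_network(inventory: Inventory) -> List[Tuple[str, str, str]]:
    """Return directed links (exporter, importer, data_type): one per data type
    that one resource exports and a different resource imports."""
    links = []
    for (src, src_io), (dst, dst_io) in product(inventory.items(), repeat=2):
        if src == dst:
            continue
        shared = src_io[0] & dst_io[1]  # exported by src AND imported by dst
        for data_type in sorted(shared):
            links.append((src, dst, data_type))
    return links


if __name__ == "__main__":
    for exporter, importer, data_type in build_network(INVENTORY):
        print(f"{exporter} --[{data_type}]--> {importer}")
```

Links in this graph say nothing about whether the formats actually match; annotating each link with the concrete format used on each side would expose exactly the compatibility gaps a common standard is meant to close.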
GENERAL QUESTIONS
  • I am new at this; is the focus on trees? Or data repositories? The high-level structure of a tree? What is the granularity? The focus is on evolutionary data, more specifically phylogenetic data: trees, taxonomies, species taxonomies, character state matrices, discrete or continuous characters, sequence alignments, maybe transition models. In the wider context, this is only part of the picture; metadata are also important and can be linked to nodes of the tree, ranging from gene functional data and gene locations to biodiversity data. The focus is on making phylogenetic data available in standard form so that outside users can access these metadata. Are we going to worry about how to enable linking phylogenetic data to other data, or vice versa? Is this too far removed from the hackathon? Maybe some of these are down the road, but they are in the realm of workflows, and they are in scope.
  • What will the event look like? There will be lots of room for creativity, not assignments, people get together and use resources according to their interest. Try to be focused on certain activities, more profitable to you and the people you group with.
  • I have not been there before: what is the typical day and how does it change over the period? The typical day will include programming, working on a specific task that a group has determined to be worthwhile. The people at the event make tasks feasible; the process is self-organizing and self-emerging. We will try to have lots of conversation on the mailing list before the event, but this is only to coordinate and gather information. We will not be telling people what to do. Subgroups will emerge and take charge. The first morning will be devoted to forming the groups. Some people will be responsible for documentation. We may have some bootcamps; we will try to sense which ones from the mailing list discussion. This is not a workshop where people get up, talk, and brainstorm; it is very different. Some of these goals will form over the next two weeks. You may want to start thinking about which participants you want to work with. Note that the wiki is open to everyone to edit; just request an account by sending email to help{at}nescent.org. There is also an online tutorial for using wikis. We will put the link somewhere.