Exchanging data transparently between computer programs requires standardized serialization schemes. Traditionally this means having a standard file format for data exchange. Current standards include NEXUS, MEGA, and PHYLIP. Supporting such standards may involve clarifying or extending an existing standard, providing users and developers with software tools to use the standard, providing conversion between formats, and ensuring an upward conversion path to the next standard.
Defining the problem
A variety of file formats are in use, including NEXUS, MEGA, and PHYLIP. These and several others are described on the [[Molecular Evolution Workshop]] page at Woods Hole.
One measure of how well software supports these standards is the extent to which files in a given format can be validated, and the extent to which the choice of format becomes an irrelevant, transparent issue for the user who wishes to view, search, query, store, edit, or analyze data.
Consider the simplest format, FASTA. We can view the data stored in a FASTA file using a text viewer, and this becomes problematic only for extremely large data sets. But what about anything else? How many FASTA file readers place an arbitrary limit on the length of definition lines? What about support for search, query, and edit operations? Even for a simple FASTA file, the possibility of arbitrary line breaks means that when we search for the sequence GAATTC using a text editor, we may fail to find it because the pattern is wrapped from one line to the next. If we cannot search reliably, we cannot automate editing.
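The line-wrapping problem can be illustrated with a minimal Python sketch. It assumes the common FASTA convention that a definition line starts with ">" and that all following lines up to the next ">" belong to one sequence. The function names `read_fasta` and `find_motif` and the two example records are hypothetical, made up for this illustration, and are not part of any existing tool or standard.

```python
def read_fasta(lines):
    """Yield (definition_line, sequence) pairs from an iterable of text lines.

    Wrapped sequence lines are joined, so line breaks inside a record vanish.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first definition line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(lines, motif):
    """Return (definition_line, 0-based offset) for every occurrence of motif."""
    hits = []
    motif = motif.upper()
    for header, seq in read_fasta(lines):
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            hits.append((header, start))
            start = seq.find(motif, start + 1)
    return hits


# Hypothetical example data: in seq1 the motif GAATTC is split by a line break.
example = """>seq1 hypothetical example record
ACGTACGGA
ATTCTTGCA
>seq2 another hypothetical record
TTTTGAATTCAA
""".splitlines()

# A line-by-line search, as a text editor would do, misses the wrapped match.
naive_hits = [line for line in example if "GAATTC" in line]

# Searching the joined sequences finds both occurrences.
joined_hits = find_motif(example, "GAATTC")

print(naive_hits)
print(joined_hits)
```

The line-by-line search reports only the match in the second record, while the search over joined sequences also finds the match that straddles the line break in the first record. A real reader would additionally need to decide how to treat comments, lowercase or ambiguous residues, and very long definition lines, none of which this sketch addresses.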
The problem is much worse for the other file formats, which are more complex. Some of the problems surrounding NEXUS (Maddison et al., 1997) are described in the Supporting NEXUS report from NESCent's recent phyloinformatics hack-a-thon.
Goals for the working group
(specific goals for this topic)
Strategy for achieving goal
(be sure to include specific deliverables or milestones)