NeXML/Perl

From Evolutionary Informatics Working Group

Create tree and serialize to NeXML

This example assumes that you can already SSH into dbhack1.nescent.org, as per the instructions Hilmar sent out. The code examples below work much better if you copy them from the editor window (i.e. click the edit link on the right, then copy from the textarea) rather than from the rendered browser view, which seems to add invisible characters.

Install the prerequisites

  • SSH into dbhack1.nescent.org.
  • Make a directory where you want to keep the prerequisites, e.g. mkdir perllib
  • Go into the prerequisites directory, e.g. cd perllib
  • Open the nano editor, e.g. nano setup.sh
  • Paste the following shell script into the nano window:

<bash>
# download and build XML::Twig (no system-wide install)
wget http://search.cpan.org/CPAN/authors/id/M/MI/MIROD/XML-Twig-3.32.tar.gz
gunzip XML-Twig-3.32.tar.gz
tar -xvf XML-Twig-3.32.tar
cd XML-Twig-3.32
perl Makefile.PL -n
make
cd -
XMLTWIG=`pwd`/XML-Twig-3.32/blib/lib
export PERL5LIB=$PERL5LIB:$XMLTWIG

# check out Bio::Phylo from the nexml repository
svn co https://nexml.svn.sourceforge.net/svnroot/nexml/trunk/nexml/perl perl
BIOPHYLO=`pwd`/perl/lib
export PERL5LIB=$PERL5LIB:$BIOPHYLO

# try to load both modules from the locations set up above
echo "### IF THERE ARE NO ERROR MESSAGES BELOW THIS LINE ###"
perl -I"$XMLTWIG" -MXML::Twig -e 1
perl -I"$BIOPHYLO" -MBio::Phylo -e 1
echo "### ADD THE FOLLOWING TO THE TOP OF YOUR SCRIPTS ###"
echo "use lib '$XMLTWIG';"
echo "use lib '$BIOPHYLO';"
</bash>

  • Close the nano editor, e.g. Ctrl+x and y to save changes.
  • Execute the shell script, e.g. sh setup.sh
  • The last few lines should look somewhat like this:
### IF THERE ARE NO ERROR MESSAGES BELOW THIS LINE ###
###  ADD THE FOLLOWING TO THE TOP OF YOUR SCRIPTS  ###
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';
  • The last two lines will be different in your case. Use your output, not the example shown here!
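If you want to re-check the installation later without rerunning setup.sh, a short Perl script can try to load both modules. The following is a minimal sketch, not part of the original instructions: the helper name can_load is made up for illustration, and the two paths are the example paths shown above, so replace them with the lines that setup.sh printed for you.

```perl
use strict;
use warnings;

# Example paths only: replace with the two "use lib" lines printed by setup.sh
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';

# Hypothetical helper: returns 1 if the named module can be loaded, else 0
sub can_load {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Bio::Phylo -> Bio/Phylo.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report the status of the two prerequisites
for my $module (qw(XML::Twig Bio::Phylo)) {
    printf "%-12s %s\n", $module, can_load($module) ? 'ok' : 'NOT FOUND';
}
```

If either module is reported as NOT FOUND, the corresponding "use lib" path is probably wrong.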

Write out a tree

In this section we create a simple script that creates a tree de novo and prints it to standard out as NeXML.

  • Now open nano to create a Perl script, e.g. nano example.pl
  • Paste the following into the nano window (and correct the "use lib" lines!):

<perl>
use strict;
use warnings;

# These two lib paths are printed to standard
# out by the setup.sh script, so correct these
# to your locations
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';

# We only need the object factory
use Bio::Phylo::Factory;

# below is a simple example of a data structure
# that can be traversed as a tree. I'm sure you
# have your own, similar structures (e.g. an array
# of database records)
# This is the tree shape: ((A,B)n1,C)n2;
my %parent_of = (
    'A'  => 'n1',
    'B'  => 'n1',
    'n1' => 'n2',
    'C'  => 'n2',
);

# the factory object creates other objects.
# this is useful because now you don't have
# to 'use' all these other classes. If you
# want a node/tree/forest/taxon/taxa/project
# etc, just call $fac->create_node and so on
my $fac = Bio::Phylo::Factory->new;

# the project object corresponds to a nexml
# document (or a nexus document)
my $proj = $fac->create_project;

# a single tree
my $tree = $fac->create_tree;

# a set of trees
my $forest = $fac->create_forest;

# the tree goes into the set
$forest->insert( $tree );

# the set goes into the project
$proj->insert( $forest );

# here I'm traversing the data structure,
# and instantiating node objects as needed
my %obj_by_name;
for my $node_name ( keys %parent_of ) {

    # value of the hash is the parent node
    # label in this case
    my $parent_name = $parent_of{$node_name};

    # making sure that the child node object
    # exists and is part of the tree
    if ( not exists $obj_by_name{$node_name} ) {
        $obj_by_name{$node_name} = $fac->create_node( -name => $node_name );
        $tree->insert( $obj_by_name{$node_name} );
    }

    # making sure that the parent node object
    # exists and is part of the tree
    if ( not exists $obj_by_name{$parent_name} ) {
        $obj_by_name{$parent_name} = $fac->create_node( -name => $parent_name );
        $tree->insert( $obj_by_name{$parent_name} );
    }

    # now that we're sure they both exist,
    # we can connect the two nodes
    $obj_by_name{$node_name}->set_parent( $obj_by_name{$parent_name} );
}

# now write out to xml
print $proj->to_xml;
</perl>

  • Save and close the file, e.g. Ctrl+x and y to confirm save.
  • Execute the file, e.g. perl example.pl. You should see output like this:

<xml>
<nex:nexml
  generator="Bio::Phylo::Project v.0.17_RC9_841"
  version="0.8"
  xmlns="http://www.nexml.org/1.0"
  xmlns:nex="http://www.nexml.org/1.0"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd">
 <otus id="otus9">
   <otu id="otu10" label="A"/>
   <otu id="otu12" label="B"/>
   <otu id="otu11" label="C"/>
 </otus>
 <trees id="trees3" otus="otus9">
   <tree id="tree2" xsi:type="nex:IntTree">
     <node id="node7" label="n2" root="true"/>
     <node id="node6" label="C" otu="otu11"/>
     <node id="node5" label="n1"/>
     <node id="node4" label="A" otu="otu10"/>
     <node id="node8" label="B" otu="otu12"/>
     <edge id="edge6" source="node7" target="node6"/>
     <edge id="edge5" source="node7" target="node5"/>
     <edge id="edge4" source="node5" target="node4"/>
     <edge id="edge8" source="node5" target="node8"/>
   </tree>
 </trees>

</nex:nexml>
</xml>
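To check that the output really encodes the tree you put in, you can read the edges back out and compare them with the original %parent_of hash. The sketch below is only an illustration. It works on a trimmed copy of the sample output above, the sub name parents_from_nexml is made up, and it uses plain regular expressions, which are adequate for this flat sample but are no substitute for a real XML parser such as XML::Twig. It treats each edge's source as the parent and its target as the child, which matches the sample (the root node n2 only ever appears as a source).

```perl
use strict;
use warnings;

# Trimmed copy of the <tree> element from the sample output
my $xml = <<'XML';
<tree id="tree2" xsi:type="nex:IntTree">
  <node id="node7" label="n2" root="true"/>
  <node id="node6" label="C" otu="otu11"/>
  <node id="node5" label="n1"/>
  <node id="node4" label="A" otu="otu10"/>
  <node id="node8" label="B" otu="otu12"/>
  <edge id="edge6" source="node7" target="node6"/>
  <edge id="edge5" source="node7" target="node5"/>
  <edge id="edge4" source="node5" target="node4"/>
  <edge id="edge8" source="node5" target="node8"/>
</tree>
XML

# Hypothetical helper: map node ids to labels, then read each
# edge as child label => parent label
sub parents_from_nexml {
    my ($text) = @_;
    my %label_of;
    while ( $text =~ /<node\s+id="([^"]+)"\s+label="([^"]+)"/g ) {
        $label_of{$1} = $2;
    }
    my %parent_of;
    while ( $text =~ /<edge\s[^>]*source="([^"]+)"\s+target="([^"]+)"/g ) {
        $parent_of{ $label_of{$2} } = $label_of{$1};
    }
    return ( \%label_of, \%parent_of );
}

my ( $label_of, $parent_of ) = parents_from_nexml($xml);
my $nodes = keys %$label_of;     # number of nodes
my $edges = keys %$parent_of;    # number of child => parent links
print "nodes: $nodes, edges: $edges\n";
print "$_ -> $parent_of->{$_}\n" for sort keys %$parent_of;
```

For the sample this reports five nodes and four edges (one edge fewer than nodes, as expected for a tree), and the child-to-parent pairs are the same as in the %parent_of hash of example.pl.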

Now hack into this

For exhaustive documentation on the Bio::Phylo API, read the pages on CPAN. These are more user-friendly than the local POD because they also document all the inherited methods.
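As a starting point for adapting the script to your own data, it can help to validate your child-to-parent table in plain Perl before building any Bio::Phylo objects. The following is a sketch under assumptions: the @rows records and their field names "name" and "parent" are invented stand-ins for whatever your database returns, and the subs parent_table, find_root and to_newick are illustrative helpers, not part of Bio::Phylo.

```perl
use strict;
use warnings;

# Hypothetical rows, e.g. as fetched from a database table;
# the field names "name" and "parent" are assumptions
my @rows = (
    { name => 'A',  parent => 'n1' },
    { name => 'B',  parent => 'n1' },
    { name => 'n1', parent => 'n2' },
    { name => 'C',  parent => 'n2' },
);

# Build the same child => parent hash that example.pl starts from
sub parent_table {
    my (@records) = @_;
    my %parent_of;
    for my $row (@records) {
        die "duplicate node name '$row->{name}'\n"
          if exists $parent_of{ $row->{name} };
        $parent_of{ $row->{name} } = $row->{parent};
    }
    return \%parent_of;
}

# The root is the one name that is used as a parent but has no parent itself
sub find_root {
    my ($parent_of) = @_;
    my %seen;
    my @roots = grep { !exists( $parent_of->{$_} ) && !$seen{$_}++ }
      values %$parent_of;
    die 'expected exactly one root, found ' . scalar(@roots) . "\n"
      if @roots != 1;
    return $roots[0];
}

# Write a node and its descendants in Newick notation, children sorted by name
sub to_newick {
    my ( $name, $parent_of ) = @_;
    my @kids = grep { $parent_of->{$_} eq $name } sort keys %$parent_of;
    return $name if !@kids;
    return '(' . join( ',', map { to_newick( $_, $parent_of ) } @kids ) . ')' . $name;
}

my $parent_of = parent_table(@rows);
print to_newick( find_root($parent_of), $parent_of ), ";\n";
```

With the rows above this prints (C,(A,B)n1)n2; which is the same topology as the ((A,B)n1,C)n2; shape noted in example.pl, with the children of n2 listed in sorted order. The resulting hash can then be used in place of %parent_of in the script.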