NeXML/Perl

From Evolutionary Informatics Working Group
Revision as of 23:42, 10 March 2009 by Rvos@interchange.ubc.ca

Create tree and serialize to NeXML

This example assumes that you've figured out how to SSH into dbhack1.nescent.org, as per the instructions Hilmar sent out.

Install the prerequisites

  • SSH into dbhack1.nescent.org.
  • Make a directory where you want to keep the prerequisites and change into it, e.g. mkdir perllib && cd perllib.
  • Open the nano editor, e.g. nano setup.sh
  • Paste the following shell script into the nano window:

<bash>
# download and build XML::Twig
wget http://search.cpan.org/CPAN/authors/id/M/MI/MIROD/XML-Twig-3.32.tar.gz
gunzip XML-Twig-3.32.tar.gz
tar -xvf XML-Twig-3.32.tar
cd XML-Twig-3.32
perl Makefile.PL -n
make
cd -
XMLTWIG=`pwd`/XML-Twig-3.32/blib/lib
export PERL5LIB=$PERL5LIB:$XMLTWIG

# check out Bio::Phylo from the NeXML repository
svn co https://nexml.svn.sourceforge.net/svnroot/nexml/trunk/nexml/perl perl
BIOPHYLO=`pwd`/perl/lib
export PERL5LIB=$PERL5LIB:$BIOPHYLO

# verify that both modules load from the paths computed above
echo "### IF THERE ARE NO ERROR MESSAGES BELOW THIS LINE ###"
perl -I"$XMLTWIG" -MXML::Twig -e 1
perl -I"$BIOPHYLO" -MBio::Phylo -e 1
echo "###  ADD THE FOLLOWING TO THE TOP OF YOUR SCRIPTS  ###"
echo "use lib '$XMLTWIG';"
echo "use lib '$BIOPHYLO';"
</bash>

  • Close the nano editor, e.g. Ctrl+x and y to save changes.
  • Execute the shell script, e.g. sh setup.sh
  • The last few lines should look somewhat like this:
### IF THERE ARE NO ERROR MESSAGES BELOW THIS LINE ###
###  ADD THE FOLLOWING TO THE TOP OF YOUR SCRIPTS  ###
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';
  • The last two lines will be different in your case. Use your output, not the example shown here!
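
As an extra sanity check, a short script along the following lines could confirm that both modules load from your own paths. This is only a sketch: the helper can_load is a hypothetical name, not part of Bio::Phylo or XML::Twig, and the two use lib paths are the example values shown above, so replace them with the ones setup.sh printed for you.

```perl
use strict;
use warnings;

# Example paths from the setup.sh output above; replace with your own.
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub can_load {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Bio::Phylo -> Bio/Phylo.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report the status of each prerequisite
for my $module ( 'XML::Twig', 'Bio::Phylo' ) {
    printf "%-12s %s\n", $module, can_load($module) ? 'ok' : 'NOT FOUND';
}
```

If either line reports NOT FOUND, the corresponding use lib path most likely does not match your setup.sh output.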

Write out a tree

In this section we create a simple script that creates a tree de novo and prints it to standard out as NeXML.

  • Now open nano to create a Perl script, e.g. nano example.pl
  • Paste the following into the nano window (correct the "use lib" lines!):

<perl>
# These two lib paths are printed to standard
# out by the setup.sh script, so correct these
# to your locations
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';

use strict;
use warnings;

# We only need the object factory
use Bio::Phylo::Factory;

# below is a simple example of a data structure
# that can be traversed as a tree. I'm sure you
# have your own, similar structures (e.g. an array
# of database records)
# This is the tree shape: ((A,B)n1,C)n2;
my %parent_of = (
    'A'  => 'n1',
    'B'  => 'n1',
    'n1' => 'n2',
    'C'  => 'n2',
);

# the factory object creates other objects.
# this is useful because now you don't have
# to 'use' all these other classes. If you
# want a node/tree/forest/taxon/taxa/project
# etc, just call $fac->create_node and so on
my $fac = Bio::Phylo::Factory->new;

# the project object corresponds to a nexml
# document (or a nexus document)
my $proj = $fac->create_project;

# a single tree
my $tree = $fac->create_tree;

# a set of trees
my $forest = $fac->create_forest;

# the tree goes into the set
$forest->insert( $tree );

# the set goes into the project
$proj->insert( $forest );

# here I'm traversing the data structure,
# and instantiating node objects as needed
my %obj_by_name;
for my $node_name ( keys %parent_of ) {

    # value of the hash is the parent node
    # label in this case
    my $parent_name = $parent_of{$node_name};

    # making sure that the child node object
    # exists and is part of the tree
    if ( not exists $obj_by_name{$node_name} ) {
        $obj_by_name{$node_name} = $fac->create_node( -name => $node_name );
        $tree->insert( $obj_by_name{$node_name} );
    }

    # making sure that the parent node object
    # exists and is part of the tree
    if ( not exists $obj_by_name{$parent_name} ) {
        $obj_by_name{$parent_name} = $fac->create_node( -name => $parent_name );
        $tree->insert( $obj_by_name{$parent_name} );
    }

    # now that we're sure they both exist,
    # we can connect the two nodes
    $obj_by_name{$node_name}->set_parent( $obj_by_name{$parent_name} );
}

# now write out to xml
print $proj->to_xml;
</perl>

  • Save and close the file, e.g. Ctrl+x and y to confirm save.
  • Execute the file, e.g. perl example.pl. You should see output like this:

<xml>
<nex:nexml
  generator="Bio::Phylo::Project v.0.17_RC9_841"
  version="0.8"
  xmlns="http://www.nexml.org/1.0"
  xmlns:nex="http://www.nexml.org/1.0"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd">
  <otus id="otus9">
    <otu id="otu10" label="A"/>
    <otu id="otu12" label="B"/>
    <otu id="otu11" label="C"/>
  </otus>
  <trees id="trees3" otus="otus9">
    <tree id="tree2" xsi:type="nex:IntTree">
      <node id="node7" label="n2" root="true"/>
      <node id="node6" label="C" otu="otu11"/>
      <node id="node5" label="n1"/>
      <node id="node4" label="A" otu="otu10"/>
      <node id="node8" label="B" otu="otu12"/>
      <edge id="edge6" source="node7" target="node6"/>
      <edge id="edge5" source="node7" target="node5"/>
      <edge id="edge4" source="node5" target="node4"/>
      <edge id="edge8" source="node5" target="node8"/>
    </tree>
  </trees>
</nex:nexml>
</xml>
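
To check a child-to-parent structure like %parent_of independently of Bio::Phylo, one option is to render it as a Newick string and compare it with the tree shape you intended. The following is a minimal pure-Perl sketch, not part of the Bio::Phylo API: the helper names find_root and to_newick are hypothetical, and the ordering rule (internal nodes before tips, then alphabetical) is an assumption chosen only to make the output reproducible.

```perl
use strict;
use warnings;

# Same child => parent structure as in example.pl
my %parent_of = ( 'A' => 'n1', 'B' => 'n1', 'n1' => 'n2', 'C' => 'n2' );

# Invert it into parent => [ children ]
my %children_of;
for my $child ( keys %parent_of ) {
    push @{ $children_of{ $parent_of{$child} } }, $child;
}

# Hypothetical helper: the root is the one parent that is nobody's child
sub find_root {
    my @roots = grep { !exists $parent_of{$_} } keys %children_of;
    die "expected exactly one root, found " . scalar(@roots) . "\n"
        unless @roots == 1;
    return $roots[0];
}

# Hypothetical helper: recursively render a node and its descendants as Newick
sub to_newick {
    my ($name) = @_;
    my @kids = sort {
        ( exists $children_of{$b} ? 1 : 0 ) <=> ( exists $children_of{$a} ? 1 : 0 )
            or $a cmp $b
    } @{ $children_of{$name} || [] };
    return $name unless @kids;    # a tip is just its label
    return '(' . join( ',', map { to_newick($_) } @kids ) . ')' . $name;
}

print to_newick( find_root() ), ";\n";    # prints ((A,B)n1,C)n2;
```

The printed string matches the tree shape given in the comments of example.pl, and the same parent/child relations appear as the edge elements in the NeXML output.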

Now hack into this

For exhaustive documentation on the Bio::Phylo API, read the pages on CPAN. These are more user-friendly than the local POD because they also document all inherited methods.