NeXML/Perl
Create tree and serialize to NeXML
This example assumes that you've figured out how to SSH into dbhack1.nescent.org, as per the instructions Hilmar sent out.
Install the prerequisites
- SSH into dbhack1.nescent.org.
- Make a directory where you want to keep the prerequisites and change into it, e.g.
mkdir perllib
cd perllib
- Open the nano editor, e.g.
nano setup.sh
- Paste the following shell script into the nano window:
<bash>
# download and build XML::Twig
wget http://search.cpan.org/CPAN/authors/id/M/MI/MIROD/XML-Twig-3.32.tar.gz
gunzip XML-Twig-3.32.tar.gz
tar -xvf XML-Twig-3.32.tar
cd XML-Twig-3.32
perl Makefile.PL -n
make
cd -
XMLTWIG=`pwd`/XML-Twig-3.32/blib/lib
export PERL5LIB=$PERL5LIB:$XMLTWIG

# check out Bio::Phylo from the NeXML repository
svn co https://nexml.svn.sourceforge.net/svnroot/nexml/trunk/nexml/perl perl
BIOPHYLO=`pwd`/perl/lib
export PERL5LIB=$PERL5LIB:$BIOPHYLO

# verify that both libraries load from the paths set above
echo "### IF THERE ARE NO ERROR MESSAGES BELOW THIS LINE ###"
perl -I"$XMLTWIG" -MXML::Twig -e 1
perl -I"$BIOPHYLO" -MBio::Phylo -e 1
echo "### ADD THE FOLLOWING TO THE TOP OF YOUR SCRIPTS ###"
echo "use lib '$XMLTWIG';"
echo "use lib '$BIOPHYLO';"
</bash>
- Close the nano editor, e.g. Ctrl+x and y to save changes.
- Execute the shell script, e.g.
sh setup.sh
- The last few lines should look somewhat like this:
### IF THERE ARE NO ERROR MESSAGES BELOW THIS LINE ###
### ADD THE FOLLOWING TO THE TOP OF YOUR SCRIPTS ###
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';
- The last two lines will be different in your case. Use your output, not the example shown here!
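Once setup.sh has run, it can be handy to re-check from Perl itself that both libraries are found. The following is a minimal sketch, not part of the original instructions: `check_modules` is a hypothetical helper name, and the script assumes you paste your own "use lib" lines (as printed by setup.sh) where indicated.

```perl
use strict;
use warnings;

# Paste your own "use lib" lines here, as printed by setup.sh.

# Hypothetical helper: tries to load each named module and returns
# a hash reference mapping module name => 1 (loaded) or 0 (failed).
sub check_modules {
    my @modules = @_;
    my %status;
    for my $module (@modules) {
        # require needs a file path (Foo/Bar.pm) when given a string
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        $status{$module} = eval { require $file; 1 } ? 1 : 0;
    }
    return \%status;
}

my $status = check_modules( 'XML::Twig', 'Bio::Phylo' );
for my $module ( sort keys %$status ) {
    printf "%-12s %s\n", $module, $status->{$module} ? 'ok' : 'NOT FOUND';
}
```

If either module is reported as not found, compare the "use lib" paths against the output of setup.sh.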
Write out a tree
In this section we create a simple script that creates a tree de novo and prints it to standard out as NeXML.
- Now open nano to create a Perl script, e.g.
nano example.pl
- Paste the following into the nano window (correct the "use lib" lines!):
<perl>
use strict;
use warnings;

# These two lib paths are printed to standard
# out by the setup.sh script, so correct these
# to your locations
use lib '/home/Vos/perllib/XML-Twig-3.32/blib/lib';
use lib '/home/Vos/perllib/perl/lib';

# We only need the object factory
use Bio::Phylo::Factory;

# below is a simple example of a data structure
# that can be traversed as a tree. I'm sure you
# have your own, similar structures (e.g. an array
# of database records)
# This is the tree shape: ((A,B)n1,C)n2;
my %parent_of = (
    'A'  => 'n1',
    'B'  => 'n1',
    'n1' => 'n2',
    'C'  => 'n2',
);

# the factory object creates other objects.
# this is useful because now you don't have
# to 'use' all these other classes. If you
# want a node/tree/forest/taxon/taxa/project
# etc, just call $fac->create_node and so on
my $fac = Bio::Phylo::Factory->new;

# the project object corresponds to a nexml
# document (or a nexus document)
my $proj = $fac->create_project;

# a single tree
my $tree = $fac->create_tree;

# a set of trees
my $forest = $fac->create_forest;

# the tree goes into the set
$forest->insert( $tree );

# the set goes into the project
$proj->insert( $forest );

# here I'm traversing the data structure,
# and instantiating node objects as needed
my %obj_by_name;
for my $node_name ( keys %parent_of ) {

    # value of the hash is the parent node
    # label in this case
    my $parent_name = $parent_of{$node_name};

    # making sure that the child node object
    # exists and is part of the tree
    if ( not exists $obj_by_name{$node_name} ) {
        $obj_by_name{$node_name} = $fac->create_node( -name => $node_name );
        $tree->insert( $obj_by_name{$node_name} );
    }

    # making sure that the parent node object
    # exists and is part of the tree
    if ( not exists $obj_by_name{$parent_name} ) {
        $obj_by_name{$parent_name} = $fac->create_node( -name => $parent_name );
        $tree->insert( $obj_by_name{$parent_name} );
    }

    # now that we're sure they both exist,
    # we can connect the two nodes
    $obj_by_name{$node_name}->set_parent( $obj_by_name{$parent_name} );
}

# now write out to xml
print $proj->to_xml;
</perl>
- Save and close the file, e.g. Ctrl-x and y to confirm save.
- Execute the file (e.g. perl example.pl). You should see output like this:
<xml>
<nex:nexml
    generator="Bio::Phylo::Project v.0.17_RC9_841"
    version="0.8"
    xmlns="http://www.nexml.org/1.0"
    xmlns:nex="http://www.nexml.org/1.0"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.nexml.org/1.0 http://www.nexml.org/1.0/nexml.xsd">
  <otus id="otus9">
    <otu id="otu10" label="A"/>
    <otu id="otu12" label="B"/>
    <otu id="otu11" label="C"/>
  </otus>
  <trees id="trees3" otus="otus9">
    <tree id="tree2" xsi:type="nex:IntTree">
      <node id="node7" label="n2" root="true"/>
      <node id="node6" label="C" otu="otu11"/>
      <node id="node5" label="n1"/>
      <node id="node4" label="A" otu="otu10"/>
      <node id="node8" label="B" otu="otu12"/>
      <edge id="edge6" source="node7" target="node6"/>
      <edge id="edge5" source="node7" target="node5"/>
      <edge id="edge4" source="node5" target="node4"/>
      <edge id="edge8" source="node5" target="node8"/>
    </tree>
  </trees>
</nex:nexml>
</xml>
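If the output does not look as expected, it may help to check that your child-to-parent hash really encodes the shape you intended before involving Bio::Phylo at all. The sketch below uses only core Perl to turn such a hash into a Newick string. `newick_from_parents` and `_subtree` are hypothetical helpers written for this page, not part of the Bio::Phylo API, and the code assumes the hash describes exactly one rooted tree.

```perl
use strict;
use warnings;

# Same child => parent mapping as in example.pl
my %parent_of = (
    'A'  => 'n1',
    'B'  => 'n1',
    'n1' => 'n2',
    'C'  => 'n2',
);

# Hypothetical helper: builds a Newick string from a
# child => parent hash. Dies unless there is exactly one root.
sub newick_from_parents {
    my %parent_of = @_;

    # invert the mapping: parent => [ children ]
    my %children_of;
    for my $child ( keys %parent_of ) {
        push @{ $children_of{ $parent_of{$child} } }, $child;
    }

    # the root is the only parent that is nobody's child
    my @roots = grep { !exists $parent_of{$_} } keys %children_of;
    die "expected exactly one root, found " . scalar(@roots) . "\n"
        unless @roots == 1;

    return _subtree( \%children_of, $roots[0] ) . ';';
}

# Recursively writes out one node and everything below it.
# Internal nodes are listed before tips, then alphabetically,
# so that the output does not depend on hash ordering.
sub _subtree {
    my ( $children_of, $name ) = @_;
    my @kids = sort {
        ( $children_of->{$b} ? 1 : 0 ) <=> ( $children_of->{$a} ? 1 : 0 )
            || $a cmp $b
    } @{ $children_of->{$name} || [] };
    return $name unless @kids;
    return '(' . join( ',', map { _subtree( $children_of, $_ ) } @kids ) . ')' . $name;
}

print newick_from_parents(%parent_of), "\n";    # prints ((A,B)n1,C)n2;
```

The printed string should match the tree shape given in the comments of example.pl.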
Now hack into this
For exhaustive documentation on the Bio::Phylo API, read the pages on CPAN. These are more user-friendly than the local pod because they also document all the inherited methods.
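As a starting point for hacking, the comments in example.pl mention that your own data may be an array of database records rather than a ready-made hash. The following is a rough sketch of how such records could be reduced to the child-to-parent hash that example.pl loops over. The record layout (`name` and `parent` fields) and the helper name `parent_map_from_records` are assumptions made for illustration; adapt them to your own schema.

```perl
use strict;
use warnings;

# Hypothetical rows, e.g. as fetched from a database;
# the field names 'name' and 'parent' are assumptions.
my @records = (
    { name => 'n2', parent => undef },    # the root has no parent
    { name => 'n1', parent => 'n2' },
    { name => 'A',  parent => 'n1' },
    { name => 'B',  parent => 'n1' },
    { name => 'C',  parent => 'n2' },
);

# Hypothetical helper: reduces the records to a
# child => parent hash like %parent_of in example.pl.
sub parent_map_from_records {
    my @records = @_;
    my %parent_of;
    for my $rec (@records) {

        # skip the root, which has no parent to point to
        next unless defined $rec->{parent};

        # a node can only have one parent in a tree
        die "duplicate record for node '$rec->{name}'\n"
            if exists $parent_of{ $rec->{name} };

        $parent_of{ $rec->{name} } = $rec->{parent};
    }
    return %parent_of;
}

my %parent_of = parent_map_from_records(@records);
print "$_ => $parent_of{$_}\n" for sort keys %parent_of;
```

The resulting hash can be dropped into example.pl in place of the hard-coded `%parent_of`.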