Topological sorting of JavaScript files

Posted by John Kleijn • Wednesday, December 29. 2010 • Category: PHP

Recently I'm doing a lot of ExtJS work. In complex JS app developement you'll quickly have a LOT of files to link to your HTML document. You'll want to combine and compress these files for production and link them sperate and uncompressed for testing.

The simplest solution is just listing the files in a configuration file. But managing this list is a PITA. We're developers, not clerks. So I decided this list should be auto-generated and properly sorted. Scanning the filesystem for javascript files is trivial, and I quickly decided the sort order should be determined by examining the contents, looking for class names. All files contain classes, with a JSDoc declaration of @class app.module.ClassName, so it's not hard to establish which file depends on which.

The real issue is sorting. Before long I came to the conclusion that there's no out of the box solution for the type of sorting I needed, all sort functions use QuickSort, even the ones using UDFs. On WikiPedia, I found that what I needed "resolving dependencies into an ordered list" could be accomplished using a different algorithm called "Topological sorting".

WikiPedia describes one implementation in pseudo code:

L ← Empty list that will contain the sorted elements

S ← Set of all nodes with no incoming edges

while S is non-empty do

remove a node n from S

insert n into L

for each node m with an edge e from n to m do

remove edge e from the graph

if m has no other incoming edges then

insert m into S

if graph has edges then

output error message (graph has at least one cycle)

else

output message (proposed topologically sorted order: L)

I decided to create a PHP implementation of this algorithm.


Because the implementation is already mostly known, I could not "test first" (though technically I could've), so I started with the minimum to create a graph: a node strucure, I created a simple node class:


<?php
namespace libname\util;

use SplObjectStorage;

class TopSortNode
{
        private $_value;
        private $_name;
        private $_incoming;
        private $_outgoing;

        final public function __construct($value, $name = null)
        {
                $this->_value = $value;
                $this->_name = $name ?: $value;
                $this->_incoming = new SplObjectStorage;
                $this->_outgoing = new SplObjectStorage;
        }

        public function getName()
        {
                return $this->_name;
        }

        public function setName($name)
        {
                $this->_name = $name;
                return $this;
        }

        public function setValue($value)
        {
                $this->_value = $value;
                return $this;
        }
       
        public function getValue()
        {
                return $this->_value;
        }

        public function removeLink(TopSortNode $node)
        {
                $this->_outgoing->detach($node);
                $node->_incoming->detach($this);
                return $this;
        }

        public function addLink(TopSortNode $node)
        {
                $this->_outgoing->attach($node);
                $node->_incoming->attach($this);
                return $this;
        }

        public function getNodesLinked()
        {
                return $this->_incoming;
        }

        public function getLinkedNodes()
        {
                return $this->_outgoing;
        }

        public function isOrphan()
        {
                return !$this->_incoming->count();
        }
}
 

I then created a test to satisfy:


<?php
namespace tests\util;

use libname\util\TopSorter, libname\util\TopSortNode;

class TopSortTest extends \PHPUnit_Framework_TestCase
{
        /**
         * Test a small sort operation
         */

        public function testSmallSort()
        {
                /**
                 * Create nodes with names and values
                 */

                $a = new TopSortNode('a');
                $b = new TopSortNode('b');
                $c = new TopSortNode('c');
                $d = new TopSortNode('d');

                $sorter = new TopSorter();

                /**
                 * Register the nodes for sorting
                 */

                $sorter
                        ->register($a)
                        ->register($b)
                        ->register($c)
                        ->register($d);

                /**
                 * Say $a depends on $b
                 */

                $a->addLink($b);

                /**
                 * $b depends on $c
                 */

                $b->addLink($c);

                /**
                 * And $d depends on $a
                 */

                $d->addLink($a);

                /**
                 * Then:
                 *
                 *  - b should be before a,
                 *  - c should be before b,
                 *  - a should be before d
                 */

                $sorted = $sorter->getValues();

                $this->assertTrue(array_search('b', $sorted) < array_search('a', $sorted));
                $this->assertTrue(array_search('c', $sorted) < array_search('b', $sorted));
                $this->assertTrue(array_search('a', $sorted) < array_search('d', $sorted));
        }
}
 

To pass this test I had to create a sorter which implemented the algorithm.


<?php
namespace libname\util;

use SplObjectStorage;

class TopSorter
{
        private $_nodes = array();

        /**
         * @return array
         */

        public function getNodes()
        {
                return $this->_nodes;
        }

        /**
         * Register a node
         *
         * @param TopSortNode $node
         * @return TopSorter
         */

        public function register(TopSortNode $node)
        {
                $this->_nodes[$node->getName()] = $node;
                return $this;
        }

        /**
         * Fetch/lazy create a node
         *
         * @param mixed $value
         * @param scalar $name
         * @return TopSortNode
         */

        public function getNode($value, $name = null)
        {
                if(!isset($this->_nodes[$name ?: $value])){
                        $this->register(new TopSortNode($value, $name));
                }
                return $this->_nodes[$name ?: $value];
        }

        /**
         * Gets a sorted array of node values
         *
         * @see http://en.wikipedia.org/wiki/topological_sorting
         *
         * @return array
         */

        public function getValues()
        {
                /**
                 * L ← Empty list that will contain the sorted elements
                 */

                $values = array();

                /**
                 * S ← Set of all nodes with no incoming edges
                 */

                $orphans = array();

                $nodes = $this->_nodes;

                foreach($nodes as $key => $node)
                {
                        if($node->isOrphan()){
                                $orphans[] = $node;
                        }
                }

                /**
                 * While S is non-empty do
                 */

                while(count($orphans) > 0)
                {
                        /**
                         * Remove a node n ($node) from S insert n into L
                         */

                        $node = array_pop($orphans);

                        /**
                         * Add value
                         */

                        array_unshift($values, $node->getValue());

                        /**
                         * Reset links (artifact of SplObjectStorage)
                         */

                        $links = $node->getLinkedNodes();
                        $links->rewind();

                        /**
                         * For each node m ($linked) with an edge e from n ($node) to m (nodes with n as parent) do
                         */

                        while($links->valid())
                        {
                                $linked = $node->getLinkedNodes()->current();

                                /**
                                 * Remove edge e (link from n to m) from the graph
                                 */

                                $node->removeLink($linked);

                                /**
                                 * If m ($linked) has no other incoming edges (isOrphan) then
                                 */

                                if($linked->isOrphan()){
                                        /**
                                         * Insert m into S
                                         */

                                        $orphans[] = $linked;
                                }
                        }
                }

                foreach($nodes as $node)
                {
                        /**
                         * If graph has edges then output error message (graph has at least one cycle)
                         */

                        if(!$node->isOrphan()){
                                throw new \RuntimeException("Sort failed");
                        }
                }

                /**
                 * Output message (proposed topologically sorted order: L)
                 */

                return $values;
        }
}
 

In truth, I didn't realize that SplObjectStorage::rewind needed to be called which caused me a lot of headaches. This isn't a problem until you add more than one link on the same node, which you can see I did not include in my test case. But for brevity, lets just say that I did ;-)

The last check the algorithm does is to prevent circular references from excluding results. If there's circular references, some nodes will never become orphans. Just to make sure this works:


/**
 * Assert the operation will fail when a circular reference exists
 */

public function testCircularReference()
{
        $sorter = new TopSorter();

        $a = new TopSortNode('a');
        $b = new TopSortNode('b');
        $c = new TopSortNode('c');
        $d = new TopSortNode('d');

        $sorter
                ->register($a)
                ->register($b)
                ->register($c)
                ->register($d);

        $a->addLink($b);
        $b->addLink($c);
        $c->addLink($a);

        $this->setExpectedException('RuntimeException');

        $sorted = $sorter->getValues();
}
 

This utility is now usable for all situations where sorting by dependency is needed. For building the javascript list I did something like this:


foreach($finder->findFiles('#.js$#') as $path => $relative)
{
        $contents = file_get_contents($path);
        preg_match_all("#@class\s+(.*)(\b|$)#", $contents, $matches);
        array_unique($matches[1]);

        $sorter->register(
                new TopSortNode(
                        array(
                                'http' => "/modules/$relative",
                                'classes' => $matches[1],
                                'contents' => $contents
                        ),
                        $path
                )
        );
}

/**
 * Establish links
 */

foreach($sorter->getNodes() as $nodeA)
{
        $specA = $nodeA->getValue();

        foreach($specA['classes'] as $className)
        {
                foreach($sorter->getNodes() as $nodeB)
                {
                        $specB = $nodeB->getValue();

                        /**
                         * Skip self-links
                         */

                        if(in_array($className, $specB['classes'])){
                                continue;
                        }

                        /**
                         * Check for any dependencies to $class from $nodeA
                         */

                        if(preg_match("/" . preg_quote($className) . "/", $specB['contents']))
                        {
                                $nodeB->addLink($nodeA);
                        }
                }
        }
}

/**
 * List now contains topologically sorted specs
 */

$list = $sorter->getValues();
 

0 Trackbacks

  1. No Trackbacks

2 Comments

Display comments as (Linear | Threaded)
  1. My solution was a bit uglier. I put every component into a separate file and add a list of it's direct dependencies (ugh!). Then I let PHP recursively load all JS files needed for any given 'Screen' (a basic unit of my GUI system).

    I'm currently thinking on how to remove dependency lists by parsing the source code for used classes (might be tricky when lazy instantiation through xtype is used).

  2. Not really. I don't scan for xtypes myself (I use a @depends JSDoc tag), but adapting the code above to work for xtypes is trivial.

    Just add a sibling to the 'classes' index in the spec, eg:


    preg_match_all(&quot;Ext.reg(ister)?\(('|\&quot;)([a-z]+)('|\&quot;)&quot;, $contents, $matches);

    'xtype'  =&gt; $matches[3]
     

    Then simply add another iteration for every file, if a match for "xtype:\s*('|\")" . $specA['xtype']. "('|\")" is found, add the link.

    PS Serendipity sucks ass.

Add Comment

You can use [geshi lang=lang_name [,ln={y|n}]][/geshi] tags to embed source code snippets.
Standard emoticons like :-) and ;-) are converted to images.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA


Antiquities and such