PhantomJS and asynchronous resources after load

I’ve been dabbling with PhantomJS for about three months and “picked up my pen” again this week. I’m trying to write automated tests that verify tag resources actually load on a page.

var page = require('webpage').create();

page.open(url, function (status) {
    if (status === 'success') {
        // True if the tag's global object exists on the page
        var check = page.evaluate(function () {
            return !!window._blah;
        });
    }
});

What I noticed was that results varied: certain tags were reported as loaded on one run and missing on the next, even against the same site. I knew it had something to do with async scripts loading tags after the official page “load” event. My checks look for each tag’s JavaScript actually being present and active on the page, which to me is a “truer” test of a tag than checking for the <script> element.

page.onResourceRequested = function (req) {
    console.log("onResourceRequested");
};

page.open(url, function (status) {
    if (status === 'success') {
        console.log("PageLoad");
        var check = page.evaluate(function () {
            return !!window._blah;
        });
    }
});

Just as I expected, the output showed a stream of onResourceRequested events, a “PageLoad”, then one or two more onResourceRequested events. That was proof that checks run at page “load” were probably firing too soon to capture the working state of some async tag scripts. So, what can you do? We’re in a bit of a catch-22: we have no way of knowing when all of the resources I’m looking for have been requested, and the whole point is to detect when these resources “magically” disappear.
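To see which requests arrive late, rather than just that some do, the logging can be split around the load callback. This is only a sketch: createRequestLog and its method names are made up for illustration and are not part of PhantomJS. The wiring in the trailing comments assumes the same page and url variables as the snippets above.

```javascript
// Sketch: sort resource requests into "before load" and "after load".
// createRequestLog and its methods are hypothetical helper names.
function createRequestLog() {
    var loaded = false;
    var before = [];
    var after = [];
    return {
        // Call from page.onResourceRequested with requestData.url.
        onRequested: function (resourceUrl) {
            (loaded ? after : before).push(resourceUrl);
        },
        // Call at the top of the page.open callback.
        markLoaded: function () {
            loaded = true;
        },
        // Copy of the URLs requested after the load callback fired.
        lateRequests: function () {
            return after.slice();
        },
        summary: function () {
            return before.length + ' before load, ' + after.length + ' after load';
        }
    };
}

// Possible wiring inside a PhantomJS script:
// var log = createRequestLog();
// page.onResourceRequested = function (requestData) {
//     log.onRequested(requestData.url);
// };
// page.open(url, function (status) {
//     log.markLoaded();
// });
```

Printing log.lateRequests() a few seconds after load would show which tag scripts are the stragglers.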

Solution: As part of my test, I can reasonably assume a certain amount of time within which I expect all of my tags to be loaded and active on the site. So I can give myself a window after the official load event before checking. The downside is that if a tag loads slowly, or something else on the site is blocking, I might miss a tag that does eventually load. However, in that case I want to know about the anomaly so I can alert the client.

page.onResourceRequested = function (req) {
    console.log("onResourceRequested");
};

var queue = {}; // keyed by URL: 'queued' or 'done'

page.open(url, function (status) {
    if (status === 'success') {
        queue[url] = 'queued';
        window.setTimeout(function () {
            // Skip if a duplicate load already ran the check for this URL
            if (queue[url] !== 'done') {
                console.log("PageLoad + n seconds");
                var check = page.evaluate(function () {
                    return !!window._blah;
                });
                queue[url] = 'done';
            }
        }, 5000); // I'll go with 5 seconds
    }
});
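A variation on the fixed delay, not what I actually ran, is to poll for the tag and stop as soon as it shows up, keeping the 5 seconds as an upper bound. The sketch below assumes a helper I’m calling waitFor; the name, its arguments, and the optional timers hook (there only so the logic can be exercised without real timers) are all illustrative, not PhantomJS API.

```javascript
// Sketch: poll a condition until it is true or a deadline passes.
// waitFor and the `timers` hook are hypothetical names.
function waitFor(testFn, onDone, timeoutMs, intervalMs, timers) {
    timers = timers || {
        setInterval: function (fn, ms) { return setInterval(fn, ms); },
        clearInterval: function (id) { clearInterval(id); },
        now: function () { return Date.now(); }
    };
    var start = timers.now();
    var id = timers.setInterval(function () {
        var ok = !!testFn();
        var expired = timers.now() - start >= timeoutMs;
        if (ok || expired) {
            timers.clearInterval(id);
            onDone(ok); // true: condition met; false: timed out
        }
    }, intervalMs);
}

// Possible use inside a PhantomJS script (assumes `page` and `url`):
// page.open(url, function (status) {
//     if (status !== 'success') { phantom.exit(1); return; }
//     waitFor(function () {
//         return page.evaluate(function () { return !!window._blah; });
//     }, function (found) {
//         console.log(found ? 'tag active' : 'tag missing after 5 seconds');
//         phantom.exit(found ? 0 : 1);
//     }, 5000, 250);
// });
```

A tag that never appears is still reported after the full window, so the “alert the client” case is preserved; pages where everything loads quickly just finish sooner.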

After adding the setTimeout to give myself a delay after load, I had a few cases, when iterating over a set of pages, where the asynchronous timeouts began to fall all over themselves. I think one of my sites was triggering the load callback twice. I’m lazy, so instead of figuring it out I added a queue lookup that I shove a site into before I set the timeout. When the delayed code runs, it checks whether it has already executed for that site. This ensures it won’t kick off multiple copies that multiply like some kind of asynchronous monster. I’m doing synchronous programming in asynchronous JavaScript.
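The same “has this already run?” guard can be pulled out into a small helper. This is a sketch of the idea only; oncePerKey is a name I’m inventing here, and it marks a key as done before running the callback, which is slightly stricter than the queued/done states above.

```javascript
// Sketch: run a callback at most once per key (for example, per URL).
// oncePerKey is a hypothetical helper name.
function oncePerKey() {
    var done = Object.create(null); // string-keyed map with no inherited keys
    return function (key, fn) {
        if (done[key]) {
            return false; // already ran for this key
        }
        done[key] = true;
        fn();
        return true;
    };
}

// Possible use inside the delayed check (assumes `page` and `url`):
// var runOnce = oncePerKey();
// window.setTimeout(function () {
//     runOnce(url, function () {
//         var check = page.evaluate(function () { return !!window._blah; });
//     });
// }, 5000);
```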

2 thoughts on “PhantomJS and asynchronous resources after load”

    1. Oh, I did. I found CasperJS to be really slick as well. I ran into problems getting the control I wanted over saving and outputting results, and over exporting results to JSON when running under Windows. I’m running the same tests on over 150 different sites, and it seemed like more work to organize them in the CasperJS testing-framework style.
