Spidering

Exhaustively testing every function and feature on a web page can be hard and tedious work. While it's easy to try the obvious things, testing every link and button on a page can involve hundreds of clicks - and even then, how do you know you didn't miss one? If the links or buttons on your page change frequently then you are even worse off - you might record a script but you would quickly find your script was testing the wrong links!

Spidering is a feature to help you efficiently try every navigation that is possible on a page with just the click of a button. Using spidering you can quickly find any broken links or other actions that cause problems - without ever having to record them!

How Spidering Works

The term "Spidering" comes from the idea of a spider crawling across the links in its web. In Badboy, you initiate Spidering by dropping a Spider item from the Toolbox into your Script. When the Spider item plays, it first surveys all the links and buttons it can find on the page and makes a list of them. From then on, each time the Spider plays it executes one item from its list.

The figure below shows how a typical Spider item might appear in your Script:

Spider Looping

Since a Spider item only executes one item from your page each time it plays, in order to execute all the links on a page it needs to execute in a loop. You accomplish this by placing the Spider item in a Step. You don't need to configure looping for the Step - the Spider item will automatically loop for you after each spidered link or button is navigated.

If you like you can disable automatic looping and control it yourself by adjusting the Spider's properties.

Each time the Spider plays, it follows this process:

  • First, it executes the next navigation from its list for your page.
  • After the navigation has executed, if there are more navigations to try, it loops back to the first item in its parent Step.
  • If the Spider has exhausted all the links and buttons on the page, it exits and Badboy continues playing the items after the Spider in your script.

In this way the Spider item will loop inside its Step until it has browsed all the links and buttons on a page.
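
The looping process above can be sketched as a small simulation. This is only an illustration written in Python, not Badboy's actual implementation: the Spider class, its play() method and the play_step() helper are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Spider:
    """Toy model of a Spider item (hypothetical, not Badboy's code)."""
    plan: list = field(default_factory=list)  # navigations still to try
    surveyed: bool = False

    def play(self, page_navigations):
        """Execute one navigation.

        Returns (navigation, loop_again), where loop_again tells the
        parent Step whether to jump back to its first item.
        """
        if not self.surveyed:
            # First play: survey the page and list its links and buttons.
            self.plan = list(page_navigations)
            self.surveyed = True
        if not self.plan:
            return None, False
        navigation = self.plan.pop(0)
        return navigation, bool(self.plan)


def play_step(spider, page_navigations):
    """Replay the Step while the Spider asks to loop."""
    executed = []
    while True:
        navigation, loop_again = spider.play(page_navigations)
        if navigation is not None:
            executed.append(navigation)
        if not loop_again:
            return executed  # the script carries on after the Spider


print(play_step(Spider(), ["Home", "About", "Submit"]))
```

Each call to play() corresponds to one play of the Spider item, so a page with three navigations causes the Step to run three times before the script moves on.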

Navigation Options

Sometimes you may not want to execute every navigable element on your page. Badboy lets you control what content is Spidered in several ways:

  • You can choose what to include: Links, Buttons or both
  • Mode - This is an advanced option that lets you choose whether to browse links by name (Navigation Mode) or by their URLs (Request Mode). For highly dynamic links, Navigation Mode is usually better, but for pages where many links have the same name, Request Mode may be necessary.
  • Which browser frames to include - you can choose a specific frame or you can just let Badboy browse all of them
  • Specific exclusions - you can provide a list of regular expressions which will filter out content that should not be browsed. For example, if you don't want Badboy to browse the "logout" link then you might enter "logout" here.
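
To see how a list of exclusion expressions filters content, the following sketch applies the idea with Python's re module. It is an assumption-laden illustration: the filter_navigations() function is hypothetical, and Badboy's own regular expression dialect and matching rules (case sensitivity, whole or partial match) may differ from what is shown here.

```python
import re


def filter_navigations(names, exclusions):
    """Drop every name that matches at least one exclusion pattern.

    Assumption for this sketch: a pattern excludes a name if it matches
    anywhere inside it, and matching is case sensitive.
    """
    patterns = [re.compile(p) for p in exclusions]
    return [n for n in names if not any(p.search(n) for p in patterns)]


links = ["home", "products", "logout", "admin/delete"]
print(filter_navigations(links, ["logout", r"^admin/"]))
```

Under these assumptions, a link named "Logout" with a capital letter would not be excluded by the pattern "logout", so it is worth checking the results of your exclusions the first time you play the Spider.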

Setting Assertions

You may like to configure Assertions for your Spider item so that you can check, as each page is Spidered, that your web site is working correctly. To add Assertions, place them before the Spider item in the script and configure them as cascading Assertions. Making them cascading Assertions ensures that they execute each time an item is Spidered. The figure below shows an example of how you might configure a cascading Assertion for a Spider item in your script.

Populating Forms

If your pages contain Forms then you may wish to ensure that they get populated with data so that Badboy can spider them correctly. To do this, just add Form Populators for any forms on the page prior to your Spider item. The figure below shows how this would appear:

Performing Actions on Spidered Pages

Badboy will automatically check for errors on the pages visited by the Spider, and you can also add cascading Assertions to check conditions as described above. Sometimes, however, you might want to perform more actions on each spidered page: execute Javascript, apply conditional logic, populate variables or other actions. To do this you can add children to the Spider Item itself: the Spider will execute its child items for every page that it visits. The figure below shows how this looks in your script:

Random Walking

By default the Spider Item creates a plan that it uses to make sure it tries every link on your page when it executes. This ensures that each navigation is covered in an orderly manner. If you like, however, you can tell the Spider item to just pick a navigation (link or button) to perform at random. In this case no plan is created, and each time the Spider plays it randomly picks a link or button on the current page to navigate. Note that if you use the "Loop Automatically" option in combination with Random Walking, Badboy may loop forever, because in Random Walk mode Badboy does not remember which links it has already traveled and may execute the same links over and over again. In that case it will only finish when it comes to a page where it cannot find any navigations to perform!
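
The reason a random walk may never finish can be seen in a short simulation. This is a sketch only: the random_walk() function and the site dictionary are made up for illustration, and the max_steps cap is a safety limit added for the sketch, not a Badboy setting.

```python
import random


def random_walk(site, start, max_steps, seed=None):
    """Follow randomly chosen navigations, keeping no record of past picks.

    site maps each page to the list of pages its links and buttons lead to.
    The walk stops by itself only on a page with no navigations; max_steps
    keeps this sketch from running forever.
    """
    rng = random.Random(seed)
    page = start
    path = [page]
    for _ in range(max_steps):
        navigations = site.get(page, [])
        if not navigations:
            break  # dead end: nothing left to navigate
        page = rng.choice(navigations)
        path.append(page)
    return path


# Every page links back to another page, so only the cap ends the walk.
looping_site = {"home": ["about", "contact"], "about": ["home"], "contact": ["home"]}
print(len(random_walk(looping_site, "home", max_steps=20, seed=1)))

# A page without navigations ends the walk naturally.
print(random_walk({"home": ["end"], "end": []}, "home", max_steps=20))
```

If you combine Random Walking with automatic looping, a similar limit in your own script (for example, a loop count on the Step) plays the role that max_steps plays here.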

Controlling Looping Yourself

In some situations you may prefer not to have Badboy jump to the previous step immediately after executing a Spider. For example, you might like to perform some other operations - Mouse Clicks, Variable Setters etc. In this case, just uncheck the "Loop Automatically" box in the Spider Item properties. This will cause the Spider to execute and then carry on with subsequent items in your script without looping. You can control looping yourself by setting a loop on the Step or using other features that change the script flow (for example, using Badboy's OLE seek() method).

Detecting Errors

If there are errors during Spidering (for example, due to broken links on your page), usually you will want to locate them and fix them. Badboy records the errors by adding them as red Responses under the Spider item. If you expand such responses then you can see detailed information about the actual navigation that caused the problem. You will also see an item created for you by Badboy which reproduces the problem. To reproduce the failure you can play this item manually, or copy or move it elsewhere to play it as part of your Script execution. The figure below shows how errors are captured during spidering:

Recursive Spidering

A Spider in its default configuration will test all the links and buttons on a single page. However, it will not descend into the child pages it visits to test the links and buttons found there. When a Spider descends more than one level into a website, this is known as "recursive spidering".

Badboy does not support infinite depth recursion. If you want to spider multiple levels, however, you can do so by adding a Spider Item as a child of another Spider Item. In this configuration Badboy will Spider each link on the top level page, and then for each visited page it will spider each link or button found there. When errors are found, instead of a single error being reported, Badboy will record all the navigations or requests that preceded the problem from the starting page (so, for example, if you have 2 levels of spiders, you will see 2 navigations recorded when a problem occurs). The diagram below shows how a 2 level recursive spider with some errors appears in a script:

It is important to understand that when a Spider is recursive it may end up back on a page it has previously visited. For example, if you start at the Home Page and then every child page from the Home Page has a link back to the home page then a recursive Spider may well find itself back on the Home Page again, and if it descends further it will end up re-testing the whole Home Page. This may not be a problem, but it will waste a lot of time. To help avoid this, you can configure Spider Items to keep track of which links have been visited and not revisit ones that have already been navigated. To do this, check the box in the Spider Item labelled "Filter Duplicates", as shown below:

The "Filter Duplicates" option will stop a Spider from revisiting the same URL multiple times. It will not stop the spider from navigating a link or button with the same name multiple times, if that link actually points to a different URL. When a Spider is a child of another Spider in the script it will check its parent Spiders and filter duplicate links that its parents have visited as well as links it has already visited itself.
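
The behaviour described above can be illustrated with a small model. This is a hedged sketch rather than Badboy's implementation: the DedupSpider class and its methods are hypothetical, and links are represented simply as (name, URL) pairs.

```python
class DedupSpider:
    """Toy model of a Spider with duplicate filtering (hypothetical)."""

    def __init__(self, parent=None):
        self.parent = parent    # enclosing Spider, if this one is nested
        self.visited = set()    # URLs this Spider has navigated itself

    def already_visited(self, url):
        """Check this Spider and then each of its parents in turn."""
        spider = self
        while spider is not None:
            if url in spider.visited:
                return True
            spider = spider.parent
        return False

    def select(self, links):
        """Return the (name, url) pairs to navigate and mark them visited."""
        chosen = []
        for name, url in links:
            if self.already_visited(url):
                continue  # duplicates are judged by URL, not by name
            self.visited.add(url)
            chosen.append((name, url))
        return chosen


top = DedupSpider()
top.select([("Home", "/"), ("News", "/news")])

child = DedupSpider(parent=top)
print(child.select([("Home", "/"), ("News", "/news/archive"), ("More", "/news/archive")]))
```

In this example the child skips "Home" because its parent already visited "/", still follows the second "News" link because it points to a different URL, and skips "More" because that URL was just visited.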

Although the "Filter Duplicates" option will try to stop your Spider from going in circles, it is not perfect. It cannot prevent the Spider from looping in cases where the URL to be navigated to is not clear from the element on the page: for example, if a link is handled dynamically using Javascript, if it is redirected, or if the loop occurs as a result of a button rather than a link. Hence you may still find it beneficial to add some Exclusions to prevent this from happening in the cases that you discover.


Badboy Documentation Generated on Mon Dec 29 22:28:42 EST 2008