Total Validator
HTML / XHTML / WCAG / Section 508 / CSS / Links / Spelling


Introduction

All of the options that appear on the Include tab of the Pro tool are described below.

Include tab


Skip

You may not always wish to check certain parts of your site. This option allows you to skip parts of the site by specifying one or more paths below the starting page to ignore.

When you click on the 'Skip' button a dialog appears to allow you to add, remove, and update a list of paths to skip.

The paths that you enter here must start with a '/'. Any pages that lie below the starting page and within the path specified here will be ignored.

For example if your starting page is http://www.mysite.com/some_path/index.html and you specify a path to skip of /other_path then pages that start with http://www.mysite.com/some_path/other_path will be skipped. Note that this includes pages such as http://www.mysite.com/some_path/other_path/index.html and http://www.mysite.com/some_path/other_path_also/index.html. If you just wish to skip http://www.mysite.com/some_path/other_path/index.html then specify a path to skip of /other_path/.

You can use this option in combination with the Include option to provide further restrictions on what to check. For example you could set a path to skip of '/' to skip everything except the starting page, and then use Include to specify exactly which paths should be validated. You can also use the 'robots.txt' option at the same time for further fine-grained selection of what to validate.

You can use regular expression syntax here, but you must always start with '/' as the first character and note that .* is always automatically added to the end of whatever you enter.
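
The behaviour described above can be sketched as follows. This is an illustration only, not Total Validator's actual code, and the function name is hypothetical: the skip path is anchored after the starting folder and .* is appended automatically.

```python
import re

def is_skipped(url, start_folder, skip_path):
    """Return True if url falls under skip_path relative to start_folder.

    Sketch of the documented behaviour: the skip path is treated as a
    regular expression with '.*' appended automatically.
    (Illustrative only; not a Total Validator API.)
    """
    pattern = re.compile(re.escape(start_folder) + skip_path + ".*")
    return bool(pattern.match(url))

start = "http://www.mysite.com/some_path"
# '/other_path' also matches '/other_path_also/...':
print(is_skipped(start + "/other_path/index.html", start, "/other_path"))        # True
print(is_skipped(start + "/other_path_also/index.html", start, "/other_path"))   # True
# A trailing '/' restricts the match to that folder only:
print(is_skipped(start + "/other_path_also/index.html", start, "/other_path/"))  # False
```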

Note: When using this with the Follow local option all skip paths are relative to the root of the website (or filesystem for local pages) instead of the starting page.


Include

If you specify some paths to skip, or use Disallow within your robots.txt file, you may wish to override this to include some paths within these areas of your website that you do wish to check. Note that it only makes sense for these include paths to be 'below' the paths to skip or 'below' paths disallowed in the robots.txt file.

When you click on the 'Include' button a dialog appears to allow you to add, remove, and update a list of paths to include.

The paths that you enter here must start with a '/' and be more than just a single '/'.

For example if your starting page is http://www.mysite.com/some_path/index.html and you specify a path to skip of /other_path/, and an include path of /other_path/sub_path/ then pages such as http://www.mysite.com/some_path/other_path/index.html will be skipped, but pages such as http://www.mysite.com/some_path/other_path/sub_path/index.html will be included (as long as you have a link to them from any pages that are validated).
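
The combined effect of skip and include paths in this example can be sketched as below, assuming that an include path simply overrides a matching skip path (the function name is hypothetical and the prefix matching is simplified; it ignores the regular-expression support described later):

```python
def should_validate(rel_path, skip_paths, include_paths):
    """Decide whether a path (relative to the starting folder) is validated.

    Sketch of the documented behaviour, assuming an include path
    overrides a matching skip path; not Total Validator's actual code.
    """
    included = any(rel_path.startswith(p) for p in include_paths)
    skipped = any(rel_path.startswith(p) for p in skip_paths)
    return included or not skipped

skips, includes = ["/other_path/"], ["/other_path/sub_path/"]
print(should_validate("/other_path/index.html", skips, includes))           # False
print(should_validate("/other_path/sub_path/index.html", skips, includes))  # True
```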

If you wish to skip everything except a single folder you could set a path to skip of '/' (to skip everything except the starting page), and then use 'Include' to specify exactly which paths below this should be validated. You can also use the 'robots.txt' option at the same time for further fine-grained selection of what to validate.

You can use regular expression syntax here, but you must always start with '/' as the first character and note that .* is always automatically added to the end of whatever you enter.

Note: When using this with the Follow local option all skip paths are relative to the root of the website (or filesystem for local pages) instead of the starting page.


Follow local

Normally when checking more than one page, only those pages that start with the URL of the folder containing the starting page will be validated. This excludes pages 'above' the starting page or in a different part of the website.

Selecting the Follow local option will tell the validation engine to validate any page it finds on the same website as the starting page.

Note that using this option significantly modifies how the skip, include, follow remote, and depth options operate. See the documentation for these options on this page for more information.


Follow remote

When checking more than one page, those pages that don't start with the URL of the folder containing the starting page will be ignored. This includes pages on remote sites, pages 'above' the starting page, and pages in different parts of the same website.

Selecting this option will cause the validator to ignore this restriction for the starting page only, and so it will visit all the pages linked from the starting page regardless of their URL. Effectively, each link on the starting page is treated as if it were the starting page itself.

As this applies to the starting page only, links on subsequent pages will be treated as normal and only links 'below' will be validated. But you can extend this further by also setting the Follow local option so that all links to all pages on all sites referenced on the starting page are validated.

Use this option with care otherwise you may end up checking far more pages than intended. It is expected that in most cases this option will be used with a specially constructed starting page that references different parts of the same website or different websites. See Validating multiple sites for further details.


Ignore case

The path component of a URL is case-sensitive, so mypage.html and MyPage.html are references to completely different web pages. However, because Windows uses a case-insensitive file system, many web authors mistakenly assume they point to the same page. This is a mistake and should be corrected: most web servers will generate broken links where it occurs. IIS, however, masks this problem, so if you use IIS you may not be aware of the issue.

This problem will make your site slower, affect your page rankings in search engines, and skew any web site analytics you run. It may also cause Total Validator to validate the same page more than once and/or miss pages altogether. As such it is in your interest to correct this error as soon as possible.

If you are running a server on Windows and suspect that you may have this problem, then look for duplicated pages in the results and use the 'ignore case' option. Using this option will cause Total Validator to ignore the case in URLs when deciding whether to follow links such as ../SubFolder/.. and ../subfolder/.. and you may find that more pages than before are now validated.
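
The effect of the option can be sketched like this (an illustration only, with a hypothetical function name; it assumes the whole URL is compared case-insensitively when deciding whether two links refer to the same page):

```python
def same_page(url_a, url_b, ignore_case=False):
    """Compare two URLs for 'same page', optionally ignoring case.

    With ignore_case on, ../SubFolder/.. and ../subfolder/.. are
    treated as one page. (Sketch only; not Total Validator's code.)
    """
    if ignore_case:
        return url_a.lower() == url_b.lower()
    return url_a == url_b

a = "http://www.mysite.com/SubFolder/page.html"
b = "http://www.mysite.com/subfolder/page.html"
print(same_page(a, b))                    # False
print(same_page(a, b, ignore_case=True))  # True
```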


Use robots.txt

An alternative way of specifying which parts of your site to skip is to add a standard robots.txt file to your website. Total Validator will use any rules marked for all user agents with a '*', as well as those specifically marked with a user agent of 'TotalValidator'. For example:

User-agent: *
Disallow: /blogs

User-agent: TotalValidator
Allow: /support/
Disallow: /support/resources/

Total Validator supports all of the features supported by Google including multiple 'Disallow:' and 'Allow:' statements in any order, wildcards and suffixes.

Note that paths in a robots.txt file are relative to the root of the site and not the starting page for validation unlike the 'Skip' and 'Include' options. The starting page itself will always be validated even if the robots.txt file disallows it. This option can also be used in combination with the 'Skip' and 'Include' options for fine-grained selection of what to validate.

Also note that when used with the 'Follow remote' option the robots.txt files for each of the sites on the starting page will be read and applied to that site only.
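
Google-style precedence, where the most specific (longest) matching rule wins and Allow wins a tie, can be sketched as below. This is an illustration only with a hypothetical function name; it uses plain prefix matching and omits the wildcard and suffix support mentioned above:

```python
def robots_allows(path, rules):
    """Apply Google-style precedence to a list of
    ('allow' | 'disallow', path_prefix) rules: the longest matching
    prefix wins; on a tie, 'allow' wins. No matching rule means the
    path is allowed. (Sketch only; wildcards are not handled here.)
    """
    best = ("allow", "")
    for kind, prefix in rules:
        if path.startswith(prefix) and len(prefix) >= len(best[1]):
            if len(prefix) > len(best[1]) or kind == "allow":
                best = (kind, prefix)
    return best[0] == "allow"

rules = [("allow", "/support/"), ("disallow", "/support/resources/")]
print(robots_allows("/support/faq.html", rules))             # True
print(robots_allows("/support/resources/guide.html", rules)) # False
print(robots_allows("/blogs/post.html", [("disallow", "/blogs")]))  # False
```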


Validate errors

If you select this option, then whenever your web server returns an error status code, such as 404 (page not found), the error page sent by the web server will be validated.

This is a useful way of checking that your error pages also conform to standards.


Ignore problems

If you use the tool and it reports 'errors' in your site that you are happy to live with, then use this option to stop them appearing. In this way you can clean up the reports produced to make them more useful to you. You can also ignore any errors/warnings that you believe are errors in the tool itself, although we would prefer that you let us know, so we can fix them and everyone benefits.

The value you supply must be a comma separated list of errors and/or warnings to ignore. For example:

E601, W600, E404, P861

Once you've seen how the errors/warnings are reported we are sure you'll understand what to put in here.

This option applies to all pages/css validated. If you need finer control then you can add special instructions to your pages/css instead.


Depth

This option allows you to restrict how many folders 'below' the starting page to check, based on the number of '/' characters in the link. So if you just wish to check the top few levels below the starting page then set how many here.

If you use this with the Follow local option, the depth is relative to the root of the website (or filesystem for local pages) instead of the starting page.

The value that you enter here must be an integer (whole number) greater than zero. Leave blank to visit all pages below the starting page.
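
The depth check based on counting '/' characters can be sketched like this. It is an illustration only, with a hypothetical function name, and the exact counting rule is an assumption based on the description above:

```python
def within_depth(rel_path, depth):
    """Return True if rel_path (relative to the starting folder) is at
    most `depth` folders down, counting '/' characters as described.
    A depth of None means visit all pages. (Sketch only.)
    """
    if depth is None:
        return True
    # '/a/b/page.html' has folders 'a' and 'b' -> 2 levels down
    return rel_path.count("/") - 1 <= depth

print(within_depth("/page.html", 1))      # True  (0 folders down)
print(within_depth("/a/page.html", 1))    # True  (1 folder down)
print(within_depth("/a/b/page.html", 1))  # False (2 folders down)
```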


Stop after problems

This option allows you to specify the maximum number of problems to be reported before the validation is automatically halted. This is especially useful on large sites, where the same problems may be reported again and again. Instead of waiting for the whole site to be validated you can fix these common problems after validating only a few pages and then validate the entire site.

The value that you enter here must be an integer (whole number) greater than 0. Leave it blank for this option to be ignored.

Note that the number of problems reported could be slightly higher than the figure you enter, as all problems for the last tag validated will be reported.


Stop after pages

This option allows you to specify the maximum number of pages with problems to be reported before the validation is automatically halted. This is an alternative to the 'Stop after problems' option, although both can be used at the same time if required.

The value that you enter here must be an integer (whole number) greater than 0. Leave it blank for this option to be ignored.


Page pause

If you wish to minimise the impact of validation requests on your server you can use this option to set the time in milliseconds to pause before retrieving each page. By pausing in this way the rate of requests hitting the server will be reduced. Normally this option is used together with the Link pause option.


Browser identification

When validating a website the tool identifies itself as 'TotalValidator/6.0' by default. If you wish the tool to identify itself as another user agent, then select the required identity from the drop-down list.

You can amend the list of identities and what they mean using the 'Edit List' button. This will display a dialog box for easy editing of the list of user agents and the corresponding text sent to the web server when the tool accesses it. You can generally find the user agent strings used by your website visitors by viewing your web logs. There are also online resources such as http://www.useragentstring.com

If you wish to return to the default list of identities then use the 'Reset' button provided on the edit screen. This also allows you to update the list with the identities provided in the most recent version following an upgrade.


Strip query

Some websites are constructed such that query parameters are dynamically added to links on their pages, so that the links are different each time the page is served. This is a problem for Total Validator, which treats these links as pointing to different pages because the URLs are different. This means that it will test the same page(s) again and again.

If this happens to you then use this option to prevent it. The links will then be stripped of all query parameters before being used. Note that this may mean that not all pages are checked, depending on how the query parameters are used.
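
The stripping step can be sketched with the standard library as below. This is an illustration only, with a hypothetical function name; it also drops any fragment, which is an assumption not stated above:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string (and, in this sketch, any fragment) from a
    URL before comparing links, as the 'Strip query' option does.
    (Illustrative only; not Total Validator's code.)
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(strip_query("http://thewebsite.com/page.html?session=123&t=456"))
# http://thewebsite.com/page.html
```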


Strip session

Some websites are constructed such that session ids are dynamically added to links on their pages. These session ids are typically appended to the end of the link, separated by a semicolon ';', like so:

http://thewebsite.com/path/page.html;jsession=123456

This can sometimes be a problem for Total Validator, which may view two links to the same page as referring to different pages because the URLs are different. This means that it may test the same page(s) again and again.

If this happens to you then use this option to prevent it. The links will then be stripped of the semicolon and everything following it, up to the start of any query parameters, or to the end of the URL if there are none.
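
The documented stripping rule can be sketched like this (an illustration only, with a hypothetical function name):

```python
def strip_session(url):
    """Remove a ';...' session id from a URL, keeping any query string:
    everything from the semicolon up to the '?' (or the end of the URL
    if there is no query) is dropped. (Sketch only.)
    """
    semi = url.find(";")
    if semi == -1:
        return url
    query = url.find("?", semi)
    return url[:semi] + (url[query:] if query != -1 else "")

print(strip_session("http://thewebsite.com/path/page.html;jsession=123456"))
# http://thewebsite.com/path/page.html
print(strip_session("http://thewebsite.com/page.html;jsession=1?lang=en"))
# http://thewebsite.com/page.html?lang=en
```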
