According to Search Engine Watch (2011), to achieve mass-scale rankings it is critical to understand the technical aspects involved when a search engine views a website. This includes detecting crawlability issues, finding 404 error pages, and using a robots.txt file.
Default 404 error pages are generated by servers and are cryptic to novice web users (Walter, 2008). The author points out that they give no explanation of why the error occurred and no other means of getting back on track. To help discover 404 errors, Google Webmaster Tools (GWT) provides diagnostics that list the 404 errors encountered by Google's crawler, along with the internal and external links pointing to each 404 page (Search Engine Watch, 2011). Rectifying 404 errors matters because dead-end links can frustrate visitors and cause them to leave the website.
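Beyond GWT's report, 404 errors can also be detected with a simple link checker. A minimal sketch in Python follows; the function names are illustrative, and in practice the list of URLs would come from crawling the site's pages.

```python
import urllib.error
import urllib.request


def link_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url; error statuses such as
    404 are reported as a code rather than raised as an exception."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def find_dead_links(urls):
    """Return the subset of urls that respond with 404 Not Found."""
    return [u for u in urls if link_status(u) == 404]
```

Running `find_dead_links` over a site's internal and external links reproduces, in miniature, the kind of 404 listing that GWT provides.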
A common solution to the problem is to create a custom error page that provides a clear description of the problem and offers ways to get visitors back on track. Best-practice guidelines for 404 error pages are provided below:
- The design should be consistent with the rest of the site.
- Include a clear message acknowledging the problem, indicating the possible cause (without admonishing your users) and offering solutions.
- Avoid technical jargon as many users may not know what a 404 error means.
- Include a link to the homepage.
- Include a link to a sitemap that lists all of the main sections of the site.
- Include an option to report a broken link (contact form link).
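Once such a custom page exists, the web server must be configured to return it in place of its default. A minimal sketch for Apache, assuming the custom page is saved as /404.html in the document root (nginx offers the equivalent directive `error_page 404 /404.html;`):

```apache
# .htaccess or server configuration: serve the branded page
# for 404 responses. The path /404.html is an assumption;
# adjust it to wherever the custom page actually lives.
ErrorDocument 404 /404.html
```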
To support this, Coyier (2009) suggests that a 404 page should look like an error page, yet still look like part of your website. The author describes that the page should apologise to the user (2), provide useful links (4, 5) and provide a way to contact the site or report the error (6). Furthermore, Walter (2008) underlines that users should never be auto-redirected to another page, as they will be unaware that an error occurred, leading to confusion.
Robots.txt file
A robots.txt file instructs web agents (also known as web bots, crawlers, or spiders) which parts of a website they may access. In other words, a robots.txt file controls which pages are included in, and which are excluded from, search engine indexes. Website owners use the file to give instructions about their site to web robots; this mechanism is known as the Robots Exclusion Protocol (Robotstxt.org, 2010).
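As an illustration, a minimal robots.txt placed at the site root might read as follows (the directory names are hypothetical):

```
# Rules for all web bots
User-agent: *
Disallow: /admin/
Disallow: /tmp/
```

Each Disallow line excludes the named path from crawling; an empty `Disallow:` value permits access to the whole site.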
According to this specification, web bots are required to look for a robots.txt file in a website's root directory before downloading anything else from the site; the file determines how the bot may access files in the site's other directories. Schrenk (2007) points out that the filename is case sensitive and must remain in lower case. Furthermore, each website should have only one robots.txt file.
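The bot's side of this protocol can be sketched with Python's standard-library `urllib.robotparser` module; the rules and URLs below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into rules a well-behaved bot
    consults before fetching any page from the site."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


# Hypothetical rules: all bots are excluded from /private/.
rules = build_parser("User-agent: *\nDisallow: /private/\n")
print(rules.can_fetch("ExampleBot", "https://example.com/private/page.html"))  # → False
print(rules.can_fetch("ExampleBot", "https://example.com/index.html"))         # → True
```

In a real crawler the file would be fetched from the site root first (e.g. with `RobotFileParser.set_url` and `read`), and any URL for which `can_fetch` returns False would be skipped.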