SEO Guidelines for Website Development

Content is defined as any or all of the following:

  • a href links
  • Images
  • Text
  • Relevant and useful content not requiring a click or scroll event

General Guidelines For All Pagetypes

Most general SEO guidelines are front-end requirements, but back-end allowances must be made to accomplish them. As such, any development effort should consider these requirements before coding begins.

Images & Video

  • Page speed, accessibility requirements, general UX/usability, and crawl/indexability are the four most important aspects when it comes to coding in a way that allows images and video to rank well in Google Search
    • All other SEO factors (meta tags, URL naming conventions, click paths, and breadcrumbs) are normally handled once the four main requirements listed above are addressed
  • All images on the site should:
    • Be hosted on the root domain, or at least a *.example.com subdomain
    • Meet Google's accessibility requirements
      • Description and alt text should be manually alterable
      • Title can be defined as a phonetic version of the image name
    • Have sizing adjusted prior to upload to properly fit the viewport
    • Keep file sizes as small as possible; ideally under 100kb, though exceptions exist
    • Not be scaled in HTML
      • Don't use a bigger image than necessary just because the width and height can be set in HTML
        • Example: a 500x250px image should be displayed on-page at 500x250px; if the design calls for 400x200px, upload a 400x200px file rather than scaling the larger image down in HTML
  • All videos on the site should:
    • Include structured data for video
    • Ideally be hosted on the site or through a third-party hosting service like Wistia, as this gives the brand more ownership over the content
      • YouTube is still a viable option; however, self-hosting the video allows additional SEO benefits
    • Include a transcription of the video (when applicable) within the page where the video is embedded, or separately near the video
  • Use native lazy loading for images and videos, and WebP formatting where possible (a sketch follows this list)
    • If using WebP, do not append URL parameters to the image URLs; instead, define size and add tracking separately from the URL itself
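A minimal sketch of native lazy loading with explicit dimensions (the file names and video ID are placeholders):

<!-- Image uploaded at its display size; loading="lazy" defers offscreen loading -->
<img src="/images/red-widget-500x250.webp" width="500" height="250" alt="Red widget on a white background" loading="lazy">

<!-- Embedded video player iframes support native lazy loading as well -->
<iframe src="https://www.youtube.com/embed/VIDEO_ID" width="560" height="315" title="Product demo video" loading="lazy"></iframe>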

Content Indexation - Rendering & Display

Generally speaking, all relevant and useful content on a page should render correctly with JavaScript disabled and be present in the HTML source document.

  • No tricks here: code the "core" content that loads with the page in HTML and CSS. Anything requiring an onclick or scroll event should call an API at that time, not on initial page load
    • Examples of post-page-load JS content include tooltips, shipping and location checks after user input, and the like
    • When personalization such as user-specific data (pricing, location-based content, personalized recommendations, etc.) is present, it should be served as a layer over the generalized content and can be handled client-side
      • The client-side JS overwrites the generic content in each zone with personalized content when personalized content is available (cookied users)
        • Bots and new users in this case see the same content, but cookied users get the personalized experience
  • All content-based widgets (product or content recommendation carousels, user-suggestion widgets, sales information, etc.) should have a "generic" or non-personalized version that is served to new users and bots
    • This content can be cached and served periodically (refreshed daily to weekly at most)
  • If the content is on every page with the same page template and a click or scroll action is needed to view it:
    • Ideally, use an AJAX-style call to a separate file that is loaded when the user clicks, not on initial page load (a sketch follows this list)
    • Note: Googlebot does not make click or scroll actions, so any content accessed this way will not be considered when ranking a page
      • This is why it's important to have all unique, value-adding content rendered server-side with HTML/CSS, and to use client-side JS to layer on personalization features
  • Try not to include this content in the HTML source or rendered DOM on initial page load; only add it to the DOM once a user clicks or scrolls to view it
    • If the API used to call this content is exposed on initial page load, Googlebot has a tendency to crawl it and fetch content that is not useful, which wastes crawl resources and can lead to the API endpoints being indexed and to ranking confusion
  • If the content loads on initial page load, server-side render it and include it in the initial HTML source document
  • Ensure the links, text content, and images in the HTML source document are the same as what is in the fully rendered DOM after initial page load for new users (non-cookied) and bots
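A minimal sketch of the click-triggered pattern described above; the endpoint and element IDs are hypothetical:

<div id="shipping-info"></div>
<button id="check-shipping">Check shipping</button>
<script>
  // The API is called only when the user clicks, never on initial page load,
  // so Googlebot (which does not click) never discovers or crawls the endpoint.
  document.getElementById('check-shipping').addEventListener('click', async () => {
    const res = await fetch('/api/shipping-estimate'); // hypothetical endpoint
    document.getElementById('shipping-info').innerHTML = await res.text();
  });
</script>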

SEO Meta Tags

All SEO meta tags should be server-side rendered, present in the HTML source, and not altered via JS.

With the exception of checkout pages and post-login landing pages, all URLs should have a self-referencing canonical tag. One other possible exception: if separate pages/URLs are built as affiliate landing pages and are skinned to look nearly identical to the homepage, those pages should canonical to the homepage.

 

A handy site audit checklist for SEO meta tags by pagetype: 

  • Place SEO meta tags as high in the <head> section as possible
  • Never overwrite SEO meta tags with JS; doing so can lead to search engine confusion and hurt rankings. Load them server-side and leave them as-is
  • Always use absolute URLs (no relative URLs); this is the URL you can copy and paste from the browser after the page fully loads
  • Never have a canonical tag point to a page that redirects or does not serve a 200 status code
  • Include the following on every page:

Canonical tag

  • Should be loaded as a core part of the HTML code and not loaded with JavaScript (or altered during the rendering process) so it can be properly understood by search engines
  • Canonicals should be either:
    • Self-referential (https://example.com should point to https://example.com)
    • Or pointing to a relevant alternative
      • This is decided on a case-by-case basis; however, some blanket rules apply in cases of pagination, faceted navigation, etc. (see next point)
  • Paginated canonicals should be self-referential and not point to the first page in the series (https://example.com/page-4/ should point to https://example.com/page-4/, not https://example.com/)
    • Exceptions exist if there is a “view all” page option available
      • “View all” pages should have a self-referencing canonical tag
  • Only one canonical should be on a page
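For example, the self-referencing canonical on https://example.com/page-4/ would be:

<link rel="canonical" href="https://example.com/page-4/">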

Title Tag

  • Should be loaded as a core part of the HTML code and not loaded with JavaScript so it can be properly rendered by search engines
  • Should be near the top of the <head>
  • Should only have one per page
  • Should be free of HTML entities
  • Should be included upon page load so search spiders can crawl upon page render
  • Should contain targeted keyword(s) for page, as well as any other important page-specific elements (ex: page topic, product line featured on page, etc.)
  • Should contain branding at the end (Title Tag Example | [new domain name])
  • Should average 60 characters in length

Meta description

  • Should be near the top of the <head>
  • Should only have one per page
  • Should be free of HTML entities
  • Should be included upon page load so search spiders can crawl upon page render
  • Should average roughly 150 characters
  • Should contain similar keywords to the title 
  • Should have some sort of call to action (CTA)
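A sketch of the two tags together; the keyword and branding shown are placeholders:

<title>Blue Running Shoes for Men | Example Brand</title>
<meta name="description" content="Shop lightweight blue running shoes for men. Free shipping on orders over $50. Browse the full collection today.">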

Hreflang

  • Ex. for https://www.example.com/page-url/:

<link rel="alternate" hreflang="en-us" href="https://www.example.com/page-url/">

<link rel="alternate" hreflang="en-ca" href="https://www.example.ca/page-url/">

<link rel="alternate" hreflang="x-default" href="https://www.example.com/page-url/">

  • Ex. for https://www.example.ca/page-url/ (the tag set is identical, per the reciprocity rule below):

<link rel="alternate" hreflang="en-ca" href="https://www.example.ca/page-url/">

<link rel="alternate" hreflang="en-us" href="https://www.example.com/page-url/">

<link rel="alternate" hreflang="x-default" href="https://www.example.com/page-url/">

  • Hreflang tags dedicated to the language/country of the page should match the canonical tag (EX: on https://www.example.com/, the canonical should be https://www.example.com/, as should the en-us hreflang; likewise, on https://www.example.ca/, the self-referencing tag is <link rel="alternate" href="https://www.example.ca/" hreflang="en-ca">). In other words, the self-referential hreflang tag should match the self-referential canonical tag
  • Hreflangs should include an "x-default" declaration pointing to the primary version of the site (in many cases, this is the English version of the website)
  • Hreflangs should only declare an ISO 3166 region code if the page is targeted to a specific region (ex: a Canadian-specific page targeting just users in Canada). Otherwise, no region should be declared
  • All hreflang should:
    • Only point to pages that exist / serve a 200 status code
    • Only point to the exact equivalent of the page's canonical (example.com/page-1/ should ONLY point to example.ca/page-1/, as opposed to example.ca/page-1/?query-parameter)
    • Reciprocate between designated language pages. Any language page pointing to another language page must have an hreflang attribute pointing back to it (EX: if the English page points to the Spanish page, the Spanish page should also point an hreflang back to that same English page)

Robots meta

  • Valid pages intended to be shown in search engine results should NOT have a “noindex, follow” or “noindex, nofollow” tag
    • IMPORTANT: Any time new changes are pushed, this should be checked to make sure all new content is indexable 
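On an indexable page, the robots meta tag should either be omitted entirely (search engines default to indexing) or read:

<meta name="robots" content="index, follow">

Pages intentionally excluded from search results would instead carry <meta name="robots" content="noindex, follow">.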

Tracking tags

  • Tracking tags (Google Tag Manager, Google Analytics, etc.) should be in the <head>
  • This should be checked with each update, as it's one of the elements most often left out of the process

Structured Data

  • Place JSON-LD schema in the <head> section if possible
  • If using microdata or RDFa, wrap the live on-page text for each data property whenever possible
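A minimal JSON-LD sketch for the video structured data mentioned earlier; all values are placeholders:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Product Demo",
  "description": "A two-minute walkthrough of the product.",
  "thumbnailUrl": "https://example.com/images/demo-thumbnail.jpg",
  "uploadDate": "2024-01-15",
  "contentUrl": "https://example.com/videos/demo.mp4"
}
</script>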

Pagination

Pagination URL example: https://www.example.com/page-url/?page=2

 

Append "Page [page number]: " to the title and H1 on all paginated pages. Ex: the title for https://www.example.com/page-url/?page=2 would be "Page 2: [title from page 1]"

 

Use the meta description from the first page for all paginated pages (building off the above examples, https://www.example.com/page-url/ is page one, and its meta description should be copied to all paginated pages). The example below puts these pagination tags together.
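The <head> of https://www.example.com/page-url/?page=2 would therefore contain (bracketed values copied from page one):

<title>Page 2: [title from page 1]</title>
<meta name="description" content="[meta description from page 1]">
<link rel="canonical" href="https://www.example.com/page-url/?page=2">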

 

Pagination recommendations vary widely from site to site and depend on a wide variety of factors. Google has also changed its recommendations and no longer uses rel="prev" and rel="next". The following generalized guidelines are recommended:

  • Article/content sections should be featured on the main content listing page, with links to articles within that section and the most recent articles for the category
    • From there, each category should utilize the standard page 2, page 3, page 4, etc. browsing format found across the web
  • Every paginated page should canonical to itself, and not to the main category page
    • Do not use infinite scroll techniques that require a scroll action. Google does not perform scroll actions and will not find the content without a separate solution also implemented for SEO purposes (a "view all" page version)
  • The more categories, the better, as long as each category has at least two pages of content to display to a user/bot
  • Include on each page all the standard meta tags documented elsewhere in this sheet

General <head> and <body> Guidelines

  • Ensure there are no iframes in the <head>
    • Iframes effectively close the <head> section for crawlers, and everything after the iframe will be ignored by search engines
  • Place SEO meta tags as high in the <head> section as possible

OnClick, Scroll Events & Noscript Tags

  • While the finer details are more nuanced, it's best to assume that any content found via an onclick or scroll event, or within a noscript tag, will not be properly indexed by Google. Ensure server-side rendering for all content that is relevant and useful to search engines

Content Within Tabbed Modules

The purpose of a tabbed module is to maximize text on a web page while sacrificing as little page space as possible. Tabbed modules can be constructed from HTML lists that are marked up with CSS or JavaScript, and they can double or triple the amount of text on a page without requiring any additional space.

  • If the content within the tabs is relevant and useful to searchers, ensure it is present in the HTML source on initial page load rather than loaded via an onclick event (see the sketch below)
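A sketch of a CSS-only tabbed module in which every panel's text remains in the HTML source; the class names and copy are illustrative:

<div class="tabs">
  <input type="radio" name="tab" id="tab-1" checked>
  <label for="tab-1">Details</label>
  <input type="radio" name="tab" id="tab-2">
  <label for="tab-2">Specs</label>
  <div class="panel panel-1">Product details text lives in the source…</div>
  <div class="panel panel-2">Spec text lives in the source as well…</div>
</div>
<style>
  .tabs .panel { display: none; }
  #tab-1:checked ~ .panel-1,
  #tab-2:checked ~ .panel-2 { display: block; }
</style>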

Handling Error / 404 URLs

  • Serve a 404 status code and the 404 page template directly on the landing URL; do not redirect to a separate 404 URL
  • Provide an HTTP 404 header error code for search engines
  • Provide the user with links to the home page or any other main pages of the site
  • Display the HTML sitemap or link to it
  • Display links to the site's most popular pages
  • Allow visitors to search site content

Print Versions of Articles/Pages/Documents and PDFs

Print versions of pages and PDFs have little (or even negative) SEO value and should be prevented from being crawled and indexed. To do so:

  • Apply a "noindex" directive to all PDFs and print versions; because these are not HTML pages, this is done via an X-Robots-Tag HTTP header on the file responses rather than a meta tag (see the example below)
  • Not required but helpful: store these documents in a separate, unique folder that can be blocked via robots.txt
    • Example: https://example.com/print/[file name here]
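A sketch of the response headers for a print version or PDF; the exact server configuration for adding the header will vary:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex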

Worth noting: any content that could be valuable to a user or search engine should be translated into page text within the article or page linking to it. The exception to this is when the content is duplicated from another source or site. When this is the case, rewrite the content and then post it on the article or page that links to it.

Subdomains

Any content you want to rank well and draw in new users should live on the main domain, not a subdomain. If the content is relevant to new users and should be found in Google Search, place it on the main domain; you can utilize subfolders on the main domain if necessary.

 

Login, My Account, and other shopflow-specific pages are often moved to subdomains for a number of reasons, and that is to be expected. However, these pages receive a large number of quality, value-driving backlinks from other websites, so to leverage that value, it's important that each of these pages links back to a small handful of key pages on the root domain. Examples include:

  • The homepage
  • The hub or beginning page for each shopping funnel
  • Key products or services (keep this to fewer than five)

 

There is a real concern that these links can pull would-be customers out of the sales funnel right before they make a purchase. As such, it's important that the links are present and visible, but not a dominant or overpowering part of the template.

Redirects

When a page is removed from the site that has relevance to people using search engines, there are a few scenarios to consider. 

  • Does the page have users landing on it daily, and there is no plan to republish it in the near future?
    • If so, 301 redirect to a similar page
      • Added enhancement if possible – serve a small top or bottom viewport-anchored message informing the user as to why they’ve been redirected
  • Does the page have backlinks from other websites that pass link value, and there is no plan to republish it in the near future?
    • If so, 301 redirect to the most relevant page on the website
  • Does the page have 0 backlinks, very low landing traffic, and there is no plan to republish it in the near future?
    • Do not redirect – serve a 404 on the landing URL
  • Will the page be republished periodically (examples being recurring seasonal sales, articles, or similar)?
    • 302 redirect to the most relevant page still serving a 200 status code
      • Added enhancement if possible – serve a small top or bottom viewport-anchored message informing the user as to why they’ve been redirected

Key SEO considerations when implementing redirects:

  • Organize redirects from most specific to least specific; this is especially relevant when utilizing a service worker for handling redirects, and it helps prevent redirect chains, which can bleed backlink value (see the sketch after this list)
    • Test using a redirect-path checker (such as a Chrome extension); there should be one redirect at most, or two if the first is a 307 http → https redirect that the back-end system does not allow bypassing
    • It's important to note the manual aspect of this process: the team needs to audit legacy backlinks pulled from Google Search Console and other SEO tools to determine whether old URLs inadvertently flow through a redirect chain when a new redirect is implemented
      • Any chains found require a shuffling of order in the redirect file
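A sketch of most-specific-first ordering inside a service worker that handles redirects; the routes shown are hypothetical:

// Rules are evaluated top to bottom, so the most specific path must come first.
const redirects = [
  ['/sale/summer/shoes/', '/sale/shoes/'], // specific rule first
  ['/sale/summer/', '/sale/'],             // broader prefix after
];

self.addEventListener('fetch', (event) => {
  const { pathname } = new URL(event.request.url);
  for (const [from, to] of redirects) {
    if (pathname.startsWith(from)) {
      // Redirect straight to the final destination to avoid chains.
      event.respondWith(Response.redirect(to, 301));
      return;
    }
  }
});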

Personalization

All pages need a "generic" experience (what bots and users with no cookie set receive) that is the "default" version of the page. Once data about the user is known via their cookie, the experience becomes "personalized." When this happens, client-side JS can override specific content within the content blocks that exist on both the "default"/"generic" version of the page and the "personalized" version.

  • The key to this is that the same zones (divs, spans, and other HTML) exist and show content to both user types, but when more is known via a user's cookie, the content can be overwritten via client-side JS (a sketch follows below)
    • That said, the same number of links and images should be present in both versions. One version cannot be blank while the other shows a wide variety of content offerings
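A minimal sketch of the overwrite pattern; the cookie name, endpoint, and zone ID are assumptions:

<!-- Server-rendered generic content that bots and new users see -->
<div id="recs-zone">
  <a href="/products/best-sellers/">Shop our best sellers</a>
</div>
<script>
  // Cookied users get the same zone overwritten with personalized content.
  if (document.cookie.includes('user_id=')) { // hypothetical cookie
    fetch('/api/personalized-recs')           // hypothetical endpoint
      .then((res) => res.text())
      .then((html) => {
        // Same zone, same number of links and images; only the content changes.
        document.getElementById('recs-zone').innerHTML = html;
      });
  }
</script>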

Sitemaps

  • Generate automated sitemaps by pagetype that are regularly updated. Once per day is preferred; once per week is the outer limit
  • This ensures all search-relevant URLs are being crawled by search engines on at least a semi-regular basis and the proper URL strings are being found and indexed
  • Create image sitemaps that update regularly. This will allow Google to find and index the website’s images. Image sitemaps also ensure search engines find images that may not be accessible to them otherwise (such as images that are displayed using certain JS or loading techniques)
  • Create a sitemap index file to host all sitemaps

Technical Requirements

URL/Page Sitemaps

  • Follow all of Google’s Sitemap Guidelines (under technical requirements)
  • Format all sitemaps as XML
  • Create a sitemap index file at example.com/sitemap-index.xml
    • Include both image and webpage sitemaps
      • List webpage sitemaps in the sitemap index first, then images, then video
  • Update sitemaps automatically and regularly (ideally daily, weekly at a minimum)
  • Keep the total URL count below 50k per sitemap and the overall file size below 50MB
  • Split by page silo and label accordingly for increased measurability and transparency
  • Use only static URLs
    • Given that, do not include:
      • Any URL parameters
      • Any pages that are blocked in robots.txt
      • Any page with a canonical URL that points to a different page
      • Any page that does not serve a 200 status code
        • For example, no 404s/403s, 301s/302s, or any pages that redirect or return error codes of any kind
      • Any page with a noindex and/or nofollow meta tag
      • Non-https URLs
  • Include last modified date if the data point is available
    • If possible, organize the sitemap so that the pages most recently modified are at the top of the sitemap
      • This would mean any content change on the page that a new user or bot would encounter
        • A new review, a spelling change in a product description, a change in which products the recommendation widgets serve, etc.
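A sketch of the sitemap index at example.com/sitemap-index.xml, with page sitemaps listed before the image sitemap; the file names are illustrative:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-articles.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-images.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>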

Image Sitemaps

For image sitemaps, include each URL where images are found and list all images found on that URL

  • Include only URLs that follow the above guidelines as well as Google's sitemap guidelines (under Technical Requirements)
  • Include image alt text if available
    • If not, include logic to bring alt text in when it becomes available; do not include an alt text field when alt text is not present
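A sketch of a single entry using the sitemaps.org image extension; the URLs are placeholders. Note the extension has no dedicated alt-text field: the optional <image:caption> element is the closest fit for alt-style text, though Google has deprecated the optional image tags, so descriptive text should primarily live in the page's HTML:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/products/red-widget/</loc>
    <image:image>
      <image:loc>https://example.com/images/red-widget-500x250.webp</image:loc>
      <image:caption>Red widget on a white background</image:caption>
    </image:image>
  </url>
</urlset>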

Technical Notes

Robots.txt

A robots.txt file is a restrictive file designed to tell search engine crawlers which parts of a website they cannot crawl. The robots.txt file can also inform search engines of the location of XML sitemaps, which help search engines quickly find all of the pages on your site, as well as new content as it is published.

 

Follow Google’s guidelines for robots.txt files whenever creating or updating a robots.txt file.

 

The SEO team should advise on what to include and what not to include in this file. Here are some examples of what can be specified in a robots.txt file:

 

  • References to any sitemaps or sitemap index files
    • Directive Example:
      • Sitemap: https://example.com/sitemap-index.xml
  • Blocking URLs that lead to confidential or personally identifiable information
    • Directive Example:
      • Disallow: /my-order/*
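Combined into a single file, with the /print/ folder from the section above also blocked (paths are illustrative):

User-agent: *
Disallow: /my-order/*
Disallow: /print/

Sitemap: https://example.com/sitemap-index.xml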

 

Many business owners put all of their focus into optimizing their website and content without taking the time to build a solid technical foundation.
