Single Page Applications (SPAs) have grown in popularity with developers of new web applications over the last couple of years. AngularJS, ReactJS, and BackboneJS are the most popular JavaScript (JS) frameworks for building SPAs. Beyond being a new toy for web developers to play with, the approach has real benefits: easier functionality, better-structured JavaScript applications, data binding, and clever client-side rendering. On a traditional website, users must wait until all of the resources and images have loaded before the page is displayed. With JS-based websites, pages are updated dynamically as the user interacts with the app, so resources, JavaScript, and images are loaded only when the user navigates to them. In theory, this sounds like a technology both SEOs and web developers can stand behind. In practice, that clever rendering is proving to be highly problematic. The main concern for SEO is that JS-based websites have a difficult time being indexed properly by Google and other search engines.

Before we address the issues, we first need to explain how Googlebot crawls JS websites. Googlebot does several things when it makes first contact with a website, but two of them concern us the most:

  • Googlebot starts with an initial request for the pre-DOM HTML (the website’s static HTML source code, the same code used by traditional HTML websites; you can view any website’s source code with “view page source” in your browser)
  • Googlebot then renders the post-DOM and the content is loaded (the dynamic HTML produced after JavaScript modifies the page based on the user’s interactions with the app)

Once Googlebot has the rendered content, it treats it like the traditional HTML source. This process gives Google two HTML versions of the page:

  1. pre-DOM HTML Source
  2. post-DOM rendered HTML
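To make the difference concrete, here is a minimal sketch of how the two versions diverge on a typical SPA. The element ID and markup are hypothetical, not taken from any particular framework:

```typescript
// Pre-DOM HTML source (what "view page source" shows) for a typical SPA:
// little more than an empty root element and a script tag, e.g.
//   <body><div id="app"></div><script src="/bundle.js"></script></body>
//
// Post-DOM rendered HTML: the bundle runs and fills that root with content.
// A minimal client-side render, sketched in TypeScript:
const appRoot = document.getElementById("app");
if (appRoot) {
  appRoot.innerHTML = `
    <h1>Product catalog</h1>
    <p>This heading and paragraph exist only after JavaScript executes,
       so they appear in the post-DOM HTML but not in the pre-DOM source.</p>
  `;
}
```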

These two HTML versions can potentially cause problems in how Googlebot interprets your website. Google’s JavaScript crawling is fairly new and is not as reliable as its HTML crawler. Google can index content rendered by JS, but sometimes it doesn’t. Google says it is “generally able to render and understand” JS. “Generally” is not all that reassuring; it indicates that Google can’t render and understand ALL JS-generated content, and it means Google is still HTML first.

Another factor that can stop Googlebot from crawling and understanding a JS website is time. If rendering takes too long or something doesn’t go as intended, Googlebot will either fall back to the pre-DOM HTML source of the page or leave the site without crawling it.

Common SEO Issues with JS Websites

1. Missing Page Meta Data and Crawl Controls

Googlebot will use the rendered version (the post-DOM HTML), but it may refer to the pre-DOM to fill in missing data. Information missing from one of the two versions can be highly problematic and a cause of URLs not being indexed. We recommend that both versions contain all the necessary information: meta data, page titles, crawl controls, etc. If Googlebot can’t read the JS information from the post-DOM, the pre-DOM is your backup. Most importantly, the pre- and post-DOM must contain identical data.
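One way to keep the post-DOM meta data in step with the static HTML is to set the same title, description, and robots values from the client-side code on every route change. The interface and function names below are illustrative, a sketch rather than a prescribed implementation:

```typescript
// Hypothetical route metadata kept in one place so the server-delivered
// (pre-DOM) head and the client-rendered (post-DOM) head stay identical.
interface PageMeta {
  title: string;
  description: string;
  robots: string; // e.g. "index,follow" or "noindex"
}

function applyMeta(meta: PageMeta): void {
  document.title = meta.title;

  const setMetaTag = (name: string, content: string): void => {
    let tag = document.querySelector<HTMLMetaElement>(`meta[name="${name}"]`);
    if (!tag) {
      tag = document.createElement("meta");
      tag.name = name;
      document.head.appendChild(tag);
    }
    tag.content = content;
  };

  setMetaTag("description", meta.description);
  setMetaTag("robots", meta.robots);
}

// Example: called whenever the SPA navigates to the products page.
applyMeta({
  title: "Blue Widgets | Example Store",
  description: "Browse our range of blue widgets.",
  robots: "index,follow",
});
```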

2. Inconsistent Versions

Googlebot will also refer to the pre-DOM when it encounters inconsistencies, and if the two versions don’t match up, that can lead to indexing issues. Meta content, page titles, crawl controls, etc. should mirror one another exactly. We can’t rely on Google to crawl, render, and index all content from a JS page; keep in mind at all times that Google is still HTML first. We found that Google will try to render and index information found in the post-DOM, but if it cannot, it will look to the pre-DOM version. You must also consider that Google is not the only bot. Bing, Facebook, Twitter, LinkedIn, DuckDuckGo, and various other SEO and web developer tools can’t crawl JavaScript. Keeping the pre- and post-DOM HTML identical helps ensure information is crawled correctly when Googlebot fails to read the post-DOM HTML and when second-tier bots can’t crawl JS websites.
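A rough way to spot-check the two versions, sketched below, is to fetch the raw HTML for the current URL and compare its title and meta description against what the rendered DOM reports. The function name and log messages are illustrative; run it from the browser console on the page you want to audit:

```typescript
// Rough consistency check: compare the pre-DOM (raw HTML response) with the
// post-DOM (what the browser has rendered) for title and meta description.
async function compareVersions(): Promise<void> {
  const response = await fetch(window.location.href);
  const rawHtml = await response.text();
  const preDom = new DOMParser().parseFromString(rawHtml, "text/html");

  const preTitle = preDom.title;
  const postTitle = document.title;

  const preDesc = preDom
    .querySelector<HTMLMetaElement>('meta[name="description"]')?.content ?? "(missing)";
  const postDesc = document
    .querySelector<HTMLMetaElement>('meta[name="description"]')?.content ?? "(missing)";

  console.log(preTitle === postTitle
    ? "Title matches in both versions."
    : `Title mismatch: pre-DOM "${preTitle}" vs post-DOM "${postTitle}"`);

  console.log(preDesc === postDesc
    ? "Meta description matches in both versions."
    : `Description mismatch: pre-DOM "${preDesc}" vs post-DOM "${postDesc}"`);
}

void compareVersions();
```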

3. Clean URLs

Every page needs a unique, indexable URL if you want it indexed by search engines. A pushState call on its own will not create an indexable URL; each URL must also be served directly by the server and return a 200 OK response.
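As an illustration (the route names are hypothetical), the SPA should push real paths rather than fragment URLs, and the server must be configured to answer each of those paths directly with a 200 OK:

```typescript
// Hypothetical SPA navigation that produces clean, indexable URLs.
// Each path pushed here must ALSO be served directly by the server with a
// 200 OK response, or search engines cannot index it.
function navigateTo(path: string, render: () => void): void {
  // Good: /products/blue-widget   Bad: /#/products/blue-widget
  history.pushState({}, "", path);
  render();
}

// Usage: navigating to a product page without a full reload.
navigateTo("/products/blue-widget", () => {
  // ...render the product view into the app root...
});
```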

4. Typical HTML Attributes

Continue using typical HTML attributes, such as putting links in href attributes and images in src attributes. Doing so helps Google recognize the internal linking structure and access the internal links and images throughout the website.
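For example, an internal link should be a real anchor with an href that the SPA intercepts on click, rather than a click handler on a div that hides the destination from crawlers. The element names and paths below are illustrative:

```typescript
// Crawlable internal link: a real <a href="..."> that bots can follow,
// which the SPA intercepts on click to handle navigation itself.
const link = document.createElement("a");
link.href = "/products/blue-widget"; // destination visible to crawlers
link.textContent = "Blue Widget";
link.addEventListener("click", (event) => {
  event.preventDefault(); // SPA takes over navigation
  history.pushState({}, "", link.href);
  // ...render the product view...
});
document.body.appendChild(link);

// Crawlable image: a plain src attribute, not a URL injected only at runtime.
const img = document.createElement("img");
img.src = "/images/blue-widget.jpg";
img.alt = "Blue widget";
document.body.appendChild(img);
```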


Consider these issues carefully before choosing a JS-based website for your business. Deploying a JS website without following and understanding HTML standards can lead to search engines failing to index your website or cause rankings to plummet.