How to Implement Prerendering with React

Disclaimer: dirty hack here.

I recently finished implementing the front end for my music website. I chose to use ReactJS. However, when I wanted to improve my website’s SEO, I ran into a major issue that is inherent to React and other single-page app frameworks.

The Problem

In a nutshell, a React app rendered in the browser is poorly suited to SEO. The reason lies in how the page is rendered with JavaScript. Let’s go back in time to understand why.

About ten years ago, server-side rendering was the standard way to build websites. It worked this way:

  1. a client makes an HTTP request, for example, https://musiroom.com/tops/all/all
  2. then, the website’s server prepares the HTML page with all its content in advance, running the necessary database calls and business logic. The page is rendered on the server side.
  3. the client receives an HTML page with most of the relevant information already on the page (of course, some data could still be fetched with Javascript, but as a rule, it represented a small part of the process).

Let’s compare this to what happens with a single-page app:

  1. the client makes the same request.
  2. the server sends a static HTML page containing almost nothing, plus some JavaScript.
  3. only afterward does the JavaScript load the rest of the information. This is called client-side rendering.

Now this sounds great and certainly brings many benefits, both for performance and for UX. But here is the problem: Google’s bots (or other tools that need your page) usually stop at step 2. JavaScript is a pain for crawlers, even though some of them have started to support client-side rendering.
The consequence is that such bots see nothing but a nearly empty HTML page, which of course is terrible for SEO.
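
To make the difference concrete, here is a small sketch of what a crawler that does not execute JavaScript can read from each kind of response. The two HTML documents are invented for illustration (they are not MusiRoom’s real markup), and the text extraction is deliberately naive.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects the text a non-JavaScript crawler could read from raw HTML."""

    SKIPPED = {"script", "style"}  # inline code is not readable content

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical responses for the same URL (markup invented for this example).
SPA_SHELL = """<!DOCTYPE html>
<html><head><title>MusiRoom</title></head>
<body><div id="root"></div><script src="/static/js/main.js"></script></body></html>"""

SERVER_RENDERED = """<!DOCTYPE html>
<html><head><title>MusiRoom</title></head>
<body><div id="root"><h1>Top albums</h1><p>Example album</p></div></body></html>"""

print(repr(visible_text(SPA_SHELL)))        # only the page title survives
print(repr(visible_text(SERVER_RENDERED)))  # title plus the actual content
```

With the single-page shell, the only readable text is the title; everything else would appear only after the script has run.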

Solutions

With this problem in mind, I started to explore possible solutions. Here are the two main ones:

  • convert my React app to server-side rendering. That would have meant either doing it manually (e.g. by creating a NodeJS app that renders the content) or using frameworks such as Next.js or Gatsby (which is, by the way, the framework I’ve used for this blog)
  • use pre-made tools that prerender the web page for web crawlers such as Google’s bots. One very popular service for that is prerender.io.

These solutions are perfectly fine and I encourage you to use them if you think they fit your needs. Unfortunately, that was not the case for me: migrating to SSR would have taken too long, and prerender.io was too expensive for me.

That’s why I decided to implement prerendering on my own.
Let’s see how.

Prerendering architecture

Here is what our architecture will look like. We want to do the following:

  • we do not want to change anything for users who are not bots. This means we have a reverse proxy (like Nginx or Apache) forwarding client requests to static HTML/CSS/JS content from React’s build.
  • we will introduce a new rule for bots. Instead of being served the static build like any other user, the bot’s request will be forwarded to our ‘prerendering app’. This app will fetch the content from the React bundle, load the JavaScript, and return the rendered HTML page to the bot.
  • because the JavaScript loading might take some time (for me it is around 3 seconds), we also store each result in a cache, so that we don’t need to render the page again on every request. This is also critical for SEO because a page shouldn’t take too long to load.

Now you might see a caveat with this system. If the results of the prerendering app are cached, bots may receive an outdated version of the page until the cache entry expires. Thus, this implementation will only work if you are okay with this tradeoff. For MusiRoom, it was acceptable for search engines to index my pages even if they were a bit outdated, because most of the content was static.
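
The staleness tradeoff can be illustrated with a tiny stand-alone sketch. This is not the cache used later in the article, just a minimal in-memory cache with an expiry time; the `TTLCache` class, the one-hour timeout, and the fake clock are all assumptions made for the example.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the example can fake time
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the (possibly outdated) copy
        value = compute()  # expired or missing: do the slow work again
        self._entries[key] = (now + self.ttl, value)
        return value


# Simulated run: the page changes, but bots keep the old render until expiry.
fake_now = [0.0]
page = {"html": "<h1>v1</h1>"}


def render():
    # Stand-in for the slow headless-browser render
    return page["html"]


cache = TTLCache(ttl=3600, clock=lambda: fake_now[0])

first = cache.get_or_compute("/tops/all/all", render)
page["html"] = "<h1>v2</h1>"   # the site is updated
fake_now[0] = 1800.0           # 30 minutes later: entry not expired yet
stale = cache.get_or_compute("/tops/all/all", render)
fake_now[0] = 3601.0           # just over an hour later: entry has expired
fresh = cache.get_or_compute("/tops/all/all", render)

print(first, stale, fresh)
```

The middle call still returns the old page, which is exactly the window during which a bot would index outdated content. A shorter timeout narrows that window at the cost of more renders.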

Now let’s build it!

Let’s build it!

In summary, we will need:

  • a prerendering app that queries the frontend and caches the result. We will use a simple Flask app with a file-based cache, but you can always make it more complex if you have other requirements.
  • a rule in the reverse proxy (webserver) to forward incoming requests from bots. We will use Nginx as a reverse proxy.

Prerendering app with Flask + cache

import os
import time

from flask import Flask
from flask_caching import Cache

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

config = {
    "CACHE_TYPE": "FileSystemCache",  # Flask-Caching related configs
    "CACHE_DEFAULT_TIMEOUT": int(os.environ.get("CACHE_DEFAULT_TIMEOUT", "3600")),
    "CACHE_DIR": "/tmp",
}

app = Flask(__name__)
app.config.from_mapping(config)
cache = Cache(app)

BASE_URL = os.environ.get("BASE_URL")  # URL of the React frontend
DEFAULT_WAIT = int(os.environ.get("DEFAULT_WAIT", "2"))  # seconds to let JavaScript render

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
@cache.memoize()
def prerender(path):
    options = Options()
    # Run Firefox without a window (the options.headless setter is gone in recent Selenium 4 releases)
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(f"{BASE_URL}/{path}")
        # implicitly_wait only applies to element lookups, so pause explicitly instead
        time.sleep(DEFAULT_WAIT)
        html = driver.page_source
    finally:
        # quit() ends the browser process; close() would only close the window
        driver.quit()
    return html

if __name__ == "__main__":
    app.run()

This simple Flask app responds to any endpoint and does two things:

  • first, it queries the React app’s URL corresponding to the endpoint, using a headless browser (the “driver” variable). This headless browser works like a usual browser, so it will not only fetch the page but also run its JavaScript. After driver.get returns, we pause for DEFAULT_WAIT seconds to give the JavaScript time to render the content. A fixed pause is crude but simple; Selenium’s implicitly_wait would not help here, because it only affects element lookups. Finally, the HTML corresponding to the fully loaded version is returned
  • by using cache.memoize as a route decorator, we store this result in our local cache, keyed by the path argument, which makes the next calls to the same endpoint much quicker. See Flask-Caching’s documentation. In this example, the cache is stored in the filesystem. You might want to use a more robust solution such as Redis or Memcached in your production settings.

Selenium installation can be a bit tricky depending on your OS version. In the last section of this article, you will find the source code for this app, with a Dockerfile helping you install Selenium properly.

Nginx redirection

The server block for your Nginx configuration will look like this:

server {
    server_name mywebsite.com;

    location / {

        if ($http_user_agent ~* (google|yahoo|qwant|bing|yandex|facebook|linkedin|twitter|telegram|slackbot|seo) ) {
            proxy_pass http://<prerendering_url>;
            break;
        }

        proxy_pass http://<frontend_url>;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }
}

Here is what it means:

  • the if statement detects whether the client is a bot. Well-behaved bots identify themselves with a recognizable “User-Agent” header. Here, we look for the names of some services (Google, Facebook, LinkedIn…) anywhere in that header, ignoring case, and forward matching requests to the prerendering app that we just created
  • in any other case (i.e. a normal user), the request is forwarded to the React app
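
Before deploying the rule, it can help to check which User-Agent strings it would catch. The sketch below mirrors the Nginx condition in Python, assuming the same keyword list; `is_bot` and `upstream_for` are hypothetical helper names, and the two sample strings are typical Googlebot and desktop Firefox values.

```python
import re

# Same keywords as the Nginx rule; re.IGNORECASE mirrors the `~*` operator,
# and search() matches anywhere in the header, as Nginx does.
BOT_PATTERN = re.compile(
    r"google|yahoo|qwant|bing|yandex|facebook|linkedin|twitter|telegram|slackbot|seo",
    re.IGNORECASE,
)


def is_bot(user_agent):
    """Return True when the User-Agent would be routed to the prerendering app."""
    return bool(BOT_PATTERN.search(user_agent or ""))


def upstream_for(user_agent):
    """Name the upstream the reverse proxy would pick for this client."""
    return "prerender" if is_bot(user_agent) else "frontend"


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
FIREFOX = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

print(upstream_for(GOOGLEBOT))
print(upstream_for(FIREFOX))
```

Because the match is a plain substring search, any client whose User-Agent happens to contain one of these words is treated as a bot, so it is worth testing the strings you actually see in your access logs.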

Final code

You can find the full code for this Flask app at https://github.com/Jedyle/flask_spa_prerendering
