Downtime happens 😓 How we cope with outages.

published on 27 September 2021

Downtime is harmful for any SaaS, especially when the SaaS is responsible for public data, such as a website.

Unicorn Platform hosts your website. It is critically important for us to keep your website's uptime at 100%. If you cannot be sure your website is always accessible, how can you trust us?

Unfortunately, 100% uptime is hardly achievable. Downtime happens.

This post covers the main causes of downtime, why they happen, and what we do to minimize the downtime of your website.

DDoS

We are regularly DDoSed.

DDoS (distributed denial-of-service) is a flooding attack. A wannabe Neo sends thousands of requests per second and overloads the server. The unpleasant thing about DDoS is that it comes from distributed sources: hundreds of IPs worldwide send flooding requests, and it is hard to block them all.

Perhaps attackers DDoS us for fun or to test our servers' stability.

There are numerous ways we protect our servers from DDoS. I will not disclose all our tricks because that would make DDoS attacks more effective.

  • The "Do not put all your eggs in one basket" rule helps us segment DDoS attacks. We serve clients' websites from a growing network of independent servers, so when one website is attacked, only a portion of all Unicorn Platform websites is affected. We add a separate server for every ~200 new websites.
  • We have an easy way to manually find the pattern of an attack. Each DDoS has a pattern. We have a simple tool that lets us quickly adapt to a new attack pattern and filter out unwanted requests.
  • Cloudflare is an efficient tool for filtering out DDoS requests and we often use it, but we do not rely on it completely.
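To make the first two points more concrete, here is a minimal sketch in Python. It is not our production tooling. The sequential assignment of websites to servers, the `Request` fields, and the substring-based pattern format are all assumptions made for illustration; only the ~200 websites per server figure comes from this post.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

SITES_PER_SERVER = 200  # approximate figure from the post


def server_for_site(site_index: int) -> int:
    """Hypothetical sequential assignment: sites 0-199 go to server 0, 200-399 to server 1, ..."""
    return site_index // SITES_PER_SERVER


def affected_fraction(total_sites: int, attacked_site: int) -> float:
    """Share of all websites that live on the same server as the attacked one."""
    target = server_for_site(attacked_site)
    same_server = sum(1 for i in range(total_sites) if server_for_site(i) == target)
    return same_server / total_sites


@dataclass(frozen=True)
class Request:
    # Illustrative fields only; a real filter may look at other attributes.
    ip: str
    path: str
    user_agent: str


def matches_pattern(req: Request, pattern: Dict[str, str]) -> bool:
    """True if every field named in the pattern contains the given substring."""
    return all(value in getattr(req, field) for field, value in pattern.items())


def filter_requests(requests: Iterable[Request], pattern: Dict[str, str]) -> List[Request]:
    """Keep only the requests that do NOT match the attack pattern."""
    return [r for r in requests if not matches_pattern(r, pattern)]
```

Under these assumptions, with 1,000 hosted websites an attack on one of them touches the 200 websites that share its server, which is 20% of the total. The more servers there are, the smaller that share becomes.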

Internal reasons

Often an outage is the consequence of a human mistake.

Since Unicorn Platform launched in 2018, we have shipped hundreds of releases, and some of them took your website down.

Each time we failed, we learned something new. Each time we learned, we updated either the infrastructure or our internal rules to improve stability:

  • Only one person in the company (Alex, the CEO) has access to the "deploy" button.
  • We have a test server - a complete copy of the production server. Each update includes a testing stage on this clone. Because the two environments are identical, potential bugs are easier to spot and fix.
  • We follow a manual post-deploy testing process. We keep a list of live websites that we check visually after each deploy to see if anything went wrong.
  • Side tasks (such as making screenshots of pages) were moved to a separate server.
  • We implemented an automatic error reporting tool. Each error - even a non-critical one - is investigated immediately. We believe a small error may become a big one, so we fix it before anything bad arises.
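A post-deploy check like the one in the list above can be partly scripted. The sketch below is an illustration, not our actual process (which is manual and visual): the function names and the "anything other than HTTP 200 is a failure" rule are assumptions. It uses only the Python standard library.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the site cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return 0


def failed_sites(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Check every URL and return only those that did not answer with HTTP 200."""
    results = {url: fetch(url) for url in urls}
    return {url: status for url, status in results.items() if status != 200}
```

Passing the `fetch` function as a parameter keeps the check easy to test without touching the network. A script like this can only tell that a page responds. It cannot tell that the page looks right, so it would complement a visual check rather than replace it.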

Under-the-hood updates

We are constantly improving the stability of our infrastructure. This includes not only automatic tests, pre-checks and post-checks, but also error notification systems and a manual checking process.

This under-the-hood work is a common reason why we delay the release of product updates. We believe in steady, constant growth, and steady growth is not possible without continuous work on stability.

Should I trust Unicorn Platform?

We will be honest with you, as always. Unicorn Platform is less stable than major players such as Webflow or Squarespace. We have less experience in managing attacks and less DevOps expertise on our team.

Before I started Unicorn Platform, I knew there would be challenges. One of the biggest is keeping the service highly stable.

Since day 1, the top consultant in the niche - Kostja - has been working with me. Kostja is the founder of a DevOps-as-a-service company named Appliku. He helps me design the server infrastructure effectively, predict the most common problems and solve potential future challenges.

I believe that with the knowledge gained from the mistakes I made and with the help of the consultant, we will boost Unicorn Platform's uptime from 99.85% to 99.99% in the upcoming year 2022! 🙂
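To put those two percentages in perspective, here is a small calculation. It assumes a 365-day year and that uptime is measured over the whole year; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into the downtime it allows per year, in minutes."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


current = downtime_minutes_per_year(99.85)  # about 788 minutes, roughly 13 hours
target = downtime_minutes_per_year(99.99)   # about 53 minutes
```

In other words, moving from 99.85% to 99.99% means going from roughly 13 hours of downtime per year to under one hour.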
