With such almost-astronomical traffic, one can’t help but wonder how WhatsApp works - its system design, server architecture, technology . How does it handle so many concurrent users and messages? What kind of frameworks and programming languages enable that kind of scale? How do they keep all that data secure? So many questions!
In this article, we are going to take a deep dive into WhatsApp’s architecture and system design. We’ll answer all the above-mentioned questions and more. If you’ve ever wondered about the top dog in the chat app world, keep reading.
Disclaimer: We scoured the internet to collect every resource on WhatsApp architecture design and have compiled and summarized it here. To the best of our knowledge, this information is accurate. However, as companies do update their tech stack frequently, this information is subject to change.
WhatsApp Front-End Tech Stack
Let’s start with the frontend and work our way to the hardware on the backend.
The first part of the WhatsApp system design that a user interacts with is the mobile or web app. WhatsApp supports nearly all platforms. It has an iOS app, Android app, desktop app, web app, and Windows Phone app. Up until 2017, you could even use WhatsApp on a BlackBerry.
With so many supported platforms, you may have guessed that WhatsApp would be a hybrid app. But, in fact, it’s not. They actually built a native app for each platform. Here's a list of all the supported platforms with the front-end language(s) that were used to build each one:
Windows Phone: C#
Mac Desktop app: Swift/Objective-C
PC Desktop app: C/C#/Java
How WhatsApp Stores Chat Locally
In addition to the programming language itself, another important technology that WhatsApp uses on the frontend is an SQLite database. SQLite is a stand-alone, self-contained, relational database that is meant to be embedded into applications—which means it lives on your device. WhatsApp uses it to store conversations. Since it would be a waste of resources to download all the messages from the cloud every time you open the app, WhatsApp chooses to store the messages locally. In fact, WhatsApp only stores messages until they are received at which point they get removed.
Which Messaging Protocols Does WhatsApp Use?
WhatsApp uses a highly modified version of XMPP on an Ejabberd server (more on that later) to communicate with the clients.
The XMPP on the client opens an SSL socket to the WhatsApp servers. All the sent messages are queued on the servers until the client opens or reconnects to this socket to retrieve the messages. Once a message is successfully retrieved by the client, a success status is sent back to the WhatsApp server. The server then forwards this status to the original sender; letting them know that the message was received by adding the “checkmark” icon next to the successfully sent message.
Keep in mind that, while XMPP is one of the most popular messaging protocols for chat apps, it is definitely not the only option for choosing a messaging protocol.
WhatsApp Encryption Technology
WhatsApp uses end-to-end encryption. Ideally, this means that only the original sender and the true recipient of the message can read the message in plain text.
When you send a message, it gets encrypted using a specific encryption protocol (more on that next). WhatsApp then stores this encrypted message on their servers until it’s delivered to the recipient. Upon delivery, the recipient's device decrypts the message back into a readable, plaintext message using a unique cryptographic key. Across this entire process, WhatsApp never knows the content of your message.
WhatsApp’s encryption technology is called Signal Encryption Protocol, which was developed by Open System Whispers to be a modern, open-source, strong encryption protocol for asynchronous messaging systems.
To the best of our knowledge, the current WhatsApp back-end system design looks like this:
Erlang is the main programming language
FreeBSD is the operating system
Ejabberd is the XMPP application server
BEAM is the Erlang-based virtual machine
Mnesia is their Erlang-based database
YAWS is their multimedia web server
Let’s explore some of the more interesting aspects of WhatsApp’s back-end architecture:
WhatsApp's choice of programming language is in large part what allows it to work on such a colossal scale.
Erlang is a functional programming language that is oriented towards building concurrent, scalable, and reliable systems. It uses a process-based model called the “actor model” in which small, isolated processes communicate with each other through messages. These processes can create new processes, send messages and modify their state in response to receiving messages.
Its process-based property gives Erlang its extremely high concurrency, scalability, and reliability.
These processes can also communicate with processes outside of the core on which it runs. This makes it easy to scale the system horizontally (by adding more machines) or vertically (by adding more cores). Lastly, since the processes can communicate with each other and, more importantly, restart each other, it’s easy to build self-healing systems. If a bug crashes a process, another process can restart it.
An interesting technical choice by WhatsApp's founders was picking FreeBSD as an operating system instead of a more widely used system (like Linux).
“Linux is a beast of complexity. FreeBSD has the advantage of being a single distribution with an extraordinarily good ports collection.”
Also, when it comes to raw performance, especially in regards to system load per packet, no other operating system can beat FreeBSD.
However, when it comes down to it, the real reason that they decided to use FreeBSD is probably because both co-founders had a long history of working with it at Yahoo!.
Ejabberd is an open-source XMPP server that is written in Erlang. WhatsApp uses a modified version of XMPP as its protocol for handling message delivery. Even the Ejabberd server that WhatsApp uses is heavily customized to optimize for server performance.
What’s the purpose of Ejabberd?
Well, it handles the message routing, deliverability, and general instant messaging aspects of the app. Features of Ejabberd include:
Storing and forwarding offline messages
Contact list and presence
To store data and temporary messages, WhatsApp uses an Erlang-based, distributed DBMS (Database Management System) called Mnesia. This DBMS provides benefits that many traditional databases don’t such as:
Real-time key/value lookup
High fault tolerance
Mnesia is also the only DBMS that’s written in Erlang. This in itself is a benefit because there are no data structure differences between Erlang in the application and Erlang in the DBMS. Coding is, therefore, quicker and more explicit.
BEAM, short for “Bogdan’s Erlang Abstract Machine”, is a virtual machine that compiles and executes Erlang source code. The BEAM is designed specifically for highly concurrent applications - perfect for WhatsApp’s use case. BEAM’s secret sauce is light-weight processes that don’t share memory and are managed by schedulers. These schedulers can manage millions of processes across multiple cores. This makes BEAM highly scalable and resistant to failures, such as those caused by high traffic loads, system updates, and network outages.
BEAM is so crucial to the WhatsApp system design that the WhatsApp team has published many patches and fixes to the core source code.
YAWS (Yet Another Web Server) is an Erlang-based web server that's ideal for dynamic content. WhatsApp uses YAWS for storing multimedia data. YAWS itself uses HTML5 WebSockets that simplify two-way communication by establishing a reliable and fast connection between the server and the app. Through the use of this technology, WhatsApp is able to send and receive multimedia data across billions of devices—in near real time.
WhatsApp Hardware Components
In 2017, four years after being acquired by Facebook, WhatsApp was taken off of IBM SoftLayer’s cloud and brought into Facebook’s proprietary data centers.
What we do know is that in 2014 WhatsApp required around 550 servers and over 11,000 cores that ran Erlang. We also know that WhatsApp’s user base was "only" around half a billion in 2014 compared to the more than 2 billion users it reached in 2020. So, with that data in mind, we'll let you imagine how many servers and cores WhatsApp now requires. We imagine it's a lot.
WhatsApp Architecture Diagram
The easiest way to get a full understanding of WhatsApp’s architecture design is, of course, through a WhatsApp architecture diagram.
Starting from the left side we have multiple different clients (mobile and web apps), each of which hosts a local SQLite database for storing conversations.
The clients use HTTP WebSockets to send and retrieve multimedia data like images and videos from the YAWS web server. But, as you can see, XMPP is used to actually send those files and other messages to other users.
When an XMPP message is sent, it goes through the series of steps depicted above. First, it gets sent to WhatsApp’s custom Ejabberd server which runs on BEAM and FreeBSD. The Ejabberd server saves the message in a Mnesia database table where it gets put into a queue. When the receiving user opens the app, thereby reconnecting to the socket, the message in the queue gets routed through the Ejabberd server and delivered to the recipient. Once successful delivery can be confirmed, the message gets deleted from the Mnesia database.
While we don’t know the exact specifications of WhatsApp’s technical architecture and system design, we can get a good idea based on the technologies that WhatsApp employs. We hope this article, exploring the WhatsApp architecture design, has answered your burning questions. Now that you've gained an understanding of how the WhatsApp server works, learned what the WhatsApp tech stack looks like, and even scanned a WhatsApp architecture diagram...maybe you're feeling empowered to take on a chat app project of your own.
Cosette Cressler is a passionate content marketer specializing in SaaS, technology, careers, productivity, entrepreneurship and self-development. She helps grow businesses of all sizes by creating consistent, digestible content that captures attention and drives action.