1.1 Introduction
Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to Paxos and is equally efficient, but its structure is quite different. Raft was designed to be more understandable and to provide a better foundation for building practical systems. To achieve this, Raft separates the key elements of consensus (leader election, log replication, safety) to reduce the number of possible states.
Raft is similar in many ways to existing consensus algorithms, but it has several novel features:
- Strong leader: Raft uses a stronger form of leadership than other consensus algorithms, which simplifies management of the replicated log and makes Raft easier to understand.
- Leader election: Raft uses randomized timers, added on top of the existing heartbeat mechanism, to elect leaders. This adds only a small amount of overhead in exchange for rapid and simple conflict resolution.
- Reconfiguration: So that the cluster can continue operating normally during membership changes, Raft uses a joint consensus approach in which the majorities of two different configurations overlap during transitions.
1.2 Consensus algorithm
Raft implements consensus by first electing a distinguished leader. The leader accepts log entries from clients and has complete responsibility for managing the replicated log. Given this leader-based approach, Raft decomposes the consensus problem into three relatively independent sub-problems (leader election, log replication, safety), which are discussed in the following sections.
A Raft cluster of n servers can tolerate up to ⌊(n−1)/2⌋ server failures, where n is typically odd (for example, a five-server cluster tolerates two failures). At any given time each server is in one of three states: leader, follower, or candidate. Under normal circumstances there is exactly one leader and all of the other servers are followers, which redirect client requests to the leader. Followers issue no requests on their own but simply respond to requests from leaders and candidates. The candidate state is used to elect a new leader, as described in the following sections.
Raft divides time into terms of arbitrary length, analogous to epochs in Paxos. Terms are numbered with consecutive integers. Each term begins with a leader election. If a candidate wins the election, it serves as leader for the rest of the term. In the case of a split vote between candidates, the term ends with no leader, and a new term (with a new election) follows. Raft ensures that there is at most one leader in a given term. Each server stores a current term number, which increases monotonically over time. If a candidate or leader discovers that its term is out of date, it immediately reverts to follower state.
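The term-comparison rule above can be sketched as follows. This is a minimal illustration, not from any particular implementation; the class and method names are assumptions.

```python
# Sketch of Raft's term rule: any RPC carrying a newer term forces the
# receiving server to update its current term and revert to follower.
FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"

class Server:
    def __init__(self):
        self.current_term = 0
        self.state = FOLLOWER

    def observe_term(self, rpc_term):
        """Returns True if the server discovered its term was stale."""
        if rpc_term > self.current_term:
            self.current_term = rpc_term
            self.state = FOLLOWER
            return True
        return False
```

A leader at term 3 that sees an RPC with term 5 immediately steps down and adopts term 5; an RPC with an older term leaves it unchanged.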

1.3 Leader Election
When servers start up, they begin as followers. Leaders send periodic heartbeats to all followers in order to maintain their authority. If a follower receives no communication over a period of time, called the election timeout, it assumes there is no viable leader and begins an election to choose a new one. To begin an election, a follower increments its current term and transitions to candidate state. It then votes for itself and requests votes from the other servers in the cluster. Three outcomes are possible.
- The candidate wins the election: A candidate becomes leader if it receives votes from a majority of the servers. Each server will vote for the candidate whose request was received first. The majority rule ensures that at most one candidate can win the election for a particular term. Once a candidate wins an election, it becomes leader and then sends heartbeat messages to all other servers to establish its authority and prevent new elections.
- Another server establishes itself as leader: While waiting for votes, a candidate may receive a message from another server claiming to be leader. If that leader's term is at least as large as the candidate's current term, the candidate recognizes the server as the legitimate leader and returns to follower state. Otherwise, the candidate rejects the message and continues in candidate state.
- A period of time goes by with no winner: If many followers become candidates at the same time, it is possible that no candidate obtains a majority (a split vote). In this case, each candidate times out and starts a new election by incrementing its term. To prevent split votes from repeating indefinitely, Raft uses election timeouts chosen randomly from a fixed interval.
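The two key mechanisms above can be sketched briefly: a randomized timeout to break split-vote ties, and a strict-majority test for winning an election. The interval bounds and helper names are illustrative assumptions (the paper suggests timeouts on the order of 150–300 ms, but the exact values are a deployment choice).

```python
import random

def election_timeout(low_ms=150, high_ms=300):
    """Randomized election timeout; different servers time out at
    different moments, so repeated split votes become unlikely."""
    return random.uniform(low_ms, high_ms)

def wins_election(votes_received, cluster_size):
    """A candidate becomes leader only with a strict majority of votes
    (its own vote included in votes_received)."""
    return votes_received > cluster_size // 2
```

In a five-server cluster, three votes win the election and two do not, which is what guarantees at most one leader per term.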
1.4 Log replication
The leader appends the client's command to its log as a new entry, then sends it to the other servers to replicate. Once the entry has been safely replicated (stored on a majority of servers), the leader applies it to its state machine and returns the result of that execution to the client. If followers crash or run slowly, the leader retries the request indefinitely until all followers eventually store all log entries. The leader decides when it is safe to apply a log entry to the state machines; such an entry is called committed. Raft satisfies the following properties, which together constitute the Log Matching Property:
- If two entries in different logs have the same index and term, then they store the same command.
- If two entries in different logs have the same index and term, then the logs are identical in all preceding entries.

The leader handles inconsistencies by forcing the followers' logs to duplicate its own: conflicting entries in follower logs are overwritten with entries from the leader's log. The leader maintains a nextIndex for each follower, which is the index of the next log entry the leader will send to that follower.
As the log grows, it occupies more space and takes longer to replay. Unnecessary information therefore needs to be discarded; Raft achieves this with a simple snapshotting technique.
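The snapshotting idea can be sketched as: capture the state machine up to the last applied entry, record the index and term of that entry (needed for the consistency check on the first entry after the snapshot), and discard the covered log prefix. The field and function names here are assumptions for illustration.

```python
def take_snapshot(state_machine_state, log, last_applied):
    """Compact the log: snapshot covers entries 1..last_applied.

    log: list of (term, command) pairs, index i stored at log[i-1].
    Returns (snapshot, remaining_log).
    """
    snapshot = {
        # Index/term of the last entry the snapshot replaces; kept so
        # later AppendEntries consistency checks still work.
        "last_included_index": last_applied,
        "last_included_term": log[last_applied - 1][0],
        "state": state_machine_state,
    }
    remaining = log[last_applied:]
    return snapshot, remaining
```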
1.5 Safety
The Raft algorithm restricts which servers may be elected as leader. The restriction ensures that the leader for any given term contains all of the entries committed in previous terms.
Raft uses a simple approach which guarantees that all the committed entries from previous terms are present on each new leader from the moment of its election, without the need to transfer those entries to the leader. This means that log entries only flow in one direction, from leaders to followers, and leaders never overwrite existing entries in their logs.
If a leader crashes before committing an entry, future leaders will attempt to finish replicating it. However, a leader cannot immediately conclude that an entry from a previous term is committed even once it is stored on a majority of servers; Raft only commits entries from the leader's current term by counting replicas, which indirectly commits all preceding entries as well.
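The commitment rule above can be sketched as follows: the leader advances its commit index only to entries that are replicated on a majority and that belong to its own current term. The data layout (`matchIndex` per follower, entries as `(term, command)` pairs) follows the paper's state, but the function itself is an illustrative assumption.

```python
def advance_commit_index(log, match_index, current_term, commit_index):
    """Return the highest index the leader may safely commit.

    log: list of (term, command), index i stored at log[i-1].
    match_index: highest replicated index per follower; the leader
    always holds its own entries, hence the implicit +1 below.
    """
    cluster_size = len(match_index) + 1  # followers plus the leader
    for n in range(len(log), commit_index, -1):
        replicas = 1 + sum(1 for m in match_index if m >= n)
        # Count-based commitment is only allowed for entries from the
        # leader's current term (the rule discussed above).
        if replicas > cluster_size // 2 and log[n - 1][0] == current_term:
            return n
    return commit_index
```

Note how an entry from an older term is never committed directly, even when a majority stores it; it becomes committed only once a current-term entry after it commits.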

1.6 Reconfiguration
Raft was designed to allow configuration changes online, without shutting down the whole system. To ensure safety, Raft uses a two-phase approach for reconfiguration. In the first phase the system switches to a transitional configuration called joint consensus. Once the joint consensus has been committed, the system then transitions to the new configuration. The joint consensus combines both the old and new configurations:
- Log entries are replicated to all servers in both configurations.
- Any server from either configuration may serve as leader.
- Agreement (for elections and entry commitment) requires separate majorities from both the old and new configurations.
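The last property is the essential one and can be sketched directly: during joint consensus, a vote or a log entry is only accepted once separate majorities of the old and new configurations agree. Server ids and helper names here are illustrative assumptions.

```python
def majority(voters, config):
    """Strict majority of a single configuration (sets of server ids)."""
    return len(voters & config) > len(config) // 2

def joint_agreement(voters, c_old, c_new):
    """During C_old,new, agreement requires a majority in *both*
    configurations independently; neither can decide alone."""
    return majority(voters, c_old) and majority(voters, c_new)
```

For example, with `c_old = {1, 2, 3}` and `c_new = {3, 4, 5}`, the voters `{1, 2}` satisfy neither requirement alone, and `{1, 2, 4}` satisfies only the old majority, so no decision is possible without overlap across both groups.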

The aforementioned properties allow the system to continue serving client requests while reconfiguring. This approach raises three main issues.
- New servers may not initially store any log entries. If they are added in this state, it could take a while for them to catch up, which could lead to a period where committing new log entries might not be possible. In order to avoid that, Raft introduces an additional phase before the configuration change, in which the new servers join the cluster as non-voting members.
- The leader may not be part of the new configuration. In this case, the leader steps down to follower state once it has committed the Cnew log entry. This means there is a period of time during which the leader manages a cluster of servers that does not include itself: it replicates log entries but excludes itself from majorities.
- Removed servers can disrupt the cluster. These servers will not receive heartbeats, so they will time out and start new elections. They will then become candidates with new term numbers, and this will cause the current leader to revert to follower state. The result is constant leader elections and poor availability. To prevent this problem, servers ignore vote requests received within the minimum election timeout of last hearing from a current leader.
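This disruption guard can be sketched in a few lines: a server refuses to grant a vote while it believes a viable leader exists. The timeout constant and parameter names are illustrative assumptions.

```python
MIN_ELECTION_TIMEOUT_MS = 150  # illustrative lower bound of the timeout range

def should_grant_vote(now_ms, last_leader_contact_ms, log_up_to_date):
    """Ignore vote requests while a current leader is presumed viable,
    i.e. within the minimum election timeout of its last heartbeat."""
    if now_ms - last_leader_contact_ms < MIN_ELECTION_TIMEOUT_MS:
        return False
    # Otherwise apply the normal rule: vote only for candidates whose
    # log is at least as up to date as ours.
    return log_up_to_date
```

With this rule, a removed server's repeated vote requests cannot depose a leader that is still heartbeating the cluster.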