# Raft **Repository Path**: SavenNeer/raft ## Basic Information - **Project Name**: Raft - **Description**: A Fault-Tolerant Key/Value Storage System. - **Primary Language**: Unknown - **License**: GPL-3.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2023-02-01 - **Last Updated**: 2024-07-06 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Raft Design Notes ## What is Raft ? * 可容错的键值存储系统(A fault-tolerant key/value storage system) * 复制状态机协议(A replicated state machine protocol) * A replicated service achieves fault tolerance by storing complete copies of its state (i.e., data) on multiple replica servers. 在多个副本服务器上,一个副本服务是通过存储其状态的完全拷贝副本实现的 ## Design - 2A ### Goal 2A Task: Implement Raft by adding code to ``raft/raft.go`` * Implement Raft leader election and heartbeats (AppendEntries RPCs with no log entries). * The goal for Part 2A is for a single leader to be elected, for the leader to remain the leader if there are no failures, and for a new leader to take over if the old leader fails or if packets to/from the old leader are lost. Lab-2A部分的目标是选举出单个领导人leader.如果选举成功则leader正常工作.如果选举失败或者旧leader的网络通信出现错误,新的leader应当接管. * Run go test -run 2A to test your 2A code. ### Hint For 2A * You can't easily run your Raft implementation directly; instead you should run it by way of the tester, i.e. go test -run 2A. 你不能简单直接地运行你的raft程序,而是需要通过tester程序提供的(网络底层)程序测试.例如,你可以运行``go test -run 2A`` * Follow the paper's Figure 2. At this point you care about sending and receiving RequestVote RPCs, the Rules for Servers that relate to elections, and the State related to leader election, 你需要参考论文中的Figure-2图.此时,你需要关心的是:Vote的发送与接收的RPC功能,服务器选举相关的规则,以及leader选举的相关状态. * Add the Figure 2 state for leader election to the Raft struct in raft.go. You'll also need to define a struct to hold information about each log entry. 将Figure-2中有关leader选举的信息添加到Raft结构体中.你也需要定义一个结构体用来保存有关每个log entry的信息. * Fill in the RequestVoteArgs and RequestVoteReply structs. Modify Make() to create a background goroutine that will kick off leader election periodically by sending out RequestVote RPCs when it hasn't heard from another peer for a while. This way a peer will learn who is the leader, if there is already a leader, or become the leader itself. Implement the RequestVote() RPC handler so that servers will vote for one another. 填充``RequestVoteArgs``和``RequestVoteReply``两个结构体.修改``Make()``函数以创建一个后台goroutine,用于一段时间没有收到其他节点的消息时,通过发送RequestVote RPC定期启动领导人选举。 * To implement heartbeats, define an AppendEntries RPC struct (though you may not need all the arguments yet), and have the leader send them out periodically. Write an AppendEntries RPC handler method that resets the election timeout so that other servers don't step forward as leaders when one has already been elected. 为了实现心跳(机制),请定义一个AppendEntries RPC结构(尽管您可能还不需要所有参数),并让leader定期发送它们.编写一个AppendEntries RPC处理方法,该方法可以重置选举timeout,以便其他服务器在已有服务器当选的情况下不会作为leader继续运行 * Make sure the election timeouts in different peers don't always fire at the same time, or else all peers will vote only for themselves and no one will become the leader. The tester requires that the leader send heartbeat RPCs no more than ten times per second. 确保选举timeout在不同的节点上不会总是同时启动,或者其它所有的节点都只给它们自己投票而导致没有节点成为leader.tester要求leader每秒发送的心跳RPC不超过10次 * The tester requires your Raft to elect a new leader within five seconds of the failure of the old leader (if a majority of peers can still communicate). Remember, however, that leader election may require multiple rounds in case of a split vote (which can happen if packets are lost or if candidates unluckily choose the same random backoff times). You must pick election timeouts (and thus heartbeat intervals) that are short enough that it's very likely that an election will complete in less than five seconds even if it requires multiple rounds. 测试程序要求你的Raft必须在旧leader出错后的5秒内选举出一个新的leader(如果绝大多数的节点都能够通讯的情况下).然而,请记住,在投票分裂的情况下,leader选举可能需要多轮投票(这种情况可能发生在网络包丢失或者candidates不幸地随机到了两个相同的backoff时间).你必须选择足够短的选举timeout时间(以及心跳间隔时间),在即使需要多轮投票的前提下,选举能够在5秒内完成. * The paper's Section 5.2 mentions election timeouts in the range of 150 to 300 milliseconds. Such a range only makes sense if the leader sends heartbeats considerably more often than once per 150 milliseconds. Because the tester limits you to 10 heartbeats per second, you will have to use an election timeout larger than the paper's 150 to 300 milliseconds, but not too large, because then you may fail to elect a leader within five seconds. 论文的5.2部分提到了选举timeout需要控制在150-300ms之间.只有当leader发送心跳的频率远高于每150毫秒一次时,这样的范围才有意义.考虑到测试仪限制你每秒心跳10次,所以你必须使用大于论文150到300毫秒的选举timeout时间,但不要太大,因为这样你可能无法在5秒内选出一位领导人. * You may find Go's rand useful. 你可能会注意到Go的rand随机包很有用. * You'll need to write code that takes actions periodically or after delays in time. The easiest way to do this is to create a goroutine with a loop that calls time.Sleep(). Don't use Go's time.Timer or time.Ticker, which are difficult to use correctly. 你需要编写代码,以便定期或在延迟后执行操作.最简单的方法是创建一个循环调用``time.Sleep()``的goroutine.不要使用Go的``time.Timer``或``time.Ticker``,它们很难被正确使用. * Read this advice about locking and structure.If your code has trouble passing the tests, read the paper's Figure 2 again; the full logic for leader election is spread over multiple parts of the figure. 请阅读``raft-locking.txt``和``raft-structure.txt``中的相关建议.如果你的代码在通过测试的时候出现问题,请再次阅读论文的Figure-2的内容.leader选举的全部逻辑分布在图的多个部分. * Don't forget to implement GetState(). 不要忘记实现``GetState()``函数 * The tester calls your Raft's rf.Kill() when it is permanently shutting down an instance. You can check whether Kill() has been called using rf.killed(). You may want to do this in all loops, to avoid having dead Raft instances print confusing messages. 测试程序在永久关闭实例时调用Raft的``rf.Kill()``.你可以使用``rf.killed()``检查是否调用了``Kill()``.你可能希望在所有循环中都这样做,以避免已经消除的Raft实例打印出令人困惑的消息. * A good way to debug your code is to insert print statements when a peer sends or receives a message, and collect the output in a file with go test -run 2A > out. Then, by studying the trace of messages in the out file, you can identify where your implementation deviates from the desired protocol. You might find DPrintf in util.go useful to turn printing on and off as you debug different problems. 调试代码的一个好方法是在节点发送或接收消息时插入print语句,并使用``go test -run 2A > out``将输出收集到文件中.然后,通过研究输出文件中的trace跟踪消息,你可以定位与期望协议存在出入的位置.您可能会发现``util.go``中的``DPrintf``函数,它在调试不同问题时打开和关闭print功能非常有用 * Go RPC sends only struct fields whose names start with capital letters. Sub-structures must also have capitalized field names (e.g. fields of log records in an array). The labgob package will warn you about this; don't ignore the warnings. Go的RPC只发送名称以大写字母开头的结构字段.子结构也必须具有大写的字段名称(例如数组中的日志记录字段).labgob包会警告你这一点;不要忽视警告. * Check your code with go test -race, and fix any races it reports. 使用``go test -race``检查代码,并修复它报告的任何竞争错误.