
A Complete Guide to Distributed Systems: Core Practices for High-Availability Architecture Design

廖万里 · 21 hours ago · Study Notes
"A distributed system is a collection of computers that communicate and coordinate over a network while presenting itself to the outside world as a single system. Understanding the principles and design of distributed systems is an essential skill for building large-scale, highly available applications."

1. Fundamental Theory of Distributed Systems

The CAP Theorem

The CAP theorem states that a distributed system can provide at most two of the following three guarantees at the same time:

- Consistency: all nodes see the same data at the same time
- Availability: every request receives a response (success or failure)
- Partition tolerance: the system keeps operating despite network partitions

Because network partitions are unavoidable in a distributed system, practical designs must trade consistency against availability.
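As a toy illustration (not part of any real system; all names here are invented for the example), the trade-off can be sketched in code: during a partition, a CP-style read refuses to answer rather than risk staleness, while an AP-style read always answers but may return stale data.

```java
import java.util.Optional;

// A replica cut off from the primary during a network partition
class Replica {
    private String value = "v1";
    private boolean partitioned = false;

    void partition() { partitioned = true; }

    // CP choice: refuse to answer rather than risk returning stale data
    Optional<String> readCP() {
        return partitioned ? Optional.empty() : Optional.of(value);
    }

    // AP choice: always answer, even though the value may be stale
    String readAP() {
        return value;
    }
}

public class CapDemo {
    public static void main(String[] args) {
        Replica r = new Replica();
        r.partition();
        System.out.println(r.readCP().isPresent()); // false: availability sacrificed
        System.out.println(r.readAP());             // "v1": consistency sacrificed
    }
}
```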

The BASE Theory

BASE extends the AP side of the CAP trade-off:

- Basically Available: the system may sacrifice part of its availability when failures occur
- Soft State: the system is allowed to pass through intermediate states
- Eventually Consistent: after some period of time, all replicas converge to the same value
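A minimal sketch of eventual consistency, assuming replicas that apply updates from a shared ordered log at their own pace (the types here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Each replica applies entries from a shared ordered log at its own pace
class EventualReplica {
    private int applied = 0;    // how many log entries this replica has applied
    private String state = "";  // last applied value

    void catchUp(List<String> log) {
        while (applied < log.size()) {
            state = log.get(applied); // apply the next update, in order
            applied++;
        }
    }

    String state() { return state; }
}

public class BaseDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        EventualReplica a = new EventualReplica();
        EventualReplica b = new EventualReplica();

        log.add("x=1");
        a.catchUp(log); // a is up to date, b lags behind (soft state)
        System.out.println(a.state().equals(b.state())); // false: temporarily inconsistent

        b.catchUp(log); // b eventually catches up
        System.out.println(a.state().equals(b.state())); // true: converged
    }
}
```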

2. Consensus Protocols

The Paxos Algorithm

Paxos, proposed by Leslie Lamport, is the classic distributed consensus algorithm. It proceeds in two phases, Prepare and Accept, and uses proposal numbers to guarantee that the participants eventually agree on a single value.
// Paxos pseudocode (simplified single-decree version)
class Proposer {
    private int proposalNumber;
    
    public void propose(Object value) {
        int majority = acceptors.size() / 2 + 1;
        
        // Phase 1: Prepare. Collect promises; adopt the value of the
        // highest-numbered proposal any acceptor has already accepted.
        int promiseCount = 0;
        int highestAcceptedNumber = -1;
        for (Acceptor acceptor : acceptors) {
            Promise promise = acceptor.prepare(proposalNumber);
            if (promise == null) {
                continue;  // this acceptor rejected the prepare
            }
            promiseCount++;
            if (promise.hasAcceptedValue()
                    && promise.getAcceptedNumber() > highestAcceptedNumber) {
                highestAcceptedNumber = promise.getAcceptedNumber();
                value = promise.getAcceptedValue();
            }
        }
        if (promiseCount < majority) {
            return;  // no majority of promises; retry with a higher proposal number
        }
        
        // Phase 2: Accept
        int acceptedCount = 0;
        for (Acceptor acceptor : acceptors) {
            if (acceptor.accept(proposalNumber, value)) {
                acceptedCount++;
            }
        }
        
        if (acceptedCount >= majority) {
            notifyLearners(value);  // the value is chosen
        }
    }
}

class Acceptor {
    private int promisedNumber = -1;
    private int acceptedNumber = -1;
    private Object acceptedValue = null;
    
    public synchronized Promise prepare(int proposalNumber) {
        if (proposalNumber > promisedNumber) {
            promisedNumber = proposalNumber;
            // Promise to ignore lower-numbered proposals, and report
            // any value this acceptor has already accepted
            return new Promise(acceptedNumber, acceptedValue);
        }
        return null;  // reject: a higher-numbered proposal was already promised
    }
    
    public synchronized boolean accept(int proposalNumber, Object value) {
        if (proposalNumber >= promisedNumber) {
            promisedNumber = proposalNumber;
            acceptedNumber = proposalNumber;
            acceptedValue = value;
            return true;
        }
        return false;
    }
}

The Raft Algorithm

Raft is a consensus algorithm designed to be easier to understand than Paxos; it achieves consistency through leader election and log replication.
// Raft node state (simplified sketch; assumes the sync and time imports)
type State int

const (
    Follower State = iota
    Candidate
    Leader
)

type Node struct {
    mu          sync.Mutex
    id          int
    peers       []*Peer
    state       State
    currentTerm int
    votedFor    int
    log         []LogEntry
    commitIndex int
    
    // Election timeout
    electionTimeout time.Duration
    lastHeartbeat   time.Time
    
    // Leader-only replication state
    nextIndex  []int
    matchIndex []int
}

// Leader election
func (n *Node) startElection() {
    n.mu.Lock()
    n.state = Candidate
    n.currentTerm++
    n.votedFor = n.id
    term := n.currentTerm
    lastIndex := len(n.log) - 1
    lastTerm := n.log[lastIndex].Term
    n.mu.Unlock()
    
    votes := 1 // vote for ourselves
    for _, peer := range n.peers {
        go func(peer *Peer) {
            resp := peer.RequestVote(term, n.id, lastIndex, lastTerm)
            if resp.VoteGranted {
                n.mu.Lock()
                defer n.mu.Unlock()
                votes++ // guarded by the mutex to avoid a data race
                // Majority of the whole cluster (peers plus ourselves)
                if n.state == Candidate && votes > (len(n.peers)+1)/2 {
                    n.becomeLeader()
                }
            }
        }(peer)
    }
}

// Log replication
func (n *Node) appendEntries(entry LogEntry) {
    n.mu.Lock()
    n.log = append(n.log, entry)
    n.mu.Unlock()
    
    for i, peer := range n.peers {
        go func(idx int, p *Peer) {
            n.mu.Lock()
            prevLogIndex := n.nextIndex[idx] - 1
            prevLogTerm := n.log[prevLogIndex].Term
            entries := n.log[n.nextIndex[idx]:]
            term := n.currentTerm
            commit := n.commitIndex
            n.mu.Unlock()
            
            resp := p.AppendEntries(term, prevLogIndex, prevLogTerm, entries, commit)
            
            n.mu.Lock()
            defer n.mu.Unlock()
            if resp.Success {
                n.nextIndex[idx] += len(entries)
                n.matchIndex[idx] = n.nextIndex[idx] - 1
            } else {
                // The follower's log diverges: back off and retry with an earlier index
                n.nextIndex[idx]--
            }
        }(i, peer)
    }
}

3. Distributed ID Generation

The Snowflake Algorithm

Twitter's Snowflake algorithm generates 64-bit unique IDs, each composed of a timestamp, a machine ID, and a sequence number.
public class SnowflakeIdGenerator {
    private final long twepoch = 1288834974657L;
    private final long workerIdBits = 5L;
    private final long datacenterIdBits = 5L;
    private final long sequenceBits = 12L;
    
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    
    private final long workerIdShift = sequenceBits;
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);
    
    private final long workerId;
    private final long datacenterId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;
    
    public SnowflakeIdGenerator(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException("workerId out of range");
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException("datacenterId out of range");
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }
    
    public synchronized long nextId() {
        long timestamp = timeGen();
        
        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards; refusing to generate an ID");
        }
        
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }
        
        lastTimestamp = timestamp;
        
        return ((timestamp - twepoch) << timestampLeftShift)
            | (datacenterId << datacenterIdShift)
            | (workerId << workerIdShift)
            | sequence;
    }
    
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }
    
    private long timeGen() {
        return System.currentTimeMillis();
    }
}
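As a sanity check on the bit layout above, an ID can be decoded back into its fields by reversing the shifts. The following standalone sketch reuses the same epoch and bit widths; `compose` and `decode` are helper names introduced just for this example:

```java
public class SnowflakeDecode {
    // Same epoch and bit widths as the generator above
    static final long TWEPOCH = 1288834974657L;
    static final long SEQ_BITS = 12L, WORKER_BITS = 5L, DC_BITS = 5L;

    static long compose(long timestamp, long datacenterId, long workerId, long sequence) {
        return ((timestamp - TWEPOCH) << (SEQ_BITS + WORKER_BITS + DC_BITS))
                | (datacenterId << (SEQ_BITS + WORKER_BITS))
                | (workerId << SEQ_BITS)
                | sequence;
    }

    // Reverse the shifts: returns {timestamp, datacenterId, workerId, sequence}
    static long[] decode(long id) {
        long seq = id & ((1L << SEQ_BITS) - 1);
        long worker = (id >> SEQ_BITS) & ((1L << WORKER_BITS) - 1);
        long dc = (id >> (SEQ_BITS + WORKER_BITS)) & ((1L << DC_BITS) - 1);
        long ts = (id >> (SEQ_BITS + WORKER_BITS + DC_BITS)) + TWEPOCH;
        return new long[]{ts, dc, worker, seq};
    }

    public static void main(String[] args) {
        long id = compose(TWEPOCH + 123456789L, 3, 7, 42);
        long[] p = decode(id);
        System.out.println(p[0] == TWEPOCH + 123456789L
                && p[1] == 3 && p[2] == 7 && p[3] == 42); // true
    }
}
```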

4. Distributed Locks

Redis-Based Distributed Lock

public class RedisDistributedLock {
    private final StringRedisTemplate redisTemplate;
    private final String lockKey;
    private final String lockValue;
    private final long expireTime;
    
    public RedisDistributedLock(StringRedisTemplate redisTemplate, String lockKey,
                                String lockValue, long expireTime) {
        this.redisTemplate = redisTemplate;
        this.lockKey = lockKey;
        this.lockValue = lockValue;
        this.expireTime = expireTime;
    }
    
    public boolean tryLock() {
        // SETNX and EXPIRE wrapped in one Lua script so the pair runs atomically
        String script = 
            "if redis.call('setnx', KEYS[1], ARGV[1]) == 1 then " +
            "  redis.call('expire', KEYS[1], ARGV[2]) " +
            "  return 1 " +
            "end " +
            "return 0";
        
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>(script, Long.class);
        Long result = redisTemplate.execute(redisScript, 
            Collections.singletonList(lockKey), 
            lockValue, 
            String.valueOf(expireTime));
        
        return result != null && result == 1;
    }
    
    public void unlock() {
        // A Lua script keeps the compare-and-delete atomic
        String script = 
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "end " +
            "return 0";
        
        redisTemplate.execute(new DefaultRedisScript<>(script, Long.class), 
            Collections.singletonList(lockKey), lockValue);
    }
}

// Usage example
public void processWithLock() {
    RedisDistributedLock lock = new RedisDistributedLock(redisTemplate, "order:123", UUID.randomUUID().toString(), 30);
    
    if (lock.tryLock()) {
        try {
            // Execute the business logic
            processOrder();
        } finally {
            lock.unlock();
        }
    }
}

ZooKeeper-Based Distributed Lock

public class ZookeeperDistributedLock {
    private ZooKeeper zk;
    private String lockPath;
    private String currentLockPath;
    
    public void lock() throws Exception {
        // Create an ephemeral sequential node
        currentLockPath = zk.create(lockPath + "/lock-", 
            new byte[0], 
            ZooDefs.Ids.OPEN_ACL_UNSAFE, 
            CreateMode.EPHEMERAL_SEQUENTIAL);
        
        // Fetch all child nodes and sort them
        List<String> children = zk.getChildren(lockPath, false);
        Collections.sort(children);
        
        // Check whether our node is the smallest
        String currentNode = currentLockPath.substring(lockPath.length() + 1);
        int currentIndex = children.indexOf(currentNode);
        
        if (currentIndex == 0) {
            return;  // lock acquired
        }
        
        // Watch the node immediately ahead of ours
        String prevNode = lockPath + "/" + children.get(currentIndex - 1);
        final CountDownLatch latch = new CountDownLatch(1);
        
        Stat stat = zk.exists(prevNode, event -> {
            if (event.getType() == Event.EventType.NodeDeleted) {
                latch.countDown();
            }
        });
        
        if (stat != null) {
            latch.await();  // production code should re-check the children after waking
        }
    }
    
    public void unlock() throws Exception {
        zk.delete(currentLockPath, -1);
    }
}

5. Load Balancing Strategies

Consistent Hashing

public class ConsistentHash<T> {
    private final TreeMap<Integer, T> ring = new TreeMap<>();
    private final int virtualNodes;
    private final HashFunction hashFunction;
    
    public void addNode(T node) {
        for (int i = 0; i < virtualNodes; i++) {
            int hash = hashFunction.hash(node.toString() + ":" + i);
            ring.put(hash, node);
        }
    }
    
    public void removeNode(T node) {
        for (int i = 0; i < virtualNodes; i++) {
            int hash = hashFunction.hash(node.toString() + ":" + i);
            ring.remove(hash);
        }
    }
    
    public T getNode(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        
        int hash = hashFunction.hash(key);
        
        // Find the first node whose position is >= the key's hash
        Map.Entry<Integer, T> entry = ring.ceilingEntry(hash);
        
        if (entry == null) {
            // The ring wraps around: fall back to the first node
            entry = ring.firstEntry();
        }
        
        return entry.getValue();
    }
}
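The key property of consistent hashing is that removing a node only remaps the keys that node owned; all other keys keep their assignments. The following self-contained sketch demonstrates this over a bare TreeMap, with hash positions hard-coded for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

public class RingDemo {
    // Same lookup rule as above: first node at or after the hash, wrapping around
    static String lookup(TreeMap<Integer, String> ring, int keyHash) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(keyHash);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(100, "A");
        ring.put(200, "B");
        ring.put(300, "C");

        System.out.println(lookup(ring, 150)); // B
        System.out.println(lookup(ring, 350)); // wraps around to A

        ring.remove(200);                      // node B leaves the cluster
        System.out.println(lookup(ring, 150)); // only B's keys move, now C
        System.out.println(lookup(ring, 50));  // A's keys are unaffected
    }
}
```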

6. Distributed Transactions

Two-Phase Commit (2PC)

public class TwoPhaseCommitCoordinator {
    private List<Participant> participants;
    
    public boolean commit(Transaction transaction) {
        // Phase 1: Prepare
        for (Participant participant : participants) {
            if (!participant.prepare(transaction)) {
                // Any participant failing to prepare aborts: roll back everyone
                rollback(transaction);
                return false;
            }
        }
        
        // Phase 2: Commit
        for (Participant participant : participants) {
            participant.commit(transaction);
        }
        
        return true;
    }
    
    private void rollback(Transaction transaction) {
        for (Participant participant : participants) {
            try {
                participant.rollback(transaction);
            } catch (Exception e) {
                log.error("Rollback failed", e);
            }
        }
    }
}

Summary

Distributed system design means making sound trade-offs under the guidance of the CAP theorem. Mastering the core techniques covered here (consensus protocols such as Paxos and Raft, distributed ID generation, distributed locks, load balancing, and distributed transactions) is the foundation for building highly available distributed systems. In practice, keep the following in mind:

1. Network partitions are unavoidable; design for failure scenarios
2. Choose an appropriate consistency level, balancing performance against correctness
3. Put thorough monitoring and alerting in place
4. Design idempotent interfaces to handle duplicate requests
5. Prepare failure-recovery plans and rehearse them regularly
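For point 4, a common pattern is to key each request with a client-supplied unique ID and cache the first result, so retries become no-ops. A minimal sketch (class and method names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

// Idempotent request handling: the first call with a given request ID
// executes the business logic; retries with the same ID return the cached result.
public class IdempotentHandler {
    private final ConcurrentHashMap<String, String> results = new ConcurrentHashMap<>();

    String handle(String requestId, String payload) {
        // computeIfAbsent runs the business logic at most once per request ID
        return results.computeIfAbsent(requestId, id -> process(payload));
    }

    private String process(String payload) {
        return "processed:" + payload; // stand-in for real business logic
    }

    public static void main(String[] args) {
        IdempotentHandler h = new IdempotentHandler();
        String first = h.handle("req-1", "order-123");
        String retry = h.handle("req-1", "order-123"); // duplicate delivery
        System.out.println(first.equals(retry)); // true: the retry is a no-op
    }
}
```

In a real system the result cache would live in shared storage (for example Redis or a database table with a unique key) so that retries hitting a different instance are still deduplicated.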

Permalink: https://www.kkkliao.cn/?id=850 (authorization is required for reprinting)


Copyright notice: this article was published by 廖万里's blog; please credit the source when reprinting.

