⭐⭐⭐ Spring Boot 项目实战	⭐⭐⭐ Spring Cloud 项目实战
《Dubbo 实现原理与源码解析 —— 精品合集》	《Netty 实现原理与源码解析 —— 精品合集》
《Spring 实现原理与源码解析 —— 精品合集》	《MyBatis 实现原理与源码解析 —— 精品合集》
《Spring MVC 实现原理与源码解析 —— 精品合集》	《数据库实体设计合集》
《Spring Boot 实现原理与源码解析 —— 精品合集》	《Java 面试题 + Java 学习指南》

摘要: 原创出处 https://mp.weixin.qq.com/s/ZImQievDPLAfOv26HZOiQg 「渣渣王子」欢迎转载，保留摘要，谢谢！

🙂🙂🙂关注**微信公众号：【芋道源码】**有福利：

RocketMQ / MyCAT / Sharding-JDBC 所有源码分析文章列表

RocketMQ / MyCAT / Sharding-JDBC 中文注释源码 GitHub 地址

您对于源码的疑问每条留言都将得到认真回复。甚至不知道如何读源码也可以请教噢。

新的源码解析文章实时收到通知。每周更新一篇左右。

认真的源码交流微信群。

本文是【追光者系列】HikariCP诡异问题拨开迷雾推理破案实录（上）的下半部分

直奔疑点

首先我直接怀疑是不是数据库断掉了连接，于是登上了数据库查询了数据库参数，结果是没毛病。为了稳妥起见，我又咨询了DBA，DBA说线下环境和线上环境参数一样，并帮我看了，没有任何问题。

为什么我直接奔向这个疑点，因为《Solr权威指南》的作者兰小伟大佬曾经和我分享过一个案例：他前段时间也遇到类似的问题，他用的是c3p0，碰到和我一样的碰到一样的exception，那个异常是服务器主动关闭了链接，而客户端还拿那个链接去操作，大佬加个testQuery，保持心跳即可解决。c3p0设置一个周期，是定时主动检测的。估计是mysql服务器端链接缓存时间设置的太短，服务器端主动销毁了链接，一般做法是客户端连接池配置testQuery。 testQuery我觉得比较影响hikariCP的性能，所以我决定跟一下源码了解一下原理并定位问题。

拨开迷雾

按着上一篇的推测，我们在Hikari核心源码中打一些日志记录。

第一处，在com.zaxxer.hikari.pool.HikariPool#getConnection中增加IDEA log expression

public Connection getConnection(final long hardTimeout) throws SQLException
 {
    suspendResumeLock.acquire();
    final long startTime = currentTime();
    try {
       long timeout = hardTimeout;
       PoolEntry poolEntry = null;
       try {
          do {
             poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
             if (poolEntry == null) {
                break; // We timed out... break and throw exception
             }
             final long now = currentTime();
             if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > ALIVE_BYPASS_WINDOW_MS && !isConnectionAlive(poolEntry.connection))) {
                closeConnection(poolEntry, "(connection is evicted or dead)"); // Throw away the dead connection (passed max age or failed alive test)
                timeout = hardTimeout - elapsedMillis(startTime);
             }
             else {
                metricsTracker.recordBorrowStats(poolEntry, startTime);
                return poolEntry.createProxyConnection(leakTask.schedule(poolEntry), now);
             }
          } while (timeout > 0L);
          metricsTracker.recordBorrowTimeoutStats(startTime);
       }
       catch (InterruptedException e) {
          if (poolEntry != null) {
             poolEntry.recycle(startTime);
          }
          Thread.currentThread().interrupt();
          throw new SQLException(poolName + " - Interrupted during connection acquisition", e);
       }
    }
    finally {
       suspendResumeLock.release();
    }
    throw createTimeoutException(startTime);
 }

我们对于直接抛异常的代码的条件判断入口处增加调试信息

if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > ALIVE_BYPASS_WINDOW_MS && !isConnectionAlive(poolEntry.connection)))

String.format("Evicted: %s; enough time elapse: %s;poolEntrt: %s;", poolEntry.isMarkedEvicted(), elapsedMillis(poolEntry.lastAccessed, now) > ALIVE_BYPASS_WINDOW_MS,poolEntry);

第二处、在com.zaxxer.hikari.pool.HikariPool#softEvictConnection处增加调试信息

private boolean softEvictConnection(final PoolEntry poolEntry, final String reason, final boolean owner)
   {
      poolEntry.markEvicted();
      if (owner || connectionBag.reserve(poolEntry)) {
         closeConnection(poolEntry, reason);
         return true;
      }
      return false;
   }

String.format("Scheduled soft eviction for connection %s is due; owner %s is;", poolEntry.connection,owner)

为什么要打在softEvictConnection这里呢？因为在createPoolEntry的时候注册了一个延时任务，并通过poolEntry.setFutureEol设置到poolEntry中 softEvictConnection，首先标记markEvicted。然后如果是用户自己调用的，则直接关闭连接；如果从connectionBag中标记不可borrow成功，则关闭连接。这个定时任务是在每次createPoolEntry的时候，根据maxLifetime随机设定一个variance，在maxLifetime - variance之后触发evict。

/**
 * Creating new poolEntry.
 */
private PoolEntry createPoolEntry()
{
   try {
      final PoolEntry poolEntry = newPoolEntry();
      final long maxLifetime = config.getMaxLifetime();
      if (maxLifetime > 0) {
         // variance up to 2.5% of the maxlifetime
         final long variance = maxLifetime > 10_000 ? ThreadLocalRandom.current().nextLong( maxLifetime / 40 ) : 0;
         final long lifetime = maxLifetime - variance;
         poolEntry.setFutureEol(houseKeepingExecutorService.schedule(
            () -> {
               if (softEvictConnection(poolEntry, "(connection has passed maxLifetime)", false /* not owner */)) {
                  addBagItem(connectionBag.getWaitingThreadCount());
               }
            },
            lifetime, MILLISECONDS));
      }
      return poolEntry;
   }
   catch (Exception e) {
      if (poolState == POOL_NORMAL) {
         LOGGER.debug("{} - Cannot acquire connection from data source", poolName, (e instanceof ConnectionSetupException ? e.getCause() : e));
      }
      return null;
   }
}

守株待兔

做了如上处理之后我们就安心得等结果

很快来了一组数据，我们可以看到确实poolEntry.isMarkedEvicted()一直都是false,(elapsedMillis(poolEntry.lastAccessed, now) > ALIVE_BYPASS_WINDOW_MS这个判断为true。

sharedlist 是 CopyOnWriteArrayList，每次getConnection都是数组首

softEvictConnection这里的信息在20分钟到了的时候也出现了

从这张图我们可以看到，owner是false，第一个触发定时任务的也正好是第一个连接。删除的那个就是每次微服务健康检测healthcheck连接用的那个。

我仔细数了一下，确实一共创建了十次SSL，也就是本次周期确实重新连了十次数据库TCP连接。那么问题来了，为什么每次正好是8次或者9次异常日志？

抽丝剥茧

定时任务的执行时间是多少？

这个定时任务是在每次createPoolEntry的时候，根据maxLifetime随机设定一个variance，在maxLifetime - variance之后触发evict。 maxLifetime我现在设置的是20分钟，

// variance up to 2.5% of the maxlifetime
            final long variance = maxLifetime > 10_000 ? ThreadLocalRandom.current().nextLong( maxLifetime / 40 ) : 0;
            final long lifetime = maxLifetime - variance

按照20分钟1200秒来算，这个evict的操作是1170秒左右,按理说离20分钟差30秒左右。

但是通过观察，好像第一个连接创建的时间比其他连接快4秒。