项目中给http client 4.x设置了各种timeout,但是还是发现一些请求耗费的时间远超过预期,排查一部分请求是由于虚拟机或者GC等情况引起,还有一部分长耗时请求归咎于DNS解析的耗时,所以有必要梳理下http client使用中的各种timeout设置,从而对一个请求完成的最大时间有个正确而清晰的认识:
分析http client中,一次http请求的流程大致有2条主线如下:
(1)从连接池获取到连接->发送http请求->接受http响应
(2)从连接池获取连接,但是没有可用的,此时又可以创建新连接情况下-> DNS解析->建立连接(如果一个DNS可以解析成多个地址,第一个地址连不上,会挨个尝试)-> TLS握手(假设是https)->发送http请求->接受http响应。
分解上面的2条主线,至少有4种timeout(socket timeout, connection timeout, request connection timeout, dns resolve timeout)影响了一次http请求的完成时间。
(1)从连接池获取连接的最长等待时间
通过复用连接池的连接,可以避免不断频繁开关连接,提高效率,所以从连接池获取连接时,当连接池满时,可以设置等待一定的时间来获取连接,而不是永久等待。
1.1 错误堆栈:
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool ! at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226) ! at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
1.2 源码解析:
   @Override
    public ConnectionRequest requestConnection(
            final HttpRoute route,
            final Object state) {
        Args.notNull(route, "HTTP route");
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connection request: " + format(route, state) + formatStats(route));
        }
        final Future<CPoolEntry> future = this.pool.lease(route, state, null);  //从连接池获取连接
        return new ConnectionRequest() {
            @Override
            public boolean cancel() {
                return future.cancel(true);
            }
            @Override
            public HttpClientConnection get(
                    final long timeout,
                    final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
                return leaseConnection(future, timeout, tunit);
            }
        };
    }
    protected HttpClientConnection leaseConnection(
            final Future<CPoolEntry> future,
            final long timeout,
            final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
        final CPoolEntry entry;
        try {
            entry = future.get(timeout, tunit);
            if (entry == null || future.isCancelled()) {
                throw new InterruptedException();
            }
            Asserts.check(entry.getConnection() != null, "Pool entry with no connection");
            if (this.log.isDebugEnabled()) {
                this.log.debug("Connection leased: " + format(entry) + formatStats(entry.getRoute()));
            }
            return CPoolProxy.newProxy(entry);
        } catch (final TimeoutException ex) {
            throw new ConnectionPoolTimeoutException("Timeout waiting for connection from pool");
        }
    }
1.3 设置方法:
RequestConfig requestConfig = RequestConfig.custom(). setConnectionRequestTimeout(2 * 1000).
(2)DNS解析时间
拿到域名建立连接之后,需要进行dns解析,一般dns解析都会设置cache时间,例如5分钟,但是为了防止绑死在某个主机上,都不会永久cache,所以dns解析不可避免,然而dns解析的时间控制除了本身将这个过程异步化外并未提供任何可以设置的时间参数。
2.1 堆栈错误:
Caused by: java.net.UnknownHostException: nebulaik.webex.com.cn: Temporary failure in name resolution
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
        at java.net.InetAddress.getAllByName(InetAddress.java:1192)
        at java.net.InetAddress.getAllByName(InetAddress.java:1126)
        at com.webex.dsagent.client.http.DSADnsResolver.resolve(DSADnsResolver.java:24)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
2.2 源码分析:
大多请求遇到这种错误,都耗时15秒,原因在于系统配置(/etc/resolv.conf ):
options rotate timeout:5 retries:3 nameserver 10.224.91.88 nameserver 10.224.91.99
诊断方法可以使用dig命令:
dig命令提供了2个参数:
+time=T Sets the timeout for a query to T seconds. The default timeout is 5 seconds. An attempt to set T to less than 1 will result in a query timeout of 1 second being applied. +tries=T Sets the number of times to try UDP queries to server to T instead of the default, 3. If T is less than or equal to zero, the number of tries is silently rounded up to 1.
例子:
[root@centos~]# time dig www.baidu.com +time=1 ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.4 <<>> www.baidu.com +time=1 ;; global options: +cmd ;; connection timed out; no servers could be reached real 0m3.010s user 0m0.003s sys 0m0.007s [root@centos~]# time dig www.baidu.com +time=5 ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.4 <<>> www.baidu.com +time=5 ;; global options: +cmd ;; connection timed out; no servers could be reached real 0m15.013s user 0m0.006s sys 0m0.005s
另外一个追踪方法和问题实例:解析aaa.com.cn超时了。
dig +trace aaa.com.cn //示例
使用dig +trace定位到:
aaa.com.cn. 600 IN CNAME bbb.com.cn. //别名 bbb.com.cn. 120 IN NS ns1.dns.com.cn. bbb.com.cn. 120 IN NS ns2.dns.com.cn.
其中ns1.dns.com.cn和ns2.dns.com.cn负责解析这个域名,但是做了环境迁移,导致不可达。后来修改下这2条记录,就不再超时了。
2.3 设置方法:
要不修改配置文件(options rotate timeout:5 retries:3),要不将dns解析异步化。
附上各种DNS错误:
https://zhuanlan.zhihu.com/p/40659713
题外:如果一个DNS可以解析成多个地址,而第一个地址连不上时,则一个一个尝试,直到成功或者最后一个,所以很多timeout特别长,也在于此。源码分析:
org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(ManagedHttpClientConnection, HttpHost, InetSocketAddress, int, SocketConfig, HttpContext)
   @Override
    public void connect(
            final ManagedHttpClientConnection conn,
            final HttpHost host,
            final InetSocketAddress localAddress,
            final int connectTimeout,
            final SocketConfig socketConfig,
            final HttpContext context) throws IOException {
        final Lookup<ConnectionSocketFactory> registry = getSocketFactoryRegistry(context);
        final ConnectionSocketFactory sf = registry.lookup(host.getSchemeName());
        if (sf == null) {
            throw new UnsupportedSchemeException(host.getSchemeName() +
                    " protocol is not supported");
        }
        final InetAddress[] addresses = host.getAddress() != null ?
                new InetAddress[] { host.getAddress() } : this.dnsResolver.resolve(host.getHostName());
        final int port = this.schemePortResolver.resolve(host);
        for (int i = 0; i < addresses.length; i++) {
            final InetAddress address = addresses[i];
            final boolean last = i == addresses.length - 1;
            Socket sock = sf.createSocket(context);
            sock.setSoTimeout(socketConfig.getSoTimeout());
            sock.setReuseAddress(socketConfig.isSoReuseAddress());
            sock.setTcpNoDelay(socketConfig.isTcpNoDelay());
            sock.setKeepAlive(socketConfig.isSoKeepAlive());
            final int linger = socketConfig.getSoLinger();
            if (linger >= 0) {
                sock.setSoLinger(true, linger);
            }
            conn.bind(sock);
            final InetSocketAddress remoteAddress = new InetSocketAddress(address, port);
            if (this.log.isDebugEnabled()) {
                this.log.debug("Connecting to " + remoteAddress);
            }
            try {
                sock = sf.connectSocket(
                        connectTimeout, sock, host, remoteAddress, localAddress, context);
                conn.bind(sock);
                if (this.log.isDebugEnabled()) {
                    this.log.debug("Connection established " + conn);
                }
                return;
            } catch (final SocketTimeoutException ex) {
                if (last) {
                    throw new ConnectTimeoutException(ex, host, addresses);
                }
            } catch (final ConnectException ex) {
                if (last) {
                    final String msg = ex.getMessage();
                    if ("Connection timed out".equals(msg)) {
                        throw new ConnectTimeoutException(ex, host, addresses);
                    } else {
                        throw new HttpHostConnectException(ex, host, addresses);
                    }
                }
            } catch (final NoRouteToHostException ex) {
                if (last) {
                    throw ex;
                }
            }
            if (this.log.isDebugEnabled()) {
                this.log.debug("Connect to " + remoteAddress + " timed out. " +
                        "Connection will be retried using another IP address");
            }
        }
    }
(3)连接建立时间
3.1 错误的堆栈:
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to demosite.yy.zz:443 failed: connect timed out
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:150)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
        at org.jboss.resteasy.client.jaxrs.engines.ApacheHttpClient4Engine.invoke(ApacheHttpClient4Engine.java:312)
        ... 69 more
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
        ... 79 more
3.2 源码解析:
    void socketConnect(InetAddress address, int port, int timeout)
        throws IOException {
        int nativefd = checkAndReturnNativeFD();
 
        int connectResult;
        if (timeout <= 0) {  //不设置timeout的时间,永久阻塞
            connectResult = connect0(nativefd, address, port);
        } else {  //设置timeout时间,只等timeout时间。
            configureBlocking(nativefd, false);  //先设置为非阻塞模式
            try {
                connectResult = connect0(nativefd, address, port); //做个连接
                if (connectResult == WOULDBLOCK) { 
                    waitForConnect(nativefd, timeout); //最多等待阻塞timoout时间
                }
            } finally {
                configureBlocking(nativefd, true);  //设置回阻塞模式
            }
        }
 
    }
 
3.3 设置方法:
RequestConfig requestConfig = RequestConfig.custom(). setConnectTimeout(2 * 1000).
(4)数据处理时间:
发送完请求后,等待响应需要一定的时间,这个时间即为数据处理时间。除此之外,建立连接后的tls握手过程,也有多个请求响应过程,这个时间参数对这个交互过程也适用。
4.1 错误堆栈:
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
        at sun.security.ssl.InputRecord.read(InputRecord.java:503)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
4.2 源码解析:
http发完请求后,立马阻塞读取响应:
    public HttpResponse execute(
            final HttpRequest request,
            final HttpClientConnection conn,
            final HttpContext context) throws IOException, HttpException {
        try {
            HttpResponse response = doSendRequest(request, conn, context);
            if (response == null) {
                response = doReceiveResponse(request, conn, context); //阻塞读取响应
            }
            return response;
        }
    /**
     *  Enable/disable {@link SocketOptions#SO_TIMEOUT SO_TIMEOUT}
     *  with the specified timeout, in milliseconds. With this option set
     *  to a non-zero timeout, a read() call on the InputStream associated with
     *  this Socket will block for only this amount of time.  If the timeout
     *  expires, a <B>java.net.SocketTimeoutException</B> is raised, though the
     *  Socket is still valid. The option <B>must</B> be enabled
     *  prior to entering the blocking operation to have effect. The
     *  timeout must be {@code > 0}.
     *  A timeout of zero is interpreted as an infinite timeout.
     *
     * @param timeout the specified timeout, in milliseconds.
     * @exception SocketException if there is an error
     * in the underlying protocol, such as a TCP error.
     * @since   JDK 1.1
     * @see #getSoTimeout()
     */
    public synchronized void setSoTimeout(int timeout) throws SocketException {  //仅仅对读有效
        if (isClosed())
            throw new SocketException("Socket is closed");
        if (timeout < 0)
          throw new IllegalArgumentException("timeout can't be negative");
        getImpl().setOption(SocketOptions.SO_TIMEOUT, new Integer(timeout));
    }
    /**
     * Reads into an array of bytes at the specified offset using
     * the received socket primitive.
     * @param fd the FileDescriptor
     * @param b the buffer into which the data is read
     * @param off the start offset of the data
     * @param len the maximum number of bytes read
     * @param timeout the read timeout in ms
     * @return the actual number of bytes read, -1 is
     *          returned when the end of the stream is reached.
     * @exception IOException If an I/O error has occurred.
     */
    private native int socketRead0(FileDescriptor fd,
                                   byte b[], int off, int len,
                                   int timeout) //阻塞读取timeout时间
4.3 设置方法:
RequestConfig requestConfig = RequestConfig.custom(). setSocketTimeout(requestTimeoutInSecond * 1000).
假设不设置so_timeout的同时,设置了上文(3)中提到的connection timeout,则so_timeout = connection timeout
org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(int, Socket, HttpHost, InetSocketAddress, InetSocketAddress, HttpContext)
            if (connectTimeout > 0 && sock.getSoTimeout() == 0) {
                sock.setSoTimeout(connectTimeout);
            }
总结:
除了以上几种直接影响处理耗时的timeout参数为,尝试机制是另外一个影响处理耗时的机制,而且基本都是“翻倍”,这里不做赘述。所以从这个角度看,影响一次请求的耗时,主要包括以下因素:
(1) GC的STW效应
(2) 虚拟机的影响
(3) 应用层/系统层各种timeout参数
(4) 应用层/系统层重试次数
通过上文可知,除非将一个请求过程完全异步化,否则必须将所有的参数都了然于心才能很有信心的了解一个请求到底最长需要多久来完成,但是异步化的缺点就是管理的复杂性、缺乏负反馈等问题,需要综合权衡。

