1 提示如下错误:Connection pool shut down
现象,操作Dynanodb时,提示错误,最开始怀疑是Dynamodb本身问题,导致OOM。实际这个情况应该是OOM导致了这个错误,分析如下:
具体错误为:java.lang.IllegalStateException: Connection pool shut down
查看错误抛出的地方:
//org.apache.http.impl.conn.PoolingHttpClientConnectionManager#requestConnection @Override public ConnectionRequest requestConnection( final HttpRoute route, final Object state) { Args.notNull(route, "HTTP route"); if (this.log.isDebugEnabled()) { this.log.debug("Connection request: " + format(route, state) + formatStats(route)); } Asserts.check(!this.isShutDown.get(), "Connection pool shut down"); // ...... }
归因于从“连接池”中请求连接时,发现连接池被关闭了,所以查看何时会关闭:只有显示关闭shutdown才会。
//org.apache.http.impl.conn.PoolingHttpClientConnectionManager#shutdown @Override public void shutdown() { if (this.isShutDown.compareAndSet(false, true)) { // } }
何时会显示关闭?参考以下代码,遇到ERROR时,会显示关闭。
org.apache.http.impl.execchain.MainClientExec#execute @Override public CloseableHttpResponse execute( final HttpRoute route, final HttpRequestWrapper request, final HttpClientContext context, final HttpExecutionAware execAware) throws IOException, HttpException { //省略非关键代码:请求连接,执行请求。 } catch (final Error error) { connManager.shutdown(); throw error; } }
而java.lang.OutOfMemoryError继承于Error,正是其中一种,所以遇到OOM时,连接池会关闭,后续不可自动恢复,必须重启Application.
BTW: 这种错误属于Fast Fail的错误,反而不会导致OOM。(不考虑一些落内存的重试)
2 Redis Timeout相关?
错误如下:
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 2 second(s) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:278) at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1086) at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:939) at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665)
排除,发生OOM之前的错误数量(<100)不至于导致OOM.
3 Bug or 正常使用
如果是一般性bug,OOM后,内存不会自动下降,但是参考如下日志可以看出还是会下降的:内存从8322465096下降至6235528672
{"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:16:42.222","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":3703050240,"nonHeapUsed":345615112,"ygcCount":4,"ygcCost":244,"fgcCount":0,"fgcCost":0}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:17:42.223","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":6674228736,"nonHeapUsed":346253752,"ygcCount":5,"ygcCost":291,"fgcCount":0,"fgcCost":0}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:18:42.221","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":6962728408,"nonHeapUsed":346507992,"ygcCount":59,"ygcCost":2126,"fgcCount":0,"fgcCost":0}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:20:14.337","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":8322465096,"nonHeapUsed":345714256,"ygcCount":300,"ygcCost":3192,"fgcCount":35,"fgcCost":13357}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:21:24.884","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":8248065216,"nonHeapUsed":346660992,"ygcCount":253,"ygcCost":1954,"fgcCount":161,"fgcCost":62121}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:21:42.235","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":7522034616,"nonHeapUsed":356767008,"ygcCount":212,"ygcCost":2613,"fgcCount":9,"fgcCost":4216}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:22:42.221","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":8340010192,"nonHeapUsed":359838736,"ygcCount":327,"ygcCost":3584,"fgcCount":34,"fgcCost":13066}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:23:42.219","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":6501521464,"nonHeapUsed":358762232,"ygcCount":110,"ygcCost":1575,"fgcCount":1,"fgcCost":432}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:24:42.227","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":6235528672,"nonHeapUsed":359783160,"ygcCount":24,"ygcCost":583,"fgcCount":0,"fgcCost":0}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:25:42.219","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":7306915680,"nonHeapUsed":360388120,"ygcCount":34,"ygcCost":732,"fgcCount":0,"fgcCost":0}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:26:42.219","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":6650218176,"nonHeapUsed":361077272,"ygcCount":40,"ygcCost":885,"fgcCount":0,"fgcCost":0}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:27:45.410","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":8421057232,"nonHeapUsed":361559696,"ygcCount":148,"ygcCost":1594,"fgcCount":39,"fgcCost":14195}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:29:21.789","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":8300889856,"nonHeapUsed":361789168,"ygcCount":187,"ygcCost":1008,"fgcCount":263,"fgcCost":94826}} {"monitorType":"async_jvm_monitor","logTime":"2022-11-18 09:30:14.178","metricsDetail":{"host":"ip-10-0-1-146.ec2.internal","heapUsed":7992134312,"nonHeapUsed":361961280,"ygcCount":99,"ygcCost":527,"fgcCount":118,"fgcCost":42734}}