How to avoid double connection close upon regaining connectivity?

Under these circumstances:

  1. The client loses network connectivity to zk.
  2. One minute passes.
  3. The client regains network connectivity to zk.

I'm getting the following panic:

panic: close of closed channel

goroutine 2849 [running]:
github.com/samuel/go-zookeeper/zk.(*Conn).Close(0xc420795180)
  github.com/samuel/go-zookeeper/zk/conn.go:253 +0x47
github.com/curator-go/curator.(*handleHolder).internalClose(0xc4203058f0, 0xc420302470, 0x0)
  github.com/curator-go/curator/state.go:136 +0x8d
github.com/curator-go/curator.(*handleHolder).closeAndReset(0xc4203058f0, 0xc42587cd00, 0x1e)
  github.com/curator-go/curator/state.go:122 +0x2f
github.com/curator-go/curator.(*connectionState).reset(0xc420302420, 0x1b71d87, 0xf)
  github.com/curator-go/curator/state.go:234 +0x55
github.com/curator-go/curator.(*connectionState).handleExpiredSession(0xc420302420)
  github.com/curator-go/curator/state.go:351 +0xd9
github.com/curator-go/curator.(*connectionState).checkState(0xc420302420, 0xffffff90, 0x0, 0x0, 0xc425ed2600, 0xed0e5250a)
  github.com/curator-go/curator/state.go:318 +0x9c
github.com/curator-go/curator.(*connectionState).process(0xc420302420, 0xc425ed2680)
  github.com/curator-go/curator/state.go:299 +0x16d
created by github.com/curator-go/curator.(*Watchers).Fire
  github.com/curator-go/curator/watcher.go:64 +0x96

This is the detailed sequence of events:

  1. The client loses network connectivity to zk.
  2. One minute passes.
  3. The client regains network connectivity to zk.
  4. Goroutine A calls s.ReregisterAll() -> Conn() -> checkTimeout() -> reset (because 1 minute has elapsed) -> closeAndReset() -> conn.Close(), which can block for up to a second.
  5. Goroutine B handles zk.StateExpired (the zk cluster sends this because it considers the client dead, since the client did not ping during step 2) -> reset -> closeAndReset() -> conn.Close(). This panics because goroutine A's conn.Close() has already closed the connection's c.shouldQuit channel. B still holds that same connection because A, blocked for that second, has not yet called s.zooKeeper.getZookeeperConnection, so no new connection exists yet.

A naive solution I tried was to guard reset with a mutex, but now helper.GetConnectionString() returns an empty string. What's the best way to avoid this crash and get back into a good state when the client loses and then regains network connectivity? Or should the fix go into github.com/samuel/go-zookeeper, so that it does not let you close an already-closed connection?

(I've filed this issue here, but the project seems to lack discussion, so I'm asking on SO.)

Hamblin answered 30/6, 2017 at 23:46 Comment(4)
I'm not familiar with these libraries, but after perusing the code for a bit I have a question: do you need to discard the zk.Conn entirely and dial a new one, or do you need it to persist and allow for reconnection? If you want to discard it, your issue is likely in github.com/curator-go/curator; otherwise it lies in github.com/samuel/go-zookeeper. I'm not sure I can really help much here, but it might be something you can address with the other library. – Commodious
I think either would work. That distinction makes sense to me as a good starting point for solving this. – Hamblin
I think the easy way to solve this problem is to fork go-zookeeper, since this issue was filed a long time ago: github.com/samuel/go-zookeeper/issues/148 – Spout
I agree with @Spout that forking it and fixing it for your use case might be the easiest solution short-term, but you could run into trouble down the line unless you fully understand the ramifications of the changes you make. Use caution, minimize the modifications, and test as best you can. – Commodious

zk.Conn has a State() method that returns a zk.State, an int32-based enumeration with the following values:

type State int32
const (
    StateUnknown           State = -1
    StateDisconnected      State = 0
    StateConnecting        State = 1
    StateAuthFailed        State = 4
    StateConnectedReadOnly State = 5
    StateSaslAuthenticated State = 6
    StateExpired           State = -112

    StateConnected  = State(100)
    StateHasSession = State(101)
)

What state is "conn" in when goroutine B calls conn.Close()?

A possible solution would be to add a switch to goroutine B so that it does not call conn.Close() while conn.State() returns zk.StateConnecting.
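
The switch suggested above might look like the following sketch. To keep it compilable without the library, State mirrors a few of the zk.State values listed above and stubConn is a hypothetical fake; with the real *zk.Conn you would compare conn.State() against zk.StateConnecting directly:

```go
package main

import "fmt"

// State mirrors a subset of zk.State (values copied from the constants
// listed above) so this sketch compiles without go-zookeeper.
type State int32

const (
	StateConnecting State = 1
	StateExpired    State = -112
	StateHasSession State = 101
)

// stateCloser is the subset of connection behaviour this sketch needs.
type stateCloser interface {
	State() State
	Close()
}

// maybeClose is the guard proposed for goroutine B: skip Close while the
// connection reports StateConnecting. It returns whether Close ran.
func maybeClose(c stateCloser) bool {
	switch c.State() {
	case StateConnecting:
		return false
	default:
		c.Close()
		return true
	}
}

// stubConn is a hypothetical fake that reports a fixed state and counts closes.
type stubConn struct {
	state  State
	closed int
}

func (s *stubConn) State() State { return s.state }
func (s *stubConn) Close()       { s.closed++ }

func main() {
	for _, st := range []State{StateConnecting, StateExpired, StateHasSession} {
		c := &stubConn{state: st}
		fmt.Println("state", st, "closed:", maybeClose(c))
	}
}
```

Checking State() and then calling Close() is itself a check-then-act sequence, so on its own it may narrow the race rather than eliminate it; combining it with serialized resets would be safer.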

Ulu answered 19/2, 2018 at 20:33 Comment(0)
