The problem affects both getMessage() and sendMessage() (read and write direction).

getMessage(): if a malfunctioning or malicious peer writes garbage to the socket that is not a valid GIOP header, the ORB loops in read() permanently, which may result in very high CPU load. Even if this is not an attack but merely a broken protocol or anything else, it makes no sense to keep the connection open, since the chance of getting back in sync with the peer is minimal once an invalid header (12 bytes) has been read. This was already discussed and submitted to Andre Spiegel (including a test case).

sendMessage(): IOExceptions occurring during send are currently ignored, which may lead to hanging resources or extremely slow request processing in overload situations. Instead, the connection should be closed in these situations, since the peer obviously cannot receive the response messages properly.

Scenario:
- the peer gets overloaded
- the peer does not read data from its socket in time
- the system buffers fill up with written data
- once the system buffers are full, the write() operation blocks (possibly forever, since this operation is not secured by the SO_TIMEOUT parameter!)
- even with custom sockets that enable timeouts during write(), request processing is extremely slow, since every thread responding to a request runs into the timeout

A patch is attached (against 2.2.3).
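The getMessage() half of the problem can be illustrated with a minimal sketch (hypothetical class and method names, not JacORB's actual code): a GIOP message begins with the 4-byte magic "GIOP" inside the 12-byte header, so once the header fails this check there is little point keeping the socket open.

```java
// Hypothetical sketch of the 12-byte GIOP header check described above.
// GiopHeaderCheck and hasValidMagic() are illustrative names, not JacORB API.
public class GiopHeaderCheck {
    static final int HEADER_SIZE = 12;

    /** True if the first four bytes of the 12-byte header spell "GIOP". */
    public static boolean hasValidMagic(byte[] header) {
        return header != null
            && header.length >= HEADER_SIZE
            && header[0] == 'G' && header[1] == 'I'
            && header[2] == 'O' && header[3] == 'P';
    }
}
```

A reader loop would call this after reading the first 12 bytes and close the connection on a mismatch instead of retrying.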
Created attachment 243 [details] patches getMessage() and sendMessage() to close connection in described situations
Applied both parts of the patch on CVS HEAD. Thanks.
There was an error introduced in the promoted and applied patch for GIOPConnection. It may lead to deadlocks between reader and writer threads that try to close the socket concurrently, because two sync objects are relevant for closing the socket:

connect_sync
write_sync

The implementation of streamClosed() in ClientGIOPConnection acquires them in the order connect_sync -> write_sync (reader thread), but while sending a message via sendMessage() they are acquired in the order write_sync -> connect_sync (the writer thread already holds the write lock when it calls streamClosed()).

Solution: call releaseWriteLock() before calling streamClosed() in sendMessage() in error situations. The applied patch is against 2.3 beta2.
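The fix can be sketched as follows (illustrative names mirroring the report's connect_sync/write_sync, not the actual GIOPConnection code): by releasing the write lock before entering streamClosed(), every thread ends up acquiring connect_sync before write_sync, so the circular wait disappears.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the lock-ordering fix described above; the
// fields and methods are stand-ins, not JacORB's actual implementation.
public class LockOrdering {
    private final Object connectSync = new Object();
    private final Object writeSync = new Object();
    final List<String> acquired = new ArrayList<String>();

    // Reader path: streamClosed() locks connect_sync, then write_sync.
    void streamClosed() {
        synchronized (connectSync) {
            acquired.add("connect_sync");
            synchronized (writeSync) {
                acquired.add("write_sync");
                // ... close the transport here ...
            }
        }
    }

    // Writer error path after the fix: release write_sync FIRST, then call
    // streamClosed(), so every thread locks connect_sync before write_sync.
    void sendMessageErrorPath() {
        synchronized (writeSync) {
            // releaseWriteLock() happens here in the real patch
        }
        streamClosed();
    }
}
```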
Created attachment 253 [details] patch for deadlock problem introduced by former patches (against 2.3 beta2)
Second patch is installed. Thanks, Marc.
*** Bug 701 has been marked as a duplicate of this bug. ***
I believe that there is still a deadlock situation caused by this code. Here is the test I did. First, in StreamConnectionBase.java, around line 126, I changed the write() method (see below) so that every so often it throws an IOException that looks like the one we were seeing in our clients. Unfortunately, the real socket exception has proven hard to reproduce, but I believe this is a good first test. I only did this in the client's jacorb.jar.

Next I ran a simple "hello world" client and server (attached). The client ORB sees the IOException, marks the socket as bad, and successfully creates a new socket to the server. However, the local call that triggered the IOException does not return with a COMM_FAILURE, and the client hangs until the client timeout is reached. Subsequent calls to the same server also time out. The output of both the client and server is in client.out and server.out in the tarball, along with a thread dump on the client showing the deadlocked threads. It is not clear to me how to resolve this, but at least there is a test case to demonstrate it.

StreamConnectionBase.java, line 126:

    int USC_count = 0;

    public void write (boolean is_first, boolean is_last,
                       byte[] data, int offset, int length, long time_out)
    {
        try
        {
            if (USC_count++ % 10 == 9)
            {
                java.net.SocketException s =
                    new java.net.SocketException("Broken pipe");
                java.io.IOException io = new java.io.IOException();
                io.initCause(s);
                System.out.println("Fake 'Broken pipe' " + io.getMessage());
                throw io;
            }
Created attachment 256 [details] test client/server for deadlock/timeout
Yes, you are right: at least the client locks up, but this is not caused by this code; it's caused by wrong behavior in Delegate.java. The Delegate simply ignores the COMM_FAILURE exception, and on the next call the Delegate is still in the state bound=true, which means it simply sends the request (including opening a new transport). BUT: the previous close() has forced the MessageReceptor (which reads answers from the socket) to quit, so there is no reader any more; that's why your ReplyReceiver hangs forever.

Possible solutions: the Delegate should be adapted to handle COMM_FAILURE exceptions like TRANSIENT exceptions (try_rebind); that would be the best solution. As a workaround you could throw away a narrowed object reference (to garbage collection) once you get a COMM_FAILURE exception in client code, but that cannot be the final solution.
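The proposed "handle COMM_FAILURE like TRANSIENT" behavior can be sketched generically (this is not JacORB's Delegate code; CommFailure, Invocation, and invokeWithRebind() are made-up stand-ins, and a stand-in exception is used instead of org.omg.CORBA.COMM_FAILURE to keep the example self-contained):

```java
// Sketch: on a communication failure, rebind (tear down the dead transport,
// open a new one) and retry the call, mirroring the TRANSIENT handling.
public class RebindRetry {
    static class CommFailure extends RuntimeException {}

    interface Invocation<T> { T call(); }

    /** Retry once after a rebind; real code would bound and log retries. */
    public static <T> T invokeWithRebind(Invocation<T> inv, Runnable rebind) {
        try {
            return inv.call();
        } catch (CommFailure e) {
            rebind.run();      // re-establish the transport before retrying
            return inv.call(); // a second failure propagates to the caller
        }
    }
}
```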
I tried the idea of making the Delegate treat COMM_FAILUREs in the same way as TRANSIENTs, i.e. try_rebind(), but it did not seem to help. In addition, I tried creating a new ref by reloading each CORBA object from the IOR via string_to_object, and this seemed to work occasionally but not always. I expect there is a race condition with the garbage collector, such that if the GC cleans the old object in time there will be a new Delegate, but if not you are still dealing with the old one. Since you can never force an object to be garbage collected, this doesn't seem like a good solution.

Here is the client-side logging with debug messages, starting at the point where the fake 'Broken pipe' is generated, until it hits the timeout.

[jacorb.orb.giop] INFO : ClientConnectionManager: found ClientGIOPConnection to 129.252.35.30:9876 (3da850)
Fake 'Broken pipe' null
[jacorb.orb.iiop] DEBUG : Caught exception
java.io.IOException
    at org.jacorb.orb.etf.StreamConnectionBase.write(StreamConnectionBase.java:139)
    at org.jacorb.orb.giop.GIOPConnection.write(GIOPConnection.java:887)
    at org.jacorb.orb.CDROutputStream.write(CDROutputStream.java:438)
    at org.jacorb.orb.giop.ServiceContextTransportingOutputStream.write_to(ServiceContextTransportingOutputStream.java:148)
    at org.jacorb.orb.giop.RequestOutputStream.write_to(RequestOutputStream.java:251)
    at org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:971)
    at org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:923)
    at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
    at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
    at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
    at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
    at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
    at demo.hello._GoodDayStub.hello_simple(_GoodDayStub.java:32)
    at demo.hello.Client$1.run(Client.java:26)
    at java.lang.Thread.run(Thread.java:613)
Caused by: java.net.SocketException: Broken pipe
    at org.jacorb.orb.etf.StreamConnectionBase.write(StreamConnectionBase.java:138)
    ... 14 more
[jacorb.giop.conn] ERROR : Failed to write GIOP message due to COMM_FAILURE
org.omg.CORBA.COMM_FAILURE: IOException: java.io.IOException vmcid: 0x0 minor code: 0 completed: No
    at org.jacorb.orb.etf.ConnectionBase.to_COMM_FAILURE(ConnectionBase.java:143)
    at org.jacorb.orb.etf.StreamConnectionBase.write(StreamConnectionBase.java:153)
    at org.jacorb.orb.giop.GIOPConnection.write(GIOPConnection.java:887)
    at org.jacorb.orb.CDROutputStream.write(CDROutputStream.java:438)
    at org.jacorb.orb.giop.ServiceContextTransportingOutputStream.write_to(ServiceContextTransportingOutputStream.java:148)
    at org.jacorb.orb.giop.RequestOutputStream.write_to(RequestOutputStream.java:251)
    at org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:971)
    at org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:923)
    at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
    at org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
    at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
    at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
    at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
    at demo.hello._GoodDayStub.hello_simple(_GoodDayStub.java:32)
    at demo.hello.Client$1.run(Client.java:26)
    at java.lang.Thread.run(Thread.java:613)
[jacorb.giop.conn] ERROR : GIOP connection closed due to errors during sendMessage
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): close()
[jacorb.giop.conn] WARN : Abnormal connection termination. Lost 1 outstanding replie(s)!
[jacorb.orb.iiop] INFO : Client-side TCP transport to 129.252.35.30:9876 closed.
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): streamClosed()
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): closeAllowReopen()
[jacorb.orb.delegate] DEBUG : invoke: SystemException
[jacorb.orb.delegate] DEBUG : Delegate.try_rebind
Caught exception in 18th loop IOException: java.io.IOException
[jacorb.orb.iiop] DEBUG : Transport to 129.252.35.30:9876: stream closed Socket closed
[jacorb.orb.iiop] DEBUG : Caught exception
java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at org.jacorb.orb.etf.StreamConnectionBase.read(StreamConnectionBase.java:81)
    at org.jacorb.orb.giop.GIOPConnection.getMessage(GIOPConnection.java:393)
    at org.jacorb.orb.giop.GIOPConnection.receiveMessages(GIOPConnection.java:517)
    at org.jacorb.orb.giop.MessageReceptor.doWork(MessageReceptor.java:71)
    at org.jacorb.util.threadpool.ConsumerTie.run(ConsumerTie.java:61)
    at java.lang.Thread.run(Thread.java:613)
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): getMessage() -- COMM_FAILURE
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): streamClosed()
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): closeAllowReopen()
[jacorb.util.tpool] DEBUG : [1/1] job queue empty
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.giop] INFO : ClientConnectionManager: found ClientGIOPConnection to 129.252.35.30:9876 (3da850)
[jacorb.giop.conn] DEBUG : ClientGIOPConnection to 129.252.35.30:9876 (3da850): sendMessage() -- opening transport
[jacorb.orb.iiop] DEBUG : Trying to connect to 129.252.35.30:9876 with timeout=90000.
[jacorb.orb.giop] DEBUG : ClientConnectionManager: cannot release ClientGIOPConnection to 129.252.35.30:9876 (3da850) (still has 4 client(s))
[jacorb.orb.delegate] DEBUG : Delegate released!
[jacorb.orb.giop] DEBUG : ClientConnectionManager: cannot release ClientGIOPConnection to 129.252.35.30:9876 (3da850) (still has 3 client(s))
[jacorb.orb.delegate] DEBUG : Delegate released!
[jacorb.orb.giop] DEBUG : ClientConnectionManager: cannot release ClientGIOPConnection to 129.252.35.30:9876 (3da850) (still has 2 client(s))
[jacorb.orb.delegate] DEBUG : Delegate released!
[jacorb.orb.iiop] INFO : Connected to 129.252.35.30:9876 from local port 54778
Caught exception in 19th loop client timeout reached
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.delegate] DEBUG : Delegate.getReference with POA < empty>
[jacorb.orb.giop] INFO : ClientConnectionManager: found ClientGIOPConnection to 129.252.35.30:9876 (3da850)
Well, after some testing with your client and mine, I think I found the root cause of this behavior. The GIOPConnection, and the ClientConnection on top of it, are designed as, let's say, "outliving" objects, so calling close() on a GIOPConnection should not be done from lower layers or from GIOPConnection itself. To indicate a problem with the underlying transport it is sufficient to just call the streamClosed() operation and leave the real close() to outer objects (Delegates or GIOPConnectionManagers). Calling close() before all Delegates have finished their work with this connection leads to error situations where a new transport connection is opened but no MessageReceptor is created to read the answers. So I simply removed all calls to close() introduced by my patches from GIOPConnection, and now it seems to work fine. See my attachment.
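The layering rule described here can be sketched as follows (ConnectionLifecycle and its methods are illustrative stand-ins, not JacORB classes): the lower layer only flags the broken stream, and the actual close happens when the owning layer releases its last client.

```java
// Sketch: lower layers signal via streamClosed(); only the owner closes.
public class ConnectionLifecycle {
    private boolean transportBroken = false;
    private boolean closed = false;
    private int clients = 2;

    /** Lower layer: mark the stream as broken, but never close here. */
    void streamClosed() { transportBroken = true; }

    /** Upper layer: real close() only once the last client has released. */
    void releaseClient() {
        if (--clients == 0 && transportBroken) {
            closed = true; // the owner performs the actual close() here
        }
    }

    boolean isClosed() { return closed; }
}
```

This mirrors why removing close() calls from GIOPConnection helps: a premature close from below would otherwise race with Delegates still using the connection.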
Created attachment 257 [details] patch against 2.3 beta to remove all calls to close() operation
Are these patches included in the 2.3 release? I couldn't find them in the release notes... maybe someone could tell me... regards... christian
In addition to my previous entry: I compared 2.3 beta (unpatched) with the patched 2.3 beta and with 2.3 final, and found a mix of both old and patched code... so I'm a little bit confused about this... are there any comments on the patch from 12/11/06 and/or on 2.3? regards chris
The latest patch from 12/11/06 is not part of 2.3.0, but with this patch it now works very stably here. Regards Marc
I have installed the final patch from 2006-12-11 into the main code base now. Sorry for the delay in getting this sorted out. It will be part of the next release.
Dear all, a deadlock was found in my environment: JacORB 2.3 with the patch applied (all calls to close() removed from GIOPConnection). Here is the log message.

Name: RMI TCP Connection(103)-10.0.10.101
Status: WAITING at java.lang.Object@20d65
Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:474)
    org.jacorb.orb.giop.GIOPConnection.getWriteLock(GIOPConnection.java:812)
    org.jacorb.orb.giop.ClientGIOPConnection.closeAllowReopen(ClientGIOPConnection.java:130)
    org.jacorb.orb.giop.ClientGIOPConnection.streamClosed(ClientGIOPConnection.java:106)
    org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:989)
    org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:900)
    org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
    org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
    org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
    org.jacorb.orb.Delegate.invoke(Delegate.java:939)
    org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
    CustomerModule._CustomerStub.getProvisioning(_CustomerStub.java:485)
    ...

Name: RMI TCP Connection(91)-10.0.10.101
Status: BLOCKED at java.lang.Object@5e320f, Owner: RMI TCP Connection(103)-10.0.10.101
Stack:
    org.jacorb.orb.giop.GIOPConnection.sendMessage(GIOPConnection.java:930)
    org.jacorb.orb.giop.GIOPConnection.sendRequest(GIOPConnection.java:900)
    org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:323)
    org.jacorb.orb.giop.ClientConnection.sendRequest(ClientConnection.java:304)
    org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1024)
    org.jacorb.orb.Delegate.invoke(Delegate.java:939)
    org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
    CustomerModule._CustomerStub.getProvisioning(_CustomerStub.java:485)
    ...

My workaround is:
1) removed all calls to close() as before
2) created a new method streamClosedWithoutWriteLock() (just removing the call to getWriteLock())

It works. Kent
Hi All, Please let me know if this issue is similar to http://www.jacorb.org/cgi-bin/bugzilla/show_bug.cgi?id=858 and the same solution can be applied here. Kent, can you explain where to implement the method streamClosedWithoutWriteLock() ?
Created attachment 363 [details] Solve potential deadlock in GIOPConnection. My test environment is a client on Windows XP with TcpView, plus a server on Linux. Test steps: 1) start up client, server, and TcpView; 2) trigger a request from the client, which will create a connection; 3) close the connection with TcpView. Expected result: a new connection is created and the system still functions.
Dear Abhishek Singh, I'm not 100% sure it is the same as #858. I've only gotten COMM_FAILURE (broken pipe) in my software when there is high stress, but I don't know why. My approach is to use TcpView to close the connection, to mock the failure, and then try to fix it. I tested my patch yesterday and made some minor tuning. See the attachment. I hope it works. Kent
Created attachment 364 [details] Solve potential deadlock in GIOPConnection. Last update 08 Jun 2009. The bad news is that the previous patch did not survive a weekend's stability test: there is still a deadlock between write_lock and connect_sync. Thus I made another patch. This time, my strategy is to acquire write_lock before synchronizing on connect_sync whenever we need both of them. See the attachment for details.
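The strategy in this patch amounts to a single global lock order: every code path that needs both locks takes write_lock first, then connect_sync. A minimal sketch (names mirror the report; this is not the actual ClientGIOPConnection code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a fixed global lock order: write_lock before connect_sync in
// every path that needs both. With one order, circular waits cannot form.
public class GlobalLockOrder {
    private final Object writeLock = new Object();
    private final Object connectSync = new Object();
    final List<String> order = new ArrayList<String>();

    void closeAllowReopen() {
        synchronized (writeLock) {
            order.add("write_lock");
            synchronized (connectSync) {
                order.add("connect_sync");
                // ... mark the transport for reopen here ...
            }
        }
    }
}
```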
FYI, the patch on 8 Jun works perfectly so far. Cheers!
Reopen to make sure we get Kent's final patch into main code base.
Final part of patch is now committed (only the synchronization part in ClientGIOPConnection, as the other changes had already been applied).
(In reply to comment #24)
> Final part of patch is now committed (only the synchronization part in
> ClientGIOPConnection, as the other changes had already been applied).

Andre, I think there is still a problem here. I've been working with a customer who has a heavily threaded client which happens to do quite a lot of Name Service resolving. They ran into a race condition that your fix ought to remedy, but I have thought of another possible problem.

They have many, perhaps a dozen, threads all waiting to send a request to the same name service. This means a whole bunch of threads in GIOPConnection.getWriteLock(), called from GIOPConnection.sendMessage(). Something happens and causes a COMM_FAILURE to be thrown to the thread that got the write lock and tried to send. This thread then calls releaseWriteLock() on its way to calling ClientGIOPConnection.closeAllowReopen(). A new thread returns from getWriteLock() and proceeds to try to send its message. However, the transport is corrupt and unclosed, since the first thread hasn't had a chance to close the transport, as it is blocked in getWriteLock(). So a second COMM_FAILURE is thrown, and another thread enters closeAllowReopen() and blocks. Since there are many threads waiting for the write lock, this scenario can occur many times before a thread in closeAllowReopen() actually gets the write lock and proceeds to close the transport.

Now at this point you've got at least one thread waiting in closeAllowReopen(), and quite possibly also threads waiting in sendMessage(). Let's say one of those gets a chance to go: it successfully opens a new transport, sends its request, and releases the write lock. Now one of the threads waiting to close gets the write lock and closes the transport, which by this time is the new one just opened. Oops!
Now you have a client thread that successfully sent a message waiting forever (or until timeout) for a reply it will never receive, because another thread closed the transport out from under it.
I've finally had a chance to test my theory about an additional deadlock scenario. I now believe this is not a problem, as sendMessage() calls ClientGIOPConnection.streamClosed(), which calls ConnectionListener.streamClosed() after closeAllowReopen(). This ensures that even if a race like the one I hypothesized occurs, the result is a COMM_FAILURE exception rather than a deadlock. I have observed that it is possible to have multiple threads waiting in closeAllowReopen(), and even for one thread to close the stream, another to reopen it, and a third previously blocked in closeAllowReopen() to then close the stream again. It might be possible to do something a little more efficient here to ensure a "reclose" doesn't happen, but I think the conditions leading to this are fairly rare, so I won't propose any change now.
*** Bug 759 has been marked as a duplicate of this bug. ***
*** Bug 858 has been marked as a duplicate of this bug. ***