Tolerate packet reordering in the early init process

Fixes a bug where packet reordering made the server give the
client two peer ids instead of one. This in turn confused
reliable packet sending and made connecting to the server fail.

The client usually sends three packets at init: one "dummy"
packet consisting of two 0 bytes, and the init packet as well as
its legacy counterpart. The last one can be turned off since commit
af30183124d40a969040d7de4b3a487feec466e4, but this is of lower
relevance for the bug. The relevant part here is that network
packet reordering (which is a normal occurrence) can make the
packets reach the server in a different order.

If reordering delays the dummy packet so that it arrives after the
init packets, the following happened before the patch:

1. The server will get one of the init packets on channel 1 and
   assign the client a peer id, as the packet will have zero as
   peer id.

2. The server sends a CONTROLTYPE_SET_PEER_ID packet to inform
   the client of the peer id.

3. The next packet from the client will contain the peer id set by
   the server.

4. The server sets the m_has_sent_with_id member for the client's
   peer structure to true.

5. Now the dummy packet arrives. It has a peer id of zero, therefore
   the server searches whether it already has a peer id for the
   address the packet was sent from. The search fails because
   m_has_sent_with_id was set to true and the server only searched
   for peers with m_has_sent_with_id set to false.

6. In a working setup, the server would assign the dummy packet to
   the correct peer id. However, the server instead now assigns a
   second peer id and peer structure to the peer, and assigns the
   packet to that new peer.

7. In order to inform the peer of its peer id, the server sends a
   CONTROLTYPE_SET_PEER_ID command packet, reliably, to the peer.
   This packet uses the new peer id.

8. The client sends an ACK for that packet, not with the new peer id
   but with the peer id it received in step 2.

9. This ACK reaches the server, but the server drops it as the peer
   id does not map to any un-ACK-ed packets with that seqnum. At the
   same time, the server still waits for an ACK with the new peer
   id, which of course won't come. This causes the server to
   periodically retry sending that packet, and the client to ACK it
   each time.
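
The ACK mismatch in steps 7 to 9 can be illustrated with a small
model. This is only a sketch: UnackedTable, its methods and the
(peer id, seqnum) key are hypothetical names made up for this example
and do not exist in the connection code, which keeps this state per
peer and channel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical, simplified model of the reliable packets the server has
// sent and that still await an ACK. Not the actual connection code.
struct UnackedTable {
    typedef std::pair<uint16_t, uint16_t> Key; // (peer id, seqnum)
    std::map<Key, int> pending;                // value: number of resends so far

    // Remember a reliably sent packet until it is ACKed.
    void sendReliable(uint16_t peer_id, uint16_t seqnum) {
        pending[Key(peer_id, seqnum)] = 0;
    }

    // Returns true if the ACK matched a pending packet and removed it.
    // An ACK carrying the wrong peer id matches nothing and is dropped.
    bool handleAck(uint16_t peer_id, uint16_t seqnum) {
        return pending.erase(Key(peer_id, seqnum)) > 0;
    }

    // Called on a resend timeout: every packet still pending is sent again.
    void resendTimedOut() {
        for (auto &entry : pending)
            ++entry.second;
    }
};
```

With made-up values, sendReliable(3, 100) followed by handleAck(2, 100)
leaves the entry in place, so each resendTimedOut() sends the packet
again. Only handleAck(3, 100) would clear it, and the client never
sends that.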

Steps 7-9 cause annoyances and erroneous output, but don't cause
the connection failure itself.
The actual mistake that causes the connection failure happens in 6:
The server does not assign the dummy packet to the correct peer, but
to a newly created one.
Therefore, all further packets sent by the client on channel 0 are
now buffered by the server as it waits for the dummy packet to reach
the peer, which of course doesn't happen as the server assigned
that packet to the second peer it created for the client.
This makes the connection code buffer the TOSERVER_CLIENT_READY
packet indefinitely instead of passing it to higher level code,
which stalls the rest of the init process and causes the actual
bug.
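
The buffering described above can be modelled roughly as follows.
This is a sketch under assumptions: ReliableChannel and its members
are hypothetical names, the seqnum values used in the note below are
made up, and the real channel code also handles duplicates and old
packets, which this model does not.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified in-order receive buffer for one reliable
// channel of one peer. Not the actual connection code.
struct ReliableChannel {
    uint16_t next_expected;                   // seqnum that must be delivered next
    std::map<uint16_t, std::string> buffered; // packets held back until their turn
    std::vector<std::string> delivered;       // packets passed to higher level code

    explicit ReliableChannel(uint16_t first_seqnum)
        : next_expected(first_seqnum) {}

    void receive(uint16_t seqnum, const std::string &payload) {
        buffered[seqnum] = payload;
        // Deliver packets for as long as the next expected seqnum is present.
        auto it = buffered.find(next_expected);
        while (it != buffered.end()) {
            delivered.push_back(it->second);
            buffered.erase(it);
            ++next_expected;
            it = buffered.find(next_expected);
        }
    }
};
```

If the dummy packet (say seqnum 10) is handed to the channel of a
second peer, the first peer's channel keeps waiting for seqnum 10 and
holds TOSERVER_CLIENT_READY (say seqnum 11) in buffered forever. If
both packets reach the same channel, the order of arrival does not
matter and both are delivered.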

Reordered init packets may be able to cause this as well; the only
case studied was one where the network reliably reordered the dummy
packet to arrive after the init packets.

The patch fixes the bug by no longer ignoring peers where
m_has_sent_with_id has been set. The other changes of the patch are
just cleanups of unused methods and fields and additional
explanatory comments.
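
The effect of the changed skip condition in the peer lookup can be
shown with a reduced model. This is a sketch only: PeerModel and the
two lookup functions are hypothetical stand-ins, addresses are plain
strings here, and the real Connection::lookupPeer compares Address
objects. The two skip conditions mirror the old isActive() check and
the new isPendingDeletion() check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

const uint16_t PEER_ID_INEXISTENT = 0;

// Hypothetical stand-in for the server's peer structure.
struct PeerModel {
    std::string address;
    bool has_sent_with_id;
    bool pending_deletion;
};

typedef std::map<uint16_t, PeerModel> PeerMap;

// Before the patch: peers that already sent a packet carrying their id
// (and are not pending deletion) are skipped, so a late packet with
// peer id zero no longer finds them.
uint16_t lookupPeerBeforePatch(const PeerMap &peers, const std::string &sender) {
    for (const auto &entry : peers) {
        const PeerModel &peer = entry.second;
        if (peer.has_sent_with_id && !peer.pending_deletion)
            continue;
        if (peer.address == sender)
            return entry.first;
    }
    return PEER_ID_INEXISTENT;
}

// After the patch: only peers pending deletion are skipped.
uint16_t lookupPeerAfterPatch(const PeerMap &peers, const std::string &sender) {
    for (const auto &entry : peers) {
        const PeerModel &peer = entry.second;
        if (peer.pending_deletion)
            continue;
        if (peer.address == sender)
            return entry.first;
    }
    return PEER_ID_INEXISTENT;
}
```

For a peer with id 2 that has already sent a packet carrying its id,
the old lookup returns PEER_ID_INEXISTENT for a late packet from the
same address, which leads the server to create a second peer. The new
lookup returns 2.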

One could think of alternate ways to fix the bug:

* The client could simply take the new peer id and continue
  communicating with that. This is however worse than the fix, as
  it requires the peer id set command to be sent reliably (which
  currently happens, but could then no longer be changed). Also, such
  a change would require both server and client to be patched in
  order for the bug to be fixed, as right now the client ignores peer
  id set commands once its peer id is different from
  PEER_ID_INEXISTENT, and the server would need modification too to
  change the peer id internally.
  And, most importantly, right now we guarantee higher level server
  code that the peer id for a certain peer does not change. This
  guarantee would have to be broken, and it would require much
  larger changes to the server than this patch makes.

* One could stop sending the dummy packet. It is unclear whether
  this is a good idea, as the purpose of the dummy packet is not
  known (it might be there for something important), and as it is
  possible that the init packets may cause this problem as well
  (although it may also turn out that they can't).

Thanks to @auouymous who had originally reported this bug and who
has helped patiently in finding its cause.
est31 2016-05-18 06:18:08 +02:00
parent f64a6259b2
commit 423d8c1b0d
2 changed files with 8 additions and 24 deletions

@@ -2167,12 +2167,12 @@ void ConnectionReceiveThread::receive()
     throw InvalidIncomingDataException("Channel doesn't exist");
 }
-/* preserve original peer_id for later usage */
-u16 packet_peer_id = peer_id;
 /* Try to identify peer by sender address (may happen on join) */
 if (peer_id == PEER_ID_INEXISTENT) {
     peer_id = m_connection->lookupPeer(sender);
+    // We do not have to remind the peer of its
+    // peer id as the CONTROLTYPE_SET_PEER_ID
+    // command was sent reliably.
 }
 /* The peer was not found in our lists. Add it. */
@@ -2214,11 +2214,6 @@ void ConnectionReceiveThread::receive()
     }
 }
-/* mark peer as seen with id */
-if (!(packet_peer_id == PEER_ID_INEXISTENT))
-    peer->setSentWithID();
 peer->ResetTimeout();
 Channel *channel = 0;
@@ -2756,7 +2751,7 @@ u16 Connection::lookupPeer(Address& sender)
 for(; j != m_peers.end(); ++j)
 {
     Peer *peer = j->second;
-    if (peer->isActive())
+    if (peer->isPendingDeletion())
         continue;
     Address tocheck;

@@ -663,8 +663,7 @@ class Peer {
     m_last_rtt(-1.0),
     m_usage(0),
     m_timeout_counter(0.0),
-    m_last_timeout_check(porting::getTimeMs()),
-    m_has_sent_with_id(false)
+    m_last_timeout_check(porting::getTimeMs())
 {
     m_rtt.avg_rtt = -1.0;
     m_rtt.jitter_avg = -1.0;
@@ -687,21 +686,16 @@ class Peer {
 virtual void PutReliableSendCommand(ConnectionCommand &c,
         unsigned int max_packet_size) {};
-virtual bool isActive() { return false; };
 virtual bool getAddress(MTProtocols type, Address& toset) = 0;
+bool isPendingDeletion()
+    { MutexAutoLock lock(m_exclusive_access_mutex); return m_pending_deletion; };
 void ResetTimeout()
     {MutexAutoLock lock(m_exclusive_access_mutex); m_timeout_counter=0.0; };
 bool isTimedOut(float timeout);
-void setSentWithID()
-    { MutexAutoLock lock(m_exclusive_access_mutex); m_has_sent_with_id = true; };
-bool hasSentWithID()
-    { MutexAutoLock lock(m_exclusive_access_mutex); return m_has_sent_with_id; };
 unsigned int m_increment_packets_remaining;
 unsigned int m_increment_bytes_remaining;
@@ -776,8 +770,6 @@ class Peer {
 float m_timeout_counter;
 u32 m_last_timeout_check;
-bool m_has_sent_with_id;
 };
 class UDPPeer : public Peer
@@ -795,9 +787,6 @@ public:
 void PutReliableSendCommand(ConnectionCommand &c,
         unsigned int max_packet_size);
-bool isActive()
-    { return ((hasSentWithID()) && (!m_pending_deletion)); };
 bool getAddress(MTProtocols type, Address& toset);
 void setNonLegacyPeer();