HP3000-L Archives

April 2002, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Steve Dirickson <[log in to unmask]>
Date: Wed, 24 Apr 2002 21:57:02 -0700
> Can we agree that TCP/IP does ensure that if transmission over TCP/IP
> is successful, then the recipient got what was sent?

No.

> And, as such, while application errors and resulting data errors may
> abound, check sums or application level acknowledgements which amount
> to "I got your transmission #123,456" are redundant, at least for
> ensuring that the transmission was successful?

Sort of.

The reason? *Your* (and my) idea of "the recipient" is completely
different from TCP's idea of "the recipient".

TCP provides reliable, non-duplicated, in-sequence delivery of a
stream of bytes (or notification of the failure of that delivery) *to
the protocol stack on the other end* of the connection. That is of
basically zero interest to you, or to me. What we want is reliable
delivery of meaningful-to-the-application data from the application on
this end of the connection *to the application on the other end*.

When TCP on my end receives an ACK for a packet, it means that that
packet was received and verified by the protocol stack at the other
end. It does not mean that the packet--or, of greater interest, the
application-level data transfer of which that packet is a part--has
been received, or will ever be received, by the application to which I
am trying to send it. The application may be busy doing something
else, but might get to my data later. The application might be stuck
in a loop, and will never receive my data. Depending on the platform,
the application may not even exist: it may have failed in such a
manner that it was aborted but the socket connection was left open.

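The gap between "the stack got it" and "the application got it" is easy to demonstrate. The sketch below is my illustration (not from the original discussion) using a local socket pair to stand in for a TCP connection: send() reports success as soon as the bytes land in the receiving stack's buffers, even though the peer application never calls recv().

```python
import socket

# Two connected endpoints; socket `b` plays the role of an application
# that is busy, stuck in a loop, or dead -- it never calls recv().
a, b = socket.socketpair()

sent = a.send(b"hello, are you there?")

# send() reports success: the bytes reached the peer's buffers.
# No application has read them, and none ever will.
print(f"send() accepted {sent} bytes; the peer app has read nothing")

a.close()
b.close()
```

The same asymmetry holds over a real network: the sender's stack sees ACKs from the receiver's stack, and nothing in TCP tells it whether the receiving application ever consumed the data.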
Real-world story #1: A number of years ago (in the days of 16-bit
Windows), I designed parts of a distributed application. The
client end was to run on Windows. Shortly after we started work, MS
came out with their Winsock 1.1-based socket classes for the MFC
framework. My code was already very close to what they produced, so I
modified it to use their CAsyncSocket class. Worked fine--or so we
thought.

Some months later, when we brought up a new component of the system
that required much higher throughput, we started having problems: data
disappeared. It left the server fine, and no errors were reported to
the server, indicating that everything made it to the client's
protocol stack, checksums matched, ACKs were exchanged, etc.--but the
data was not making it to the client application. It turned out that
the design of the application was not compatible with the
window-message-notification system used by the MS socket classes. TCP
was working perfectly, but the application was broken. The high-level
belief in the reliability of TCP-based data transfer was undermined by
my invalid assumptions about what went on in the intervening layers.

> This is the argument I have with certain mainframers, who want files
> transmitted over a reliable medium to include header and trailer
> records (just like, what, a card deck?), some of which do not even
> include useful checksums, totals, or record counts (just like a card
> deck), but are no more than BEGIN and END (just like a card deck). If
> ftp worked, I got the file, and no other validation that I got what
> was sent is necessary. If it failed, both ends can tell that it
> failed, although the receiving end might not be careful to confirm
> that.

I agree that redundant framing that adds no value is a waste of time
and effort. However, that doesn't mean that the application shouldn't
be aware of--and concerned about--the details of the transfers going
on below it.

Real-world story #2: I occasionally need to transfer multiple
gigabytes of files--sometimes many dozens of gigabytes--between
drives. The reasons are usually hardware-related: upgrading to a
larger/faster drive, sometimes an entire new machine, etc. I always do
those transfers as a copy-everything, followed by a
compare-and-delete. Why? Because, on three occasions over the last
10-12 years, the post-copy compare has revealed that the destination
file was not an identical copy of the source file. Some number of
bits/bytes had gotten damaged somewhere between source and
destination. The copy operation had gone bad.

Obviously, there could be a number of reasons for the error:
uncorrected disk-read or disk-write errors, network error, cosmic rays
zapping a byte in the buffer, etc. However, in all three cases, the
error was on a machine-to-machine cross-network copy; I have *never*
had an error on a same-machine disk-to-disk transfer. Did TCP fail to
do its job? No idea. Personally, I'd be much more suspicious of a
disk-transfer error--if it weren't for the fact that the failures
occurred exclusively on transfers over the network. But the bottom
line is that there were errors (somewhere), but the connection was not
broken because TCP was dissatisfied with the state of the connection.
IOW, higher-level components that assumed everything was OK because
they didn't hear about problems (perhaps because they didn't ask) from
lower-level components were misled.
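
The post-copy compare described above can be sketched as follows. This is a hypothetical helper of my own, not the author's actual tooling: hash every file in both trees, and only trust the copy (and delete the source) when every file matches.

```python
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks to handle large files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(src_root: Path, dst_root: Path) -> list[Path]:
    """Return relative paths of files whose copies don't match the source."""
    mismatches = []
    for src in src_root.rglob("*"):
        if src.is_file():
            rel = src.relative_to(src_root)
            dst = dst_root / rel
            if not dst.is_file() or digest(src) != digest(dst):
                mismatches.append(rel)
    return mismatches
```

An empty result from verify_copy() is the signal that it is safe to delete the originals; a non-empty one means some layer between source and destination silently dropped or damaged bytes, exactly the failure mode described above.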

> So, if someone's protocol sends a message to a socket on my system, I
> do not need to explicitly send a reply at the application level.
> TCP/IP either ACKs or NAKs. It's awfully fun during debugging and
> development to see such a message, but in production, it's wasted.

Outside of survival-critical systems, application-level integrity
checks and message-by-message ACKs are probably overkill. But that
doesn't mean that you should blindly push the "I believe" button when
TCP is being used. Fortunately, basic idiot-checks on TCP-transmitted
data are (unlike me) both cheap and easy.

As previously discussed, there's no such thing as a "message" at the
TCP level. However, most real-world applications *do* send messages:
the next 'x' bytes of the file, the next record from the database,
etc. Which means the application has to frame the data stream into
messages. The most common way to do that is with short "header" blocks
that define (at a minimum) the length of the message. It is close to
zero-cost to put a sequence-number field in that header. As long as
the receiving end gets all messages, it is (probably) safe to rely on
TCP's internal mechanisms. If the application-level sequencing is not
maintained, the application can take appropriate recovery action.
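
A minimal sketch of that near-zero-cost idiot-check (my own illustration; the header layout and field sizes here are arbitrary choices, not anything prescribed in the text): a fixed header carries the payload length and an application-level sequence number, and the receiver asserts that sequence numbers arrive consecutively.

```python
import struct

# Hypothetical header: 4-byte payload length, 8-byte sequence number,
# both network byte order.
HEADER = struct.Struct("!IQ")

def frame(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its length and a sequence number."""
    return HEADER.pack(len(payload), seq) + payload

def unframe(data: bytes) -> tuple[int, bytes, bytes]:
    """Return (seq, payload, remaining unconsumed bytes)."""
    length, seq = HEADER.unpack_from(data)
    start = HEADER.size
    return seq, data[start:start + length], data[start + length:]

# Receiver-side check: sequence numbers must be consecutive; a gap
# means application-level data was lost somewhere TCP couldn't see.
stream = frame(1, b"record one") + frame(2, b"record two")
expected = 1
while stream:
    seq, payload, stream = unframe(stream)
    assert seq == expected, f"gap: expected #{expected}, got #{seq}"
    expected += 1
```

The sequence field adds eight bytes per message; the recovery action on a detected gap (re-request, abort, resync) is whatever the application considers appropriate.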

I say "(probably) safe to rely on TCP's internal mechanisms";
apparently "always safe" is overly optimistic. Real-world story #3,
from a correspondent on another mailing list:

"I have a client/server application where the client half runs on a
Pocket PC 2002 device and the server half runs on a Windows 2000
desktop....The Pocket PC device is attached to a Motorola Timeport
phone and I use the iStream software from VoiceStream to establish a
connection to the internet. My server is connected through my
company's LAN through a firewall to the internet....Then, each side
proceeds to send data on this socket on one thread, and receive data
on the socket on another thread.  The problem is that sometimes the
recv call executed in the server-side of my application is getting
what appears to be corrupt data.  It literally looks like there are
bytes missing in the stream.  I haven't been able to detect any kind
of pattern in when this happens either.  (It's not happening every x
bytes, for example, or for packets where the send call exceeded a
certain number of bytes.)

The data that is sent from the server to the client is never
corrupted; only the data sent from my client to the server.  Also,
this doesn't happen when I use a different mechanism to connect the
Pocket PC device to the internet (a card in the extension sleeve, for
instance).  The biggest surprise is that the call to recv in the
server code is actually returning a valid return value, not
SOCKET_ERROR or 0; the socket seems to actually think the data coming
in is valid."

The correspondent didn't come back and tell us what was found, what
was done, etc. to resolve the problem--if, indeed, it was resolved.
But, again, "if TCP says it's OK, it must be OK" would seem to be a
bit optimistic.

FWIW, most people apparently aren't aware that UDP is, on a per-packet
basis, just as "solid" as TCP with respect to data integrity: both
protocols protect the data portion of the packet (IP already protects
the header), and both use exactly the same checksum algorithm. The
difference between the two is that, when TCP determines that it got a
bad packet, it will ask for another copy; UDP will simply throw the
bad one away without mentioning it to anyone.
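
That shared algorithm is the Internet checksum (RFC 1071): a 16-bit one's-complement sum over the data. A sketch of it in Python follows; note that the real TCP and UDP checksums also cover the header and a pseudo-header, which this simplified version omits.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit
    words, complemented. Used (over header + pseudo-header + data)
    by both TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF
```

A receiver verifies by checksumming the data with the transmitted checksum included: the result is 0 when nothing was damaged. The weakness, of course, is that it is only a 16-bit sum: certain error patterns (e.g. two words corrupted in compensating ways) slip through, which is one more reason the "if TCP says it's OK" belief can disappoint.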

But, going back to the basic question at the beginning: "Can we agree
that TCP/IP does ensure that if transmission over TCP/IP is
successful, then the recipient got what was sent?" Regrettably, "we"
can not.

Steve

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
