Material for Thu 2 mar WG meeting

FYI.
----- Forwarded by Ravi Thuraisingham/Enron Communications on 02/29/00 01:15
PM -----

nevil@ipn.caida.org
02/28/00 10:35 PM

To: metrics@caida.org
cc: (bcc: Ravi Thuraisingham/Enron Communications)
Subject: Material for Thu 2 mar WG meeting

Hello all:

Appended below you'll find a summary of ideas that people have contributed
since we at CAIDA began to get the metrics wg started. Some of this
material has already appeared on the metrics mailing list, the older
material comes from personal emails. I'm posting it again to give us all
some more ideas to think about.

For those who are able to attend the WG meeting in San Jose this Thursday
(2 Mar), I'd like to spend time in the morning getting to know about
attendees interests and experience in making Internet measurements.
I suggest that a good way to do this would be to have each group give a
short (10 minutes, <= 5 slides) presentation of their measurement work.
That should allow us to make a longer list of metrics to discuss in
more detail for the rest of the meeting.

Comments please! I'll send out a revised/updated agenda late tomorrow.

Cheers, Nevil

PS: We now have about 40 people on the mailing list. Please copy all
further 'metrics' email to the list so that we can all see it.
(We now have a web-searchable list archive in place at
http://www.caida.org/archives/metrics/ )

-------------------------------------------------------------
Nevil Brownlee Visiting Researcher
Phone: (619) 822 0893 CAIDA, San Diego

Several people have expressed interest in passive measurements
on TCP streams. One simple example is keeping track of DNS response times.

Daniel McRobb offered the following ..

I also believe a similar simple technique could be used to track TCP
< response times, and this gets even more interesting... you only need
< to be able to see traffic in one direction (from client to server) to
< get a fairly accurate round-trip time from the client to the server
< (irrespective of how far away the client is from your measurement
< host), because of TCPs three-way handshake for connection
< establishment; when you see the opening SYN, you wait for the client
< to ACK the server's SYN ACK and you have the client-<server round-trip
< time. I think you've actually already done some of this? We could
< also potentially try to correlate DNS queries and TCP connections,
< perhaps determining some probabilities in a given environment of a
< host initiating a TCP connection to a host for which it just received
< an A record, and track various application-level response times (I'd
< personally love to have a lot of data on the ratio of DNS response
< time to TCP transfer time for HTTP connections). Even if the
< measurement makes no attempt to discern what constitutes a Web
< transaction to the user (a full page, usually many TCP connections).
<
< Anyway, I think there are some fairly interesting things we can
< measure with simple techniques, these are just some simple ones.
< There may be some interesting things we could do by digging into
< payloads of a few of the other highly-utilized applications, but doing
< it for TCP is a nightmare perfomance-wise. But we now have the basic
< infrastructure to do things with DNS over UDP. We can probably cover
< any open UDP protocol without incurring performance penalties that
< would make it completely unusable. SNMP, for example... while its
< application is limited, network service providers would find
< round-trip time information for SNMP packets from their network
< management stations to agents very useful. I think there are some
< rudimentary things we can do with TCP as well, and I also think some
< sites would be interested in having some rudimentary information. For
< example, a weighted (by traffic) round-trip time distribution for TCP
< traffic to/from all networks with which you communicated (say at /24
< granularity). This gives sites a notion of how close they are to
< those they talk to most frequently. With a little more work, we could
< probably make reasonable bandwidth estimates for every end network for
< which we see TCP data (we could certainly get a reasonable number for
< the maximum observed bandwidth).
<
< These methods also decouple the measurement from the traffic. While we
< all know the value of that, I think it's significant in that active
< probers can be run on the same host, decoupled from the actual
< measurement, and with timestamps coming from kernel space (BPF). I've
< been thinking about this for a long time, because eventually I'd like
< skitter (and other tools) to be able to do use TCP packets for probing,
< with no need for things like IP firewall code; if I can just properly
< time non-blocking connects, and count on the passive tool to see the SYN
< and SYN ACK, I can use any TCP port for round-trip time measurements and
< not trick my kernel by sending TCP packets on a raw socket. I need
< feedback from the passive tool for tracing like skitter (it needs to
< know when a reply has been seen from a hop and when the destination has
< been reached), but it's not difficult (simple bytestream-oriented IPC is
< sufficient).
<
< Going further, I like the idea proposed by some others (maybe funded at
< this point, I'm not tracking it) of instrumenting the freenix TCP/IP
< stacks. A lot of useful information could be pulled from the stack.
< But eventually someone's going to have to come up with what pieces of
< information are desirable enough for someone to want their stack
< instrumented (and it'll have to be somewhere between what's currently
<
< implemented and a full-blown metering system like RTFM). And I think in
< the interim, there are a lot of things we could do using libpcap on our
< local machines, non-promiscuous and in user space (safe, easy to
< implement and test). To me the benefits here are:
<
< - a user is likely to have a tangible, well-understood relationship
< with their workstation (understood by them). This is particularly
< true for those of us with network expertise. So it's at least
< plausible that we can find correlations of measurements with
< user experience in a fairly short period of time, helping us hone in
< on what's useful. Even if we only find weak correlations (for
< whatever reason), once a correlation exists, more people will be
< interested in helping with development because it'll be something
< they'll use personally.
< - we're essentially guaranteed to see traffic in both directions.
<
< I'd personally be interested in trying to find some small sets of
< information that correlate to user experiences, so that it doesn't
< take a terabyte of disk and 64 processors to deal with data from say
< 10,000 desktops. :-)
<
< Anyway, just some random thoughts. The hard part for me at the moment
< is thinking about useful generalizations and infrastructure to support
< them...

Cindy Bickerstaff responded ..

< We're just starting to capture the MSS and window size negotiations between
< timeit and servers to get an idea of what is typical or usual. Wonder if
< Daniel's code could do that too from the various traffic monitoring points
< caida has out there? The data could be used to fill in some parameters for
< trying to model some of the passive data collected. Since a typical web page
< has many elements between a single client and server the first or base page
< gives you a starting idea of what to expect and the subsequent elements are
< like repeat measurements of the same path over a very short time interval.
< Since there is a strong time of day/week effect (volumes and perf), the
< short duration of the data collection from a single web page (incl.
< elements) might give some options for modeling (and maybe getting a better
< estimate of packet loss). I've really enjoyed joining the e2e-interest group
< for the ideas on factors and the references to past modeling attempts (e.g.
< mathis, et.al. paper... ack its bulk transfer focus).

Paul Krumviede highlighted the difficulty of agreeing on details
of how to measure common metrics like availability, reliability,
throughput and latency.

< an example was throughput. since it was proposed
< that this be measured using a TCP file transfer of
< random data, one concern was what does this do
< to latency? and where does this get measured? as
< many customers of this service would not be large
< businesses, measuring this from the customer end
< of a 56- or 64-kbps circuit was almost certain to
< drive end-to-end latency measurements beyond
< defined bounds. measuring it from within points in
< the provider's network(s) might show figures that the
< customers could never see themselves.
<
< the compromise was to define the throughput metric
< as ocurring at access link loads of no more than 50%
< of the bit rate. but how does one measure that? and
< so on...
<
< given that this discussion was recognized as being the
< prelude to the imposition of minimal SLAs, as part of the
< "certification" process for service providers, this may be
< an illustration of the perils of discussing SLAs and metrics
< in some interchangable form. but they may not be easy
< to separate in the minds of some.

Carter Bullard said

< I'm specifically interested in bridging
< RTFM and IPPM work, particularly to describe how an
< RTFM can be a generic connectivity monitor. IPPM
< describes Type-P-unidirection and bidirectional
< connectivity, and a single point RTFM can detect
< these conditions, but standard descriptions for how
< this can be done would be very useful.
<
< Personally, one of the things that I'm interested
< in is TCP based streaming services performance, such
< as that seen in Internet Radio distribution, and of
< course IP Telephony wireline performance metrics would
< be very good to describe.

Nevil Brownlee comments that he's also very interested in
developing the RTFM meter as a platform for more general measurements.
Two areas seem interesting (apart from getting more information
from TCP streams):~
+ QoS / DifServ performance
(what metrics do we need to characterise this?)
+ IPSEC (what metrics will still be useful when we can't
look at port numbers? what kinds of traffic can still be
identified?)

Curtis Villamizar said
< I'm particularly interested in the work on DoS at
< http://www.cs.washington.edu/homes/savage/traceback.html and would
< like to see that on the agenda.

This would provide a general way of determining where packets came from,
not just a tool for tracing DoS attacks.

-----------------------------------------------------------------------