Duke ITAC - July 31, 2003 Minutes

July 31, 2003

Members present: Ed Anapol, Mike Baptiste, Pakis Bessias, John Board, Dick Danner represented by Ken Hirsh, Brian Eder, Tracy Futhey, David Jamieson-Drake, Scott Lindroth, Roger Loyd represented by Andy Keck, Greg McCarthy, Melissa Mills, Lynne O'Brien represented by Jim Coble, Rafael Rodriguez, Molly Tamarkin, Robert Wolpert

Guests present: Kathy Pfeiffer; Phil Lemmons, News Services; Chris Meyer, OIT-DS; Richard Gutten; Michael Gettes, OIT; Jen Vizas, OIT-ATS; Chris Cramer, OIT-SO; Nhan Vo, DHTS

I. Minutes and announcements

Robert Wolpert welcomes members.

Chris Cramer explains the recent Microsoft vulnerability. Within two weeks it turned into an exploit. At 2PM today port 135 on the router was blocked. This will affect some people, but not the Health System. The plan is to keep the port blocked temporarily and evaluate where we are next week.

Robert Wolpert asks if there are patches we should apply.

Chris Cramer says yes, but there is a bug in the Windows system patch so we have to deal with that first. Also, Exchange servers require port 135 to be open, so Exchange servers will have problems until at least next week. Chris suggests ITAC should soon revisit the question of blocking ports on the router: which ports do we block?

Tracy Futhey announces that OIT Solaris computer labs are having the desktop interface changed.

Jen Vizas explains that the change allows conversion from Solaris to the Linux environment. The current environment has not been changed for 7 years.

II. ResNet utilization statistics

Presented by Bob Currier

Bob Currier's presentation is a follow up from previous ITAC discussions about ResNet use and how to monitor and deal with students who use a disproportionate amount of bandwidth.

In the past, tools used to track network traffic included the following:

  • Optical splitters on border router fiber
  • Linux-based sniffers
  • Custom Perl code to sniff and parse
  • Data stored in MySQL databases
  • Only 1 minute out of every 15 captured

The tools used in the present are not much better:

  • Optical splitters
  • Linux-based but...
  • Perl-based capture, but Snort post process
  • Awk data distillation 10x speed vs perl
  • Still 1 min out of every 15 is captured, but all hosts are logged vs only the top 250 as before
  • Now we're only saving the IP source address, IP port, and Packet size
  • We generate graphs using php/jpgraph
  • We use custom Perl LDAP/CGI code for user identification
  • NetReg dhcp.conf and dhcp.leases files are parsed hourly and the results are stored in MySQL
  • LDAP code runs daily
  • Daily totals are generated using numerical integration
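
The daily-total step in the list above can be illustrated with a minimal sketch. It assumes each sample is a (timestamp in seconds, observed bytes per second) pair taken once per 15-minute interval and applies the trapezoidal rule. The function name and sample layout are hypothetical, and the minutes do not record which integration rule was actually used.

```python
def daily_total_bytes(samples):
    """Estimate total bytes from sampled rates with the trapezoidal rule.

    samples: iterable of (timestamp_seconds, bytes_per_second) pairs.
    This layout is an assumption made for illustration only.
    """
    ordered = sorted(samples)
    total = 0.0
    # Each pair of neighbouring samples contributes the area of one trapezoid:
    # the average of the two rates multiplied by the time between them.
    for (t0, r0), (t1, r1) in zip(ordered, ordered[1:]):
        total += (r0 + r1) / 2.0 * (t1 - t0)
    return total


if __name__ == "__main__":
    # Three samples taken 15 minutes (900 seconds) apart.
    print(daily_total_bytes([(0, 100.0), (900, 100.0), (1800, 300.0)]))
```

Because only one minute in fifteen is observed, any such estimate treats the rate between samples as varying smoothly, which is one reason the accuracy of the data is questioned below.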

Problems with the present method include:

  • Too much data collected?
  • Data is not accurate enough?
  • Data is not useful in big picture

For the future, the tools and procedures we should use include:

  • Use netflow from Cisco
  • Use cflowd/flowscan from Dave Plonka
  • Flowscan uses RRD and RRDgrapher
  • Can produce reports at campus or subnet level
  • Individual identity is "hidden" by aggregation
  • Widely-distributed stats available to network/system admins on real-time basis
  • Open-source so is very tweakable
  • Non-vendor specific (Cisco, Juniper, etc)
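
As a rough illustration of how subnet-level reporting hides individual identity, the sketch below sums bytes per subnet from a list of flow records. It is not FlowScan's own code or interface; the record layout (source IP, byte count), the function name, and the /24 prefix length are all assumptions chosen for the example.

```python
import ipaddress
from collections import defaultdict


def bytes_per_subnet(flows, prefix_len=24):
    """Sum flow byte counts per source subnet.

    flows: iterable of (source_ip_string, byte_count) pairs (hypothetical layout).
    prefix_len: subnet size used for grouping; /24 is an arbitrary choice here.
    """
    totals = defaultdict(int)
    for src_ip, num_bytes in flows:
        # strict=False lets us pass a host address and get its enclosing network.
        subnet = ipaddress.ip_network(f"{src_ip}/{prefix_len}", strict=False)
        totals[str(subnet)] += num_bytes
    return dict(totals)


if __name__ == "__main__":
    sample = [("10.0.1.5", 100), ("10.0.1.9", 50), ("10.0.2.7", 10)]
    print(bytes_per_subnet(sample))
```

Only the per-subnet totals leave the function, so a report built on them says nothing about which host within a subnet generated the traffic.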

Other institutions with a similar problem are doing the following:

  • Columbia using flowscan and CUflow
  • Cornell using commercial app from Apogee
  • U Wisc using flowscan/netflow
  • CMU using custom NetSage/MON/Cricket
  • Tufts using Argus/flowscan/Cricket

Issues ITAC needs to consider include: Lots of data can be useful, but there is a trade-off between how much we need to maintain network operations versus issues of storage space, personally identifiable data, etc.

The faster networks become, the more of a problem measurement becomes. National Lambda Rail will give us an incredible opportunity to experiment with instrumentation techniques and possibly develop new hardware instrumentation and/or software applications for processing data.

One last addition: Dave Plonka, author of FlowScan and other network analysis tools, will be on campus late Sept/early Oct to give a presentation on Advanced Passive Network Analysis Techniques.

Bob shows some live statistics. Bob explains the statistics being viewed live are from 7,000 or 8,000 hosts, but there is no recording of where data is going, what it's doing, or what the content of the data is.

John Board asks if the stats Bob is showing are capturing every minute or is it sampled?

Bob says the software captures every minute, but the data displayed is sampled from the capture.

Michael Gettes asks if it is capturing every packet or every flow?

Bob answers every flow. Bob asks ITAC if we want all this data? Is it accurate enough? How long do we keep it? Columbia University (where the software comes from) is keeping their data for one month.

Robert Wolpert asks if Columbia's one-month retention is a resource allocation issue or a privacy issue.

Bob says it is mostly privacy, but both issues overall.

David Jamieson-Drake wonders why Columbia picked a one-month time period.

Bob thinks they just captured the data until the disk got full then decided it was a good time to throw away the data and restart.

Robert Wolpert asks if there are any requests from Duke researchers for this data for research purposes.

Bob says there are no significant requests.

Tracy Futhey asks Bob to talk about Columbia and their policy so ITAC members can be more familiar with the issues and why this was done, and also why Duke is considering something similar.

Bob explains that Columbia had the same bandwidth issues as Duke. They put together a policy that looks at bandwidth use over 1 hour increments. They give each student a 100-MB allotment. If someone sends out too much data the system catches it and writes a policy that limits the usage for a certain length of time, effectively limiting that person's bandwidth usage rate. When someone gets rate limited, the Help Desk is notified who the person is and that they have been limited in case they call. With the software, Duke can do something similar.
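
A minimal sketch of the kind of check described, under stated assumptions: usage is summed per user within fixed one-hour windows, and anyone above the allotment is flagged for rate limiting. The record layout, the function name, the reading of 100 MB as decimal megabytes, and the use of fixed rather than sliding windows are all assumptions. The minutes do not describe how Columbia's software implements this.

```python
from collections import defaultdict

HOURLY_ALLOTMENT_BYTES = 100 * 1000 * 1000  # assumption: 100 MB as decimal megabytes
WINDOW_SECONDS = 3600  # one-hour increments, as described in the minutes


def users_over_allotment(records, allotment=HOURLY_ALLOTMENT_BYTES):
    """Return the sorted list of users who exceeded the allotment in any window.

    records: iterable of (user, timestamp_seconds, sent_bytes) tuples
    (a hypothetical layout for illustration).
    """
    usage = defaultdict(int)
    for user, timestamp, sent_bytes in records:
        # Integer division assigns each record to a fixed one-hour window.
        usage[(user, timestamp // WINDOW_SECONDS)] += sent_bytes
    return sorted({user for (user, _), total in usage.items() if total > allotment})


if __name__ == "__main__":
    sample = [
        ("alice", 10, 60_000_000),
        ("alice", 3000, 50_000_000),   # same hour as above: 110 MB in total
        ("bob", 10, 60_000_000),
        ("bob", 3700, 60_000_000),     # next hour: 60 MB in each window
    ]
    print(users_over_allotment(sample))
```

The flagged list is what would be handed to whatever applies the rate limit and notifies the Help Desk; those steps are outside this sketch.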

Bob asks ITAC what a limit at Duke might be and what the penalty for exceeding the limit might be.

John Board asks whether, in terms of our own ResNet traffic, we are going to get the integrated version of our data. John doesn't think ITAC can answer Bob's questions until we know more about who the abusers are and how much bandwidth they are using.

Molly Tamarkin asks how Columbia students responded.

Bob answers that from the standpoint of some of the students this was a fair and acceptable policy.

David Jamieson-Drake says we need to keep the purpose in mind: If the point is to keep ResNet use down to an acceptable level, what is the acceptable level?

Mike Baptiste thinks rate limiting is better than blocking.

Brian Eder asks if the policy can change at different times of the day? For example, can we allow more bandwidth use in the middle of the night?

Bob Currier says yes.

Tracy Futhey suggests we should have parallel discussions on the customer support side of this with the other universities using these policies.

III. Report: ITAC network task force

Presented by John Board & task force

Background: The task force presented its findings regarding the state of the Duke network infrastructure and guidelines for facilities being renovated or constructed. A draft of the in-depth report from the Networking Task Force was distributed to ITAC members.

Mike Baptiste relates his experience with the construction of the Center for Interdisciplinary Engineering, Medicine and Applied Sciences (CIEMAS) building. He says one thing discovered is that architecture/engineering firms are not very adept at designing or constructing an IT infrastructure. Another interesting discovery is the need for some control of the network closets: there should be an audit trail of who came and went. Also, emergency power and UPSes should be mandated for the equipment. The bulk of the changes came in the outlet type: instead of pulling cable in the beginning, they planned ahead for pulling cable later by installing conduits, junction boxes, etc.

Mike says things didn't fit into a construction standards document. The committee hopes to complete a recommendations document to address issues like making sure your architect lays out mock up office diagrams so you know where to put the ports, requiring data ports and electrical outlets to be within 16 inches of each other, things like that.

Mike adds that overall where the Duke IT infrastructure is now and where it's going are both good.

John Board investigated the state of the network now, broadly speaking. He concludes that we can be proud of the state of Duke's network. Issues now are how to plan for the future. For example, 10-gigabit links are great and we could put them in today, but we don't need them because none of our 1-gigabit links are anywhere near crowded.

Robert Wolpert suggests that any two people on campus with a gigabit link on their machine could saturate a network.

Mike Baptiste points out that the machine hardware actually can't go that fast even if it has a 1-gigabit network card.

John Board adds that this kind of situation is what monitoring the network and statistics are for. At some point we will get there, but now we aren't close. But we need to keep an eye on it.

John Board says wireless b is the way to go. New technology will allow us to come close to having a seamless wireless network on campus. The challenge is the campus and Health System wireless networks do not talk to each other.

We should also consider a parallel, high-bandwidth network for people with high-bandwidth research needs.

Also, we will need to go to more routers and more division of the network to keep problems contained when they happen.

We need to start considering how much it will cost to get network to off-campus buildings.

As for disaster planning, the Telcom building is a massive point of failure. We should consider replicating everything at another location.

Mike Baptiste adds that there was a feeling on the committee that disaster recovery is such a big issue that we need to possibly address that separately.

John Board says external architecture, NCNI, NCREN is on a good and steady course. Via NCNI we connect to NCREN to reach the outside world.

Scott Lindroth asks if any recommendations will be passed along to new construction projects ready to happen now?

John Board answers yes. In fact, OIT was represented on many of the building committees.

Tracy Futhey points out that these are decisions that should save Duke money over time, but in the short term we haven't closed the loop on that. We need to present a case for why we should do this now.

Robert Wolpert says he didn't see anything on power or cooling in the committee's report.

Mike Baptiste says the power per office recommendation is difficult because different office types have different requirements.

John Board says this is a draft. We want comments. We will integrate them and send the report back through ITAC for a blessing.

Melissa Mills asks if there are any issues with researchers in the Medical Center and in the University being able to connect.

John Board answers that the committee found this is very hard to do.

Robert Wolpert says the committee report recommends a parallel high-performance network should be constructed as needed, but it doesn't discuss cost or policy.

Tracy Futhey says the report references rogue wireless networks, but says nothing about what we should do about this.

John Board answers Tracy by saying there are many wireless networks in dorms and research labs, some more secure and some less secure. The issue is that you cannot interfere with the production server. But until the wireless network is ready to serve everyone, we can't forbid people from creating wireless networks.

IV. Update: SISS and ACES upgrade

Presented by Kathy Pfeiffer and Chris Meyer

Kathy Pfeiffer reviews: We faced two upgrades:

  1. A PeopleSoft upgrade that took PeopleSoft to a Web-based platform, and
  2. An upgrade of ACES and STORM, because we wanted more flexibility to address enhancement requests from faculty.

We spent most of the year doing the upgrades. We have 16,000 active users of the system. If you add applicants, that's another 30,000 users.

The main impact of these upgrades, particularly for students, is look and feel. But there is also improved performance, simpler maintenance, reduced operating costs, greater flexibility, and security enhancements.

John Board asks if there was any load testing done on the registration module? Was anything done to simulate load at beginning of semester?

Richard Gutten says we are going to put into place a queue mechanism that will put students in line and notify them of their position. That way we can control the queue and actually move students through the system faster, we anticipate, but this has not been tested yet.
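
The queue idea can be pictured with a small sketch: students join a first-in, first-out line, can ask for their current position, and are admitted one at a time. The class and method names are hypothetical and the minutes give no detail on the actual mechanism, so this is only an illustration of the concept.

```python
from collections import deque


class RegistrationQueue:
    """First-in, first-out waiting line that can report a student's position."""

    def __init__(self):
        self._waiting = deque()

    def join(self, student_id):
        """Add a student (if not already waiting) and return their 1-based position."""
        if student_id not in self._waiting:
            self._waiting.append(student_id)
        return self.position(student_id)

    def position(self, student_id):
        """Return the 1-based position; raises ValueError if the student is not waiting."""
        return self._waiting.index(student_id) + 1

    def admit_next(self):
        """Remove and return the student at the front, or None if the line is empty."""
        return self._waiting.popleft() if self._waiting else None


if __name__ == "__main__":
    queue = RegistrationQueue()
    queue.join("student-a")
    queue.join("student-b")
    queue.admit_next()
    print(queue.position("student-b"))
```

Controlling how quickly `admit_next` is called is what would let operators keep the load on the registration servers bounded.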

David Jamieson-Drake asks if there has been an assessment of how the risk structure changes. Are there any new risks?

Richard Gutten says all servers have a failover and backup, but we use sticky sessions so if a server fails the students on that server are kicked off and have to go back to that server.

Melissa Mills thinks the result of the upgrade looks excellent and they did a wonderful job.

V. Other business


End: 5:27