Duke ITAC - March 15, 2012 Minutes

ITAC Meeting Minutes

March 15, 2012, 4-5:30

Allen Board room

 

Agenda

  • Announcements
  • OIT Networking Updates (Registration, Wireless VoIP) - Joseph Lopez
  • Shibboleth Login Screen & Phishing Policy - Richard Biever
  • Research Computing Update - Jim Siedow, Julian Lombardi, John Pormann

Announcements

Robert Wolpert called the meeting to order at 4:02pm.

Tracy Futhey introduced Robert Wolpert, previous ITAC chair, as guest chair for this meeting in the absence of Alvy Lebeck.

OIT Networking Updates (Registration, Wireless VoIP)   -  Joseph Lopez

Robert introduced Joe Lopez.

Joe Lopez started by stating that DNS, DHCP, and NetReg will be replaced by a commercial IPAM system.

  • DNS is how we currently access everything by name.
  • DHCP is how we manage the IP addresses that are issued to devices.
  • NetReg is how we manage registration and matching of IP addresses to devices by MAC address.

IPAM is a solution that replaces all of this. It is a system that manages this whole space.

NetReg is a home-grown solution and is no longer viable for a number of reasons:

  • It requires UNIX, and the network group has limited expertise to work with it and modify the code going forward.
  • There are issues with it not timing to the system correctly.
  • Issues with new registrations during peak times when the system can't handle the load.

Joe listed the key steps as we move to the IPAM system.

  • We will put everything behind the F5 for load balancing. This is a major change that means that all the DNS entries will need to be changed at some point. Pointers will be put in place during a four-month window so that everything will be in place on Day 1.
  • Web registration and branding will have to be changed and put in place.
  • The communication plan is already in progress. We already did a brown bag and this meeting, and CLAC is coming up next week. Emails have already been sent out to all the admins with initial dates.

Robert asked who will be affected. Joseph said everyone will be. Bob Johnson said that previous devices will be grandfathered & won't require registration. There are currently over 70,000 devices in the system. Devices that haven't touched the network for over a year will not be grandfathered. This will clean up old devices that no longer need to be in the system. All systems should behave the same way once we switch over to this system.
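The grandfathering rule described above can be sketched as a simple filter. This is only an illustration, not the actual migration logic: the record layout, the sample devices, and the function name are hypothetical, and the cutover date uses the tentative April 14th date mentioned later in these minutes.

```python
from datetime import date, timedelta

# Hypothetical records: (MAC address, date the device last touched the network).
devices = [
    ("00:11:22:33:44:55", date(2012, 2, 20)),
    ("66:77:88:99:aa:bb", date(2010, 11, 3)),
]

def grandfathered(devices, cutover, max_idle=timedelta(days=365)):
    """Return MAC addresses of devices seen within max_idle of the cutover date."""
    return [mac for mac, last_seen in devices if cutover - last_seen <= max_idle]

# Assumed cutover date (tentative in the minutes).
kept = grandfathered(devices, cutover=date(2012, 4, 14))
print(kept)
```

Devices outside the one-year window would simply need to be registered again through the new portal.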

  • The migration will include train-the-trainer training.
  • The OIT website will be updated.
  • All documents will be tracked in the SharePoint space for the project.

Joe showed a slide of the current infrastructure. We're going to a virtual environment that is behind load balancers. We also have redundancy, and we're reducing our footprint as we go virtual by reducing the number of servers needed to handle the load.

Joe next explained some of the virtues and values of BlueCat for Duke.

  • Ability to push roles to different schools to manage locally.
  • Ability to still support centrally.
  • Scalable registration will help with high use points.
  • Audit trail and reporting of changes will be available.
  • Support for IPv6

Wayne Miller asked about speed of registration.

Joe said speed should be much better.

Robert noted that an increasing number of devices don't have keyboards. Will this work well for them?

Joe said yes: the interface is web-based, with nice features like MAC address detection by the system.

Next Joe showed a mocked up version of the registration page. The real version will be ready soon. The page will be integrated into our authentication system. The system will also allow guests for short-term connection. Policy has not been determined for this. The guest mode is temporarily disabled and waiting on policy.

Robert asked if guest registration will be associated with an actual account.

Joe answered that, yes, details for how this would be handled are still in the works.

It was clarified that this refers to guest access to the "real" network, not the "guest" network.

 

The new interface is a lot cleaner and easier to use than the old NetReg interface.

  • System is dynamic, so it will look at devices and automatically gather the MAC address.
  • Users/managers can go in and review all registered devices & then register and unregister devices.
  • Currently there is no limit to the number of devices.
  • Students have a limit of 5-10 devices. This can be changed as needed.

Tracy noted that occasionally we have an issue with improper use of a device. Cease and desist issues will be linked to the registered user, based on registration information from this system. This poses a challenge if devices are being registered by system administrators for users other than themselves.

Joe stated that people will have to register to the proper IDs to keep this from being an issue. Some devices can be registered with a system ID rather than a personal ID.

Tracy said "We'll need to think about how we do this in cases where admins register devices for others in schools and offices."

Joe answered "We should be able to modify and clean up old accounts over time. It won't be a part of the migration."

It was noted that the current mode is that the admin gets notice if something is wrong. Users don't always deal with notice that there is something wrong with their device.

A question was raised about how far off IPv6 is.

Joseph said that from the network side we've been testing things and most of the web's not ready. We're going to continue testing. We did an IPv6 day not too long ago. None of the web pages worked and there were really a lot of issues.

Bob added that this project will help support the migration to IPv6 from IPv4.


A question was also raised as to whether we can be in IPv6 when the rest of the world is still in IPv4. The answer is that we can, but would have to incorporate some translation to make this work.

  • Project timeline moved back from April 6th to the 14th (tentative)
  • The old system won't go away. Old users will be notified to move to the new one.
  • Trinity will be first.

A test environment is planned for the first part of April. This will be a small part of Trinity. The failover is to revert to old system if there is an issue. The test is just one department within Trinity. 

Bob said that, to the Duke campus, everything except NetReg will be transparent. The target for complete transition will be 4-6 months for people to make the necessary DNS changes. The new portal is pretty straightforward for registering devices after the change.

Wireless refresh is taking place.

  • Replacing a lot of APs with the 3500 model.
  • CleanAir technology
  • Upgrading switches to get Gig connection to devices
  • Support for enhanced IPv6
  • Redundant controllers that control all the APs
  • Better tools for identifying weak signals, interference and failed APs

 

Tracy noted that we have approximately 3400 wireless units. 3000 are being changed out now because we're at the 4-year refresh. This will also add features even though we're in the same family of APs.

A committee member asked if wireless access points can operate on more than one network at a time.

Bob & Billy answered that technically it's possible, and we're working through the risks and how to best do this.

 

UCS Enterprise VOIP project

This is a huge project over the next few months.

  • Moving the whole voice system to UCS (Cisco Unified Communications Platform)
  • Currently in the design phase
  • Will touch 32,000 devices and all call centers
  • Will add cell phones and soft phones
  • Portals for web access for voicemail, phone resets and syncing with Exchange
  • Lots of nice features will be enabled

Bob said that this should be transparent to the users. He also wanted to be sure that all got the announcement about the need for 10-digit dialing starting on April 1st, which is not related to this upgrade.

A committee member asked if SIP access will be just inside Duke or outside. The Health system needs an inclement weather system.

Joseph said that this should work outside Duke. Tracy noted that we should test.

Susan Gerbeth-Jones asked if switches will be replaced.

Bob said that we are currently assessing need. We will upgrade where needed. If there are any issues with connectivity and speed, please pass them along through Remedy and ServiceNow.

 

Shibboleth Login Screen & Phishing Policy  -  Richard Biever

Richard started by saying that we received a request to modify the Shibboleth sign-in screen. He showed the new screen with links to the self-service site & help desk resources.

Tracy asked Richard to explain why these are links.

Richard responded that phishing attempts can obfuscate links, so we are showing the actual links. There is also a note that describes the link and says to check that the link is legitimate.

Richard then showed the error screen you get if you enter the wrong password. It offers links to reset the password and links to the help desk. It also allows you to try again.

The password reset process is done by challenge and response questions. You are given three questions to answer. We ask you two of the three. We are contemplating increasing to 5 questions when you initially set up. Once you successfully complete the answers, you are asked to enter your new password, confirm and submit.
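The "ask two of the three" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the self-service site's actual code: the function names, the sample questions and answers, and the case-insensitive comparison are all hypothetical, and a real system would store hashed answers rather than plain text.

```python
import hmac
import random

def pick_challenges(questions, k=2, rng=None):
    """Choose k distinct enrolled questions to ask (two of three by default)."""
    rng = rng or random.SystemRandom()
    return rng.sample(questions, k)

def answers_match(stored, given):
    """Compare answers ignoring case and surrounding spaces, in constant time."""
    return hmac.compare_digest(stored.strip().lower().encode(),
                               given.strip().lower().encode())

def verify(enrolled, asked, responses):
    """True only if every asked question was answered correctly."""
    return all(answers_match(enrolled[q], responses.get(q, "")) for q in asked)

# Hypothetical enrollment data for illustration only.
enrolled = {"First pet?": "Rex", "Birth city?": "Durham", "Favorite teacher?": "Ms. Lee"}
asked = pick_challenges(list(enrolled))
ok = verify(enrolled, asked, {q: enrolled[q] for q in asked})
print(asked, ok)
```

Moving to five enrolled questions, as contemplated above, would only change the size of the pool passed to `pick_challenges`.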

Robert asked if there are guidelines for acceptable passwords.

Richard replied that they are making some changes. We have a password strength meter. The meter will pop up while you enter the new password and provide guidance. Medium strength is required. Dictionary matches are not allowed. Tips on making stronger passwords are provided. Anyone interested in testing should contact Richard.

A committee member raised a point that DHTS has been sending out emails that warn users when their password is getting ready to expire. The constraints in DHTS aren't the same. They are more rigorous.

Richard said that they are now the same. Academic requirements have been increased. DHTS requires more frequent change of passwords.

A follow-up question was whether the university will move to a 6-month expiration.

Richard said "no."

The update on the vulnerability scanning policy is on the security website in draft form. No changes. This is a formalization of what we are doing.

  • Run regular scans.
  • Grant exceptions for mitigating circumstances.
  • Working on quarterly reports.

Phishing recap:

  • Increase in accounts affected by phishing this year.
  • Resulted in tweaks to the spam/phishing blocking level.
  • Looking at long term solutions to help combat the threat.
  • Accounts that are being used to send spam or phishing messages can be locked. Tracy clarified with Richard and Mark McCahill that this means users who fit a spamming profile. The messaging team does review to ensure that only the compromised accounts are locked, not those sending legitimate mail.
  • Piloting digital certificates for email.
  • Also looking at options for users to recover quarantined messages from the spam system.

Robert noted two types of stops: "quarantined" and "blocked."

Richard and Tracy noted that ~90% of all messages are quarantined or blocked.

 

Research Computing Update - Jim Siedow, Julian Lombardi, John Pormann

John Pormann showed a presentation updating the status of research computing. The good news is that there have been some updates made to the cluster computing environment. The bad news is centered around the budget.

Storage has been a major problem for a while now. We've been seeing 65% annual growth for years. For about a year we have been having to tell professors that there is no increase in quotas. We added another tray of discs for some capacity last March. Three weeks ago we were able to deploy some new storage.
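To put the 65% annual growth figure in perspective, here is a small compound-growth sketch. The 100 TB starting size is a made-up number for illustration, since the minutes do not state the cluster's actual capacity; only the growth rate comes from the presentation.

```python
import math

ANNUAL_GROWTH = 0.65  # growth rate quoted in the presentation

def projected(capacity_tb, years, rate=ANNUAL_GROWTH):
    """Storage demand after the given number of years of compound growth."""
    return capacity_tb * (1 + rate) ** years

# Years for demand to double at a constant growth rate.
doubling_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

# Hypothetical 100 TB starting point.
three_year_demand = projected(100, 3)
print(round(doubling_years, 2), round(three_year_demand, 1))
```

At this rate demand doubles in roughly 1.4 years, which helps explain why frozen quotas became necessary.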

The old system was a NetApp 3070 unit. There was only one of them. It had four 1Gig connections going into it. The new system we've got is actually two independent file-servers that are 3270s. Each server has a 10Gig pipe going into it. They are upgradable to two 10Gig pipes each. We've also added a NetApp Flash Cache Card. This is a memory card that caches recently used files. Users who are running jobs against the same basic filesets should see a significant improvement in performance.

Robert asked how big the cache is. John answered that it's 512Gig on each filer. It may be possible to expand this if we see a need.

A question was raised as to whether this might also extend the life of the disc drives.

John replied that it's possible.

This also means that we have failover and redundancy as well. We will no longer have to take the whole system down for maintenance.

We are also looking at how to corral high end users so that they don't slow everyone down. We now have some resources that allow us to adjust if any users start to monopolize the resource.

Randy Haskin asked why NetApp vs EMC?

John said that a lot of it came down to performance. The performance of NetApp was better than EMC on the Duke cluster through the connections available.

The new deployment also meant that we could re-use old discs and add capacity. This was another advantage from staying with NetApp. We're now in a position where we can add storage lease options to users.

There are two types of storage available: Fast SAS at $1600/TB/yr or Data at $500/TB/yr. These are for cluster storage.

We are also doing research on research storage. Price points for EMC/Iomega were aggressive. Purchase price is $250/TB with 3 years of service through CDW-G.

We're also currently looking at Intel GPU and CPU chips that are on the same piece of silicon. We're not sure it's of value for research computing. They sent a loaner system with a 2.7 GHz CPU that shows 3.0 GHz performance.

We did get our first order in the DSCR for a GPU-enabled compute node. It's a blade-based NVIDIA Tesla with ~1 TFLOP processing power.

 

Research Computing has hired Angela Zoss in Visualization. She'll be here in June, as she's currently teaching at Indiana University. She has a great background in network-based visualization, GIS-based visualization and a couple of others. She has an understanding that interactivity is going to be a big part of what visualization becomes.

 

The Condor grid environment is still not an official service for us. We are starting to stabilize the environment on a VM-based infrastructure. This will give us a much easier reset capability in case anything does go wrong.

One of the reasons we selected Condor is because of what they call a flocking mechanism. Basically this allows campus-wide pooled resources to be shared at the department level and department resources to be shared at the campus level. Both sides get full control over their own resources. Access can be shared with other users, with priority to owners of the resource.

DSCR/Condor support:

We've been running with absolutely no back-end support for Grid Engine, the primary batch scheduler for the DSCR. When things went wrong we had to self-support and figure out where things went wrong. We just finalized a support contract with Cycle Computing. They have expertise with Grid Engine, Condor and some cloud computing areas. This means we have another set of eyeballs that can look at the system and help when there are issues. Primarily they are helping with Grid Engine and Condor, but they have been open to helping with other issues.

The bad news side is that the budget is off.

It just hasn't been what we were expecting, though some of it we knew in advance. The initial budget included a bunch of fees - $200/machine for machines on the Duke shared cluster. Because of timing, most were grandfathered because word was not communicated in time for users to budget for the cost.

Revenues for SCSC are off based on slow uptake in fee-based services.

There was discussion about why this decrease in use wasn't expected.

NSF money has become less available. Was Cloud and Fog solution communicated?  Is it competitive with other solutions? Are we communicating that this service is available here? Flexibility of SCSC isn't as good as alternatives. Are potential new users getting contacted?

We have gotten a few new users, but not what we'd hoped.

Corrective measures to improve current financial position were discussed such as delayed equipment refresh and possible salary savings.

A question was raised regarding what happens when an owner of a Condor blade can't pay the fee. John answered that we could turn the blade over to the user, but we don't have a plan.

A follow-up question was whether this might happen very often. John replied that we have a case with 110 machines where this is the case. The machines were purchased with grant money, but no money was allotted for service.

It was pointed out that this might be the case with most machines. John replied that we're hoping that they can find some other funding.

Robert asked if there is a chance of rethinking the process and system.

John answered that we knew that this year was an experimental year.

Jim Siedow added that we anticipated the drop-off in blade purchases, but not the Cloud.

Julian Lombardi added that the grant system is evolving with hardware and support financing. This is a $2 million operation. Half is supported by the provost. We are seeing if we can transition to a better distributed computing model.

Another question was whether this started about a year ago, and whether grant requests anticipated this fee. There was also a question about providing boilerplate so that future requests will.

Molly Tamarkin added that Angela's office will be in the library and that we should have a welcome for her. As we start to work with faculty on data management there could be an opportunity to pass along information.

There was discussion about looking at other options like Amazon and Penguin.

John explained that we don't have a good idea how much is being done with outside cloud computing. Accounting doesn't have a code for cloud computing.

Robert asked how the committee could help.

John replied that it would help to get the word out that the services exist and are cheaper than what users might be paying in the market.

A question was raised about flexibility of the system. John said that we probably won't be able to offer much more flexibility in the next 6 months.

Julian asked if it is accurate to say we offer software as a service, whereas Amazon offers infrastructure as a service.

John confirmed.

There was discussion about getting the word out to users of popular software such as Mathematica.

 

Robert adjourned.
