Duke ITAC - January 17, 2019 Minutes
ITAC Meeting Minutes - Thursday, January 17, 2019
4:00 – 4:05 – Announcements
Ken Rogerson welcomed everyone as the meeting convened. The minutes from August 9th, 2018 and August 30th, 2018, attached to the agenda, were moved, seconded, and approved.
4:05 – 4:20 – Connectivity Update, Bob Johnson
What it is: The presentation will review recent updates to local- and wide-area connectivity at Duke, including the introduction of a regional fiber ring and improvements to wireless provider coverage.
Why it’s relevant: As access to top-of-the-line technology becomes increasingly critical to Duke’s academic mission, getting and staying connected is of the utmost importance, and Duke continues to evolve its services to meet this need.
The Regional Fiber Project and Triangle Fiber Group (TFG) project was initiated to allow Duke to cost-effectively control connectivity to critical assets in the region. The first major milestone was accomplished on December 19th, 2018 with the completion of the Core Ring. The additional fiber needed to connect the Ring to individual sites (~80) is scheduled to be deployed incrementally throughout 2019. There are 7 sites currently utilizing fiber and higher bandwidth (DUML and Erwin Mill, for example).
Duke’s Triangle Fiber Project consists of:
- 80+ sites to be migrated to new Fiber Ring
- ~60-mile radius conduit for 120 miles and 432-strand fiber installed
- 15+ 10G sites
- 10Gx10G and dark fiber connections between the Health System and Campus datacenters
The Old & New Ring Layouts are as follows:
- Inner ring consists of ATC, Telcom, Erwin, Friedl Building, Durham Ctr and the Old Coke building, originally constructed more than a decade ago on a very limited downtown Durham footprint.
- New Outer ring consists of University Tower, Watts School of Nursing, DRaH, NC-1 and Durham Regional.
Possible Future Triangle Fiber Circuits include:
- Leveraging Triangle Fiber Project Circuits helps to reduce OPEX by consolidating multiple service provider DWDM transport fees
- Installing a 100 Gb/s capable edge router at NC-1 will provide enhanced geographic resiliency for both Duke OIT and DHTS
- While further analysis is conducted, an interim solution utilizing three-edge routers will be deployed.
In conclusion, all 80+ sites, including the Health System and Campus, will be able to connect to the fiber network at a cost equivalent to their existing carrier expense but with an order of magnitude more benefit: sites that could previously afford only 1 Gb/s would now have 10 Gb/s, along with redundancy to ensure that no site is disadvantaged in the event of a fiber cut. Although logistically challenging, this effort has been very well received and delivers a win-win outcome: we expect a 6-7-year return on investment while meeting the ever-growing demand for speed, bandwidth, and redundancy with an affordable and reliable solution.
Q1: Visualizing the ring around the triangle, are there similar rings and overlaps in effort?
A: The ring intentionally runs towards UNC-CH and also past NC State as well as Durham city. In addition to our 80 sites at Duke, there are 80 more sites from Durham City and County on the same fiber footprint that will connect and run their own networks.
This project can be viewed as a replacement and renewal of the 25-year collaborative ownership and partnership between major area universities and MCNC which is now in its 22nd year.
4:20 – 4:40 – Wireless Test as a Service, Richard Biever, James Nesbitt, John Robinson
What it is: Though our “commercial strength” WiFi environment serves us well for many purposes, students and faculty alike have made us very much aware that it is often incompatible with consumer-grade wireless devices that are proliferating in the IoT era. OIT has partnered with Cisco Systems for testing consumer grade wireless devices to determine how to best configure Duke’s wireless network infrastructure. The purpose is to prove out any architectural changes and provide test data to demonstrate proposed wireless network improvements. The project will take place in a phased approach, building upon a baseline, with each phase introducing agreed upon devices for testing.
Why it’s relevant: Wireless connectivity issues continue to challenge institutions throughout higher education. With this leading-edge collaboration, Duke is leading the charge to tackle this pervasive issue. ITAC members are invited to share feedback as the project prepares for future phases.
The goal of this effort is to fine-tune our wireless environment and to partner with Cisco Systems in developing a pilot project that packages our residential network layout and configuration as a service that may benefit other higher education institutions as well. Our wireless stability has been steadily improving through ongoing efforts, but some challenges remain: interruptions and disconnects caused by code, the hardware/software configurations of various device types, and an overall proliferation of IoT devices.
The development will proceed in multiple phases:
- Firstly, the lab at Cisco will replicate the layout of one of our residence halls as a prototype in a controlled environment so as to capture the number of devices and device types with connection settings and building construction configurations (walls, ceilings, metal framing etc.) around wireless routers and hubs to gather connectivity data and establish a baseline.
- Secondly, after the lab at Cisco is operational, we plan to take Cisco’s prototype and use Trinity Hall to implement configuration changes and re-define, document and re-establish our baseline to validate findings at Cisco’s lab. This allows us to have a test plan to prioritize and analyze student feedback instantly for code validation without distorting the original baseline.
In conclusion, this effort will be a worthwhile exercise in defining and documenting our baselines and validating our configurations, and Cisco will be able to offer a proof of concept to other research universities that are also interested in this service.
Q1: Is Cisco replicating our device registration protocols?
A: No, they will replicate our router configuration paths from clients to the internet, along with usage patterns and scenarios in which various devices run simultaneously.
Q2: What is the level of optimism for different types of consumer devices that may only be found on a home network?
A: At the macro level, our goal is to have as many devices as possible work, but we recognize the challenges of working with an enterprise network vendor and will need to approach any modification in phases.
Q3: What is the timeline for this rollout?
A: There’s no defined end date as it is a rolling project. We’re in the first phase of testing at the Cisco lab and then we will thoroughly test at OIT, and finally complete our testing in the campus dorms.
Q4: Do we anticipate any wireless connectivity issues when we roll this out?
A: At rollout, the network will use the configuration that exists today, which will then be modified in phases so that we can obtain feedback and troubleshoot most issues before the final phase.
4:40 – 5:00 – Increasing Virtual Computing Manager Capacity, Mark McCahill, Michael Faber
What it is: Virtual Computing Manager (VCM) is a Duke platform providing access to a variety of computing resources, including both Linux and Windows-based virtual machines. During the fall semester, VCM saw significant and unexpected growth in usage. After gathering ITAC feedback, we implemented changes to extend VCM's capacity: a new policy to automatically power down VCM virtual machines each morning, the option to opt-out of automatic power downs of machines, and additional hardware added to the VCM cluster to meet demand for the service.
Why it’s relevant: VCM allows Duke users access to a variety of computing resources needed for coursework, hosting short-term technology exploration, class projects, or accessing specialized software without installing or licensing on your own computer. This new approach has been successful in maximizing the capacity of the VCM physical hardware with limited impact on users.
Duke’s Virtual Computing Manager augmented students’ computers with remote access to Virtual Machines (VMs) running pre-installed coursework-oriented application suites as follows:
- 2008: VCL (Virtual Computer Lab)
- 2013: VM-Manage - semester-long VMs & containerized applications (RStudio, Jupyter, etc.)
- 2017: VCM (Virtual Computing Manager) - more VM images, semester-long Windows VMs
Virtual Computing Manager allows:
- access to virtual machines and software for academic, student, and coursework purposes
- semester-long Linux & Windows 10 reservations
- automated application suite builds via Ansible and Chocolatey scripts simplify staying current on security updates
- faculty and staff outside OIT to update their VM templates via Gitlab’s continuous integration
In late October we implemented a policy change as follows:
- Default VMs to power down each morning at 6:00 AM instead of staying on indefinitely
- Users can opt-out of the power down policy
- The new policy will be implemented in phases so we can judge the impact and adjust our approach if needed
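As a rough illustration of the policy above, the selection logic for the morning job might look like the sketch below. This is not the VCM implementation: the `VM` record, the `vms_to_power_down` function, and the sample machine names are all hypothetical, and the weekly patch override reflects the Q&A answer recorded later in these minutes.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical record for one reservation; not the real VCM data model."""
    name: str
    powered_on: bool
    opted_out: bool = False


def vms_to_power_down(vms, weekly_patch_day=False):
    """Return the names of VMs a 6:00 AM job would power down.

    Running VMs are powered down by default. Opted-out VMs are skipped,
    except on the assumed weekly patch day, when every running VM is included.
    """
    selected = []
    for vm in vms:
        if not vm.powered_on:
            continue  # already off, nothing to do
        if weekly_patch_day or not vm.opted_out:
            selected.append(vm.name)
    return selected


# Made-up sample fleet for illustration only.
fleet = [
    VM("stats-101", powered_on=True),
    VM("cs-201", powered_on=True, opted_out=True),
    VM("idle-vm", powered_on=False),
]
```

On an ordinary morning only the running VM that has not opted out is selected; on a patch day the opted-out VM is selected as well.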
Spring 2019 will include the following next steps:
- Track the power up/down and opt-out behavior
- Look into tooling for VCM transient peak demand via cloud bursting (Azure/AWS) using automated power down
- Port the automated power-down code to Research Toolkits RAPID VMs
Reservations for 3 different clusters & audiences include the following eco-systems:
- students/coursework: vcm.duke.edu
- faculty/researchers: rtoolkits.web.duke.edu
- admin/departmental: clockworks.oit.duke.edu
In conclusion, we’re experiencing somewhat of a “Success Disaster” due to the unexpected growth in usage and ease of obtaining computing resources as we continue to monitor and add capacity to meet our on-going demands.
Q1: If I’m actively using my machine at 6:00 am, will it be rebooted?
A: Yes, if you don’t opt out. Also, once a week the machines need to be powered down for security patches even if you opt out.
Q2: What if someone grabs as many spare computing resources (CPU and memory) as possible?
A: That is a perfect example of a scavenger queue, which takes up spare resources while no one else is requesting them; as soon as the resources are needed elsewhere, the oversized VM automatically scales down. It is good practice to submit batch jobs in such scenarios.
Q3: Are the administrative and departmental VMs under the same constraints as far as sizing?
A: There are separate configurations for that group. Our monitoring shows whether a machine is swapping or has other capacity-related issues, and any such machine gets our immediate attention.
Q4: Is there a policy on the default quotas for the number of VMs, for example if a student is enrolled in two classes and wants two VM images etc.?
A: We can look into increasing the default quotas for such cases.
5:00 – 5:30 – STINGAR, Richard Biever, Jesse Bowling
What it is: STINGAR, or Shared Threat Intelligence for Network Gatekeeping and Automated Response, started as an initiative led by Duke to generate threat intelligence about attacks against Duke, dynamically and quickly block those attacks, and share the information with other schools. Duke is now the recipient of two NSF grants to further develop and deploy STINGAR with other higher education partners. Richard and Jesse will be discussing the STINGAR architecture, progress around the two grant efforts, and what we have learned from the data generated thus far.
Why it’s relevant: Higher education continues to be a prime target for cyber attacks. STINGAR not only benefits the Duke community, but, by pioneering this initiative and extending it to other universities, Duke has introduced an innovative threat intelligence tool with a much broader reach.
An update on Duke’s approach to shared threat intelligence was discussed.