Duke ITAC - August 15, 2013 Minutes

ITAC Meeting Minutes

August 15, 2013, 4:00-5:30

Allen Board Room

I.    Announcements

·     Duke Mobile – Last week an email was sent to ITAC requesting feedback on Duke Mobile (mobile.duke.edu). Feedback will be accepted until the end of next week.

·     Minutes from 4/11/13 & 4/25/13 – Status: Approved

·    The first ITAC meeting of the new semester will occur in two weeks on 8/29. At that time, Ashutosh Kotwal will step down as chair and we will welcome Mark Goodacre as the new chair. There will be a reception during that meeting.

II.   Agenda Items

Network Outage – Bob Johnson

On July 26th, a major network outage had a widespread impact on Duke services. Redundancy is built in both between and within the data centers, including a primary and a secondary router. The outage was triggered when a group connected a 10-gigabit server with multiple links, which created a loop. The loop overloaded the two core routers that the primary router connects to, so no traffic could reach or leave the primary router. All the services within the data center and that router were healthy, so none of the failover measures kicked in automatically. After about 29 minutes, it was determined that the primary router was not moving traffic, so the secondary router was brought up and most services resumed normally. The most notable service that did not come back was paging, which caused a Code Black in the hospital; this can have a serious impact on patient care. The root of the paging problem was that the secondary failover application was connected to the router in the primary data center. Because the service was down for an extended period, a large backlog of pages (over 5,000) had to clear the system once the service was restored. Those backlogged pages cleared at around 5:15pm.

Lessons learned: 

There will be a continuing need for high-speed connections, particularly in the research areas. The new network design is more intelligent than in the past, and it is being built so that the intelligence moves out to the building edges. If problems like this arise in the future, they should be contained within the building and should not make their way up to the core network. However, there is still some old technology on the network, so some caution must be taken in the meantime.

Questions and Comments:

·       When was the last time something of this scale happened? To the best of our knowledge, twice in the past two years. More frequently, issues occur that have more impact locally and aren’t as widespread throughout the university. The new network equipment is more capable of handling the kind of load that we saw in this situation.

·       Impact to the hospital was serious since the primary mechanism for communicating emergency response codes is paging. Fortunately, there were no serious adverse consequences during this outage because the problem was identified quickly within the hospital and alternative methods of communication were set up. This highlights the need for more change management and communication among groups.  If the proper group had been notified before and/or during the change, its impact on the network could have been monitored in real time. While that may not have prevented the outage entirely, it might have significantly reduced the time it took to resolve the issue. IT personnel in both organizations are well aware of this, but with an open network, there are a lot of groups in the community that may not be aware of their potential impact on the organization as a whole.

·       This particular issue took 29 minutes for individuals to diagnose and intervene. Could it have taken longer? When the core router's CPU hits 99.9%, visibility into the router is lost. The root cause could be just about anything (e.g. a fiber cut), but because that visibility is lost it takes a little extra time to diagnose. The current monitoring kicks in when the CPU reaches 50%, but because this issue scaled up so quickly, the monitoring couldn’t catch it in time. Very little could have been done to prevent this issue aside from knowing ahead of time what was being changed.

·       How long did it take to clear the 29-minute backlog of pages? The backlog of over 5,000 pages took 1 hour and 45 minutes to clear.

·       Is paging technology outdated? Do we need to stay with this technology or are there better options? There are two types of paging: wide area and in-house. In-house paging has been needed up until this point because cell coverage has not been reliable enough, whereas the frequency at which paging operates could reach just about anywhere. Once the DAS (distributed antenna system) project is completed in about 18 months, there will no longer be a need for in-house paging. The hospital will be moving to a more generic notification gateway system that will allow users to choose their preferred method of notification.
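As a rough check on the backlog figures reported above, the average drain rate can be worked out from the two numbers given in the minutes (just over 5,000 pages cleared in 1 hour and 45 minutes). The function name below is hypothetical and the result is only an average; the actual rate over time was not reported.

```python
def drain_rate(pages: int, hours: int, minutes: int) -> float:
    """Average number of pages cleared per minute over the drain window."""
    total_minutes = hours * 60 + minutes
    return pages / total_minutes


# Figures from the minutes: a little over 5,000 pages in 1 hour 45 minutes.
rate = drain_rate(5000, 1, 45)
print(f"{rate:.1f} pages per minute")  # prints "47.6 pages per minute"
```

Because the backlog was "over 5,000", this is a lower bound of roughly 48 pages per minute.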

Tableau Reporting, Update and Future Direction – David Jamison-Drake

What is Tableau? It is data visualization software that creates pictures and charts from the data supplied to it. The visualizations are interactive and can be used to generate reports.

We have a licensed server in-house so that institutional data can be secured. One example of securing a report would be to link a BFR (Budget and Financial Reporting) Code with it that would allow only users associated with that BFR Code to access the data. 

Tableau also has a free service that can be used in situations where data security is not an issue. The Duke Library is equipped to assist users with this free service[7].

Demos: 

·       Hans Rosling’s data visualization of the developing world over 60 years[8]

·       U.S. News Rankings of school endowments

Google has a similar product for free, but it is not supported by Duke, and things can break when changes are made. Tableau has been integrated with Duke's single sign-on technology.

Current statistics:

·       1814 views

·       260 workbooks

·       37 report publishers

·       664 viewers with secure access

Goals for this year:

·       Security for reports is currently done on an individual basis.  The preferred method going forward would be to set up security by role via IDMS. 

·       Summer prototype: export a database from the enterprise server in a standard, clean format with standard, fully documented data definitions.  Doing this would allow groups to write reports against each other’s data sets.

Questions and Comments:

·       How much time does it take to do the programming behind these reports?  It is a very user-friendly platform which allows you to do a lot of nice visualizations in a short period of time with little effort.  The Hans Rosling report was generated in about 30 minutes and most of that time was in downloading the data off the website.  With the data loaded into Tableau, the end users have the capability of modifying the view of the data as they like – it is not simply a static reporting tool.

·       How is the data moved to the Tableau server?  There are two methods:  push and pull.  The pull method allows near real-time refreshing of the data so that departments such as the Admissions Office can enter data and check their reports against the updated data as they go.  Every time the report is opened, it will display the most recent data.  The pull method is accomplished by hard-coding credentials in the report and configuring it to connect to the source server and refresh the report at a defined interval.

·       Is there a data warehouse behind Tableau or how is the data stored?  Behind the Tableau server is a SQL Server engine.  The data can be a single flat file or a combination of files.

·       Are the research units beginning to use this tool?  Faculty are encouraged to use this tool for institutional, private data.  Graduate students using data available from public sources are still encouraged to work with Perkins Library on the free Tableau service.

·       Do we have an Enterprise license for Tableau?  Yes, for unlimited seats on a single 4-core server.  It should go up to 12 cores, at which point we would need to buy another license.  The licensing allows for running production, development and test environments.

Service Desk Update and Service Metrics – Debbie DeYulia, Paula Batton, Susan Lynge

Who is contacting the service desk?  Requests are mostly from students and staff.  The “Other” category is much smaller but includes parents during registration, move-in, etc.

What do they contact us about?  NetID issues make up the largest category (including password issues, not being able to log in, etc.)

How are they contacting us?  Chat looks small on the chart because that capability was lost during the move to ServiceNow, so it only includes about 3 months of usage.  Because it is one of the lowest cost options, the Service Desk is going to promote this method.  The web-submit category covers users entering a ticket directly into ServiceNow or filling out the web form available on oit.duke.edu/help.

Resolution times are measured in business hours, from the time the ticket is opened until there is mutual agreement that the problem is resolved.  Email typically has a high resolution time because there is a lot of back and forth, the information provided in the request can be limited (e.g. “I can’t log in” with no further details provided) and a lot of time can elapse between responses.  Walk-in also has a longer resolution time because those are typically people dropping off a computer to be fixed.  If the customer does not respond within 10 days of resolution to confirm that the problem has been addressed, the ticket is automatically closed.  Overall, 67% of tickets are resolved within the first hour.
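To make the business-hours calculation concrete, here is a minimal sketch of how elapsed business time between two timestamps could be counted. It is not the Service Desk's actual implementation: the 8:00-17:00, Monday-Friday window and the function name are assumptions chosen for illustration, and holidays are ignored.

```python
from datetime import datetime, time, timedelta

# Assumed business window; the minutes do not state the actual hours.
OPEN = time(8, 0)
CLOSE = time(17, 0)


def business_hours_between(opened: datetime, resolved: datetime) -> float:
    """Hours between two timestamps, counting only Monday-Friday time
    that falls inside the OPEN-CLOSE window."""
    if resolved <= opened:
        return 0.0
    total = timedelta()
    day = opened.date()
    while day <= resolved.date():
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            start = max(opened, datetime.combine(day, OPEN))
            end = min(resolved, datetime.combine(day, CLOSE))
            if end > start:
                total += end - start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


# Opened Friday 4:00pm, resolved the following Monday 9:30am:
# 1 hour on Friday plus 1.5 hours on Monday.
print(business_hours_between(datetime(2013, 8, 9, 16, 0),
                             datetime(2013, 8, 12, 9, 30)))  # prints 2.5
```

Under these assumptions a ticket that sits over a weekend accrues no resolution time while the desk is closed, which is why business-hour figures can look much shorter than wall-clock time.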

Customer Satisfaction:  Surveys are sent to users after an incident is closed as long as they have not received a survey in the last 30 days.  The areas that are addressed in the survey are:  Courtesy, Timeliness, Knowledge, Quality and Overall each with a 1-5 numerical score.  The goal is for all of these to be at a 4.5 or above.  Last month 513 surveys were sent out and 21% were completed which is a typical response rate.  The majority of feedback is positive.  Timeliness suffers the most during move-in week as well as the first two weeks of the Fall and Spring semesters.  This semester they will be trying to address this by pulling in Collaborative Services (4 team members) as well as management to assist with the web and email submissions.
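The survey rule and response figures above can be sketched as follows. The function and constant names are hypothetical, and treating a survey sent exactly 30 days earlier as still "within the last 30 days" is an assumption; the minutes do not specify the boundary.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed cooldown, taken from the "last 30 days" rule in the minutes.
SURVEY_COOLDOWN = timedelta(days=30)


def should_send_survey(closed_on: date, last_surveyed: Optional[date]) -> bool:
    """Send a survey when an incident closes unless the user already
    received one within the cooldown window."""
    if last_surveyed is None:
        return True
    # Assumption: exactly 30 days ago still counts as "in the last 30 days".
    return closed_on - last_surveyed > SURVEY_COOLDOWN


# Approximate number of completed surveys last month: 21% of 513 sent.
completed_estimate = round(513 * 0.21)
print(completed_estimate)  # prints 108
```

The completed-survey count is an estimate derived from the rounded 21% figure, so the true number may differ by a few responses.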

Tableau was used to further investigate some of the negative feedback received over the past year.  It is easy to connect back to the underlying data (in this case the tickets), so the problems can be investigated more efficiently than in the past. Some of the problems identified were related to the handling of the problem by the individual at the Service Desk, while others were related to specific services and capabilities that were not available at that time.  With this information, upper management could be notified of these gaps so that they could be addressed accordingly.  The Tableau reports can also serve to identify gaps in knowledge at the Service Desk so that management can tailor education and training to address them.

Changes made over the past year as a result of the metrics:

·       Quarterly meetings have been moved up to bi-weekly.  The Link covers anything coming in during that hour.  Other groups are brought in for education refreshers.  Group conversations during these meetings allowed them to better catch trending issues.

·       A/V and web conferencing was one of the lowest scoring feedback areas last year so the Service Desk was included in the planning, preparation and documentation of WebEx.

·       Keeping the Knowledge Base up to date has been made part of the review process so that the Analysts know it is part of their job, which should help with consistency.

Questions and Comments:

·       What is the highest cost method of communication?  Email.  The Service Desk email address is being removed from some of the support webpages to encourage users to use the other methods.  However, this method is still going to be available.

·       Do you know which groups tend to use which method more?   Not at this time, but we are working on gathering these metrics.

·       The quality of service appears to differ depending on the method used.  The web form may help remedy this since it prompts the user for additional information, which can cut down on the back and forth, and in certain cases takes the user directly to the Service Catalog where the problem can be addressed more directly.

·       Are there more questions now about cell phones and tablets and fewer about PCs and laptops?  The Service Desk gets a lot of questions about setting up mobile devices, but fewer about troubleshooting them.

·       Does the Service Desk have any expectations regarding the Office 365 migration?  A couple of people at the Service Desk have been migrated and overall the process went okay.  Team members have been involved in the project so they have been able to update the rest of the group, provide feedback and begin documenting.

·       How many people support the Service Desk?  Is there a lot of turnover?  7 at ATC, 5 at the Link (3 work during the day, 2 work 3pm-midnight).  Retention has improved.  Most of the people who have started at the Service Desk have moved into other positions within Duke.  This is a good hiring tool to be able to say that 80% stay at Duke.

Adobe Enterprise Agreement – Evan Levine & Glenn Setliff Jr.

Duke is still operating without an Enterprise agreement with Adobe after almost 1 year.  Any products purchased now are bought through consumer channels on an individual basis. We are at an impasse despite regular conversations with Adobe regarding a potential agreement. This problem appears to exist at some other institutions as well. For now and through the Fall semester, it is best to approach the situation as though we will not have an agreement in the near future. This will probably affect the labs most, where purchasing individual licenses is not a feasible option. Adobe is moving toward the Creative Cloud model, in which software cannot be bought outright and is instead available on a subscription basis.

Some individuals and groups at Duke will need to use Adobe products, and they are encouraged to continue purchasing them in those cases.  However, there are also alternatives that may work for the majority of people.  Therefore, the current emphasis will be on identifying the best alternatives and spreading that information across the Duke community.  The current set of alternatives is listed on the Wiki[9].  On the OIT software download page there is also a link to the Adobe Alternatives document.  As feedback is received about them, some may end up with their own enterprise agreements while others may move down the list.  The end goal is to identify Duke recommended alternatives and possibly Duke supported alternatives.  People are urged to provide feedback about the products as well as the features that are important for software to include.

For example, some alternatives to Acrobat are as follows:  PDF Studio Pro ($15 per year per license), Microsoft Word 2013 (Free), Preview (Free).  In three weeks 176 licenses have already been sold for PDF Studio Pro and this could grow to an expected 700 per year.  These alternatives represent substantial savings to Duke and often equivalent productivity.

Questions and Comments:

·       Can we put signage up at the Computer Store to inform people about cheaper options?  Yes, that would be an option, but the Computer Store isn’t currently selling Adobe products because of the change in licensing model (subscription vs. retail packaging).

·       Is there any danger of old licenses being deactivated or invalidated by Adobe?  Standalone and Concurrent licenses that we currently have will continue to be valid in perpetuity.  More than likely hardware and OS versions will be the driving force behind needing to upgrade perpetual software, possibly before the user requires any features provided by the newer software. 

·       If we continue to be unable to sell Adobe products and people select the alternatives, students who begin using those alternatives are likely to continue using them throughout their career.  This could be undesirable to Adobe in the long run.

·       Has this been discussed with other universities?  Yes.  There doesn’t appear to be a consensus among universities on the best approach.  There is a mix of universities that have adopted the new Adobe licensing model and those that have chosen not to.

Blackberry Use, Update and Direction – Billy Willis

About 625-650 people in the hospital still use Blackberries.  Every week about 10-12 of them are deactivated.  If a device hasn’t been used in 30 days or more, the user is contacted, and if they don’t respond the device is deactivated.  The university shut down its Blackberry server about 6 months ago.  Once the Q10 (a newer model with the keyboard and ActiveSync capabilities) was announced in the U.S., the decision was made to officially announce that the DHTS BES server will be shut down at the end of December 2013.  In another couple of months, the users will be reminded.  Things look good to have that server shut down on schedule.

Questions and Comments:

·       The newer Blackberries do not require the server, correct?  Yes.  The Q10 and Z10 use Microsoft’s ActiveSync and do not need the BES server.

·       How expensive was the BES server to the university?  Approximately $30,000 a year, based on the number of users.  This was not the main reason for decommissioning BES.  In moving to Office 365, the support mechanism changed from using a BES server to a BIZ server.  The BIZ server is supported by Microsoft through an arrangement directly with Blackberry.  Since we were supporting the BES server locally, there was no need for a business associate agreement (BAA), but with the BIZ server Microsoft would not sign a BAA, and that presented a huge security concern.