Duke ITAC - October 10, 2013 Minutes

ITAC Meeting Minutes

October 10, 2013, 4:00-5:30

Allen Board Room

I.    Announcements

·        Reminder about the 7th Annual Tech Expo[3], which will be held on Thursday, January 9, 2014, at the Washington Duke Inn.  Further details will be sent to the ITAC email list.  Session proposals are requested as soon as possible.
What is Tech Expo?  It is a collaborative IT event between Duke University and Duke Medicine for IT staff across the organization.  It includes some presentations by on-site vendors and runs about 6 simultaneous sessions over 5 hours, with lunch included.  If you have questions that are not answered on the website, please contact the ITAC committee at techexpo@duke.edu.

II.    Agenda Items

4:05 - 4:25 – Innovation Co-Lab: Update on summer projects and the Duke Mobile Challenge, Michael Faber, Evan Levine (10-minute presentation, 10-minute discussion)

What it is: The Innovation Co-Lab is a creativity incubator connecting student pioneers with the human and technological resources necessary to jumpstart their success.

Why it’s relevant: We will provide an update on the early progress of the Co-Lab for the current semester, share plans for upcoming events and talk about our new Idea Submission and Feedback platform called Blue Sky. We will also discuss the recent DukeMobile Challenge, in which students were asked to participate in shaping the new DukeMobile app.

  • The Innovation Co-Lab is a creativity incubator that gives Duke students more ownership of the technology environment through a series of events, building a community where technology originates not only from OIT but from other places, including the student body.
  • Events held this semester:
    • The Big Byte Challenge: a 4-day hack-a-thon where students were challenged to use Apple or AT&T APIs in their projects.  The hack-a-thon model is a learning opportunity: some participants had no prior experience developing in Objective-C or working with APIs, but gained hands-on experience during the event.
    • Duke Iconathon: The event’s goal was to take a list of topics related to innovation in education (MOOCs, asynchronous learning, problem-based learning, flipped classrooms, etc.) and distill each into a black-and-white icon or visualization, creating a visual language around each topic.
    • The Noun Project will collaboratively take the work from the Iconathon, along with the group’s critiques, and convert it into a usable set of icons for the public.
    • Duke Mobile Challenge: Students were asked to imagine what apps would belong in the DukeMobile App.  Eight apps were submitted, with winners determined this morning.  Having students participate in shaping the DukeMobile App has been a positive experience.
    • Innovation Stories Challenge – Students are asked to create multimedia storytelling projects around innovations at Duke.  The goal is to shine a light on the innovations already happening at Duke (faculty research, student entrepreneurship, etc.).  A list of topics to report on is needed; interest forms to nominate subjects/topics for the challenge are available and can be submitted through Qualtrics.
    • Git Repository: A place where code can be stored and shared; it now holds nearly 100 projects.  The Co-Lab, while originally intended for students, is also being used to advance institutional technical resources.
    • VM-Manage: Provisions virtual machines for long-term use.  There are approximately 50 active virtual machines for extracurricular, class, and Co-Lab projects.
    • Streamer – The data behind DukeMobile (an API of public data), Streamer is a growing collection of public data streams and internal collaboration at OIT.  A student could develop his/her own interface to the Duke Directory, the Duke Map System, DukeCard data, or course listings (for example, to build a schedule on a mobile device).
    • OAuth + Streamer will permit an app to fetch data on behalf of an authorized user and display it in a mobile interface.
    • Blue Sky (A Place for Ideas at Duke) – A community for idea collection and feedback.  Topics will be posed/sponsored by a familiar face on campus.  One of the first topics is the Duke Experience: “How can students best design their Duke experience based on the array of substantial opportunities they have?” (Larry Moneta & Steve Nowicki)  Students and other members of the Duke community can submit ideas and post comments and feedback, creating a repository of ideas that may be brought into the Co-Lab for execution.  A regular flow of questions and topics will contribute most to the success of this project.
    • What’s next for the Co-Lab – Continued Discovery (working to connect with groups to get a pulse of the landscape of innovation on campus), Continued Partnership (getting involved with projects in Durham outside of Duke), and Continued Experimentation (new technology and new frameworks for Co-Lab projects).
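The "OAuth + Streamer" idea above can be sketched as an app attaching a user's access token to each Streamer request.  This is a minimal sketch, not Streamer's documented interface: it assumes OAuth 2.0 bearer tokens (RFC 6750), and the resource path and query parameters are hypothetical placeholders.  The request is built but not sent.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_streamer_request(base_url, resource, access_token, params=None):
    """Build (but do not send) a GET request for a Streamer-style resource.

    Assumptions: the resource path and query parameters are hypothetical;
    the Authorization header follows the generic OAuth 2.0 bearer-token
    convention rather than any confirmed Streamer requirement.
    """
    # Join base URL and resource path without doubling or dropping slashes.
    url = urljoin(base_url.rstrip("/") + "/", resource.lstrip("/"))
    if params:
        url += "?" + urlencode(params)
    return Request(
        url,
        headers={
            # Token obtained earlier through an OAuth flow in which the
            # user authorized the app to read their data.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Usage (hypothetical resource and token); urllib.request.urlopen(req)
# would actually send it.
req = build_streamer_request(
    "http://streamer.oit.duke.edu", "/example/schedule", "TOKEN", {"term": "fall"}
)
print(req.full_url)
```

Keeping the token in a header rather than the query string avoids it appearing in server logs and browser history, which matters when an app acts on a user's behalf.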

Question: When will the final Duke Mobile Challenge app winners be announced? Tomorrow at 10AM.

Question: Where and when will the Streamer API be published?  The public API is currently available at http://streamer.oit.duke.edu

Question: Is there also access through Streamer to non-public data for campus use? Not currently.

Question: Will the library, working with something like OAuth, enable people to access data and share licenses for accessing it?  The IT Security Office has been building a protected network for sharing digital information across the country and around the world.

4:25 - 4:45 – Research Computing, John Pormann (10-minute presentation, 10-minute discussion)

What it is:  The Scalable Computing Support Center (SCSC) helps manage the Duke Shared Cluster Resource (DSCR), a large Linux cluster environment that is available to all researchers. We are continuing to review the business model with the aim of improving the services offered and better aligning with community needs and expectations.

Why it’s relevant: We will go over proposed changes to the Duke Shared Cluster Resource service as well as the proposed budget for FY14.  Proposed changes include potential decreases in fees as well as reduced functionality that would be required to offset those lost fees.

  • Condor Grid Computing is a free service for anyone with an active Duke NetID.  We partnered with the University of Chicago to create an Open Science Grid (OSG) connection that leverages the Condor grid, allowing Duke users to access the OSG.
  • Duke Shared Cluster Resource – A Linux cluster driven by a batch queuing system.  Hardware is purchased by researchers with grant funds and contributed to the cluster.
  • Condor Grid Environment: A wide-area cluster computing service based out of the University of Wisconsin and now supported by Red Hat.  Thousands of machines contribute to an Open Science Grid pool; Duke’s goal is to gain access to that pool of computers and begin contributing to it in the future.  Approximately 700,000 jobs per day are regularly run through the grid, with 28 petabytes in the last 30 days.
    • Duke jobs are scheduled to run on the OSG (Open Science Grid) by the end of the year, with a parallel effort to enable jobs to flow back into Duke to make use of our resources.
    • There has been some interest in performing automated data transfers.  Globus Online allows a user to run a software client close to their own storage and register it with Globus Online, making it a controllable endpoint.  Department-owned storage could potentially be controlled this way, enabling high-bandwidth, automated data transfers between local storage and the OSG or between departments.
    • Intel Xeon Phi and GPUs (Graphics Processing Units): Graphics processors are normally used to display pixels on a screen, but a GPU is also a highly capable computational engine, easily delivering a teraflop of computational power.  While pricing for the Xeon Phi isn’t readily available, Intel has agreed to let us test the devices, each of which contains 60 small CPU cores.  GPU devices are also available in the DSCR (Duke Shared Cluster Resource).
    • Short-term GPU reservations can be made through the VCL (Virtual Computing Lab).
    • Research Computing at Peers:  The majority of our peers run a Condor pool and do not charge for its use.

Question: What’s the usage of Condor now? Usage is increasing as awareness of its availability grows.  However, the DSCR meets the needs of large, tightly coupled computations, and people are directed to it when that need arises.

Question: How do you help people figure out which resource will meet their computational needs (does the website indicate which resource might best suit them)? We accomplish this through consulting.  The primary determining factor is whether one machine or multiple machines running simultaneously are needed.  Work that fits on a single machine is better suited to the Condor grid, while work that needs 100 machines at once makes better use of the DSCR.

4:45 - 5:00 – Office365 Update, Charley Kneifel (5-minute presentation, 10-minute discussion)

What it is: Office365 is a cloud-based email service that will better meet the needs of the students, faculty and staff across the university and health system.

Why it’s relevant: We will provide an update on the pilot for OIT and the proposed schedule for completion of this implementation across campus and the health system.

  • Where are we today? Nearly all of OIT has been converted, and there are approximately 66,000 more users across the University, School of Medicine, School of Nursing, and the Health System.  We are working through our documentation for the various clients to ensure it is up to date.  Once the cloud migration is complete, Duke will have a unified Global Address List (GAL), but a unified directory will not be available immediately.

Question: Will Alumni be added to the scope? Yes, but only after the University and Medical Center users have been migrated.

Question: Office 365 is supposed to provide a lot more than just email and calendaring, so is there a plan for the other features? Some of the other features that could come (such as SkyDrive Pro, storage in Microsoft’s cloud) would be free.  There is some ability to perform sharing, but Box seems to be a better fit.  The Office Apps are very browser-specific, and other features are add-on costs.  Our focus has been on ensuring the basic functionality is progressing prior to

Question: Where are we with communicating the timeline to the schools? The communication plan was discussed at the last IT Council meeting.

Question: Does the move impact privacy policies? Duke’s privacy policies will all extend to the cloud.  The arrangement is analogous to what we currently have, and our agreement with Microsoft ensures that the data centers storing our data will be located only in the U.S.

Question: Is this a back-end and front-end change for students? The desktop clients (e.g., Outlook, Mac Mail, Thunderbird) will not change, but the webmail client will.

Question: Will students still be able to forward email to Google? If their accounts are already setup to forward, we are honoring that configuration, and the option to establish forwarding remains after migration.

5:00 - 5:25 – Data Management at Duke, Molly Tamarkin, Joel Herndon (15-minute presentation, 10-minute discussion)

What it is: The Duke Libraries’ Data & GIS Center has several tools to help manage and archive research data. The presentation will discuss these new tools and our services in data management and access.

Why it’s relevant: Many faculty have grant agency mandates to preserve their data. This presentation shows ways in which the Libraries can assist researchers in meeting their obligations.

  • Duke Libraries Data & GIS Center has some new service offerings:
    • In addition to sharing a visualization consultant position with OIT, we have a data analytics staff member on hand, and John Pormann has joined us to help with scientific computing consultation and IT architectural issues at the Libraries.  We are challenged with finding ways to store things like Kinemage as executable objects that can be cataloged and kept available in perpetuity.  John is working to create virtualized images of tools and preserve them.
    • We receive a lot of questions about managing research data in the Data and GIS area of the Library.  Faculty members have posed questions about getting credit for the data they’re producing.  There have also been a lot of questions about data management plans, and questions about long term storage of research data.
    • Data Management Planning: 
      • Since 2011, many U.S. granting agencies, starting with the NSF, have added data management planning requirements to their grants.  Most require a plan of no more than two pages describing how the researchers will share the data from their research project, at no more than incremental cost and within a reasonable time.  Faculty have many questions about how to meet those requirements while still meeting the other goals of writing the grant.  To support them, we’ve created a Data Management Guide with information about what the different agencies require, how to comply with those guidelines, things to consider when writing the plans, and long-term storage issues.
      • In response to the needs of our researchers, particularly for targeted support when grants are due, the Library joined the California Digital Library Data Management Tool Project.  The initiative links research libraries across the U.S. by providing an online tool that guides researchers step by step through data management planning for different granting agencies.  It enables Duke to provide customized information for the data management plan.
      • The number one question related to data management centers on data sharing/archiving: Will the library archive my data?  Depending on the research discipline, and on how and by whom the data will need to be accessed in the future, different storage resources are recommended and used.  Databib[1] and re3data.org[2] are examples of resources we direct researchers to.  We have an institutional repository at Duke, DukeSpace.  For the last few years, we’ve been accepting research data attached to scholarly publications, and more recently the repository has been opened to data sets not associated with a research data project.
      • Another major concern is that faculty are reluctant to share data because of the risk of not getting credit for the work.  We spend a lot of time talking with students about traditional citation methods, but increasingly we are talking about Digital Object Identifiers.
        • Digital Object Identifiers (DOIs) provide: 1. A human-readable citation standard for referring to digital objects in publications.  2. An associated location for the digital object that can be resolved to a machine address.  3. The ability to add metadata (who, what, when, where) to enhance discovery and reuse.
        • There are many indexing services available now (Google Scholar, the Thomson Reuters Data Citation Index) that allow scholars to claim indexed datasets as part of their academic output.  We’ve been talking with the group managing Scholars at Duke about possibly adding data indexes to that system.
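Points 1 to 3 above can be illustrated with a short sketch: a DOI has the form `10.<registrant>/<suffix>`, the public resolver at https://doi.org/ turns it into the object's current location, and the same URL can be asked for citation metadata through HTTP content negotiation.  This is a minimal sketch under stated assumptions: the example DOI is hypothetical, and the request is built but not sent; the CSL JSON media type is the one DOI registration agencies such as DataCite document for content negotiation, not something specified in these minutes.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"


def parse_doi(doi):
    """Split a DOI into (prefix, suffix), dropping a resolver URL or 'doi:' label."""
    doi = doi.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    # The prefix ends at the first '/'; the suffix may itself contain '/'.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Point 2: the resolver URL that maps the DOI to the object's location."""
    prefix, suffix = parse_doi(doi)
    # Percent-encode characters like '#' or '?' so they stay part of the DOI.
    return DOI_RESOLVER + prefix + "/" + quote(suffix, safe="/")


def metadata_request(doi):
    """Point 3: build (not send) a request asking for CSL JSON citation metadata."""
    return Request(
        resolver_url(doi),
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


# Usage with a hypothetical dataset DOI.
print(resolver_url("doi:10.5072/example-dataset"))
```

Separating parsing from URL building keeps the human-readable identifier (point 1) distinct from the machine address it resolves to, which is what lets the location change without breaking published citations.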

Question:  One type of data was not mentioned: program code/simulation code.  The concern is that there is currently no reliable, permanent institutional method for storing it.  DukeSpace can be used to store this type of data.  The DOI system we’re using from the California Digital Library can provide DOIs for different types of digital objects, so long-term storage and availability are possible.

Question: How are you planning to store this kind of data?  We have been discussing the option of taking the code and wrapping it, along with the other pieces required to run it, as a virtualized image, allowing the code to be executed at any time.

Question: Is there any sense of what fraction of faculty a 1 TB allocation would cover? The anecdotal evidence we’ve seen thus far is that social science datasets tend to be very small, but the sciences and the humanities (video) can have large datasets.  One of our challenges is that we have unique collections that would require an enormous amount of storage to digitize.  The same storage need is reflected in research datasets.


[1] Databib is a searchable catalog/registry/directory/bibliography of research data repositories.  Originally sponsored by a Sparks! Innovation National Leadership Grant from the Institute of Museum and Library Services, Databib is hosted by the Purdue University Libraries.

[2] The goal of re3data.org is to create a global registry of research data repositories covering different academic disciplines.  re3data.org will present repositories for the permanent storage of and access to data sets to researchers, funding bodies, publishers, and scholarly institutions.