Duke ITAC - November 2, 2017 Minutes
4:00 - 4:30 – Special Guest: Provost Sally Kornbluth (30 minutes)
What it is: Each year the Provost attends an ITAC meeting to answer questions and share her perspective as it relates to technology and other topics; ITAC members are invited to pose questions of the Provost.
Why it’s relevant: As the university's chief academic officer, Sally Kornbluth has broad responsibility for leading Duke's schools and institutes, as well as admissions, financial aid, libraries, and all other facets of the university's academic life.
The provost, Sally Kornbluth, began by saying how pleased she is with the role Duke IT has played, especially in getting all DKU systems up and running. Without Duke IT, the DKU systems could not have been deployed and maintained so rapidly, and freedom of and seamless access to information could not have been guaranteed.
In terms of IT services and systems, the relationship to the strategic plan can be placed in 2 categories:
- Pedagogical Innovation with the use of online/digital educational tools.
- Enabling research, particularly in the sciences and the School of Medicine, and raising funds to hire more faculty on both the Campus and Medical sides, as we see an increasing need for more sophisticated research IT infrastructure.
On the pedagogical side, the notion of an “Arc of Learning” is very attractive because the process can start with students before they come to Duke and continue through their time at Duke and beyond. Digital technologies can enhance the classroom experience, online course modules can help build a level playing field for high school students, and after graduation the same tools shape how we interface with alumni for their continuing education. The use of IT for online learning, advising, career services, visual tools, etc. will likely be our major focus going forward.
As for research, needs come in different flavors: people need more computing cycles and have more complex data needs, and there is concern about how we grapple with large datasets. OIT and DHTS are working together seamlessly, but access to Health System data has been challenging and remains a continued frustration on the campus side. 9 scientists were recently hired to do quantitative data analysis of electronic health records, but researchers need continued access to data and have sometimes had difficulty getting the access they need to perform at the leading edge. Some investigators with very large grants who needed access to data only managed to get it at the 11th hour. We need to find ways to make data access seamless and to be able to anonymize the data.
The other concern is regarding the cost to upgrade IT and Research infrastructure which remains a significant portion of the provost’s budget. We need to find a way to sell the critical pieces as part of a package to elevate particular areas of science. If we raise money in one area, we could triangulate some of the monies to raise funds in another but it will be a challenge.
We do a lot of high tech things, yet tasks that could easily be done digitally, such as advising on curricular requirements not yet met, are an area of the services arena where there ought to be tools to help. In more sophisticated systems, such as the Cambridge and Oxford tutorial models, a staffing problem could be solved with digital content. A complex network of communication between students and individual faculty is a long-term vision, but it will influence how we think about our education in the future.
Funding still remains a major concern as we are trying to maintain a leading edge with a target that is constantly moving forward.
Questions and Comments:
- On the research computing side, one of the reasons we invest more of Duke’s monies is that the Federal funding environment remains unfriendly, and proposed taxes on university endowments will make it difficult to fund scholarships and tuition grants.
- The funding environment is placing more and more expectations on Universities that they will pick up more of the financial burden because of how the funds are funneled to recipients.
- We need to think of ways to intellectually package infrastructure in an attractive manner so the funds could come from several donors.
- How can we make the concept of computing reach every student in the class?
- How can we leverage what we have, such as OIT’s workshops, learning labs, and short courses, and are there ways for different units to team up and collaborate?
- An example may be a boot camp covering a skillset that undergraduate students will need; made into an accessible interdisciplinary computing course, it could serve a much larger population.
- Another example is Statistics, where there are patches of instruction outside of the Statistics Department and the Statistics Department is not able to meet full demand.
- One online content experiment would be an umbrella course that introduces general principles, followed by custom modules for each area.
- If we could pilot and deploy teaching modules successfully at DKU there may be more interest in porting them back to Duke.
- Students are more and more sophisticated and we may want to think of how we pose our practical problems to students and challenge them to come up with better solutions.
- Finding synergies like mixing Hack Duke with the Winter Forum and keeping the alumni connected to Duke.
In conclusion, data access is very important; it should be doable, and technically it is better, but it is still an issue.
4:30 - 4:35 – Announcements (5 minutes)
Faculty Tech Fair – 11/3/17 1pm – 4 pm at Technology Engagement Center
4:35 - 5:05 – Scaling up Research Data Management Services, Jen Darragh, Sophia Lafferty-Hess, Patrick Charbonneau (20 minute presentation, 10 minute discussion)
What it is: Duke Libraries have expanded research data management (RDM) services, based on the recommendations of last year’s Digital Research Data Services Faculty Working Group. This presentation will discuss the RDM education program, the suite of services and tools available across the research data lifecycle – including services within the Duke Digital Repository, and future goals for the program. We will also look at one lab’s experiences with integrating data curation and deposition into its workflow.
Why it’s relevant: The storage of digital research data is a pivotal area of focus to support Duke’s research needs, and there has been much discussion of tools and best practices for data management. After last year’s presentation on the working group’s findings, we will see how the RDM initiative has progressed and discuss where it is headed in the future.
Two of the presenters, Jen and Sophia, started at Duke in January 2017. The Digital Research Data Services Faculty Working Group put out a report recommending that Duke commit significant resources to supporting Research Data Management needs and scale up RDM services. The resulting generous funding enabled 4 new positions within Duke University Library: 2 Research Data Management Consultants (Jen and Sophia) and 2 Content Analysts for Digital Curation & Production Services, who help with ingesting data into the Duke Digital Repository.
To conceptualize, the suite of services can be grouped into the following 3 key areas:
Education Program - this included 6 new workshops and webinars, conversations with graduate students about their experiences, and going into classrooms to help customize their data management efforts.
In Fall of 2017 the RDM Workshops included the following:
- Data Management Fundamentals
- Reproducibility: Data Management, Git, & RStudio
- Writing a Data Management Plan
- Increasing Openness and Reproducibility in Quantitative Research
- Finding a Home for Your Data
- Consent, Data Sharing and Data Reuse
- Research Collaboration Strategies & Tools
For Spring 2018, the RDM Workshops will include the following:
- Data Management Fundamentals
- Data Management Tools: Colectica for Excel
- Data Management & Reproducibility
- Consent, Data Sharing, and Data Reuse
- Data Management Tools: The Dataverse Project
- Data Management & Grants: Complying with Mandates
One aspect of scaling up our RDM services was increasing our web presence, first to advertise our services and second to implement the Education Program.
For Education we created a guide on the Library web site, a LibGuide, which is a self-service tool for learning RDM best practices, how to manage your data, and how to find resources available at Duke and elsewhere.
The other aspect of expanding our web presence was defining on the web what services we provide to researchers, from the writing of grants all the way to depositing the data.
Jen talked about how RDM conceptualizes and communicates these services.
Defining Lifecycle Services –
Data Repository Support
Data Management Planning
- DMP Review of drafts
- DMP Tool Support and Language regarding long term storage and archiving
Data Workflow Design
- Organization – folder structures, storage options, version control, and backup strategies etc.
- Tool Selection
- Open Science Framework – a tool to support open, transparent research
Data & Documentation Review
- Open File Formats
- Metadata schema
- Ethical Sharing
Curation Services –
Below is the workflow of how the Duke Digital Repository Data Curation & Ingestion services work:
- Research Data Depositor
- Prepare data using submission guidelines
- Completed metadata form/deposit agreement
- Reviews files & metadata for completeness
- If all is ok, it proceeds to the content analyst
- If not ok, the depositor receives an email with issues and optimization suggestions
- Content Analysts
- Ingest file, verify metadata, & assign DOI and QA
- Publish data & notify depositor data are published
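The review step in the workflow above can be sketched as a small routing function. This is only an illustration under stated assumptions: the `Deposit` class, the `REQUIRED_METADATA` fields, and the step names are hypothetical, since the minutes do not describe the actual DDR metadata form or tooling.

```python
from dataclasses import dataclass

# Hypothetical required fields; the real DDR metadata form is not
# described in the minutes.
REQUIRED_METADATA = ("title", "creator", "description")


@dataclass
class Deposit:
    files: list
    metadata: dict
    agreement_signed: bool = False


def review(deposit):
    """Return a list of issues; an empty list means the deposit is complete."""
    issues = []
    if not deposit.files:
        issues.append("no files submitted")
    for key in REQUIRED_METADATA:
        if not deposit.metadata.get(key):
            issues.append(f"missing metadata field: {key}")
    if not deposit.agreement_signed:
        issues.append("deposit agreement not completed")
    return issues


def route(deposit):
    """Decide the next step after the completeness review."""
    issues = review(deposit)
    if issues:
        # Not ok: the depositor gets the issues back (by email in the minutes).
        return ("return_to_depositor", issues)
    # All ok: the deposit proceeds to the content analyst for ingest.
    return ("send_to_content_analyst", [])


complete = Deposit(
    files=["data.csv"],
    metadata={"title": "T", "creator": "C", "description": "D"},
    agreement_signed=True,
)
incomplete = Deposit(files=["data.csv"], metadata={"title": "T"})

print(route(complete))
print(route(incomplete))
```

The point of the sketch is the two-way branch: a complete deposit moves forward, while an incomplete one returns to the depositor with a concrete list of issues.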
To learn more you can email email@example.com
Dr. Charbonneau talked about the various ways his Chemistry department has simulated and utilized the Digital Data Management and Curation services in the past year and a half and illustrated a typical DDR entry.
A Typical Data Stream may have the following properties:
- Computer Code ~ 1MB (in house or commercial)
- Configurations of molecules of up to ~100 GB are generated
- Analysis Scripts ~1MB (in house or commercial)
- Observables (Data) ~1MB
- Plotting Script ~10kB (in house)
- Figure ~1MB
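To put these sizes in proportion, the rough arithmetic below totals the stream with and without the molecular configurations. It is a sketch that assumes decimal units (1 MB = 10^6 bytes) and takes the approximate figures above at face value; the dictionary keys are just labels for this example.

```python
MB = 10**6  # decimal megabyte, an assumption for this estimate

# Approximate sizes from the list above, in bytes.
sizes_bytes = {
    "computer code": 1 * MB,
    "configurations": 100_000 * MB,  # "up to ~100 GB"
    "analysis scripts": 1 * MB,
    "observables": 1 * MB,
    "plotting script": 10_000,  # ~10 kB
    "figure": 1 * MB,
}

total = sum(sizes_bytes.values())
without_configs = total - sizes_bytes["configurations"]

print(f"everything: {total / 10**9:.3f} GB")
print(f"without configurations: {without_configs / MB:.2f} MB")
```

Under these assumptions the configurations account for nearly all of the volume, while the code, scripts, observables, and figure together come to only a few megabytes.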
The DDR for each publication flows as follows:
- Typical Publication Timeline
- Paper submission and review with mention in acknowledgment - a few months
- Paper acceptance - within a week
- Data deposition to DDR and DOI creation - couple of weeks
- Insert DOI in page proofs
- Paper publication
- Update DDR publication info
- Not all data can be archived or needs to be archived
- DOI takes a few months to review and friendly to deposit
- Finally it goes on to publication
- Promote and market our services
- Build collaborative relationships with other data support/research groups at Duke
- Contribute to a network of data support across Duke
In conclusion, the above process is working well.
Questions and Comments:
- What do you do if scripts do not run due to a change?
- Scripts, if documented well, should be able to construct the basic functions.
- Docker containers to support scripts can also be stored in the repository.
- How difficult is it for the students to fall in line with the file structures?
- The file structure is simple. The first ones were hard but someone had to write the scripts. But now the students feel relieved that they are doing it right the first time.
- How does the system support External collaborators?
- Currently we are using Box to share files and that works well.
- We are also joining a national project in the Spring called the Data Curation Network. The University of Minnesota is leading this effort, and we will be one of seven other universities collaborating on data curation.
- How do we promote and market the services to get the word out to researchers?
- The grant heavy departments need to know that DMPs can solve DDR problems
- Search engines like Google can rank and index DMP and direct to appropriate web sites
- Technology workshops as in the Co-Lab and GitHub.
- Building collaborative relationships as with the School of Medicine as mentioned earlier by the Provost.
- The School of Nursing would love to learn more and could benefit from this talk. The IRPs on the Health side can also use these services.
- What is the scale of datasets that we are able to manage and curate?
- The Data Curation Network, in its initial one-year pilot, determined that individual curators can probably handle hundreds per year. Scaling up is therefore a matter of building a program, and we will need to document the need and make a compelling case to get more resources to achieve this.
- In January we will be building a prototype for self-submission and automated curation.
- We would welcome any feedback on our service models to improve services and fill any gaps.
- Is versioning possible on the DOI?
- DOI is the permanent version.
5:05 – 5:30 – Virtual Computing Manager, Evan Levine, Mark McCahill (15 minute presentation, 10 minute discussion)
What it is: Virtual Computing Manager (VCM) provides easy access to virtual software packages and semester-long virtual machine reservations. Duke students and instructors can access specialized software without installing it on their own computers, host their own server for development projects and coursework, or customize their own environment to use for the semester.
Why it’s relevant: As the successor to the legacy VM Manage and VCL services, VCM allows the Duke community to reserve a virtual machine or container for use in coursework, in software development, or as test servers for projects. Evan and Mark will demonstrate how VCM can more easily and effectively bring this technology into the classroom.
History and Background: For about 10 years, we have wanted a way to augment student computers with remote access to Virtual Machines (VMs) running pre-installed, coursework-oriented application suites, so that students don’t have to install them themselves. The goal is to provide a stable computing resource that can be spun up easily on demand.
2008: VCL (Virtual Computer Lab) - short-term VM reservations
2013: VM-manage - semester-long VM & virtualized app reservations project
2017: VCM (Virtual Computing Manager) - combines VCL and VM-manage
Virtual Computer Lab
- Short-term (2 - 4 hour) reservations for coursework
- Originally developed at NC State - Duke OIT added support for Shibboleth and VMware virtual machine provisioning
- Spring 2017: major upgrade needed to support Windows 10 & declining usage - time to transition
- October 2017: decommissioned
VM-manage
- Semester-long Linux VM reservations designed for Innovation CoLab projects
- Organic adoption for coursework grew VM-manage far beyond initial design
- VM-manage extended to semester long access for web-based software in Docker containers (RStudio, Jupyter, Matlab, etc.)
- User interface and provisioning updates were needed - transition to new platform
- Fall 2017: VCM replaces VM-manage and VCL
What is VCM?
Virtual Computing Manager
- single place for access to virtual machines and software for academic, student, and course purposes
- user experience interface redesigned by Duke Web Services
- semester long Linux & Windows 10 reservations
- automated application suite builds via Ansible and Chocolatey scripts simplify staying current on security updates
- advanced notice on a large number of VM requests is recommended
Unique users in 2016 vs 2017
- VCL: 511 unique VM users
- VM-manage: 203 unique VM users
Fall 2017 (mid semester)
- VCM: 987 unique VM users
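As a rough check on these figures, the arithmetic below compares the VCM count with the two earlier services combined. It is a sketch that assumes no one used both VCL and VM-manage, so the combined figure is an upper bound and the computed growth is a lower bound; the variable names are only for this example.

```python
# Figures as recorded in the minutes.
vcl_users = 511
vm_manage_users = 203
vcm_users = 987  # Fall 2017, mid semester

# Upper bound on earlier combined users: assumes no person used both services.
combined_upper_bound = vcl_users + vm_manage_users
growth = (vcm_users - combined_upper_bound) / combined_upper_bound

print(f"Earlier combined users (upper bound): {combined_upper_bound}")
print(f"VCM growth over that bound: {growth:.0%}")
```

Because the VCM number was taken mid semester and the earlier total may double count people, the real increase is likely at least as large as this estimate.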
Demo: If you go to vcm.duke.edu, at the top level you see a splash page that tells you where you are, and once you log in you see the VM options. Most users will see one VM or none. The user experience is great.
During the transition from the old system, the Security Office found that accounts hacked by spammers and phishers were also being transitioned, and the only way to verify users was to introduce multifactor authentication.
- As demand for VMs increases and access to software becomes easy, the physical computer lab space continues to decrease.
- We are working hard to stay ahead of the curve on storage, RAM, and IP addresses.
- When the VM spins up, a control panel allows you to turn it off/on, reload the original image, transfer ownership at the end of the semester, etc.
- Public facing IP addresses due to the original Co-Lab addresses and especially for iPhone apps
- A new wrinkle: login used to be as the admin user; now you log in as VCM or with your NetID if you know the hostname, or use RDP, which downloads an RDP settings file and requires credentials before connecting to your Windows VM.
- The mix of VMs is about 450 Windows and 600 Linux machines.
- Reserve a Container will be replaced by reserving a VM
- Need a build script from the repository to spin up a new VM
- Now has the ability to apply Security patches in a timely fashion
- Help includes various topics on VMs or Containers or Reservations and a quick turnaround.
- Option of installing additional software packages - when students or faculty get a VM that needs additional software, they can go to Software Center and have those installed.
VM ecosystems - Reservations for 3 different clusters & audiences
- students/coursework: vcm.duke.edu - simple
- faculty/researchers: rtoolkits.web.duke.edu
- admin/department: clockworks.oit.duke.edu – more customization
Questions and Comments:
Q: Moving forward is there going to be a policy developed for instructors that want a VM for the course especially the large courses so that the students can have a better and more attractive experience?
A: Since Eclipse VMs are containers in a web browser,
Q: Are they Renewable from semester to semester and will they last 5 years?
A: Yes, but renewal may need to be done promptly to avoid restoring from backups. Student VMs are not backed up; backups are the student's responsibility. Clockworks has an option to request TSM backups.