Duke ITAC - January 20, 2011 Minutes

ITAC Meeting Minutes
January 20, 2011, 4:00-5:30
Allen Board Room

  • Announcements and Meeting Minutes
  • Update on Open Access and Digital Information Strategy (Paolo Mangiafico, Kevin Smith, Molly Tamarkin)
  • Graduate Students' Perspectives on IT (Brian Kiefer, Yang Yang)
  • OIT Lab Engineering Update (Samantha Earp, Evan Levine)


Announcements and Meeting Minutes

Alvy Lebeck called the meeting to order at 4:03pm.

Tracy announced that Duke has hired a new chief information security officer, Richard Biever, replacing Paul Horner, who left in September.  Tracy said Duke performed a national search that led to the selection of Richard, who currently works in the IT security office at Georgia Tech. Tracy added that he was familiar with a decentralized, research-oriented environment but that he also understood the needs of securing central IT.  John Board noted that Richard's been in a role at Georgia Tech where he's had to work closely within a framework of faculty governance and seems to have the right approach to that structure.

Update on Open Access and Digital Information Strategy (Paolo Mangiafico, Kevin Smith, Molly Tamarkin)

Paolo reminded the committee that Duke received a Mellon Foundation grant a couple of years ago to develop a strategy for managing the campus' digital assets; the Library and OIT provided matching funds.  The provost appointed a task force in spring 2009 to help develop Duke's strategy in this area.  Paolo shared a lengthy list of individuals from various IT, faculty and campus units who are collaborating on this task force.

Paolo said that the group began by looking at the array of digital materials the community was producing, particularly those that lacked systems and processes to ensure their preservation. Publications were seen as a starting point because they have long-lived value, small file sizes and low format complexity; he added the task force also felt publications would be an area of significant initial impact.

An open access policy based on a model in use at Harvard was developed starting in 2009 and approved last year after consultation with ECAC and other groups.  Tracy asked Paolo to clarify the purpose of an open access policy.  Paolo noted it is meant to "increase the reach and impact of Duke scholarship" in support of Duke's mission of knowledge in the service of society. The open access policy retains rights for both Duke and faculty authors, allowing works to be used and shared as they wish.

Paolo noted the policy arrived in the wake of ongoing restrictions by scholarly publishers on the acceptable use of their publications. He said it should help encourage broader access to knowledge resources online and improve the efficacy of meta-analyses by making more works widely available. He added that many leading research institutions are adopting open access policies, something that in the aggregate could help change the stance of third-party publishers.

The policy makes it the default position that faculty authors grant Duke a non-exclusive right to reproduce and distribute their scholarly articles for free, and to allow others to do so, provided the articles are not sold. Faculty can opt out of this license to Duke if they wish or if their publisher requires them to do so. The provost and ECAC are charged with interpreting the policy and resolving disputes, while the Duke libraries and the Library Council are poised to support its implementation; the policy is slated for review after three years.

The campus' DukeSpace repository has been in place for a number of years, but has been used primarily for theses and dissertations to date; DukeSpace is now being adapted for use in storing works under the open access policy. The repository is currently open for self-submission by faculty, and an intern in Kevin Smith's office is checking rights and posting materials for faculty and departments that are currently interested in batch loading.

Kevin Smith said that DukeSpace contains 1,100 dissertations and theses; almost 750 faculty-authored articles are also in the repository or pending publication, a number Kevin said was surprisingly high to him.  Some faculty are choosing to put their own out-of-print books into the repository, along with monographs and other works from Duke's rare books collection.  Kevin added that compared to longer-established archives at other institutions, Duke is doing well, placing in the middle of its peer institutions even before some of the planned automated processes are complete.

John Board asked how high-profile for-profit publishers are reacting to Duke's policy, given their opposition to this idea some years ago. Paolo said some say they are okay with it, since they see it as a voluntary model in many places and do not see a large number of institutions adopting it; he added that they may be expecting faculty not to opt in. Kevin Smith said that about 70% of publishers will allow a final manuscript to be posted to such repositories, but not the final published work; Molly noted this can lead to differences in pagination, missing illustrations and the like.

Kevin said that some publishers ask for a link to be added from the manuscript to the published work.  He added that Elsevier briefly contemplated a policy allowing faculty to post manuscripts in institutional repositories as long as there wasn't an institutional policy on such repositories, but the publisher backed away from that after much discussion in the scholarly communication world. Deborah Jakubs added that the library had received positive feedback from some faculty for whom the presence of works in DukeSpace led to their receiving contacts about their work from around the world.

Paolo said a developer is working on processes to make submissions to DukeSpace more automated and less manual, and exploring the possibility of integrating subject librarians to work with faculty in the fields they support, especially in cases where a version other than the final published work is needed. Kevin said a second intern is working specifically on works from Duke's global health initiative.

Paolo added that there is an effort to connect the open access repository to the Scholars@Duke effort based on Vivo; the DukeSpace team in the Library and the OIT team supporting Scholars@Duke's development are now meeting regularly to coordinate those activities and avoid duplicating faculty effort.  OIT has received a grant to develop Vivo widgets, allowing material in either Vivo or DukeSpace to be easily embedded in a personal or departmental web site with links to a faculty member's repository works.

Paolo said the library has started creating a platform, based on existing open source technology, for Duke community members to publish in an open journal system, with a couple of journals now being set up on it.  Kevin and Paolo noted that Duke is also supporting the efforts of COPE, the Compact for Open-Access Publishing Equity, an international initiative intended to reduce barriers to open access publishing, through a fund that partially or fully reimburses faculty for the publication fees some open access journals charge when the faculty member's grant will not cover those costs.  Seven faculty have expressed interest in the program since the fund opened in October; three were ineligible, two of the remaining four have been funded to date, and the fund is awaiting further information from the other two. Kevin emphasized that the initiative does not fund "hybrid" journals that are subscription-based but let authors opt for open access, because the fund is meant to create incentives toward a sustainable model.  Per-article open access publishing fees for faculty to date have ranged from $1,000 to $2,000.

Paolo said a web site (http://library.duke.edu/openaccess) had been created providing more information and video testimonials. Kevin noted that one of the faculty featured on the site has found he gets more press coverage, more accurate coverage, greater visibility and better name recognition at conferences because his work was published openly.

John Board noted that many publications today provide "articles-plus-plus," including not just the written work but data sets, high-definition video and more, and asked whether the repository provides for that. Paolo said this is a big issue, particularly given the National Science Foundation's emerging mandate for data maintenance.  Paolo said the repository today takes small objects, but that the group is looking at DukeSpace this spring to see how to manage large objects.  Molly Tamarkin said this is something that's coming and that the team is working on it.

Paolo clarified as background that in October the NSF announced that for any proposal submitted, PIs must now include a two-page data management plan describing how they will maintain data and make it available, including how they will keep their data accessible for perhaps as long as several decades.  Paolo added that NSF doesn't want to prescribe a single approach to data management, but wants to see what emerges in different disciplines and to let standards emerge from the grant peer review process, though this adds initial complications because faculty are currently unsure what level of data management is expected.  Jim Siedow noted that something similar happened when NSF added an outreach component to grants a few years ago, and that NSF would likely again wait to see how the scholarly community reacts to data management plans, and which ones are liked or disliked.

Paolo added that they're collecting such plans at Duke to build a database of boilerplate text and best practices that can be shared. Robert Wolpert said he understood NSF might require data retention for twenty years, which is longer than any storage technology has lasted; he asked how faculty were going to manage this. Paolo said he expected plans including the length of retention would emerge from community expectations, and it would be up to peer reviewers to determine what would and would not be appropriate.

Paolo said that besides NSF, Duke policies require retention of data in original format for at least five years, and that there is pressure and rising expectations from schools and other sources, including funding agencies, journals and databases, along with reputational benefits to faculty for maintaining their data.  There are benefits to doing this better, he said, including more citations for scholarly works whose original data are available, while researchers who choose to do it alone face inherent inefficiencies.  A series of conversations is underway within Duke, he noted, to think about what can be done to improve support now using existing resources.  A web site (http://guides.library.duke.edu/dukedata) now gives some guidance and best practices for data management, along with sources of peer support and examples.

For the Digital Futures Task Force, Paolo said the top agenda items this spring would be:

  • What can be done to help researchers better manage, archive, and share data?
  • Where and how should these services live and be funded?
  • What should be retained locally at Duke versus in repositories set up within disciplines, at a national or international level, or through commercial services?
  • How can incentives be created to encourage people to do the right thing?

Some elements of a program around research data, Paolo said, would likely include the following:

  • Convenient, cost-effective storage pools to encourage managed campus storage;
  • Domain-specific virtual research environment systems combining collaborative tools with data management, version control, metadata and data packaging, along with interpretive environments for later use;
  • Data repositories to publish and archive data;
  • Assistance services for researchers to help them in planning data management, description and conversion; and,
  • A searchable registry of data at Duke and elsewhere, allowing data to be cited just as academic articles are.

Robert Wolpert noted this is a national problem, not a Duke one, and asked whether major cloud vendors are offering services to support these needs.  Molly said the team is watching the space, but that the greatest potential is in domain- or discipline-specific repositories.  She added that she and Julian Lombardi had discussed engaging with a cloud service, but that no solution is available for purchase.  Julian added that it's not yet clear what requirements will emerge from the needs of individual disciplines; once they do, Duke can decide whether to contract out, provide services locally, or point researchers to discipline-specific repositories.  Paolo added that some disciplines like genomics and astronomy are further ahead than others.  In terms of cloud storage services, Paolo said some groups like DuraSpace are trying to build common contracts that universities can use.  Julian noted some disciplines might have storage requirements at a scale beyond what could be provided.  Molly added that there is a concern over how to provide data integrity assurance for very large file systems and storage.

Dave Richardson said that he had been a user of national databases since the early 1970s and wasn't sure discipline is the only organizing principle, adding that individuals within or across disciplines sometimes form their own groups for these purposes. He added that derived data often do not easily fit into a specific mold or delineation, and that what's being proposed here, with the library playing a role in storage and archiving, fits the library's long historical mission and purpose; that is not true of cloud services and other organizations, Dave said.

Mark Goodacre asked what the impact is for faculty who, like him, have materials online in their own personal web space; all his articles are on his home page but not at a Duke site. Mark said he finds it attractive to have his works appear under a Duke imprimatur but that moving his data would be a big effort.  How would a faculty member in his situation approach past and future data planning, and should he use Duke resources for the need?  Paolo said that an institutional repository would very likely last longer than a personal web site, and that the repository would offer a citable, stable permalink, with the widgets described earlier available to support embedding within a personal web site that becomes a source for how people find you. Paolo said that he would advise people to transfer existing content in, but that a better system is needed to support that migration.  Molly said that the initial focus has been on meeting the needs of the open access policy and that retrospective content has not been a focus to date but would become one. Mark added that he guessed services like Google Scholar are more likely to pick up work from a Duke repository than from a personal web site.

Tracy noted that the presenters had mentioned both opt-in and opt-out approaches, and asked whether there was a sense of what percentage of people are choosing to participate versus choosing not to.  Molly said that until the automated workflow now under development is in place, all of the material comes only from self-identified interested faculty.  Kevin added that while the process is technically opt-out, it's practically opt-in at the moment without an automated process.  Molly said the group hopes that when the citation harvesting feed begins, faculty will be notified that certain works will be placed in the Duke repository unless they choose to opt out.

Joanne Van Tuyl noted that she has a colleague who published work from the 1960s to the 1980s that was not accessible to Soviet researchers, and that there is now interest in accessing some of those works, which are in demand and largely discovered via word of mouth. She said that in some areas like this there is a need to make things accessible to populations who otherwise would not have them.  Paolo said DukeSpace is available for this purpose but that there just wasn't a comprehensive process for doing that retroactively.

Graduate Students' Perspectives on IT (Brian Kiefer, Yang Yang)

Brian said his and Yang's presentation would focus on graduate and professional students' perspectives on IT at Duke, adding he would bring a primarily Fuqua-based perspective given his course of study.

Brian said there have been problems with wireless networking in Fuqua, and that he and others were sometimes unable to connect to the "DUKE" SSID network, an issue that had proved elusive to solve.  Brian speculated it could be because Fuqua is on its own local school network rather than the campus-wide private network in place elsewhere, or due to different technologies in place at Fuqua versus the campus-supported network next door at Law.  Tracy noted that Fuqua is the only place on the campus side of Duke that has run its own network independent from the rest of the campus network.

She said that about a year ago OIT started working on integrating Fuqua's network into the university network so that Fuqua will soon have wireless identical to that in other parts of campus, adding that she hopes this would be a short-lived problem.  Brian said that Fuqua similarly had its own VPN service, which had caused challenges for off-campus students with single Time Warner connections. Alvy noted that when Computer Science transitioned to OIT's wireless service there were initially issues with cached IP addresses and the like; Tracy pointed out that if old access points remain after such a transition, those may cause problems.  Brian noted that the VPN issue appeared to have abated since students began moving to the OIT-supported, campus-wide concentrator.

Brian noted that students at Fuqua were challenged at times to follow multiple sources of communication and information, including email, Duke Groups (CollegiateLink), Blackboard, internal CMS systems and other web sites.  He added that there are also a large number of career services web sites and information sources, and that together there are "multiple sources of truth," which are difficult to integrate. He noted that the school was working to develop a single platform for certain pieces of information and data. Brian suggested this was not a Fuqua-specific issue, as students throughout the campus rely on varying sources of information through web sites and other channels.

Yang said that representatives from the various graduate and professional schools had raised a few questions and thoughts on technology during a meeting back in November with Pakis Bessias and the IT Council.  The stability of the new portal.duke.edu VPN service had been raised as a concern, with some students saying they were unable to log in.  Cell phone reception, including dead spots in basements and inside rooms, was described as poor, particularly for AT&T.  Finally, students noted the availability of multimedia equipment like cameras for check-out at the Link, but Yang said utilization of this equipment seemed to be high and that it was difficult to get it when needed.

He added that he thinks OIT offers good services, but that many students aren't aware of what service offerings exist.  Yang said he thought it would be beneficial to have more information and presentations at orientations and new student events, along with training, on-demand tutorials and the like. John Board noted that with undergraduates there is an organized orientation to leverage, and asked what parallel option might exist for graduate and professional students. Tracy noted that some departments have their own such events; Pakis and Yang noted there is an IT presentation during the all-student orientation for graduate students, with separate orientations for Law, Business and other professional students.

Yang said that students seem to appreciate "quick and precise" replies from the OIT help desk.  He added that he would like to see more information about, and advocacy for, new technologies, including what to use and how, in areas such as green technology.  He offered to advertise information to graduate and professional students via the weekly newsletter or through GPSC General Assembly meetings, covering training classes as well as significant changes or evolutions in services.  Yang added that for graduate students, computer labs are likely not needed, but that software licensing or large monitors for group discussions would be needed.

John said that VPN was mentioned as being important by both Yang and Brian, and asked why they use it.  Yang said that international users need VPN to access library resources; Brian added that Fuqua's team tools, like shared mail, calendar and file storage, as well as library resources required VPN access.  Molly noted there are alternatives for accessing library resources outside VPN, but that it's hard to make people aware that they don't need VPN for all of their resources.  Brian added that collaboration platforms are really what he means by communications platform; there's little individual work but plenty of group efforts, analysis, presentations and the like.  He added that services like Google Apps and Dropbox are being used in greater numbers.

Mark Elstein asked whether there are in fact three VPN services at Duke, two web-based and one traditional VPN concentrator requiring a software download.  Ben Getson noted that a login announcement on the traditional VPN system warns of its pending retirement.

OIT Lab Engineering Update (Samantha Earp, Evan Levine)

Samantha updated the group on the review of public computing conducted 18 months ago after a recommendation from DART; an ITAC subcommittee with faculty, student and staff membership was charged with evaluating demand and future directions for public computing.  The subcommittee identified kiosks, general computing labs, classroom labs and specialized labs as relevant services.

Based on the group's deliberations, four kiosks were removed at the Devil's Den and Brown, and five general computing labs were closed, including Crowell, Edens 2, Kilgo, and Wanamaker.  The general computing lab closures reduced OIT public lab seats from 11 labs with 94 machines to 6 labs with 62 computers. Additionally, a lightly-used classroom/lab located in Gilbert-Addoms dormitory was closed. A Central Campus computer lab in Alexander was repurposed at the request of Residential Life to become a group study space, though it retains ePrint services.  Remaining general use and classroom labs were refreshed with dual-boot Macintosh computers, except for Teer and Hudson; those OIT public labs have not been refreshed pending the conclusion of an ongoing discussion with Pratt on what the school wants to do with those spaces.

Evan displayed a map showing before-and-after locations of public computing resources.  He noted that geography and the presence of nearby alternatives played as large a role as usage data, with some of the labs seeing as little as one login per CPU per day.

With the refresh, $67,200 was saved in hardware costs across kiosks, residential labs and classroom labs; some additional administrative and software savings have been realized but their full benefit is not yet known.  Evan added these savings are on an annual basis, not simply in hardware refresh years.  Robert Wolpert asked if these savings were net of the cost of deploying Virtual Computing Lab (VCL) services; Samantha clarified that the figure is not the "ecosystem" cost, but specifically the lower spending level on labs. Evan noted the amount saved beat the DART goal.  Alvy added that, based on the data, these labs probably could have been eliminated entirely without being missed. Tracy clarified that the plan was to use the savings from the first year or two of lab closures to build out virtual lab infrastructure ready for future demand.

Samantha added that there were suggestions from students participating in Campus Council and Duke Student Government; Campus Council had a more aggressive proposal to remove labs from residence halls, which OIT declined to pursue so that there would be more time to study the impact.  Molly Tamarkin added that the Library reduced its public computing offering by about 20% and has seen little impact.

Samantha noted that in most locations, after last summer's lab reductions, the number of logins per computer per day rose in fall 2010 compared with fall 2009. Samantha said she wondered if this is because, outside OIT's work, other groups like the Library have reduced their computer footprints as well.  Evan described the change as far more efficient use of the installed hardware base, while Tracy noted that this was not a marked increase in the average login length. Michael Ansel noted that this could still translate to up to 8 ½ hours a day on a computer, which could be intensive use. The committee had some further discussion on the lab metrics and their implications.

Mark E. asked whether students were still represented on the labs committee after the student representative passed away this year.  Samantha noted that that committee had ended its work and that the labs group was working with its routine operational processes to gather feedback on lab uses and needs.  Evan said he expected to use additional data gathered beyond one semester's numbers to drive future discussions on OIT computing reductions; Samantha added that the group continued to evaluate software utilization and needs.

John asked about the net impact on total public computers university-wide after so many groups independently reduced their base of installed lab computers; Tracy said there was no automated, straightforward way to gather those data across units, and that it had been a challenge to collect the first time. John added that mass complaints from students had not been heard in the wake of lab closures; Tracy hearkened back to Julian's recent presentation on cross-institutional directions in IT showing widespread plans to reduce computer labs and said Duke's experience seemed typical in that way.

Samantha moved on to the Virtual Computing Lab (VCL) service, demonstrating what the system looks like on her Mac.  Samantha reminded the committee that VCL allows users to bring up a standard image and run software with it.  She demonstrated that there are department-specific images for different areas, like Chemistry, the Nicholas School and others, along with generic images and those with specific software suites like Matlab, Microsoft Office, etc.

Evan noted that fall 2010 was the first non-pilot, full-production semester.  The fall saw 3,111 reservations and 3,704 total hours used by 839 unique users, with Windows systems seeing significantly more usage demand (772 Windows XP users versus 140 Linux users.)  Evan and Samantha also pointed out that the trend was for on-demand service, with 2,902 "immediate" requests versus 211 future-use reservations.

The list of the most popular VCL images includes the Chemistry Applications image, which Evan described as a real success; it was the first test of a department using VCL to specifically design their own image for a course.  He noted that almost immediately people were saying this image was slow to access, a sign of the challenges presented by scheduled versus on-demand utilization; once this was known and more images were pre-provisioned, there was better satisfaction with the service. Evan described this as a sign that OIT needs to be able to pre-provision images based on projected demand.

Evan told the committee that 945 VCL instances ran less than thirty minutes; 900 between thirty minutes and an hour; 709 between one and two hours; 402 for two to four hours; and 157 for more than four hours. Evan added that there are some discussions on whether the VCL pool is the right resource for long-duration computer use (as with longer logins), or whether the VCL interface could be used to reserve other kinds of hardware.

Robert Wolpert asked whether there was any resistance to using VCL due to deprecated software versions like Windows XP; Samantha said the governance process and campus communication are needed to review such questions and get feedback. Evan later clarified that Windows 7 is in the campus computer labs and would be coming to VCL as well.

Samantha discussed trends of VCL uses, and noted some spikes in usage that ran concurrently with times when some faculty were approached about VCL and tested its use.  She noted a test with Mike Gustafson for a very intensive short-term use of the system for his Engineering class, as Mike tested the system capacity by making many reservations in a small period. John Board asked if the test broke VCL; Samantha said VCL did not break but other issues were encountered, notably in how VCL worked in a classroom environment where students are using their own personal laptops to access the VCL cluster.

Going forward, Samantha identified several areas where additional work is needed. She described the need to get the word out further about the service. Samantha noted that many graduate students hadn't worked with VCL but were excited to hear about it.  Evan pointed out that VCL was now considered full production and would get mentioned more.  Alvy said that it seems like pre-provisioning is important, and asked whether OIT is ready to do that.  Samantha noted that at NC State, which pioneered and heavily uses VCL, faculty members report which days will be heavy utilization periods for their courses, and that information is used for advance management. Mark McCahill pointed out that besides advance awareness of demand, the VCL system can do adaptive analysis to predict the need for images.  Alvy said that you know when classes are going to happen, but that it's hard to predict when work will be done outside of class and what will drive that demand.  Molly asked whether you could look at repeat reservation patterns and offer scheduled access to students.

Alvy asked whether people could get the virtual machine to use on their laptop versus requiring access to the VCL back-end; this would mitigate some of the demand and use local hardware resources.  Samantha noted licensing might be a concern; Evan said that some software vendors are now prohibiting the virtualization of software, while other pieces of software are locked down to a specific network MAC address.  Samantha said that some of the software packages requested, like Adobe packages and video software, fall into these categories.  Evan added that the group had previously looked at having machines pre-provisioned with images for individual applications, but that makes it hard to project demand; having a "statistics" image instead with a range of applications helps to make pre-provisioning easier.

Samantha said that, as noted in the tests with Mike Gustafson as well as other evaluations, there were challenges in supporting multiple users in a fairly small classroom space.  She said that while VCL performed fine, the bandwidth needed to connect people simultaneously over wireless exceeded what the network was designed for, and background traffic from students using YouTube or other activities also caused challenges.  Mark said that once students completed their tasks they shifted to video or other non-academic uses, taking up bandwidth.

Samantha added that it is important to build consistency between physical lab images and VCL images; she described future collaboration with local IT units, as with the chemistry software VCL image, as very important and as representing the best logical direction for VCL growth. Mark Elstein noted he had used VCL, and asked whether issues he encountered with Adobe Reader and ePrint being missing had been fixed.  Evan said those were fixed for a majority of images by mid-semester.  He added that departmental requested images are built off a base image, and that the department then builds the full image to meet their need; the final software for the Chemistry image, for instance, is determined by the department, not OIT.

Michael Ansel said you might have some software that is installed only in a lab open to a certain class due to licensing restrictions, and asked if images can be similarly restricted to individuals in a class. Samantha noted that the Nicholas School has restricted their images to only approved individuals for that reason.

Mark Elstein asked why some images granted elevated privileges while others did not. Evan pointed out that because each virtual machine is deleted after use, elevated privileges are less risky. Samantha noted this would be a good question to address from a governance perspective.