Minutes from 5/11/23

TEC Conference Room

 

4:00 - 4:05pm: Announcements (5 minutes)

 

Victoria Szabo:  

  • Welcome 
  • Approval of minutes from 4/13/23 
  • Motion to discontinue distribution of paper ITAC agendas (passed)
  • Next on our agenda:

 

High Performance Computing Cost Study – Ginny Schroeder and Albert Slater 

 

What it is: OIT performed a detailed cost study on the fully burdened costs of providing High Performance Computing (HPC) and other services delivered by the Research Computing Group.  This analysis produced a cost per hour for CPUs and GPUs and a cost per GB per month for storage that included staff time, electricity, rent, depreciation, etc.  Those costs were then compared to cloud-based offerings to determine whether it is cheaper to provide certain services on-premises or in the cloud.

Why it’s relevant: The costs to deliver compute, storage, and other research computing services to campus are a key component in evaluating where best to provide those services.  This analysis allows us to compare the cost of delivering services on campus to the cost of the cloud.  It’s important to note that this analysis covers use of the DCC and other services for data that isn’t classified as sensitive.


Charley Kneifel: Ginny Schroeder from WTC will be doing the primary presentation, and Albert Slater--who is also on the call--did a lot of the data center layout work, among other things. 

Late last fiscal year (June 2022) we decided we needed to have some comparison work done on the cost to provide research computing, including all of the overhead costs, how much of the power we were consuming in the data center, and all of the pieces that go with that. We engaged WTC to help with that.  They will tell you a little bit more about themselves.  Now over to Ginny and Albert:

 

Ginny Schroeder

Thank you so much, appreciate the time here today.  

 

As Charley mentioned, the purpose of the work was to identify the fully loaded costs of providing OIT HPC services and also to compare HPC costs as you provide them today (on-premises) vs. moving to the Cloud. 

 

We will tell you a little bit about ourselves, how we did the work, and then present the results.

 

Our firm (WTC) specializes in higher education and academic medical centers. We only accept engagements from institutions like Duke. We specialize in IT networking and telecommunications, and our firm has been in continuous operation for 40 years.  

We have been doing cost-study work for over 30 years now, activity-based costing specifically.

 

We have done cost rate and funding engagements for these institutions, along with others (shows slide of institutions). The public institutions I have circled here specifically involved high performance computing, data center, or cloud services as part of an overall cost rate and funding engagement. Circumstances were similar for the private institutions, but their engagements usually went beyond just data center or HPC (although those were certainly components of the overall cost studies we did for them). 

 

We did the work at Duke in 4 steps:

 

Step 1: Planning assumptions. We based our cost study on FY 21-22 actuals---that was our starting point. We looked at OIT staff and non-staff costs with activity toward HPC.  We included benefits, as well as things like depreciation on equipment, data center operations, and debt service. The cost was to be based on activity, consumption, or physical properties, depending on the element we were looking at.


Step 2: Identifying the data center cost metrics for allocation.  The purpose of the metrics was to establish a way we could prorate data center costs.  

We looked at 6 different metrics:

  • Racks: We did a count of all the racks in the data center.
  • Weighted racks: A count of weighted racks, splitting them into the racks that were part of the 4 kW unit bays and the racks that were part of the 9 kW unit bays. 
  • Rack units: We did a count of the number of rack units.
  • Weighted rack units: A count of weighted rack units, again using the two different phases of the data center implementation (phase 1 or phase 2).  
  • Maximum kilowatts in the data center, based on OIT power distribution measurements.   
  • kVA consumption, based on OIT power distribution measurements.

After looking at the 6 metrics, we determined the 4 most appropriate for cost allocation and carried those forward in allocating the cost of the data center.
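
For illustration only, here is a minimal sketch of the kind of proration those metrics enable. The data center cost and metric values below are hypothetical placeholders, not figures from the WTC study, and the study may well have applied each metric to a different cost pool rather than averaging them:

    # Hypothetical sketch of prorating data center cost to HPC using allocation metrics.
    # All numbers are placeholders; they are not from the WTC study.
    data_center_cost = 2_000_000  # total annual data center cost (placeholder)

    metrics = {
        # metric name: (HPC portion, data center total) -- placeholder values
        "weighted_racks":      (40,  200),
        "weighted_rack_units": (900, 5000),
        "max_kw":              (350, 1800),
        "kva_consumption":     (300, 1600),
    }

    # One simple approach: average the HPC share across the chosen metrics,
    # then apply that share to the total data center cost.
    shares = [hpc / total for hpc, total in metrics.values()]
    hpc_share = sum(shares) / len(shares)
    hpc_allocation = data_center_cost * hpc_share

    print(f"HPC share: {hpc_share:.1%}; allocated data center cost: ${hpc_allocation:,.0f}")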

 

Step 3: Identifying the cost basis that we would use for OIT HPC. The cost basis had six components:

  • Staffing, including benefits. We used an activity-based costing methodology, interviewing OIT managers whose staff supported OIT HPC, and then allocated the portion of their time used to deliver HPC services. 
  • Non-staff costs directly related to HPC
  • Data center costs, where we started using the metrics we’d established (from step 2) to determine the portion that would apply to HPC.
  • Equipment depreciation schedule (Note this analysis is of cost, irrespective of funding source)
  • Debt service—what portion applies to HPC 
  • Data Center building itself (with depreciation)


This left us with a total net-cost basis of $5,698,682 for HPC at Duke.

 

We then allocated the total net cost by line of business. We identified 5 lines of business that are part of HPC:

  • GPU Compute
  • General Compute
  • Storage 
  • Consulting
  • Education and Training

 

We took the 6 categories (the cost components from above) and allocated the $5.7 million in total cost across these 5 lines of business.

  • We identified units of measure we would utilize to create the unit cost 
  • We gathered the activity level for each unit of measure  
  • We then came up with a unit cost for each line of business (service cost) 

 

At this point, we have a unit cost for each line of business:

  • GPU Compute – 25 cents per GPU-hour
  • General Compute – 1 cent per CPU-hour
  • Storage – 1 cent per usable GB per month (derivation sketched below)
  • Consulting – cost per hour to provide services
  • Education and Training – cost per hour to provide services
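
For illustration, a minimal sketch of how unit costs like those above fall out of the allocation. The allocated dollars and activity levels below are made-up placeholders chosen only so the arithmetic lands near the quoted rates; they are not the study's actual figures:

    # Hypothetical sketch: unit cost = allocated annual cost / annual activity.
    # Allocated costs and activity levels are placeholders, not study figures.
    lines_of_business = {
        # line of business: (allocated annual cost $, annual activity, unit of measure)
        "GPU Compute":      (500_000,     2_000_000,   "GPU-hour"),
        "General Compute":  (1_200_000,   120_000_000, "CPU-hour"),
        "Storage":          (600_000,     60_000_000,  "usable GB-month"),
    }

    for line, (cost, activity, unit) in lines_of_business.items():
        unit_cost = cost / activity
        print(f"{line}: ${unit_cost:.2f} per {unit}")
    # -> roughly $0.25 per GPU-hour and $0.01 per CPU-hour / GB-month, as reported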

 

Step 4: Compare current OIT HPC cost basis with Cloud alternative

 

The first thing we needed to determine was what would change by moving to the Cloud, and the net cost that would remain after those changes.

 

We made 6 assumptions in the Cloud comparison:

  • In the comparison, all OIT HPC services would be moved to the Cloud
  • Costs supporting OIT HPC would decrease for GPU, general compute, and storage
  • Staff costs would be reduced
  • Costs would increase for consulting, education, and training
  • We agreed that we would remove all data center costs, all depreciation costs, and all debt service costs from HPC
  • The data center and its debt service will still exist, but we eliminated those costs when comparing to cloud services

 

We then took the same 6 cost elements (the same $5.7 million) and made adjustments to the HPC net cost to arrive at the cost that would still remain even if we moved the services to the Cloud.  Results:

  • We reduced staff costs by $180,000
  • We reduced non-staff costs by $1 million
  • We eliminated data center facility cost
  • We eliminated equipment depreciation, debt service, and data center building depreciation costs
  • That left us with the cost that would remain: $947,000 per year (reconciled in the sketch below)
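
A minimal reconciliation of those adjustments, using the figures quoted above; the individual sizes of the eliminated facility, depreciation, and debt service items were on the slide rather than restated here, so the sketch only derives their combined total:

    # Reconciling the cost that would remain if OIT HPC services moved to the Cloud.
    # Dollar figures are the ones quoted in the presentation (remaining cost is rounded).
    total_net_cost      = 5_698_682   # fully loaded FY21-22 OIT HPC cost basis
    staff_reduction     = 180_000
    non_staff_reduction = 1_000_000
    remaining_cost      = 947_000     # per year, per the study

    # Data center facility, equipment depreciation, debt service, and building
    # depreciation were eliminated; their combined size is implied by subtraction.
    eliminated_total = total_net_cost - staff_reduction - non_staff_reduction - remaining_cost
    print(f"Implied eliminated facility/depreciation/debt total: ${eliminated_total:,.0f}")
    # -> roughly $3.57 million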

 

Then we updated the unit cost based on the new cost basis.  We brought it across the 5 lines of business, keeping the unit of measure and activity level the same, to come up with a new net unit cost that would remain.

Since there were no changes to consulting, education, and training, the Cloud comparison was done for the three remaining lines of business (GPU, general compute, and storage).

 

Next, we established the cost basis for the Cloud service, using Microsoft Azure as the reference point. OIT gathered these costs from Microsoft.  We used the lowest-cost Cloud alternative as our cost basis and compared it to staying on premises. 

 

Microsoft Azure reserved instances were used for the cost comparison (3-year contract, 62 percent discount). Costs for Microsoft Azure were displayed in column 4 of the slide presentation. 

 

Comparison Results:

Starting with GPU compute: There were 7 different instance alternatives that we gathered from Azure, based on GPUs and RAM. We then layered in the OIT HPC net costs that would remain, and came up with a cost for each instance alternative.  

 

We then brought forward the compute net unit cost basis that we’d previously established, and came up with the variance between OIT HPC cost and the Cloud cost alternative.  (cost variance and percentage variance displayed on screen in presentation)
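
A minimal sketch of how that variance is likely framed for a single instance alternative. The Azure rate and the per-unit remaining overhead below are placeholders, not the numbers from the slide; only the on-premises unit cost comes from the study:

    # Hypothetical comparison for one GPU instance alternative; rates are placeholders.
    azure_gpu_hour_rate   = 0.90   # $/GPU-hour, reserved-instance pricing (placeholder)
    remaining_overhead    = 0.05   # $/GPU-hour of OIT cost remaining even in the Cloud (placeholder)
    oit_hpc_gpu_hour_cost = 0.25   # $/GPU-hour on-premises unit cost from the study

    cloud_cost   = azure_gpu_hour_rate + remaining_overhead
    variance     = cloud_cost - oit_hpc_gpu_hour_cost
    pct_variance = variance / oit_hpc_gpu_hour_cost

    print(f"Cloud ${cloud_cost:.2f}/GPU-hr vs. on-prem ${oit_hpc_gpu_hour_cost:.2f}/GPU-hr: "
          f"variance ${variance:.2f} ({pct_variance:.0%})")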

 

In all cases (except for instance number 5), the OIT HPC cost was lower than Azure’s cost:

  • Instance 4 is similar to the current type of service Duke provides
  • Instance 5 does not equate to the type of service Duke provides
  • Data storage comparisons: pay-as-you-go (Azure) vs. general OIT storage  

 

Discussion:

 

Tracy Futhey: Thanks to you and the team for doing all the work around this. I’m sure there are questions.

 

Colin Rundel: What is meant by OIT HPC?   I’m more familiar with the research computing thing, which has a very different funding model--

 

Charley Kneifel: It is the DCC (Duke Compute Cluster) and the research computing infrastructure. We took all of the costs, including the people. Jimmy Dorff participated in many meetings and allocated the costs for the people who run the machines, management, VM deployments, etc.  Essentially all computing costs to deliver all services.

 

Colin Rundel: Not caring where the money comes from?


Charley Kneifel: That’s the important thing. When we did this, we took all of the costs. The depreciation is an estimate of the cost of what it would take to replace that equipment.  We took all of the current capacity and said if we had to buy it on Duke dollars today, here’s what it would cost (using a 6-year life cycle for general compute and a 4-year life cycle for GPUs). Here’s the total cost, here’s what it would cost per hour, along with all the other costs…The reason the non-people costs in the line for storage were larger is that OIT pays for all of that and does cost recovery. In the comparison, we didn’t include cost recovery, just the cost to deliver services.

 

Prasad Kashibhatla: This isn’t informing a particular faculty member to see if it’s cheaper to go one way or another?

 

Colin Rundel: Funding models make everything way more complicated. 

 

Steffen Bass: I think I know where this is going. The total cost for the local solution is a burden that OIT takes, along with the contribution from the faculty member. The total cost to the faculty member is less than what you see here. But for the Cloud solution, the faculty member would bear the entire cost, and it would be even more expensive. Which I find fascinating, I didn’t expect this outcome.

Tracy Futhey: We weren’t sure. We thought, when we compared the prices of cloud providers, that they were expensive. But every time we tried to articulate that, the assumption was “No, it can’t be more expensive because the Cloud is cheaper.”  This is the first time in my 20 years at Duke that I’ve asked an external consulting service to evaluate something. Not only because I’ve heard such great things about this firm and their costing method, but also because we felt like this was just a case where we needed to ask someone to tell us objectively, after providing them with all of our information.

 

Mark Palmeri: How does this take into account programs like NIH Strides? Some of these funding sources provide cloud-based resources at no cost.  That would break this model a little bit.

 

Tracy Futhey: Yes, those would always be the cheapest and best options--if we had free access all the time for everybody, we wouldn’t have had to do this analysis.  But since those are competitive, since not everyone has access, and since it takes a lot of effort for faculty to write these proposals to get access, we thought it was good to at least have an understanding of how close they are. (Even though OIT costs came out lower, it doesn’t necessarily mean there wouldn’t be reasons you might want to pay more for the Cloud, if it provides better services.) By doing this comparison we also wanted to put to bed the notion that Cloud services will always be cheaper than the services we are currently providing, which this analysis shows are pretty cost effective. 

 

Sunshine Hillygus:  I have two questions. One is the latter point, that decision making doesn’t have to be based on cost alone; what are those other considerations? I don’t know what those are. Second, this is a snapshot based on use at this moment; what are the variables so that we can make projections and test the assumptions?

 

Tracy Futhey: Charley can you talk about sensitivity analysis maybe--

 

Charley Kneifel: Yes, we did some sensitivity analysis in terms of things like what if we’d underestimated costs by 20 percent, for example, would it fundamentally change things? The answer, luckily, was “no.”  Because the denominator is so much larger. One other important thing here: 90 plus percent of clusters are paid by grant dollars.  And those grant dollars don’t generate overhead typically.  That’s one reason why researchers like getting grants such as NSF grants. They can buy computers and keep them for 4-6 years---

 

Tracy Futhey: And 100% of that grant money goes to the computers.

 

Charley Kneifel: Right, there is not an extra 60% to offset the overhead cost.  From that perspective, and the size of the buy-in…I think we’ve had hundreds of thousands of dollars of additional computing purchased by researchers in the last 90 days.

 

Tracy Futhey: Which keeps raising the denominator, in terms of amounts of cycles we have.

 

Charley Kneifel: That’s right. Coupled with the fact that in 2008 the university built out a large data center. For the first time, we are approaching 30% usage of the data center in terms of power capacity.  We have a facility that is efficient and that we can take advantage of. 
The current state of the art for power in a data center is 200 kilowatts per rack.  We are at 9, so this model might not be one that carries us 5 years out.

 

It also doesn’t carry over when the funding model changes.  Mark mentioned Strides. You don’t get the benefit of having the infrastructure you had between periods of funding. And you would pay overhead on Cloud services, for example if you used $100,000 from a grant to pay for Cloud services.

 

Sunshine Hillygus: Should we look at savings and give access to faculty who don’t have large grants?

 

Charley Kneifel: The biggest benefit of the cluster is that we get to use these well-subsidized resources. Nobody keeps their computers on 24/7.  We are able to use the resource when it’s idle.

 

Tracy Futhey: For example, provide free access to a junior faculty member to do the computational work so they can get the funding for their next grant. Pay it forward.

 

Steffen Bass: The take-home lesson here is that DIY is cheaper than buying from a commercial provider.  The funding agencies have realized the same thing: rather than giving the PI dollars to set up their own compute shop, they build those shops themselves.  But it only lasts for the duration of the award.  

 

Prasad Kashibhatla: Can we do similar analysis of carbon footprints, in-house vs. Cloud?

 

Charley Kneifel: We talked about that, but it’s very hard to understand many of the Cloud providers’ footprints because you don’t know where they are. Where there is cheap power, some places have good numbers.  If we had a cheaper power solution with all renewable energy sources, that would be ideal.

 

Tracy Futhey: 15 years ago, Massachusetts built a big shared computing facility for all of their educational institutions which is all powered by hydro.  High power density, low cost. There might be, in higher education, big regional aggregation opportunities that we haven’t taken advantage of.

 

Mark Palmeri: Is there any risk in putting all of the eggs in the basket of one consolidated resource?  Like if the data center were flooded or caught fire.

 

Charley Kneifel:  I can’t say there isn’t, but there are lots of safeguards against that. The disaster scenario that we see is something like a chemical leak, where you can’t turn on the power.

 

Tracy Futhey: We have a secondary data center with the health system in RTP. Our main enterprise systems are linked between the two data centers: things like AD, SAP, SISS. But we haven’t done that with the research computing environment because it would be cost ineffective.

 

Mark Palmeri: We’ve had cases previously (not in this data center) where the AC went out in a room, a generator spiked, and equipment got destroyed.  There was no recourse to re-purchase any replacements.  The grant is not going to give you that.


Tracy Futhey: If there is good news, it’s that this is with our enterprise-grade everything. We have redundancy across all networks and power.

 

Charley Kneifel: One of the reasons we brought all research computing equipment to the new data center was to get out of the North building, which had insufficient power and cooling. You would lose nodes, and it would take a day or two to come back up, so it made sense to roll into the enterprise data center.   

 

Victoria Szabo: We need to wrap up, thank you.

 

4:30 - 4:45pm: OIT Data Analytics Fellowship Program – John Haws and Jen Vizas (10-minute presentation, 5 minutes Q&A)

What it is: OIT is piloting a new Data Analytics Fellowship program. This program is a natural expansion of the co-curricular experiential programs such as Data+ and Code+. With funding contributions from the Office of the Provost, the Office for Research & Innovation, the Rhodes Information Initiative (iiD), and OIT, the fellowship program will recruit new college graduates and train them to become core members of OIT’s Data and Analytics Practice (DAP) over an 18-month term.

Why it’s relevant: By providing participants with on-the-ground experience, we envision providing a pipeline to connect exemplary Duke graduates with longer-term positions both within OIT and in other Duke departments. This program also has the potential to serve as a model for establishing a pipeline for graduates from NCCU, which could help to support Duke DEI commitments.

 

Tracy Futhey: John has been involved in leading our data analytics practice for 4 years now.  They have grown quite a bit and have had great partnerships with other units. In a recent meeting the EVP suggested it would be great to find a way to help more people understand how to do analytics, so that we might have a ready pool of people who know how to do this kind of thing when the need for these kinds of positions arises on campus. John subsequently proposed we create some kind of fellowship offering short-term positions for recent graduates. This would then give these graduates the skills they need to apply for more permanent positions within Duke, or other jobs elsewhere. 

 

John Haws:  It was a “watercooler” conversation that Jen and I had about this. We both brainstormed this idea.

 

For those who don’t know me, I oversee data analytics practices within OIT, and we work with a huge variety of offices across campus--Facilities, Parking and Transportation, ORI, plus a dozen more I could go through.

 

Part of the conversation that Tracy was just recounting was that there seems to be an opportunity to incubate talent for the University.  Not only could they help our group, but we could train them up, and they could leverage these data skills elsewhere in the University.

 

As a result of this, we are now piloting an 18-month fellowship program. It provides hands-on, real-world experience for the fellows.  They are joining as our team members and will move through all the aspects of our work.  Short-term there will be overhead in making this work, but one long-term consequence is that it will add capacity to our team.  But the real goal is to produce folks who are good at working with data and using it to make decisions.

 

These folks are going to be here for 2 summers, so there is a great opportunity to also leverage them for our Plus programs. In a sense, it’s kind of an extension of a Plus program. 

 

We have a lot of sponsors across the University who are helping us kick this off and who funded this pilot.

 

I mentioned the fellows will participate in all aspects of work we do: managing data flows, analyzing those data, and reporting on them. We want to give them a taste of each of those aspects.

 

The idea is that they will come out of this program really good with SQL, and with an understanding of relational databases, including how you model complex data sources. They will be fluent in Python, including some of the scientific computing components of Python, as well as with data visualization. This is the bread and butter of what our group does here. We are also an enterprise software and data management group, so they will have hands-on experience with enterprise software development.

 

What I’m hoping is that they’ll not only spend time with our group, but also with other groups, such as ITSO and Charley’s automation team. And all the while they will be partnering with offices and subject matter experts across the University.

 

We’ve got three applicants in place right now for one of the fellowships, and we’ve started the screening process. Right now, we’ve limited it to Duke graduates, but we hope to expand to include NCCU, and maybe expand more broadly as a pipeline of talent into the University.

 

We’ve got one position in progress right now, and we’ve gotten funding for a second position as well. We are going to test out this model by having one fellowship for a recent undergrad, and one for someone who has completed their Master’s degree.

 

Tracy Futhey: Funding comes primarily from other partners, including the Provost’s Office. They are providing the resources to get it going and try it this year. 


Victoria Szabo: How are they learning the skills? What is the strategy for educating them on what they need to know?

 

John Haws: We will use some of the same infrastructure we have in place.  For some of the introductory stuff we’ve got Roots and Colab to get them up to speed. My group is used to onboarding people, so we have a nice infrastructure for bringing people up to speed on these kinds of things.

 

Victoria Szabo: I’m imagining the gap between these “shiny graduates” and what you are doing…

 

Tracy Futhey: The gap will be less than it is between the Code Plus and Data Plus students!

 

John Haws: Part of the reason I mentioned the participation in the Plus programs is the community that it fosters.  There is the kickoff, they are together, meeting every day. We also want to leverage that as a great onboarding opportunity.

 

Sunshine Hillygus:  That was going to be my question….I was imagining how undergrads fit in?  (Looking around at the ages of the people in the room) It worried me a little bit….

 

John Board: John has made some really excellent young hires!

 

John Haws: Our recent hires, both have had around 2-3 years of experience.

 

Tracy Futhey: The market is causing us to want to grow our own.

 

Paul Jaskot: If this is successful, it would be great…this is something Ed and I have talked about, growing at the local level. …We’d like to offer our graduates a post-graduate experience for 2 years. It would be really helpful with the lab support.

 

Victoria Szabo: Thank you. Last topic.

 

4:45 – 5:15pm: Update on IT Support for Research – Tracy Futhey (15-minute presentation, 15 minutes Q&A)

What it is: Over the past year, a report was created that gathered faculty input related to improving IT support for research at Duke. This information came from multiple working sessions with faculty culminating in a session in August where ITAC members voted on prioritization of the IT needs and then a final report to Academic Council on December 1. In Phase 2, each of the 6 major priorities was assigned to a working group composed of faculty and staff with insight into that particular priority. Each group met over a 10-week period to devise potential solutions to their assigned priority.

Why it’s relevant: Today we’ll hear about the solutions proposed by each working group and then discuss what happens in Phase 3.

 

Tracy Futhey: Zipping through the first couple of slides here (remembering that Jenny Lodge and Joe Salem are my co-sponsors in this effort.)  We’ve talked a lot about Phase 1 (faculty-driven process) in ITAC over the past academic year. We came out with a report and series of 6 findings. We are almost finished with Phase 2 now (service partner-driven phase.) Next we will start Phase 3 (institutionally-determined phase) where we determine how we pay for it, and where proposed services will be located.  There is a feedback loop here; even though Phase 2 was service-partner driven, we have faculty champions in every group. 

 

Phase 1:  

  • Took 7-9 months to establish the 7 Working Groups (WGs) by discipline or domain.  
  • We identified 3 thematic areas: people, processes/structure, and technology.  
  • We had 6 common findings and 10 recommendations.
  • Findings were released publicly on December 1.

 

Phase 2:

  • Took only 4 months.  
  • Each group had 10 meetings over the semester.  
  • Each of the 6 working groups corresponded with one of the 6 findings from the original report.  
  • There were around 50 people involved; service providers and faculty champions.
  • We had our open house 3 weeks ago—all of those “dots” and preliminary recommendations. These have been reviewed by the sponsors.
  • Each group was sponsored by me, Jenny, Joe, as well as Deans and other members of Duke Administration.
  • We came up with 39 proposed services and solutions.

 

Next slide.  Please note that these are preliminary and are being used to let you know where we are going. We took the raw voting and came up with the top 2-3 items in each finding, looking at what would be most effective at solving perceived shortcomings.

 

Categories

A.  People (we don’t have enough people)  

B. Separate IT infrastructures (between campus and health system, hinder collaboration)     

C. Security and compliance (too one size fits all, requirements create limitations to research)

D. OIT services (like VMs not being expansive enough, not enough GPUs, etc.)

E. Many technical solutions, with lots of constraints (takes a lot of effort to navigate through) 

F. Storage solutions (don’t match research life cycle that’s needed)

 

I’d like to do a quick 3-minute flyover, applying one of the processes from Phase 2 to one of the groups.
(Example: Group F slide: “Current storage solutions don’t span research lifecycle or university”)

 

  • Every group started with a list of sponsors and leads.  Notice there were multiple leads, so that the problem doesn’t belong to just one person. Teams were composed of leads, service providers, and faculty champions for a collaborative approach.
  • Down below are deliverables showing what we wanted the group to come up with over a 10-week period.
  • Next slide gives an example of the readout from each of the groups. On the right side is the list of the proposed services. Those were then reviewed and discussed with sponsors.
  • The next slide may look familiar for those that were here three weeks ago.  We took those solutions and had you all vote which ones were important by placing dots accordingly, as well as post-it notes.
  • One of the ones that was very compelling was the need for data management tools.

 

Next slide: Now we are trying to figure out how to work with this. Note this graphic is an illustration of what we are planning and does not yet have final data.

 

Looking at graph.

  • The X axis represents faculty interest, with the items considered most important at the far right.
  • The Y axis represents what the sponsors think is the strategic value of each item.
  • From the graph you can see the items that are considered very important, such as Item 1, Tools to manage data over its research life cycle.
  • The size of the dots indicates how expensive or complicated an item would be to implement.
  • Through this process we get a visual sense of how important something is and how high the demand is, as well as the cost.
  • We plan to have one of these graphs for each of the 6 groups (see the illustrative sketch below).
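
For readers trying to picture the slide, here is a minimal matplotlib sketch of that kind of chart. The item names, scores, and costs are invented placeholders, not the working groups' actual data:

    # Hypothetical sketch of the prioritization bubble chart; all data are placeholders.
    import matplotlib.pyplot as plt

    items = {
        # item label:                 (faculty interest, sponsor strategic value, relative cost)
        "1. Data lifecycle tools":    (9.0, 8.5, 300),
        "4. Data continuity service": (6.5, 7.0, 150),
        "7. Expanded storage":        (7.5, 5.0, 220),
    }

    fig, ax = plt.subplots()
    for label, (interest, value, cost) in items.items():
        ax.scatter(interest, value, s=cost, alpha=0.5)  # bubble size tracks cost/complexity
        ax.annotate(label, (interest, value))

    ax.set_xlabel("Faculty interest")
    ax.set_ylabel("Sponsor strategic value")
    ax.set_title("Proposed services: interest vs. strategic value (bubble size = cost)")
    plt.show()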

 

 We are in May, our close-out month for Phase 2.  

  • We’ve refined the list of solution proposals from the WGs. We want more faculty input.   Watch for a survey coming to you soon.  It will list the various service proposals, set by default at medium priority.  Items can be moved to the right if they are high priority; if that is done, another item should be moved to the left so that all items don’t end up being high priority. 
  • We have one more group that has to have its readout.  
  • We will also need to nail the costs down.
  • We will have a report at the end of the month, then we will go into Phase 3 (June-August). 

 

End of slide presentation

 

Prasad Kashibhatla: Go back to the bubbles.  Looking at 1, 4, and 6.  They all sound very similar to me.  Are they (sponsors and faculty) simply understanding the same thing differently?

 

Tracy Futhey: One talks about tools to manage my data. I want it to be easier when I go from active data to cold storage---

 

Charley Kneifel: Or I want it to be easier when I get new data from whatever source to move it in. The acquisition of the data--getting it in and making it useful to me to be analyzed.  How do I help with data lifecycle so I understand where my data is and move it across the continuum to meet requirements for long-term storage?

Tracy Futhey: I’m remembering, in one of our meetings, someone (Lindsey?) talking about how terrifying it was the day they had to “push the button” to migrate storage.  “What if I got something wrong…”

 

Prasad Kashibhatla: My question is, are sponsors and faculty understanding it to mean the same thing?

 

Tracy Futhey:  I think so. The reason we had the idea that sponsor feedback might be distinct from faculty feedback came from the readout with Dean Lynch.  He was noting that there were some things that the faculty indicated they might not need today, but that he (from his vantage point as Dean and sponsor) could see their value 3 years down the road.  Highly strategic things that could possibly be low cost and should be considered even if faculty don’t see them as being important.  Data integrity, for example—certainly faculty care about it, but maybe not the same way as Deans and other University administrators.

 

Victoria Szabo: Is there a way to educate with the survey without making it horribly long and complex?

 

Tracy Futhey: In the survey, there are 39 initial items starting out as “medium.”  For each item, you can hover over it to get a description. Maybe that will help?  Can we test drive it in this room before we give it to the rest of the faculty? [Ed: Subsequent to this meeting, a test drive of the survey with a social sciences faculty member resulted in the 39 initial items being pared down to 21, through consolidation of redundant items across groups and elimination of the lowest-rated items from the open house in April.]

 

Victoria Szabo: For example, caring about tools versus being concerned about data integrity…what is the rationale behind that?

 

Lindsey Glickfield: I would put data integrity in the category of tools. Some of these things might be correlated? It might increase the weight of one and decrease the weight of another?

 

Sunshine Hillygus: Thinking about “tools to manage the research data over the research lifecycle.” If compliance doesn’t get their stuff together, then none of this is going to matter in terms of tools.  We need to be careful that people aren’t conflating the technology tool with the process. This is particularly relevant for our working group where we said “process is not what we’re supposed to focus on, we are focusing on the tools.” Who is focusing on the process?  The faculty complaints are less about technology and tools, and more about the compliance process.  

 

Tracy Futhey:  We can’t fix problems with the process just by adding tools.  But if tools can resolve issues like data continuity services, then we should list that as a subset of the tools.

 

Charley Kneifel: The question on data continuity is really tied to “how many backup copies does a researcher need?”  A graduate student, for example, who did something 7 years ago: do we still have a copy of that change they made?  Or is it “this is the core data set, these are the core pieces, we have good backups and details, what do we need to restore?” And who wants to pay for the services: offsite backups and everything that goes with them are important for parts of your data sets (along with regulatory requirements to keep parts of those things as well). But you still have to move things around so you know where they are and what has to happen. They were broken out differently on that; I think the data continuity and backup solutions were a later addition to the list, but they were on there when we did the voting.  I stood there the whole time, and nobody asked what was meant by that.

 

Tracy Futhey: There were folks on the team, from the libraries, who live and think in the data world. It may be that the phrasing of this reflects their innate knowledge of these topics---

 

Prasad Kashibhatla: For example, I ranked number 7, “provide storage that meets these needs…”

 

Charley Kneifel: But if you only had the storage, then you have to make all the decisions about where to put it.

 

Tracy Futhey: You need the tools to move it around and the decision--

 

Ed Gomes: I wanted to respond to Sunshine. We talked about toolsets to add some automation, and part of that automation is notification: tracking who needs to do what, in terms of process.


Sunshine Hillygus: That is true, but we still haven’t sorted out the entry point.


Ed Gomes: Understood.

 

Victoria Szabo: And you said all the other ones are in progress, getting to the point of being personalized?


Tracy Futhey:  Yes, this was just here to give you an idea, and for you to give feedback. Feel free to jump in now. 

 

Victoria Szabo:  And we were double-checking to see that this makes sense to the non-expert, who is a part of this ecosystem. How to make the survey more legible to those who aren’t already specialists.

 

Tracy Futhey:  At this stage, we are going to give the survey to the faculty who participated in the first and second phases, thinking those are people who know enough about it.  And starting everything at “medium” so that people will have to make fewer decisions.

 

Victoria Szabo: Any other comments? Every time I see these presentations I’m impressed with the range, with the consulting, and the iteration…

 

Tracy Futhey: Thanks to all the folks in the room who have worked very hard on it!

 

Victoria Szabo: I think we are adjourned.