Duke ITAC - February 8, 2007 Minutes
Feb. 8, 2007
Owen Astrachan, Pakis Bessias, John Board, Ken Hirsh for Dick Danner, Brian Eder, Nevin Fouts, Tracy Futhey, Susan Gerbeth-Jones, Michael Goodman, David Jamieson-Drake, Roger Loyd, Dan Murphy, Tim Bounds for Caroline Nisbet, George Oberlander, Lynne O’Brien, Mark Phillips, Dalene Stangl, Molly Tamarkin, Christopher Timmins, Trey Turner III, Tom Wall, Robert Wolpert
Guests: Tallman Trask, executive vice president; Kate Hendricks, university counsel; and Chris Cramer, Klara Jelinkova, Ginny Cake and Pat Driver, OIT
Start time: 4:07
I. Review of Minutes and Announcements:
Tracy Futhey – Last month Billy Herndon announced that we were refocusing Pat Driver’s job to give her more time to take on data center management, plan for new data centers, and help with other things going on around campus. So we carved off a piece of what had been her job around computing operations. We just filled the computing operations job. Carl McMillon, who has been here at Duke since 1979 in the Health Center, will take responsibility for that. He and Pat have worked together for years. Carl will be coming to OIT on March 1, which will give us some additional capacity where Pat has been constrained.
Ginny Cake – We have filled the senior project manager position. Lew Kellogg starts on Monday. He’s already started working on an important initiative on data storage. I’m sure you’ll get to meet him later.
John Board – Congress has changed the daylight saving time rules.
Klara Jelinkova – The good news is that all the central IT services are in good shape. Debbie DeYulia is putting together a Web page where people will be able to check the status. Handhelds need to be patched, especially if you’re going to synchronize with your calendar. One problem we’ve found is that for Windows workstations the patch from Microsoft isn’t critical, so it won’t be patched automatically. People have to make the effort. We will have to do more outreach to get the word out.
March 11 is the date.
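For reference, the March 11 date follows from the rule change: the Energy Policy Act of 2005 moved the start of U.S. daylight saving time from the first Sunday in April to the second Sunday in March, beginning in 2007. The short Python sketch below derives both dates; the function names are illustrative and not taken from any Duke system.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def dst_start_new_rule(year: int) -> date:
    """Second Sunday in March (rule in effect from 2007)."""
    return nth_sunday(year, 3, 2)


def dst_start_old_rule(year: int) -> date:
    """First Sunday in April (rule in effect before 2007)."""
    return nth_sunday(year, 4, 1)


print(dst_start_new_rule(2007))  # 2007-03-11
print(dst_start_old_rule(2007))  # 2007-04-01
```

Systems that have not been patched would keep applying the old rule and would be off by an hour between those two dates.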
Molly Tamarkin – This weekend, A&S is migrating from Meeting Maker to OIT’s dCal service. We anticipate it will go smoothly and that on Monday everyone will know where to go.
II. General discussion on technology issues – Tallman Trask, executive vice president
Tallman Trask – I’m just here to listen and answer questions. I have no opening remarks. We have a series of issues, fortunately none are catastrophic. We can talk about data centers, administrative software, research computing.
John Board – Data centers were at the top of our list. We’ve had a number of updates on them in the last 12 to 14 months. What is our status?
Robert Wolpert – Where will they be?
Tallman – There is need and there is want and universities are driven by both. Part of need is driven by an event that might occur tomorrow or might occur five years from now. Part is the need for the Health System to get out of the Bell Building. It might fall down, or it might be demolished to make room for another building that hasn’t yet been funded but might be tomorrow.
We’ve all agreed that somewhere around the core of campus, some form of shared machine room would be advantageous, but that’s unlikely to be enough to get us through the long run. So we’re going to need to think about more.
We’re also of the view that a few big ones are better than lots of little ones.
The current discussion is around the basement of the CIEMAS building, which I hope we’ll have the final discussion about next week. But I’ve hoped that for the last several months. The dean agreed that 12,000 square feet there would be the next space. There are some issues about where some things would go there. We’re working some things out. Any deal to work them out is a deal, so we’re still dealing on the margins of what she would get for giving up the building I paid for in the first place, but that’s the target. I’m hoping that we can come to some conclusion.
I’ve already authorized the preordering of the heavy equipment in terms of power supplies to get there. What then will become of the joint OIT/engineering machine room down the hall is a little unclear.
As for the next piece, there is general agreement, I think, that we should build a shell not knowing when it will be upfit. But it could be in one of the basements of one of the buildings on Central Campus and it should be in the 12,000 to 20,000 square foot range. We’re currently looking at one of the buildings next to the power station. We might as well take advantage of the proximity. So the plan is to try to work through a building on the east side of Anderson, right across from the entrance to the gardens, where there’s good tunnel room and a power supply.
Given the fact that the CIEMAS space was not conceived of as machine room space, we’re probably better off to think of a Central Campus spot as the high-end permanent site rather than try to figure out how to fit it into engineering, unless there’s some urgent need. That basement shell is in the budget of the existing Central Campus. The fact that your $3 million item is in a budget that’s $3 million over budget should not bring you great comfort.
John – Twelve to 14 months ago, the initial plans were to look at the Telcom Building. What about that space? We’ve seen some proposals for some parts of that building.
Tallman – Where I came down was, other than the fact that it’s connected to things, the building doesn’t have much to recommend it. It was my first offer to the dean. She didn’t like that. If RENCI goes, that takes part of it.
I think we have some long-term issues. I just got nervous about taking out the basement floor of an occupied computer center while it was occupied, and how it would all work. When all was said and done you ended up with a fairly expensive, bad building. You’re better off if you can find the opportunity to do it in a better way. That was a throwaway building. It just doesn’t have the kind of infrastructure in it and it doesn’t offer much. The only thing it has is it’s in the right location.
Robert – Is there any concern of electromagnetic interference with a center next to the power station?
Tallman – Yes, and we’ll have to be very careful about that.
Robert – If this were to open tomorrow, we’ll have more than 12,000 square feet. Do you have in mind some kind of allocation system?
Tallman – What I have in mind is to say that other machine room is in need. The Bell Building is not going to go away next week. This is 12,000 new square feet. We’ve obviously committed to the Health System. The problem they have is they want to build additional beds and operating rooms for which they have to go to the state and for which they haven’t applied yet. So that will take a long time, then an architect, and a builder. The Bell Building needs to be vacated sooner rather than later.
Tracy Futhey – But it’s not too early if people have requests to get in line. Pat is on point to receive those expressions of interest.
Tallman – This will come and within the next couple of years there will be another one, and we don’t really want to move people around and around. So if you’re doing OK, wait. Otherwise, go ahead and ask.
John – It’s budget time. People who know they face an urgent need, should they budget for that?
Tallman – If you’re planning to expand in the next 180 days, I’d change my plan.
Molly Tamarkin – Can we tell those people who have machines to go away?
Tallman – If there’s not some urgent catastrophe, we’re trying to sort this out and it’s clear we need a couple of big joint solutions. I don’t want to say don’t invest in machine rooms, as long as you understand that you may end up paying twice.
John – But you’ll have an appropriate forum for deciding who has the urgent need?
Tracy – Now that we’re ordering equipment, some of the space will be online within a year.
Tallman – Much of the equipment is already ordered, so the question is what part will come online first, and how many pieces. The first piece will be about eight months after we authorize construction. I don’t want us to fall into the trap of saying all people who’ve let their machine rooms absolutely fall apart and are now urgently in need get first shot at the space.
Molly – I feel like I have a car and I’m trying to decide how much to spend on it. I can spend a bit of money, or spend a lot of money to build some central resources within A&S. I’m trying to decide what to do with that.
Tallman – You’ll have to decide, do you want to run your own machine rooms or not? And I don’t see us saying no you can’t make that choice. But my guess is, half the schools will say yes and the other schools will be more complicated, not so much for technology reasons but for historic and political reasons. The deans are going to have to figure this out.
John – Will cooling be an issue? Is there enough chilling?
Tallman – There’s a lot of chilling in CIEMAS. We’re also looking at the power plant from across the street, so even if Duke Power bails on us it might be a good idea to have that for backup.
John Board – For our next topic, research computing is always fun.
Tallman – In 1979 I knew what it was. In 1989 what it used to be wasn’t what it was when I did it. Some day a new form of research computing will emerge and I’m told by people that that is happening. And I’m told by other people that Duke should do something about it. That’s the extent of my knowledge.
In terms of massive scale stuff, obviously the machines scaled faster than the work did for 98 percent of the people and that’s sort of turning. What do you do about that? One of our institutional bets going into this was to say we’re not likely to have the high-end capacity in the basement of CIEMAS. So the ability to get to it is more important than the ability to own it.
It’s important to a relatively small number of people, so access was more important than figuring out how to handle it. But we’ve seen more projects over the years, and the local scale needs to be bigger than it is.
John – The exciting and frightening thing is that even an assistant professor can afford 128 processors. They show up on the doorstep of one of the people in this room and say plug this in. How much of the central space is allocated for this?
Tallman – There are certain systems we run that need to be hardened and reliable. In my view that’s administrative systems and that’s research. In my view it’s not all administrative systems. Almost no administrative data in any university has any time value at all. For instance, how much money is in the budget? Well I told you how much a month ago and you know how much you’ve spent.
The things that are important are paying people, controlling building access, some research applications, like those dealing with live objects. So I think we’re going to have to say it isn’t the administrative research part.
I can imagine a machine room where, within economic reasonableness, we have done everything we can – that the infrastructure will operate except in the event of a direct nuclear hit. In other places, it’s still good but everything isn’t three times redundant so there’s risk, and we’ll charge you less to be there. But I’m still waiting for what it is we need to buy to respond to the need for research computing. People have what they need, the question is how to manage it.
Robert – Where to put it, how to cool it.
Molly – Sometimes people develop their research based on research that hasn’t been updated. The other issue is that research by its nature should be at the edge, so the ability to centralize is hard. If they’re doing something that’s in the lead, it’s unlikely to be done elsewhere.
Tallman – If you could capture all the cycles people aren’t using, you would end up with at least as many cycles as you have used. But what’s the practicality of being able to do that, if they only cost $500 apiece? There’s an economy of scale. There are higher-end devices that if they could be pooled would help. But something that would seek out unused cycles all over somewhere like Duke isn’t practical.
Klara Jelinkova – Research computing is becoming more important because of reporting and archiving needs and grant requirements.
Tallman – We recognized that four or five years ago, then we did a brilliant job of managing it. We have to go back to that and find a way to manage that and at what level. We’re saying, it’s your machine but trust me I’ll keep your data. Well I forgot to tell you the power went out and you were on the front line and data was lost.
Klara – It goes beyond the machine room and the compute cycles. There are other issues.
Molly – What is considered to be infrastructure used by granting groups, and what are Duke’s expectations?
Robert – What happens to the 54.5 percent?
Tallman – IT supports some forms of infrastructure and not others. I always have to remind people that the “R” in ICR stands for reimbursement, not revenue. It’s to reimburse you for money you’ve already spent. We spend a lot of money for infrastructure and some of it comes back to the university. I would guess that we spend a lot more on infrastructure than on the contracts we collect. The question isn’t why aren’t you giving me money, it’s why am I giving you more? The answer is because you need it.
John – The electrical cost of running these computers ends up, over the life of the computer, costing more than the computer itself.
David Jamieson-Drake – What government expects us to keep track of keeps growing. So a big part of research computing is deciding what infrastructure to invest in. How would you characterize the quality of the conversation between the research teams at Duke and the people who provide infrastructure?
Tallman – It’s better than it used to be. I think it will be in a few years a more serious issue in a different form than it has been. We bought mostly the right stuff and it’s mostly worked, but it has a natural life to it.
There are some very complicated questions about what’s the next level of technology and communications. Who ought to think about those things and on what scale? We agonized about Bell Tower, about whether to just go 100 percent wireless, and we finally chickened out. And I think that was the right answer. That’s a different bet campus-wide. I don’t think that’s on the horizon of the next decade, campus-wide.
The one thing Tracy and I have been trying to do is make long-term bets with long-term players. Not low-bid modems, then find out someone else has something cheaper next week. I think we’re looking for a series of experiments funded by other people’s money to figure out where to go with this. And given that the most logical partner is 15 miles away, there are some real opportunities to find out some things.
III. Data Centers Needs and Planning - Michael Goodman and Molly Tamarkin
Michael Goodman – The data center at CIEMAS was built with enough power to support the racks and servers Pratt had. Another 100 nodes were dumped on us without our knowledge. That put us in crisis mode. Things have changed since then. OIT has helped upfit the room. Now we’re pulling out the 65 kVA Pratt UPS and putting in a 130 kVA UPS.
John Board – Part of the Pratt machine room is being used by the medical center; it is being used as the default central site.
Michael – We’ve also installed a transformer in the data center, a 150 kVA transformer to offload the machines that didn’t need gold-level treatment. That gives us extra breathing room.
We pulled 85 nodes off of the UPS, which brought us to 76 percent. We have tagged some other servers to move off. We don’t really have any current needs. With this upgrade, the power problem is going to go away. We’re adding another 65 kVA and splitting some of the nodes on the DHTS side over to another UPS.
When it comes to power in that room, with the new UPS, we have some proposed dates on when it can get going – the 20th or 21st.
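The load figures in this discussion can be put on a common footing with a little arithmetic. The sketch below assumes that the 76 percent figure refers to the existing 65 kVA UPS and that the load stays constant once capacity doubles (65 kVA plus another 65 kVA). Neither assumption is stated in the minutes, and the function names are illustrative.

```python
def load_kva(capacity_kva: float, utilization: float) -> float:
    """Load carried by a UPS, given its capacity and fractional utilization."""
    return capacity_kva * utilization


def utilization_at(load: float, capacity_kva: float) -> float:
    """Fractional utilization of a UPS carrying the given load."""
    return load / capacity_kva


# Assumption: 76 percent is measured against the current 65 kVA unit.
current_load = load_kva(65.0, 0.76)

# Assumption: the same load is carried after the upgrade to 130 kVA total.
after_upgrade = utilization_at(current_load, 65.0 + 65.0)

print(f"{current_load:.1f} kVA")
print(f"{after_upgrade:.0%}")
```

Under those assumptions the present load is roughly 49 kVA, which would sit at about 38 percent of the upgraded capacity.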
John – How many more racks can you handle with floor space?
Michael – Pratt has 48 percent, OIT has 52 percent. The Pratt section is full. But the racks we have can hold some more. We have 120 or 180 spaces.
John – What we have is good, but we look forward to the day the OIT equipment can move out, which is Pratt’s plan.
Molly Tamarkin – Some of you may remember the server room show from the fall. We convened a server room committee this fall to roadmap our needs. (Molly hands out report.) Essentially what we have are 18 separate data centers, any room that has servers in it, with a collection of 76 racks. We have no extra room. We have one renovation project under way. We’ve got some rooms with just a server.
Right now we need additional space for five racks. They’re in poor locations. Then we could retire five rooms. We also would like to have some buffer, so space for four more racks would be nice. We also may be migrating some services to OIT and offloading the problem to them. Hopefully we’ll be able to at least reduce our admin server needs.
So we’re really seeking space for an additional nine racks. I’ve been thinking of more renovation, but I think we won’t seek that if more space will be opened up in the next year.
Tallman – What about French server space?
Molly – It’s nowhere because it’s expensive. It’s a large space and it’s not something A&S would think about doing unless we were partnering with OIT. There’s another server room in French that’s not live yet but will be soon, but that’s designed for computational chemistry. When they planned that room they didn’t add up the cooling needs correctly, so we would need to add that. That’s why chemistry is still where it is.
Tallman – Who’s paying the M&O on Gross Chem? It’s now up for question.
Tallman – I note a lot of discussion about availability and machinery. Do we have a policy about whether people can take data facilities off campus and put them in spaces that never turn out to be what they are because the landlords are always chopping them up? It’s never clear to me, do people have to ask to do this?
Molly – In the case with the Mill Building, SSRI is there and they have servers and asked where they can put them, so they did that. That’s why we have all the different data centers we have.
Tallman – So I understand your problem with French. I’m willing to make a one-time donation to A&S, but the trade is, you have to get rid of all the substandard machine spaces that were created because there was no good answer. You’ll have to move. Is that salable?
Molly – Absolutely. Folks who are running backup like to have access so they can change tapes, but there are other ways of managing that.
John – Not only is it good, but it’s so close to ideal that one wonders if it could ever happen at Duke.
Molly – In the Mill Building they have three racks for SSRI.
Tracy – You’re talking about more than $100,000.
Tallman – If you two will work out a plan for this, I will immediately grant you a loan even with the understanding you may not be able to pay it all back, if you will accomplish some institutional good.
Brian Eder – You talked about other space off campus. Does any of this play into that?
Tallman – We have gone all around RTP looking for high-end machine rooms that are vacated and haven’t found them. The big mystery is Lenovo because they’re about to vacate that building on 54. I can’t believe IBM didn’t have machine rooms. We’re told there aren’t a lot of spaces available. There aren’t any spaces for rent or for sale.
Data centers can be anywhere, but there are complications to that even beyond what Molly was talking about.
Michael Goodman – What was the space you were looking for?
Tallman – We’re looking for the biggest space with cooling, with minimal office space around it.
IV. eDiscovery, federal lawsuits and Duke IT – Chris Cramer and Kate Hendricks
Kate Hendricks – What brought this up were some changes in federal rules regarding electronic discovery, the federal rules saying what we need to preserve when it comes to electronic information. Security offices on campus and at the health system have been meeting with the counsel’s office to figure it out.
Electronic evidence has been around for a few years and courts have been dealing with it ad hoc. There have been some scares for corporate cases because emails and other data have been lost because of inattention. There have been huge fines and liabilities, ones that dwarf the issue in question.
The federal courts have been studying ways to come up with rules for eDiscovery. So all they’ve done is they’ve put in a provision that says when the parties have their discovery conference, that’s when you discuss electronic evidence. To the extent it’s requested, the obligation is to provide the evidence that is “reasonably accessible.” We have some ideas about what is and is not reasonably accessible. So when we have a lawsuit and go to that meet and confer session, that’s going to be the easy part. We may end up going to court for rulings on whether what someone wants is reasonable. This just came out Dec. 1, so case law will be coming up with how far you have to go.
The harder part is that when you have notice of litigation, you have to take action to preserve evidence. This has been around, it’s just become more complicated with computers. It’s sort of like Y2K when you have experts who want $50,000 to tell you how to avoid problems.
Every seminar I’ve been to says if you have any hint of litigation you have to start saving evidence. Well, that could be firing an employee. We’ve got 25,000 employees. You’ve got to preserve all your electronic records. If we did that, all of your computing systems would be disrupted. In the past year whenever we’ve gotten a lawsuit, we tell our internal clients to preserve this information, and give them a long technical list of things.
If we sent that to the Department of Art History, they wouldn’t know what’s going on. We’ve started a litigation hold letter, a standard letter. We’ve asked IT to help us come up with a litigation hold letter that’s feasible and that will not disrupt users.
Chris Cramer – One part of the letter said you have to stop the tape rotation process. So the letter says if we have any reason to believe that there’s any data on the tape you have to find a way to preserve the data – either save it somewhere else or pull the tape.
George Oberlander – In practice it’s not workable, if you lose something with that degree of liability.
Chris – If we’re talking about a lawsuit where Tracy has fired me, most likely we’re talking about Tracy’s and my emails and files. As opposed to saying everyone at ITAC may have information and has to stop backing things up. It’s an issue of reasonableness.
Part of this conference with opposing counsel is to say what’s reasonable. The goal is to contain it to be reasonable.
Brian Eder – Minus a lawsuit, is there an expectation that there will be a retention policy?
Chris – There are a few things to do. One is to get a standard template for a hold letter. Another is getting the process we’re going to follow documented. The third thing is, we need to have a retention policy for anyone running a system. That doesn’t mean the university as a whole needs one retention policy, but everyone needs a documented system so the counsel’s office doesn’t get in a bind.
Kate – More importantly, it would be good to have a retention policy and have the time be as short as possible. If everything went away when you delete it from your computer, that would be fine with us. If you keep tapes for 20 to 30 years, we’ve got to produce that stuff.
Chris – One thing to think of is if you make level 0 backups, then retain them even when you’ve made another, all of those level 0s are discoverable, indefinitely.
Statement – So it’s not just having a policy, but doing what your policy would be.
Kate – Yes, you need to follow your policy.
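The point about following a documented policy can be illustrated with a small check that lists backups kept past their retention period, while leaving alone anything covered by a litigation hold. This is a minimal Python sketch under stated assumptions: the `Backup` record, the `overdue_for_disposal` function, the custodian names, and the 365-day retention period are all hypothetical and do not describe any Duke policy or system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Backup:
    label: str
    level: int             # 0 = full backup, higher = incremental
    taken_on: date
    custodians: frozenset  # people whose data is on this backup


def overdue_for_disposal(backups, today, retention_days,
                         held_custodians=frozenset()):
    """Return backups older than the retention period that are not on hold."""
    cutoff = today - timedelta(days=retention_days)
    return [
        b for b in backups
        if b.taken_on < cutoff and not (b.custodians & held_custodians)
    ]


# Hypothetical inventory of level 0 backups.
backups = [
    Backup("full-2005-06", 0, date(2005, 6, 1), frozenset({"carol"})),
    Backup("full-2006-01", 0, date(2006, 1, 1), frozenset({"alice", "bob"})),
    Backup("full-2006-12", 0, date(2006, 12, 1), frozenset({"alice"})),
]

today = date(2007, 2, 8)

# Without a hold: everything older than the assumed 365 days is overdue.
no_hold = overdue_for_disposal(backups, today, retention_days=365)

# With a hold naming "alice": her backups must be preserved regardless of age.
with_hold = overdue_for_disposal(backups, today, retention_days=365,
                                 held_custodians=frozenset({"alice"}))

print([b.label for b in no_hold])
print([b.label for b in with_hold])
```

Any backup this check reports is one that the documented policy says should already be gone but that would still have to be produced if requested.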
John – But it’s OK to say we don’t back things up.
Kate – As an example, departments throughout the university have different policies for saving hard copies. I think the retention period should be driven by business reasons, but it should be as short as possible.
Nevin Fouts – We’re discussing several storage backup services, including personal backups where people could take a snapshot of their laptops or desktops. Would those backup stores be subject to this?
Chris – We looked at how, at Duke, we define devices and data (email, calendaring, etc.) on a PC, BlackBerry, laptop. There are also temporal designations and the scope of where it lives, at the university, school, department levels.
Robert Wolpert – This needs to get down to the departmental level with a multiple-choice “how long do you want to keep it” decision sheet. Do you have a plan for this?
Chris – We’ve decided we need to do this, but we haven’t come up with a template or a plan.
Kate – I would also envision some in-service training for CLAC, deans, etc., and the users of the systems and the business owners need to understand why. If you have a template it makes people think in the way you want them to instead of leaving it wide open.
Chris – What I worry about is if I start constraining your thinking about it – it’s really any data so if you come up with a new device and it’s not in my template, you may not be thinking about it.
George Oberlander – Thinking about this operationally, I’m backing up about 10 departments with a wide swath of information, all merged together on a common backup system. I have to imagine what the largest common denominator of backup is.
Chris – All I’m asking for is you go through and document what you do. There’s a goal to minimize the time, but we can work on that later. For now just document it.
Kate – For the health system, they have a zillion different retention policies. If you’re dealing with the medical records of babies, it’s 21 years, that’s just law.
David Jamieson-Drake – We may need to deal with the university archivist.
Brian – As for individual retention policies, if you don’t back up centrally but the individual does back it up, does the individual have to provide that information?
Question – Whose equipment and data is it? Isn’t that the key?
Kate – If we give people the ability to back up their own laptops, that’s not unreasonable.
In any litigation, we’d be going to people and asking for it.
Chris – One thing we’re doing in this process is saying, this need to generate a hold letter is driven from the counsel’s office. But as soon as it makes sense, the right IT people will get involved to determine what information is available, working with the school or department.
Klara Jelinkova – It’s a data management process, not a hardware management process. Are we really yanking tapes out, or are we making a copy of the data that’s on the tape?
Chris – One tricky thing is in some seminars what they say is you must yank tapes. What we’re coming back with is you must pull the data. My goal is to say we must do one thing or the other. We should do the right thing but we have to preserve the data.
David – Does that break the chain of evidence?
Chris – It’s already a copy if I’m pulling it off the tape.
David – You wouldn’t say to someone charged with a murder, please bring us your gun.
Kate – In this situation, we’ve got the gun because we control the computer system. In terms of protecting us – say an employee is being fired, we have an obligation to preserve that information. If I call Chris and say take a snapshot of that data, we’ve preserved that data. That seems reasonable to me.
Chris – The worst case in your scenario is that we’d have to pull the disk drives from Klara’s system.
John – Are you going to bring us samples of these letters and later this menu of retention choices?
David – Could you also come up with a decision tree of best practices?
Robert – It’s not the IT department’s call. It’s the business practices of the departments.