December 7, 2023 Minutes

ITAC MINUTES for December 7, 2023

Allen Building Boardroom

Called to order 4:03

Victoria Szabo – We will have an end-of-year reception following today’s meeting.

4:00 - 4:15pm: Research IT Needs Updates – led by Tracy Futhey (15 minutes)

What it is: Phase 3 of the Research IT Needs process has been underway since July, with a focus on funding strategies for the proposed services.

Why it’s relevant: Tracy will share some updates from phase 3 and will be discussing what the plan ahead looks like.

Tracy Futhey - There are no slides for today’s discussion. For the last five months our focus has been on how to get the funding and make the project work. The effort has been diligent, led by the finance team and others, including Evan Levine. The group meets weekly, and sometimes bi-weekly with a larger group, to drill down on a series of opportunities across the different research needs and services.

There are twelve services that we recommended coming out of the second phase. Those services fit into three service clusters.

One concerned getting more resources and people to help the researchers, through technology, data usage or other needs.
The next service group concerned the computational environment and providing more technical support for the data rich environment we need here at Duke.
The last service cluster was about security and compliance. How can we support this area better?

We categorized all these services and calculated the costs associated with them. We described those as costing between $3M and $5M; at scale, closer to $5M a year. In talks with others about financing, we decided we needed to scale up over a few years to a $3M effort, potentially in the first three years.

Metrics would be put into place to ensure we are on the right track with each service before we add more financing. We spent a lot of time planning and considering other ways we could potentially fund it. We thought through not just allocations, but what might be directly funded through grants. We looked at opportunities through philanthropy and other funding ideas. At the end of the day, we needed everyone involved to agree that there is a risk if we proceed with this project without knowing how everything will get funded, but a greater risk if we continue to delay until we have figured out all the funding. We agreed to move forward and that we will find a way to support it.

There is an implementation group, with Evan, Ed Gomes, Tim McGeary, Rebecca Brower and others, that has been meeting over the last six weeks to think about how we phase the timing of the positions, which ones we hire when, and how to build the job descriptions.

We expect by January to post six or seven positions, some in OIT, some in ORI (Office of Research and Innovation) and some in the libraries. These represent the first intended hires toward a consolidated set of services.

It’s very encouraging that we have broad support. We do need to be able to show what has been successful in order to scale up financially. One thing we were encouraged to do is to be thoughtful about how we roll out the message. Over the next several weeks we will come up with a holistic communication plan, getting information out to the people who need to know. We will look for researchers who are early adopters, who will be willing to share quotes or insights on how well the process works. We look to this group for continued guidance and ideas.

Victoria Szabo – Is it part of the proposed rollout that the costs for people be incorporated into grants? Is it a core strategy for funding?

Tracy Futhey - It’s one of the anticipated ideas for scaling. Some people may be funded based on budgets, and then some may be against a grant, depending on the role.

Steffen Bass – There are two hooks for that. One is that ORS already provides templates for data management plans. We can add some language into the templates that suggests this kind of construct. The other is to ask the grants managers to discuss with their PIs whether they need an IT person for data management. The grants managers are where the decisions are being made.

Tracy Futhey – We talked about segmenting the communication, then directing it differently for each segment, such as grants managers versus researchers.

Robert Wolpert – It’s a tough sell, because of concerns that funding for other projects may be affected.

Tracy Futhey – Presumably, that’s the PI’s decision of what they wish to spend their funds on.

John Board – Drawing the line on what’s in the Entitlement funding, and making sure that everyone has a compliant data plan, is important.

Matt Hirschey – On the School of Medicine side, pre-award work vs award work is a point of contention and a big challenge. Oftentimes, you will need statistical work that goes into the grant, but the grant doesn’t cover the funds needed to have that work done. Different information groups across campus are absorbing that cost.

Tracy Futhey – That’s an important point, but it needs to be deferred until we get current issues resolved. How do we support people with active research and active project work? We can discuss the other needs after we resolve this.

Victoria Szabo – It could work well if you had core funding to pay for the pre-work, with the understanding you are putting wording on the grant for funds for that.

Tracy Futhey – If what we do next is very successful, then it will be wonderful to discuss what else is needed.

Sunshine Hillygus – The funding structure gets more complicated when the teams aren’t core. We don’t have a model that passes along cost. It’s a challenge.

Tracy Futhey – This process has gotten us to a better place for a conversation in how we all work together. We are now better able to work together for future projects.

4:15 – 4:30pm: Update on new LLMs in Microsoft Ecosystem @Duke – Charley Kneifel, Jeff Volkheimer (5-minute presentation, 10-minute discussion)

What it is: Microsoft Copilot for Microsoft 365 combines the power of large language models (LLMs) with your organization’s data – all in the flow of work – to turn your words into one of the most powerful productivity tools on the planet.

It works alongside popular Microsoft 365 Apps such as Word, Excel, PowerPoint, Outlook, Teams, and more. Microsoft 365 Copilot provides real-time intelligent assistance, enabling users to enhance their creativity, productivity, and skills.

https://adoption.microsoft.com/en-us/copilot/

Why it’s relevant: MS is providing access to Large Language Models (LLMs) that can leverage your data stored in MS 365 to help you with your daily work.

Slides from presentation: https://duke.box.com/s/h1fvttdl06179itlrjeaz6xdnl6ivn6p

Jeff Volkheimer – Let me introduce myself. I’ve worked for Duke Health for 20 years in several roles, including the web team, our service management office and other teams throughout the years. Effective this November, I am now the Senior Director of Collaborative Technology Partnerships. My focus is Duke Health’s relationship with Microsoft and collaborating with Duke University.

Charley Kneifel and I will present a high-level overview of Copilot. Copilot is Microsoft’s brand for Artificial Intelligence in its products. I’m not sure how many of you have tried ChatGPT, but it’s very common now. It’s becoming critical for research and very useful for general office work productivity.

This talk will focus on the ecosystem within Office 365. There is another section about research and development using OpenAI Studio, which has been rebranded to Copilot Studio. Microsoft is currently changing product names.

Matt Hirschey – Microsoft owns GitHub, and GitHub has its own product called Copilot.

Charley Kneifel – We aren’t talking about Copilot for GitHub right now.

Jeff Volkheimer – A number of things are named Copilot and Microsoft is rebranding several products. What’s important to know for this conversation is what people have access to today and what they may have in the future. Today, with our MS agreement, everyone has access to Bing Chat Enterprise, which will also be rebranded as a form of Copilot in the future. It is essentially a large language model with web access that protects any data that you put into your conversation with the model. That means if you need to optimize a SQL query and you have some sensitive information in that query, you can ask Bing Chat Enterprise to optimize it. Your information is then protected there. It never leaves Duke’s private MS tenant, and the large language model is never trained on that data. It will enrich that data with context from the web. That is something that everyone can use today. If you go to Bing Chat today and are logged into your MS account, you will see a green button labeled “Enterprise” and it shows a protective shield. That’s the indicator that it is active.

Charley Kneifel – Normally that requires you to be logged into a Microsoft Edge web browser. There are special plug-ins that can be used with other browsers. Edge is the easiest way.

Jeff Volkheimer – There is a Copilot plug-in for Windows machines, but it isn’t Enterprise enabled.

Evan Levine – Are there guidelines on how secure it is? Is this part of a MS agreement?

Charley Kneifel – All of Duke data, stored in any MS product, is protected and not supposed to leak. However, there may be data in there that the individual user has overshared, that another user shouldn’t see, but can because the individual overshared. That is an important distinction to understand.

Jeff Volkheimer – For the purposes of Bing Chat Enterprise, the data is safe.

Evan Levine – So for people here, it can be used like all other MS services?

Charley Kneifel – Including search using the company functionality and enterprise search functionality.

Harry Thakker – How does it know it’s Duke University? Is it the registration piece?

Charley Kneifel – It knows when you are logged in.

Michael Green – There have been previous conversations around equity issues and not giving everyone licenses to ChatGPT. Do we consider this a solution to that?

Charley Kneifel – No, I wouldn’t.

Evan Levine – Students don’t have access, correct?

Charley Kneifel – Students don’t have access to the early access program. Today we are discussing a purchase of 300 licenses for an early adopter program that is still being rolled out. [Subsequent to this meeting (January 2024) student access was enabled.]

Colin Rundel – What model is it based on?

Jeff Volkheimer – Bing Chat technically uses GPT-4, not Turbo. The models are probably tuned for the individual MS products.

Tracy Futhey – Matt, can you share any comments on tokens and things they can take in?

Matt Hirschey – GPT-4 has a larger context window, so more tokens can go into the model. Tokens are a measure of text, roughly four characters per token. It’s the way that the embedding is stored in n-dimensional space. GPT-4 can take in more context, and it’s faster. Google released a new model yesterday that isn’t as good as GPT-4. GPT-4 is the gold standard.

Steffen Bass – Faculty have asked about access to ChatGPT. Can we send them to Bing Chat Enterprise, where they get essentially GPT-4?

Jeff Volkheimer – They are not going to get plug-ins, but they do get web access. It’s not a one for one service.

Charley Kneifel – And it’s protected, but it doesn’t know about all of your data.

Jeff Volkheimer – What MS previously called Copilot is broken into two sections. One is M365 Chat, which works across Office, and the other is Copilot, which is embedded in each Office product. If you are part of the early access program, you get access to both, which allows you to run the large language models in a number of specific cases, either in the Office products or against all the documents and correspondence in the Office suite. It is a double-edged sword.

Charley Kneifel – It doesn’t always work like you would hope.

Matt Hirschey - My understanding of these large language models is that right now there are three different ways to get the model to know something about you. One is to fine-tune a model. The second is to provide the immediate context in the prompt for the thing that you want. The intermediate option is Retrieval-Augmented Generation (RAG), where your prompt is sent to a vector database that holds information, context that is very specific to your query is retrieved from it, and that context is used to produce the response returned to you. When it says it knows about you, is it using any of those three? Is it doing this in the context of a prompt, or is it building a vector database about you?

Jeff Volkheimer – With Bing Chat Enterprise you are providing the context in the conversation. For Copilot and M365 Chat, it is using MS Graph. We assume it is running a RAG behind the scenes.

Charley Kneifel – MS Graph makes your data more applicable to the model. You have to have your data mapped through. You don’t have a lot of control over how the vector databases are built. That’s the thing that is needed to make it more effective, along with how you tune the prompt engineering. It is good for several things, like summarizing email or other tasks. You can get a summary of the meeting while in the meeting.
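
Editor's sketch (not presented at the meeting): the retrieval-augmented flow Matt Hirschey describes can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not how Copilot or MS Graph works internally. The word-count "embedding", the sample documents, and the function names `embed`, `retrieve`, and `build_prompt` are all hypothetical; a real system would use learned embeddings and a vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical documents standing in for a user's files and email.
DOCS = [
    "The budget meeting moved to Friday at 2pm.",
    "The research computing cluster will be down for maintenance on Monday.",
    "Lunch menu for the holiday reception includes vegetarian options.",
]

if __name__ == "__main__":
    print(build_prompt("When is the cluster maintenance?", DOCS))
```

The prompt that reaches the language model then contains only the retrieved passage plus the question, which is what distinguishes this approach from fine-tuning the model or pasting all context in by hand.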

Jeff Volkheimer – It can’t customize well. It’s still limited.

Charley Kneifel – I look at it as a smart search for MS 365. It does a better job of finding stuff for you or assembling what you already have.

Jeff Volkheimer – An example of what Copilot can do is in the presentation. It’s a good tool for office productivity. However, results feel very filtered. Context is still not perfect in terms of what it gives you back. For example, it can only tell who spoke in a meeting, not everyone who was in the meeting.

Danfeng Zhang - Any limitation on tokens per minute?

Jeff Volkheimer – There will be limits, but we are not sure of the amount at this time. There are a total of thirty responses allowed in a session.

Charley Kneifel – You can start a new session when you reach thirty.

Evan Levine – Students were not included in Bing Chat Enterprise, previously. Is that still the case?

Charley Kneifel – We will find out and get back to you. [Subsequent to this meeting (January 2024) student access was enabled.]

Colin Rundel - What does the interface look like? Is it similar to a chatbot?

Charley Kneifel – It depends on the app you are using with it.

Jeff Volkheimer – It appears differently in the Office suite.

Randy Haskin – Can a department share if it is internal to Duke?

Charley Kneifel – If you have a Teams channel, for example, and you have limited it to members of Fuqua, that will work. Someone from a different department would not have access. However, if you accidentally set the Teams channel up to be campus wide, I would be able to find your data.

Tracy Futhey – But you could also see the data without this, just by discovering it in Teams.

Charley Kneifel – Yes, you could without it, but this makes it easier.

John Board – There is a problem where the enhanced search is uncovering files with information that should have been restricted.

Evan Levine – The system is not to blame, but you have a higher risk of exposure.

John Board – Correct.
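
Editor's sketch (not presented at the meeting): the oversharing point above can be illustrated with a toy permission-trimmed search. This is a sketch under assumed names, not Microsoft's implementation. The `Item` type, the group labels, and `visible_results` are hypothetical. The search only returns what the user is already allowed to read, so an item shared campus wide by mistake becomes easy to find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    audience: frozenset  # groups allowed to read this item (hypothetical labels)


def visible_results(query, items, user_groups):
    """Return titles matching the query that the user is permitted to read."""
    q = query.lower()
    return [
        item.title
        for item in items
        if q in item.title.lower() and item.audience & user_groups
    ]


ITEMS = [
    Item("Fuqua budget draft", frozenset({"fuqua"})),
    Item("Budget notes (overshared)", frozenset({"all-campus"})),
]

if __name__ == "__main__":
    # A user outside Fuqua still finds the item that was shared campus wide.
    print(visible_results("budget", ITEMS, {"oit", "all-campus"}))
```

In this sketch the search never widens anyone's permissions. The exposure comes from the sharing setting on the item, which matches the distinction drawn in the discussion.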

Colin Rundel – Is the price set on this?

Charley Kneifel – As I understand it, the current price is $30 per user per month.

Matt Hirschey – I just asked Bing Chat how much Bing Chat Enterprise costs and it told me $5 per user per month.

Charley Kneifel – That’s not Copilot.

Steffen Bass – Is Bing Chat Enterprise capable of helping us write Python scripts?

Jeff Volkheimer – It should be.

Steffen Bass – Will we ever get GitHub?

Colin Rundel – GitHub is free for an educational account.

Charley Kneifel – If you are part of the covered entity, you can request access by reaching out to DHTS or OIT. If you are not part of the covered entity, then you will need to wait until additional work is done to ensure that unintended access to PHI is minimized. Licensing is extremely limited in the early access phase.

Evan Levine – What is the covered entity?

Charley Kneifel – That means you are available to provide services to the health system by looking at data. You have also agreed not to share data.

Victoria Szabo – end of meeting.

4:30 – 5:15pm: End of year celebration!