ITAC Meeting Notes November 9, 2023
Allen Building Boardroom
4:00 - 4:05pm: Announcements (5 minutes)

Victoria Szabo:  Approval of 9/28/23 minutes.

Next ITAC meeting is not until December, because of Thanksgiving break.

Tracy Futhey: We will have one more meeting this year, a short December meeting followed by our end-of-year reception.

Frank Tramble, VP for Communications, is an invited guest today for our first topic.

Frank Tramble: I’m excited to be here. On the marketing and communications side, there’s a slew of data-intensive things that we want to be more proactive in, so we can make data-driven decisions.

4:05 - 4:30pm: Migrating from Google Analytics to Matomo - Thomas Crinchlow (Duke University Libraries) (15-minute discussion, 10 minutes Q&A)
What it is: Matomo is an open-source tool adopted by Duke University Libraries for tracking and assessing library website usage.   
Why it’s relevant: This presentation will highlight our migration to Matomo, why and how we use web analytics, and some of our approaches to protecting our website users' privacy.

Thomas Crinchlow:
Tom introduced his plan to share the Library experience in migrating from Google Analytics to Matomo, including how and why they use web analytics at Duke Libraries.
Gathering data about how people use Library websites gives insights into usage patterns, including common problems users experience. This informs changes to websites and is combined with other information.  For example, we use event tracking to see how frequently links on a web page are used, over time replacing seldom-used links.
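
As a rough illustration of the event-tracking idea above, the sketch below counts click events per link and flags links that fall under a usage threshold. The event records, field names, link labels, and threshold are all hypothetical; they are not drawn from the Libraries' Matomo data or from Matomo's export format.

```python
from collections import Counter


def seldom_used_links(events, all_links, min_clicks):
    """Return (link, clicks) pairs for links clicked fewer than min_clicks times.

    events     -- iterable of dicts with hypothetical "link" and "action" keys
    all_links  -- every link shown on the page, so never-clicked links are included
    min_clicks -- assumed threshold below which a link counts as seldom used
    """
    clicks = Counter(e["link"] for e in events if e["action"] == "click")
    flagged = [
        (link, clicks.get(link, 0))
        for link in all_links
        if clicks.get(link, 0) < min_clicks
    ]
    # Least-used links first; ties broken alphabetically for stable output.
    return sorted(flagged, key=lambda item: (item[1], item[0]))
```

A review like this would only be a starting point: as the discussion later in these notes points out, usage is not the only measure of whether a link or resource is valuable.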

In this example (slide) we can see how usage of our research databases portal compares with other key pages, making it easier for us to prioritize changes.

David MacAlpine: Can I ask what the time scale is? Weekly? Monthly?

Thomas Crinchlow:  For that screenshot, 50,000 pageviews in a year--2021.
The Library started using Google Analytics in 2009 and expanded to 110 sites. Google Analytics went EOL this past July. In addition to privacy concerns we already had, there was no provided migration path to GA4.

Matomo has been on our radar for some time and our decision to move forward with it was based on a number of reasons:
•    It is open source and allows us to import historical data.
•    The ability to self-host gives us full control of our data, which protects patrons’ privacy.
•    Needed developer support is minimal, after initial setup.
•    Broad and active community of 1.4 million sites. Most plugins are free.

Our evaluation of Matomo for replacement of GA began as a pilot project (August 2022-March 2023) with installation of Matomo tracking codes on 2 websites and migration of historical data from these websites.

After the pilot project, a team of 6 IT staff from University Libraries created a self-hosted instance of Matomo in April 2023. Stakeholders also identified 29 websites for tracking. We set up one hosted instance for the multiple sites, working on consistent naming conventions. We’ve had 1 million page views a month.

The Libraries’ Google Analytics accounts had never been centrally managed or audited.
As part of this project we did our first inventory of the GA sites and the people who had access to that data. This led to creation of a policy and workflow to more closely manage who has access to data.

Matomo lets us track data across all our sites in a way we hadn’t implemented with GA.  

Matomo also includes features to protect patron privacy, such as cookieless tracking, that help us comply with GDPR regulations.
Matomo makes it easy for stakeholders to create customized dashboards for viewing data, and reports on data usage. The pilot project, Matomo-provided documentation, and feedback about ease of use for historical and current data have all been positives, as well as feedback from our peers at other institutions who are currently using Matomo.

The challenges have been around Google’s API, which throttles how quickly we’ve been able to migrate data. There is a learning curve to Matomo, and its UI struggles to display large date ranges.
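
The notes do not say how the team worked around the throttling. One common general approach to a rate-limited API is to retry with exponentially growing waits, sketched below. The `RateLimited` exception and the `fetch` callable are hypothetical stand-ins for whatever client code calls the API; nothing here reflects the Libraries' actual migration scripts or Google's client library.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by fetch() when the API asks the caller to slow down."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), waiting longer after each rate-limit error before retrying.

    The wait doubles each time (base_delay, 2x, 4x, ...). After max_attempts
    failed calls, the last RateLimited error is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the retry logic can be exercised without real delays.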

The biggest remaining work with our stakeholders is to develop a policy for data retention and sunsetting data. We also plan to implement annual reviews of access.


Ken Rogerson: What were the options out there and how did you choose Matomo? When did it reach the point that it was the best option, or were there few options, and this was just better than GA?

Thomas Crinchlow: There were a few options, and my impression was that Matomo was one of the more fully-baked options, and had a longer track record than some of the others.  Also knowing that some of our peers were using it influenced our decision.  We’d been watching it since 2007 because we knew we had concerns about Google.

Steffan Bass: What do we use on our main Duke website?

Tracy Futhey:  We are still pretty deep into Google Analytics. This is one of the advantages of being in a decentralized environment, where a unit goes off and experiments with something that could bubble up and be of more use to the institution.

John Board: Even Google doesn’t support translation from GA3 to GA4. If you care about historical data at all.

Thomas Crinchlow: The access they give to historical data will end quickly, so if you want to save it, you should act now.

Harry Thakkar: Our group is looking at it in OIT.

Victoria Szabo: I have a meta question, about how does this work, in an era when so many analytics web pages are generated dynamically. What are you able to capture, especially when it’s the library and you are the frontend to so many databases?

Thomas Crinchlow: For most of our sites, even if it’s a search result generated on the fly, it still ends up having some element unique to the URL path; we can still see visitors, events, where they came from, page title.

Dave MacAlpine: I wanted to ask, what is the advantage between these JavaScript trackers vs the Apache-old-school-log-analyzing type of tracking.  

Tim McGeary: Before we moved to GA, we used a tool called Sawmill that was excruciatingly painful.

Thomas Crinchlow: With Matomo, we can do some caching to generate reports on the fly, which does it a little faster.

Angela Zoss: For tracking, logging, and displays, there are built-in visualizations and grouping.

Michael Greene: From a University Archives perspective, are there other audiences we should be talking to?

Thomas Crinchlow: We need to talk more with University Archives. We haven’t gotten to the point of archiving patron interactions with those websites.  

Frank Tramble:  From a marketing perspective, one of the things that I’m encouraging our team and other communicators to do is to start using data to inform the actual e-mails and marketing pieces that are going out from the University.  Getting the University to be more of a user journey-based kind of system. Like if you’re going to a site and you’re on a list, being able to give you content based on your own personalization like this.  Does Matomo help you get further down into getting usable information about what is happening with individuals?

Thomas Crinchlow: We do send marketing e-mails and we are tracking it so we know which links in the e-mails they’ve clicked.

Frank Tramble: This is broader than just the internal community, external, people who engage with our sites….

Thomas Crinchlow: One thing I like about Matomo better than GA is that I have an easier time seeing an individual visit.  It gives me better insight, within a timespan.  

Angela Zoss: If the marketing e-mail had a unique code, you could better filter out the business that came from that e-mail.

Thomas Crinchlow: Right, and it supports campaigns.
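
To illustrate the unique-code idea, the sketch below appends campaign parameters to a link before it goes into an e-mail. Matomo's documented campaign parameters include `mtm_campaign` and `mtm_kwd`; the function name, example URL, and campaign values are hypothetical, and this is not necessarily how the Libraries tag their e-mail links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign(url, campaign, keyword=None):
    """Return url with Matomo campaign parameters appended to its query string."""
    parts = urlsplit(url)
    # Keep any query parameters the link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("mtm_campaign", campaign))
    if keyword is not None:
        query.append(("mtm_kwd", keyword))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical newsletter link.
tagged = add_campaign(
    "https://library.example.edu/events?id=7", "fall-newsletter", "header-link"
)
```

Visits arriving through the tagged link can then be grouped under that campaign name in reports, separate from other traffic to the same page.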

Tracy Futhey: What I’m hearing is that one use is typical website analytics and another function is much more around the individual user, privacy, how we comply with GDPR, not being tracked. Then the third one is website hygiene. Is that characterization of it in three different regards accurate?  And if it is, is it distinctive or different or better than what we see in GA?

Thomas Crinchlow: I don’t know that it’s better, just slightly easier to use.  

Tracy Futhey: More usable. So maybe none of what it performs in those 3 capacities is distinctively groundbreaking, but perhaps better.

Thomas Crinchlow: With data sampling, I think Google’s estimates were sometimes off.

Victoria Szabo:  Can you talk about the hosting process? You said you were going to host it on your own stack, then presumably you’re going to expand. Is this something where it’s extensible, or do you create little instances for different contexts?

Thomas Crinchlow: We just have the one instance, we just add the site with the tracking code.  If we wanted to spin up a separate instance, we could.

Victoria Szabo: Would we ever want to do that, or better to be in one place?

Thomas Crinchlow: I was hearing in your question maybe having University-wide Matomo?

Victoria Szabo: Yes, or the idea that people might want to buy-in.

Matthew Hirschey: Regarding historic data and how long you want to keep it around: Have you ever considered scrubbing the data and deciding what is clean and not safe?  Balance between throwing away vs keeping all?

Thomas Crinchlow: For some of the sites where we chose not to keep historic data, we did some scrubbing, spreadsheets with page views and visits. The classic catalog, where we are getting usage and where it’s coming from.
Tim McGeary: For other data contexts, we are doing that because we want to see how things change over time. It’s helpful because we can see ebbs and flows in disciplines and use of our resources. We do have use cases where historical data is important.

Thomas Crinchlow: Web analytics is not the only data that we have.

Robert Wolport: I’m a little nervous about usage as the only metric about deciding what is useful.

Angela Zoss: That’s fair. I think our special collections would agree that usage is not the only metric of success.  There are certain things we provide online for which anonymous tracking is the only measure of usage.  Value is certainly a much broader conversation here.

Victoria Szabo: Thank you. Next up, Matt Hirschey.

4:30 - 5:00pm: Let’s Talk About Digital You: A Technical and Ethical Exploration of a Data-Centric World – Matthew Hirschey, Ph.D. (20-minute presentation, 10 minutes Q&A)
What it is: The course UNIV103 at Duke University, titled "Let's Talk About Digital You: A Technical and Ethical Exploration of a Data-Centric World," explores digital identity, technology, and ethics. Offered in Spring 2024, it delves into topics like social media, AI in art and healthcare, privacy, and more through a blend of lectures, workshops, and discussions.   
Why it’s relevant: The course aims to provide a multifaceted understanding of how digital technologies intersect with societal and individual concerns. In today’s digital age, where technology is intertwined with every facet of our lives, understanding the complexities and implications of these technologies is essential. Whether you’re a tech enthusiast, a future business leader, or a concerned global citizen, this course is your gateway to exploring the cutting-edge technology that shapes our world.  

Matt Hirschey:

Matt describes his role as a co-convener of one of the University Courses (UNIV103), being offered in the spring.

UNIV103 has an intentional design. First, since scaling education is hard, we think about how we can share broadly and take advantage of the educational opportunities that are already happening.  Second, since the educational material changes faster than you can lecture, we thought about “future proofing” and how we can navigate the rapidly evolving landscape.  Finally, we utilized the DQ Certificate framework, which has paired lectures, one where you learn the technology, another where you talk about ethics.  Having a modular structure future-proofs the lectures on technology: we can shuffle modules around, or de-prioritize them where topics are less relevant. Or anticipate what will be popular the next year, and drop that in.

We planned to run in Fall of 2023, but due to low student sign up we decided to give it more time and postpone to the spring. We did some marketing. We ran some events this fall—“education in disguise”-- where we promoted this class. We also added a quantitative science designation.  As a result of our efforts, we now have 80 upperclass students registered.  Freshmen registration will be opening up on Monday.

Let me explain how the course will work, through a series of two-week topic focus areas: In week 1, we discuss the technology. In week 2, we discuss the implications of using the technology (ethics, policy, environmental and societal implications).

The course is called “Let’s Talk about Digital You” with the idea that the student is at the center of this. The way the student interacts with the technology, the way the technology then facilitates students having interactions with each other. All of the topics we chose have a social/individual tension and balance running through them.

The first topic we start out with is social, about how technology enables students to communicate and interact in this world.  

After that, we dive into privacy and security.  We then have two integrated examples, one on how algorithmic decision making is applied in business, the second on AI health.

We then finish with this idea on the way technology influences users—“I Couldn’t Stop Scrolling.”
The course meets once a week in a 2.5 hour block. The first hour will be a lecture, then the students will break into small groups. With the small groups, the tech part will be things like code-a-longs and lab-type sessions.  For the implications lectures, the small groups will be centered around case studies and discussions.  The flow will be the same.

Here is our working list of the faculty who are involved (slide).  The lectures for the course are led by Duke faculty in related disciplines. For example, “I Made Generative Art” will be led by a faculty member in AAHVS; “I Made an Essay Using ChatGPT,” by a faculty member in the English Department.

Mark Palmeri: What is the learning objective for the class? How is it graded? What are the learning objectives? An easy A for QS?

Matt Hirschey:  There are quantitative criteria that the students need to check off, but it is more like “did I do it or did I not?”  We have a final project for this course, and TAs for all the small groups. The TAs will be grading the final project, a paper that requires students to describe a technology of their choosing with implications.

Steffan Bass: There is a curriculum revamp going on. Have you talked to curriculum committee about their thinking on the placement of these University Courses in a future curriculum?

Matt Hirschey: No, we’ve been heads-down building this thing, but after we have it, with some measure of success and feedback on the course then I think it will be easy to say whether it fits into this vision.

JoAnne Van Tuyl: I can testify to the fact that at the last A&S meeting, at the table where I was sitting, we started talking about how technology is affecting everything we do, and that there ought to be some course about this.  The conversation has come up.

Matt Hirschey: There will be an asynchronous version of this course available for anyone to take. That’s our summer project.

Andy Li: Non-CS majors will want to take this. But, as a CS person, would you consider putting this under the ethical electives of the CS curriculum? That would incentivize more students to take this course.

John Board: Economics and CS are about 45% of the student body.

Matt Hirschey: Great suggestion. We had early ideas of cross-listing, but we dropped them when we postponed this course.  We will revisit.

Zoe Tishaev: This is fantastic. I love that you have an hour speaking and then into small groups.

Victoria Szabo:  Is this going to become a gateway for the DQ Certificate?

Matt Hirschey:  It does. You can take this course instead of the core course. All the rest of the DQ Certificate will be the same.

Mark Palmeri: What is the technical backbone to get all these things ready to go?  So they can hop on and do it, not going to the Co-Lab for 5 hours at a time getting their laptop setup….

Matt Hirschey: Exactly. That has been a tremendous lift that the Co-Lab has already helped extensively with.  We’ve been leveraging EdStem, which will have a lot of these integrated slides and video as well.

Tracy Futhey: As a reminder, a shout out from a past Co-Lab presentation. The Co-Lab is now offering office hours. There are experienced students and staff who are there to answer questions, kind of like a writing center type of model.

Victoria Szabo: Thank you.  Last topic of the day, the Emergency Planning Scenario.

5:00 - 5:15pm: Emergency Planning Scenario - John Board, Ph.D. (10-minute presentation, 5 minutes Q&A)
What it is: Duke leadership periodically undertakes tabletop emergency exercises to promote coordinated and effective responses to various threats the campus might face.  Recently, one of these exercises focused on a utility outage that had unexpected (for many people in the room) IT implications.  John will discuss this exercise.
Why it’s relevant: Maintaining a practiced and agile leadership posture in the face of various threats is an essential part of maximizing the safety of our campus.  Exercises such as this compel leadership to think carefully about the coordination and communication required to respond to unique situations, and help identify strengths and challenges in our overall security posture.

John Board:  The University has a highly functioning emergency management operation that helps prepare Duke for a wide variety of terrible things that might happen.  At a regular emergency management drill a couple of weeks ago the simulation was a utility incident based on a February 2017 occurrence at UNC Chapel Hill when OWASA had a water break and there was zero water pressure on campus.  They sent students home, and the health system wasn’t impacted. Simultaneously, the water was not chemically drinkable.

Our scenario was similar but worse, and Durham-wide.  Patients in the hospital were identified as priority one, but simultaneously the 5,000 students on campus were without running water.  The number one use of water on Duke’s campus was the chiller plants, and this scenario happened in August. The Data Center runs on the chiller plants as well. It’s the only way the data center doesn’t melt down.  We have to go on fire watch.

In the scenario, the City of Durham got its water supply fixed, and sent out an “all clear” notice. But we were dealing with communications chaos because we would have had to flush water through the Duke system before it was safe again.  

Tracy Futhey: One of the things they pointed out was that in an office building the temperature goes up a degree every hour, but in the data center, it’s much faster.

David MacAlpine: Who comes up with these?

John Board: The Director of Emergency Management.  For IT-related ones, it’s usually an external consultant.

Victoria Szabo: Thanks everyone.