University of Michigan Says It's Not Selling Student Data to AI Companies

On Thursday morning, news broke that someone was selling student data from the University of Michigan to tech workers who build AI chatbots. An employee at Google DeepMind, Google's AI research hub, said they'd gotten an offer for recordings of lectures, student discussions, and office hours, as well as essays written by seniors and grad students, all available for a paltry licensing fee. Now, the University says it was all a misunderstanding, that students gave their consent, and that there's nothing to worry about.

Susan Zhang, an engineer at DeepMind, said that she'd received a sponsored LinkedIn message hawking the information and offering a free sample of the University of Michigan data to prove its worth.

“I’m reaching out because, based on your profile, you may be working with Large Language models (LLM’s) or natural language processing,” the sales message said. “I wanted to let you know that the University of Michigan is licensing academic speech data and student papers that could be very useful for training or tuning LLM’s.”

The message offers 85 hours' worth of lectures, discussion sections, and interviews for $15,595; a second set of 829 papers written by University of Michigan students across various disciplines for $12,595; or a discounted package of both data sets for $25,000.

However, the message "was sent out by a new third-party vendor that shared inaccurate information and has since been asked to halt their work," Colleen Mastony, a University of Michigan spokesperson, said in an email. "No transactions or sharing of content occurred by the vendor. Student data was not and has never been for sale by the University of Michigan." Mastony didn't share details about who this vendor was, or what, exactly, was inaccurate about the information they offered.

The University may not be selling the data directly, but it is (or was) being offered for sale by an organization called Catalyst Research Alliance, which claims to partner with the University of Michigan as well as North Carolina State University. The website offers a sample of the data set, which includes an essay titled "The Democratic Inadequacies of the European Union" and what appears to be a recording of a class discussion section.

Catalyst Research Alliance and North Carolina State University did not immediately respond to requests for comment.

According to Mastony, the recordings and the papers were contributed by student volunteers who participated in two decades-old research studies, and none of the data included students' names or any other personally identifiable information. "These particular papers and recordings have long been available for free to academics – again without any identifying information – and have been used as a tool to improve writing and articulation in education," Mastony said.

“I think it’s worth pursuing which universities are selling student data and what the terms are,” Zhang told Gizmodo in a message on X. “Licensing is better than scraping data without attribution but the attribution pipelines here are likely only built halfway (aka original creators won’t see a dime, whereas the reseller who stores data will capture all the profits).”

Training large language models, like the software that runs chatbots such as ChatGPT and Bard, requires massive, clearly labeled data sets spanning many subjects and disciplines. While the University of Michigan data set is small, well-organized content on a narrow swath of subjects could be useful for tuning certain models, particularly tools built for specific purposes related to academia or formal communication. It could also help more general AIs improve their performance in individual areas of subject-matter expertise.

Update 02/15/2024, 5:45 p.m. ET: This story has been updated with comments from the University of Michigan.