Michigan Data Science Team wrangles big data

MDST brings together students from many fields to get their hands dirty with real data science problems and tools.

MDST members who won the Quicken Loans Lending Strategies Prediction Challenge visit Quicken Loans headquarters in Detroit. Back row, left to right: Reddy Rachamallu, Alexandr, Alex, Mark Nuppnau, Brian Ball. Front row, left to right: Jingshu Chen, Patrick, Alex’s wife Kenzie, Yvette Tian, Mike Tan, and Catherine Tu.

The impact of modern data science can be felt in every field of research – sociologists, ecologists, computer scientists, biologists, and more are all in search of ways to better use big data to solve big problems.

The Michigan Data Science Team brings together students from all of these fields to “get their hands dirty” with real data science problems and tools. The team gives members a place to learn from experts, form groups to tackle data science challenges, and do research that matches their interests. In the 2018-19 school year, Computer Science and Data Science undergrad Wesley Tian will be leading the organization as president, with plans to focus the group’s activities and provide a better learning experience for new members.

The data science team attracts hundreds of students each year with its educational events and opportunities to take on real-world problems. As members of the team, students are free to form groups to tackle challenges and collaborate on research, with data science grad students on hand to help them organize. The 2017-18 school year was a record-breaking one for MDST’s membership: the team welcomed 367 new members, nearly a quarter of whom were new to data science.

“Some people know data science just as a buzzword,” says Tian. “But this is a chance to actually get your hands on it – complete a project, learn the skills that you need, write the code, and work with a team.”

This year, MDST offered its second annual Data Science Tutorial Series, which consisted of 12 hours of instruction, taught entirely by volunteer graduate students, and attracted over 50 beginner data science students. The tutorials covered essential data science skills, such as data management, regression and classification models, and data visualization and communication.

One of the group’s activities that draws the most participants is its involvement in data science challenges, competitions, and projects run by companies and organizations around the country.

Students collaborated with the city of Detroit on two projects. The Vehicle Fleet Maintenance project had students predict maintenance needs for the city’s vehicle fleet, including police cars, fire trucks, ambulances, and waste management trucks. The Blight Compliance Prediction project challenged students to produce insights about the city’s blight ticketing program. They categorized types of property owners by their ticket compliance patterns, investigated theories about what best motivates people to pay their tickets, and evaluated how these findings could better inform policy decisions.

The team also participated in several data science competitions and hackathons. These events allow students to develop their data science skills in a low-stakes environment and give team members a chance to work collaboratively. The students typically take on two types of events: prediction challenges, where students develop a predictive model based on provided data, and data hackathons, where students have up to two days to perform an open-ended analysis of a dataset.

Tian participated in one of these competitions, the ASSISTments Data Mining Competition, funded by an NSF initiative to help spur progress in educational research using big data. Tian and his team used educational data from ASSISTments, an intelligent math tutoring system for middle school students, to make long-term predictions about those students’ careers.

“Our goal was to use that data from the math test they took when they were in middle school and predict if they would end up in a STEM field after graduation,” says Tian. “The goal was to see if they can better structure classes or this math test to encourage students to enter those fields.”

MDST placed third, and winners published their work in the Journal of Educational Data Mining.

As president, Tian plans to give more continuity to the group’s activities and design the tutorial series to pair with a certain challenge each semester.

“Dialing back the range of challenges will help us give more structure to the experience of the team overall,” he says. “So not only do you learn new skills, but you’re actually putting them to the test.”

Groups of students in MDST competed in a number of prediction challenges and data hackathons this year, with a range of difficulty and subject matter:

Quicken Loans Lending Strategies Prediction Challenge
The goal of this competition was to create a model that would predict whether potential clients would end up getting a mortgage based on the loan product originally offered to them. This competition was organized by MDST, in collaboration with Quicken Loans.

Baltimore Ravens Free Agent Prediction Challenge
In this competition, student teams at Michigan used historical free agent data to predict the value of new contracts signed in the 2018 free agency period. These predictions were evaluated against the actual contracts as they were signed. This competition was organized by MDST, in collaboration with the Baltimore Ravens and the Michigan Sports Analytics Society (MSAS).

Parkinson’s Digital Disease Biomarkers Challenge
In the Parkinson’s Biomarker Challenge, student teams identified signs of Parkinson’s disease by extracting informative features from cell phone accelerometer data. MDST participated in one track of the competition and placed fourth internationally.

NBA Hackathon
The NBA Hackathon is an annual data hackathon sponsored by the NBA, held in New York City in September. This year, MDST and the Michigan Sports Analytics Society (MSAS) sent a team to participate in the business analytics track. Their submission included a model that predicts the entertainment value of individual NBA games based on the participating players and teams, using several years of NBA data.

Yale DataHack
This three-day challenge was hosted by the Yale Institute for Network Science, and featured data challenges from many fields, including forestry, public education, business development, online research, and biomedical research. Student participants came from many schools, forming teams of up to four students to solve challenges submitted by sponsors.

In this hackathon, hosted by the Minne Analytics corporation in Minneapolis, MDST competed against student teams from around the country to build predictive models of patient outcomes using data from Type II diabetes patients. They were judged by a panel of experts on the actionability of their results.

Michigan Sports Analytics Hackathon
This was a one-day hackathon on the University of Michigan campus, sponsored by MSAS and the Exercise and Sports Science Initiative (ESSI). Several MDST students participated and built models that analyzed data from the UM Field Hockey team. MDST additionally organized introductory tutorials for the event.