University data science news: Stanford to democratize AI; publishing is a pain

Laura Noren
Apr 22, 2017

21 April 2017

If you’d like to get this content in your inbox every week, it’s part of the Data Science Community newsletter. Sign up here.

Street art by unknown artist. Gowanus, Brooklyn 2017. Photograph by Laura Norén.

Leslie Mitchell at NYU Langone Medical Center is building a synthetic genome from scratch. “We are no longer limited to the study of cells that are a product of evolution,” she explains. Geneticists’ ability to edit mammalian systems is ethically and philosophically daunting. “It is probably naive to rely on altruism. Even the best intentions can go awry. Technical limitations will only impede progress on building increasingly complex genetic systems for so long. I’m an advocate for total transparency…and an inclusive approach.” Indeed, her team includes labs around the world.

Yuan Ji, Oded Rozenbaum, and Kyle Welch of George Washington University School of Business scraped “1,112,476 employee ratings of 14,282 public firms in the period 2008–2015” from Glassdoor and found that employee ratings are a good predictor of SEC fraud violations. They hypothesize that when firms are under pressure, managers may push employees to meet targets, resulting in grumpy employees who write negative comments on Glassdoor. If the managers can’t squeeze enough value out of employees, they may try misreporting or other creative fraudulent behaviors to meet the firm’s goals.

Peter Szolovits at MIT CSAIL, a leading expert in using natural language processing in precision medicine applications, explained his goal. It’s not “trying to get all doctors to ‘work as well as the best’”. Instead, it is much better to get “the least-skilled doctors [to perform like] average” doctors. In a refreshingly nonchalant dismissal of IoT for healthcare practice, he reminded everyone that, “IoT today is full of security holes” and not at all “ready for prime time”. Then he captured this organizational sociologist’s heart by declaring that the unrealized potential of electronic health records is not technical, it is “institutional and policy-based.”

Arizona State University, the University of Houston, and the NSF have partnered to create an industry-academia research hub for neurotechnology called The BRAIN Center. The research will focus on improving patient outcomes for those with injuries to or degeneration of the central nervous system.

A Stanford University team has launched DAWN, a project to “democratize AI and machine learning”. Within the next five years they aim to “build out the toolbox that we believe will empower the 99.9 percent to build and deploy their own world-class data products, quickly and cheaply.”

Hahrie Han, a political scientist at UC Santa Barbara, explains the March for Science. She notes that having disparate goals “makes it harder to translate whatever happens in the march to political influence. And related to my points about centralization or decentralization, one of the challenges is what happens to the coalition afterwards? If they’re too disparate or fragmented, it could be harder to coalesce around shared goals.” Han goes on to point out that “the thing that is most predictive of whether any pressure group is able to achieve its political goals is the extent to which it has relationships with political elites.” Charismatic scientists among us, this is the call to use that trait for the betterment of science by persuading as many politicians and voters as you can that science is worth funding.

The Gordon and Betty Moore Foundation will now require all grantees to make their grant-funded publications “openly available within 12 months of publication, either on the journal’s site or in an open access repository.” They allow grant funds to cover the cost of fees associated with open access publishing. It’s a great decision, and it follows the Bill and Melinda Gates Foundation. I wonder what the conversation was like around the decision to cover what many consider to be rather ridiculous OA publishing fees. A related fee is the cost to publish data to an open data archive like Dryad, whose organizers found that 96 percent of their users do not budget for data publication fees. One-quarter of their respondents paid the publication fees on their personal credit cards and were not reimbursed.

Elsewhere in publishing problems, PubMed now includes funding information in its abstracts to make potential conflicts of interest more obvious.

Duke, Stanford, and Verily (an Alphabet company) have announced the first initiative of Project Baseline, which will recruit 10,000 participants and track their detailed health data over at least four years. The amount of data collected is extensive, consisting of: “repeat clinical visits; daily use of a wrist-worn investigational device and other sensors; and regular participation in interactive surveys and polls by using a smartphone, computer or call center.” This is similar to the Kavli Foundation’s HUMAN Project, which is studying the lives of 10,000 New Yorkers 13 or older for “decades”. Both projects seem ethically dubious, though the Duke+Stanford+Verily initiative avoids working with minors, limits participants’ surveillance period to four years, and plans to gather participant feedback on conference calls throughout the term of the study.
