University Data Sci News: Peer reviewing data, women & uni service, hackers want .edu emails

17 April 2017

Laura Noren
“Ballerina in a Box” by French street artist JR. Photograph by Laura Norén. (I like street art.)

I write a round-up of data science news every week as part of the Data Science Community newsletter funded by the Gordon and Betty Moore and Alfred P. Sloan Foundations. Not everyone likes to read the whole newsletter; it’s long. Some people prefer to read blogs, not emails. Here’s this week’s digest of what universities are doing in data science.

A new study confirms what many already suspected: “female professors outperform men in terms of service — to their possible professional detriment.”

Arvind Narayanan (Princeton) released the “mother of all ad blockers” according to David Carroll.

The National Cancer Institute of the NIH, with medical device maker Medtronic, awarded UW-Seattle $3.6m to build personalized 3D maps of patients’ brains. This should help neurosurgeons carefully remove 100% of tumors and 0% of functional brain matter.

Yann LeCun, head of Facebook AI and a professor in NYU’s Data Science Masters program, urged politicians to prepare regulatory structures for the waves of AI we will continue to experience. He noted the inequality that predictably occurs: “when you have a rapid technological advance, [then] you tend to see an increase in a concentration of wealth. AI is no different…politicians are refusing to recognize that this is a question to be addressed.” He invites the rest of the AI field to adopt the goal of developing safe, unbiased AI in a public-facing way. Meanwhile, at MIT they are training AI agents to terrify people. What could go wrong??

The University of Amsterdam is partnering with German firm Bosch to open an AI lab that will focus on computer vision.

UW-Seattle’s iSchool Data Lab has a theme for the next several years: data for social good. At the multi-disciplinary lab, students are learning to debunk false scientific findings (an unfortunate side effect of the publish-or-perish pressure of the tenure track) and are creating better search tools for scholarly literature, which may make the most relevant work easier to find, possibly adding new citations to older articles. The lab also sends students to work on open-data initiatives in city agencies and conducts original urban science research.

Fabiana Zollo and colleagues at Kellogg School of Business found that it only takes 50 likes, shares, or comments for a person placed in a new social media environment to behave in a completely polarized fashion.

One in three PhD students is at risk of developing a common psychiatric disorder like clinical depression or anxiety. What’s the culprit? Bad management and lack of a clear career path: “Organizational policies were significantly associated with the prevalence of mental health problems. Especially work-family interface, job demands and job control, the supervisor’s leadership style, team decision-making culture, and perception of a career outside academia are linked to mental health problems.”

Hackers have targeted universities, stealing email addresses and passwords, possibly to get educational discounts on software or to run scams. For unknown reasons, midwestern schools were targeted most heavily: “Topping the list is the University of Michigan…followed by Penn State University, the University of Minnesota, Michigan State University, Ohio State University and the University of Illinois.” NYU had the same vulnerability that was exploited at other schools but was not targeted in the hack, which was later claimed by Rasputin, a known dark web nefarian.

Astrophysicist Katie Mack is particularly active, informative, and humorous on Twitter, using it to communicate publicly (and defend herself against climate deniers). If this all sounds rather shallow, read her profile on Vice to see how hard it is to do this kind of public science communication. She has received death threats as well as the insidious label of being unserious because she’s “big on twitter,” which could make it harder to get a tenure-track job. In individually-rewarding status hierarchies like academia, it can be organizationally difficult to admit a new person whose status is out of the norm (for better or for worse).

Research at the Evolving Artificial Intelligence Laboratory at the University of Wyoming and Cornell University is investigating the susceptibility of computer vision to fraud by generating AI optical illusions that force recognition errors. “Adversarial data might help slip porn past safe-content filters. Others might try to boost the numbers on a cheque….researchers at Carnegie Mellon University built a pair of glasses that can subtly mislead a facial recognition system — making the computer confuse actress Reese Witherspoon for Russell Crowe.” Fooling an AI agent is in some ways far harder than fooling a human and in other ways much, much easier. Just think about all the different kinds of software testing that need to be done before we can imagine dropping the humans from the AI loop. We need each other.

Jeff Clune at the University of Wyoming in Laramie has a process that can tell the difference between giraffes and gazelles 92 percent of the time, leaving the humans to track elusive zorillas (a much rarer species). Animal video in the link. Spoiler alert: no zorillas are featured in the video.

Todd Carpenter, the executive director of the National Information Standards Organization, offers a meta-analysis of the guidelines in place to conduct peer reviews of data. We have written about the pressure scientists face to include their data along with article submissions, but there is a rather confusing gap when it comes to understanding your responsibility as a peer reviewer of others’ data. Should the data be reviewed as part of its insertion into a data repository and again by the peer reviewers of any papers being published? Should reviewers of article submissions be able to see what reviewers from repositories said?

Relatedly, the New England Journal of Medicine published an article about how to cite people they call “data authors”. In short, “to be cited as a data author, a person must have made substantial contributions to the original acquisition, quality control, and curation of the data, be accountable for all aspects of the accuracy and integrity of the data provided, and ensure that the available data set follows FAIR Guiding Principles, which instruct that the data and metadata meet criteria of findability, accessibility, interoperability, and reusability. Data authors are responsible for the integrity of the data set but are not responsible for the scientific or clinical conclusions of the analyses drawn from the data unless they were also listed as authors of the original manuscript.” This in many ways echoes standards that came out of FORCE11.

Silvio Savarese, Assistant Professor of Computer Science, has been named the new permanent Director of Stanford’s Toyota Center for Artificial Intelligence Research.

Legal analytics research firm Premonition announced a data partnership with the NYU School of Law. Law is a field that could be revolutionized with better databases, search protocols, and predictive applications (though I do not advocate any kind of Minority Report applications). Getting students to use a particular data product like Premonition could have all sorts of mutually beneficial impacts.

The iSchool, in conjunction with the Whitman School of Management, at Syracuse University announced it will offer a master’s degree in applied data science.
