“Exploring the Power of Taxonomy and Embedding in Text Mining”
Real-world big data are largely dynamic, interconnected, and unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from text data. Such approaches, unfortunately, may not be scalable, especially when the texts are domain-specific and nonstandard (such as social media). We envision that massive text data itself may disclose a large body of hidden structures and knowledge. Equipped with domain-independent and domain-dependent knowledge bases, we can explore the power of massive data to transform unstructured data into structured knowledge. In this talk, we introduce a set of methods developed recently in our group for such an exploration, including joint spherical text embedding, discriminative topic mining, taxonomy construction, and taxonomy-guided knowledge mining. We show that a data-driven approach could be promising for transforming massive text data into structured knowledge.
“Network Heterogeneity on Graph Neural Networks”
Graph neural networks (GNNs), a powerful graph representation technique based on deep learning, have shown superior performance and attracted considerable research interest. Current GNNs generally follow the message-passing framework, in which each node receives messages from its neighbors and a neural network is applied to learn node representations. However, previous GNNs mainly focus on homogeneous graphs, while real-world graphs are usually far from homogeneous. Here we first examine the various types of network heterogeneity, including node and link type heterogeneity, neighborhood heterogeneity, fragment heterogeneity, temporal heterogeneity, and structure heterogeneity. We then discuss the implications of these heterogeneities and methods to overcome them.
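As a rough illustration of the message-passing framework mentioned in the abstract above, the following is a minimal sketch, not the speaker's method. It assumes a homogeneous graph, mean aggregation over neighbors, and a ReLU nonlinearity. The names `message_passing_layer`, `w_self`, and `w_neigh` are hypothetical and chosen only for this example.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean_vectors(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of the vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def message_passing_layer(
    h: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    w_self: Matrix,   # hypothetical weight applied to the node's own features
    w_neigh: Matrix,  # hypothetical weight applied to the aggregated messages
) -> Dict[str, Vector]:
    """One message-passing step: aggregate neighbor features, then transform."""
    dim = len(next(iter(h.values())))
    out: Dict[str, Vector] = {}
    for node, h_v in h.items():
        # Messages are the neighbors' current representations, averaged.
        agg = mean_vectors([h[u] for u in neighbors.get(node, [])], dim)
        combined = [
            a + b for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, agg))
        ]
        # ReLU nonlinearity (an assumption; other activations are common).
        out[node] = [max(0.0, x) for x in combined]
    return out


# Toy graph with three nodes and identity weights, for illustration only.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = message_passing_layer(features, graph, identity, identity)
```

Every node and edge here is treated identically, which is the homogeneity assumption the talk questions. A heterogeneous variant would need, for example, separate weights per node or link type.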
More than one year into the pandemic, the Penn State Center for Socially Responsible AI hosts world-class researchers, industry professionals, and non-profit organizations for a single-day event in which we reflect on and discuss how AI research has responded to COVID-19, how it has changed, and how it is likely to change in the future (and the role that AI researchers can expect to play in that change).
“Disinformation, Hate-speech, and Manipulation during COVID-19 and the Election”
The COVID-19 pandemic is sometimes referred to as a disinfodemic, as the number of disinformation stories was much higher than in previous crises. We ask, who is spreading this disinformation and to what ends? This talk describes the process of analyzing social media data related to COVID-19 and the U.S. 2020 election, the analytic pipeline, and key results. This pipeline combines high-dimensional network analysis, machine learning, and computational linguistics techniques. The overall framework, referred to as BEND, addresses who (what kind of actor) is doing what (the maneuvers) to whom (the target) with what impact. The BEND maneuvers, informed by work in communication, psychology, and network science, move beyond the traditional 4Ds of information warfare (dismiss, distort, dismay, and distract) to a more comprehensive and operationalized set of 16 maneuvers. These maneuvers are then combined into complex patterns to effect changes, such as shifts in polarization, or to orchestrate a political movement or protest.
“Mobility Networks for Modeling the Spread of COVID-19: Explaining Infection Rates and Informing Reopening Strategies”
In this talk, Dr. Leskovec will demonstrate how fine-grained epidemiological modeling of the spread of Coronavirus -- predicting who gets infected at which locations -- can aid the development of policy responses that account for heterogeneous risks of different locations as well as the disparities in infections among different demographic groups. He will demonstrate the use of U.S. cell phone data to capture the hourly movements of millions of people and model the spread of Coronavirus within a population of nearly 100 million people in 10 of the largest U.S. metropolitan areas. Dr. Leskovec will show that even a relatively simple epidemiological model can accurately capture the case trajectory despite dramatic changes in population behavior due to the virus. He also estimates the impacts of fine-grained reopening plans: he predicts that a small minority of superspreader locations account for a large majority of infections, and that reopening some locations (like restaurants) poses especially large risks. He also explains why infection rates among disadvantaged racial and socioeconomic groups are higher. Overall, his model supports fine-grained analyses that can inform more effective and equitable policy responses to the Coronavirus.
“Measuring Economic Development from Space with Machine Learning”
Recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to climate adaptation strategies. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. A key challenge, however, is the lack of the large quantities of labeled data that often characterize successful machine learning applications. In this talk, Dr. Ermon will present new approaches for learning useful spatio-temporal models in contexts where labeled training data is scarce or not available at all. He will show applications to predict and map poverty in developing countries, monitor agricultural productivity and food security outcomes, and map infrastructure access in Africa. These methods can reliably predict economic well-being using only high-resolution satellite imagery. Because images are passively collected in every corner of the world, these methods can provide timely and accurate measurements in a very scalable and economical way, and could significantly improve the effectiveness of climate adaptation efforts.
“Creating, Weaponizing, and Detecting Deep Fakes”
The past few years have seen a startling and troubling rise in the fake-news phenomenon, in which everyone from individuals to nation-sponsored entities can produce and distribute misinformation. The implications of fake news range from a misinformed public to an existential threat to democracy and horrific violence. At the same time, recent and rapid advances in machine learning are making it easier than ever to create sophisticated and compelling fake images, videos, and audio recordings, making the fake-news phenomenon even more powerful and dangerous. I will provide an overview of the creation of these so-called deep-fakes, and I will describe emerging techniques for detecting them.
“Physics-Guided AI for Learning Spatiotemporal Dynamics”
Applications such as public health, transportation, and climate science often require learning complex dynamics from large-scale spatiotemporal data. While deep learning has shown tremendous success in these domains, it remains a grand challenge to incorporate physical principles in a systematic manner into the design, training, and inference of such models. In this talk, Dr. Yu will demonstrate how to integrate physics into AI models and algorithms in a principled way to achieve both prediction accuracy and physical consistency. She will showcase the application of these methods to problems such as forecasting COVID-19, traffic modeling, and accelerating turbulence simulations.
“Data Science for Social Equality”
Our society remains profoundly unequal. This talk discusses how data science and machine learning can be used to combat inequality in health care and public health by presenting several vignettes about pain, COVID, and women's health.
“Steps Toward Trustworthy Machine Learning”
How can we trust systems built from machine learning components? We need advances in many areas, including machine learning algorithms, software engineering, ML ops, and explanation. This talk will describe our recent work in two important directions: obtaining calibrated performance estimates and performing run-time monitoring with guarantees. I will first describe recent work with Kiri Wagstaff on region-based calibration for classifiers and work with Jesse Hostetler on performance guarantees for reinforcement learning. Then, I'll review our research on providing guarantees for open category detection and anomaly detection for run-time monitoring of deployed systems. I'll conclude with some speculations concerning meta-cognitive situational awareness for AI systems.