Join David G. Rand, Erwin H. Schell Professor and Professor of Management Science and Brain and Cognitive Sciences at MIT, for an upcoming discussion in the Talk Series on Frauds and Fakes. This lecture is free and open to the Penn State community.
“Exploring the Power of Taxonomy and Embedding in Text Mining”
Real-world big data are largely dynamic, interconnected, and unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from text data. Such approaches, unfortunately, may not scale, especially when the texts are domain-specific and nonstandard (such as social media). We envision that massive text data itself may disclose a large body of hidden structures and knowledge. Equipped with domain-independent and domain-dependent knowledge bases, we can harness the power of massive data to transform unstructured text into structured knowledge. In this talk, we introduce a set of methods recently developed in our group for such an exploration, including joint spherical text embedding, discriminative topic mining, taxonomy construction, and taxonomy-guided knowledge mining. We show that this data-driven approach is promising for transforming massive text data into structured knowledge.
“Network Heterogeneity on Graph Neural Networks”
The graph neural network (GNN), a powerful graph representation technique based on deep learning, has shown superior performance and attracted considerable research interest. Current GNNs largely follow the message-passing framework, in which each node receives messages from its neighbors and applies a neural network to learn node representations. However, previous GNNs mainly focus on homogeneous graphs, while real-world graphs are usually far from homogeneous. Here we first examine the various types of network heterogeneity, including node and link type heterogeneity, neighborhood heterogeneity, fragment heterogeneity, temporal heterogeneity, and structure heterogeneity. We then discuss their implications and methods for overcoming them.
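As a toy illustration of the message-passing framework described in the abstract (not the speaker's own models), a single mean-aggregation GNN layer on a small homogeneous graph might look like the sketch below; the graph, features, and weights are all invented:

```python
import numpy as np

def message_passing_layer(adj, feats, weight):
    """One GNN layer: each node averages its neighbors' features
    (plus its own via a self-loop), then applies a linear map and a ReLU."""
    # Add self-loops so each node keeps its own signal.
    adj_hat = adj + np.eye(adj.shape[0])
    # Row-normalize: mean aggregation over each neighborhood.
    deg = adj_hat.sum(axis=1, keepdims=True)
    agg = (adj_hat / deg) @ feats
    return np.maximum(agg @ weight, 0.0)

# Toy homogeneous graph: 4 nodes on a path 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4)                       # one-hot input features
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 2))
h = message_passing_layer(adj, feats, w)
print(h.shape)                          # one 2-d representation per node
```

Stacking such layers lets information flow over longer paths; handling the heterogeneity types the talk covers would require type-specific aggregators rather than this single shared one.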
More than one year into the pandemic, the Penn State Center for Socially Responsible AI hosts world-class researchers, industry professionals, and non-profit organizations for a single-day event in which we reflect on and discuss how AI research has responded in the face of COVID-19, how it has changed, and how it is likely to change in the coming future, as well as the role that AI researchers can expect to play in that change.
“Disinformation, Hate-speech, and Manipulation during COVID-19 and the Election”
The COVID-19 pandemic is sometimes referred to as a disinfodemic, as the number of disinformation stories was much higher than in previous crises. We ask, who is spreading this disinformation and to what ends? This talk describes the process of analyzing social media data related to COVID-19 and the U.S. 2020 election, the analytic pipeline, and key results. This pipeline combines high-dimensional network analysis, machine learning, and computational linguistics techniques. The overall framework, referred to as BEND, addresses who (what kind of actor) is doing what (the maneuvers) to whom (the target) with what impact. The BEND maneuvers, informed by work in communication, psychology, and network science, move beyond the traditional 4Ds of information warfare (dismiss, distort, dismay, and distract) to a more comprehensive and operationalized set of 16 maneuvers. These maneuvers are then combined into complex patterns to effect change, such as shifts in polarization, or to orchestrate a political movement or protest.
“Mobility Networks for Modeling the Spread of COVID-19: Explaining Infection Rates and Informing Reopening Strategies”
In this talk, Dr. Leskovec will demonstrate how fine-grained epidemiological modeling of the spread of Coronavirus -- predicting who gets infected at which locations -- can aid the development of policy responses that account for heterogeneous risks of different locations as well as the disparities in infections among different demographic groups. He will demonstrate the use of U.S. cell phone data to capture the hourly movements of millions of people and model the spread of Coronavirus among a population of nearly 100 million people in 10 of the largest U.S. metropolitan areas. Dr. Leskovec will show that even a relatively simple epidemiological model can accurately capture the case trajectory despite dramatic changes in population behavior due to the virus. He also estimates the impacts of fine-grained reopening plans: he predicts that a small minority of superspreader locations account for a large majority of infections, and that reopening some locations (like restaurants) poses especially large risks. He also explains why infection rates among disadvantaged racial and socioeconomic groups are higher. Overall, his model supports fine-grained analyses that can inform more effective and equitable policy responses to the Coronavirus.
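To make the idea of mobility-based epidemiological modeling concrete, here is a heavily simplified, hypothetical sketch (not Dr. Leskovec's actual model): one step of a metapopulation SIR model in which each group's infection hazard is the visit-weighted sum of per-location risks. All rates and visit fractions below are invented:

```python
import numpy as np

def step_sir(s, i, r, visits, beta, gamma):
    """One time step of a toy metapopulation SIR model.
    s, i, r: per-group susceptible/infected/recovered counts.
    visits[g, p]: fraction of group g's time spent at location p.
    beta: per-location transmission rate; gamma: recovery rate."""
    pop = s + i + r
    # Infected density at each location, mixing all visiting groups.
    loc_infected = visits.T @ i
    loc_total = visits.T @ pop
    loc_risk = beta * loc_infected / np.maximum(loc_total, 1e-9)
    # Each group's infection hazard is the visit-weighted location risk.
    hazard = visits @ loc_risk
    new_inf = hazard * s
    new_rec = gamma * i
    return s - new_inf, i + new_inf - new_rec, r + new_rec

# Two hypothetical demographic groups, three locations, differing visit patterns.
visits = np.array([[0.6, 0.3, 0.1],
                   [0.1, 0.3, 0.6]])
s, i, r = np.array([990.0, 995.0]), np.array([10.0, 5.0]), np.zeros(2)
for _ in range(24):
    s, i, r = step_sir(s, i, r, visits, beta=0.5, gamma=0.1)
print(i)  # per-group infected counts after 24 steps
```

Even this toy version shows the mechanism behind the talk's findings: locations with high infected density drive most new cases, and groups whose visit patterns concentrate on those locations face higher infection rates.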
“Measuring Economic Development from Space with Machine Learning”
Recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to climate adaptation strategies. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. A key challenge, however, is the lack of large quantities of labeled data that often characterize successful machine learning applications. In this talk, Dr. Ermon will present new approaches for learning useful spatio-temporal models in contexts where labeled training data is scarce or not available at all. He will show applications to predict and map poverty in developing countries, monitor agricultural productivity and food security outcomes, and map infrastructure access in Africa. These methods can reliably predict economic well-being using only high-resolution satellite imagery. Because images are passively collected in every corner of the world, these methods can provide timely and accurate measurements in a highly scalable and economical way, and could significantly improve the effectiveness of climate adaptation efforts.
“Creating, Weaponizing, and Detecting Deep Fakes”
The past few years have seen a startling and troubling rise in the fake-news phenomenon, in which everyone from individuals to nation-sponsored entities can produce and distribute misinformation. The implications of fake news range from a misinformed public to an existential threat to democracy and horrific violence. At the same time, recent and rapid advances in machine learning are making it easier than ever to create sophisticated and compelling fake images, videos, and audio recordings, making the fake-news phenomenon even more powerful and dangerous. I will provide an overview of the creation of these so-called deep fakes, and I will describe emerging techniques for detecting them.
“Physics-Guided AI for Learning Spatiotemporal Dynamics”
Applications such as public health, transportation, and climate science often require learning complex dynamics from large-scale spatiotemporal data. While deep learning has shown tremendous success in these domains, it remains a grand challenge to incorporate physical principles in a systematic manner into the design, training, and inference of such models. In this talk, Dr. Yu will demonstrate how to integrate physics into AI models and algorithms in a principled way to achieve both prediction accuracy and physical consistency. She will showcase the application of these methods to problems such as forecasting COVID-19, traffic modeling, and accelerating turbulence simulations.
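A minimal sketch of the physics-guided idea, under invented assumptions (this is not Dr. Yu's method): combine a standard data-fit loss with a penalty for violating a known governing equation, here simple exponential decay dx/dt = -k·x:

```python
import numpy as np

def physics_guided_loss(pred, obs, t, k, lam=1.0):
    """Toy physics-guided objective: a data-fit term plus a penalty
    for violating the assumed dynamics dx/dt = -k * x.
    pred: model predictions x(t); obs: noisy observations of x(t)."""
    data_loss = np.mean((pred - obs) ** 2)
    # Finite-difference estimate of dx/dt along the prediction.
    dxdt = np.gradient(pred, t)
    physics_residual = dxdt + k * pred      # ~0 if pred obeys the physics
    physics_loss = np.mean(physics_residual ** 2)
    return data_loss + lam * physics_loss

t = np.linspace(0.0, 2.0, 50)
true = np.exp(-1.5 * t)                     # exact decay with k = 1.5
obs = true + 0.01 * np.random.default_rng(1).normal(size=t.size)
good = physics_guided_loss(true, obs, t, k=1.5)
bad = physics_guided_loss(np.ones_like(t), obs, t, k=1.5)
print(good < bad)
```

Training a network against such a combined objective pushes it toward predictions that both fit the data and respect the governing equation, which is the "physical consistency" the abstract refers to.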
“Data Science for Social Equality”
Our society remains profoundly unequal. This talk discusses how data science and machine learning can be used to combat inequality in health care and public health by presenting several vignettes about pain, COVID, and women's health.
“Steps Toward Trustworthy Machine Learning”
How can we trust systems built from machine learning components? We need advances in many areas, including machine learning algorithms, software engineering, ML ops, and explanation. This talk will describe our recent work in two important directions: obtaining calibrated performance estimates and performing run-time monitoring with guarantees. I will first describe recent work with Kiri Wagstaff on region-based calibration for classifiers and work with Jesse Hostetler on performance guarantees for reinforcement learning. Then, I'll review our research on providing guarantees for open category detection and anomaly detection for run-time monitoring of deployed systems. I'll conclude with some speculations concerning meta-cognitive situational awareness for AI systems.
“Political Polarization and International Conflicts through the Lens of NLP”
In this talk, I will summarize two broad lines of NLP research focusing on (1) the current U.S. political crisis and (2) the long-standing international conflict between the two nuclear adversaries India and Pakistan.
“Behavior Change for Social Good Using AI”
Advances in technologies and interface design are enabling group activities of varying complexities to be carried out, in whole or in part, over the internet, with benefits to science and society (e.g., citizen science, Massive Open Online Courses (MOOCs), and question-and-answer sites). The need to support these highly diverse interactions brings new and significant challenges to AI, including how to provide incentives that keep participants motivated and productive; how to provide useful information to system designers to help them decide whether and how to intervene with the group's work; and how to evaluate the effects of AI interventions on the performance of individuals and the group. I will describe ongoing projects in my lab that address these challenges in two socially relevant settings – education and citizen science – and discuss potential ethical issues that arise from using AI for behavior change.
"COPs, Bandits, and AI for Good"
In recent years, there has been increasing interest in applying techniques from AI to tackle societal and environmental challenges, ranging from climate change and natural disasters to food safety and disease spread. These efforts are typically known under the name AI for Good. While much of the research in this area has focused on designing machine learning algorithms to learn new insights and predict future events from previously collected data, there is another domain where AI has been found to be useful, namely resource allocation and decision making. In particular, a key step in addressing societal and environmental challenges is to efficiently allocate a set of scarce resources to mitigate the problem at hand.
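As a hypothetical illustration of the bandit-style decision making named in the title (invented for this listing, not taken from the talk), consider an epsilon-greedy bandit that allocates one unit of a scarce resource per round among candidate intervention sites with unknown success rates:

```python
import random

def epsilon_greedy_bandit(arms, rounds, eps=0.1, seed=0):
    """Toy epsilon-greedy bandit: each arm is an intervention site with an
    unknown success probability; one unit of resource is spent per round."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    values = [0.0] * len(arms)          # running mean reward per arm
    total = 0.0
    for _ in range(rounds):
        if rng.random() < eps:          # explore a random arm
            a = rng.randrange(len(arms))
        else:                           # exploit the best estimate so far
            a = max(range(len(arms)), key=lambda j: values[j])
        reward = 1.0 if rng.random() < arms[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]   # incremental mean
        total += reward
    return values, total

# Three hypothetical sites with true success rates 0.2, 0.5, 0.8.
values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8], rounds=5000)
print(values)
```

The allocation concentrates on the most effective site while still occasionally exploring the others, which is the core exploration-exploitation trade-off behind bandit approaches to resource allocation.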
"Statistical Methods for Biomedical Informatics"
Increasingly, researchers are turning to statistics and machine learning methods to help improve clinical outcomes and make sense of complex data. Yet traditional statistical methods are often ill-equipped to handle these new settings. In this talk, I will discuss three common problems in biomedical informatics: (1) data privacy, (2) sequence counting (e.g., microbiome sequencing or RNA-seq), and (3) clinical decision support tools. With respect to data privacy, I will discuss the current state of the art for statistical data privacy and several methods I developed that aim to balance the competing concerns of scientific advancement and safeguarding of personal information. With respect to sequence counting, I will present recent work using Bayesian partially identified models to overcome compositional limitations that are common to many other methods in the literature. With respect to clinical decision support tools, I will discuss recent work using pool-based active learning to create personalized bacteremia risk models from partially labeled data, as bacteremia (bacteria present in the blood) has an imperfect diagnosis process due to possible contaminants.
"Responsible AI: Thinking Beyond Data and Models"
The last decade has seen tremendous growth in artificial intelligence (AI) capabilities and their widespread adoption in society. Given the impact these technologies have on our social lives, there has also been research on the fairness, accountability, and ethical values that underlie them. While this line of research has received great attention in recent years, a majority of this work focuses primarily on mathematical interventions on the often opaque algorithms or models and/or their immediate inputs (data) and outputs (predictions). Such oversimplified mathematical interventions abstract away the underlying societal context where models are conceived, developed, and ultimately deployed. In this talk, I will discuss two strands of my recent work attempting to look beyond the data and models. First, I will discuss a complex systems theory based approach towards modeling societal context that accounts for its dynamic nature, including delayed impacts and feedback loops, and how to bring the expertise of marginalized communities into that process. Second, I will discuss how the current literature on algorithmic fairness is rooted in Western concerns, histories, and values, and how this limits its portability to other geographies and cultures, especially in the Global South. In particular, I will discuss our recent work on re-imagining algorithmic fairness for the Indian context.
"Statistical Methods for the Analysis of Sequence Count Data"
Justin Silverman is an Assistant Professor in the College of Information Science and Technology at Penn State University. Justin completed both an M.D. and Ph.D. at Duke University. His Ph.D. work was done under the mentorship of Dr. Lawrence David and Dr. Sayan Mukherjee. Justin’s dissertation work focused on longitudinal modeling and experimental design of host-associated microbiota surveys.
"Digital Libraries and Research Data Management"
Research data is the bedrock of scientific research. From the same data, multiple scientists can draw different conclusions. Sharing research data is thus of paramount importance. Additionally, if we can share and reuse research data, we can perform comparative studies over data obtained by different research projects, design different applications of the data, and extract maximum benefit for the cost incurred to obtain the data. Currently, although there are repositories where scientists can store and host their data, we often face difficulties related to expense, ease of use, the lack of accepted data formats and metadata standards, the lack of individual rewards for sharing data, etc. eScience and digital libraries have attempted to address some of these difficulties. In this talk, I will outline some of the issues and solutions related to research data management, especially using case studies from the ChemXSeer, ArchSeer, and CiteSeerX projects. I will highlight issues related to data storage, management, retrieval, and the diversity of the data; the need for interoperation, preservation, and archiving; usability and user access; the need for standards; and the security and trustworthiness of the repositories, briefly outlining the progress in each of these areas. I will conclude by summarizing both the successes and the open problems in the area.
“Just, Equitable, and Efficient Algorithmic Allocation of Scarce Societal Resources”
Demand for resources that are collectively controlled or regulated by society, like social services or organs for transplantation, typically far outstrips supply. How should these scarce resources be allocated? Any approach to this question requires insights from computer science, economics, and beyond; we must define objectives (foregrounding equity and distributive justice in addition to efficiency), predict outcomes (taking causal considerations into account), and optimize allocations, while carefully considering agent preferences and incentives. Motivated by the real-world problem of provision of services to homeless households, I will discuss our approach to thinking through how algorithmic approaches and computational thinking can help.
“AI for Population Health”
As exemplified by the COVID-19 pandemic, our health and wellbeing depend on a difficult-to-measure web of societal factors and individual behaviors. AI can help us untangle this web and optimize interventions to improve health at a population level, especially for marginalized groups. However, population health applications raise new computational challenges, requiring us to make sense of limited data and optimize decisions under the resulting uncertainty. This talk presents methodological developments in machine learning, optimization, and social networks which are motivated by on-the-ground collaborations on HIV prevention, tuberculosis treatment, and the COVID-19 response. These projects have produced deployed applications and policy impact. For example, I will present the development of an AI-augmented intervention for HIV prevention among homeless youth. This system was deployed and evaluated in a field test enrolling over 700 youth and found to significantly reduce the prevalence of key risk behaviors for HIV.
"Exposure to News in the Digital Age"
Join Sandra González-Bailón, associate professor in the Annenberg School for Communication at the University of Pennsylvania, for her talk, where she will discuss the implications of the divide between informed citizens and news avoiders, and the need to measure exposure across media channels to identify populations that are most likely to be vulnerable to misinformation campaigns.
“Doing Good with Data: Fairly and Equitably”
Can AI, ML, and Data Science help prevent children from getting lead poisoning? Can they help reduce police violence and misconduct? Can they improve vaccination rates? Can they help cities better prioritize limited resources to improve the lives of citizens and achieve equity? We’re all aware of the potential of ML and AI, but turning this potential into tangible social impact, and more importantly equitable social impact, takes cross-disciplinary training, new methods, and collaborations with governments and nonprofits. I’ll discuss lessons learned from working on 50+ projects over the past few years with nonprofits and governments on high-impact public policy and social challenges in criminal justice, public health, education, economic development, public safety, workforce training, and urban infrastructure. I’ll highlight opportunities as well as challenges around explainability and bias/fairness that need to be tackled in order to have social and policy impact in a fair and equitable manner.
“AI for Public Health and Conservation: Learning and Planning in the Data-to-Deployment Pipeline”
With the maturing of AI and multiagent systems research, we have a tremendous opportunity to direct these advances towards addressing complex societal problems. We focus on the problems of public health and wildlife conservation, and present research advances in multiagent systems to address one key cross-cutting challenge: how to effectively deploy our limited intervention resources in these problem domains. We present our deployments from around the world as well as lessons learned that we hope are of use to researchers who are interested in AI for Social Impact. Achieving social impact in these domains often requires methodological advances; we will highlight key research advances in topics such as computational game theory, multi-armed bandits and influence maximization in social networks for addressing challenges in public health and conservation. In pushing this research agenda, we believe AI can indeed play an important role in fighting social injustice and improving society.
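One of the techniques named above, influence maximization in social networks, can be sketched in a few lines. This is the generic greedy algorithm under the independent-cascade model as found in textbooks, not the speakers' deployed systems, and the example network is invented:

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """Independent-cascade spread: each newly activated node gets one
    chance to activate each inactive neighbor with probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr in graph.get(node, []):
            if nbr not in active and rng.random() < p:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)

def greedy_influence_max(graph, k, p=0.3, trials=200, seed=0):
    """Greedily pick k seed nodes maximizing Monte Carlo estimated spread."""
    rng = random.Random(seed)
    seeds = []
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in nodes - set(seeds):
            spread = sum(simulate_cascade(graph, seeds + [cand], p, rng)
                         for _ in range(trials)) / trials
            if spread > best_spread:
                best, best_spread = cand, spread
        seeds.append(best)
    return seeds

# Hypothetical social network: node 0 is a hub, 5-6 form a separate pair.
graph = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0], 5: [6], 6: [5]}
seeds = greedy_influence_max(graph, k=2)
print(seeds)
```

In an intervention setting (e.g., choosing peer leaders for a health campaign), the k selected seeds correspond to the limited intervention slots the abstract describes; note the greedy choice covers the disconnected pair rather than piling resources onto the hub's neighbors.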