A broad range of on-line behaviors is mediated by interfaces in which people make choices among sets of options. A rich and growing line of work in the behavioral sciences indicates that human choices follow not only from the utility of alternatives, but also from the choice set in which alternatives are presented. In this work we study comparison-based choice functions, a simple but surprisingly rich class of functions capable of exhibiting so-called choice-set effects.

An active line of research has used on-line data to study the ways in which discrete units of information (including messages, photos, product recommendations, and group invitations) spread through social networks. There is relatively little understanding, however, of how on-line data might help in studying the diffusion of more complex practices: roughly, routines or styles of work that are generally handed down from one person to another through collaboration or mentorship. In this work, we propose a framework together with a novel type of data analysis that seeks to study the spread of such practices by tracking their syntactic signatures in large document collections.

Detecting strong ties among users in social and information networks is a fundamental operation that can improve performance on a multitude of personalization and ranking tasks. Strong-tie edges are often readily obtained from the social network, as users frequently participate in multiple overlapping networks via features such as following and messaging. These networks may vary greatly in size, density, and the information they carry.

In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grained level.

Cascades on online networks have been a popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. However, a basic question has received comparatively little attention: how desirable are cascades on a social media platform from the point of view of users? While versions of this question have been considered from the perspective of the producers of cascades, any answer to this question must also take into account the effect of cascades on their audience. In this work, we seek to fill this gap by providing a consumer perspective on cascades.

We survey results on neural network expressivity described in "On the Expressive Power of Deep Neural Networks". The paper motivates and develops three natural measures of expressiveness, which all display an exponential dependence on the depth of the network. In fact, all of these measures are related to a fourth quantity, trajectory length.

In the classical cake cutting problem, a resource must be divided among agents with different utilities so that each agent believes they have received a fair share of the resource relative to the other agents. We introduce a variant of the problem in which we model an underlying social network on the agents with a graph, and agents only evaluate their shares relative to their neighbors' shares in the network. This formulation captures many situations in which it is unrealistic to assume a global view, and also exposes interesting phenomena in the original problem.
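
As a concrete reading of the graph-based fairness condition, the sketch below assumes that being fair "relative to neighbors" means envy-freeness restricted to the graph: agent i must value its own piece at least as much as the piece of any neighbor. The function name, the valuation matrix, and the two graphs are hypothetical toy inputs, not data or code from the paper.

```python
from typing import Dict, List, Set


def is_locally_envy_free(values: List[List[float]], neighbors: Dict[int, Set[int]]) -> bool:
    """Check envy-freeness restricted to graph neighbors (an assumed fairness notion).

    values[i][j] is agent i's valuation of the piece allocated to agent j;
    neighbors[i] is the set of agents that agent i compares itself against.
    """
    for i, nbrs in neighbors.items():
        own = values[i][i]
        for j in nbrs:
            if values[i][j] > own:
                return False  # agent i envies neighbor j
    return True


# Hypothetical valuations: row i = how agent i values each of the three pieces.
values = [
    [0.3, 0.2, 0.5],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
]
path = {0: {1}, 1: {0, 2}, 2: {1}}            # agents on a path 0 - 1 - 2
complete = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # everyone compares with everyone

print(is_locally_envy_free(values, path))      # True
print(is_locally_envy_free(values, complete))  # False: agent 0 envies agent 2
```

Because agents 0 and 2 are not adjacent on the path, agent 0's envy of agent 2 does not count there; dropping such non-neighbor comparisons is exactly what the local notion relaxes.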

Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditions simultaneously. Moreover, even satisfying all three conditions approximately requires that the data lie in an approximate version of one of the constrained special cases identified by our theorem.

Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the seed set expansion problem: given a subset $S$ of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of landing probabilities of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective.
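
The landing-probability view can be made concrete in a few lines. The sketch below is a minimal illustration, assuming an unweighted, undirected graph in which every node has at least one neighbor, a walk started uniformly on the seed set $S$, and truncation at a fixed maximum walk length. It computes the landing probabilities $p_k$ once and then scores nodes with the geometric weights of personalized PageRank, $(1-\alpha)\alpha^k$, or the Poisson weights of the heat kernel, $e^{-t} t^k / k!$. The function names, toy graph, and parameter values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Graph = Dict[int, List[int]]  # undirected adjacency lists; every node needs degree >= 1


def landing_probabilities(graph: Graph, seeds: Set[int], max_len: int) -> List[Dict[int, float]]:
    """Return [p_0, ..., p_K]: where a simple random walk started uniformly on `seeds` sits after k steps."""
    p = {s: 1.0 / len(seeds) for s in seeds}
    history = [p]
    for _ in range(max_len):
        nxt: Dict[int, float] = {}
        for u, mass in p.items():
            share = mass / len(graph[u])  # the walk moves to a uniformly random neighbor
            for v in graph[u]:
                nxt[v] = nxt.get(v, 0.0) + share
        p = nxt
        history.append(p)
    return history


def weighted_scores(history: List[Dict[int, float]], weights: Sequence[float]) -> Dict[int, float]:
    """Score each node by sum_k weights[k] * p_k(node)."""
    scores: Dict[int, float] = {}
    for w, p in zip(weights, history):
        for v, mass in p.items():
            scores[v] = scores.get(v, 0.0) + w * mass
    return scores


def ppr_weights(alpha: float, max_len: int) -> List[float]:
    return [(1 - alpha) * alpha**k for k in range(max_len + 1)]


def heat_kernel_weights(t: float, max_len: int) -> List[float]:
    return [math.exp(-t) * t**k / math.factorial(k) for k in range(max_len + 1)]


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3; seed set {0}.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
history = landing_probabilities(graph, {0}, max_len=30)
ppr = weighted_scores(history, ppr_weights(0.85, 30))
hk = weighted_scores(history, heat_kernel_weights(3.0, 30))
print(sorted(ppr, key=ppr.get, reverse=True))
print(sorted(hk, key=hk.get, reverse=True))
```

Both rankings reuse the same `history`; only the weight vector changes, which mirrors the observation that the two methods differ only in how they weight walks of different lengths.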

We propose a novel approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute. Understanding expressivity is a classical issue in the study of neural networks, but it has remained challenging at both a conceptual and a practical level. Our approach is based on an interrelated set of measures of expressivity, unified by the novel notion of trajectory length, which measures how the output of a network changes as the input sweeps along a one-dimensional path.
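
To make trajectory length concrete, the following sketch pushes inputs lying on a circle through a randomly initialized, fully connected tanh network and measures the arc length of the resulting output curve. The architecture (no biases, Gaussian weights with variance $\sigma_w^2 / n_{in}$, width 32) and all parameter values are assumptions chosen for illustration, and the arc length is only a piecewise-linear approximation that undercounts very wiggly curves.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def random_layer(n_in: int, n_out: int, sigma_w: float, rng: random.Random) -> Matrix:
    """Weights drawn i.i.d. from N(0, sigma_w^2 / n_in); no biases (a simplifying assumption)."""
    return [[rng.gauss(0.0, sigma_w / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]


def forward(layers: List[Matrix], x: Vector) -> Vector:
    h = x
    for W in layers:
        h = [math.tanh(sum(w * hi for w, hi in zip(row, h))) for row in W]
    return h


def trajectory_length(points: List[Vector]) -> float:
    """Arc length of the piecewise-linear curve through consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_at_depth(depth: int, width: int = 32, sigma_w: float = 4.0,
                    samples: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    layers = [random_layer(2, width, sigma_w, rng)]
    layers += [random_layer(width, width, sigma_w, rng) for _ in range(depth - 1)]
    # The input sweeps once around the unit circle in R^2.
    circle = [[math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples)]
              for k in range(samples + 1)]
    return trajectory_length([forward(layers, x) for x in circle])


for d in range(1, 5):
    print(d, round(length_at_depth(d), 1))
```

Printing the lengths for increasing depth is a quick way to observe the dependence on depth described above; the exact numbers depend on the random seed, the weight scale, and the sampling resolution.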

An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors. To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game.

Present bias, the tendency to weigh costs and benefits incurred in the present too heavily, is one of the most widespread human behavioral biases. It has also been the subject of extensive study in the behavioral economics literature. While the simplest models assume that the agents are naive, reasoning about the future without taking their bias into account, there is considerable evidence that people often behave in ways that are sophisticated with respect to present bias, making plans based on the belief that they will be present-biased in the future.

An active line of research has studied the detection and representation of trends in social media content. There is still relatively little understanding, however, of methods to characterize the early adopters of these trends: who picks up on these trends at different points in time, and what is their role in the system? We develop a framework for analyzing the population of users who participate in trending topics over the course of these topics' lifecycles. Central to our analysis is the notion of a "status gradient", describing how users of different activity levels adopt a trend at different points in time.

Cascades of information-sharing are a primary mechanism by which content reaches its audience on social media, and an active line of research has studied how such cascades, which form as content is reshared from person to person, develop and subside. In this paper, we perform a large-scale analysis of cascades on Facebook over significantly longer time scales, and find that a more complex picture emerges, in which many large cascades recur, exhibiting multiple bursts of popularity with periods of quiescence in between. We characterize recurrence by measuring the time elapsed between bursts, their overlap and proximity in the social network, and the diversity in the demographics of individuals participating in each peak.

Social network research has begun to take advantage of fine-grained communications regarding coordination, decision-making, and knowledge sharing. These studies, however, have not generally analyzed how external events are associated with a social network's structure and communicative properties. Here, we study how external events are associated with a network's change in structure and communications.

Team performance is a ubiquitous area of inquiry in the social sciences, and it motivates the problem of team selection: choosing the members of a team for maximum performance. Influential work by Hong and Page has argued that testing individuals in isolation and then assembling the highest-scoring ones into a team is not an effective method for team selection. For a broad class of performance measures, based on the expected maximum of random variables representing individual candidates, we show that tests directly measuring individual performance are indeed ineffective, but that a more subtle family of tests used in isolation can provide a constant-factor approximation for team performance.
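
A toy computation shows why scoring candidates in isolation can mislead when team performance is the expected maximum of the members' random contributions. The sketch assumes independent, discrete performance distributions; the two candidate types ("steady" and "risky") and the helper names are hypothetical, and the code does not implement the paper's proposed family of tests.

```python
from itertools import product
from typing import List, Tuple

# A candidate is a discrete distribution: a list of (value, probability) pairs.
Dist = List[Tuple[float, float]]


def expected_max(team: List[Dist]) -> float:
    """E[max of the members' values], enumerating all joint outcomes (assumes independence)."""
    total = 0.0
    for outcome in product(*team):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total


def mean(d: Dist) -> float:
    """The 'test in isolation' score: a candidate's individual expected performance."""
    return sum(v * p for v, p in d)


steady: Dist = [(0.6, 1.0)]            # always delivers 0.6
risky: Dist = [(1.0, 0.5), (0.0, 0.5)]  # delivers 1.0 or 0.0 with equal odds

print(mean(steady), mean(risky))
print(expected_max([steady, steady]), expected_max([risky, risky]), expected_max([steady, risky]))
```

With two steady and two risky candidates and a team of size two, picking the top two by individual mean (two steady candidates) gives an expected maximum of 0.6, while two risky candidates give 0.75 and a mixed team gives 0.8.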

Environments for decentralized on-line collaboration are now widespread on the Web, underpinning open-source efforts, knowledge creation sites including Wikipedia, and other experiments in joint production. When a distributed group works together in such a setting, the mechanisms they use for coordination can play an important role in the effectiveness of the group's performance. Here we consider the trade-offs inherent in coordination in these on-line settings, balancing the benefits to collaboration with the cost in effort that could be spent in other ways.

Apps are emerging as an important form of on-line content, and they combine aspects of Web usage in interesting ways: they exhibit a rich temporal structure of user adoption and long-term engagement, and they exist in a broader social ecosystem that helps drive these patterns of adoption and engagement. It has been difficult, however, to study apps in their natural setting since this requires a simultaneous analysis of a large set of popular apps and the underlying social network they inhabit. In this work we address this challenge through an analysis of the collection of apps on Facebook Login, developing a novel framework for analyzing both temporal and social properties.

A fundamental decision faced by a firm hiring employees (and a familiar one to anyone who has dealt with the academic job market, for example) is deciding what caliber of candidates to pursue. Should the firm try to increase its reputation by making offers to higher-quality candidates, despite the risk that the candidates might reject the offers and leave the firm empty-handed? Or should it concentrate on weaker candidates who are more likely to accept the offer? The question acquires an added level of complexity once we take into account the effect one hiring cycle has on the next: hiring better employees in the current cycle increases the firm's reputation, which in turn increases its attractiveness for higher-quality candidates in the next hiring cycle. These considerations introduce an interesting temporal dynamic aspect to the rich line of research on matching models for job markets, in which long-range planning and evolving reputational effects enter into the strategic decisions made by competing firms.

In many settings, people exhibit behavior that is inconsistent across time: we allocate a block of time to get work done and then procrastinate, or put effort into a project and then later fail to complete it. An active line of research in behavioral economics and related fields has developed and analyzed models for this type of time-inconsistent behavior. Here we propose a graph-theoretic model of tasks and goals, in which dependencies among actions are represented by a directed graph, and a time-inconsistent agent constructs a path through this graph.
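
One common way to formalize such an agent, assumed here rather than taken from the text above, is present bias with a parameter $b \ge 1$: at each node the agent inflates the cost of the edge it would take right now by a factor of $b$, evaluates the rest of the path at its true shortest-path cost, takes the best-looking edge, and re-plans from the next node. The sketch assumes a directed acyclic graph with nonnegative edge costs in which the target is reachable from every node the agent visits; the node names, costs, and function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(successor, edge cost)]


def dist_to_target(graph: Graph, target: str) -> Dict[str, float]:
    """True shortest-path cost from each node to `target` (Dijkstra on reversed edges)."""
    reverse: Dict[str, List[Tuple[str, float]]] = {}
    for u, edges in graph.items():
        for v, c in edges:
            reverse.setdefault(v, []).append((u, c))
    dist = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, c in reverse.get(v, []):
            nd = d + c
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def present_biased_path(graph: Graph, start: str, target: str, b: float) -> Tuple[List[str], float]:
    """Follow a present-biased agent (assumed model): re-plan at every node, inflating only the next edge."""
    dist = dist_to_target(graph, target)
    path, node, cost = [start], start, 0.0
    while node != target:
        nxt, c = min(((w, c) for w, c in graph[node] if w in dist),
                     key=lambda e: b * e[1] + dist[e[0]])
        cost += c  # the agent pays the true cost of the edge it actually takes
        node = nxt
        path.append(node)
    return path, cost


# Do a unit of work now (s -> a -> t, total 2) or skip it and pay more later (s -> x -> t, total 3).
tasks: Graph = {"s": [("a", 1.0), ("x", 0.0)], "a": [("t", 1.0)], "x": [("t", 3.0)], "t": []}
print(present_biased_path(tasks, "s", "t", b=1.0))
print(present_biased_path(tasks, "s", "t", b=3.0))
```

With $b = 1$ the agent follows the cheapest plan s, a, t (cost 2); with $b = 3$ it procrastinates by taking the free edge to x and ends up paying 3 in total.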

On many social networking web sites such as Facebook and Twitter, resharing or reposting functionality allows users to share others' content with their own friends or followers. As content is reshared from user to user, large cascades of reshares can form. While a growing body of research has focused on analyzing and characterizing such cascades, a recent, parallel line of work has argued that the future trajectory of a cascade may be inherently unpredictable.

The Web has enabled one of the most visible recent developments in education: the deployment of massive open online courses (MOOCs). With their global reach and often staggering enrollments, MOOCs have the potential to become a major new mechanism for learning. Despite this early promise, however, MOOCs are still relatively unexplored and poorly understood.

A crucial task in the analysis of on-line social-networking systems is to identify important people (those linked by strong social ties) within an individual's network neighborhood. Here we investigate this question for a particular category of strong ties, those involving spouses or romantic partners. We organize our analysis around a basic question: given all the connections among a person's friends, can you recognize his or her romantic partner from the network structure alone? Using data from a large sample of Facebook users, we find that this task can be accomplished with high accuracy, but doing so requires the development of a new measure of tie strength that we term "dispersion": the extent to which two people's mutual friends are not themselves well-connected.
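
The description of dispersion can be turned into a simple count. The function below is a simplified reading, assumed for illustration: for an edge (u, v), it counts the pairs of u and v's mutual friends that are not directly connected to each other. The measure defined in the paper may differ in its details, and the toy network and helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> DefaultDict[str, Set[str]]:
    adj: DefaultDict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def dispersion(adj: DefaultDict[str, Set[str]], u: str, v: str) -> int:
    """Simplified dispersion: pairs of u-v mutual friends that are not directly linked."""
    mutual = adj[u] & adj[v]
    return sum(1 for s, t in combinations(sorted(mutual), 2) if t not in adj[s])


edges = [
    ("u", "v"), ("u", "x"), ("u", "w1"), ("u", "w2"), ("u", "w3"), ("u", "w4"), ("u", "c"), ("u", "f"),
    ("v", "w1"), ("v", "c"), ("v", "f"),                  # v knows u's coworker, college friend, family
    ("x", "w1"), ("x", "w2"), ("x", "w3"), ("x", "w4"),   # x knows only u's coworkers
    ("w1", "w2"), ("w1", "w3"), ("w1", "w4"), ("w2", "w3"), ("w2", "w4"), ("w3", "w4"),
]
adj = build_adjacency(edges)
print(len(adj["u"] & adj["v"]), dispersion(adj, "u", "v"))
print(len(adj["u"] & adj["x"]), dispersion(adj, "u", "x"))
```

Here x shares more mutual friends with u than v does (4 versus 3), yet those friends form a single tightly knit cluster, so x's dispersion is 0 while v's is 3.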

A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the "average treatment effect" of a new feature or condition by exposing a sample of the overall population to it. A drawback of A/B testing is that it is poorly suited for experiments involving social interference, when the treatment of individuals spills over to neighboring individuals along an underlying social network. In this work, we propose a novel methodology using graph clustering to analyze average treatment effects under social interference.

An active line of research has considered games played on networks in which payoffs depend on both a player's individual decision and also the decisions of her neighbors. Such games have been used to model issues including the formation of opinions and the adoption of technology. A basic question that has remained largely open in this area concerns games where the strategies available to the players come from a fixed, discrete set, and where players may have different intrinsic preferences among the possible strategies.

One of the fundamental principles driving diversity or homogeneity in domains such as cultural differentiation, political affiliation, and product adoption is the tension between two forces: influence (the tendency of people to become similar to others they interact with) and selection (the tendency to be affected most by the behavior of others who are already similar). Influence tends to promote homogeneity within a society, while selection frequently causes fragmentation. When both forces act simultaneously, it becomes an interesting question to analyze which societal outcomes should be expected.

Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorithmic curation of on-line conversations: the development of automated methods to select a subset of discussions to present to a user. Here we consider two key sub-problems inherent in conversational curation: length prediction (predicting the number of comments a discussion thread will receive) and the novel task of re-entry prediction (predicting whether a user who has participated in a thread will later contribute another comment to it).

A growing set of on-line applications are generating data that can be viewed as very large collections of small, dense social graphs; these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks. A natural question is how to usefully define a domain-independent coordinate system for such a collection of graphs, so that the set of possible structures can be compactly represented and understood within a common space. In this work, we draw on the theory of graph homomorphisms to formulate and analyze such a representation, based on computing the frequencies of small induced subgraphs within each graph.
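
For the smallest nontrivial case, the coordinates are the frequencies of the four possible induced subgraphs on three nodes, which for undirected graphs are determined by their edge count (0, 1, 2, or 3). The sketch below computes this four-dimensional vector by brute force over all triples; restricting to size-3 subgraphs, the function name, and the adjacency-set input format are choices made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def triad_frequencies(adj: Dict[int, Set[int]]) -> List[float]:
    """Fraction of 3-node subsets inducing 0, 1, 2, or 3 edges (needs at least 3 nodes)."""
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])  # booleans add as 0/1
        counts[n_edges] += 1
    total = sum(counts)
    return [k / total for k in counts]


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(triad_frequencies(star))  # [0.25, 0.0, 0.75, 0.0]
print(triad_frequencies(k4))    # [0.0, 0.0, 0.0, 1.0]
```

Every graph with at least three nodes maps to a point in the same four-dimensional space, so graphs of different sizes from different domains can be compared directly; larger subgraph sizes give finer coordinates at a higher computational cost.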

Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased (the choice of words and sentence structure) can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes.

The question of how people form their opinions has fascinated economists and sociologists for quite some time. In many of the models, a group of people in a social network, each holding a numerical opinion, arrive at a shared opinion through repeated averaging with their neighbors in the network. Motivated by the observation that consensus is rarely reached in real opinion dynamics, we study a related sociological model in which individuals' intrinsic beliefs counterbalance the averaging process and yield a diversity of opinions.
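
A standard way to write such a model, assumed here, gives each person $i$ a fixed intrinsic belief $s_i$ and an expressed opinion $z_i$ that is repeatedly updated to $z_i \leftarrow (s_i + \sum_j w_{ij} z_j) / (1 + \sum_j w_{ij})$, a weighted average of the person's own belief and the neighbors' current opinions. The sketch iterates this update on a small hypothetical path graph with unit weights; the function name and iteration count are illustrative choices.

```python
from typing import Dict, List


def equilibrium_opinions(s: List[float], weights: Dict[int, Dict[int, float]],
                         iters: int = 500) -> List[float]:
    """Iterate z_i <- (s_i + sum_j w_ij z_j) / (1 + sum_j w_ij) from z = s (assumed update rule)."""
    z = list(s)
    for _ in range(iters):
        z = [
            (s[i] + sum(w * z[j] for j, w in weights[i].items())) / (1.0 + sum(weights[i].values()))
            for i in range(len(s))
        ]
    return z


s = [0.0, 0.5, 1.0]                                  # intrinsic beliefs
path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}  # path 0 - 1 - 2 with unit weights
print([round(x, 4) for x in equilibrium_opinions(s, path)])  # [0.25, 0.5, 0.75]
```

The opinions settle at 0.25, 0.5, and 0.75 rather than at a common value; pure repeated averaging without the $s_i$ term would instead drive this connected network toward consensus.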

Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language, either spoken or written, and one could reasonably suppose that signals manifested in language might also provide information about roles, status, and other aspects of the group's dynamics.

The traditional axiomatic approach to voting is motivated by the problem of reconciling differences in subjective preferences. In contrast, a dominant line of work in the theory of voting over the past 15 years has considered a different kind of scenario, also fundamental to voting, in which there is a genuinely "best" outcome that voters would agree on if they only had enough information. This type of scenario has its roots in the classical Condorcet Jury Theorem; it includes cases such as jurors in a criminal trial who all want to reach the correct verdict but disagree in their inferences from the available evidence, or a corporate board of directors who all want to improve the company's revenue, but who have different information that favors different options.
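
The classical Condorcet Jury Theorem concerns $n$ independent voters, each of whom picks the correct outcome with probability $p > 1/2$; the probability that the majority is correct then tends to 1 as $n$ grows. The short calculation below evaluates that probability for odd $n$ directly from the binomial distribution; the function name and the value $p = 0.6$ are arbitrary illustrations.

```python
from math import comb


def majority_correct(n: int, p: float) -> float:
    """P(a strict majority of n independent voters is correct), each voter correct w.p. p; n odd."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 11, 101):
    print(n, round(majority_correct(n, 0.6), 3))
```

For example, with $p = 0.6$ three voters are right by majority with probability $0.6^3 + 3 \cdot 0.6^2 \cdot 0.4 = 0.648$, already above the single-voter value of 0.6.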

It is not uncommon for certain social networks to divide into two opposing camps in response to stress. This happens, for example, in networks of political parties during winner-takes-all elections, in networks of companies competing to establish technical standards, and in networks of nations faced with mounting threats of war. A simple model for these two-sided separations is the dynamical system $dX/dt = X^2$, where $X$ is a matrix of the friendliness or unfriendliness between pairs of nodes in the network.
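
For a symmetric initial matrix, this system can be solved in closed form, which shows where the two camps come from. The derivation below is a sketch under stated assumptions: $X(0)$ is symmetric, its largest eigenvalue $\lambda_1 > 0$ is simple, and the corresponding unit eigenvector $u_1$ has no zero entries.

```latex
% Closed form: differentiate the candidate solution using d/dt (I - tX(0))^{-1} = (I - tX(0))^{-1} X(0) (I - tX(0))^{-1}.
X(t) = X(0)\bigl(I - t\,X(0)\bigr)^{-1},
\qquad
\frac{dX}{dt} = X(0)\bigl(I - tX(0)\bigr)^{-1} X(0)\bigl(I - tX(0)\bigr)^{-1} = X(t)^2 .

% Spectral form, using X(0) = \sum_i \lambda_i u_i u_i^{\top} with orthonormal u_i:
X(t) = \sum_i \frac{\lambda_i}{1 - t\lambda_i}\, u_i u_i^{\top}.

% Blow-up at t^* = 1/\lambda_1; the other terms stay bounded because 1 - t\lambda_i > 0 for \lambda_i < \lambda_1:
X(t) \approx \frac{\lambda_1}{1 - t\lambda_1}\, u_1 u_1^{\top}
\quad (t \to t^{*}),
\qquad
\operatorname{sign} X_{jk}(t) = \operatorname{sign}\bigl(u_{1j}\, u_{1k}\bigr).
```

Near the finite blow-up time $t^* = 1/\lambda_1$, every entry's sign is fixed by the signs of $u_1$, so the nodes split into the two camps $\{j : u_{1j} > 0\}$ and $\{j : u_{1j} < 0\}$, with friendly relations inside each camp and unfriendly relations across them.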

Social media sites are often guided by a core group of committed users engaged in various forms of governance. A key aspect of this type of governance is deliberation, in which such a group reaches decisions on issues of importance to the site. Despite its crucial, though subtle, role in how a number of prominent social media sites function, there has been relatively little investigation of the deliberative aspects of social media governance.

Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. In contrast to the bulk of research on social networks that has focused almost exclusively on positive interpretations of links between people, we study how the interplay between positive and negative relationships affects the structure of on-line social networks. We connect our analyses to theories of signed networks from social psychology.

We study online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism). Such a mix of positive and negative links arises in a variety of online settings; we study datasets from Epinions, Slashdot and Wikipedia. We find that the signs of links in the underlying social networks can be predicted with high accuracy, using models that generalize across this diverse range of sites.

It has often been taken as a working assumption that directed links in information networks are frequently formed by "short-cutting" a two-step path between the source and the destination, a kind of implicit "link copying" analogous to the process of triadic closure in social networks. Despite the role of this assumption in theoretical models such as preferential attachment, it has received very little direct empirical investigation. Here we develop a formalization and methodology for studying this type of directed closure process, and we provide evidence for its important role in the formation of links on Twitter.
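
A basic version of this measurement asks what fraction of newly formed directed links u → w short-cut a two-step path u → v → w that already existed in an earlier snapshot. The sketch below computes that fraction; it considers only this one orientation of the two-step path, and the function name and edge lists are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source, destination)


def directed_closure_fraction(old_edges: Iterable[Edge], new_edges: Iterable[Edge]) -> float:
    """Fraction of new edges u->w for which some u->v->w path already existed in `old_edges`."""
    out: Dict[str, Set[str]] = {}
    for a, b in old_edges:
        out.setdefault(a, set()).add(b)
    new_list: List[Edge] = list(new_edges)
    if not new_list:
        return 0.0
    closed = sum(
        1 for u, w in new_list
        if any(w in out.get(v, set()) for v in out.get(u, set()))
    )
    return closed / len(new_list)


old = [("a", "b"), ("b", "c"), ("c", "d")]
new = [("a", "c"), ("a", "d"), ("b", "d")]
print(directed_closure_fraction(old, new))  # 2 of the 3 new edges short-cut a two-step path
```

Here a → c short-cuts a → b → c and b → d short-cuts b → c → d, while a → d would need a three-step path and is not counted.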

We present a new model for reasoning about the way information is shared among friends in a social network, and the resulting ways in which it spreads. Our model formalizes the intuition that revealing personal information in social settings involves a trade-off between the benefits of sharing information with friends, and the risks that additional gossiping will propagate it to people with whom one is not on friendly terms. We study the behavior of rational agents in such a situation, and we characterize the existence and computability of stable information-sharing networks, in which agents do not have an incentive to change the partners with whom they share information.

There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is a product-review site where reviews come with annotations like "26 of 32 people found the following review helpful."

We model a close-knit community of friends and enemies as a fully connected network with positive and negative signs on its edges. Theories from social psychology suggest that certain sign patterns are more stable than others. This notion of social "balance" allows us to define an energy landscape for such networks.
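
One common way to turn balance into an energy, assumed here, is to sum the sign product $s_{ij} s_{jk} s_{ki}$ over all triangles (the product is +1 for a balanced triangle and -1 for an unbalanced one) and negate the normalized sum so that balanced configurations have low energy: $U = -\binom{n}{3}^{-1} \sum_{i<j<k} s_{ij} s_{jk} s_{ki}$. The sketch computes $U$ for a complete signed network given as a symmetric matrix of ±1 entries; the function name and the example matrices are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List


def balance_energy(signs: List[List[int]]) -> float:
    """U = -(sum over triangles of s_ij * s_jk * s_ki) / C(n, 3); the diagonal is ignored."""
    n = len(signs)
    total = sum(signs[i][j] * signs[j][k] * signs[k][i] for i, j, k in combinations(range(n), 3))
    return -total / comb(n, 3)


all_friends = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
two_camps = [[0, 1, -1, -1], [1, 0, -1, -1], [-1, -1, 0, 1], [-1, -1, 1, 0]]  # camps {0, 1}, {2, 3}
one_enemy = [[0, 1, 1], [1, 0, -1], [1, -1, 0]]  # a triangle with exactly one negative edge

print(balance_energy(all_friends), balance_energy(two_camps), balance_energy(one_enemy))
```

Fully friendly and cleanly split networks both reach the minimum $U = -1$, since every triangle is balanced; a single triangle with exactly one negative edge is unbalanced and has $U = +1$.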

How can we model networks with a mathematically tractable model that allows for rigorous analysis of network properties? Networks exhibit a long list of surprising properties: heavy tails for the degree distribution; small diameters; and densification and shrinking diameters over time. Most existing network models either fail to match several of these properties, are complicated to analyze mathematically, or both. In this paper we propose a generative model for networks that is both mathematically tractable and able to generate networks that have the above-mentioned properties.

Superlinear scaling in cities, which appears in sociological quantities such as economic productivity and creative output relative to urban population size, has been observed but has not been given a satisfactory theoretical explanation. Here we provide a network model for the superlinear relationship between population size and innovation found in cities, with a reasonable range for the exponent.

Social networks are of interest to researchers in part because they are thought to mediate the flow of information in communities and organizations. Here we study the temporal dynamics of communication using on-line data, including e-mail communication among the faculty and staff of a large university over a two-year period. We formulate a temporal notion of "distance" in the underlying social network by measuring the minimum time required for information to spread from one node to another, a concept that draws on the notion of vector-clocks from the study of distributed computing systems.
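
The minimum time for information to spread can be computed with a single earliest-arrival pass over the messages in time order: a node becomes informed at the time of the first message it receives from an already-informed node. The sketch below assumes messages are (sender, receiver, timestamp) triples and that information injected at the source at a given start time travels only along messages; it does not implement vector clocks themselves, messages sharing a timestamp are processed in their input order, and the function name and message log are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Message = Tuple[str, str, float]  # (sender, receiver, timestamp)


def earliest_arrival(messages: Iterable[Message], source: str, start: float) -> Dict[str, float]:
    """Earliest time each node can learn information that the source holds from time `start`."""
    arrival = {source: start}
    for sender, receiver, t in sorted(messages, key=lambda m: m[2]):  # stable sort keeps ties in input order
        # Forward only if the sender is already informed and this improves the receiver's time.
        if arrival.get(sender, math.inf) <= t < arrival.get(receiver, math.inf):
            arrival[receiver] = t
    return arrival


log = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "a", 3.0), ("d", "e", 3.0), ("c", "d", 4.0), ("a", "c", 5.0)]
arrival = earliest_arrival(log, "a", 0.0)
print({node: t - 0.0 for node, t in arrival.items()})  # temporal distance from a
```

Node e never hears the information because d's only message to e predates d's own arrival time; this time-respecting constraint is what distinguishes a temporal distance from an ordinary shortest-path distance.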

How do real graphs evolve over time? What are "normal" growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena.

We analyze a minimal model of a growing network. At each time step, a new vertex is added; then, with probability $\delta$, two vertices are chosen uniformly at random and joined by an undirected edge. This process is repeated for $t$ time steps.
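
The process is easy to simulate directly. The sketch below assumes that the two random vertices are distinct (sampled without replacement) and that repeated edges are allowed, and it adds a union-find pass to measure the largest connected component, a natural quantity to track as the edge probability varies; the function names, seed, and parameter values are arbitrary.

```python
import random
from typing import List, Tuple


def grow_network(t: int, delta: float, seed: int = 0) -> Tuple[int, List[Tuple[int, int]]]:
    """Simulate t steps: add a vertex, then with probability delta join two distinct random vertices."""
    rng = random.Random(seed)
    n = 0
    edges: List[Tuple[int, int]] = []
    for _ in range(t):
        n += 1  # new vertex with label n - 1
        if n >= 2 and rng.random() < delta:  # an edge needs two distinct vertices
            u, v = rng.sample(range(n), 2)
            edges.append((u, v))
    return n, edges


def largest_component(n: int, edges: List[Tuple[int, int]]) -> int:
    """Size of the largest connected component, via union-find with path halving."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes: dict = {}
    for x in range(n):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) if sizes else 0


for delta in (0.05, 0.25, 0.5):
    n, edges = grow_network(10_000, delta, seed=1)
    print(delta, len(edges), largest_component(n, edges))
```

Sweeping the edge probability and recording the largest component size gives a simple empirical view of how connectivity emerges in this growing model.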