# Ali H. Sayed

## Contact Details

Name: Ali H. Sayed

## Pub Categories

- Computer Science - Multiagent Systems (25)
- Mathematics - Optimization and Control (21)
- Mathematics - Information Theory (16)
- Computer Science - Learning (16)
- Computer Science - Information Theory (16)
- Computer Science - Distributed, Parallel, and Cluster Computing (12)
- Statistics - Machine Learning (7)
- Physics - Physics and Society (3)
- Mathematics - Probability (2)
- Computer Science - Artificial Intelligence (1)
- Statistics - Computation (1)
- Mathematics - Statistics (1)
- Computer Science - Computer Science and Game Theory (1)
- Statistics - Theory (1)

## Publications Authored By Ali H. Sayed

The analysis in Part I revealed interesting properties for subgradient learning algorithms in the context of stochastic optimization when gradient noise is present. These algorithms are used when the risk functions are non-smooth and involve non-differentiable components. They have long been recognized as slow-converging methods.

This work develops a distributed optimization strategy with guaranteed exact convergence for a broad class of left-stochastic combination policies. The resulting exact diffusion strategy is shown in Part II to have a wider stability range and better convergence performance than the EXTRA strategy. The exact diffusion solution is applicable to non-symmetric left-stochastic combination matrices, while most earlier developments on exact consensus implementations are limited to doubly-stochastic matrices; these latter matrices impose stringent constraints on the network topology.

Part I of this work developed the exact diffusion algorithm to remove the bias that is characteristic of distributed solutions for deterministic optimization problems. The algorithm was shown to be applicable to a larger set of combination policies than earlier approaches in the literature. In particular, the combination matrices are not required to be doubly stochastic or right-stochastic, which impose stringent conditions on the graph topology and communications protocol.

Online learning with streaming data in a distributed and collaborative manner can be useful in a wide range of applications. This topic has been receiving considerable attention in recent years with emphasis on both single-task and multitask scenarios. In single-task adaptation, agents cooperate to track an objective of common interest, while in multitask adaptation agents track multiple objectives simultaneously.

We propose an asynchronous, decentralized algorithm for consensus optimization. The algorithm runs over a network in which the agents communicate with their neighbors and perform local computation. In the proposed algorithm, each agent can compute and communicate independently at different times, for different durations, with the information it has even if the latest information from its neighbors is not yet available.

We consider the problem of decentralized clustering and estimation over multi-task networks, where agents infer and track different models of interest. The agents do not know beforehand which model is generating their own data. They also do not know which agents in their neighborhood belong to the same cluster.

We consider distributed multitask learning problems over a network of agents where each agent is interested in estimating its own parameter vector, also called task, and where the tasks at neighboring agents are related according to a set of linear equality constraints. Each agent possesses its own convex cost function of its parameter vector and a set of linear equality constraints involving its own parameter vector and the parameter vectors of its neighboring agents. We propose an adaptive stochastic algorithm based on the projection gradient method and diffusion strategies in order to allow the network to optimize the individual costs subject to all constraints.

In this paper, we study diffusion social learning over weakly-connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this work, a scenario of total influence (or "mind-control") arises where a set of influential agents ends up shaping the beliefs of non-influential agents.

This work examines a stochastic formulation of the generalized Nash equilibrium problem (GNEP) where agents are subject to randomness in the environment of unknown statistical distribution. We focus on fully-distributed online learning by agents and employ penalized individual cost functions to deal with coupled constraints. Three stochastic gradient strategies are developed with constant step-sizes.

This work examines the mean-square error performance of diffusion stochastic algorithms under a generalized coordinate-descent scheme. In this setting, the adaptation step by each agent is limited to a random subset of the coordinates of its stochastic gradient vector. The selection of coordinates varies randomly from iteration to iteration and from agent to agent across the network.
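
As a rough illustration of the coordinate-limited adaptation step described above, the following Python sketch updates only a random subset of the entries of one agent's iterate. The function name `coordinate_sgd_step` and the per-coordinate selection probability `keep_prob` are hypothetical choices made for this example, not notation from the paper, and the combination step across neighbors is omitted.

```python
import random


def coordinate_sgd_step(w, grad, mu, keep_prob, rng):
    """One stochastic-gradient step restricted to a random subset of coordinates.

    w         -- current iterate (list of floats)
    grad      -- stochastic gradient evaluated at w (list of floats)
    mu        -- constant step-size
    keep_prob -- probability that a given coordinate is updated (assumed model)
    rng       -- random.Random instance that drives the coordinate selection
    """
    new_w = []
    for w_k, g_k in zip(w, grad):
        if rng.random() < keep_prob:
            new_w.append(w_k - mu * g_k)   # selected coordinate: gradient step
        else:
            new_w.append(w_k)              # unselected coordinate: left unchanged
    return new_w


if __name__ == "__main__":
    rng = random.Random(0)
    w = [1.0, 2.0, 3.0]
    grad = [0.5, -1.0, 0.25]
    # A fresh random subset is drawn on every call, so it changes across iterations.
    print(coordinate_sgd_step(w, grad, mu=0.1, keep_prob=0.5, rng=rng))
```

Giving each agent its own `rng` would make the selected subset differ from agent to agent as well as from iteration to iteration.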

The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter.
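
The equivalence can be probed numerically with a small experiment. The Python sketch below runs a heavy-ball momentum recursion and a plain stochastic-gradient recursion on the same scalar quadratic risk, driven by the same gradient noise. It assumes the re-scaled step-size is `mu / (1 - beta)`, the usual heavy-ball scaling; the excerpt above does not state the factor, so treat that choice, the quadratic risk, and all names as assumptions of this example.

```python
import random


def simulate(mu, beta, num_iter=200_000, burn_in=20_000, seed=0):
    """Empirical steady-state MSE of (momentum SGD, re-scaled plain SGD).

    Risk: J(w) = 0.5 * (w - w_o)**2, observed through noisy gradients.
    """
    rng = random.Random(seed)
    w_o = 1.0                        # minimizer of the risk
    mu_sgd = mu / (1.0 - beta)       # assumed re-scaled step-size for plain SGD
    w_m, w_m_prev = 0.0, 0.0         # momentum iterate and its previous value
    w_s = 0.0                        # plain SGD iterate
    err_m = err_s = 0.0
    for i in range(num_iter):
        noise = rng.gauss(0.0, 1.0)  # same gradient noise feeds both recursions
        # Heavy-ball momentum step.
        grad_m = (w_m - w_o) + noise
        w_m_next = w_m - mu * grad_m + beta * (w_m - w_m_prev)
        w_m_prev, w_m = w_m, w_m_next
        # Plain stochastic-gradient step with the larger step-size.
        grad_s = (w_s - w_o) + noise
        w_s = w_s - mu_sgd * grad_s
        if i >= burn_in:             # accumulate errors after the transient
            err_m += (w_m - w_o) ** 2
            err_s += (w_s - w_o) ** 2
    n = num_iter - burn_in
    return err_m / n, err_s / n


if __name__ == "__main__":
    mse_momentum, mse_sgd = simulate(mu=0.01, beta=0.5)
    print(mse_momentum, mse_sgd)
```

For a small `mu`, the two printed values should come out close to each other, which is consistent with the claim in the abstract; the gap is expected to grow as the step-size increases.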

The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in.

We consider distributed detection problems over adaptive networks, where dispersed agents learn continually from streaming data by means of local interactions. The simultaneous requirements of adaptation and cooperation are achieved by employing diffusion algorithms with constant step-size $\mu$. In [1], [2] some main features of adaptive distributed detection were revealed.

In a recent article [1] we surveyed advances related to adaptation, learning, and optimization over synchronous networks. Various distributed strategies were discussed that enable a collection of networked agents to interact locally in response to streaming data and to continually learn and adapt to track drifts in the data and models. Under reasonable technical conditions on the data, the adaptive networks were shown to be mean-square stable in the slow adaptation regime, and their mean-square-error performance and convergence rate were characterized in terms of the network topology and data statistical moments [2].

In this work and the supporting Part II, we examine the performance of stochastic sub-gradient learning strategies under weaker conditions than usually considered in the literature. The new conditions are shown to be automatically satisfied by several important cases of interest including SVM, LASSO, and Total-Variation denoising formulations. In comparison, these problems do not satisfy the traditional assumptions used in prior analyses and, therefore, conclusions derived from these earlier treatments are not directly applicable to these problems.

In this work, we consider multitask learning problems where clusters of nodes are interested in estimating their own parameter vector. Cooperation among clusters is beneficial when the optimal models of adjacent clusters have a good number of similar entries. We propose a fully distributed algorithm for solving this problem.

We study the performance of diffusion least-mean-square algorithms for distributed parameter estimation in multi-agent networks when nodes exchange information over wireless communication links. Wireless channel impairments, such as fading and path-loss, adversely affect the exchanged data and cause instability and performance degradation if left unattended. To mitigate these effects, we incorporate equalization coefficients into the diffusion combination step and update the combination weights dynamically in the face of randomly changing neighborhoods due to fading conditions.

We study the problem of distributed adaptive estimation over networks where nodes cooperate to estimate physical parameters that can vary over both space and time domains. We use a set of basis functions to characterize the space-varying nature of the parameters and propose a diffusion least mean-squares (LMS) strategy to recover these parameters from successive time measurements. We analyze the stability and convergence of the proposed algorithm, and derive closed-form expressions to predict its learning behavior and steady-state performance in terms of mean-square error.

In many fields, and especially in the medical and social sciences and in recommender systems, data are gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or they may lack information to respond adequately to some questions. The data collected from these studies tend to lead to linear regression models where the regression vectors are only known partially: some of their entries are either missing completely or replaced randomly by noisy values.

We examine the behavior of multi-agent networks where information-sharing is subject to a positive communications cost over the edges linking the agents. We consider a general mean-square-error formulation where all agents are interested in estimating the same target vector. We first show that, in the absence of any incentives to cooperate, the optimal strategy for the agents is to behave in a selfish manner with each agent seeking the optimal solution independently of the other agents.

The multitask diffusion LMS is an efficient strategy to simultaneously infer, in a collaborative manner, multiple parameter vectors. Existing works on multitask problems assume that all agents respond to data synchronously. In several applications, agents may not be able to act synchronously because networks can be subject to several sources of uncertainties such as changing topology, random link failures, or agents turning on and off for energy conservation.

The paper examines the learning mechanism of adaptive agents over weakly-connected graphs and reveals an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower relationship develops with the performance of some agents being fully determined by the performance of other agents that are outside their domain of influence.

Distributed processing over networks relies on in-network processing and cooperation among neighboring agents. Cooperation is beneficial when agents share a common objective. However, in many applications agents may belong to different clusters that pursue different objectives.

This work studies distributed primal-dual strategies for adaptation and learning over networks from streaming data. Two first-order methods are considered based on the Arrow-Hurwicz (AH) and augmented Lagrangian (AL) techniques. Several revealing results are discovered in relation to the performance and stability of these strategies when employed over adaptive networks.

The diffusion LMS algorithm has been extensively studied in recent years. This efficient strategy makes it possible to address distributed optimization problems over networks in the case where nodes have to collaboratively estimate a single parameter vector. Problems of this type are referred to as single-task problems.

In this paper, we consider learning dictionary models over a network of agents, where each agent is only in charge of a portion of the dictionary elements. This formulation is relevant in Big Data scenarios where large dictionary models may be spread over different spatial locations and it is not feasible to aggregate all dictionaries in one location due to communication and privacy considerations. We first show that the dual function of the inference problem is an aggregation of individual cost functions associated with different agents, which can then be minimized efficiently by means of diffusion strategies.

This work examines the close interplay between cooperation and adaptation for distributed detection schemes over fully decentralized networks. The combined attributes of cooperation and adaptation are necessary to enable networks of detectors to continually learn from streaming data and to continually track drifts in the state of nature when deciding in favor of one hypothesis or another. The results in the paper establish a fundamental scaling law for the steady-state probabilities of miss-detection and false-alarm in the slow adaptation regime, when the agents interact with each other according to distributed strategies that employ small constant step-sizes.

Part I of this work examined the mean-square stability and convergence of the learning process of distributed strategies over graphs. The results identified conditions on the network topology, utilities, and data in order to ensure stability; the results also identified three distinct stages in the learning behavior of multi-agent networks related to transient phases I and II and the steady-state phase. This Part II examines the steady-state phase of distributed learning by networked agents.

This work carries out a detailed transient analysis of the learning behavior of multi-agent networks, and reveals interesting results about the learning abilities of distributed strategies. Among other results, the analysis reveals how combination policies influence the learning process of networked agents, and how these policies can steer the convergence point towards any of many possible Pareto optimal solutions. The results also establish that the learning process of an adaptive network undergoes three (rather than two) well-defined stages of evolution with distinctive convergence rates during the first two stages, while attaining a finite mean-square-error (MSE) level in the last stage.

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint.

In this work and the supporting Parts II [2] and III [3], we provide a rather detailed analysis of the stability and performance of asynchronous strategies for solving distributed optimization and adaptation problems over networks. We examine asynchronous networks that are subject to fairly general sources of uncertainties, such as changing topologies, random link failures, random data arrival times, and agents turning on and off randomly. Under this model, agents in the network may stop updating their solutions or may stop sending or receiving information in a random manner and without coordination with other agents.

In Part I \cite{Zhao13TSPasync1}, we introduced a fairly general model for asynchronous events over adaptive networks including random topologies, random link failures, random data arrival times, and agents turning on and off randomly. We performed a stability analysis and established the notable fact that the network is still able to converge in the mean-square-error sense to the desired solution. Once stable behavior is guaranteed, it becomes important to evaluate how fast the iterates converge and how close they get to the optimal solution.

In Part II [3] we carried out a detailed mean-square-error analysis of the performance of asynchronous adaptation and learning over networks under a fairly general model for asynchronous events including random topologies, random link failures, random data arrival times, and agents turning on and off randomly. In this Part III, we compare the performance of synchronous and asynchronous networks. We also compare the performance of decentralized adaptation against centralized stochastic-gradient (batch) solutions.

In this work, we study the task of distributed optimization over a network of learners in which each learner possesses a convex cost function, a set of affine equality constraints, and a set of convex inequality constraints. We propose a fully-distributed adaptive diffusion algorithm based on penalty methods that allows the network to cooperatively optimize the global cost function, which is defined as the sum of the individual costs over the network, subject to all constraints. We show that when small constant step-sizes are employed, the expected distance between the optimal solution vector and that obtained at each node in the network can be made arbitrarily small.

Adaptive networks are suitable for decentralized inference tasks, e.g., to monitor complex natural phenomena.

Recent research works on distributed adaptive networks have intensively studied the case where the nodes estimate a common parameter vector collaboratively. However, there are many applications that are multitask-oriented in the sense that there are multiple parameter vectors that need to be inferred simultaneously. In this paper, we employ diffusion strategies to develop distributed algorithms that address clustered multitask problems by minimizing an appropriate mean-square error criterion with $\ell_2$-regularization.

In distributed processing, agents generally collect data generated by the same underlying unknown model (represented by a vector of parameters) and then solve an estimation or inference task cooperatively. In this paper, we consider the situation in which the data observed by the agents may have arisen from two different models. Agents do not know beforehand which model accounts for their data and the data of their neighbors.

This work studies the learning ability of consensus and diffusion distributed learners from continuous streams of data arising from different but related statistical distributions. Four distinctive features for diffusion learners are revealed in relation to other decentralized schemes even under left-stochastic combination policies. First, closed-form expressions for the evolution of their excess-risk are derived for strongly-convex risk functions under a diminishing step-size rule.

In this work, we analyze the generalization ability of distributed online learning algorithms under stationary and non-stationary environments. We derive bounds for the excess-risk attained by each node in a connected network of learners and study the performance advantage that diffusion strategies have over individual non-cooperative processing. We conduct extensive simulations to illustrate the results.

We consider solving multi-objective optimization problems in a distributed manner by a network of cooperating and learning agents. The problem is equivalent to optimizing a global cost that is the sum of individual components. The optimizers of the individual components do not necessarily coincide and the network therefore needs to seek Pareto optimal solutions.

In this work we analyze the mean-square performance of different strategies for distributed estimation over least-mean-squares (LMS) adaptive networks. The results highlight some useful properties for distributed adaptation in comparison to fusion-based centralized solutions. The analysis establishes that, by optimizing over the combination weights, diffusion strategies can deliver lower excess-mean-square-error than centralized solutions employing traditional block or incremental LMS strategies.

This article proposes diffusion LMS strategies for distributed estimation over adaptive networks that are able to exploit sparsity in the underlying system model. The approach relies on convex regularization, common in compressive sensing, to enhance the detection of sparsity via a diffusive process over the network. The resulting algorithms endow networks with learning abilities and allow them to learn the sparse structure from the incoming data in real-time, and also to track variations in the sparsity of the model.
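
To make the role of the convex regularizer concrete, here is a minimal Python sketch of a single agent's adaptation step with an $\ell_1$ penalty, which adds a sign-based term that pulls small entries toward zero. The $\ell_1$ choice, the function name `sparse_lms_update`, and the weight `rho` are assumptions for illustration; the excerpt does not say which regularizer the article uses, and the network combination step is left out.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sparse_lms_update(w, u, d, mu, rho):
    """One LMS step with an l1 (zero-attracting) regularization term.

    w   -- current estimate (list of floats)
    u   -- regression vector (list of floats)
    d   -- scalar measurement, modeled as the inner product of u and the true vector plus noise
    mu  -- constant step-size
    rho -- regularization weight (hypothetical parameter of this sketch)
    """
    err = d - sum(u_j * w_j for u_j, w_j in zip(u, w))
    # Standard LMS correction plus a subgradient step on rho * ||w||_1.
    return [w_j + mu * err * u_j - mu * rho * sign(w_j) for w_j, u_j in zip(w, u)]


if __name__ == "__main__":
    w = [1.0, -2.0, 0.0]
    print(sparse_lms_update(w, u=[0.3, 0.1, -0.2], d=0.4, mu=0.1, rho=0.5))
```

When the estimation error is zero, only the penalty term acts: nonzero entries shrink in magnitude by `mu * rho` and zero entries stay at zero.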

Adaptive networks are well-suited to perform decentralized information processing and optimization tasks and to model various types of self-organized and complex behavior encountered in nature. Adaptive networks consist of a collection of agents with processing and learning abilities. The agents are linked together through a connection topology, and they cooperate with each other through local interactions to solve distributed optimization, estimation, and inference problems in real-time.

Adaptive networks consist of a collection of nodes with adaptation and learning abilities. The nodes interact with each other on a local level and diffuse information across the network to solve estimation and inference tasks in a distributed manner. In this work, we compare the mean-square performance of two main strategies for distributed estimation over networks: consensus strategies and diffusion strategies.
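
The structural difference between the two strategies can be seen in a few lines of code. The Python sketch below implements one common form of each update for LMS-type costs: an adapt-then-combine diffusion step and a consensus step. The three-node combination matrix, the noise-free data model, and all function names are assumptions chosen to keep the example small and deterministic; they are not taken from the paper.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def combine(A, vectors, k):
    """Weighted average at agent k, using weights A[l][k] on neighbor l."""
    dim = len(vectors[0])
    return [sum(A[l][k] * vectors[l][j] for l in range(len(vectors)))
            for j in range(dim)]


def diffusion_atc_step(A, W, U, D, mu):
    """Adapt-then-combine: each agent adapts first, then averages the
    intermediate estimates of its neighbors."""
    psi = []
    for w, u, d in zip(W, U, D):
        err = d - dot(u, w)
        psi.append([w_j + mu * err * u_j for w_j, u_j in zip(w, u)])
    return [combine(A, psi, k) for k in range(len(W))]


def consensus_step(A, W, U, D, mu):
    """Consensus: average the neighbors' previous estimates, then add an LMS
    correction evaluated at the agent's own previous estimate."""
    new_W = []
    for k in range(len(W)):
        avg = combine(A, W, k)
        err = D[k] - dot(U[k], W[k])
        new_W.append([a + mu * err * u for a, u in zip(avg, U[k])])
    return new_W


def run(step, num_iter=2000, mu=0.05, seed=1):
    """Run a strategy on noise-free data; return the largest final entry error."""
    rng = random.Random(seed)
    w_o = [1.0, -0.5]                      # common target vector (assumed)
    A = [[2 / 3, 1 / 3, 0.0],              # symmetric, doubly stochastic weights
         [1 / 3, 1 / 3, 1 / 3],
         [0.0, 1 / 3, 2 / 3]]
    W = [[0.0, 0.0] for _ in range(3)]
    for _ in range(num_iter):
        U = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(3)]
        D = [dot(u, w_o) for u in U]       # measurements without noise
        W = step(A, W, U, D, mu)
    return max(abs(w[j] - w_o[j]) for w in W for j in range(2))


if __name__ == "__main__":
    print(run(diffusion_atc_step), run(consensus_step))
```

Without measurement noise both recursions should drive the error toward zero, so this toy setup only exposes the form of the updates. Adding noise to `D` would be needed to see any difference in steady-state mean-square performance.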

Adaptive networks consist of a collection of agents with adaptation and learning abilities. The agents interact with each other on a local level and diffuse information across the network through their collaborations. In this work, we consider two types of agents: informed agents and uninformed agents.

Adaptive networks rely on in-network and collaborative processing among distributed agents to deliver enhanced performance in estimation and inference tasks. Information is exchanged among the nodes, usually over noisy links. The combination weights that are used by the nodes to fuse information from their neighbors play a critical role in influencing the adaptation and tracking abilities of the network.

We propose an adaptive diffusion mechanism to optimize a global cost function in a distributed manner over a network of nodes. The cost function is assumed to consist of a collection of individual components. Diffusion adaptation allows the nodes to cooperate and diffuse information in real-time; it also helps alleviate the effects of stochastic gradient noise and measurement noise through a continuous learning process.

Spectrum sensing is one of the enabling functionalities for cognitive radio (CR) systems to operate in the spectrum white space. To protect the primary incumbent users from interference, the CR is required to detect incumbent signals at very low signal-to-noise ratio (SNR). In this paper, we present a spectrum sensing technique based on correlating spectra for detection of television (TV) broadcasting signals.

Spectrum sensing is an essential enabling functionality for cognitive radio networks to detect spectrum holes and opportunistically use the under-utilized frequency bands without causing harmful interference to legacy networks. This paper introduces a novel wideband spectrum sensing technique, called multiband joint detection, which jointly detects the signal energy levels over multiple frequency bands rather than consider one band at a time. The proposed strategy is efficient in improving the dynamic spectrum utilization and reducing interference to the primary users.

Spectrum sensing is an essential functionality that enables cognitive radios to detect spectral holes and opportunistically use under-utilized frequency bands without causing harmful interference to primary networks. Since individual cognitive radios might not be able to reliably detect weak primary signals due to channel fading/shadowing, this paper proposes a cooperative wideband spectrum sensing scheme, referred to as spatial-spectral joint detection, which is based on a linear combination of the local statistics from spatially distributed multiple cognitive radios. The cooperative sensing problem is formulated into an optimization problem, for which suboptimal but efficient solutions can be obtained through mathematical transformation under practical conditions.