**Chaotic Time Series Prediction using Spatio-Temporal RBF Neural Networks**

Due to their dynamic nature, chaotic time series are difficult to predict. In conventional signal processing approaches, signals are treated either in the time or in the space domain only. Spatio-temporal analysis of signals provides advantages over conventional uni-dimensional approaches by harnessing information from both the temporal and spatial domains. Herein, we propose a spatio-temporal extension of RBF neural networks for the prediction of chaotic time series. The proposed algorithm utilizes the concept of time-space orthogonality and deals separately with the temporal dynamics and the spatial non-linearity (complexity) of the chaotic series. The proposed RBF architecture is explored for the prediction of the Mackey-Glass time series, and the results are compared with the standard RBF. The spatio-temporal RBF is shown to outperform the standard RBFNN by achieving significantly reduced estimation error.
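To make the benchmark concrete, the following is a minimal sketch of how a Mackey-Glass series is commonly generated and how a Gaussian RBF hidden layer responds to a window of past samples. It is not the spatio-temporal architecture of the abstract. The parameter values (`tau=17`, `beta=0.2`, `gamma=0.1`, exponent 10), the unit-step Euler discretization, and the function names are assumptions chosen for illustration.

```python
import math


def mackey_glass(n_samples, tau=17, beta=0.2, gamma=0.1, power=10, x0=1.2):
    """Generate a Mackey-Glass series with a unit-step Euler discretization.

    Assumed (commonly used) parameters; the abstract does not state them.
    """
    # Constant initial history so that x(t - tau) is defined from the start.
    history = [x0] * (tau + 1)
    for _ in range(n_samples):
        x_t = history[-1]
        x_delayed = history[-1 - tau]
        x_next = x_t + beta * x_delayed / (1.0 + x_delayed ** power) - gamma * x_t
        history.append(x_next)
    return history[tau + 1:]


def gaussian_rbf(x, center, width):
    """Gaussian radial basis function of the distance between x and center."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def rbf_layer(window, centers, width):
    """Hidden-layer activations of a standard RBF network for one input window."""
    return [gaussian_rbf(window, c, width) for c in centers]


series = mackey_glass(200)
window = series[:4]                      # four past samples as the input vector
centers = [series[i:i + 4] for i in (0, 50, 100)]  # hypothetical center choice
activations = rbf_layer(window, centers, width=0.5)
```

A standard RBF network would predict the next sample as a weighted sum of these activations, with the weights fitted to the training portion of the series.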

**A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access**

To make efficient use of limited spectral resources, we propose in this work a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenario in which multiple users attempt to access channels simultaneously. We employ the proposed framework as a single agent in the single-user case, and extend it to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop algorithms for the actor-critic deep reinforcement learning and evaluate the proposed learning policies via experiments and numerical results. In the single-user model, in order to evaluate the performance of the proposed channel access policy and the framework’s tolerance against uncertainty, we explore different channel switching patterns and different switching probabilities. In the case of multiple users, we analyze the probabilities of each user accessing channels with favorable channel conditions and the probability of collision. We also address a time-varying environment to identify the adaptive ability of the proposed framework. Additionally, we provide comparisons (in terms of both the average reward and time efficiency) between the proposed actor-critic deep reinforcement learning framework, a deep Q-network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.

**Temporal Neighbourhood Aggregation: Predicting Future Links in Temporal Graphs via Recurrent Variational Graph Convolutions**

Graphs have become a crucial way to represent large, complex and often temporal datasets across a wide range of scientific disciplines. However, when graphs are used as input to machine learning models, this rich temporal information is frequently disregarded during the learning process, resulting in suboptimal performance on certain temporal inference tasks. To combat this, we introduce Temporal Neighbourhood Aggregation (TNA), a novel vertex representation model architecture designed to capture both topological and temporal information to directly predict future graph states. Our model exploits hierarchical recurrence at different depths within the graph to enable exploration of changes in temporal neighbourhoods, whilst requiring no additional features or labels to be present. The final vertex representations are created using variational sampling and are optimised to directly predict the next graph in the sequence. Our claims are reinforced by extensive experimental evaluation on both real and synthetic benchmark datasets, where our approach demonstrates superior performance compared to competing methods, outperforming them at predicting new temporal edges by as much as 23% on real-world datasets, whilst also requiring fewer overall model parameters.

**U-Net Training with Instance-Layer Normalization**

Normalization layers are essential in a Deep Convolutional Neural Network (DCNN). Various normalization methods have been proposed. The statistics used to normalize the feature maps can be computed at batch, channel, or instance level. However, in most existing methods, the normalization for each layer is fixed. Batch-Instance Normalization (BIN) is one of the first proposed methods that combines two different normalization methods and achieves diverse normalization for different layers. However, two potential issues exist in BIN: first, the Clip function is not differentiable at input values of 0 and 1; second, the combined feature map does not have a normalized distribution, which is harmful for signal propagation in a DCNN. In this paper, an Instance-Layer Normalization (ILN) layer is proposed by using the Sigmoid function for the feature map combination, and cascading group normalization. The performance of ILN is validated on image segmentation of the Right Ventricle (RV) and Left Ventricle (LV) using U-Net as the network architecture. The results show that the proposed ILN outperforms previous traditional and popular normalization methods with noticeable accuracy improvements for most validations, supporting the effectiveness of the proposed ILN.

**A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation**

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After this initial learning phase, our agent can quickly adapt to any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
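As a small illustration of why a single policy cannot be optimal for every linear preference, the sketch below scalarizes vector-valued returns with a preference vector and picks the best of a few candidate policies. The policy names and return values are invented for illustration, and this is plain linear scalarization, not the generalized Bellman update of the abstract.

```python
def scalarize(vector_return, preference):
    """Linear scalarization: weighted sum of the per-objective returns."""
    return sum(w * r for w, r in zip(preference, vector_return))


def best_policy(policy_returns, preference):
    """Name of the candidate policy with the highest scalarized return."""
    return max(policy_returns, key=lambda name: scalarize(policy_returns[name], preference))


# Hypothetical expected returns over two competing objectives.
policy_returns = {
    "fast": (10.0, 1.0),
    "safe": (2.0, 8.0),
    "balanced": (6.0, 6.0),
}

for preference in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(preference, "->", best_policy(policy_returns, preference))
```

Each of the three preferences selects a different policy, which is the situation a single preference-conditioned model is meant to cover.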

**Populating Web Scale Knowledge Graphs using Distantly Supervised Relation Extraction and Validation**

In this paper, we propose a fully automated system to extend knowledge graphs using external information from web-scale corpora. The designed system leverages a deep learning based technology for relation extraction that can be trained by a distantly supervised approach. In addition, the system uses a deep learning approach for knowledge base completion by utilizing the global structure information of the induced KG to further refine the confidence of the newly discovered relations. The designed system does not require any effort for adaptation to new languages and domains, as it does not use any hand-labeled data, NLP analytics or inference rules. Our experiments, performed on a popular academic benchmark, demonstrate that the suggested system boosts the performance of relation extraction by a wide margin, reporting error reductions of 50%, resulting in relative improvement of up to 100%. Also, a web-scale experiment conducted to extend DBPedia with knowledge from Common Crawl shows that our system is not only scalable but also does not require any adaptation cost, while yielding substantial accuracy gain.

**Memristive Networks: from Graph Theory to Statistical Physics**

We provide an introduction to a very specific toy model of memristive networks, for which an exact differential equation for the internal memory, which contains the Kirchhoff laws, is known. In particular, we highlight how the circuit topology enters the dynamics via an analysis of directed graphs. We also highlight the connection between the asymptotic states of memristors and the Ising model, and the relation to the dynamics and statics of disordered systems.

**X-SQL: reinforce schema representation with context**

In this work, we present X-SQL, a new network architecture for the problem of parsing natural language into SQL queries. X-SQL proposes to enhance the structural schema representation with the contextual output from a BERT-style pre-training model and, together with type information, to learn a new schema representation for downstream tasks. We evaluate X-SQL on the WikiSQL dataset and show that it achieves new state-of-the-art performance.

**Lecture Notes on ‘Free Probability Theory’**

This is an introduction to free probability theory, covering the basic combinatorial and analytic theory, as well as the relations to random matrices and operator algebras. The material is mainly based on the two books of the lecturer, one joint with Nica and one joint with Mingo. Free probability is here restricted to the scalar-valued setting; the operator-valued version is treated in the subsequent lecture series on ‘Non-Commutative Distributions’. The material here was presented in the winter term 2018/19 at Saarland University in 26 lectures of 90 minutes each. The lectures were recorded and can be found online at

https://…/index.html

**Transferability and Hardness of Supervised Classification Tasks**

We propose a novel approach for estimating the difficulty and transferability of supervised classification tasks. Unlike previous work, our approach is solution agnostic and does not require or assume trained models. Instead, we estimate these values using an information theoretic approach: treating training labels as random variables and exploring their statistics. When transferring from a source to a target task, we consider the conditional entropy between two such variables (i.e., label assignments of the two tasks). We show analytically and empirically that this value is related to the loss of the transferred model. We further show how to use this value to estimate task hardness. We test our claims extensively on three large-scale data sets, CelebA (40 tasks), Animals with Attributes 2 (85 tasks), and Caltech-UCSD Birds 200 (312 tasks), together representing 437 classification tasks. We provide results showing that our hardness and transferability estimates are strongly correlated with empirical hardness and transferability. As a case study, we transfer a learned face recognition model to CelebA attribute classification tasks, showing state-of-the-art accuracy for tasks estimated to be highly transferable.
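The central quantity, a conditional entropy between the label assignments of two tasks on the same samples, can be estimated from label counts alone. The sketch below is one straightforward empirical estimator. Conditioning the target labels on the source labels and measuring in nats are assumptions made here, and the function name is illustrative.

```python
import math
from collections import Counter


def conditional_entropy(target_labels, source_labels):
    """Empirical H(target | source) in nats from paired label assignments."""
    if len(target_labels) != len(source_labels):
        raise ValueError("label lists must describe the same samples")
    n = len(target_labels)
    joint_counts = Counter(zip(source_labels, target_labels))
    source_counts = Counter(source_labels)
    entropy = 0.0
    for (source, _target), count in joint_counts.items():
        # p(source, target) * log p(target | source)
        entropy -= (count / n) * math.log(count / source_counts[source])
    return entropy


# Source labels fully determine the target labels: nothing left to learn.
determined = conditional_entropy([1, 1, 0, 0], ["a", "a", "b", "b"])
# Source labels say nothing about the target labels: a full bit (log 2 nats).
unrelated = conditional_entropy([0, 1, 0, 1], ["a", "a", "b", "b"])
```

Under this reading, a low value suggests that knowing the source labels leaves little uncertainty about the target labels, so the tasks are expected to transfer well.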

**Semi-supervised Adversarial Active Learning on Attributed Graphs**

Active learning (AL) on attributed graphs has received increasing attention with the prevalence of graph-structured data. Although AL has been widely studied for alleviating label sparsity issues with conventional independent and identically distributed (i.i.d.) data, how to make it effective over attributed graphs remains an open research question. Existing AL algorithms on graphs attempt to reuse the classic AL query strategies designed for i.i.d. data. However, they suffer from two major limitations. First, different AL query strategies calculated in distinct scoring spaces are often naively combined to determine which nodes are to be labelled. Second, the AL query engine and the learning of the classifier are treated as two separate processes, resulting in unsatisfactory performance. In this paper, we propose a SEmi-supervised Adversarial active Learning (SEAL) framework on attributed graphs, which fully leverages the representation power of deep neural networks and devises a novel AL query strategy in an adversarial way. Our framework learns two adversarial components: a graph embedding network that encodes both the unlabelled and labelled nodes into a latent space, expecting to trick the discriminator into regarding all nodes as already labelled, and a semi-supervised discriminator network that distinguishes the unlabelled from the existing labelled nodes in the latent space. The divergence score, generated by the discriminator in a unified latent space, serves as the informativeness measure to actively select the most informative node to be labelled by an oracle. The two adversarial components form a closed loop to mutually and simultaneously reinforce each other towards enhancing the active learning performance. Extensive experiments on four real-world networks validate the effectiveness of the SEAL framework with superior performance improvements over state-of-the-art baselines.

**Report on the First Knowledge Graph Reasoning Challenge 2018 — Toward the eXplainable AI System**

A new challenge for knowledge graph reasoning started in 2018. Deep learning has promoted the application of artificial intelligence (AI) techniques to a wide variety of social problems. Accordingly, being able to explain the reason for an AI decision is becoming important to ensure the secure and safe use of AI techniques. Thus, we, the Special Interest Group on Semantic Web and Ontology of the Japanese Society for AI, organized a challenge calling for techniques that reason and/or estimate which characters are criminals while providing a reasonable explanation based on an open knowledge graph of a well-known Sherlock Holmes mystery story. This paper presents a summary report of the first challenge held in 2018, including the knowledge graph construction, the techniques proposed for reasoning and/or estimation, the evaluation metrics, and the results. The first prize went to an approach that formalized the problem as a constraint satisfaction problem and solved it using a lightweight formal method; the second prize went to an approach that used SPARQL and rules; the best resource prize went to a submission that constructed word embeddings of characters from all sentences of Sherlock Holmes novels; and the best idea prize went to a multi-agent discussion model. We conclude this paper with the plans and issues for the next challenge in 2019.

**Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs**

Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), and has recently been dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results than the state-of-the-art alignment methods by learning better KG representations.

**motif2vec: Motif Aware Node Representation Learning for Heterogeneous Networks**

Recent years have witnessed a surge of interest in machine learning on graphs and networks, with applications ranging from vehicular network design to IoT traffic management to social network recommendations. Supervised machine learning tasks in networks, such as node classification and link prediction, require us to perform feature engineering, which is known and agreed to be the key to success in applied machine learning. Research efforts dedicated to representation learning, especially representation learning using deep learning, have shown us ways to automatically learn relevant features from vast amounts of potentially noisy, raw data. However, most of the methods are not adequate to handle heterogeneous information networks, which represent most real-world data today. The methods cannot preserve the structure and semantics of multiple types of nodes and links well enough, capture higher-order heterogeneous connectivity patterns, and ensure coverage of nodes for which representations are generated. We propose a novel efficient algorithm, motif2vec, that learns node representations or embeddings for heterogeneous networks. Specifically, we leverage higher-order, recurring, and statistically significant network connectivity patterns in the form of motifs to transform the original graph to motif graph(s), conduct biased random walks to efficiently explore higher-order neighborhoods, and then employ a heterogeneous skip-gram model to generate the embeddings. Unlike previous efforts that use different graph meta-structures to guide the random walk, we use graph motifs to transform the original network and preserve the heterogeneity. We evaluate the proposed algorithm on multiple real-world networks from diverse domains and against existing state-of-the-art methods on multi-class node classification and link prediction tasks, and demonstrate its consistent superiority over prior work.

**Quantum Algorithms for Portfolio Optimization**

We develop the first quantum algorithm for the constrained portfolio optimization problem. The running time of the algorithm depends on the number of positivity and budget constraints, the number of assets in the portfolio, the desired precision, and problem-dependent parameters related to the well-conditioning of the intermediate solutions. If only a moderately accurate solution is required, our quantum algorithm can achieve a polynomial speedup over the best classical algorithms, whose complexity depends on the matrix multiplication exponent, which has a smaller theoretical value than the one attained in practice. We also provide some experiments to bound the problem-dependent factors arising in the running time of the quantum algorithm, and these experiments suggest that for most instances the quantum algorithm can potentially achieve a speedup over its classical counterpart.

**Using Social Media for Word-of-Mouth Marketing**

Nowadays, online social networks are used extensively for personal and commercial purposes. This widespread popularity makes them an ideal platform for advertisements. Social media can be used for both direct and word-of-mouth (WoM) marketing. Although WoM marketing is considered more effective and requires less advertisement cost, it is currently under-utilized. To do WoM marketing, we need to identify a set of people who can use their authoritative position in a social network to promote a given product. In this paper, we show how to do WoM marketing in a Facebook group, which is a question-answer type of social network. We also present the concept of reinforced WoM marketing, where multiple authorities can together promote a product to increase the effectiveness of marketing. We perform our experiments on a Facebook group dataset consisting of 0.3 million messages and 10 million user reactions.

**Image Colorization By Capsule Networks**

In this paper, a simple topology of Capsule Network (CapsNet) is investigated for the problem of image colorization. The generative and segmentation capabilities of the original CapsNet topology, which was proposed for the image classification problem, are leveraged for the colorization of images by modifying the network as follows: 1) the original CapsNet model is adapted to map the grayscale input to the output in the CIE Lab colorspace; 2) the feature detector part of the model is updated by using deeper feature layers inherited from the pre-trained VGG-19 model, with weights, in order to transfer low-level image representation capability to this model; 3) the margin loss function is replaced with a Mean Squared Error (MSE) loss to minimize the image-to-image mapping error. The resulting CapsNet model is named Colorizer Capsule Network (ColorCapsNet). The performance of ColorCapsNet is evaluated on the DIV2K dataset, and the promising results obtained motivate further investigation of Capsule Networks for the image colorization problem.

**Deconstructing Blockchains: A Comprehensive Survey on Consensus, Membership and Structure**

It is no exaggeration to say that since the introduction of Bitcoin, blockchains have become a disruptive technology that has shaken the world. However, the rising popularity of the paradigm has led to a flurry of proposals addressing variations and/or trying to solve problems stemming from the initial specification. This added considerable complexity to the current blockchain ecosystems, amplified by the absence of detail in many accompanying blockchain whitepapers. Through this paper, we set out to explain blockchains in a simple way, taming that complexity through the deconstruction of the blockchain into three simple, critical components common to all known systems: membership selection, consensus mechanism and structure. We propose an evaluation framework with insight into system models, desired properties and analysis criteria, using the decoupled components as criteria. We use this framework to provide clear and intuitive overviews of the design principles behind the analyzed systems and the properties achieved. We hope our effort will help clarify the current state of blockchain proposals and provide directions for the analysis of future proposals.

**Measuring the Business Value of Recommender Systems**

Recommender Systems are nowadays successfully used by all major web sites (from e-commerce to social media) to filter content and make suggestions in a personalized way. Academic research largely focuses on the value of recommenders for consumers, e.g., in terms of reduced information overload. To what extent and in which ways recommender systems create business value is, however, much less clear, and the literature on the topic is scattered. In this research commentary, we review existing publications on field tests of recommender systems and report which business-related performance measures were used in such real-world deployments. We summarize common challenges of measuring the business value in practice and critically discuss the value of algorithmic improvements and offline experiments as commonly done in academic environments. Overall, our review indicates that various open questions remain both regarding the realistic quantification of the business effects of recommenders and the performance assessment of recommendation algorithms in academia.

**The compositionality of neural networks: integrating symbolism and connectionism**

Despite a multitude of empirical studies, little consensus exists on whether neural networks are able to generalise compositionally, a controversy that, in part, stems from a lack of agreement about what it means for a neural model to be compositional. As a response to this controversy, we present a set of tests that provide a bridge between, on the one hand, the vast amount of linguistic and philosophical theory about compositionality and, on the other, the successful neural models of language. We collect different interpretations of compositionality and translate them into five theoretically grounded tests that are formulated on a task-independent level. In particular, we provide tests to investigate (i) if models systematically recombine known parts and rules, (ii) if models can extend their predictions beyond the length they have seen in the training data, (iii) if models’ composition operations are local or global, (iv) if models’ predictions are robust to synonym substitutions, and (v) if models favour rules or exceptions during training. To demonstrate the usefulness of this evaluation paradigm, we instantiate these five tests on a highly compositional data set which we dub PCFG SET and apply the resulting tests to three popular sequence-to-sequence models: a recurrent, a convolution-based and a transformer model. We provide an in-depth analysis of the results, which uncovers the strengths and weaknesses of these three architectures and points to potential areas of improvement.

**A General Data Renewal Model for Prediction Algorithms in Industrial Data Analytics**

In industrial data analytics, one of the fundamental problems is to utilize the temporal correlation of the industrial data to make timely predictions in the production process, such as fault prediction and yield prediction. However, traditional prediction models are fixed while the conditions of the machines change over time, so prediction errors increase as time passes. In this paper, we propose a general data renewal model to address this problem. Combined with the similarity function and the loss function, it estimates when to update the existing prediction model, then updates it according to the evaluation function iteratively and adaptively. We have applied the data renewal model to two prediction algorithms. The experiments demonstrate that the data renewal model can effectively identify changes in the data, and update and optimize the prediction model so as to improve the accuracy of prediction.

**Practical Risk Measures in Reinforcement Learning**

Practical application of Reinforcement Learning (RL) often involves risk considerations. We study a generalized approximation scheme for risk measures, based on Monte-Carlo simulations, where the risk measures need not necessarily be *coherent*. We demonstrate that, even in simple problems, measures such as the variance of the reward-to-go do not capture the risk in a satisfactory manner. In addition, we show how a risk measure can be derived from a model’s realizations. We propose a neural architecture for estimating the risk and suggest the risk critic architecture that can be used to optimize a policy under general risk measures. We conclude our work with experiments that demonstrate the efficacy of our approach.

**Efficient Cross-Validation of Echo State Networks**

Echo State Networks (ESNs) are known for their fast and precise one-shot learning of time series, but they often need good hyper-parameter tuning for best performance. For this, good validation is key, but usually a single validation split is used. In this rather practical contribution, we suggest several schemes for cross-validating ESNs and introduce an efficient algorithm for implementing them. The component that dominates the time complexity of the already quite fast ESN training remains constant (does not scale up with *k*) in our proposed method of doing *k*-fold cross-validation. The component that does scale linearly with *k* starts dominating only in some not very common situations. Thus, in many situations *k*-fold cross-validation of ESNs can be done for virtually the same time complexity as a simple single-split validation. Space complexity can also remain the same. We also discuss when the proposed validation schemes for ESNs could be beneficial and empirically investigate them on several different real-world datasets.
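One way such a constant-cost component can arise is sketched below, under stated assumptions: the readout is trained by ridge regression, so the sufficient statistics can be accumulated once per fold and reused across all splits by subtracting the held-out fold from the totals. To stay short, the sketch uses a single scalar feature in place of reservoir states, and all names are illustrative. Whether this matches the authors' algorithm in detail is an assumption.

```python
def fold_statistics(xs, ys):
    """Sufficient statistics of scalar ridge regression for one fold."""
    return sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys))


def kfold_ridge_errors(xs, ys, k, ridge=1e-6):
    """Validation MSE per fold, touching the data once for the statistics."""
    n = len(xs)
    bounds = [(i * n // k, (i + 1) * n // k) for i in range(k)]
    # One pass over the data: this cost does not grow with k.
    per_fold = [fold_statistics(xs[a:b], ys[a:b]) for a, b in bounds]
    total_xx = sum(stats[0] for stats in per_fold)
    total_xy = sum(stats[1] for stats in per_fold)
    errors = []
    for (a, b), (xx, xy) in zip(bounds, per_fold):
        # Training statistics = totals minus the held-out fold; only this
        # cheap solve (and the validation pass) is repeated for each fold.
        weight = (total_xy - xy) / (total_xx - xx + ridge)
        mse = sum((weight * x - y) ** 2 for x, y in zip(xs[a:b], ys[a:b])) / (b - a)
        errors.append(mse)
    return errors


xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]  # noiseless toy data, hypothetical
errors = kfold_ridge_errors(xs, ys, k=4, ridge=0.0)
```

In a real ESN the scalars would become matrices built from the reservoir states, and the per-fold solve would be the part whose cost grows with the number of folds.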

**Data Context Adaptation for Accurate Recommendation with Additional Information**

Given a sparse rating matrix and an auxiliary matrix of users or items, how can we accurately predict missing ratings considering different data contexts of entities? Many previous studies have shown that utilizing additional information with rating data helps to improve performance. However, existing methods are limited in that 1) they ignore the fact that data contexts of rating and auxiliary matrices are different, 2) they have restricted capability of expressing independence information of users or items, and 3) they assume the relation between a user and an item is linear. We propose DaConA, a neural network based method for recommendation with a rating matrix and an auxiliary matrix. DaConA is designed with the following three main ideas. First, we propose a data context adaptation layer to extract pertinent features for different data contexts. Second, DaConA represents each entity with a latent interaction vector and a latent independence vector. Unlike in previous methods, neither of the two vectors is limited in size. Lastly, while previous matrix factorization based methods predict missing values through the inner product of latent vectors, DaConA learns a non-linear function of them via a neural network. We show that DaConA is a generalized algorithm that includes standard matrix factorization and collective matrix factorization as special cases. Through comprehensive experiments on real-world datasets, we show that DaConA provides state-of-the-art accuracy.

**The many Shapley values for model explanation**

The Shapley value has become a popular method to attribute the prediction of a machine-learning model on an input to its base features. The Shapley value [1] is known to be the unique method that satisfies certain desirable properties, and this motivates its use. Unfortunately, despite this uniqueness result, there is a multiplicity of Shapley values used in explaining a model’s prediction. This is because there are many ways to apply the Shapley value that differ in how they reference the model, the training data, and the explanation context. In this paper, we study an approach that applies the Shapley value to conditional expectations (CES) of sets of features (cf. [2]) that subsumes several prior approaches within a common framework. We provide the first algorithm for the general version of CES. We show that CES can result in counterintuitive attributions in theory and in practice (we study a diabetes prediction task); for instance, CES can assign non-zero attributions to features that are not referenced by the model. In contrast, we show that an approach called the Baseline Shapley (BS) does not exhibit counterintuitive attributions; we support this claim with a uniqueness (axiomatic) result. We show that BS is a special case of CES, and CES with an independent feature distribution coincides with a randomized version of BS. Thus, BS fits into the CES framework, but does not suffer from many of CES’s deficiencies.
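To illustrate the baseline variant, the sketch below computes exact Shapley attributions by averaging marginal contributions over all feature orderings, where an absent feature takes its value from a fixed baseline input. The toy model, the input, and the all-zero baseline are assumptions made for illustration, and exhaustive enumeration is only practical for a handful of features.

```python
from itertools import permutations


def baseline_shapley(model, x, baseline):
    """Exact Shapley attributions with absent features set to baseline values."""
    n = len(x)
    attributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in ordering:
            # Switch this feature from its baseline value to its input value.
            current[feature] = x[feature]
            output = model(current)
            attributions[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in attributions]


def toy_model(v):
    """Hypothetical model that never references the third feature."""
    return 2.0 * v[0] + v[0] * v[1]


attributions = baseline_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(attributions)
```

In this toy case the feature that the model never references receives zero attribution, and the attributions sum to the difference between the model output at the input and at the baseline.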

**Time series model selection with a meta-learning approach; evidence from a pool of forecasting algorithms**

One of the challenging questions in time series forecasting is how to find the best algorithm. In recent years, a recommender system scheme has been developed for time series analysis using a meta-learning approach. This system selects the best forecasting method with consideration of the time series characteristics. In this paper, we propose a novel approach focusing on some of the unanswered questions resulting from the use of meta-learning in time series forecasting. Three main gaps in previous works are addressed: analyzing various subsets of top forecasters as inputs for meta-learners; evaluating the effect of forecasting error measures; and assessing the role of the dimensionality of the feature space on the forecasting errors of meta-learners. All of these objectives are achieved with the help of a diverse state-of-the-art pool of forecasters and meta-learners. For this purpose, first, a pool of forecasting algorithms is implemented on the NN5 competition dataset and ranked based on the two error measures. Then, six machine-learning classifiers, known as meta-learners, are trained on the extracted features of the time series in order to assign the most suitable forecasting method for the various subsets of the pool of forecasters. Furthermore, two dimensionality reduction methods are implemented in order to investigate the role of feature space dimension on the performance of meta-learners. In general, it was found that meta-learners were able to defeat all of the individual benchmark forecasters; this performance was improved even after applying the feature selection method.

**Transfer Learning for Relation Extraction via Relation-Gated Adversarial Learning**

Relation extraction aims to extract relational facts from sentences. Previous models mainly rely on manually labeled datasets, seed instances or human-crafted patterns, and distant supervision. However, human annotation is expensive, human-crafted patterns suffer from semantic drift, and distant supervision samples are usually noisy. Domain adaptation methods enable leveraging labeled data from a different but related domain. However, different domains usually have various textual relation descriptions and different label spaces (the source label space is usually a superset of the target label space). To solve these problems, we propose a novel model of relation-gated adversarial learning for relation extraction, which extends adversarial-based domain adaptation. Experimental results show that the proposed approach outperforms previous domain adaptation methods regarding partial domain adaptation and can improve the accuracy of distantly supervised relation extraction through fine-tuning.

**Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement**

Generic image recognition is a fundamental and fairly important visual problem in computer vision. One of the major challenges of this task lies in the fact that a single image usually contains multiple objects while the labels are still one-hot; another is that labels are noisy and sometimes missing when annotated by humans. In this paper, we focus on tackling these challenges, which accompany two different image recognition problems, multi-model ensemble and noisy data recognition, with a unified framework. As is well known, the best performing deep neural models are usually ensembles of multiple base-level networks, as they can mitigate the variation or noise contained in the dataset. Unfortunately, the space required to store these many networks, and the time required to execute them at runtime, prohibit their use in applications where test sets are large (e.g., ImageNet). In this paper, we present a method for compressing large, complex trained ensembles into a single network, where the knowledge from a variety of trained deep neural networks (DNNs) is distilled and transferred to a single DNN. In order to distill diverse knowledge from different trained (teacher) models, we propose to use an adversarial-based learning strategy where we define a block-wise training loss to guide and optimize the predefined student network to recover the knowledge in teacher models, and to promote the discriminator network to distinguish teacher vs. student features simultaneously. Extensive experiments on CIFAR-10/100, SVHN, ImageNet and iMaterialist Challenge Dataset demonstrate the effectiveness of our MEAL method. On ImageNet, our ResNet-50 based MEAL achieves top-1/5 21.79%/5.99% val error, which outperforms the original model by 2.06%/1.14%. On iMaterialist Challenge Dataset, our MEAL obtains a remarkable improvement of top-3 1.15% (official evaluation metric) on a strong baseline model of ResNet-101.

**Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes**

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of q-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness efficiently.
