An Online Learning Algorithm for Demand Response in Smart Grid

Cited by: 120

Authors:

Bahrami, Shahab [1]

Wong, Vincent W. S. [1]

Huang, Jianwei [2]

Affiliations:

[1] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T 1Z4, Canada

[2] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Hong Kong, Peoples R China

Source:

IEEE Transactions on Smart Grid | 2018 / Vol. 9 / No. 5

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords:

Demand response; real-time pricing; partially observable stochastic game; online learning; actor-critic method; MANAGEMENT; OPTIMIZATION; USERS;

DOI:

10.1109/TSG.2017.2667599

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

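Discipline classification codes:

080906 [Electromagnetic information functional materials and structures]; 082806 [Agricultural information and electrical engineering]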

Abstract:

Demand response program with real-time pricing can encourage electricity users toward scheduling their energy usage to off-peak hours. A user needs to schedule the energy usage of his appliances in an online manner since he may not know the energy prices and the demand of his appliances ahead of time. In this paper, we study the users' long-term load scheduling problem and model the changes of the price information and load demand as a Markov decision process, which enables us to capture the interactions among users as a partially observable stochastic game. To make the problem tractable, we approximate the users' optimal scheduling policy by the Markov perfect equilibrium (MPE) of a fully observable stochastic game with incomplete information. We develop an online load scheduling learning (LSL) algorithm based on the actor-critic method to determine the users' MPE policy. When compared with the benchmark of not performing demand response, simulation results show that the LSL algorithm can reduce the expected cost of users and the peak-to-average ratio in the aggregate load by 28% and 13%, respectively. When compared with the short-term scheduling policies, the users with the long-term policies can reduce their expected cost by 17%.


Pages: 4712-4725

Page count: 14

