Reinforcement Learning Applied to an Electric Water Heater: From Theory to Practice

Cited by: 124
Authors:
Ruelens, F. [1 ]
Claessens, B. J. [2 ]
Quaiyum, S. [3 ]
De Schutter, B. [4 ]
Babuska, R. [4 ]
Belmans, R. [1 ]
Affiliations:
[1] KU Leuven EnergyVille, Dept Elect Engn, B-3001 Heverlee, Belgium
[2] Vito EnergyVille, Dept Energy, B-2400 Mol, Belgium
[3] Uppsala Univ, Dept Elect Engn, S-75105 Uppsala, Sweden
[4] Delft Univ Technol, Delft Ctr Syst & Control, NL-2600 AA Delft, Netherlands
Keywords:
Auto-encoder network; demand response; electric water heater; fitted Q-iteration; machine learning; reinforcement learning; MANAGEMENT; SYSTEM;
DOI:
10.1109/TSG.2016.2640184
Chinese Library Classification:
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline codes:
080906 [Electromagnetic Information Functional Materials and Structures]; 082806 [Agricultural Information and Electrical Engineering];
Abstract:
Electric water heaters have the ability to store energy in their water buffer without impacting the comfort of the end user. This feature makes them a prime candidate for residential demand response. However, the stochastic and nonlinear dynamics of electric water heaters make it challenging to harness their flexibility. Driven by this challenge, this paper formulates the underlying sequential decision-making problem as a Markov decision process and uses techniques from reinforcement learning. Specifically, we apply an auto-encoder network to find a compact feature representation of the sensor measurements, which helps to mitigate the curse of dimensionality. A well-known batch reinforcement learning technique, fitted Q-iteration, is used to find a control policy given this feature representation. In a simulation-based experiment using an electric water heater with 50 temperature sensors, the proposed method was able to achieve good policies much faster than when using the full state information. In a laboratory experiment, we apply fitted Q-iteration to an electric water heater with eight temperature sensors; further reducing the state vector did not improve the results of fitted Q-iteration. The results of the laboratory experiment, spanning 40 days, indicate that, compared to a thermostat controller, the presented approach was able to reduce the total cost of energy consumption of the electric water heater by 15%.
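To illustrate the batch technique named in the abstract, below is a minimal fitted Q-iteration sketch. It is a toy sketch under stated assumptions, not the paper's implementation: a synthetic one-dimensional "tank temperature" state stands in for the auto-encoder feature vector, there are two actions (off/heat), and the reward combines a hypothetical electricity cost with a comfort penalty below 40 °C. The regressor choice (extremely randomized trees) follows common fitted Q-iteration practice.

```python
# Toy fitted Q-iteration sketch (NOT the paper's code).
# Assumptions: 1-D temperature state, actions {0 = off, 1 = heat},
# reward = -electricity cost - comfort penalty when the tank is cold.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Batch of transitions (s, a, r, s'): heating raises the temperature
# but costs money; letting the tank cool risks a comfort penalty.
N = 2000
s = rng.uniform(30.0, 70.0, N)                 # tank temperature [deg C]
a = rng.integers(0, 2, N)                      # 0 = off, 1 = heat
s_next = s + np.where(a == 1, 2.0, -1.0) + rng.normal(0.0, 0.2, N)
price = rng.uniform(0.1, 0.3, N)               # toy electricity price
r = -price * a - 10.0 * (s_next < 40.0)        # cost + comfort penalty

gamma = 0.95
X = np.column_stack([s, a])                    # regressor input: (state, action)
q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, r)  # Q_1 = r

for _ in range(20):                            # fitted Q-iteration loop
    # Bellman targets: r + gamma * max_a' Q(s', a')
    q_next = np.maximum(
        q.predict(np.column_stack([s_next, np.zeros(N)])),
        q.predict(np.column_stack([s_next, np.ones(N)])),
    )
    q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(
        X, r + gamma * q_next
    )

def act(temp):
    """Greedy policy from the fitted Q-function."""
    return int(q.predict([[temp, 1.0]])[0] > q.predict([[temp, 0.0]])[0])

print(act(35.0), act(65.0))  # expect heating when cold, off when warm
```

Each iteration re-fits the Q-function on Bellman targets computed from the fixed batch, which is what makes the method "batch": no new interaction with the water heater is needed while the policy is computed.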
Pages: 3792-3800
Page count: 9