In this paper, agent-based simulation is employed to study power market operation under two alternative pricing systems: uniform and discriminatory (pay-as-bid). Power suppliers are modeled as adaptive agents that learn through interaction with their environment by following a Reinforcement Learning algorithm. The SA-Q-Learning algorithm, a slightly modified version of the popular Q-Learning, is used; it addresses the difficult problem of balancing exploration and exploitation, and it was chosen for its quick convergence. A test system with five supplier-agents is used to study the suppliers' behavior under the uniform and the pay-as-bid pricing systems.
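
To make the difference between the two pricing systems concrete, the following is a minimal sketch of how supplier revenues could be settled under each rule. It is an illustration under stated assumptions, not the market model used in the paper: demand is taken as fixed and inelastic, each supplier submits a single price-quantity offer, there are no network constraints, and the uniform price is assumed to equal the highest accepted offer. The names `Offer`, `clear_market`, and `revenues`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    supplier: str    # hypothetical supplier label
    price: float     # offer price, e.g. in $/MWh (assumed unit)
    quantity: float  # offered quantity, e.g. in MWh (assumed unit)


def clear_market(offers, demand):
    """Dispatch offers in merit order (cheapest first) until demand is met.

    Returns a list of (offer, dispatched_quantity) pairs.
    Assumes fixed, inelastic demand and one offer per supplier.
    """
    dispatched = []
    remaining = demand
    for offer in sorted(offers, key=lambda o: o.price):
        if remaining <= 0:
            break
        quantity = min(offer.quantity, remaining)
        dispatched.append((offer, quantity))
        remaining -= quantity
    return dispatched


def revenues(dispatched, rule):
    """Settle dispatched offers under 'uniform' or 'pay_as_bid' pricing."""
    if not dispatched:
        return {}
    if rule == "uniform":
        # Assumption: the uniform price is the highest accepted offer price.
        clearing_price = max(offer.price for offer, _ in dispatched)
        return {offer.supplier: clearing_price * q for offer, q in dispatched}
    if rule == "pay_as_bid":
        # Each dispatched supplier is paid its own offer price.
        return {offer.supplier: offer.price * q for offer, q in dispatched}
    raise ValueError(f"unknown pricing rule: {rule!r}")


# Illustrative five-supplier example (all values are made up).
offers = [
    Offer("G1", 10.0, 100.0),
    Offer("G2", 15.0, 100.0),
    Offer("G3", 20.0, 100.0),
    Offer("G4", 25.0, 100.0),
    Offer("G5", 30.0, 100.0),
]
dispatched = clear_market(offers, demand=250.0)
uniform_revenue = revenues(dispatched, "uniform")
pay_as_bid_revenue = revenues(dispatched, "pay_as_bid")
```

With these made-up offers, G1, G2, and G3 are dispatched. Under the uniform rule all three are paid the price of the marginal offer (20.0), whereas under pay-as-bid each receives only its own offer price, which is why a learning supplier has an incentive to bid differently under the two systems.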