+
+Tensorforce has been used within fluid mechanics to perform active flow control. Active flow control is known to be challenging due to the combination of non-linearity, high dimensionality, and time dependence inherent in fluid mechanics, which makes DRL a promising new tool within this research field.
+
+This project performs flow control of the 2D Kármán Vortex Street with Deep Reinforcement Learning. The simulations are done with FEniCS, while the reinforcement learning is performed with the Tensorforce library. You will need FEniCS, TensorFlow, Tensorforce, and Gmsh available on your system in order to run the code.
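+
+Below is a rough sketch of how such a coupling between a CFD solver and Tensorforce might look. It is only illustrative: the solver object passed in (with `reset`/`step`/`probe`/`drag` methods) is a hypothetical stand-in for the FEniCS simulation, and the state/action shapes are placeholders; the actual project defines its own environment classes (see the linked repository).
+
+```python
+from tensorforce.environments import Environment
+
+
+class JetControlEnvironment(Environment):
+    """Wraps a (hypothetical) FEniCS flow solver as a Tensorforce environment."""
+
+    def __init__(self, solver):
+        super().__init__()
+        self.solver = solver  # hypothetical CFD solver object
+
+    def states(self):
+        # e.g. readings from pressure probes placed around the cylinder
+        return dict(type='float', shape=(151,))
+
+    def actions(self):
+        # e.g. mass flow rates of two synthetic jets on the cylinder surface
+        return dict(type='float', shape=(2,), min_value=-1.0, max_value=1.0)
+
+    def reset(self):
+        self.solver.reset()
+        return self.solver.probe()
+
+    def execute(self, actions):
+        self.solver.step(actions)     # advance the simulation with the chosen jet actuation
+        states = self.solver.probe()  # next observation from the pressure probes
+        reward = -self.solver.drag()  # e.g. reward as negative drag
+        return states, False, reward  # states, terminal, reward
+```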
+
+[Paper](https://arxiv.org/abs/1808.07664) | [GitHub Project](https://github.com/jerabaul29/Cylinder2DFlowControlDRL)
+
+
+
+
+### DeepCrawl
+
+
+DeepCrawl is a turn-based strategy game for mobile platforms, where all the enemies are trained with Deep Reinforcement Learning algorithms. The game is designed to be hard, yet fair: the player will have to explore the dungeons and defeat all the guardians of the rooms, paying attention to every move the AI makes!
+
+The game was developed in Unity, while the AI was built with Tensorforce and Unity ML-Agents.
+
+The project was part of a Master's thesis in Computer Engineering at Università degli Studi di Firenze, titled *"DeepCrawl: Deep Reinforcement Learning for turn-based strategy games"*.
+
+[GitHub Project](https://github.com/SestoAle/DeepCrawl)
+
+
+
+
+### SimPyFab
+
+
+Complex job shop manufacturing systems are motivated by the manufacturing characteristics of semiconductor wafer fabrication. A job shop consists of several machines (processing resources) that process jobs (products, orders) based on a defined list of process steps. After every process step, the job is dispatched and transported to the next processing machine. Machines are usually grouped into sub-areas by processing type, i.e. machines with similar processing capabilities are located next to each other.
+
+This framework provides an integrated simulation and reinforcement learning model to investigate the potential of data-driven reinforcement learning in production planning and control of complex job shop systems. The simulation model allows parametrization of a broad range of job shop-like manufacturing systems. Furthermore, performance statistics and logging of performance indicators are provided. Reinforcement learning is implemented to control order dispatching, and several dispatching heuristics used in practice provide benchmarks.
+
+[GitHub Project](https://github.com/AndreasKuhnle/SimRLFab)
+
+
+
+
+### Navbot: Using RGB Image as Visual Input for Mapless Robot Navigation
+
+
+A collection for mapless robot navigation using RGB images as visual input. It contains the test environment and motion planners, aiming to realize all three levels of mapless navigation:
+
+1. memorizing efficiently;
+2. from memorizing to reasoning;
+3. more powerful reasoning.
+
+[GitHub Project](https://github.com/marooncn/navbot)
+
+
+
+
+### Adaptive Behavior Generation for Autonomous Driving
+
+
+Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as well as the surrounding environment. Therefore, it is hard to model solely based on expert knowledge. In this work we use Deep Reinforcement Learning to learn maneuver decisions based on a compact semantic state representation. This ensures a consistent model of the environment across scenarios, as well as a behavior adaptation function, enabling on-line changes
+of desired behaviors without re-training. The input for the neural network is a simulated object list similar to that of Radar or Lidar sensors, superimposed by a relational semantic scene description. The state as well as the reward are extended by a behavior adaptation function and a parameterization, respectively. With little expert knowledge and a set of mid-level actions, the agent is capable of adhering to traffic rules and learns to drive safely in a variety of situations.
+
+[Paper](https://arxiv.org/abs/1809.03214)
+
+
+
+
+### Bitcoin trading bot
+
+
+This project is a Tensorforce-based Bitcoin trading bot (algo-trader). It uses deep reinforcement learning to automatically buy/sell/hold BTC based on what it learns about BTC price history. Most blogs / tutorials / boilerplate BTC trading-bots you'll find out there use supervised machine learning, likely an LSTM. That's well and good - supervised learning learns what makes a time-series tick so it can predict the next-step future. But that's where it stops. It says "the price will go up next", but it doesn't tell you what to do. Well that's simple, buy, right? Ah, buy low, sell high - it's not that simple. Thousands of lines of code go into trading rules, "if this then that" style. Reinforcement learning takes supervised to the next level - it embeds supervised within its architecture, and then decides what to do. It's beautiful stuff!
+
+This project goes with Episode 26+ of [Machine Learning Guide](http://ocdevel.com/mlg). Those episodes are a tutorial for this project, including an intro to Deep RL, hyperparameter decisions, etc.
+
+[GitHub Project](https://github.com/lefnire/tforce_btc_trader)
+
+
+
+
+### TensorTrade: Trade Efficiently with Reinforcement Learning
+
+
+TensorTrade is an open source Python framework for building, training, evaluating, and deploying robust trading algorithms using reinforcement learning. The framework focuses on being highly composable and extensible, to allow the system to scale from simple trading strategies on a single CPU to complex investment strategies run across a distributed set of HPC machines.
+
+Under the hood, the framework uses many of the APIs from existing machine learning libraries to maintain high quality data pipelines and learning models. One of the main goals of TensorTrade is to enable fast experimentation with algorithmic trading strategies, by leveraging the existing tools and pipelines provided by numpy, pandas, gym, keras, and tensorflow.
+
+[GitHub Project](https://github.com/notadamking/tensortrade)
diff --git a/README.md b/README.md
index fe515b587..b258e4145 100644
--- a/README.md
+++ b/README.md
@@ -1,239 +1,237 @@
-TensorForce: A TensorFlow library for applied reinforcement learning
-====================================================================
+# Tensorforce: a TensorFlow library for applied reinforcement learning
[](http://tensorforce.readthedocs.io/en/latest/)
-[](https://docs.google.com/forms/d/1_UD5Pb5LaPVUviD0pO0fFcEnx_vwenvuc00jmP2rRIc/)
-[](https://travis-ci.org/reinforceio/tensorforce)
-[](https://github.com/reinforceio/tensorforce/blob/master/LICENSE)
-
-Introduction
-------------
-
-TensorForce is an open source reinforcement learning library focused on
-providing clear APIs, readability and modularisation to deploy
-reinforcement learning solutions both in research and practice.
-TensorForce is built on top of TensorFlow and compatible with Python 2.7
-and >3.5 and supports multiple state inputs and multi-dimensional
-actions to be compatible with any type of simulation or application environment.
-
-TensorForce also aims to move all reinforcement learning logic into the
-TensorFlow graph, including control flow. This both reduces dependencies
-on the host language (Python), thus enabling portable computation graphs that
-can be used in other languages and contexts, and improves performance.
-
-More information on architecture can also be found [on our blog](https://reinforce.io/blog/).
-Please also read the [TensorForce FAQ](https://github.com/reinforceio/tensorforce/blob/master/FAQ.md)
-if you encounter problems or have questions.
-
-Finally, read the latest update notes (UPDATE_NOTES.md) for an idea of
-how the project is evolving, especially concerning majorAPI breaking updates.
-We recently (20th February) merged a major branch which moves memories
-and all remaining structures into TensorFlow variables. This causes a number
-of breaking API change (see updated configurations, examples, and tests), but
-improves performance. It further enables more expressive update semantics,
-e.g. episode based instead of fixed time step based.
-
-The main difference to existing libraries is a strict separation of
-environments, agents and update logic that facilitates usage in
-non-simulation environments. Further, research code often relies on
-fixed network architectures that have been used to tackle particular
-benchmarks. TensorForce is built with the idea that (almost) everything
-should be optionally configurable and in particular uses value function
-template configurations to be able to quickly experiment with new
-models. The goal of TensorForce is to provide a practitioner's
-reinforcement learning framework that integrates into modern software
-service architectures.
-
-TensorForce is actively being maintained and developed both to
-continuously improve the existing code as well as to reflect new
-developments as they arise. The aim is not to
-include every new trick but to adopt methods as
-they prove themselves stable.
-
-Features
---------
-
-TensorForce currently integrates with the OpenAI Gym API, OpenAI
-Universe, DeepMind lab, ALE and Maze explorer. The following algorithms are available (all
-policy methods both continuous/discrete and using a Beta distribution for bounded actions).
-
-- A3C using distributed TensorFlow or a multithreaded runner - now as part of our generic Model
- usable with different agents. - [paper](https://arxiv.org/pdf/1602.01783.pdf)
-- Trust Region Policy Optimization (TRPO) - ```trpo_agent``` - [paper](https://arxiv.org/abs/1502.05477)
-- Normalised Advantage functions (NAFs) - ```naf_agent``` - [paper](https://arxiv.org/pdf/1603.00748.pdf)
-- DQN - ```dqn_agent``` - [paper](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)
-- Double-DQN - ```ddqn_agent``` - [paper](https://arxiv.org/abs/1509.06461)
-- N-step DQN - ```dqn_nstep_agent```
-- Vanilla Policy Gradients (VPG/ REINFORCE) - ```vpg_agent```- [paper](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)
-- Actor-critic models - via `baseline` for any policy gradient model (see next list) - [paper]()
-- Deep Q-learning from Demonstration (DQFD) -
- [paper](https://arxiv.org/abs/1704.03732)
-- Proximal Policy Optimisation (PPO) - ```ppo_agent``` - [paper](https://arxiv.org/abs/1707.06347)
-- Random and constant agents for sanity checking: ```random_agent```, ```constant_agent```
-
-Other heuristics and their respective config key that can be turned on where sensible:
-
-- Generalized advantage estimation - ```gae_lambda``` - [paper](https://arxiv.org/abs/1506.02438)
-- Prioritizied experience replay - memory type ```prioritized_replay``` - [paper](https://arxiv.org/abs/1511.05952)
-- Bounded continuous actions are mapped to Beta distributions instead of Gaussians - [paper](http://proceedings.mlr.press/v70/chou17a/chou17a.pdf)
-- Baseline / actor-critic modes: Based on raw states (```states```) or on network output (```network```). MLP (```mlp```), CNN (```cnn```) or custom network (```custom```). Special case for mode ```states```: baseline per state + linear combination layer (via ```baseline=dict(state1=..., state2=..., etc)```).
-- Generic pure TensorFlow optimizers, most models can be used with natural gradient and evolutionary optimizers
-- Preprocessing modes: ```normalize```, ```standardize```, ```grayscale```, ```sequence```, ```clip```,
- ```divide```, ```image_resize```
-- Exploration modes: ```constant```,```linear_decay```, ```epsilon_anneal```, ```epsilon_decay```,
- ```ornstein_uhlenbeck```
-
-Installation
-------------
-
-We uploaded the latest stable version of TensorForce to PyPI. To install, just execute:
+[](https://gitter.im/tensorforce/community)
+[](https://travis-ci.com/tensorforce/tensorforce)
+[](https://pypi.org/project/Tensorforce/)
+[](https://pypi.org/project/Tensorforce/)
+[](https://github.com/tensorforce/tensorforce/blob/master/LICENSE)
+[](https://github.com/sponsors/AlexKuhnle)
+[](https://liberapay.com/TensorforceTeam/donate)
+
+
+**This project is not maintained any longer!**
+
+
+#### Introduction
+
+Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of [Google's TensorFlow framework](https://www.tensorflow.org/) and requires Python 3.
+
+Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:
+
+- **Modular component-based design**: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at some cost of faithfully resembling details of the introducing paper.
+- **Separation of RL algorithm and application**: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
+- **Full-on TensorFlow models**: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.
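+
+As a minimal sketch of the second point above, an agent can be specified purely in terms of state and action spaces, without referring to any environment object (as also hinted by the comment in the quickstart further below; the concrete specification values here are placeholders):
+
+```python
+from tensorforce import Agent
+
+# Agent defined solely by state/action specifications, independent of any environment
+agent = Agent.create(
+    agent='ppo',
+    states=dict(type='float', shape=(8,)),
+    actions=dict(type='int', num_values=4),
+    max_episode_timesteps=100,
+    batch_size=10
+)
+```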
+
+
+
+#### Quicklinks
+
+- [Documentation](http://tensorforce.readthedocs.io) and [update notes](https://github.com/tensorforce/tensorforce/blob/master/UPDATE_NOTES.md)
+- [Contact](mailto:tensorforce.team@gmail.com) and [Gitter channel](https://gitter.im/tensorforce/community)
+- [Benchmarks](https://github.com/tensorforce/tensorforce/blob/master/benchmarks) and [projects using Tensorforce](https://github.com/tensorforce/tensorforce/blob/master/PROJECTS.md)
+- [Roadmap](https://github.com/tensorforce/tensorforce/blob/master/ROADMAP.md) and [contribution guidelines](https://github.com/tensorforce/tensorforce/blob/master/CONTRIBUTING.md)
+- [GitHub Sponsors](https://github.com/sponsors/AlexKuhnle) and [Liberapay](https://liberapay.com/TensorforceTeam/donate)
+
+
+
+#### Table of contents
+
+- [Installation](#installation)
+- [Quickstart example code](#quickstart-example-code)
+- [Command line usage](#command-line-usage)
+- [Features](#features)
+- [Environment adapters](#environment-adapters)
+- [Support, feedback and donating](#support-feedback-and-donating)
+- [Core team and contributors](#core-team-and-contributors)
+- [Cite Tensorforce](#cite-tensorforce)
+
+
+
+## Installation
+
+A stable version of Tensorforce is periodically updated on PyPI and installed as follows:
```bash
-pip install tensorforce
+pip3 install tensorforce
```
-If you want to use the latest version from GitHub, use:
-
+To always use the latest version of Tensorforce, install the GitHub version instead:
```bash
-git clone git@github.com:reinforceio/tensorforce.git
-cd tensorforce
-pip install -e .
+git clone https://github.com/tensorforce/tensorforce.git
+pip3 install -e tensorforce
```
-TensorForce is built on [Google's Tensorflow](https://www.tensorflow.org/). The installation command assumes
-that you have `tensorflow` or `tensorflow-gpu` installed.
+**Note on installation on M1 Macs:** At the moment Tensorflow, which is a core dependency of Tensorforce, cannot be installed on M1 Macs directly. Follow the ["M1 Macs" section](https://tensorforce.readthedocs.io/en/latest/basics/installation.html) in the documentation for a workaround.
-Alternatively, you can use the following commands to install the tensorflow dependency.
+Environments require additional packages for which there are setup options available (`ale`, `gym`, `retro`, `vizdoom`, `carla`; or `envs` for all environments), however, some require additional tools to be installed separately (see [environments documentation](http://tensorforce.readthedocs.io)). Other setup options include `tfa` for [TensorFlow Addons](https://www.tensorflow.org/addons) and `tune` for [HpBandSter](https://github.com/automl/HpBandSter) required for the `tune.py` script.
-To install TensorForce with `tensorflow` (cpu), use:
+**Note on GPU usage:** Different from (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on environment and agent configuration. In particular for environments with low-dimensional state spaces (i.e., no images), it is hence worth trying to run on CPU only.
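+
+One simple way to enforce CPU-only execution, in case that turns out to be faster for a given setup, is to hide the GPUs from TensorFlow before creating the agent (this is plain TensorFlow API, suggested here as a workaround rather than a Tensorforce-specific mechanism):
+
+```python
+import tensorflow as tf
+
+# Hide all GPUs so that TensorFlow, and hence Tensorforce, runs on CPU only
+tf.config.set_visible_devices([], 'GPU')
+```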
-```bash
-# PyPI install
-pip install tensorforce[tf]
-# Local install
-pip install -e .[tf]
-```
-To install TensorForce with `tensorflow-gpu` (gpu), use:
+## Quickstart example code
-```bash
-# PyPI install
-pip install tensorforce[tf_gpu]
+```python
+from tensorforce import Agent, Environment
+
+# Pre-defined or custom environment
+environment = Environment.create(
+    environment='gym', level='CartPole', max_episode_timesteps=500
+)
-# Local install
-pip install -e .[tf_gpu]
+# Instantiate a Tensorforce agent
+agent = Agent.create(
+    agent='tensorforce',
+    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
+    memory=10000,
+    update=dict(unit='timesteps', batch_size=64),
+    optimizer=dict(type='adam', learning_rate=3e-4),
+    policy=dict(network='auto'),
+    objective='policy_gradient',
+    reward_estimation=dict(horizon=20)
+)
+
+# Train for 300 episodes
+for _ in range(300):
+
+    # Initialize episode
+    states = environment.reset()
+    terminal = False
+
+    while not terminal:
+        # Episode timestep
+        actions = agent.act(states=states)
+        states, terminal, reward = environment.execute(actions=actions)
+        agent.observe(terminal=terminal, reward=reward)
+
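+# Optional: evaluate the trained agent with greedy, non-training episodes.
+# This is a sketch, assuming the documented `independent` and `deterministic`
+# flags of agent.act(); run it before closing agent and environment.
+sum_rewards = 0.0
+for _ in range(100):
+    states = environment.reset()
+    internals = agent.initial_internals()
+    terminal = False
+    while not terminal:
+        actions, internals = agent.act(
+            states=states, internals=internals, independent=True, deterministic=True
+        )
+        states, terminal, reward = environment.execute(actions=actions)
+        sum_rewards += reward
+print('Mean evaluation reward:', sum_rewards / 100)
+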
+agent.close()
+environment.close()
```
-To update TensorForce, use `pip install --upgrade tensorforce` for the PyPI
-version, or run `git pull` in the tensorforce directory if you cloned the
-GitHub repository.
-Please note that we did not include OpenAI Gym/Universe/DeepMind lab in
-the default install script because not everyone will want to use these.
-Please install them as required, usually via pip.
-Examples and documentation
---------------------------
-For a quick start, you can run one of our example scripts using the
-provided configurations, e.g. to run the TRPO agent on CartPole, execute
-from the examples folder:
+## Command line usage
+
+Tensorforce comes with a range of [example configurations](https://github.com/tensorforce/tensorforce/tree/master/benchmarks/configs) for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular [Proximal Policy Optimization (PPO) algorithm](https://arxiv.org/abs/1707.06347) on the [OpenAI Gym CartPole environment](https://gym.openai.com/envs/CartPole-v1/), execute the following line:
```bash
-python examples/openai_gym.py CartPole-v0 -a examples/configs/ppo.json -n examples/configs/mlp2_network.json
+python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
+ --level CartPole-v1 --episodes 100
```
-Documentation is available at
-[ReadTheDocs](http://tensorforce.readthedocs.io). We also have tests
-validating models on minimal environments which can be run from the main
-directory by executing `pytest`{.sourceCode}.
+For more information check out the [documentation](http://tensorforce.readthedocs.io).
-Create and use agents
----------------------
-To use TensorForce as a library without using the pre-defined simulation
-runners, simply install and import the library, then create an agent and
-use it as seen below (see documentation for all optional parameters):
-```python
-from tensorforce.agents import PPOAgent
-
-# Create a Proximal Policy Optimization agent
-agent = PPOAgent(
- states_spec=dict(type='float', shape=(10,)),
- actions_spec=dict(type='int', num_actions=10),
- network_spec=[
- dict(type='dense', size=64),
- dict(type='dense', size=64)
- ],
- batch_size=1000,
- step_optimizer=dict(
- type='adam',
- learning_rate=1e-4
- )
-)
+## Features
-# Get new data from somewhere, e.g. a client to a web app
-client = MyClient('http://127.0.0.1', 8080)
+- **Network layers**: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; *plus* support of Keras layers.
+- **Network architecture**: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register/retrieve layer, plus support for arbitrary architectures.
+- **Memory types**: Simple batch buffer memory, random replay memory.
+- **Policy distributions**: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.
+- **Reward estimation**: Configuration options for estimation horizon, future reward discount, state/state-action/advantage estimation, and for whether to consider terminal and horizon states.
+- **Training objectives**: (Deterministic) policy gradient, state-(action-)value approximation.
+- **Optimization algorithms**: Various gradient-based optimizers provided by TensorFlow like Adam/AdaDelta/RMSProp/etc, evolutionary optimizer, natural-gradient-based optimizer, plus a range of meta-optimizers.
+- **Exploration**: Randomized actions, sampling temperature, variable noise.
+- **Preprocessing**: Clipping, deltafier, sequence, image processing.
+- **Regularization**: L2 and entropy regularization.
+- **Execution modes**: Parallelized execution of multiple environments based on Python's `multiprocessing` and `socket`.
+- **Optimized act-only SavedModel extraction**.
+- **TensorBoard support**.
-# Poll new state from client
-state = client.get_state()
+By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated:
-# Get prediction from agent, execute
-action = agent.act(state)
-reward = client.execute(action)
+- Q-learning: [Deep Q-learning](https://www.nature.com/articles/nature14236), [Double-DQN](https://arxiv.org/abs/1509.06461), [Dueling DQN](https://arxiv.org/abs/1511.06581), [n-step DQN](https://arxiv.org/abs/1602.01783), [Normalised Advantage Function (NAF)](https://arxiv.org/abs/1603.00748)
+- Policy gradient: [vanilla policy-gradient / REINFORCE](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf), [Actor-critic and A3C](https://arxiv.org/abs/1602.01783), [Proximal Policy Optimization](https://arxiv.org/abs/1707.06347), [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477), [Deterministic Policy Gradient](https://arxiv.org/abs/1509.02971)
-# Add experience, agent automatically updates model according to batch size
-agent.observe(reward=reward, terminal=False)
-```
+Note that in general the replication is not 100% faithful, since the models as described in the corresponding paper often involve additional minor tweaks and modifications which are hard to support with a modular design (and, arguably, also questionable whether it is important/desirable to support them). On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce.
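+
+For instance, a DQN-style agent can be created via the `dqn` agent shorthand rather than by assembling the generic `tensorforce` agent manually; the argument values below are illustrative, not tuned:
+
+```python
+from tensorforce import Agent, Environment
+
+environment = Environment.create(
+    environment='gym', level='CartPole-v1', max_episode_timesteps=500
+)
+
+# DQN-style agent combining replay memory, epsilon exploration and a Q-value objective
+agent = Agent.create(
+    agent='dqn',
+    environment=environment,
+    memory=10000,
+    batch_size=32,
+    exploration=0.1
+)
+```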
+
+
+
+## Environment adapters
+
+- [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment), a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.
+- [CARLA](https://github.com/carla-simulator/carla), an open-source simulator for autonomous driving research.
+- [OpenAI Gym](https://gym.openai.com/), a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.
+- [OpenAI Retro](https://github.com/openai/retro), lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games.
+- [OpenSim](http://osim-rl.stanford.edu/), reinforcement learning with musculoskeletal models.
+- [PyGame Learning Environment](https://github.com/ntasfi/PyGame-Learning-Environment/), learning environment which allows a quick start to Reinforcement Learning in Python.
+- [ViZDoom](https://github.com/mwydmuch/ViZDoom), allows developing AI bots that play Doom using only the visual information.
-Benchmarks
-----------
-We provide a seperate repository for benchmarking our algorithm implementations at
-[reinforceio/tensorforce-benchmark](https://github.com/reinforceio/tensorforce-benchmark).
+## Support, feedback and donating
-Docker containers for benchmarking (CPU and GPU) are available.
+Please get in touch via [mail](mailto:tensorforce.team@gmail.com) or on [Gitter](https://gitter.im/tensorforce/community) if you have questions, feedback, ideas for features/collaboration, or if you seek support for applying Tensorforce to your problem.
-This is a sample output for `CartPole-v0`, comparing VPG, TRPO and PPO:
+If you want to support the Tensorforce core team (see below), please also consider donating: [GitHub Sponsors](https://github.com/sponsors/AlexKuhnle) or [Liberapay](https://liberapay.com/TensorforceTeam/donate).
-
-Please refer to the [tensorforce-benchmark](https://github.com/reinforceio/tensorforce-benchmark) repository
-for more information.
+## Core team and contributors
-Community and contributions
----------------------------
+Tensorforce is currently developed and maintained by [Alexander Kuhnle](https://github.com/AlexKuhnle).
-TensorForce is developed by [reinforce.io](https://reinforce.io), a new
-project focused on providing reinforcement learning software
-infrastructure. For any questions, get in touch at
-``.