SamNav
SA-2018-06 · Entry 0097 · Public essay

Reinforcement Learning Progress

The post discusses an OpenAI reinforcement learning result: a team of five agents trained with Proximal Policy Optimization (PPO) played Dota and defeated semi-professional players. It presents this as evidence that deep reinforcement learning, through self-training in simulated environments, can tackle complex problems, and as a step toward further advances in machine learning and general intelligence.

Published 2018-06-25 · 1 min read · 212 words · 3 topics
Before you set out · 3/5 Navigation Severity · Cartographic Incident · The route is real, but the signage gets weird enough to deserve a field note.
Excerpt · opening lines · pulled from source
Today, OpenAI released a new result. We used PPO (Proximal Policy Optimization), a general reinforcement learning algorithm invented by OpenAI, to train a team of 5 agents to play Dota and beat semi-pros. This is the gam
SamNav stores no post bodies. Only enough to orient you - the rest lives at the source.
Original source
Read the full essay on Sam Altman's blog
blog.samaltman.com/reinforcement-learning-progress
Last reachable 2026-04-25