
PKU-Alignment

We love sharing and open source, making AI safer.

PKU-Alignment Team

Large language models (LLMs) hold immense potential for general intelligence but also carry significant risks. As a research team at Peking University, we focus on alignment techniques for LLMs, such as safety alignment, to improve model safety and reduce toxicity.

You are welcome to follow our AI safety projects:

Pinned

  1. omnisafe (Public)

    JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research (a minimal usage sketch follows this list).

    Python · 1.1k stars · 146 forks

  2. safety-gymnasium (Public)

    NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

    Python · 533 stars · 76 forks

  3. safe-rlhf (Public)

    Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

    Python · 1.6k stars · 129 forks

  4. Safe-Policy-Optimization (Public)

    NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms

    Python · 392 stars · 58 forks
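
As a rough illustration of how these pinned tools fit together, the sketch below assumes OmniSafe's quickstart-style entry point (omnisafe.Agent) together with a Safety-Gymnasium task id; the algorithm name 'PPOLag' and the task 'SafetyPointGoal1-v0' are illustrative choices, not a prescribed configuration.

    # Minimal sketch (assumed quickstart-style OmniSafe usage):
    # train a constrained RL agent on a Safety-Gymnasium task.
    import omnisafe

    env_id = 'SafetyPointGoal1-v0'            # Safety-Gymnasium task that emits cost signals
    agent = omnisafe.Agent('PPOLag', env_id)  # PPO with a Lagrangian cost constraint
    agent.learn()                             # train under the configured cost limit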

Repositories

Showing 10 of 25 repositories
  • VLA-Arena (Public)

    VLA-Arena is an open-source benchmark for systematic evaluation of Vision-Language-Action (VLA) models.

    Python · 100 stars · Apache-2.0 · 3 forks · 0 open issues · 1 open PR · Updated Jan 12, 2026
  • SafeVLA (Public)

    [NeurIPS 2025 Spotlight] Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning.

    Python · 105 stars · 8 forks · 0 open issues · 0 open PRs · Updated Jan 11, 2026
  • safety-gymnasium (Public)

    NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

    Python · 533 stars · Apache-2.0 · 76 forks · 12 open issues · 2 open PRs · Updated Dec 4, 2025
  • align-anything (Public)

    Align Anything: Training All-modality Model with Feedback

    Python · 4,617 stars · Apache-2.0 · 508 forks · 29 open issues · 2 open PRs · Updated Nov 27, 2025
  • safe-rlhf (Public)

    Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

    Python · 1,574 stars · Apache-2.0 · 129 forks · 16 open issues · 2 open PRs · Updated Nov 24, 2025
  • MM-DeceptionBench (Public)

    0 stars · 0 forks · 0 open issues · 0 open PRs · Updated Sep 25, 2025
  • eval-anything (Public)
    Python · 21 stars · Apache-2.0 · 18 forks · 1 open issue · 2 open PRs · Updated Jul 26, 2025
  • llms-resist-alignment (Public)

    [ACL 2025 Best Paper] Language Models Resist Alignment

    Python · 41 stars · 1 fork · 0 open issues · 0 open PRs · Updated Jun 11, 2025
  • SAE-V (Public)

    [ICML 2025 Poster] SAE-V: Interpreting Multimodal Models for Enhanced Alignment

    12 stars · 0 forks · 0 open issues · 0 open PRs · Updated Jun 6, 2025
  • ProgressGym (Public)

    Alignment with a millennium of moral progress. Spotlight at the NeurIPS 2024 Datasets and Benchmarks Track.

    Python · 24 stars · MIT · 4 forks · 0 open issues · 0 open PRs · Updated Mar 30, 2025