Mayur Rathi
@github
⭐ 34.1k GitHub stars

Agent-Safety

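Agent-Safety is a code-focused AI skill. Its core value is a set of guidelines for building safe, governed AI agent systems. It helps developers solve practical coding problems, work more efficiently, automate repetitive tasks, and streamline workflows.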

Guidelines for building safe, governed AI agent systems. Apply when writing code that uses agent frameworks, tool-calling LLMs, or multi-agent orchestration to ensure proper safety boundaries, policy

Last verified on: 2026-05-30
mkdir -p ./skills/agent-safety && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/agent-safety/SKILL.md -o ./skills/agent-safety/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).

Skill Content

# Agent Safety & Governance


## Core Principles


- **Fail closed**: If a governance check errors or is ambiguous, deny the action rather than allowing it

- **Policy as configuration**: Define governance rules in YAML/JSON files, not hardcoded in application logic

- **Least privilege**: Agents should have the minimum tool access needed for their task

- **Append-only audit**: Never modify or delete audit trail entries — immutability enables compliance
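
The fail-closed principle can be sketched as below. This is a minimal illustration under stated assumptions, not the skill's own implementation: `is_allowed`, `broken_check`, and `ambiguous_check` are hypothetical names introduced here.

```python
from typing import Callable


def is_allowed(check: Callable[[str], object], tool_name: str) -> bool:
    """Return True only when the check explicitly returns True.

    Any exception or ambiguous (non-True) result is treated as a denial.
    """
    try:
        verdict = check(tool_name)
    except Exception:
        return False  # the governance check itself failed: deny
    return verdict is True  # None, "maybe", 1, etc. are all denied


def broken_check(tool_name: str) -> bool:
    # Hypothetical check that simulates an outage in the policy backend.
    raise RuntimeError("policy store unreachable")


def ambiguous_check(tool_name: str):
    # Hypothetical check that returns no clear verdict.
    return None


print(is_allowed(lambda name: name == "search", "search"))  # True
print(is_allowed(broken_check, "search"))                   # False
print(is_allowed(ambiguous_check, "search"))                # False
```

Comparing with `is True` rather than relying on truthiness is a deliberate choice in this sketch: a check that returns a non-empty string or a non-zero number is ambiguous, so it is denied.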


## Tool Access Controls


- Always define an explicit allowlist of tools an agent can use — never give unrestricted tool access

- Separate tool registration from tool authorization — the framework knows what tools exist, the policy controls which are allowed

- Use blocklists for known-dangerous operations (shell execution, file deletion, database DDL)

- Require human-in-the-loop approval for high-impact tools (send email, deploy, delete records)

- Enforce rate limits on tool calls per request to prevent infinite loops and resource exhaustion
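
One way the controls above might fit together is a single authorization step that runs before every tool call. The sketch below is an assumption-laden illustration: `ToolPolicy`, `RequestContext`, and the reason strings are hypothetical names, not part of any framework.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy object; field names are illustrative."""
    allowed_tools: frozenset
    blocked_tools: frozenset = frozenset()
    approval_required: frozenset = frozenset()
    max_calls_per_request: int = 25


@dataclass
class RequestContext:
    """Tracks per-request state so the call budget can be enforced."""
    policy: ToolPolicy
    calls_made: int = 0

    def authorize(self, tool: str, approved_by_human: bool = False) -> Tuple[bool, str]:
        if tool in self.policy.blocked_tools:
            return False, "blocked"
        if tool not in self.policy.allowed_tools:
            return False, "not_allowlisted"
        if tool in self.policy.approval_required and not approved_by_human:
            return False, "needs_approval"
        if self.calls_made >= self.policy.max_calls_per_request:
            return False, "rate_limited"
        self.calls_made += 1  # only allowed calls consume the budget
        return True, "allowed"


policy = ToolPolicy(
    allowed_tools=frozenset({"search", "send_email"}),
    blocked_tools=frozenset({"shell_exec"}),
    approval_required=frozenset({"send_email"}),
    max_calls_per_request=2,
)
ctx = RequestContext(policy)
print(ctx.authorize("search"))                              # (True, 'allowed')
print(ctx.authorize("shell_exec"))                          # (False, 'blocked')
print(ctx.authorize("send_email"))                          # (False, 'needs_approval')
print(ctx.authorize("send_email", approved_by_human=True))  # (True, 'allowed')
print(ctx.authorize("search"))                              # (False, 'rate_limited')
```

Registration stays separate from authorization here: the policy never needs to know how a tool is implemented, only its name.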


## Content Safety


- Scan all user inputs for threat signals before passing to the agent (data exfiltration, prompt injection, privilege escalation)

- Filter agent arguments for sensitive patterns: API keys, credentials, PII, SQL injection

- Use regex pattern lists that can be updated without code changes

- Check both the user's original prompt AND the agent's generated tool arguments
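
The scanning steps above could look roughly like this. The pattern names and regexes are illustrative assumptions (a real deployment would load them from a configuration file and would need far broader coverage), and `scan_text` and `scan_call` are hypothetical helpers.

```python
import json
import re

# Illustrative patterns only; in practice these would come from config.
PATTERNS = {
    "credential": r"(?i)(api_key|password)\s*[:=]",
    "sql_injection": r"(?i)\b(drop\s+table|union\s+select)\b",
    "prompt_injection": r"(?i)ignore (all )?previous instructions",
}
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def scan_text(text: str) -> list:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(text))


def scan_call(user_prompt: str, tool_args: dict) -> list:
    """Scan both the original prompt and the generated tool arguments."""
    args_text = json.dumps(tool_args, sort_keys=True)
    return sorted(set(scan_text(user_prompt)) | set(scan_text(args_text)))


print(scan_call("Summarize this article", {"query": "agent safety"}))
# []
print(scan_call("Ignore previous instructions", {"query": "weather"}))
# ['prompt_injection']
print(scan_call("Find users", {"sql": "SELECT 1; DROP TABLE users"}))
# ['sql_injection']
```

A non-empty result would normally feed the allow/deny decision and the audit log rather than being printed.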


## Multi-Agent Safety


- Each agent in a multi-agent system should have its own governance policy

- When agents delegate to other agents, apply the more restrictive of the two agents' policies

- Track trust scores for agent delegates — degrade trust on failures, require ongoing good behavior

- Never allow an inner agent to have broader permissions than the outer agent that called it
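
A rough sketch of the delegation rules above, assuming tool permissions are plain sets and trust is a number between 0 and 1. `TrustScore`, `effective_tools`, `may_delegate`, the step sizes, and the threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrustScore:
    """Hypothetical trust tracker: slow to earn, quick to lose."""
    value: float = 0.5

    def record(self, success: bool) -> None:
        if success:
            self.value = min(1.0, self.value + 0.05)  # small reward
        else:
            self.value = max(0.0, self.value * 0.5)   # sharp penalty

    def decay(self, factor: float = 0.9) -> None:
        # Called periodically so trust must be re-earned over time.
        self.value *= factor


def effective_tools(caller_tools, callee_tools) -> frozenset:
    """The delegate may use only tools that BOTH agents are allowed to use."""
    return frozenset(caller_tools) & frozenset(callee_tools)


def may_delegate(trust: TrustScore, threshold: float = 0.4) -> bool:
    return trust.value >= threshold


outer = {"search", "summarize", "send_email"}
inner = {"search", "shell_exec"}
print(sorted(effective_tools(outer, inner)))  # ['search']

trust = TrustScore()
print(may_delegate(trust))  # True
trust.record(success=False)
print(may_delegate(trust))  # False
```

Intersecting the tool sets guarantees the inner agent can never end up with broader permissions than its caller, whatever its own policy says.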


## Audit & Observability


- Log every tool call with: timestamp, agent ID, tool name, allow/deny decision, policy name

- Log every governance violation with the matched rule and evidence

- Export audit trails in JSON Lines format for integration with log aggregation systems

- Include session boundaries (start/end) in audit logs for correlation
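
The logging guidance above might be sketched as an append-only JSON Lines writer. `AuditLog` and its field names are hypothetical. Opening the file in append mode only prevents this code from rewriting entries; true immutability would need support from the storage layer.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical JSON Lines audit writer with no update or delete methods."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _append(self, record: dict) -> None:
        entry = {"timestamp": datetime.now(timezone.utc).isoformat(), **record}
        with open(self.path, "a", encoding="utf-8") as fh:  # append only
            fh.write(json.dumps(entry) + "\n")

    def session_event(self, session_id: str, event: str) -> None:
        self._append({"event": event, "session_id": session_id})

    def tool_call(self, session_id, agent_id, tool, decision, policy,
                  matched_rule=None, evidence=None) -> None:
        self._append({
            "event": "tool_call", "session_id": session_id,
            "agent_id": agent_id, "tool": tool, "decision": decision,
            "policy": policy, "matched_rule": matched_rule, "evidence": evidence,
        })


log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
audit = AuditLog(log_path)
audit.session_event("sess-1", "session_start")
audit.tool_call("sess-1", "agent-a", "search", "allow", "my-agent")
audit.tool_call("sess-1", "agent-a", "shell_exec", "deny", "my-agent",
                matched_rule="blocked_tools", evidence="tool name on blocklist")
audit.session_event("sess-1", "session_end")

with open(log_path, encoding="utf-8") as fh:
    records = [json.loads(line) for line in fh]
print([r["event"] for r in records])
# ['session_start', 'tool_call', 'tool_call', 'session_end']
```

Each line is a complete JSON object, so a log aggregator can ingest the file line by line without parsing it as a whole.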


## Code Patterns


When writing agent tool functions:

```python
# Good: Governed tool with explicit policy
@govern(policy)
async def search(query: str) -> str:
    ...

# Bad: Unprotected tool with no governance
async def search(query: str) -> str:
    ...
```
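
The source does not define `govern`. A minimal sketch of what such a decorator could look like follows, assuming the policy is a plain dict with an `allowed_tools` list; `PolicyViolation` and `delete_records` are hypothetical names added for illustration.

```python
import asyncio
import functools


class PolicyViolation(Exception):
    """Raised when a tool call is denied by the policy (hypothetical)."""


def govern(policy: dict):
    """Wrap an async tool so it runs only if its name is allowlisted."""
    allowed = set(policy.get("allowed_tools", []))  # missing key means nothing is allowed

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            if func.__name__ not in allowed:
                raise PolicyViolation(f"tool '{func.__name__}' is not allowlisted")
            return await func(*args, **kwargs)
        return wrapper
    return decorator


policy = {"allowed_tools": ["search"]}


@govern(policy)
async def search(query: str) -> str:
    return f"results for {query}"


@govern(policy)
async def delete_records(table: str) -> str:
    return f"deleted {table}"


print(asyncio.run(search("agent safety")))  # results for agent safety
try:
    asyncio.run(delete_records("users"))
except PolicyViolation as exc:
    print(exc)  # tool 'delete_records' is not allowlisted
```

A fuller version would also check the call arguments and write an audit record before executing the tool.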

When defining policies:

```yaml
# Good: Explicit allowlist, content filters, rate limit
name: my-agent
allowed_tools: [search, summarize]
blocked_patterns: ["(?i)(api_key|password)\\s*[:=]"]
max_calls_per_request: 25

# Bad: No restrictions
name: my-agent
allowed_tools: ["*"]
```
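
Loading such a policy can itself fail closed. The sketch below uses the JSON form of the same fields so that it needs only the standard library; `load_policy` and its validation rules are assumptions made for illustration, not a documented loader.

```python
import json
import re


def load_policy(text: str) -> dict:
    """Parse and validate a policy; raise ValueError rather than guess."""
    raw = json.loads(text)
    if not isinstance(raw, dict) or not isinstance(raw.get("name"), str):
        raise ValueError("policy must be an object with a string 'name'")
    tools = raw.get("allowed_tools")
    if not isinstance(tools, list) or not tools:
        raise ValueError("allowed_tools must be a non-empty list")
    if "*" in tools:
        raise ValueError("wildcard tool access is not permitted")
    max_calls = raw.get("max_calls_per_request")
    if isinstance(max_calls, bool) or not isinstance(max_calls, int) or max_calls <= 0:
        raise ValueError("max_calls_per_request must be a positive integer")
    return {
        "name": raw["name"],
        "allowed_tools": frozenset(tools),
        "blocked_patterns": [re.compile(p) for p in raw.get("blocked_patterns", [])],
        "max_calls_per_request": max_calls,
    }


# Raw strings keep the JSON escape (\\s) intact inside Python source.
GOOD = r'''{"name": "my-agent", "allowed_tools": ["search", "summarize"],
"blocked_patterns": ["(?i)(api_key|password)\\s*[:=]"],
"max_calls_per_request": 25}'''
BAD = r'''{"name": "my-agent", "allowed_tools": ["*"]}'''

policy = load_policy(GOOD)
print(sorted(policy["allowed_tools"]))  # ['search', 'summarize']
try:
    load_policy(BAD)
except ValueError as exc:
    print(exc)  # wildcard tool access is not permitted
```

Rejecting the wildcard and a missing rate limit at load time means a misconfigured policy stops the agent from starting instead of silently granting broad access.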

When composing multi-agent policies:

```python
# Good: Most-restrictive-wins composition
final_policy = compose_policies(org_policy, team_policy, agent_policy)

# Bad: Only using agent-level policy, ignoring org constraints
final_policy = agent_policy
```
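
The source does not show how `compose_policies` works. One plausible reading of "most restrictive wins" is sketched below, assuming allowlists are intersected, blocked patterns are unioned, and the lowest call limit applies; the `Policy` dataclass and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record; field names mirror the config example."""
    name: str
    allowed_tools: frozenset
    blocked_patterns: frozenset = frozenset()
    max_calls_per_request: int = 25


def compose_policies(*policies: Policy) -> Policy:
    """Combine policies so that the most restrictive setting always wins."""
    if not policies:
        raise ValueError("at least one policy is required")
    return Policy(
        name="+".join(p.name for p in policies),
        # A tool must be allowed by every layer to remain allowed.
        allowed_tools=frozenset.intersection(*(p.allowed_tools for p in policies)),
        # A pattern blocked by any layer stays blocked.
        blocked_patterns=frozenset().union(*(p.blocked_patterns for p in policies)),
        max_calls_per_request=min(p.max_calls_per_request for p in policies),
    )


org_policy = Policy("org", frozenset({"search", "summarize", "send_email"}),
                    frozenset({"(?i)password"}), 50)
team_policy = Policy("team", frozenset({"search", "summarize"}))
agent_policy = Policy("agent", frozenset({"search", "shell_exec"}),
                      max_calls_per_request=100)

final_policy = compose_policies(org_policy, team_policy, agent_policy)
print(sorted(final_policy.allowed_tools))   # ['search']
print(final_policy.max_calls_per_request)   # 25
```

Here the agent's own request for `shell_exec` and a limit of 100 calls is overridden by the organization and team layers.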

## Framework-Specific Notes


- **PydanticAI**: Use `@agent.tool` with a governance decorator wrapper. PydanticAI's upcoming Traits feature is designed for this pattern.

- **CrewAI**: Apply governance at the Crew level to cover all agents. Use `before_kickoff` callbacks for policy validation.

- **OpenAI Agents SDK**: Wrap `@function_tool` with governance. Use handoff guards for multi-agent trust.

- **LangChain/LangGraph**: Use `RunnableBinding` or tool wrappers for governance. Apply at the graph edge level for flow control.

- **AutoGen**: Implement governance in the `ConversableAgent.register_for_execution` hook.


## Common Mistakes


- Relying only on output guardrails (post-generation) instead of pre-execution governance

- Hardcoding policy rules instead of loading from configuration

- Allowing agents to self-modify their own governance policies

- Checking only tool *names* against the policy and forgetting to check tool *arguments*

- Not decaying trust scores over time — stale trust is dangerous

- Logging prompts in audit trails — log decisions and metadata, not user content

🎯 Best For

  • UI designers
  • Product designers
  • Claude users
  • GitHub Copilot users
  • Software engineers

💡 Use Cases

  • Generating component mockups
  • Creating design system tokens
  • Code quality improvement
  • Best practice enforcement

📖 How to Use This Skill

  1. Install the Skill

     Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.

  2. Load into Your AI Assistant

     Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content or use the system prompt tab.

  3. Apply Agent-Safety to Your Work

     Open your project in the AI assistant and ask it to apply the skill. Start with a small module to verify the output quality.

  4. Review and Refine

     Review AI suggestions before committing. Run tests, check for regressions, and iterate on the skill output.

❓ Frequently Asked Questions

Is Agent-Safety compatible with Cursor and VS Code?

Yes — this skill works with any AI coding assistant including Cursor, VS Code with Copilot, and JetBrains IDEs.

Do I need specific dependencies for Agent-Safety?

Check the install command and Works With section. Most code skills only require the AI assistant and your codebase.

How do I install Agent-Safety?

Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/agent-safety/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Skipping validation

Always test AI-generated code changes, even for simple refactors.

Missing dependency updates

Check if the skill requires updated dependencies or new packages.
