Agent-Safety
Agent-Safety是一款code方向的AI技能,核心价值是Guidelines for building safe, governed AI agent systems,可用于解决开发者在code领域的实际问题,帮助用户提升效率、自动化重复任务或优化工作流。
Guidelines for building safe, governed AI agent systems. Apply when writing code that uses agent frameworks, tool-calling LLMs, or multi-agent orchestration to ensure proper safety boundaries, policy
mkdir -p ./skills/agent-safety && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/agent-safety/SKILL.md -o ./skills/agent-safety/SKILL.md Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).
Skill Content
# Agent Safety & Governance
Core Principles
- **Fail closed**: If a governance check errors or is ambiguous, deny the action rather than allowing it
- **Policy as configuration**: Define governance rules in YAML/JSON files, not hardcoded in application logic
- **Least privilege**: Agents should have the minimum tool access needed for their task
- **Append-only audit**: Never modify or delete audit trail entries — immutability enables compliance
Tool Access Controls
- Always define an explicit allowlist of tools an agent can use — never give unrestricted tool access
- Separate tool registration from tool authorization — the framework knows what tools exist, the policy controls which are allowed
- Use blocklists for known-dangerous operations (shell execution, file deletion, database DDL)
- Require human-in-the-loop approval for high-impact tools (send email, deploy, delete records)
- Enforce rate limits on tool calls per request to prevent infinite loops and resource exhaustion
Content Safety
- Scan all user inputs for threat signals before passing to the agent (data exfiltration, prompt injection, privilege escalation)
- Filter agent arguments for sensitive patterns: API keys, credentials, PII, SQL injection
- Use regex pattern lists that can be updated without code changes
- Check both the user's original prompt AND the agent's generated tool arguments
Multi-Agent Safety
- Each agent in a multi-agent system should have its own governance policy
- When agents delegate to other agents, apply the most restrictive policy from either
- Track trust scores for agent delegates — degrade trust on failures, require ongoing good behavior
- Never allow an inner agent to have broader permissions than the outer agent that called it
Audit & Observability
- Log every tool call with: timestamp, agent ID, tool name, allow/deny decision, policy name
- Log every governance violation with the matched rule and evidence
- Export audit trails in JSON Lines format for integration with log aggregation systems
- Include session boundaries (start/end) in audit logs for correlation
Code Patterns
When writing agent tool functions:
# Good: Governed tool with explicit policy
@govern(policy)
async def search(query: str) -> str:
...
# Bad: Unprotected tool with no governance
async def search(query: str) -> str:
...When defining policies:
# Good: Explicit allowlist, content filters, rate limit
name: my-agent
allowed_tools: [search, summarize]
blocked_patterns: ["(?i)(api_key|password)\\s*[:=]"]
max_calls_per_request: 25
# Bad: No restrictions
name: my-agent
allowed_tools: ["*"]When composing multi-agent policies:
# Good: Most-restrictive-wins composition
final_policy = compose_policies(org_policy, team_policy, agent_policy)
# Bad: Only using agent-level policy, ignoring org constraints
final_policy = agent_policyFramework-Specific Notes
- **PydanticAI**: Use `@agent.tool` with a governance decorator wrapper. PydanticAI's upcoming Traits feature is designed for this pattern.
- **CrewAI**: Apply governance at the Crew level to cover all agents. Use `before_kickoff` callbacks for policy validation.
- **OpenAI Agents SDK**: Wrap `@function_tool` with governance. Use handoff guards for multi-agent trust.
- **LangChain/LangGraph**: Use `RunnableBinding` or tool wrappers for governance. Apply at the graph edge level for flow control.
- **AutoGen**: Implement governance in the `ConversableAgent.register_for_execution` hook.
Common Mistakes
- Relying only on output guardrails (post-generation) instead of pre-execution governance
- Hardcoding policy rules instead of loading from configuration
- Allowing agents to self-modify their own governance policies
- Forgetting to governance-check tool *arguments*, not just tool *names*
- Not decaying trust scores over time — stale trust is dangerous
- Logging prompts in audit trails — log decisions and metadata, not user content
🎯 Best For
- UI designers
- Product designers
- Claude users
- GitHub Copilot users
- Software engineers
💡 Use Cases
- Generating component mockups
- Creating design system tokens
- Code quality improvement
- Best practice enforcement
📖 How to Use This Skill
- 1
Install the Skill
Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.
- 2
Load into Your AI Assistant
Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content or use the system prompt tab.
- 3
Apply Agent-Safety to Your Work
Open your project in the AI assistant and ask it to apply the skill. Start with a small module to verify the output quality.
- 4
Review and Refine
Review AI suggestions before committing. Run tests, check for regressions, and iterate on the skill output.
❓ Frequently Asked Questions
Does this work with Figma?
Some design skills integrate with Figma plugins. Check the Works With section for supported tools.
Is Agent-Safety compatible with Cursor and VS Code?
Yes — this skill works with any AI coding assistant including Cursor, VS Code with Copilot, and JetBrains IDEs.
Do I need specific dependencies for Agent-Safety?
Check the install command and Works With section. Most code skills only require the AI assistant and your codebase.
How do I install Agent-Safety?
Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/agent-safety/SKILL.md, ready to use.
Can I customize this skill for my team?
Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.
⚠️ Common Mistakes to Avoid
Skipping usability testing
AI-generated designs should be validated with real users before development.
Skipping validation
Always test AI-generated code changes, even for simple refactors.
Missing dependency updates
Check if the skill requires updated dependencies or new packages.