Mayur Rathi

⭐ 34.1k GitHub stars

Arize-Prompt-Optimization

Arize-Prompt-Optimization is an AI skill in the data category. It optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations, helping developers automate repetitive prompt-tuning work.

It extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop.

Last verified on: 2026-07-14

Quick Facts

Category: data

Works With: Claude, GitHub Copilot

Source: github/awesome-copilot

Risk Level: Low

mkdir -p ./skills/arize-prompt-optimization && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/arize-prompt-optimization/SKILL.md -o ./skills/arize-prompt-optimization/SKILL.md

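Run in a POSIX shell (bash, zsh, Git Bash, or WSL) with curl installed. The command will not run as written in Windows PowerShell 5, where curl is an alias for Invoke-WebRequest and the && operator is not supported.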

Skill Content

# Arize Prompt Optimization Skill

> **`SPACE`** — All `--space` flags and the `ARIZE_SPACE` env var accept a space **name** (e.g., `my-workspace`) or a base64 space **ID** (e.g., `U3BhY2U6...`). Find yours with `ax spaces list`.

## Concepts

### Where Prompts Live in Trace Data

LLM applications emit spans following OpenInference semantic conventions. Prompts are stored in different span attributes depending on the span kind and instrumentation:

| Column | What it contains | When to use |
|--------|-----------------|-------------|
| `attributes.llm.input_messages` | Structured chat messages (system, user, assistant, tool) in role-based format | **Primary source** for chat-based LLM prompts |
| `attributes.llm.input_messages.roles` | Array of roles: `system`, `user`, `assistant`, `tool` | Extract individual message roles |
| `attributes.llm.input_messages.contents` | Array of message content strings | Extract message text |
| `attributes.input.value` | Serialized prompt or user question (generic, all span kinds) | Fallback when structured messages are not available |
| `attributes.llm.prompt_template.template` | Template with `{variable}` placeholders (e.g., `"Answer {question} using {context}"`) | When the app uses prompt templates |
| `attributes.llm.prompt_template.variables` | Template variable values (JSON object) | See what values were substituted into the template |
| `attributes.output.value` | Model response text | See what the LLM produced |
| `attributes.llm.output_messages` | Structured model output (including tool calls) | Inspect tool-calling responses |
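
The following is a minimal Python sketch of how these columns fit together. It assumes a span has been exported as a flat dict keyed by the column names in the table, for example one row of a trace export. Two details are assumptions:

- the `roles`/`contents` columns hold parallel lists;
- `variables` may arrive as a JSON string.

`extract_prompt` is a hypothetical helper, not part of the `ax` CLI or any Arize library. It prefers structured messages, falls back to `attributes.input.value`, and re-renders the template so you can compare it with what was actually sent.

```python
import json
from typing import Any


def extract_prompt(span: dict[str, Any]) -> dict[str, Any]:
    """Collect prompt data from one exported LLM span (hypothetical helper).

    Assumes `span` is a flat dict keyed by the column names in the table above.
    """
    result: dict[str, Any] = {"messages": None, "raw_input": None, "template": None, "rendered": None}

    roles = span.get("attributes.llm.input_messages.roles")
    contents = span.get("attributes.llm.input_messages.contents")
    if roles and contents:
        # Primary source: structured chat messages, paired role-by-role.
        result["messages"] = [{"role": r, "content": c} for r, c in zip(roles, contents)]
    else:
        # Fallback: the serialized prompt or user question.
        result["raw_input"] = span.get("attributes.input.value")

    template = span.get("attributes.llm.prompt_template.template")
    if template:
        variables = span.get("attributes.llm.prompt_template.variables") or {}
        if isinstance(variables, str):
            # Assumption: variables may arrive as a JSON string rather than a parsed object.
            variables = json.loads(variables)
        result["template"] = template
        # str.format fills {variable} placeholders; a missing variable raises KeyError.
        result["rendered"] = template.format(**variables)
    return result


span = {
    "attributes.llm.input_messages.roles": ["system", "user"],
    "attributes.llm.input_messages.contents": ["Answer concisely.", "What is RAG?"],
    "attributes.llm.prompt_template.template": "Answer {question} using {context}",
    "attributes.llm.prompt_template.variables": '{"question": "What is RAG?", "context": "the docs"}',
}
print(extract_prompt(span)["rendered"])  # Answer What is RAG? using the docs
```

If your templates contain literal braces (for example, JSON examples), `str.format` will misread them. A targeted regex substitution on `{name}` placeholders is safer in that case.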

### Finding Prompts by Span Kind

- **LLM span** (`attributes.openinference.span.kind = 'LLM'`): Check `attributes.llm.input_messages` for structured chat messages, OR `attributes.input.value` for a serialized prompt. Check `attributes.llm.prompt_template.template` for the template.

- **Chain/Agent span**: `attributes.input.value` contains the user's question. The actual LLM prompt lives on **child LLM spans** -- navigate down the trace tree.

- **Tool span**: `attributes.input.value` has tool input, `attributes.output.value` has tool result. Not typically where prompts live.
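
Walking down from a Chain or Agent span to its child LLM spans can be sketched as a breadth-first search. This assumes the spans of one trace were exported as flat dicts, where each row carries its own ID in `context.span_id` and its parent's ID in `parent_id`. Those column names are assumptions to check against your own export, and `llm_descendants` is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Any

# Assumed export column names; check them against your own trace export.
SPAN_ID = "context.span_id"
PARENT_ID = "parent_id"
KIND = "attributes.openinference.span.kind"


def llm_descendants(spans: list[dict[str, Any]], root_id: str) -> list[dict[str, Any]]:
    """Return the LLM spans below a Chain/Agent span, in breadth-first order."""
    # Index children by parent ID so each level is one dict lookup.
    children: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for span in spans:
        parent = span.get(PARENT_ID)
        if parent:
            children[parent].append(span)

    found: list[dict[str, Any]] = []
    queue = deque([root_id])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child.get(KIND) == "LLM":
                found.append(child)  # the actual prompt lives on this span
            queue.append(child[SPAN_ID])  # keep descending: LLM calls can sit several levels down
    return found


trace = [
    {SPAN_ID: "a", PARENT_ID: None, KIND: "AGENT"},
    {SPAN_ID: "b", PARENT_ID: "a", KIND: "CHAIN"},
    {SPAN_ID: "c", PARENT_ID: "b", KIND: "LLM"},
    {SPAN_ID: "d", PARENT_ID: "a", KIND: "TOOL"},
]
print([s[SPAN_ID] for s in llm_descendants(trace, "a")])  # ['c']
```

The search keeps descending past LLM spans as well. An agent may nest several LLM calls, and skipping a level would silently drop prompts.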

### Performance Signal Columns

These columns carry the feedback data used for optimization:

| Column pattern | Source | What it tells you |
|---------------|--------|-------------------|
| `annotation.<name>.label` | Human reviewers | Categorical grade (e.g., `correct`, `incorrect`, `partial`) |
| `annotation.<name>.score` | Human reviewers | Numeric quality score (e.g., 0.0 - 1.0) |
| `annotation.<name>.text` | Human reviewers | Freeform explanation of the grade |
| `eval.<name>.label` | LLM-as-judge evals | Automated categorical assessment |
| `eval.<name>.score` | LLM-as-judge evals | Automated numeric score |
| `eval.<name>.explanation` | LLM-as-judge evals | Why the eval gave that score -- **most valuable for optimization** |
| `attributes.input.value` | Trace data | What went into the LLM |
| `attributes.output.value` | Trace data | What the LLM produced |
| `{experiment_name}.output` | Experiment runs | Output from a specific experiment |
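
Here is a hedged Python sketch of turning these columns into optimization feedback. It selects rows that an eval or a human annotation marked as bad, then pairs each input and output with its explanation text, which is the part the table calls most valuable. The sketch makes several assumptions:

- rows are flat dicts keyed by the column patterns above;
- the eval name `correctness` and the toy rows are made up for illustration;
- `failing_examples` and `to_feedback` are hypothetical helpers.

How you feed the resulting text into a prompt-revision step is up to your own loop.

```python
from typing import Any, Iterable, Optional


def failing_examples(
    rows: Iterable[dict[str, Any]],
    name: str,
    source: str = "eval",
    bad_labels: tuple[str, ...] = ("incorrect",),
    score_below: Optional[float] = None,
) -> list[dict[str, Any]]:
    """Select rows whose eval or annotation marks them as bad (hypothetical helper).

    `source` is "eval" (LLM-as-judge) or "annotation" (human reviewers);
    `name` is the <name> part of the column pattern.
    """
    # Evals explain themselves in `.explanation`; human annotations use `.text`.
    why_col = f"{source}.{name}.explanation" if source == "eval" else f"{source}.{name}.text"
    selected = []
    for row in rows:
        label = row.get(f"{source}.{name}.label")
        score = row.get(f"{source}.{name}.score")
        low_score = score_below is not None and score is not None and score < score_below
        if label in bad_labels or low_score:
            selected.append({
                "input": row.get("attributes.input.value"),
                "output": row.get("attributes.output.value"),
                "label": label,
                "score": score,
                "why": row.get(why_col),
            })
    return selected


def to_feedback(examples: list[dict[str, Any]]) -> str:
    """Format selected examples as plain-text feedback for a prompt-revision step."""
    return "\n".join(
        f"Example {i}: input={ex['input']!r} output={ex['output']!r} "
        f"label={ex['label']} reason={ex['why']}"
        for i, ex in enumerate(examples, 1)
    )


# Toy rows, made up for illustration.
rows = [
    {"attributes.input.value": "2+2?", "attributes.output.value": "5",
     "eval.correctness.label": "incorrect", "eval.correctness.score": 0.0,
     "eval.correctness.explanation": "Arithmetic is wrong"},
    {"attributes.input.value": "Capital of France?", "attributes.output.value": "Paris",
     "eval.correctness.label": "correct", "eval.correctness.score": 1.0,
     "eval.correctness.explanation": "Matches the reference"},
]
print(to_feedback(failing_examples(rows, "correctness")))
# Example 1: input='2+2?' output='5' label=incorrect reason=Arithmetic is wrong
```

Filtering on both label and score lets the same helper handle categorical and numeric signals, such as `partial` grades that only show up as a middling score.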

## Prerequisites

Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.

If an `ax` command fails, troubleshoot based on the error:

- `command not found` or version error → see references/ax-setup.md

- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys

- Space unknown → run `ax spaces list` to pick by name, or ask the user

- Project unclear → ask the user, or run `ax projects list -o json --limit 100` and present as selectable options

- LLM provider call fails (missing OPENAI_API_KEY / ANTHROPIC_API_KEY) → run `ax ai-integrations list --space SPACE

🎯 Best For

Debugging engineers
QA teams
Claude users
GitHub Copilot users
Data professionals

💡 Use Cases

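Debugging underperforming LLM prompts with production traces
Extracting prompts and templates from spans
Turning eval explanations and human annotations into prompt fixes
Running a data-driven prompt optimization loop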

📖 How to Use This Skill

1. Install the Skill

Run the install command shown above. The SKILL.md file downloads to ./skills/arize-prompt-optimization/.

2. Load into Your AI Assistant

Open Claude or GitHub Copilot and reference the skill, or paste the SKILL.md content into the conversation or system prompt.

3. Apply Arize-Prompt-Optimization to Your Work

Provide context for your task: the Arize space and project that hold your traces, the prompt you want to improve, and any existing work that should guide the AI.

4. Review and Refine

Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.

❓ Frequently Asked Questions

Can this debug production issues?

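Yes, for prompt-quality issues. The skill works from production trace data, so your LLM application must already send traces to Arize. Evaluations or annotations on those traces give it the performance signal to act on.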

How do I install Arize-Prompt-Optimization?

Run the install command shown above. The skill downloads to ./skills/arize-prompt-optimization/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Debugging without context

Always provide the full error stack and surrounding code context for accurate debugging.

Ignoring data quality

AI analysis inherits any quality issues in your data, so profile your data first.

🔗 Related Skills

acreadiness-assess
acreadiness-policy
adr-generator
ai-prompt-engineering-safety-best-practices
ai-readiness-reporter
ai-ready