MR
Mayur Rathi
@github
⭐ 34.1k GitHub stars

Arize-Prompt-Optimization

Arize-Prompt-Optimization是一款data方向的AI技能,核心价值是Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations,可用于解决开发者在data领域的实际问题,帮助用户提升效率、自动化重复任务或优化工作流。

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop

Last verified on: 2026-05-30
mkdir -p ./skills/arize-prompt-optimization && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/arize-prompt-optimization/SKILL.md -o ./skills/arize-prompt-optimization/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).

Skill Content

# Arize Prompt Optimization Skill


> **`SPACE`** — All `--space` flags and the `ARIZE_SPACE` env var accept a space **name** (e.g., `my-workspace`) or a base64 space **ID** (e.g., `U3BhY2U6...`). Find yours with `ax spaces list`.


Concepts


Where Prompts Live in Trace Data


LLM applications emit spans following OpenInference semantic conventions. Prompts are stored in different span attributes depending on the span kind and instrumentation:


| Column | What it contains | When to use |

|--------|-----------------|-------------|

| `attributes.llm.input_messages` | Structured chat messages (system, user, assistant, tool) in role-based format | **Primary source** for chat-based LLM prompts |

| `attributes.llm.input_messages.roles` | Array of roles: `system`, `user`, `assistant`, `tool` | Extract individual message roles |

| `attributes.llm.input_messages.contents` | Array of message content strings | Extract message text |

| `attributes.input.value` | Serialized prompt or user question (generic, all span kinds) | Fallback when structured messages are not available |

| `attributes.llm.prompt_template.template` | Template with `{variable}` placeholders (e.g., `"Answer {question} using {context}"`) | When the app uses prompt templates |

| `attributes.llm.prompt_template.variables` | Template variable values (JSON object) | See what values were substituted into the template |

| `attributes.output.value` | Model response text | See what the LLM produced |

| `attributes.llm.output_messages` | Structured model output (including tool calls) | Inspect tool-calling responses |


Finding Prompts by Span Kind


- **LLM span** (`attributes.openinference.span.kind = 'LLM'`): Check `attributes.llm.input_messages` for structured chat messages, OR `attributes.input.value` for a serialized prompt. Check `attributes.llm.prompt_template.template` for the template.

- **Chain/Agent span**: `attributes.input.value` contains the user's question. The actual LLM prompt lives on **child LLM spans** -- navigate down the trace tree.

- **Tool span**: `attributes.input.value` has tool input, `attributes.output.value` has tool result. Not typically where prompts live.


Performance Signal Columns


These columns carry the feedback data used for optimization:


| Column pattern | Source | What it tells you |

|---------------|--------|-------------------|

| `annotation.<name>.label` | Human reviewers | Categorical grade (e.g., `correct`, `incorrect`, `partial`) |

| `annotation.<name>.score` | Human reviewers | Numeric quality score (e.g., 0.0 - 1.0) |

| `annotation.<name>.text` | Human reviewers | Freeform explanation of the grade |

| `eval.<name>.label` | LLM-as-judge evals | Automated categorical assessment |

| `eval.<name>.score` | LLM-as-judge evals | Automated numeric score |

| `eval.<name>.explanation` | LLM-as-judge evals | Why the eval gave that score -- **most valuable for optimization** |

| `attributes.input.value` | Trace data | What went into the LLM |

| `attributes.output.value` | Trace data | What the LLM produced |

| `{experiment_name}.output` | Experiment runs | Output from a specific experiment |


Prerequisites


Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.


If an `ax` command fails, troubleshoot based on the error:

- `command not found` or version error → see references/ax-setup.md

- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys

- Space unknown → run `ax spaces list` to pick by name, or ask the user

- Project unclear → ask the user, or run `ax projects list -o json --limit 100` and present as selectable options

- LLM provider call fails (missing OPENAI_API_KEY / ANTHROPIC_API_KEY) → run `ax ai-integrations list --space SPACE

🎯 Best For

  • Debugging engineers
  • QA teams
  • Claude users
  • GitHub Copilot users
  • Data professionals

💡 Use Cases

  • Tracing runtime errors in production logs
  • Identifying memory leaks
  • Data pipeline auditing
  • Query optimization

📖 How to Use This Skill

  1. 1

    Install the Skill

    Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.

  2. 2

    Load into Your AI Assistant

    Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content or use the system prompt tab.

  3. 3

    Apply Arize-Prompt-Optimization to Your Work

    Provide context for your task — paste source material, describe your audience, or share existing work to guide the AI.

  4. 4

    Review and Refine

    Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.

❓ Frequently Asked Questions

Can this debug production issues?

Yes, but always ensure you have proper logging and monitoring in place first.

How do I install Arize-Prompt-Optimization?

Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/arize-prompt-optimization/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Debugging without context

Always provide the full error stack and surrounding code context for accurate debugging.

Ignoring data quality

AI analysis inherits all data quality issues — profile your data first.

🔗 Related Skills