Arize-Dataset
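Arize-Dataset is an AI skill for data work. Its core value is that it creates, manages, and queries Arize datasets and examples. Developers can use it to solve practical data problems, work more efficiently, automate repetitive tasks, and streamline workflows.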
Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data,
mkdir -p ./skills/arize-dataset && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/arize-dataset/SKILL.md -o ./skills/arize-dataset/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).
Skill Content
# Arize Dataset Skill
> **`SPACE`** — All `--space` flags and the `ARIZE_SPACE` env var accept a space **name** (e.g., `my-workspace`) or a base64 space **ID** (e.g., `U3BhY2U6...`). Find yours with `ax spaces list`.
## Concepts
- **Dataset** = a versioned collection of examples used for evaluation and experimentation
- **Dataset Version** = a snapshot of a dataset at a point in time; updates can be in-place or create a new version
- **Example** = a single record in a dataset with arbitrary user-defined fields (e.g., `question`, `answer`, `context`)
- **Space** = an organizational container; datasets belong to a space
System-managed fields on examples (`id`, `created_at`, `updated_at`) are auto-generated by the server -- never include them in create or append payloads.
## Prerequisites
Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.
If an `ax` command fails, troubleshoot based on the error:
- `command not found` or version error → see references/ax-setup.md
- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys
- Space unknown → run `ax spaces list` to pick by name, or ask the user
- Project unclear → ask the user, or run `ax projects list -o json --limit 100` and present the results as selectable options
- **Security:** Never read `.env` files or search the filesystem for credentials. Use `ax profiles` for Arize credentials and `ax ai-integrations` for LLM provider keys. If credentials are not available through these channels, ask the user.
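The error-to-action mapping above can be sketched as a small shell function. This is a hypothetical helper: `ax_next_step` is not part of the `ax` CLI, and its match patterns are assumptions about how `ax` words its errors. Adjust them to the messages you actually see. The helper never touches `.env` files or other credential stores.

```bash
# Hypothetical helper (not part of the ax CLI): print the troubleshooting step
# from the Prerequisites list that matches an ax error message.
# The patterns below are assumptions about ax's error wording.
ax_next_step() {
  local err="$1"
  case "$err" in
    *"command not found"*|*version*)
      echo "see references/ax-setup.md" ;;
    *401*|*Unauthorized*|*"API key"*)
      echo "run: ax profiles show (then see references/ax-profiles.md)" ;;
    *space*|*Space*)
      echo "run: ax spaces list, or ask the user" ;;
    *project*|*Project*)
      echo "run: ax projects list -o json --limit 100, or ask the user" ;;
    *)
      # No known pattern matched: fall back to asking the user.
      echo "ask the user" ;;
  esac
}
```

For example, `out=$(ax datasets list 2>&1) || ax_next_step "$out"` prints a suggested next step only when the command fails.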
## List Datasets: `ax datasets list`
Browse datasets in a space. Output goes to stdout.
```bash
ax datasets list
ax datasets list --space SPACE --limit 20
ax datasets list --cursor CURSOR_TOKEN
ax datasets list -o json
```
### Flags
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--space` | string | from profile | Filter by space |
| `--limit, -l` | int | 15 | Max results (1-100) |
| `--cursor` | string | none | Pagination cursor from previous response |
| `-o, --output` | string | table | Output format: table, json, csv, parquet, or file path |
| `-p, --profile` | string | default | Configuration profile |
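Here is a minimal sketch that fetches one JSON page of datasets for a given space. `list_space_datasets` is a hypothetical wrapper, not part of `ax`. It uses only the `--space`, `--limit`, and `-o` flags from the table and checks the documented 1-100 range for `--limit` before calling the CLI.

```bash
# Hypothetical wrapper (not part of the ax CLI): list one page of datasets
# in a space as JSON, using only flags documented in the table above.
list_space_datasets() {
  local space="$1"
  local limit="${2:-100}"   # documented range for --limit is 1-100
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be an integer" >&2; return 1 ;;
  esac
  if [ "$limit" -lt 1 ] || [ "$limit" -gt 100 ]; then
    echo "limit must be between 1 and 100" >&2
    return 1
  fi
  ax datasets list --space "$space" --limit "$limit" -o json
}
```

To get more than one page, pass the cursor from the previous response with `--cursor`. This document does not say where that token appears in the JSON output, so check a real response before automating pagination.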
## Get Dataset: `ax datasets get`
Quick metadata lookup -- returns dataset name, space, timestamps, and version list.
```bash
ax datasets get NAME_OR_ID
ax datasets get NAME_OR_ID -o json
ax datasets get NAME_OR_ID --space SPACE  # required when using dataset name instead of ID
```
### Flags
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `NAME_OR_ID` | string | required | Dataset name or ID (positional) |
| `--space` | string | none | Space name or ID (required if using dataset name instead of ID) |
| `-o, --output` | string | table | Output format |
| `-p, --profile` | string | default | Configuration profile |
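The `--space` rule in this table can be wrapped so callers don't forget it. The sketch below is a hypothetical helper, not part of `ax`. It assumes the caller passes a space whenever the first argument is a dataset name, and passes no space when it is an ID.

```bash
# Hypothetical helper (not part of the ax CLI): fetch dataset metadata as JSON.
# --space is required when looking up by name, optional when using an ID.
get_dataset_json() {
  local name_or_id="$1"
  local space="${2:-}"
  if [ -n "$space" ]; then
    ax datasets get "$name_or_id" --space "$space" -o json
  else
    ax datasets get "$name_or_id" -o json
  fi
}
```

Use it as `get_dataset_json my-dataset my-workspace` for a name, or `get_dataset_json DATASET_ID` for an ID.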
### Response fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Dataset ID |
| `name` | string | Dataset name |
| `space_id` | string | Space this dataset belongs to |
| `created_at` | datetime | When the dataset was created |
| `updated_at` | datetime | Last modification time |
| `versions` | array | List of dataset versions (id, name, dataset_id, created_at, updated_at) |
## Export Dataset: `ax datasets export`
Download all examples to a file. Use `--all` for datasets larger than 500 examples (unlimited bulk export).
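The 500-example threshold can be folded into a small helper. This sketch rests on stated assumptions. `export_dataset` is hypothetical, the caller must already know roughly how many examples the dataset holds, and combining `--all` with `--output-dir` in one call is assumed to work because each flag is documented separately.

```bash
# Hypothetical helper (not part of the ax CLI): export a dataset to a directory,
# adding --all when the caller expects more than 500 examples.
# Assumption: --all and --output-dir can be combined in a single call.
export_dataset() {
  local name_or_id="$1"
  local expected_count="$2"
  local out_dir="${3:-./data}"
  if [ "$expected_count" -gt 500 ]; then
    ax datasets export "$name_or_id" --all --output-dir "$out_dir"
  else
    ax datasets export "$name_or_id" --output-dir "$out_dir"
  fi
}
```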
```bash
ax datasets export NAME_OR_ID
# -> dataset_abc123_20260305_141500/examples.json
ax datasets export NAME_OR_ID --all
ax datasets export NAME_OR_ID --version-id VERSION_ID
ax datasets export NAME_OR_ID --output-dir ./data
ax datasets export NAME_OR_ID --stdout
ax datasets export NAME_OR_ID --stdout | jq '.[0]'
ax datasets export NAME_OR_ID --
```
🎯 Best For
- QA engineers
- Developers writing unit tests
- Claude users
- GitHub Copilot users
- Data professionals
💡 Use Cases
- Generating test cases for edge conditions
- Writing integration test suites
- Data pipeline auditing
- Query optimization
📖 How to Use This Skill
1. Install the Skill: Copy the install command above and run it. The SKILL.md file downloads to your local skills directory.
2. Load into Your AI Assistant: Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content into the chat or into the system prompt.
3. Apply Arize-Dataset to Your Work: Provide context for your task. Paste source material, describe your audience, or share existing work to guide the AI.
4. Review and Refine: Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.
❓ Frequently Asked Questions
Does this generate test mocks?
Not as described. The skill content covers dataset CRUD, appending examples, exporting data, and file-based dataset creation with the ax CLI. It does not describe mock generation.
How do I install Arize-Dataset?
Run the install command at the top of this page. The skill downloads to ./skills/arize-dataset/SKILL.md, ready to use.
Can I customize this skill for my team?
Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.
⚠️ Common Mistakes to Avoid
Not testing edge cases
AI tends to generate happy-path tests. Manually review for boundary conditions.
Ignoring data quality
AI analysis inherits all data quality issues — profile your data first.