Mayur Rathi

⭐ 34.1k GitHub stars

Arize-Dataset

Arize-Dataset is a data-category AI skill whose core purpose is to create, manage, and query Arize datasets and examples. It helps developers automate repetitive dataset work in Arize instead of doing it by hand.

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data,

Last verified on: 2026-07-14

Quick Facts

Category: data

Works With: Claude, GitHub Copilot

Source: github/awesome-copilot

Stars: ⭐ 34.1k

Last Verified: 2026-07-14

Risk Level: Low

mkdir -p ./skills/arize-dataset && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/arize-dataset/SKILL.md -o ./skills/arize-dataset/SKILL.md

Run in a Unix-like shell (bash, zsh) with curl installed; on Windows, use WSL or Git Bash. The command does not run as-is in Windows PowerShell 5, which lacks the `&&` operator.

Skill Content

# Arize Dataset Skill

> **`SPACE`** — All `--space` flags and the `ARIZE_SPACE` env var accept a space **name** (e.g., `my-workspace`) or a base64 space **ID** (e.g., `U3BhY2U6...`). Find yours with `ax spaces list`.

Concepts

- **Dataset** = a versioned collection of examples used for evaluation and experimentation

- **Dataset Version** = a snapshot of a dataset at a point in time; updates can be in-place or create a new version

- **Example** = a single record in a dataset with arbitrary user-defined fields (e.g., `question`, `answer`, `context`)

- **Space** = an organizational container; datasets belong to a space

System-managed fields on examples (`id`, `created_at`, `updated_at`) are auto-generated by the server -- never include them in create or append payloads.
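If you reuse exported examples as input for a new dataset or an append, the system-managed fields have to come out first. The following is a minimal jq sketch. It assumes the export is a JSON array of example objects, as the `jq '.[0]'` example in the export section suggests. `strip_system_fields` is a hypothetical helper name. `del` is a no-op for any of the three fields that happen to be absent.

```bash
# Hypothetical helper: drop server-managed fields (id, created_at,
# updated_at) from a JSON array of examples read on stdin, keeping only
# user-defined fields such as question, answer, context.
# Assumes the input is a JSON array of example objects.
strip_system_fields() {
  jq 'map(del(.id, .created_at, .updated_at))'
}

# Example:
#   ax datasets export NAME_OR_ID --stdout | strip_system_fields > examples.clean.json
```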

Prerequisites

Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.

If an `ax` command fails, troubleshoot based on the error:

- `command not found` or version error → see references/ax-setup.md

- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys

- Space unknown → run `ax spaces list` to pick by name, or ask the user

- Project unclear → ask the user, or run `ax projects list -o json --limit 100` and present as selectable options

- **Security:** Never read `.env` files or search the filesystem for credentials. Use `ax profiles` for Arize credentials and `ax ai-integrations` for LLM provider keys. If credentials are not available through these channels, ask the user.
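The troubleshooting list above is reactive: run the command first, then act on the error. The following bash sketch shows that flow. `ax_or_hint` is a hypothetical helper, not part of the `ax` CLI. The auth patterns it matches (`401`, `Unauthorized`, `API key`) are assumptions about how `ax` reports auth failures. Exit status 127 is the shell's standard "command not found" code.

```bash
# Hypothetical helper (not part of ax): run an ax command and, only if it
# fails, print the matching troubleshooting hint to stderr.
# The auth-error patterns are assumptions about ax's error text.
ax_or_hint() {
  local out status
  out=$(ax "$@" 2>&1)   # capture stdout and stderr together
  status=$?
  printf '%s\n' "$out"
  if [ "$status" -eq 0 ]; then
    return 0
  fi
  if [ "$status" -eq 127 ]; then
    # 127: the shell could not find the ax command
    echo "hint: ax is not installed or not on PATH; see references/ax-setup.md" >&2
  elif printf '%s\n' "$out" | grep -qiE '401|unauthorized|api key'; then
    echo "hint: run 'ax profiles show'; fix the profile per references/ax-profiles.md" >&2
  fi
  return "$status"
}

# Example:
#   ax_or_hint datasets list --space SPACE
```

Because the helper merges stdout and stderr, it suits interactive runs better than pipelines that feed `-o json` output to another tool.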

List Datasets: `ax datasets list`

Browse datasets in a space. Output goes to stdout.

```bash
ax datasets list
ax datasets list --space SPACE --limit 20
ax datasets list --cursor CURSOR_TOKEN
ax datasets list -o json
```

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--limit, -l` | int | 15 | Max results (1-100) |

Get Dataset: `ax datasets get`

Quick metadata lookup -- returns dataset name, space, timestamps, and version list.

```bash
ax datasets get NAME_OR_ID
ax datasets get NAME_OR_ID -o json
ax datasets get NAME_OR_ID --space SPACE   # required when using dataset name instead of ID
```

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|

Response fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Dataset ID |
| `name` | string | Dataset name |
| `space_id` | string | Space this dataset belongs to |
| `created_at` | datetime | When the dataset was created |
| `updated_at` | datetime | Last modification time |
| `versions` | array | List of dataset versions (id, name, dataset_id, created_at, updated_at) |
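The `versions` array makes it possible to pin an export to a specific version. The following is a minimal jq sketch that picks the most recently created version from the `-o json` response. It assumes the response is a JSON object with a top-level `versions` array matching the table above. It also assumes `created_at` values are ISO-8601 strings in a consistent format (for example all UTC), so that sorting them as text also sorts them chronologically. `latest_version_id` is a hypothetical helper name.

```bash
# Hypothetical helper: print the ID of the most recently created version,
# reading an `ax datasets get ... -o json` response on stdin.
# Assumes a top-level `versions` array and consistently formatted
# ISO-8601 `created_at` strings. Prints nothing if there are no versions.
latest_version_id() {
  jq -r '.versions | sort_by(.created_at) | last | .id // empty'
}

# Example (DATASET_ID is a placeholder):
#   vid=$(ax datasets get DATASET_ID -o json | latest_version_id)
#   ax datasets export DATASET_ID --version-id "$vid"
```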

Export Dataset: `ax datasets export`

Download a dataset's examples to a file. For datasets with more than 500 examples, add `--all` (unlimited bulk export).

```bash
ax datasets export NAME_OR_ID
# -> dataset_abc123_20260305_141500/examples.json

ax datasets export NAME_OR_ID --all
ax datasets export NAME_OR_ID --version-id VERSION_ID
ax datasets export NAME_OR_ID --output-dir ./data
ax datasets export NAME_OR_ID --stdout
ax datasets export NAME_OR_ID --stdout | jq '.[0]'
ax datasets export NAME_OR_ID --
```
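A quick jq summary of the `--stdout` output can confirm what an export returned. For example, a count of exactly 500 may mean the dataset is larger and the export needs `--all`. The following is a minimal sketch. `summarize_examples` is a hypothetical helper, and it assumes the output is a JSON array of example objects.

```bash
# Hypothetical helper: summarize an exported dataset read on stdin.
# Prints the number of examples and the sorted union of their field names.
# Assumes the input is a JSON array of objects.
summarize_examples() {
  jq -r '"examples: \(length)", ("fields: " + ([.[] | keys[]] | unique | join(", ")))'
}

# Example:
#   ax datasets export NAME_OR_ID --stdout | summarize_examples
```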

🎯 Best For

QA engineers
Developers writing unit tests
Claude users
GitHub Copilot users
Data professionals

💡 Use Cases

Generating test cases for edge conditions
Writing integration test suites
Data pipeline auditing
Query optimization

📖 How to Use This Skill

1. Install the Skill

Run the install command above. The SKILL.md file downloads to your local skills directory (./skills/arize-dataset/).

2. Load into Your AI Assistant

Open Claude or GitHub Copilot and reference the skill: paste the SKILL.md content into the chat or add it to the system prompt.

3. Apply Arize-Dataset to Your Work

Provide context for your task: paste source material, describe your audience, or share existing work to guide the AI.

4. Review and Refine

Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.

❓ Frequently Asked Questions

Does this generate test mocks?

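The skill content shown here covers dataset management with the `ax` CLI (dataset CRUD, appending examples, exporting data, and file-based dataset creation), not mock generation. Check the full SKILL.md for details.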

How do I install Arize-Dataset?

Run the install command above. The skill downloads to ./skills/arize-dataset/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Not testing edge cases

AI tends to generate happy-path tests. Manually review for boundary conditions.

Ignoring data quality

AI analysis inherits all data quality issues — profile your data first.

🔗 Related Skills

Comet Opik (comet-opik)
Pester-Migration (pester-migration)
Acreadiness-Assess (acreadiness-assess)
Acreadiness-Policy (acreadiness-policy)
ADR Generator (adr-generator)
Ai-Prompt-Engineering-Safety-Best-Practices (ai-prompt-engineering-safety-best-practices)
