Mayur Rathi
@github
⭐ 34.1k GitHub stars

Arize-Dataset

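Arize-Dataset is a data-focused AI skill. Its core value is creating, managing, and querying Arize datasets and examples. It addresses practical problems developers face in the data domain, helping users work more efficiently, automate repetitive tasks, or streamline workflows.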

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data,

Last verified on: 2026-05-30
mkdir -p ./skills/arize-dataset && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/arize-dataset/SKILL.md -o ./skills/arize-dataset/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).

Skill Content

# Arize Dataset Skill


> **`SPACE`** — All `--space` flags and the `ARIZE_SPACE` env var accept a space **name** (e.g., `my-workspace`) or a base64 space **ID** (e.g., `U3BhY2U6...`). Find yours with `ax spaces list`.


## Concepts


- **Dataset** = a versioned collection of examples used for evaluation and experimentation

- **Dataset Version** = a snapshot of a dataset at a point in time; updates can be in-place or create a new version

- **Example** = a single record in a dataset with arbitrary user-defined fields (e.g., `question`, `answer`, `context`)

- **Space** = an organizational container; datasets belong to a space


System-managed fields on examples (`id`, `created_at`, `updated_at`) are auto-generated by the server -- never include them in create or append payloads.
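
The rule above can be enforced before a payload is built. A minimal Python sketch, assuming examples are held as plain dictionaries; the helper name `strip_system_fields` and the sample record are hypothetical and not part of the `ax` CLI:

```python
# Fields the server generates on its own, per the note above.
SYSTEM_FIELDS = {"id", "created_at", "updated_at"}


def strip_system_fields(examples):
    """Return copies of the examples without server-managed fields.

    Hypothetical helper for illustration; not part of the ax CLI.
    """
    return [
        {key: value for key, value in example.items() if key not in SYSTEM_FIELDS}
        for example in examples
    ]


# Made-up record, e.g. one that was previously exported from a dataset.
exported = [
    {
        "id": "ex_1",
        "created_at": "2026-03-05T14:15:00Z",
        "question": "What is a space?",
        "answer": "An organizational container.",
    },
]
print(strip_system_fields(exported))
# [{'question': 'What is a space?', 'answer': 'An organizational container.'}]
```

Returning copies rather than mutating in place keeps the exported records intact for later comparison.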


## Prerequisites


Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.


If an `ax` command fails, troubleshoot based on the error:

- `command not found` or version error → see references/ax-setup.md

- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys

- Space unknown → run `ax spaces list` to pick by name, or ask the user

- Project unclear → ask the user, or run `ax projects list -o json --limit 100` and present as selectable options

- **Security:** Never read `.env` files or search the filesystem for credentials. Use `ax profiles` for Arize credentials and `ax ai-integrations` for LLM provider keys. If credentials are not available through these channels, ask the user.


## List Datasets: `ax datasets list`


Browse datasets in a space. Output goes to stdout.


```bash
ax datasets list
ax datasets list --space SPACE --limit 20
ax datasets list --cursor CURSOR_TOKEN
ax datasets list -o json
```

### Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--space` | string | from profile | Filter by space |
| `--limit, -l` | int | 15 | Max results (1-100) |
| `--cursor` | string | none | Pagination cursor from previous response |
| `-o, --output` | string | table | Output format: table, json, csv, parquet, or file path |
| `-p, --profile` | string | default | Configuration profile |
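
Cursor pagination with `--cursor` follows the usual loop: request a page, read the cursor from the response, and repeat until none comes back. The Python sketch below shows only that loop. The page shape (`items`, `next_cursor`) and the `fetch_page` callable are assumptions made for illustration; the JSON layout of `ax datasets list -o json` is not documented here, so adapt the keys to what the CLI actually returns.

```python
from typing import Callable, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages.

    fetch_page(cursor) is a hypothetical callable returning a dict with
    "items" (list) and "next_cursor" (str or None); both keys are assumed.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break


# Made-up pages standing in for real calls such as
# `ax datasets list --cursor CURSOR_TOKEN -o json`.
FAKE_PAGES = {
    None: {"items": [{"name": "a"}, {"name": "b"}], "next_cursor": "c1"},
    "c1": {"items": [{"name": "c"}], "next_cursor": None},
}

names = [item["name"] for item in iter_all(lambda cursor: FAKE_PAGES[cursor])]
print(names)  # ['a', 'b', 'c']
```

Passing the fetcher in as a callable keeps the loop independent of how the CLI is invoked, so the same function works with a subprocess wrapper or a test double.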


## Get Dataset: `ax datasets get`


Quick metadata lookup -- returns dataset name, space, timestamps, and version list.


```bash
ax datasets get NAME_OR_ID
ax datasets get NAME_OR_ID -o json
ax datasets get NAME_OR_ID --space SPACE   # required when using dataset name instead of ID
```

### Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `NAME_OR_ID` | string | required | Dataset name or ID (positional) |
| `--space` | string | none | Space name or ID (required if using dataset name instead of ID) |
| `-o, --output` | string | table | Output format |
| `-p, --profile` | string | default | Configuration profile |


### Response fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Dataset ID |
| `name` | string | Dataset name |
| `space_id` | string | Space this dataset belongs to |
| `created_at` | datetime | When the dataset was created |
| `updated_at` | datetime | Last modification time |
| `versions` | array | List of dataset versions (id, name, dataset_id, created_at, updated_at) |
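
Once the output of `ax datasets get NAME_OR_ID -o json` has been captured, the `versions` array can be used to pick a version ID. A Python sketch under stated assumptions: the sample payload is made up and uses only the fields listed above, and `created_at` is assumed to be a uniformly formatted ISO 8601 string so that plain string comparison orders versions correctly. The real serialization may differ.

```python
import json


def latest_version_id(dataset: dict) -> str:
    """Return the id of the most recently created version."""
    versions = dataset["versions"]
    if not versions:
        raise ValueError("dataset has no versions")
    # Same-format ISO 8601 UTC timestamps sort correctly as strings (assumed).
    return max(versions, key=lambda version: version["created_at"])["id"]


# Made-up payload shaped after the response fields table.
sample = json.loads("""
{
  "id": "ds_1",
  "name": "qa-regression",
  "space_id": "sp_1",
  "created_at": "2026-03-01T09:00:00Z",
  "updated_at": "2026-03-05T14:15:00Z",
  "versions": [
    {"id": "v_1", "name": "v1", "dataset_id": "ds_1",
     "created_at": "2026-03-01T09:00:00Z", "updated_at": "2026-03-01T09:00:00Z"},
    {"id": "v_2", "name": "v2", "dataset_id": "ds_1",
     "created_at": "2026-03-05T14:15:00Z", "updated_at": "2026-03-05T14:15:00Z"}
  ]
}
""")
print(latest_version_id(sample))  # v_2
```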


## Export Dataset: `ax datasets export`


Download all examples to a file. Use `--all` for datasets larger than 500 examples (unlimited bulk export).


```bash
ax datasets export NAME_OR_ID
# -> dataset_abc123_20260305_141500/examples.json

ax datasets export NAME_OR_ID --all
ax datasets export NAME_OR_ID --version-id VERSION_ID
ax datasets export NAME_OR_ID --output-dir ./data
ax datasets export NAME_OR_ID --stdout
ax datasets export NAME_OR_ID --stdout | jq '.[0]'
ax datasets export NAME_OR_ID --
```
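
The `jq '.[0]'` call above implies that the export is a JSON array of examples. Under that assumption, a short Python sketch can load an exported `examples.json` and summarize it. The function name `summarize_export` is hypothetical, and the demo writes its own made-up file instead of calling `ax`:

```python
import json
import tempfile
from pathlib import Path


def summarize_export(path):
    """Return (example count, sorted field names) for an exported JSON array."""
    examples = json.loads(Path(path).read_text(encoding="utf-8"))
    fields = sorted({key for example in examples for key in example})
    return len(examples), fields


# Demo with a made-up file standing in for <export dir>/examples.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "examples.json"
    demo.write_text(
        json.dumps([
            {"id": "ex_1", "question": "q1", "answer": "a1"},
            {"id": "ex_2", "question": "q2", "context": "c2"},
        ]),
        encoding="utf-8",
    )
    count, fields = summarize_export(demo)

print(count, fields)  # 2 ['answer', 'context', 'id', 'question']
```

Collecting the union of field names is a quick way to spot examples that are missing a field the others carry.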

🎯 Best For

  • QA engineers
  • Developers writing unit tests
  • Claude users
  • GitHub Copilot users
  • Data professionals

💡 Use Cases

  • Generating test cases for edge conditions
  • Writing integration test suites
  • Data pipeline auditing
  • Query optimization

📖 How to Use This Skill

  1. Install the Skill

     Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.

  2. Load into Your AI Assistant

     Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content or use the system prompt tab.

  3. Apply Arize-Dataset to Your Work

     Provide context for your task: paste source material, describe your audience, or share existing work to guide the AI.

  4. Review and Refine

     Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.

❓ Frequently Asked Questions

Does this generate test mocks?

The skill description covers dataset CRUD, appending examples, exporting data, and file-based dataset creation; it does not mention mock generation. Check the full SKILL.md for details.

How do I install Arize-Dataset?

Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/arize-dataset/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Not testing edge cases

AI tends to generate happy-path tests. Manually review for boundary conditions.

Ignoring data quality

AI analysis inherits all data quality issues — profile your data first.
