Mayur Rathi

⭐ 5 GitHub stars

AI Data Engineer

AI Data Engineer is a data skill that provides expert guidance on the complete modern data stack, from ingestion to analytics: batch vs streaming, storage (Snowflake, BigQuery, Delta Lake), orchestration (Airflow/Prefect), and data quality governance. It is aimed at developers who design, audit, or optimize data pipelines.

Last verified on: 2026-07-11

Quick Facts

Category: data
Works With: Claude, ChatGPT, Gemini
Source: mayurrathi/awesome-agent-skills
Stars: ⭐ 5
Last Verified: 2026-07-11
Risk Level: Low

mkdir -p ./skills/ai-data-engineer && curl -sfL https://raw.githubusercontent.com/mayurrathi/awesome-agent-skills/main/skills/ai-data-engineer/SKILL.md -o ./skills/ai-data-engineer/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).
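
The one-liner above relies on Unix-style shell tools. As a hedged, cross-platform alternative, the same two steps (create the folder, download the file) can be sketched with the Python standard library. The URL pattern is copied from the command above; `skill_url`, `install_skill`, and the `fetch` parameter are hypothetical names introduced only for this illustration.

```python
from pathlib import Path
from urllib.request import urlopen

# URL prefix copied from the install command above.
RAW_BASE = "https://raw.githubusercontent.com/mayurrathi/awesome-agent-skills/main/skills"


def skill_url(name: str) -> str:
    """Build the raw GitHub URL of a skill's SKILL.md (hypothetical helper)."""
    return f"{RAW_BASE}/{name}/SKILL.md"


def _download(url: str) -> bytes:
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def install_skill(name: str, root: Path = Path("skills"), fetch=_download) -> Path:
    """Mirror `mkdir -p` plus `curl -o`: create the folder, then save SKILL.md."""
    target = root / name / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(skill_url(name)))
    return target
```

Calling `install_skill("ai-data-engineer")` from the project root should leave the file at `skills/ai-data-engineer/SKILL.md`. The `fetch` argument exists so the download can be swapped out, for example in an offline test.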

Skill Content

# AI Data Engineer

## Purpose

Design scalable data pipelines and modern data architecture for production environments.

## Architecture Design Process

### Step 1: Define Requirements

- Scale: GB/TB/PB per day

- Latency: Batch vs near-real-time vs real-time

- Data Sources: DBs, APIs, files, streams

- Consumers: Analysts (SQL), Data Scientists, Apps
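
One way to make the four requirement questions above concrete is to record the answers in a small structure before choosing tools. The sketch below is illustrative only: the class name, field names, and especially the latency cut-offs are assumptions, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRequirements:
    """Answers to the four Step 1 questions (field names are illustrative)."""
    daily_volume_gb: float
    max_latency_seconds: float
    sources: list[str] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)

    def scale_tier(self) -> str:
        """Bucket daily volume into GB, TB, or PB per day (decimal units)."""
        if self.daily_volume_gb >= 1_000_000:
            return "PB/day"
        if self.daily_volume_gb >= 1_000:
            return "TB/day"
        return "GB/day"

    def latency_class(self) -> str:
        """Classify freshness needs; the cut-offs are assumptions, not standards."""
        if self.max_latency_seconds <= 1:
            return "real-time"
        if self.max_latency_seconds <= 300:
            return "near-real-time"
        return "batch"


req = PipelineRequirements(
    daily_volume_gb=2_500,
    max_latency_seconds=3_600,
    sources=["postgres", "events API"],
    consumers=["analysts"],
)
print(req.scale_tier(), req.latency_class())  # TB/day batch
```

Writing the answers down this way keeps the later architecture choice tied to stated numbers rather than to tool preference.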

### Step 2: Choose Architecture

**Batch:** Fivetran/Airbyte → dbt → Snowflake/BigQuery → BI

**Streaming:** Kafka/Confluent → Flink/Kafka Streams

**Storage:** Delta Lake, Iceberg, Hudi on S3/ADLS/GCS

**Layers:** Bronze (raw) → Silver (cleaned) → Gold (aggregated)
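
The Bronze → Silver → Gold layering can be illustrated without any particular engine. The following is a minimal sketch in plain Python, assuming made-up order records; in practice these steps would typically run in dbt or Spark over a table format such as Delta Lake or Iceberg. The function names `to_silver` and `to_gold` are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (made-up sample data).
bronze = [
    {"order_id": "1", "country": " us ", "amount": "10.50"},
    {"order_id": "2", "country": "DE", "amount": "7.25"},
    {"order_id": "2", "country": "DE", "amount": "7.25"},  # duplicate
    {"order_id": "3", "country": "US", "amount": "oops"},  # bad amount
]


def to_silver(rows):
    """Clean: cast types, normalize text, drop bad rows and duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine this row instead
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].strip().upper(),
            "amount": amount,
        })
    return cleaned


def to_gold(rows):
    """Aggregate: total revenue per country, ready for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 10.5, 'DE': 7.25}
```

Keeping Bronze untouched means a bug in the cleaning logic can be fixed and the downstream layers rebuilt from the raw data.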

### Step 3: Data Modeling

- Star schema for business reporting

- Data Vault 2.0 for enterprise warehousing

- Slowly Changing Dimensions (SCD 1, 2, 3)
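
Of the three SCD types listed, Type 1 overwrites the old value, Type 2 adds a new row per change, and Type 3 keeps the previous value in an extra column. Type 2 is the least obvious to implement, so here is a small sketch of it in plain Python. The row layout (`valid_from`, `valid_to`, `is_current`) is a common convention, and `apply_scd2` is a hypothetical helper, not something the skill defines.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, as_of):
    """Return a new history list with an SCD Type 2 change applied.

    Each row: {"key", "attrs", "valid_from", "valid_to", "is_current"}.
    """
    updated = [dict(row) for row in history]  # do not mutate the input
    current = next(
        (r for r in updated if r["key"] == key and r["is_current"]), None
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return updated  # nothing changed, keep the current version open
        # Close the old version instead of overwriting it.
        current["valid_to"] = as_of
        current["is_current"] = False
    updated.append({
        "key": key,
        "attrs": dict(new_attrs),
        "valid_from": as_of,
        "valid_to": None,
        "is_current": True,
    })
    return updated


history = apply_scd2([], "cust-1", {"city": "Pune"}, date(2024, 1, 1))
history = apply_scd2(history, "cust-1", {"city": "Berlin"}, date(2024, 6, 1))
for row in history:
    print(row["attrs"], row["valid_from"], row["valid_to"], row["is_current"])
```

After the second call the history holds two rows for `cust-1`: the Pune version closed on 2024-06-01 and the Berlin version still open, which is what lets reports reconstruct the state at any past date.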

### Step 4: Data Quality

- Great Expectations framework

- Lineage tracking (DataHub, Atlan)

- Row-level security for PII

- Encryption at rest and in transit
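
To show the kind of rule a data quality framework enforces, the sketch below hand-rolls three typical checks (not null, unique, value range) in plain Python. It deliberately does not use the Great Expectations API; the function names, result layout, and sample rows are all assumptions made for illustration.

```python
def check_not_null(rows, column):
    """Fail for every row where the column is missing or None."""
    failed = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"check": f"not_null({column})", "passed": not failed, "failed_rows": failed}


def check_unique(rows, column):
    """Fail for every row whose value was already seen earlier."""
    seen, failed = set(), []
    for i, r in enumerate(rows):
        value = r.get(column)
        if value in seen:
            failed.append(i)
        seen.add(value)
    return {"check": f"unique({column})", "passed": not failed, "failed_rows": failed}


def check_between(rows, column, low, high):
    """Fail for every non-null value outside the inclusive range."""
    failed = [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]
    return {"check": f"between({column})", "passed": not failed, "failed_rows": failed}


# Made-up sample data with one violation per check.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "c@example.com", "age": 212},
]

results = [
    check_not_null(rows, "email"),
    check_unique(rows, "id"),
    check_between(rows, "age", 0, 120),
]
for result in results:
    print(result)
```

Returning the failing row indexes, rather than a bare pass/fail, makes it easier to quarantine bad records instead of rejecting the whole batch.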

### Step 5: Operations

- Infrastructure as Code (Terraform)

- CI/CD for dbt and Spark jobs

- Cost optimization: partitioning, clustering, storage lifecycle policies
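
Partitioning and lifecycle rules save cost for the same reason: a job only touches the slices of data it needs. The sketch below assumes Hive-style `key=value` date partitions; the table name, retention window, and all function names are hypothetical.

```python
from datetime import date, timedelta


def partition_path(table, day):
    """Hive-style partition path for one day of data."""
    return f"{table}/year={day.year}/month={day.month:02d}/day={day.day:02d}"


def partitions_for_range(table, start, end):
    """Partition pruning: list only the paths inside [start, end]."""
    days = (end - start).days
    return [partition_path(table, start + timedelta(days=i)) for i in range(days + 1)]


def expired_partitions(table, days_present, today, retention_days):
    """Lifecycle: paths older than the retention window, candidates for deletion."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_path(table, d) for d in days_present if d < cutoff]


to_scan = partitions_for_range("events", date(2024, 2, 28), date(2024, 3, 1))
print(to_scan)
# ['events/year=2024/month=02/day=28',
#  'events/year=2024/month=02/day=29',
#  'events/year=2024/month=03/day=01']

to_delete = expired_partitions(
    "events",
    days_present=[date(2024, 1, 1), date(2024, 3, 1)],
    today=date(2024, 3, 10),
    retention_days=30,
)
print(to_delete)  # ['events/year=2024/month=01/day=01']
```

A query over three days reads three paths regardless of how large the table has grown. In a managed setup the deletion side is usually handled by bucket lifecycle rules or table-level retention settings rather than by custom code.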

🎯 Best For

Data engineers
Data architects
Claude users
ChatGPT users
Gemini users

💡 Use Cases

Designing batch and streaming pipelines
Choosing storage and data modeling approaches
Data pipeline auditing
Query optimization

📖 How to Use This Skill

1. Install the Skill

Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.

2. Load into Your AI Assistant

Open Claude, ChatGPT, or Gemini and reference the skill. Paste the SKILL.md content or use the system prompt tab.

3. Apply AI Data Engineer to Your Work

Provide context for your task: describe your data sources, volumes, latency needs, and consumers, or share existing pipeline code and schemas to guide the AI.

4. Review and Refine

Check the AI output for accuracy and completeness. Add human insight where the AI lacks context.

❓ Frequently Asked Questions

How do I install AI Data Engineer?

Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/ai-data-engineer/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Ignoring data quality

AI analysis inherits all data quality issues — profile your data first.

🔗 Related Skills

agentic-actions-auditor
ai-seo
neon-ai-gateway
ai-wrapper-product
ai-studio-image
aomi-transact