AI Data Engineer
AI Data Engineer is an AI skill for data work. It provides expert guidance on the complete modern data stack, from ingestion to analytics: batch vs. streaming, storage (Snowflake, BigQuery, Delta Lake), orchestration (Airflow/Prefect), and data quality governance. It helps developers solve practical data engineering problems, work more efficiently, automate repetitive tasks, and optimize workflows.
mkdir -p ./skills/ai-data-engineer && curl -sfL https://raw.githubusercontent.com/mayurrathi/awesome-agent-skills/main/skills/ai-data-engineer/SKILL.md -o ./skills/ai-data-engineer/SKILL.md
Run in a terminal or PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).
Skill Content
# AI Data Engineer
Purpose
Design scalable data pipelines and modern data architecture for production environments.
Architecture Design Process
Step 1: Define Requirements
- Scale: GB/TB/PB per day
- Latency: Batch vs near-real-time vs real-time
- Data Sources: DBs, APIs, files, streams
- Consumers: Analysts (SQL), Data Scientists, Apps
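The four requirement dimensions above can be written down as a small record before any tooling is chosen. The following is a minimal sketch, not part of the skill itself: the names `PipelineRequirements`, `Latency`, and `suggest_pattern` are hypothetical, and the rule that anything other than batch latency points toward streaming is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Latency(Enum):
    BATCH = "batch"
    NEAR_REAL_TIME = "near-real-time"
    REAL_TIME = "real-time"


@dataclass(frozen=True)
class PipelineRequirements:
    """Hypothetical record of the Step 1 answers."""
    daily_volume_gb: float          # Scale
    latency: Latency                # Latency
    sources: tuple[str, ...]        # Data Sources
    consumers: tuple[str, ...]      # Consumers


def suggest_pattern(req: PipelineRequirements) -> str:
    """Illustrative rule of thumb: latency drives the batch/streaming split."""
    if req.latency is Latency.BATCH:
        return "batch"
    return "streaming"


req = PipelineRequirements(
    daily_volume_gb=500.0,
    latency=Latency.BATCH,
    sources=("postgres", "rest_api"),
    consumers=("analysts",),
)
print(suggest_pattern(req))
```

In practice the decision also depends on volume, cost, and team skills; the sketch only shows how the answers can be captured in one place.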
Step 2: Choose Architecture
**Batch:** Fivetran/Airbyte → dbt → Snowflake/BigQuery → BI
**Streaming:** Kafka/Confluent → Flink/Kafka Streams
**Storage:** Delta Lake, Iceberg, Hudi on S3/ADLS/GCS
**Layers:** Bronze (raw) → Silver (cleaned) → Gold (aggregated)
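The Bronze → Silver → Gold layering can be illustrated without any warehouse at all. The sketch below uses plain Python lists in place of Delta Lake or Iceberg tables; the sample rows and the function names `to_silver` and `to_gold` are made up for illustration.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "BOB", "amount": "not-a-number"},
    {"order_id": "3", "customer": "Alice", "amount": "4.50"},
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},  # duplicate
]


def to_silver(rows):
    """Silver: cast types, normalize names, drop bad rows and duplicate ids."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable amount: excluded from the cleaned layer
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),
            "amount": amount,
        })
    return cleaned


def to_gold(rows):
    """Gold: aggregate cleaned rows into revenue per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping Bronze untouched means the Silver and Gold logic can be rerun after a bug fix without re-ingesting from the source.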
Step 3: Data Modeling
- Star schema for business reporting
- Data Vault 2.0 for enterprise warehousing
- Slowly Changing Dimensions (SCD 1, 2, 3)
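Of the three SCD types listed, Type 2 is the one that keeps full history: instead of overwriting a changed attribute (Type 1) or storing only the previous value in an extra column (Type 3), it closes the current row and appends a new one. A minimal in-memory sketch follows; `CustomerVersion` and `apply_scd2` are hypothetical names, and a real warehouse would typically do this with a MERGE statement or a snapshot feature in the transformation tool.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerVersion], customer_id: int,
               new_city: str, effective: date) -> List[CustomerVersion]:
    """Return a new history with an SCD Type 2 change applied."""
    updated = []
    for version in history:
        is_current = (version.customer_id == customer_id
                      and version.valid_to is None)
        if is_current and version.city == new_city:
            return list(history)  # attribute unchanged: nothing to record
        if is_current:
            # Close the old version at the effective date.
            updated.append(replace(version, valid_to=effective))
        else:
            updated.append(version)
    # Open a new current version (also covers a first-seen customer).
    updated.append(CustomerVersion(customer_id, new_city, valid_from=effective))
    return updated


history = [CustomerVersion(1, "Berlin", date(2023, 1, 1))]
history = apply_scd2(history, 1, "Munich", date(2024, 6, 1))
for version in history:
    print(version)
```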
Step 4: Data Quality
- Great Expectations framework
- Lineage tracking (DataHub, Atlan)
- Row-level security for PII
- Encryption at rest and in transit
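The kind of rule a data quality framework enforces can be sketched in plain Python. This is deliberately not the Great Expectations API; the helper names `check_not_null`, `check_unique`, and `check_between` and the sample rows are hypothetical, and the sketch only shows the shape of a check: a named rule, a pass/fail flag, and the offending row indexes.

```python
def check_not_null(rows, column):
    """Fail on rows where the column is missing or None."""
    failed = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"check": f"not_null({column})", "passed": not failed, "failed_rows": failed}


def check_unique(rows, column):
    """Fail on every row that repeats an earlier value in the column."""
    seen, failed = set(), []
    for i, r in enumerate(rows):
        value = r.get(column)
        if value in seen:
            failed.append(i)
        seen.add(value)
    return {"check": f"unique({column})", "passed": not failed, "failed_rows": failed}


def check_between(rows, column, low, high):
    """Fail on non-null values outside the inclusive range [low, high]."""
    failed = [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]
    return {"check": f"between({column})", "passed": not failed, "failed_rows": failed}


# Hypothetical sample batch with one problem per rule.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "c@example.com", "age": 210},
]

results = [
    check_not_null(rows, "email"),
    check_unique(rows, "id"),
    check_between(rows, "age", 0, 120),
]
for result in results:
    print(result)
```

A pipeline would usually run such checks between layers and stop or quarantine the batch when a check fails.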
Step 5: Operations
- Infrastructure as Code (Terraform)
- CI/CD for dbt and Spark jobs
- Cost optimization: partitioning, clustering, lifecycle
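Partitioning lowers cost because a query over a date range only has to read the matching partitions. The sketch below assumes a Hive-style `year=/month=/day=` layout; the bucket name and the helpers `partition_path` and `partitions_for_range` are hypothetical.

```python
from datetime import date, timedelta


def partition_path(table_root: str, day: date) -> str:
    """Build a Hive-style daily partition path under the table root."""
    return f"{table_root}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"


def partitions_for_range(table_root: str, start: date, end: date) -> list:
    """List only the daily partitions a query over [start, end] must read."""
    days = (end - start).days
    return [partition_path(table_root, start + timedelta(days=i))
            for i in range(days + 1)]


root = "s3://example-bucket/events"  # hypothetical location
needed = partitions_for_range(root, date(2024, 2, 28), date(2024, 3, 1))
for path in needed:
    print(path)
```

The same path scheme also makes lifecycle rules straightforward, since old partitions can be expired or moved to cheaper storage by prefix.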
🎯 Best For
- Claude users
- ChatGPT users
- Gemini users
💡 Use Cases
- Data pipeline auditing
- Query optimization
📖 How to Use This Skill
1. Install the Skill
Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.
2. Load into Your AI Assistant
Open Claude or ChatGPT and reference the skill. Paste the SKILL.md content or use the system prompt tab.
3. Apply AI Data Engineer to Your Work
Provide context for your task: paste source material, describe your audience, or share existing work to guide the AI.
4. Review and Refine
Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.
❓ Frequently Asked Questions
How do I install AI Data Engineer?
Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/ai-data-engineer/SKILL.md, ready to use.
Can I customize this skill for my team?
Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.
⚠️ Common Mistakes to Avoid
Ignoring data quality
AI analysis inherits all data quality issues — profile your data first.