Pinecone-Rag
Pinecone-Rag is a code AI skill. It helps developers solve real-world
problems in the code domain: boosting efficiency, automating repetitive
tasks, and optimizing workflows.
```shell
mkdir -p ./skills/pinecone-rag && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/pinecone-rag/SKILL.md -o ./skills/pinecone-rag/SKILL.md
```
Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).
Skill Content
# Pinecone RAG Skill
This skill guides you through building a production RAG pipeline or persistent
agent memory system using Pinecone. Follow the workflow from start to finish —
don't skip steps or jump to code before understanding what the user actually
needs.
## Before you start: ask one question
Before writing any code, identify which of these two use cases applies:
**A — RAG over documents**: User wants to index a corpus (PDFs, docs, code,
web pages) and retrieve relevant chunks to ground LLM responses.
**B — Agent memory**: User wants an agent to remember facts, decisions, or
context across sessions or across multiple agents sharing a knowledge base.
The setup is similar but the namespace strategy and retrieval patterns differ.
If the user hasn't said, ask: *"Is this for document retrieval, agent memory,
or both?"* Then follow the relevant workflow below.
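
The source says only that the namespace strategy differs between the two use cases, not how. As a rough illustration, one possible convention is a namespace per corpus for document retrieval and a namespace per agent (or one shared pool) for memory. The helper names and the `docs-` / `memory-` prefixes below are hypothetical and are not part of Pinecone or of this skill.

```python
import re
from typing import Optional


def _slug(value: str) -> str:
    """Lowercase the value and collapse non-alphanumeric runs into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def document_namespace(corpus: str) -> str:
    """Use case A: one namespace per corpus (hypothetical 'docs-' prefix)."""
    return f"docs-{_slug(corpus)}"


def memory_namespace(agent_id: Optional[str] = None, shared: bool = False) -> str:
    """Use case B: per-agent memory, or one shared pool for several agents."""
    if shared:
        return "memory-shared"
    if not agent_id:
        raise ValueError("agent_id is required unless shared=True")
    return f"memory-{_slug(agent_id)}"
```

Keeping the two families of namespaces apart means a document query does not surface agent memories unless you ask for them explicitly.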
---
## Step 1: Choose your index configuration
Pick the index type before writing any code. Getting this wrong means
re-creating the index later.
**Serverless (recommended for most cases)**
```python
import os

from pinecone import Pinecone, ServerlessSpec

# Read the API key from the environment rather than hard-coding it.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the index only if it does not already exist.
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1536,  # must match your embedding model exactly
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("my-index")
```

**Pod-based (for consistent high-throughput production)**

```python
from pinecone import PodSpec

pc.create_index(
    name="my-index-prod",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(environment="us-east1-gcp", pod_type="p1.x1"),
)
```

**Dimension quick reference. Match this exactly to your embedding model:**
| Model | Dimension |
|---|---|
| `text-embedding-3-small` | 1536 |
| `text-embedding-3-large` | 3072 |
| `voyage-3` / `voyage-multimodal-3` | 1024 |
| `BAAI/bge-large-en-v1.5` | 1024 |
| `intfloat/multilingual-e5-large` (Arabic, Malay, Chinese) | 1024 |
> **Checkpoint**: Index exists, dimension matches embedding model, `index.describe_index_stats()` returns without error.
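
The dimension part of this checkpoint can be automated. The following is a minimal sketch that copies the values from the quick-reference table above into a lookup; `EMBEDDING_DIMENSIONS` and `check_dimension` are illustrative names, not part of the Pinecone SDK.

```python
# Dimensions copied from the quick-reference table above.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "voyage-3": 1024,
    "voyage-multimodal-3": 1024,
    "BAAI/bge-large-en-v1.5": 1024,
    "intfloat/multilingual-e5-large": 1024,
}


def check_dimension(model: str, index_dimension: int) -> None:
    """Raise ValueError if the index dimension does not match the model."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"Unknown model {model!r}; add it to EMBEDDING_DIMENSIONS")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index expects {index_dimension}"
        )
```

Assuming the stats response exposes a `dimension` field, a call might look like `check_dimension("text-embedding-3-small", index.describe_index_stats().dimension)`.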
---
## Step 2: Embed and upsert documents
Always batch upserts — never upsert one vector at a time.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts in one API call."""
    res = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [r.embedding for r in res.data]


def upsert_docs(index, docs: list[dict], namespace: str = "default"):
    """docs = [{"id": "...", "text": "...", "metadata": {...}}]"""
    BATCH = 100
    for i in range(0, len(docs), BATCH):
        batch = docs[i:i + BATCH]
        # Embed the whole batch at once, then pair each doc with its vector.
        vecs = [
            {
                "id": d["id"],
                "values": emb,
                "metadata": {**d.get("metadata", {}), "text": d["text"]},
            }
            for d, emb in zip(batch, embed([d["text"] for d in batch]))
        ]
        index.upsert(vectors=vecs, namespace=namespace)
```

**Always store the original text in metadata.** This avoids a second lookup
at retrieval time.
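
The source does not show how raw documents become the `docs` list that `upsert_docs` expects. One simple option, sketched here as an assumption rather than as the skill's own method, is fixed-size character windows with overlap. The `chunk_text` name, the `doc_id#chunk-n` ID scheme, the `source` and `chunk` metadata keys, and the default sizes are all illustrative.

```python
def chunk_text(text: str, doc_id: str, size: int = 800, overlap: int = 100) -> list[dict]:
    """Split text into overlapping character windows shaped for upsert_docs."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for n, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "id": f"{doc_id}#chunk-{n}",
            "text": text[start:start + size],
            "metadata": {"source": doc_id, "chunk": n},
        })
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The output can be passed straight to `upsert_docs(index, chunk_text(raw_text, "handbook"))`. Splitting on sentences or tokens usually retrieves better than raw character windows; this version only shows the data shape.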
> **Checkpoint**: `index.describe_index_stats()` shows vector count > 0 in the
> target namespace.
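
This checkpoint can also be scripted. The sketch below works on a plain dictionary and assumes the stats response can be converted with `to_dict()` into a shape with a `namespaces` mapping whose entries carry `vector_count`. Verify that against your SDK version. Both function names are hypothetical.

```python
def namespace_vector_count(stats: dict, namespace: str) -> int:
    """Return the vector count for one namespace, or 0 if it is absent."""
    return stats.get("namespaces", {}).get(namespace, {}).get("vector_count", 0)


def assert_upserted(stats: dict, namespace: str, minimum: int = 1) -> int:
    """Raise RuntimeError if the namespace holds fewer than `minimum` vectors."""
    count = namespace_vector_count(stats, namespace)
    if count < minimum:
        raise RuntimeError(
            f"namespace {namespace!r} has {count} vectors, expected at least {minimum}"
        )
    return count
```

A call might look like `assert_upserted(index.describe_index_stats().to_dict(), "default")`. Counts may lag briefly behind a recent upsert, so retry after a short wait before treating a zero as a failure.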
---
## Step 3: Choose retrieval strategy
### Dense (semantic) search: use for most cases

```python
from typing import Optional


def search(index, query: str, top_k: int = 5, namespace: str = "default",
           filter: Optional[dict] = None) -> list[dict]:
    """Return the top_k matches with their stored text, score, and ID."""
    [q_emb] = embed([query])
    results = index.query(
        vector=q_emb, top_k=top_k, namespace=namespace,
        include_metadata=True, filter=filter,
    )
    return [{"text": m.metadata["text"], "score": m.score, "id": m.id}
            for m in results.matches]
```

### Hybrid search (semantic + BM25 keyword): use when corpus has exact terminology
Use hybrid when the domain has precise terms that semantic search misses:
legal citations, medical codes, product SKUs, API method names.
from pine
🎯 Best For
- Claude users
- GitHub Copilot users
- Software engineers
- Development teams
- Tech leads
💡 Use Cases
- Code quality improvement
- Best practice enforcement
📖 How to Use This Skill
1. **Install the Skill.** Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.
2. **Load into Your AI Assistant.** Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content or use the system prompt tab.
3. **Apply Pinecone-Rag to Your Work.** Open your project in the AI assistant and ask it to apply the skill. Start with a small module to verify the output quality.
4. **Review and Refine.** Review AI suggestions before committing. Run tests, check for regressions, and iterate on the skill output.
❓ Frequently Asked Questions
Is Pinecone-Rag compatible with Cursor and VS Code?
Yes — this skill works with any AI coding assistant including Cursor, VS Code with Copilot, and JetBrains IDEs.
Do I need specific dependencies for Pinecone-Rag?
Check the install command and Works With section. Most code skills only require the AI assistant and your codebase.
How do I install Pinecone-Rag?
Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/pinecone-rag/SKILL.md, ready to use.
Can I customize this skill for my team?
Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.
⚠️ Common Mistakes to Avoid
Skipping validation
Always test AI-generated code changes, even for simple refactors.
Missing dependency updates
Check if the skill requires updated dependencies or new packages.