Python Basics
Includes virtual environments, Google Colab tips, and a link to NumPy tutorials
Virtual Environment
Setting up a virtual environment is a smart way to keep your project clean and reproducible—especially when you’re working with packages like NumPy and Seaborn in Jupyter notebooks. Here’s a step-by-step guide using both Conda and venv, so you can choose whichever suits your workflow best:
✅ Option 1: Using Conda (Recommended for Data Science projects)
Conda is ideal if you already use Anaconda or Miniconda. It handles scientific packages very well.
🔧 Step-by-Step
1. Navigate to your project folder in a terminal:
   ```bash
   cd path/to/your/project
   ```
2. Create a new conda environment (replace `myenv` with your preferred environment name):
   ```bash
   conda create -n myenv python=3.11
   ```
3. Activate the environment:
   ```bash
   conda activate myenv
   ```
4. Install packages:
   ```bash
   conda install numpy seaborn jupyter
   ```
5. (Optional but useful) Install `ipykernel` so Jupyter can use this environment:
   ```bash
   conda install ipykernel
   python -m ipykernel install --user --name=myenv --display-name "Python (myenv)"
   ```
6. Launch Jupyter:
   ```bash
   jupyter notebook
   ```
7. In the notebook interface, select the kernel via `Kernel > Change Kernel > Python (myenv)`.
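A quick sanity check that the notebook is really using the new environment (assuming the packages above installed cleanly):
```python
# Run in a notebook cell with the "Python (myenv)" kernel selected.
import numpy as np
import seaborn as sns

print(np.__version__, sns.__version__)
```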
✅ Option 2: Using venv + pip (more lightweight, universal)
If you’re not using Conda and want something lighter:
🔧 Step-by-Step
1. Navigate to your project folder:
   ```bash
   cd path/to/your/project
   ```
2. Create a virtual environment:
   ```bash
   python -m venv venv
   ```
3. Activate the environment:
   - On Windows:
     ```bash
     venv\Scripts\activate
     ```
   - On macOS/Linux:
     ```bash
     source venv/bin/activate
     ```
4. Install your packages:
   ```bash
   pip install numpy seaborn notebook ipykernel
   ```
5. Add the environment as a Jupyter kernel:
   ```bash
   python -m ipykernel install --user --name=venv --display-name "Python (venv)"
   ```
6. Launch Jupyter Notebook:
   ```bash
   jupyter notebook
   ```
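To double-check that the kernel is using the venv’s interpreter rather than the system Python, a minimal check from inside a notebook is:
```python
# Run in a notebook cell with the "Python (venv)" kernel selected.
import sys

# The path should point inside your project's venv/ folder.
print(sys.executable)
```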
📦 Bonus: Save and Share Dependencies
At any point, save your environment to a file:
- Conda:
  ```bash
  conda env export > environment.yml
  ```
- venv/pip:
  ```bash
  pip freeze > requirements.txt
  ```
This makes it super easy to recreate the same environment later with:
```bash
conda env create -f environment.yml   # Conda
pip install -r requirements.txt       # venv/pip
```
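For reference, an exported `environment.yml` looks roughly like this; the name and package list below are illustrative, and a real `conda env export` pins exact versions and builds:
```yaml
# Illustrative sketch only; your exported file will pin exact versions.
name: myenv
channels:
  - defaults
dependencies:
  - python=3.11
  - numpy
  - seaborn
  - jupyter
  - ipykernel
```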
Git
Cloning your project with Git is a solid move—especially when you’re working with notebooks, dependencies, and scientific experimentation. Let’s talk best practices for setting up, cloning, and maintaining a clean, shareable, and reproducible data science project (like your NumPy + Seaborn + Jupyter one).
✅ Best Practices for Cloning and Managing a Jupyter-Based Data Science Project with Git
📁 1. Project Folder Structure
Organize your repo for clarity and scale. Here’s a clean baseline structure:
```
your-project/
│
├── notebooks/                # Jupyter notebooks (analysis, experiments)
│   ├── analysis.ipynb
│   └── viz.ipynb
│
├── data/                     # Raw or processed datasets (often .gitignored)
│   └── ...
│
├── environment.yml           # Conda env (or requirements.txt for pip/venv)
│
├── .gitignore
├── README.md
├── LICENSE                   # optional
└── setup.sh                  # optional setup script or setup_instructions.md
```
🧠 2. Cloning the Project
From any machine:
```bash
git clone https://github.com/your-username/your-project.git
cd your-project
```
🛠️ 3. Environment Setup (Best Practice)
As soon as you clone it, you should recreate the environment:
a) If using Conda:
```bash
conda env create -f environment.yml
conda activate your-env-name
jupyter notebook
```
b) If using `requirements.txt`:
```bash
python -m venv venv
source venv/bin/activate        # or venv\Scripts\activate on Windows
pip install -r requirements.txt
jupyter notebook
```
🧾 4. Git Best Practices for Jupyter Projects
✅ Track only what you should:
- Use `.gitignore` wisely. Example `.gitignore`:
  ```
  __pycache__/
  .ipynb_checkpoints/
  .DS_Store
  venv/
  env/
  data/
  *.pyc
  ```
- Avoid committing datasets or output files unless they’re small and necessary.
✅ Commit clear, atomic changes:
Avoid “big dump” commits. Instead, do things like:
```bash
git add notebooks/viz.ipynb
git commit -m "Add correlation heatmap using seaborn"
```
✅ Use branches if you’re experimenting:
```bash
git checkout -b try-different-plot-style
```
📚 5. Include a Good README.md
People (including future you!) will appreciate context. Your README should include:
````markdown
# My Data Project

This project explores [topic] using NumPy and Seaborn in Jupyter notebooks.

## Setup Instructions

1. Clone the repo:
   ```bash
   git clone https://github.com/your-username/your-project.git
   cd your-project
   ```
2. Create the environment:
   ```bash
   conda env create -f environment.yml
   conda activate myenv
   ```
3. Launch notebooks:
   ```bash
   jupyter notebook
   ```

## Notebooks

- `notebooks/analysis.ipynb`: Data exploration
- `notebooks/viz.ipynb`: Visualizations with Seaborn

## Requirements

- Python 3.11
- NumPy
- Seaborn
- Jupyter
````
🧪 Bonus: Track Notebook Diffs Like a Pro
By default, Git tracks notebooks as JSON blobs. To improve this:
- Use [nbdime](https://github.com/jupyter/nbdime):
  ```bash
  pip install nbdime
  nbdime config-git --enable
  ```
This gives you readable diffs from commands like `git diff notebooks/plot.ipynb`.
🏁 Final Thoughts
✅ Use virtual environments for reproducibility
✅ Keep data out of the repo unless it’s tiny
✅ Write helpful commits and a clear README
✅ Structure your repo like you’re building a product, not a playground
Google Colab
Working with Google Colab as a data scientist can be super powerful if you use it wisely. It’s more than just a free Jupyter notebook in the cloud — with the right techniques, it can feel almost like a full-blown data science IDE. Here are pro tips and hacks to level up your workflow:
🧠 1. Environment Setup Like a Pro
- Use a `requirements.txt`. At the top of your notebook, run:
  ```python
  !pip install -r requirements.txt
  ```
  This keeps your environment reproducible. Store this file in your repo.
- Use virtual environments (indirectly). While you can’t easily create Conda envs in Colab, you can simulate isolated environments using `pip install` and the `%pip` or `%conda` Jupyter magics (though `%conda` needs some tricks).
⚡ 2. Speed & Resource Boosting
- Get free GPU/TPU
  - Go to `Runtime > Change runtime type > Hardware accelerator > GPU/TPU`.
  - For deep learning or large matrix ops, even a Tesla T4 gives a huge boost.
- Reconnect to prevent timeouts. Paste this into your browser console to auto-click “Reconnect”:
  ```javascript
  function ClickConnect() {
    console.log("Auto reconnecting...");
    document.querySelector("colab-connect-button").shadowRoot.querySelector("#connect").click();
  }
  setInterval(ClickConnect, 60000);
  ```
🧩 3. Mounting Google Drive = Easy Data Access
- Store large datasets, models, or checkpoints on Google Drive:
  ```python
  from google.colab import drive
  drive.mount('/content/drive')
  ```
- Then reference files under the mount point with paths like:
  ```python
  data_path = "/content/drive/MyDrive/my_project/data.csv"
  ```
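Once Drive is mounted, files there behave like any local path. A minimal sketch of reading a CSV and plotting it (the file path and column names `x`/`y` are hypothetical):
```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical CSV on Drive with a header row and numeric columns "x" and "y".
data_path = "/content/drive/MyDrive/my_project/data.csv"
data = np.genfromtxt(data_path, delimiter=",", names=True)

# Quick scatter plot of the two columns.
sns.scatterplot(x=data["x"], y=data["y"])
plt.show()
```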
📊 4. Great Visualizations, Fast
- Use `%matplotlib inline` with `seaborn`, `plotly`, or `altair` for interactive/beautiful charts.
- If you’re doing ML, use `yellowbrick` or `mlxtend` for quick diagnostic visuals.
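For example, a Seaborn correlation heatmap takes only a few lines; here is a sketch with random NumPy data standing in for a real dataset:
```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Random stand-in data: 100 samples of 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# np.corrcoef expects variables as rows, hence the transpose.
corr = np.corrcoef(X.T)

sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap")
plt.show()
```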
💾 5. Persist Your Models & Data
- Save models to Drive:
  ```python
  import joblib
  joblib.dump(model, '/content/drive/MyDrive/model.pkl')
  ```
- Save plots/images:
  ```python
  plt.savefig('/content/drive/MyDrive/plot.png')
  ```
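Loading the model back in a later session (after mounting Drive again) is the mirror image:
```python
import joblib

# Reload the previously saved model from Drive.
model = joblib.load('/content/drive/MyDrive/model.pkl')
```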
🚀 6. Magics to Save Time
- `%time` / `%timeit` – Performance insights
- `%load_ext autoreload` – Auto-reloads modules
- `%debug` – Drops into the interactive debugger on exception
- `%who` – Lists variables in memory
- `%history` – Shows command history
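A quick illustration of the timing and autoreload magics in a notebook cell (a sketch; local modules like the `my_utils` script in section 9 are what autoreload helps with):
```python
import numpy as np

# Time a single expression repeatedly for a stable estimate.
%timeit np.linalg.inv(np.random.rand(100, 100))

# Automatically re-import local modules when their source changes.
%load_ext autoreload
%autoreload 2
```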
🤝 7. Collaboration Superpowers
- Comment & share: use the share link, just like Google Docs.
- Version control? Sync with GitHub:
  ```python
  !git clone https://github.com/yourname/yourrepo.git
  ```
- You can even push changes back:
  ```python
  !git add . && git commit -m "update" && git push
  ```
🛠 8. Install Anything, Do Anything
- Need external tools?
  ```python
  !apt install ffmpeg
  !wget https://some-url.com/file.zip
  ```
- You can even use `ngrok` or `flask` to expose a local web app:
  ```python
  !pip install flask-ngrok
  ```
📁 9. Organize Code With Scripts
Instead of cluttering your notebook:
- Store functions in `.py` scripts and import them:
  ```python
  from my_utils import preprocess_data
  ```
- Reimport updated files:
  ```python
  import importlib
  import my_utils

  importlib.reload(my_utils)
  ```
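Here `my_utils.py` is a hypothetical helper script sitting next to the notebook; something as small as this works:
```python
# my_utils.py -- hypothetical helper module kept next to the notebook.
import numpy as np

def preprocess_data(X: np.ndarray) -> np.ndarray:
    """Standardize features to zero mean and unit variance (illustrative)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```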
🔒 10. Security Warning: Don’t Share Tokens in Code
If you’re using APIs, don’t paste tokens directly into cells you might share or commit. Keep the key in an environment variable (one way to set it without hardcoding is sketched below) and read it in code:
```python
import os

key = os.environ['API_KEY']
```
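One simple way to get the key into the environment without it ever appearing in a saved cell is the standard-library `getpass` prompt (a minimal sketch):
```python
import os
from getpass import getpass

# Prompts interactively; the token is not stored in the notebook's cells.
os.environ['API_KEY'] = getpass('Enter your API key: ')
```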
⚙️ Bonus: Use Extensions!
Colab doesn’t support full Jupyter extensions, but there are hacks:
- Use custom CSS with JavaScript
- Or, for advanced notebooks, try JupyterLab on platforms like Kaggle Notebooks or Deepnote if Colab becomes limiting.
NumPy
Check this link to find some notebooks and learn more about NumPy.