Python Basics
Includes virtual environments, Git, Google Colab, and a link to NumPy tutorials
Virtual Environment
Setting up a virtual environment is a smart way to keep your project clean and reproducible—especially when you’re working with packages like NumPy and Seaborn in Jupyter notebooks. Here’s a step-by-step guide using both Conda and venv, so you can choose whichever suits your workflow best:
✅ Option 1: Using Conda (Recommended for Data Science projects)
Conda is ideal if you already use Anaconda or Miniconda. It handles scientific packages very well.
🔧 Step-by-Step
- Navigate to your project folder in the terminal: cd path/to/your/project
- Create a new conda environment: conda create -n myenv python=3.11 (replace myenv with your preferred environment name).
- Activate the environment: conda activate myenv
- Install packages: conda install numpy seaborn jupyter
- (Optional but useful) Install ipykernel so Jupyter can use this environment: run conda install ipykernel, then python -m ipykernel install --user --name=myenv --display-name "Python (myenv)"
- Launch Jupyter: jupyter notebook
- In the notebook interface, select the kernel: Kernel > Change Kernel > Python (myenv)
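Once the kernel is selected, a quick check from inside the notebook can confirm it is really running in the new environment. This is only a minimal sketch: missing_packages is a hypothetical helper name, and the package list simply mirrors the install command above.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter path should point inside the environment you activated.
print("Interpreter:", sys.executable)
print("Missing:", missing_packages(["numpy", "seaborn", "ipykernel"]))
```

An empty "Missing" list plus an interpreter path that contains your environment name suggests the kernel is wired up correctly.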
✅ Option 2: Using venv + pip (More lightweight, universal)
If you’re not using Conda and want something lighter:
🔧 Step-by-Step
- Navigate to your project folder: cd path/to/your/project
- Create a virtual environment: python -m venv venv
- Activate the environment:
  - On Windows: venv\Scripts\activate
  - On macOS/Linux: source venv/bin/activate
- Install your packages: pip install numpy seaborn notebook ipykernel
- Add the environment as a Jupyter kernel: python -m ipykernel install --user --name=venv --display-name "Python (venv)"
- Launch Jupyter Notebook: jupyter notebook
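To check whether a notebook or script is actually running inside the venv, you can compare two interpreter prefixes. This is a small sketch and in_virtualenv is a hypothetical helper name. It relies on how venv-style environments set sys.prefix, so it will not detect a Conda environment.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Running in a virtual environment:", in_virtualenv())
print("Environment root:", sys.prefix)
```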
📦 Bonus: Save and Share Dependencies
At any point, save your environment to a file:
- Conda: conda env export > environment.yml
- venv/pip: pip freeze > requirements.txt
This makes it super easy to recreate the same environment later with:
- conda env create -f environment.yml
- pip install -r requirements.txt
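If you want to see whether the current environment still matches a saved requirements.txt, something like the following can help. It is a rough sketch that assumes the file holds exact name==version pins, as pip freeze writes them. The helper names parse_pins and find_mismatches and the sample version numbers are illustrative only.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Sample text standing in for the contents of a real requirements.txt.
sample = "numpy==1.26.4\n# a comment\nseaborn==0.13.2\n"
print(find_mismatches(parse_pins(sample)))
```

An empty result means every pinned package is installed at the pinned version.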
Git
Cloning your project with Git is a solid move—especially when you’re working with notebooks, dependencies, and scientific experimentation. Let’s talk best practices for setting up, cloning, and maintaining a clean, shareable, and reproducible data science project (like your NumPy + Seaborn + Jupyter one).
✅ Best Practices for Cloning and Managing a Jupyter-Based Data Science Project with Git
📁 1. Project Folder Structure
Organize your repo for clarity and scale. Here’s a clean baseline structure:
your-project/
│
├── notebooks/          # Jupyter notebooks (analysis, experiments)
│   ├── analysis.ipynb
│   └── viz.ipynb
│
├── data/               # Raw or processed datasets (often .gitignored)
│   └── ...
│
├── env/ or environment.yml  # Conda env (or requirements.txt for pip/venv)
│
├── .gitignore
├── README.md
├── LICENSE (optional)
└── setup.sh or setup_instructions.md (optional script/instructions to set up env)
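If you start new projects often, the baseline layout can be created with a few lines of Python. This is a sketch under simple assumptions: scaffold_project is a hypothetical helper, it creates only the folders and empty placeholder files from the tree above, and the demo writes into a temporary directory so nothing lands in your working folder.

```python
import tempfile
from pathlib import Path


def scaffold_project(root):
    """Create the baseline folders and empty placeholder files under root."""
    root = Path(root)
    for folder in ("notebooks", "data"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    for filename in (".gitignore", "README.md", "environment.yml"):
        # touch() creates the file if missing and keeps existing content.
        (root / filename).touch(exist_ok=True)
    return sorted(p.name for p in root.iterdir())


# Demo in a throwaway directory; pass a real path to use it for a project.
with tempfile.TemporaryDirectory() as tmp:
    print(scaffold_project(Path(tmp) / "your-project"))
```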
🧠 2. Cloning the Project
From any machine:
git clone https://github.com/your-username/your-project.git
cd your-project
🛠️ 3. Environment Setup (Best Practice)
As soon as you clone it, you should recreate the environment:
a) If using Conda:
conda env create -f environment.yml
conda activate your-env-name
jupyter notebook
b) If using requirements.txt:
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
jupyter notebook
🧾 4. Git Best Practices for Jupyter Projects
✅ Track only what you should:
- Use .gitignore wisely. Example .gitignore:
  __pycache__/
  .ipynb_checkpoints/
  .DS_Store
  venv/
  env/
  data/
  *.pyc
- Avoid committing datasets or output files unless small and necessary.
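Notebook cell outputs are one kind of output that easily sneaks into commits, because they are stored inside the .ipynb file itself. The sketch below clears them before committing. It assumes the nbformat 4 JSON layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"). strip_outputs is a hypothetical helper, and the demo uses a tiny in-memory notebook rather than a real file.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts from a notebook dict in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny in-memory notebook, used here instead of a real .ipynb file.
demo = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                    "execution_count": 3,
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(json.dumps(strip_outputs(demo)["cells"][1], indent=1))
```

For a real notebook you would json.load the file, pass the dict through the function, and json.dump it back before running git add.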
✅ Commit clear, atomic changes:
Avoid “big dump” commits. Instead, do things like:
git add notebooks/viz.ipynb
git commit -m "Add correlation heatmap using seaborn"
✅ Use branches if you’re experimenting:
git checkout -b try-different-plot-style
📚 5. Include a Good README.md
People (including future you!) will appreciate context. Your README should include:
# My Data Project
 
This project explores [topic] using NumPy and Seaborn in Jupyter notebooks.
 
## Setup Instructions
 
1. Clone the repo:
   ```bash
   git clone https://github.com/your-username/your-project.git
   cd your-project
   ```
2. Create the environment: conda env create -f environment.yml, then conda activate myenv
3. Launch notebooks: jupyter notebook

## Notebooks
- notebooks/analysis.ipynb: Data exploration
- notebooks/viz.ipynb: Visualizations with seaborn

## Requirements
- Python 3.11
- NumPy
- Seaborn
- Jupyter
🧪 Bonus: Track Notebook Diffs Like a Pro
By default, Git treats notebooks as raw JSON, so diffs are hard to read. To improve this:
- Use [nbdime](https://github.com/jupyter/nbdime):
  ```bash
  pip install nbdime
  nbdime config-git --enable
  ```
With this enabled, a command like git diff notebooks/plot.ipynb shows a readable, cell-level diff.
🏁 Final Thoughts
- ✅ Use virtual environments for reproducibility
- ✅ Keep data out of the repo, unless it’s tiny
- ✅ Write helpful commits and README
- ✅ Structure your repo like you’re building a product, not a playground
Google Colab
Working with Google Colab as a data scientist can be super powerful if you use it wisely. It’s more than just a free Jupyter notebook in the cloud — with the right techniques, it can feel almost like a full-blown data science IDE. Here are pro tips and hacks to level up your workflow:
🧠 1. Environment Setup Like a Pro
- Use a requirements.txt: at the top of your notebook, run !pip install -r requirements.txt. This keeps your environment reproducible. Store this file in your repo.
- Use virtual environments (indirectly): you can’t easily create Conda envs in Colab, but you can install exactly what you need into the runtime with !pip install or the %pip magic. The %conda magic needs extra setup before it works.
⚡ 2. Speed & Resource Boosting
- Get a free GPU/TPU:
  - Go to Runtime > Change runtime type > Hardware accelerator > GPU/TPU.
  - For deep learning or large matrix ops, even a Tesla T4 gives a huge boost.
- Reconnect to prevent timeouts: use this in your browser console to auto-click “Reconnect”:
  function ClickConnect() {
    console.log("Auto reconnecting...");
    document.querySelector("colab-connect-button").shadowRoot.querySelector("#connect").click();
  }
  setInterval(ClickConnect, 60000);
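After switching the runtime type, it can be worth confirming from Python that a GPU is really attached. The sketch below is one simple way, assuming an NVIDIA GPU whose driver provides the nvidia-smi command. gpu_available is a hypothetical helper, and it does not detect TPUs.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU runtime detected:", gpu_available())
```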
🧩 3. Mounting Google Drive = Easy Data Access
- Store large datasets, models, or checkpoints on Google Drive and mount it:
  from google.colab import drive
  drive.mount('/content/drive')
- Refer to files by their path under the mount point, for example: data_path = "/content/drive/MyDrive/my_project/data.csv"
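If the same notebook should also run outside Colab, it helps to build data paths in one place. The following is a sketch under assumptions: in_colab and project_path are hypothetical helpers, Drive is taken to be mounted at /content/drive, and the local fallback folder is a placeholder. Note that it only checks whether the google.colab package is importable, not whether Drive has been mounted.

```python
from pathlib import Path

# Mount location used in the steps above.
DRIVE_ROOT = Path("/content/drive/MyDrive")


def in_colab():
    """True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def project_path(*parts, local_root="."):
    """Build a path under Drive in Colab, or under local_root elsewhere."""
    base = DRIVE_ROOT if in_colab() else Path(local_root)
    return base.joinpath(*parts)


data_path = project_path("my_project", "data.csv")
print(data_path)
```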
📊 4. Great Visualizations, Fast
- Use %matplotlib inline, seaborn, plotly, or altair for interactive/beautiful charts.
- If you’re doing ML, use yellowbrick or mlxtend for quick diagnostic visuals.
💾 5. Persist Your Models & Data
- Save models to Drive:
  import joblib
  joblib.dump(model, '/content/drive/MyDrive/model.pkl')
- Save plots/images: plt.savefig('/content/drive/MyDrive/plot.png')
🚀 6. Magics to Save Time
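- %time / %timeit – Performance insights
- %load_ext autoreload – Loads the autoreload extension; follow it with %autoreload 2 to reload modules automatically before each cell runs
- %debug – Drops into an interactive debugger after an exception
- %who – Lists variables in memory
- %history – Shows command history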
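The magics only work inside a notebook or IPython. In a plain .py script, the standard library timeit module gives timings similar in spirit to %timeit. This is a minimal sketch: best_time is a hypothetical helper and the two timed statements are arbitrary examples.

```python
import timeit


def best_time(stmt, setup="pass", number=1000, repeat=5):
    """Return the best per-call time in seconds over several runs."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


comprehension = best_time("[x * x for x in range(1000)]")
mapped = best_time("list(map(lambda x: x * x, range(1000)))")
print(f"list comprehension: {comprehension * 1e6:.1f} µs per call")
print(f"map + lambda:       {mapped * 1e6:.1f} µs per call")
```

Taking the minimum of several repeats is the usual way to reduce noise from other activity on the machine.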
🤝 7. Collaboration Superpowers
- Comment & Share: Use the share link like Google Docs
- Version control? Sync with GitHub: !git clone https://github.com/yourname/yourrepo.git
- You can even push back changes: !git add . && git commit -m "update" && git push
🛠 8. Install Anything, Do Anything
- Need external tools?
  !apt install ffmpeg
  !wget https://some-url.com/file.zip
- Even use ngrok or flask to expose a local web app: !pip install flask-ngrok
📁 9. Organize Code With Scripts
- Instead of cluttering your notebook, store functions in .py scripts and import them: from my_utils import preprocess_data
- Reimport updated files (reload needs the module object, so import the module itself first):
  import importlib
  import my_utils
  importlib.reload(my_utils)
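To see what reloading actually does, here is a self-contained sketch. It writes a throwaway module (my_utils_demo, a made-up name) to a temporary folder, imports it, edits the file, and reloads it. Bytecode caching is switched off only to keep the demo predictable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temp folder so the example is self-contained.
sys.dont_write_bytecode = True  # avoid cached bytecode in this demo
workdir = Path(tempfile.mkdtemp())
module_file = workdir / "my_utils_demo.py"
module_file.write_text("def preprocess_data(x):\n    return x + 1\n")
sys.path.insert(0, str(workdir))

import my_utils_demo

print(my_utils_demo.preprocess_data(1))  # uses the first version

# Simulate editing the script, then reload the module object itself.
module_file.write_text("def preprocess_data(x):\n    return x * 100\n")
importlib.invalidate_caches()
importlib.reload(my_utils_demo)

print(my_utils_demo.preprocess_data(1))  # uses the edited version
```

One caveat: a name bound earlier with from my_utils_demo import preprocess_data keeps pointing at the old function after a reload, so that import line has to be run again.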
🔒 10. Security Warning: Don’t Share Tokens in Code
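If you’re using APIs, don’t paste the key into a cell. Prompt for it at runtime so it never appears in the saved notebook:
import os
from getpass import getpass
os.environ['API_KEY'] = getpass('API key: ')
Then access it in code:
key = os.environ['API_KEY']
⚙️ Bonus: Use Extensions!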
Colab doesn’t support full Jupyter extensions, but there are hacks:
- Use custom CSS with JavaScript
- Or, for advanced notebooks, try JupyterLab on platforms like Kaggle Notebooks or Deepnote if Colab becomes limiting.
NumPy
Check this link for notebooks that teach more about NumPy.