Python Basics
Includes virtual environments, Google Colab tips, and a link to NumPy tutorials
Virtual Environment
Setting up a virtual environment is a smart way to keep your project clean and reproducible—especially when you’re working with packages like NumPy and Seaborn in Jupyter notebooks. Here’s a step-by-step guide using both Conda and venv, so you can choose whichever suits your workflow best:
✅ Option 1: Using Conda (Recommended for Data Science projects)
Conda is ideal if you already use Anaconda or Miniconda. It handles scientific packages very well.
🔧 Step-by-Step
1. Navigate to your project folder in a terminal:
   ```bash
   cd path/to/your/project
   ```
2. Create a new conda environment (replace `myenv` with your preferred environment name):
   ```bash
   conda create -n myenv python=3.11
   ```
3. Activate the environment:
   ```bash
   conda activate myenv
   ```
4. Install packages:
   ```bash
   conda install numpy seaborn jupyter
   ```
5. (Optional but useful) Install `ipykernel` so Jupyter can use this environment:
   ```bash
   conda install ipykernel
   python -m ipykernel install --user --name=myenv --display-name "Python (myenv)"
   ```
6. Launch Jupyter:
   ```bash
   jupyter notebook
   ```
7. In the notebook interface, select the kernel via `Kernel > Change Kernel > Python (myenv)`.
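A quick sanity check that the notebook is really using the new environment (assuming the packages above installed cleanly):
```python
# Run in a notebook cell with the "Python (myenv)" kernel selected.
import numpy as np
import seaborn as sns

print(np.__version__, sns.__version__)
```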
✅ Option 2: Using venv + pip (more lightweight, universal)
If you’re not using Conda and want something lighter:
🔧 Step-by-Step
1. Navigate to your project folder:
   ```bash
   cd path/to/your/project
   ```
2. Create a virtual environment:
   ```bash
   python -m venv venv
   ```
3. Activate the environment:
   - On Windows:
     ```bash
     venv\Scripts\activate
     ```
   - On macOS/Linux:
     ```bash
     source venv/bin/activate
     ```
4. Install your packages:
   ```bash
   pip install numpy seaborn notebook ipykernel
   ```
5. Add the environment as a Jupyter kernel:
   ```bash
   python -m ipykernel install --user --name=venv --display-name "Python (venv)"
   ```
6. Launch Jupyter Notebook:
   ```bash
   jupyter notebook
   ```
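To double-check that the kernel is using the venv’s interpreter rather than the system Python, a minimal check from inside a notebook is:
```python
# Run in a notebook cell with the "Python (venv)" kernel selected.
import sys

# The path should point inside your project's venv/ folder.
print(sys.executable)
```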
📦 Bonus: Save and Share Dependencies
At any point, save your environment to a file:
- Conda:
  ```bash
  conda env export > environment.yml
  ```
- venv/pip:
  ```bash
  pip freeze > requirements.txt
  ```
This makes it super easy to recreate the same environment later with:
```bash
conda env create -f environment.yml   # Conda
pip install -r requirements.txt       # venv/pip
```
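For reference, an exported `environment.yml` looks roughly like this; the name and package list below are illustrative, and a real `conda env export` pins exact versions and builds:
```yaml
# Illustrative sketch only; your exported file will pin exact versions.
name: myenv
channels:
  - defaults
dependencies:
  - python=3.11
  - numpy
  - seaborn
  - jupyter
  - ipykernel
```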
Git
Cloning your project with Git is a solid move—especially when you’re working with notebooks, dependencies, and scientific experimentation. Let’s talk best practices for setting up, cloning, and maintaining a clean, shareable, and reproducible data science project (like your NumPy + Seaborn + Jupyter one).
✅ Best Practices for Cloning and Managing a Jupyter-Based Data Science Project with Git
📁 1. Project Folder Structure
Organize your repo for clarity and scale. Here’s a clean baseline structure:
```
your-project/
│
├── notebooks/                # Jupyter notebooks (analysis, experiments)
│   ├── analysis.ipynb
│   └── viz.ipynb
│
├── data/                     # Raw or processed datasets (often .gitignored)
│   └── ...
│
├── environment.yml           # Conda env (or requirements.txt for pip/venv)
│
├── .gitignore
├── README.md
├── LICENSE                   # optional
└── setup.sh                  # optional setup script or setup_instructions.md
```
🧠 2. Cloning the Project
From any machine:
```bash
git clone https://github.com/your-username/your-project.git
cd your-project
```
🛠️ 3. Environment Setup (Best Practice)
As soon as you clone it, you should recreate the environment:
a) If using Conda:
```bash
conda env create -f environment.yml
conda activate your-env-name
jupyter notebook
```
b) If using `requirements.txt`:
```bash
python -m venv venv
source venv/bin/activate        # or venv\Scripts\activate on Windows
pip install -r requirements.txt
jupyter notebook
```
🧾 4. Git Best Practices for Jupyter Projects
✅ Track only what you should:
- Use `.gitignore` wisely. Example `.gitignore`:
  ```
  __pycache__/
  .ipynb_checkpoints/
  .DS_Store
  venv/
  env/
  data/
  *.pyc
  ```
- Avoid committing datasets or output files unless they’re small and necessary.
✅ Commit clear, atomic changes:
Avoid “big dump” commits. Instead, do things like:
```bash
git add notebooks/viz.ipynb
git commit -m "Add correlation heatmap using seaborn"
```
✅ Use branches if you’re experimenting:
```bash
git checkout -b try-different-plot-style
```
📚 5. Include a Good README.md
People (including future you!) will appreciate context. Your README should include:
````markdown
# My Data Project

This project explores [topic] using NumPy and Seaborn in Jupyter notebooks.

## Setup Instructions

1. Clone the repo:
   ```bash
   git clone https://github.com/your-username/your-project.git
   cd your-project
   ```
2. Create the environment:
   ```bash
   conda env create -f environment.yml
   conda activate myenv
   ```
3. Launch notebooks:
   ```bash
   jupyter notebook
   ```

## Notebooks

- `notebooks/analysis.ipynb`: Data exploration
- `notebooks/viz.ipynb`: Visualizations with Seaborn

## Requirements

- Python 3.11
- NumPy
- Seaborn
- Jupyter
````
🧪 Bonus: Track Notebook Diffs Like a Pro
By default, Git tracks notebooks as JSON blobs. To improve this:
- Use [nbdime](https://github.com/jupyter/nbdime):
  ```bash
  pip install nbdime
  nbdime config-git --enable
  ```
This gives you readable diffs from commands like `git diff notebooks/plot.ipynb`.
🏁 Final Thoughts
✅ Use virtual environments for reproducibility
✅ Keep data out of the repo unless it’s tiny
✅ Write helpful commits and a clear README
✅ Structure your repo like you’re building a product, not a playground
Google Colab
Working with Google Colab as a data scientist can be super powerful if you use it wisely. It’s more than just a free Jupyter notebook in the cloud — with the right techniques, it can feel almost like a full-blown data science IDE. Here are pro tips and hacks to level up your workflow:
🧠 1. Environment Setup Like a Pro
- Use a `requirements.txt`. At the top of your notebook, run:
  ```python
  !pip install -r requirements.txt
  ```
  This keeps your environment reproducible. Store this file in your repo.
- Use virtual environments (indirectly). While you can’t easily create Conda envs in Colab, you can simulate isolated environments using `pip install` and the `%pip` or `%conda` Jupyter magics (though `%conda` needs some tricks).
⚡ 2. Speed & Resource Boosting
- Get free GPU/TPU
  - Go to `Runtime > Change runtime type > Hardware accelerator > GPU/TPU`.
  - For deep learning or large matrix ops, even a Tesla T4 gives a huge boost.
- Reconnect to prevent timeouts. Paste this into your browser console to auto-click “Reconnect”:
  ```javascript
  function ClickConnect() {
    console.log("Auto reconnecting...");
    document.querySelector("colab-connect-button").shadowRoot.querySelector("#connect").click();
  }
  setInterval(ClickConnect, 60000);
  ```
🧩 3. Mounting Google Drive = Easy Data Access
- Store large datasets, models, or checkpoints on Google Drive:
  ```python
  from google.colab import drive
  drive.mount('/content/drive')
  ```
- Then reference files under the mount point with paths like:
  ```python
  data_path = "/content/drive/MyDrive/my_project/data.csv"
  ```
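Once Drive is mounted, files there behave like any local path. A minimal sketch of reading a CSV and plotting it (the file path and column names `x`/`y` are hypothetical):
```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical CSV on Drive with a header row and numeric columns "x" and "y".
data_path = "/content/drive/MyDrive/my_project/data.csv"
data = np.genfromtxt(data_path, delimiter=",", names=True)

# Quick scatter plot of the two columns.
sns.scatterplot(x=data["x"], y=data["y"])
plt.show()
```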
📊 4. Great Visualizations, Fast
- Use `%matplotlib inline` with `seaborn`, `plotly`, or `altair` for interactive/beautiful charts.
- If you’re doing ML, use `yellowbrick` or `mlxtend` for quick diagnostic visuals.
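For example, a Seaborn correlation heatmap takes only a few lines; here is a sketch with random NumPy data standing in for a real dataset:
```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Random stand-in data: 100 samples of 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# np.corrcoef expects variables as rows, hence the transpose.
corr = np.corrcoef(X.T)

sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap")
plt.show()
```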
💾 5. Persist Your Models & Data
- Save models to Drive:
  ```python
  import joblib
  joblib.dump(model, '/content/drive/MyDrive/model.pkl')
  ```
- Save plots/images:
  ```python
  plt.savefig('/content/drive/MyDrive/plot.png')
  ```
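Loading the model back in a later session (after mounting Drive again) is the mirror image:
```python
import joblib

# Reload the previously saved model from Drive.
model = joblib.load('/content/drive/MyDrive/model.pkl')
```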
🚀 6. Magics to Save Time
- `%time` / `%timeit` – Performance insights
- `%load_ext autoreload` – Auto-reloads modules
- `%debug` – Drops into the interactive debugger on exception
- `%who` – Lists variables in memory
- `%history` – Shows command history
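A quick illustration of the timing and autoreload magics in a notebook cell (a sketch; local modules like the `my_utils` script in section 9 are what autoreload helps with):
```python
import numpy as np

# Time a single expression repeatedly for a stable estimate.
%timeit np.linalg.inv(np.random.rand(100, 100))

# Automatically re-import local modules when their source changes.
%load_ext autoreload
%autoreload 2
```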
🤝 7. Collaboration Superpowers
- Comment & share: use the share link, just like Google Docs.
- Version control? Sync with GitHub:
  ```python
  !git clone https://github.com/yourname/yourrepo.git
  ```
- You can even push changes back:
  ```python
  !git add . && git commit -m "update" && git push
  ```
🛠 8. Install Anything, Do Anything
- Need external tools?
  ```python
  !apt install ffmpeg
  !wget https://some-url.com/file.zip
  ```
- You can even use `ngrok` or `flask` to expose a local web app:
  ```python
  !pip install flask-ngrok
  ```
📁 9. Organize Code With Scripts
Instead of cluttering your notebook:
- Store functions in `.py` scripts and import them:
  ```python
  from my_utils import preprocess_data
  ```
- Reimport updated files:
  ```python
  import importlib
  import my_utils

  importlib.reload(my_utils)
  ```
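Here `my_utils.py` is a hypothetical helper script sitting next to the notebook; something as small as this works:
```python
# my_utils.py -- hypothetical helper module kept next to the notebook.
import numpy as np

def preprocess_data(X: np.ndarray) -> np.ndarray:
    """Standardize features to zero mean and unit variance (illustrative)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```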
🔒 10. Security Warning: Don’t Share Tokens in Code
If you’re using APIs, don’t paste tokens directly into cells you might share or commit. Keep the key in an environment variable (one way to set it without hardcoding is sketched below) and read it in code:
```python
import os

key = os.environ['API_KEY']
```
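One simple way to get the key into the environment without it ever appearing in a saved cell is the standard-library `getpass` prompt (a minimal sketch):
```python
import os
from getpass import getpass

# Prompts interactively; the token is not stored in the notebook's cells.
os.environ['API_KEY'] = getpass('Enter your API key: ')
```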
⚙️ Bonus: Use Extensions!
Colab doesn’t support full Jupyter extensions, but there are hacks:
- Use custom CSS with JavaScript
- Or, for advanced notebooks, try JupyterLab on platforms like Kaggle Notebooks or Deepnote if Colab becomes limiting.
NumPy
Check this link to find some notebooks and learn more about NumPy.