📈 Data Science Repertoire

This repository collects implementations of data science and machine learning algorithms and the theory behind them. It is divided into parts numbered 0 to 6, from the basics of Python to implementing deep learning concepts in Python using PySpark. It will be updated as needed. Feel free to star it if you find it useful! ⭐

📂 Repository Structure

0) Introduction to Programming in DS

1. Python Basics

Basics and Crumbs: Covers the very basics of Python: fundamental and advanced data types, variables, loops, if and else statements, break and continue statements, and the range() function.
Strings and List Operations: Covers built-in string and list methods, slicing and indexing, nested lists, and other string and list operations.
Functions and Containers: Continues with operations on advanced data types such as tuples, dictionaries, and sets, and covers creating functions in Python.

2. Python Advanced

OOPs and APIs: Covers object-oriented programming in Python, working with APIs and retrieving text and responses from a website, and exception handling with try and except blocks.
Pandas: Covers the basics of pandas, subsetting and modifying data, sorting and aggregating data, and working with datetime data.
NumPy: Covers the basic terminology of the NumPy library, creating arrays with different methods, indexing and concatenation, conditional selection, and broadcasting.
Matplotlib and Seaborn: Covers the basic tools in Matplotlib and Seaborn for data visualization in Python, such as line charts, bar graphs, histograms, and scatter and density plots for continuous and discrete variables.
SciPy: Covers basic mathematical tools used in machine learning and data science, such as the derivative of a function, permutations and combinations, and the determinant and inverse of a matrix.
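
As a rough illustration of the mathematics listed under SciPy above, here is a minimal sketch that uses only Python's standard library instead of SciPy itself (an assumption made so it runs without extra installs). The helper names `derivative`, `det2`, and `inv2` are hypothetical and are not taken from the notebooks.

```python
import math

def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c

def inv2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

slope = derivative(lambda x: x ** 2, 3.0)  # slope of x^2 at x = 3
n_perm = math.perm(5, 2)                   # ordered selections of 2 items from 5
n_comb = math.comb(5, 2)                   # unordered selections of 2 items from 5
inverse = inv2([[4, 7], [2, 6]])           # hypothetical example matrix
```

The central difference is only an approximation, so the step size `h` trades truncation error against floating-point rounding.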

1) Mathematics and Statistics

Descriptive Statistics: Covers measures of central tendency such as mean, median, and mode, and measures of spread and position such as variance, standard deviation, quartiles, and percentiles.
Inferential Statistics: Covers the basics of probability, including marginal, joint, and conditional probability, along with creating frequency tables and drawing inferences from them.
Probability Distributions: Includes code for discrete distributions such as binomial, Bernoulli, and Poisson, and for continuous distributions such as normal, uniform, and exponential.
Hypothesis Testing: Demonstrates one-sample tests (Z-test and T-test), two-sample tests (Z-test and T-test), and the Chi-Square test in detail, defining the null and alternative hypotheses and using specific Python modules and libraries.
Random and Stratified Sampling: Includes examples of random sampling and stratified sampling in Python on real data.
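
To make the hypothesis-testing entry above concrete, the following is a minimal sketch of a two-sided one-sample Z-test using only the standard library. The sample values, the hypothesised mean `mu0`, and the known standard deviation `sigma` are made up for illustration; the notebooks may well use other libraries and data.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample Z-test with a known population std dev sigma."""
    n = len(sample)
    # Standardise the gap between the sample mean and the hypothesised mean.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least this extreme under the null hypothesis.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

sample = [52, 48, 55, 51, 49, 53, 54, 50, 52, 56]  # made-up measurements
z, p = one_sample_z_test(sample, mu0=50, sigma=3)
reject_null = p < 0.05  # null hypothesis: the population mean equals mu0
```

The Z-test assumes the population standard deviation is known; when it has to be estimated from the sample, a T-test is the usual choice.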

2) Exploratory Data Analysis

EDA Basics: Covers basic EDA terminology, descriptive statistics such as mean, median, mode, variance, min, and max, the central limit theorem, and discrete and continuous distributions, all on a real-world dataset.
Univariate Analysis: Covers univariate analysis of both numerical and categorical variables using data visualization, and shows how to draw inferences from the plots.
Bivariate Analysis: Covers bivariate analysis across combinations of numerical and categorical variables, and shows how to draw inferences from the plots. Different graphs are used for different types of analysis.
Multivariate Analysis: Covers multivariate analysis of different attributes using pivot tables, grouped box plots, and pair plots.

3) Data Preprocessing

Missing Values: Demonstrates how to handle missing values: deleting rows, imputing with the mean, median, or mode, or predicting the missing values.
Categorical Encoding: Gives an example of label and one-hot encoding on a real-world dataset, and shows how to deal with categorical variables in general.
Outlier Treatment: Shows how to treat outliers using the interquartile range, outlier removal, or variable transformation.
Validation: Gives an example of hold-out validation and k-fold cross-validation.
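
Two of the techniques above, interquartile-range outlier detection and k-fold splitting, can be sketched in plain Python as follows. This is an illustrative sketch only: the function names, the conventional 1.5 × IQR fence, and the sample values are assumptions, and the folds are contiguous with no shuffling.

```python
from statistics import quantiles

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size

outliers = iqr_outliers([10, 12, 11, 13, 12, 14, 11, 95])  # made-up data
folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one validation fold, so every observation is used for validation once and for training in the remaining folds.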

4) Feature Engineering

Basics of Feature Engineering: Provides notebooks on the fundamentals of feature engineering, including feature transformation, feature scaling, binning, feature interaction, handling date-time features, frequency and mean encoding, and feature tools.
Text Feature Engineering: Covers the basics of regular expressions (regex), built-in methods, sequences and characters, with examples of extracting data from a given text or string.
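
As a small illustration of the regex-based data extraction mentioned above, the sketch below pulls order numbers, dates, and an email address out of a string with Python's `re` module. The sample text and the patterns are made up for this example, and the email pattern is deliberately simplified rather than a full address validator.

```python
import re

# Made-up text to extract features from.
text = "Order #1234 shipped on 2023-08-15 to jane.doe@example.com; order #5678 is pending."

# Digits that follow a '#' sign (the capture group returns the digits only).
order_ids = re.findall(r"#(\d+)", text)

# ISO-style dates of the form YYYY-MM-DD.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# A simplified email pattern: local part, '@', domain, dot, suffix.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
```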

5) Machine Learning

Scikit-Learn: Teaches the basics of Scikit-Learn and how to import different machine learning algorithms for regression and classification.
Basics of Machine Learning: Covers benchmark classification, benchmark regression, and cost functions, each demonstrated on a sample dataset.
Regression and Classification: Covers linear models such as linear regression, regularized models such as Ridge and Lasso regression, the OLS method, and classification models such as logistic regression, with an example of using SMOTE to handle class imbalance.
Support Vector Machines: Demonstrates several SVM kernels (linear, default, polynomial, and Gaussian) on the iris dataset, with contour plots of the results.
Decision Trees and Random Forests: Covers regression and classification with decision trees, random forests, gradient boosting, XGBoost, CatBoost, and light gradient boosting machines (LightGBM). It also showcases ensemble techniques such as max voting and averaging.
Naive Bayes Algorithm: Teaches the basics of the naive Bayes algorithm and its variants: Gaussian, Multinomial, and Bernoulli naive Bayes.
Principal Component Analysis: Showcases an example of a classification model with and without PCA, and how dimensionality reduction can help improve the performance of the model.
Clustering: Showcases two clustering techniques: K-means clustering, which uses Euclidean distance and inertia, and hierarchical (agglomerative) clustering, which uses a dendrogram to group similar data points into clusters with shared properties.
K-nearest neighbours: Shows how KNN works on a real-world dataset, how to find the K nearest neighbours, and how to avoid overfitting and underfitting with the KNN algorithm.
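
The K-nearest neighbours idea described above can be sketched from scratch in a few lines. This is a minimal illustration under stated assumptions, not the notebook's implementation: the function name `knn_predict`, the toy points, and the labels are all made up, and Euclidean distance is assumed as the metric.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the labelled points by Euclidean distance to the query and keep k.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two made-up, well-separated groups of 2D points.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(train_X, train_y, (1.1, 1.0))
pred_near_b = knn_predict(train_X, train_y, (5.1, 5.0))
```

The choice of `k` drives the bias-variance trade-off: a very small `k` tends to follow noise in the training data, while a very large `k` smooths over real class boundaries.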

6) Deep Learning

PyTorch, TensorFlow and Keras: Gives a concise overview of PyTorch, TensorFlow, and Keras and how they can be used for deep learning in Python.
Neural Network: Provides an introduction to neural networks and concepts such as forward and backward propagation, along with an end-to-end example on a real-world dataset.
Optimizers in NN: Showcases three optimizers for neural networks, Adam, SGD with momentum, and RMSProp, along with their implementations from scratch.
Digital Image Processing:
- Playing with images: Consists of basic image preprocessing, such as reading and loading images, image operations, and converting images into different formats.
- Bit Plane Slicing: Showcases an example of bit plane slicing and of enhancing an image with the contrast stretching technique.
- Morphological Image Processing: Covers morphological techniques such as erosion, dilation, closing, opening, and morphological gradient, along with various thresholding techniques for image segmentation.
- Edge detection: Covers operators for detecting edges in an image, such as Prewitt, Roberts, and Sobel, along with Harris corner detection and advanced thresholding techniques such as global and adaptive thresholding.
- Image Segmentation: Covers segmentation techniques such as K-means segmentation and edge- and region-based segmentation, along with Histogram of Oriented Gradients (HOG) features, the Hough transform for line detection, and image reconstruction with autoencoders.
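
To illustrate the bit plane slicing entry in the list above, here is a minimal sketch on a tiny 8-bit grayscale image stored as nested lists. The function names and pixel values are made up for illustration; the notebooks most likely work on real images with an image-processing library.

```python
def bit_plane(image, bit):
    """Return the binary plane for `bit` (0 = least significant) of an 8-bit image."""
    return [[(pixel >> bit) & 1 for pixel in row] for row in image]

def reconstruct(planes):
    """Rebuild an image from a {bit: plane} mapping, weighting each plane by 2**bit."""
    bits = list(planes)
    first = planes[bits[0]]
    return [
        [sum(planes[b][r][c] << b for b in bits) for c in range(len(first[0]))]
        for r in range(len(first))
    ]

image = [[200, 13], [127, 255]]  # made-up 2x2 grayscale image
planes = {b: bit_plane(image, b) for b in range(8)}
msb_plane = planes[7]  # the most significant plane carries the coarse structure

# Keeping only the top four planes gives a coarser approximation of the image.
approx = reconstruct({b: planes[b] for b in (7, 6, 5, 4)})
```

Combining all eight planes reproduces the original image exactly, while dropping the lower planes discards fine intensity detail.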
