This project implements an end-to-end regression model that predicts crop yields (measured in hectograms per hectare) from location, crop type, year, rainfall, pesticide usage, and average temperature. It includes a data preprocessing pipeline, model training scripts, and a Streamlit-based web interface for real-time predictions.
The predictive model is built using a Sequential Artificial Neural Network (ANN) implemented with TensorFlow/Keras. Below are the detailed specifications:
- Model Architecture:
  - Input Layer: Accepts 15 input features (after preprocessing and encoding).
  - Hidden Layer 1: 64 neurons with ReLU (Rectified Linear Unit) activation.
  - Hidden Layer 2: 32 neurons with ReLU activation.
  - Output Layer: 1 neuron (for regression) that predicts the numerical yield value.
- Compilation Details:
  - Optimizer: Adam.
  - Loss Function: Mean Absolute Error (MAE).
  - Metrics: Mean Absolute Error (MAE).
- Training Configuration:
  - Epochs: 100.
  - Callbacks: Early Stopping (patience of 15 epochs) to prevent overfitting, and TensorBoard for visualization.
  - Data Split: 80% training, 20% testing.
- Performance: The model achieved a validation MAE of approximately 17,074 hg/ha after 100 epochs.
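The specification above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual training script: the function names, the `logs` directory, and monitoring `val_loss` (the Keras default) are assumptions. The small helper also works out the number of trainable parameters implied by the 15 → 64 → 32 → 1 layout.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def build_model(n_features=15):
    """Build and compile a 15 -> 64 -> 32 -> 1 regression network."""
    # Imported here so dense_param_count() stays usable without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # linear output for regression
    ])
    # MAE serves as both the loss and the reported metric.
    model.compile(optimizer="adam", loss="mean_absolute_error", metrics=["mae"])
    return model


def make_callbacks(log_dir="logs"):
    """Early stopping (patience 15) plus TensorBoard; log_dir is an assumed path."""
    import tensorflow as tf

    return [
        tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=15),
        tf.keras.callbacks.TensorBoard(log_dir=log_dir),
    ]


# (15*64 + 64) + (64*32 + 32) + (32*1 + 1) trainable parameters
PARAMS = dense_param_count([15, 64, 32, 1])
```

Training would then look something like `build_model().fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, callbacks=make_callbacks())`, where the array names are hypothetical and using the 20% test split as validation data is an assumption.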
The model is trained on a dataset (yield.csv) containing the following features:
| Feature Name | Type | Description |
|---|---|---|
| Area | Categorical | Country or region name (e.g., India, Brazil). |
| Item | Categorical | Crop type (e.g., Maize, Rice, Wheat). |
| Year | Temporal | Year of observation. |
| Average Rainfall | Numerical | Annual rainfall in mm per year. |
| Pesticides | Numerical | Amount of pesticides used in tonnes. |
| Average Temp | Numerical | Average yearly temperature in °C. |
| hg/ha_yield | Target | Crop yield in hectograms per hectare. |
To prepare the data for the ANN, the following steps are performed:
- Encoding:
  - Item: One-Hot Encoded (transformed into 10 binary columns).
  - Area: Target Encoded (using `category_encoders.TargetEncoder` to handle high cardinality).
- Scaling: All numerical features are standardized using `StandardScaler` to ensure the neural network converges efficiently.
- Artifacts: The trained encoders and scalers are saved as `.pkl` files in the `artifacts/` folder for use during inference.
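To make the three transforms concrete, here is a from-scratch illustration of what each one computes. It is a simplified sketch on made-up data, not the project's pipeline: the project uses library encoders, and a library target encoder typically applies smoothing, which the plain per-category mean below omits. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def one_hot(value, categories):
    """Return a binary vector with a 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def fit_target_means(categories, targets):
    """Map each category to the mean target observed for it (no smoothing)."""
    grouped = {}
    for c, t in zip(categories, targets):
        grouped.setdefault(c, []).append(t)
    return {c: mean(ts) for c, ts in grouped.items()}


def standardize(values):
    """Rescale to zero mean and unit variance using the population std."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mu) / sigma for v in values]


ITEMS = ["Maize", "Rice", "Wheat"]     # toy list; the project encodes 10 crop types
areas = ["India", "India", "Brazil"]
yields = [20000.0, 30000.0, 40000.0]   # made-up hg/ha values
rainfall = [1000.0, 1200.0, 1800.0]    # made-up mm values

item_vector = one_hot("Rice", ITEMS)           # one binary column per crop
area_means = fit_target_means(areas, yields)   # one numeric column for Area
scaled_rain = standardize(rainfall)            # zero mean, unit variance
```

Target encoding keeps Area as a single numeric column regardless of how many countries appear, which is why it suits a high-cardinality feature better than one-hot encoding. With 10 one-hot Item columns, one Area column, and the four remaining features (year, rainfall, pesticides, temperature), the total matches the 15 inputs the network expects.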
- `app.py`: The Streamlit web application providing the user interface.
- `cleaning.ipynb`: Jupyter notebook detailing data cleaning, encoding, scaling, and model training.
- `predict.ipynb`: Notebook containing the logic for loading the model and making test predictions.
- `model.h5`: The saved trained ANN model.
- `artifacts/`: Contains `Age_encoder.pkl` (Area encoder), `onehot_encoder.pkl`, and `scaler.pkl`.
- `requirements.txt`: Lists all necessary Python libraries (TensorFlow, Streamlit, Pandas, etc.).
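At inference time the saved preprocessing artifacts have to be loaded alongside the model. The sketch below shows one way this might be done. The file names are taken from the list above, but the use of `pickle` (rather than, say, joblib), the `load_artifacts` helper, and the dictionary keys are assumptions for illustration.

```python
import pickle
from pathlib import Path

# File names as listed in this README; the dictionary keys are hypothetical.
ARTIFACT_FILES = {
    "area_encoder": "Age_encoder.pkl",
    "item_encoder": "onehot_encoder.pkl",
    "scaler": "scaler.pkl",
}


def load_artifacts(folder="artifacts"):
    """Load every preprocessing artifact, failing early if one is missing."""
    folder = Path(folder)
    missing = [name for name in ARTIFACT_FILES.values()
               if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing artifacts in {folder}: {', '.join(missing)}")

    loaded = {}
    for key, name in ARTIFACT_FILES.items():
        with open(folder / name, "rb") as fh:
            loaded[key] = pickle.load(fh)
    return loaded
```

Checking for all files up front gives a single clear error instead of a failure partway through a prediction. Unpickling executes code from the file, so only artifacts produced by this project should be loaded this way.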
- Install Dependencies: `pip install -r requirements.txt`
- Run the Web Application: `streamlit run app.py`
- Prediction: Enter the required crop and environmental details in the Streamlit UI to receive an instant yield prediction.