Automated classification of good and defective beans. A Python + TensorFlow/Keras pipeline for segmentation, feature extraction, and classification with a dense neural network.
- João Pedro Rocha Senna
- Thiago Tanaka Peczek
FeijõesAI - FIA – Dense neural network for binary classification (bad vs. good).
- Clone the repository
  - `git clone https://github.com/Arg0n4ut4/IAFIA.git`
  - `cd IAFIA`
- Create and activate a virtual environment
  - `python -m venv venv`
  - `source venv/bin/activate` (Linux/macOS)
  - `venv\Scripts\activate` (Windows)
- Install dependencies
  - `pip install -r requirements.txt`
- Preprocessing: segments and crops each bean from `data/raw/good` and `data/raw/bad`, and writes the results to `data/processed/`.
  - `python src/preprocessing/segment.py data/raw data/processed --size 128`
- Feature extraction
  - Shape: `python src/features/shape.py data/processed data/features/shape.csv`
  - Color: `python src/features/color.py data/processed data/features/color.csv`
  - Texture: `python src/features/texture.py data/processed data/features/texture.csv`
- Dataset construction: merges the shape, color, and texture features, then splits them into 70% training / 30% testing.
  - `python src/datasets/build_dataset.py`
  - Generates `data/features/train.csv` and `data/features/test.csv`.
- Neural network training: trains on `train.csv` (70% of the good beans + 70% of the bad beans) and validates internally with a 20% validation split.
  - `python src/train.py`
  - The model is saved at `data/models/rna.h5`.
- Evaluation (optional): run `evaluate.py` for final metrics on `test.csv`.
  - `python src/evaluate.py data/features/test.csv data/models/rna.h5`
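The merge-and-split step could look roughly like the sketch below. It is not the code in `build_dataset.py`: the `filename` key column, the toy feature values, and the fixed seed are assumptions made for illustration. Each class is sampled separately, so 70% of the good beans and 70% of the bad beans land in the training set.

```python
import pandas as pd

# Toy stand-ins for shape.csv, color.csv and texture.csv.
# The "filename" key column is an assumption; the real CSVs may use another key.
names = [f"bean_{i:02d}.png" for i in range(20)]
labels = [0] * 10 + [1] * 10  # 0 = bad, 1 = normal

shape = pd.DataFrame({"filename": names, "area": range(20), "label": labels})
color = pd.DataFrame({"filename": names, "mean_b": range(20), "label": labels})
texture = pd.DataFrame({"filename": names, "glcm_energy": range(20), "label": labels})

# Join the three feature tables on the shared key and label.
merged = shape.merge(color, on=["filename", "label"]).merge(
    texture, on=["filename", "label"]
)

# Stratified 70/30 split: sample 70% of each class for training,
# keep the remaining rows for testing.
train = merged.groupby("label").sample(frac=0.7, random_state=42)
test = merged.drop(train.index)

print(train["label"].value_counts().to_dict())
print(len(train), len(test))
```

Sampling per class keeps the class ratio identical in both files, which matters when one class is rarer than the other.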
| Column | Description |
|---|---|
| `area` | Bean contour area |
| `perimeter` | Contour perimeter |
| `aspect` | Bounding box width/height ratio |
| `circularity` | 4π·area / perimeter² |
| `extent` | Area / (bounding box width·height) |
| `solidity` | Area / convex hull area |
| `hu_1` … `hu_7` | 7 invariant Hu moments (log-transformed) |
| `prop_black` | Proportion of near-black pixels (RGB ≤ (30,30,30)) |
| `mean_b`, `mean_g`, `mean_r` | Mean values of the B, G and R channels |
| `std_b`, `std_g`, `std_r` | Standard deviation of the B, G and R channels |
| `prop_dark_otsu` | Proportion of dark pixels in the V channel (HSV + Otsu) |
| `glcm_contrast` | GLCM contrast |
| `glcm_homogeneity` | GLCM homogeneity |
| `glcm_energy` | GLCM energy |
| `glcm_correlation` | GLCM correlation |
| `lbp_0` … `lbp_9` | LBP histogram (10 bins, "uniform" method) |
| `hog_0` … `hog_N` | HOG feature vector (gradient orientations) |
| `label` | Class: 0 = bad, 1 = normal |
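As a worked illustration of the shape formulas in the table, the sketch below evaluates `aspect`, `circularity`, and `extent` for a contour given as a list of polygon vertices. It uses only the standard library. The function name `shape_descriptors` is hypothetical, and the project's own `shape.py` may compute these values differently (for example, directly from an image contour).

```python
import math

def shape_descriptors(points):
    """Compute basic shape descriptors for a closed polygon [(x, y), ...]."""
    n = len(points)
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(area2) / 2.0

    # Axis-aligned bounding box of the polygon.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    return {
        "area": area,
        "perimeter": perimeter,
        "aspect": width / height,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "extent": area / (width * height),
    }

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
features = shape_descriptors(square)
print(features)
```

A perfect circle has circularity 1. The 2×2 square above gives π/4 ≈ 0.785 and fills its bounding box completely, so its extent is 1.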
Note: `train_reduced.csv` and `test_reduced.csv` contain only the 10 most important features (ranked by `RandomForestClassifier.feature_importances_`) plus `label`.
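A minimal sketch of how such a reduced file could be produced is shown below. It assumes scikit-learn, pandas, and NumPy are available, and it runs on synthetic data: the column names `f_0` … `f_29`, the random seed, and the way the toy label is generated are placeholders, not the project's data or script.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for train.csv: 200 rows, 30 numeric features plus "label".
# Column names f_0 ... f_29 are placeholders, not the project's feature names.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 30)), columns=[f"f_{i}" for i in range(30)])
y = (X["f_3"] + X["f_7"] > 0).astype(int)  # only f_3 and f_7 carry signal
train = X.assign(label=y)

# Fit the forest and rank features by impurity-based importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(train.drop(columns="label"), train["label"])
importances = pd.Series(forest.feature_importances_, index=X.columns)
top10 = importances.sort_values(ascending=False).head(10).index.tolist()

# Keep only the selected columns plus the label.
train_reduced = train[top10 + ["label"]]
print(top10)
```

The same `top10` column list would then be applied to the test set, so that both reduced files share one feature order.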
- Preprocessing
  - Grayscale conversion + Otsu thresholding
  - Morphological operations (opening)
  - Main contour detection + crop + resize (128×128)
- Feature Extraction
  - Shape: area, perimeter, aspect ratio, circularity, extent, solidity, Hu moments
  - Color: dark pixel proportion, RGB statistics, dark V-channel proportion (HSV + Otsu)
  - Texture: GLCM (contrast, homogeneity, energy, correlation), LBP (10 bins), HOG
- Feature Selection
  - Training a `RandomForestClassifier(n_estimators=100)` on `train.csv`
  - Feature importance ranking and retention of the top 10 features
  - Generation of `train_reduced.csv` and `test_reduced.csv`
- Dense Neural Network
  - Fully connected layers: 128 → 64 → 32
  - BatchNormalization + Dropout(0.3)
  - Sigmoid output (`binary_crossentropy`)
  - Metrics: `accuracy`, `AUC`
  - Callbacks: EarlyStopping (patience=10), ModelCheckpoint
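The dense network described above might be assembled along the following lines. This is a sketch, not the contents of `train.py`: the ReLU activations, the Dense → BatchNormalization → Dropout ordering, the Adam optimizer, monitoring `val_loss`, and `save_best_only=True` are all assumptions, since the source only lists the layer sizes, the loss, the metrics, and the callback types.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features):
    """Dense binary classifier: 128 -> 64 -> 32 -> 1 (sigmoid)."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for units in (128, 64, 32):
        # ReLU and the Dense -> BatchNorm -> Dropout order are assumptions.
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.3))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer="adam",  # assumed; the source does not name the optimizer
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")],
    )
    return model

def make_callbacks(checkpoint_path="data/models/rna.h5"):
    """Early stopping plus checkpointing, both watching validation loss."""
    return [
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=10),
        keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True  # assumed
        ),
    ]

model = build_model(n_features=10)  # e.g. the reduced feature set
model.summary()
```

Training would then call `model.fit` with `validation_split=0.2` and the callbacks from `make_callbacks()`. The epoch count and batch size are not given in the source.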
| Metric | Full Model | Reduced Model |
|---|---|---|
| Accuracy | 90.20 % | 89.71 % |
| F1-score (average) | 0.9091 | 0.9041 |
| ROC-AUC | 0.9628 | 0.9511 |
| Precision (bad) | 0.88 | 0.87 |
| Recall (bad) | 0.91 | 0.91 |
| Precision (normal) | 0.93 | 0.93 |
| Recall (normal) | 0.89 | 0.88 |
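For readers who want to see how the per-class figures in a table like this are defined, the sketch below computes accuracy, precision, recall, and F1 from true and predicted labels using only the standard library. The labels are toy values chosen for illustration, not the project's predictions, and `per_class_metrics` is a hypothetical helper rather than part of `evaluate.py`. ROC-AUC is omitted because it needs predicted probabilities rather than hard labels.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy labels: 0 = bad, 1 = normal.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bad = per_class_metrics(y_true, y_pred, positive=0)
normal = per_class_metrics(y_true, y_pred, positive=1)
print(accuracy, bad, normal)
```

Reporting precision and recall per class, as the table does, shows whether errors are concentrated on bad beans passed as normal or on normal beans rejected as bad.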