
FeijõesAI

Automated classification of good and defective beans. A Python + TensorFlow/Keras pipeline for segmentation, feature extraction and dense neural network classification.


📋 Participants

  • João Pedro Rocha Senna
  • Thiago Tanaka Peczek

🤖 AI Name

FeijõesAI (FIA): dense neural network for binary classification (bad vs. good).


🚀 Usage Guide

  1. Clone the repository

    git clone https://github.com/Arg0n4ut4/IAFIA.git
    cd IAFIA
  2. Create and activate a virtual environment

    python -m venv venv
    source venv/bin/activate    # Linux/macOS
    venv\Scripts\activate       # Windows
  3. Install dependencies

    pip install -r requirements.txt
  4. Preprocessing: segments and crops each bean from data/raw/good and data/raw/bad → data/processed/

    python src/preprocessing/segment.py data/raw data/processed --size 128
  5. Feature extraction

    • Shape:

      python src/features/shape.py data/processed data/features/shape.csv
    • Color:

      python src/features/color.py data/processed data/features/color.csv
    • Texture:

      python src/features/texture.py data/processed data/features/texture.csv
  6. Dataset construction: merges the shape, color and texture features, then splits the result into 70% training / 30% testing.

    python src/datasets/build_dataset.py

    Generates:

    • data/features/train.csv
    • data/features/test.csv
  7. Neural network training: trains on train.csv (70% of the good beans + 70% of the bad beans) and internally holds out 20% of it as a validation split.

    python src/train.py

    Model saved at data/models/rna.h5.

  8. Evaluation (optional): run evaluate.py to compute the final metrics on test.csv.

    python src/evaluate.py data/features/test.csv data/models/rna.h5
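Steps 6 and 7 depend on a class-balanced 70/30 split (70% of each class goes to training). The plain-Python sketch below illustrates merging the three feature tables and performing a stratified split. It is not the repository's build_dataset.py, which may use pandas or scikit-learn instead; the join key `image`, the seed and the function names are hypothetical.

```python
import random


def merge_feature_tables(*tables):
    """Join per-image feature rows (dicts) on a shared key column.

    The key name "image" is a hypothetical choice for this sketch.
    """
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row["image"], {}).update(row)
    return list(merged.values())


def stratified_split(rows, train_frac=0.7, seed=42):
    """Split rows so that every class keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])  # copy before shuffling
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Illustrative data: 20 images with alternating labels (0 = bad, 1 = normal).
shape = [{"image": f"img{i}", "label": i % 2, "area": 100 + i} for i in range(20)]
color = [{"image": f"img{i}", "mean_r": 50 + i} for i in range(20)]
rows = merge_feature_tables(shape, color)
train, test = stratified_split(rows)
print(len(train), len(test))
```

Splitting each class separately is what guarantees the "70% of good + 70% of bad" property; a plain random split only achieves it on average.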

📊 CSV Structure

train.csv / test.csv

| Column | Description |
|---|---|
| area | Bean contour area |
| perimeter | Contour perimeter |
| aspect | Bounding-box width / height ratio |
| circularity | 4π · area / perimeter² |
| extent | area / (bounding-box width · height) |
| solidity | area / convex-hull area |
| hu_1 … hu_7 | 7 invariant Hu moments (log-transformed) |
| prop_black | Proportion of near-black pixels (RGB ≤ (30, 30, 30)) |
| mean_b, mean_g, mean_r | Mean of the B, G and R channels |
| std_b, std_g, std_r | Standard deviation of the B, G and R channels |
| prop_dark_otsu | Proportion of dark pixels in the V channel (HSV + Otsu) |
| glcm_contrast | GLCM contrast |
| glcm_homogeneity | GLCM homogeneity |
| glcm_energy | GLCM energy |
| glcm_correlation | GLCM correlation |
| lbp_0 … lbp_9 | LBP histogram (10 bins, "uniform" method) |
| hog_0 … hog_N | HOG feature vector (gradient orientations) |
| label | Class: 0 = bad, 1 = normal |
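The shape and color formulas in the CSV table can be checked on a known shape. The sketch below computes aspect, circularity, extent and solidity from precomputed contour measurements, plus two assumed helpers: a common signed log transform for Hu moments (-sign(h)·log10|h|, not confirmed as the one the repository uses) and prop_black read as "all three channels ≤ 30". All function names are hypothetical.

```python
import math


def shape_features(area, perimeter, box_w, box_h, hull_area):
    """Shape descriptors following the formulas in the CSV table."""
    return {
        "aspect": box_w / box_h,                             # bounding-box width / height
        "circularity": 4 * math.pi * area / perimeter ** 2,  # 1.0 for a perfect circle
        "extent": area / (box_w * box_h),                    # fill ratio of the bounding box
        "solidity": area / hull_area,                        # fill ratio of the convex hull
    }


def log_hu(h):
    """Assumed log transform for one Hu moment: -sign(h) * log10(|h|)."""
    if h == 0:
        return 0.0
    return -math.copysign(1.0, h) * math.log10(abs(h))


def prop_black(pixels, thresh=30):
    """Share of (b, g, r) pixels whose three channels are all <= thresh (assumed reading)."""
    dark = sum(1 for b, g, r in pixels if max(b, g, r) <= thresh)
    return dark / len(pixels)


# A circle of radius 10 inside a 20 x 20 box: circularity 1, extent pi/4, solidity 1.
radius = 10
f = shape_features(math.pi * radius ** 2, 2 * math.pi * radius,
                   2 * radius, 2 * radius, math.pi * radius ** 2)
print(f)
```

Circularity and solidity both approach 1 for smooth, convex beans, so broken or irregular grains tend to score lower on them.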

Note:

  • train_reduced.csv and test_reduced.csv contain only the 10 most important features (ranked by RandomForestClassifier.feature_importances_) plus the label column.
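The reduced CSVs can be derived from any importance vector. The sketch below ranks columns by importance, keeps the top 10 and retains the label. Only `RandomForestClassifier(n_estimators=100)` and `feature_importances_` come from this README; the helper names and the toy importance values are hypothetical.

```python
def top_k_features(names, importances, k=10):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def reduce_rows(rows, keep):
    """Keep only the selected feature columns plus the label column."""
    return [{col: row[col] for col in (*keep, "label")} for row in rows]


# Illustrative importances only; in practice they would come from
# RandomForestClassifier(n_estimators=100).fit(X, y).feature_importances_.
names = [f"f{i}" for i in range(15)]
importances = [i / 100 for i in range(15)]
keep = top_k_features(names, importances)
reduced = reduce_rows([{**{n: 0.0 for n in names}, "label": 1}], keep)
print(keep)
```

The same `keep` list must be applied to both train.csv and test.csv, so the test set is reduced using importances learned on the training data only.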

🧪 Methodology

  1. Preprocessing

    • Grayscale conversion + Otsu thresholding
    • Morphological opening
    • Main contour detection + crop + resize (128×128)
  2. Feature Extraction

    • Shape: area, perimeter, aspect ratio, circularity, extent, solidity, Hu moments
    • Color: dark pixel proportion, RGB statistics, dark V-channel proportion (HSV + Otsu)
    • Texture: GLCM (contrast, homogeneity, energy, correlation), LBP (10 bins), HOG
  3. Feature Selection

    • Training a RandomForestClassifier(n_estimators=100) using train.csv
    • Feature importance ranking and retention of the top 10 features
    • Generation of train_reduced.csv and test_reduced.csv
  4. Dense Neural Network

    • Fully connected hidden layers: 128 → 64 → 32 units
    • BatchNormalization + Dropout(0.3)
    • Sigmoid output unit trained with binary_crossentropy loss
    • Metrics: accuracy, AUC
    • Callbacks: EarlyStopping (patience=10), ModelCheckpoint
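The thresholding part of the Preprocessing step can be illustrated without OpenCV. The NumPy sketch below picks the Otsu threshold by maximising the between-class variance σ_B²(t) = (μ_T·ω(t) − μ(t))² / (ω(t)(1 − ω(t))), then crops the foreground bounding box and resamples it to 128×128 by nearest-neighbour indexing. It is a simplified stand-in, not the repository's segment.py: the morphological opening and main-contour selection are omitted, the bounding box covers all foreground pixels, beans are assumed brighter than the background, and the function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu threshold for a uint8 image: the t that maximises between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Thresholds that leave one class empty get a variance of 0.
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def crop_and_resize(gray, mask, size=128):
    """Crop the bounding box of mask and resample it to size x size (nearest neighbour)."""
    ys, xs = np.nonzero(mask)
    crop = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(rows, cols)]


# Synthetic image: dark background (20) with one bright "bean" (200).
img = np.full((50, 60), 20, dtype=np.uint8)
img[10:30, 15:45] = 200
t = otsu_threshold(img)
# Assumption: beans are brighter than the background; use img <= t otherwise.
bean = crop_and_resize(img, img > t)
print(t, bean.shape)
```

In an OpenCV-based pipeline the same threshold is typically obtained with `cv2.threshold(..., cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, with `cv2.findContours` selecting the largest contour before cropping.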

📈 Results and Metrics

| Metric | Full model | Reduced model |
|---|---|---|
| Accuracy | 90.20 % | 89.71 % |
| F1-score (average) | 0.9091 | 0.9041 |
| ROC-AUC | 0.9628 | 0.9511 |
| Precision (bad) | 0.88 | 0.87 |
| Recall (bad) | 0.91 | 0.91 |
| Precision (normal) | 0.93 | 0.93 |
| Recall (normal) | 0.89 | 0.88 |
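The per-class precision and recall rows follow the usual confusion-matrix definitions, evaluated once with "normal" (1) as the positive class and once with "bad" (0). The plain-Python sketch below computes them, a macro-averaged F1 and a pairwise (Mann-Whitney) ROC-AUC. The README does not state which averaging its F1 "average" uses, so the macro average here is only one possible reading; the 0.5 decision threshold, the toy data and the function names are assumptions, and evaluate.py may compute these metrics with scikit-learn instead.

```python
def confusion_counts(y_true, y_pred):
    """Counts with class 1 (normal) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def binary_report(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    normal = precision_recall_f1(tp, fp, fn)  # class 1 as positive
    bad = precision_recall_f1(tn, fn, fp)     # class 0 as positive: FP and FN swap roles
    return {
        "accuracy": (tp + tn) / len(y_true),
        "normal": normal,
        "bad": bad,
        "f1_macro": (normal[2] + bad[2]) / 2,
    }


def roc_auc(y_true, scores):
    """P(score of a random normal bean > score of a random bad bean); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # illustrative sigmoid outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(binary_report(y_true, y_pred), roc_auc(y_true, scores))
```

Unlike accuracy and the per-class figures, ROC-AUC uses the raw sigmoid scores, so it does not depend on the 0.5 threshold.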



About

University assignment | AI to recognize the quality of bean grains
