Keras Guide — High-Level Neural Networks API

DodaTech · Updated Jun 7, 2026 · 10 min read

Keras is a high-level deep learning API that provides intuitive, modular building blocks for creating and training neural networks, running on top of TensorFlow and offering both Sequential and Functional API styles for different complexity levels.

What You’ll Learn

You’ll build models with the Sequential API for linear stacks and the Functional API for complex architectures, configure various layer types (Dense, Conv2D, LSTM), train and validate models, use callbacks for model checkpointing and early stopping, save and restore models, apply transfer learning from pre-trained networks, and integrate with TensorFlow’s ecosystem.

Why Keras Matters

Keras dramatically lowers the barrier to entry for deep learning. Its clean, consistent API lets you prototype complex neural networks in minutes instead of hours. It’s the official high-level API of TensorFlow and is used by Google, NASA, and CERN. DodaTech uses Keras for rapid prototyping of security models — its Functional API lets us experiment with multi-input architectures that analyze both file metadata and binary content simultaneously.

Keras Learning Path

    flowchart LR
  A[Python & NumPy] --> B[Keras]
  B --> C[Sequential API]
  B --> D[Functional API]
  C --> E[Layers & Activations]
  D --> F[Multi-Input/Output]
  E --> G[Training & Callbacks]
  F --> G
  G --> H[Transfer Learning]
  H --> I[Deployment]
  B:::current

  classDef current fill:#D00000,color:#fff,stroke:#333,stroke-width:2px

Prerequisites: Solid Python programming. Basic understanding of neural networks (layers, activation functions, loss functions) is helpful. Familiarity with TensorFlow is not required but beneficial.

Sequential API — The Simplest Path

The Sequential API creates models as linear stacks of layers:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Build a sequential model for binary classification
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dropout(0.3),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')
])

# Display model architecture
model.summary()

# Output:
# Model: "sequential"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  dense (Dense)               (None, 128)               100480
#  dropout (Dropout)           (None, 128)               0
#  dense_1 (Dense)             (None, 64)                8256
#  dropout_1 (Dropout)         (None, 64)                0
#  dense_2 (Dense)             (None, 1)                 65
# =================================================================
# Total params: 108,801
# Trainable params: 108,801
# Non-trainable params: 0
# _________________________________________________________________

# Compile the model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy', keras.metrics.AUC(name='auc')]
)

# Generate dummy data
X_train = np.random.randn(5000, 784).astype('float32')
y_train = np.random.randint(0, 2, (5000, 1)).astype('float32')
X_val = np.random.randn(1000, 784).astype('float32')
y_val = np.random.randint(0, 2, (1000, 1)).astype('float32')

# Train the model
history = model.fit(
    X_train, y_train,
    batch_size=64,
    epochs=20,
    validation_data=(X_val, y_val),
    verbose=1
)

print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"Final validation accuracy: {history.history['val_accuracy'][-1]:.4f}")

Output: The model trains for 20 epochs, showing loss and accuracy for both training and validation sets after each epoch. The summary shows 108,801 trainable parameters across 3 dense layers with dropout regularization.
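
The parameter counts in the summary can be checked by hand: a Dense layer holds one weight per input-output pair plus one bias per output unit, and Dropout holds no parameters. A minimal sketch of that arithmetic follows; the helper name dense_params is illustrative and not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths from the Sequential model above: 784 -> 128 -> 64 -> 1
widths = [784, 128, 64, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]

print(per_layer)       # [100480, 8256, 65]
print(sum(per_layer))  # 108801
```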

Functional API — Complex Architectures

The Functional API supports non-linear topologies, shared layers, and multi-input/output models:

# Multi-input model: process text and metadata separately
# Text input
text_input = keras.Input(shape=(100,), name='text')
text_branch = layers.Embedding(10000, 128)(text_input)
text_branch = layers.GlobalAveragePooling1D()(text_branch)
text_branch = layers.Dense(64, activation='relu')(text_branch)

# Metadata input
meta_input = keras.Input(shape=(10,), name='metadata')
meta_branch = layers.Dense(32, activation='relu')(meta_input)
meta_branch = layers.Dropout(0.2)(meta_branch)

# Combine branches
combined = layers.concatenate([text_branch, meta_branch])
combined = layers.Dense(32, activation='relu')(combined)
output = layers.Dense(5, activation='softmax', name='class_output')(combined)

# Create the model
model = keras.Model(
    inputs=[text_input, meta_input],
    outputs=output
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# summary() prints the table itself and returns None, so no print() is needed
model.summary()

# Generate dummy multi-input data
X_text = np.random.randint(0, 10000, (1000, 100))
X_meta = np.random.randn(1000, 10)
y = np.random.randint(0, 5, (1000,))

# Train with multiple inputs
history = model.fit(
    {'text': X_text, 'metadata': X_meta},
    y,
    batch_size=32,
    epochs=10,
    validation_split=0.2
)

# Functional API also supports multiple outputs
def build_multi_output_model():
    """Model that predicts both class and confidence"""
    inputs = keras.Input(shape=(784,), name='features')
    
    x = layers.Dense(128, activation='relu')(inputs)
    x = layers.Dense(64, activation='relu')(x)
    
    class_output = layers.Dense(10, activation='softmax', name='class_pred')(x)
    confidence_output = layers.Dense(1, activation='sigmoid', name='confidence')(x)
    
    return keras.Model(inputs=inputs, outputs=[class_output, confidence_output])

multi_model = build_multi_output_model()
multi_model.compile(
    optimizer='adam',
    loss={
        'class_pred': 'sparse_categorical_crossentropy',
        'confidence': 'binary_crossentropy'
    },
    metrics={
        'class_pred': 'accuracy',
        'confidence': 'accuracy'
    }
)

Output: The Functional API creates a model with two separate input branches that merge into a combined classifier. The model summary shows the full computation graph. The multi-output variant demonstrates predicting both the class and confidence score from the same features.
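
Shared layers are mentioned above but not shown. As a rough sketch, calling one layer instance on two inputs reuses the same weights for both. The layer and input names here (shared_encoder, sample_a, sample_b, similarity) are made up for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# One layer instance, called on two inputs, so both branches share its weights
shared_encoder = layers.Dense(16, activation='relu', name='shared_encoder')

input_a = keras.Input(shape=(8,), name='sample_a')
input_b = keras.Input(shape=(8,), name='sample_b')

encoded_a = shared_encoder(input_a)
encoded_b = shared_encoder(input_b)

merged = layers.concatenate([encoded_a, encoded_b])
similarity = layers.Dense(1, activation='sigmoid', name='similarity')(merged)

pair_model = keras.Model(inputs=[input_a, input_b], outputs=similarity)

# The shared layer is counted once: 8*16 + 16 = 144 parameters
print(shared_encoder.count_params())  # 144
# Whole model: 144 + (32*1 + 1) = 177 parameters
print(pair_model.count_params())      # 177

scores = pair_model.predict(
    {
        'sample_a': np.random.randn(4, 8).astype('float32'),
        'sample_b': np.random.randn(4, 8).astype('float32'),
    },
    verbose=0,
)
print(scores.shape)  # (4, 1)
```

If the two branches used separate Dense(16) instances instead, the model would carry two independent sets of encoder weights.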

Convolutional Neural Network (CNN) for Images

# CNN for image classification (CIFAR-10)
def build_cnn():
    model = keras.Sequential([
        # First convolutional block
        layers.Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3)),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Conv2D(32, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Second convolutional block
        layers.Conv2D(64, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Conv2D(64, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Third convolutional block
        layers.Conv2D(128, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        
        # Classifier
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ])
    
    return model

cnn = build_cnn()
cnn.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Data augmentation
data_augmentation = keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
], name='data_augmentation')

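# Apply the augmentation layers to training batches only
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(10000)
    .batch(64)
    .map(lambda x, y: (data_augmentation(x, training=True), y))
    .prefetch(tf.data.AUTOTUNE)
)

# Training with augmented data (batch size is set by the dataset)
history = cnn.fit(
    train_ds,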
    epochs=30,
    validation_data=(x_test, y_test),
    callbacks=[
        keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        ),
        keras.callbacks.ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=3,
            min_lr=1e-6
        )
    ]
)

# Evaluate
test_loss, test_acc = cnn.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

Output: The CNN trains with data augmentation (random flips, rotations, zooms). Early stopping halts training after validation loss plateaus for 5 epochs, restoring the best weights. Learning rate reduces by half when validation loss stagnates. The test accuracy reflects generalization to unseen data.
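
To see how many features reach the classifier head, the feature-map shape can be traced through the three blocks by hand. This sketch assumes stride-1 convolutions with 'same' padding (spatial size unchanged) and 2x2 max-pooling with the default stride (spatial size halved), as in build_cnn().

```python
# Trace the feature-map shape through the three convolutional blocks
height, width = 32, 32
block_filters = [32, 64, 128]

for filters in block_filters:
    # Conv2D with padding='same' keeps height/width; MaxPooling2D((2, 2)) halves them
    height, width = height // 2, width // 2
    print(f"after block with {filters} filters: ({height}, {width}, {filters})")

flattened = height * width * block_filters[-1]
print(flattened)  # 2048 features feed Dense(256)
```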

Callbacks — Training Intelligence

Callbacks execute actions during training:

from tensorflow.keras import callbacks

# Create callbacks
checkpoint = callbacks.ModelCheckpoint(
    filepath='best_model.keras',
    monitor='val_accuracy',
    save_best_only=True,
    save_weights_only=False,
    verbose=1
)

early_stop = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-7,
    verbose=1
)

tensorboard = callbacks.TensorBoard(
    log_dir='./logs',
    histogram_freq=1,
    write_graph=True,
    update_freq='epoch'
)

csv_logger = callbacks.CSVLogger('training_log.csv')

# Custom callback
class MetricsLogger(callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if logs:
            # optimizer.learning_rate replaces the deprecated optimizer.lr alias
            lr = float(self.model.optimizer.learning_rate.numpy())
            print(f"Epoch {epoch+1}: "
                  f"loss={logs['loss']:.4f}, "
                  f"val_loss={logs['val_loss']:.4f}, "
                  f"lr={lr:.6f}")

# Train with all callbacks
model.fit(
    X_train, y_train,
    batch_size=64,
    epochs=50,
    validation_data=(X_val, y_val),
    callbacks=[
        checkpoint,
        early_stop,
        reduce_lr,
        tensorboard,
        csv_logger,
        MetricsLogger()
    ]
)

Output: The best model (highest validation accuracy) is saved to best_model.keras. Training stops if validation loss doesn’t improve for 10 consecutive epochs, and the best weights are restored. The learning rate halves each time validation loss stagnates for 5 epochs, down to a floor of 1e-7. Metrics stream to TensorBoard and to training_log.csv, and the custom callback prints the per-epoch loss and learning rate.
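
The interplay of the two patience values is easier to see on a made-up validation-loss curve. The function below is a simplified pure-Python model of the behaviour described above, not the Keras implementation: it ignores min_delta, cooldown, and the exact order in which callbacks run. The name simulate_callbacks and the loss values are illustrative.

```python
def simulate_callbacks(val_losses, stop_patience=10, lr_patience=5,
                       lr=1e-3, factor=0.5, min_lr=1e-7):
    """Simplified model of EarlyStopping + ReduceLROnPlateau on val_loss."""
    best = float('inf')
    stop_wait = 0   # epochs since the best val_loss (early stopping)
    lr_wait = 0     # epochs since the best val_loss or the last LR cut
    log = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stop_wait, lr_wait = loss, 0, 0
        else:
            stop_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        log.append((epoch, loss, lr))
        if stop_wait >= stop_patience:
            break
    return log

# Three improving epochs, then a plateau
val_losses = [1.0, 0.8, 0.7] + [0.75] * 12
log = simulate_callbacks(val_losses)

stopped_epoch, _, final_lr = log[-1]
print(stopped_epoch)  # 13
print(final_lr)       # 0.00025
```

In this toy run the learning rate is cut at epochs 8 and 13, and training halts at epoch 13, ten epochs after the best validation loss at epoch 3.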

Saving and Loading Models

# Save the entire model (architecture + weights + optimizer state)
model.save('malware_classifier.keras')

# Load the model
loaded_model = keras.models.load_model('malware_classifier.keras')

# Predict with loaded model
predictions = loaded_model.predict(X_val[:10])
print(f"Predictions shape: {predictions.shape}")
print(f"First prediction: {predictions[0]}")

# Save only weights (here from the CNN built earlier)
cnn.save_weights('model_weights.weights.h5')

# Load weights into a new model with the same architecture
new_model = build_cnn()
new_model.load_weights('model_weights.weights.h5')

# Export for TensorFlow Lite (mobile deployment)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

print(f"TFLite model size: {len(tflite_model) / 1024:.1f} KB")
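
A quick way to confirm that a save/load round trip preserved a model is to compare predictions before and after. The sketch below uses a small stand-in model and a temporary directory; the names small_model, X_sample, and roundtrip.keras are illustrative.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Small stand-in model; any Keras model is saved the same way
small_model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
small_model.compile(optimizer='adam', loss='binary_crossentropy')

X_sample = np.random.randn(16, 20).astype('float32')

# Save to a temporary .keras file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'roundtrip.keras')
    small_model.save(path)
    restored = keras.models.load_model(path)

original_preds = small_model.predict(X_sample, verbose=0)
restored_preds = restored.predict(X_sample, verbose=0)

# Identical weights should give matching predictions
print(np.allclose(original_preds, restored_preds, atol=1e-6))
```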

Transfer Learning

Leverage pre-trained models for your custom task:

# Load a pre-trained model (without the top classifier)
base_model = keras.applications.ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model layers
base_model.trainable = False

# Add custom classifier on top
inputs = keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(128, activation='relu')(x)
outputs = layers.Dense(5, activation='softmax')(x)

model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train the classifier head (base model frozen)
model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

# Fine-tuning: unfreeze the whole base model, then recompile so the change takes effect
base_model.trainable = True
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),  # Lower LR
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Continue training with very low learning rate
model.fit(X_train, y_train, epochs=5, validation_data=(X_val, y_val))

Output: First, only the newly added classifier layers train for 10 epochs (base model frozen). Then, the base model is unfrozen and the entire network fine-tunes with a 10x lower learning rate, adjusting pre-trained features for the new task.
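
Fine-tuning does not have to unfreeze the whole base. A common variation is to unfreeze only the last few layers and keep the earlier, more generic features fixed. The sketch below uses a tiny stand-in base (three Dense layers with made-up names) rather than ResNet50, so it runs without downloading weights.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in for a pre-trained base (hypothetical; a real base would be
# something like keras.applications.ResNet50 with ImageNet weights)
base_model = keras.Sequential([
    keras.Input(shape=(32,)),
    layers.Dense(64, activation='relu', name='base_1'),
    layers.Dense(64, activation='relu', name='base_2'),
    layers.Dense(64, activation='relu', name='base_3'),
], name='base')

# Stage 1: freeze everything in the base
base_model.trainable = False
print(len(base_model.trainable_weights))  # 0

# Stage 2: unfreeze the base, then re-freeze all but the last layer
base_model.trainable = True
for layer in base_model.layers[:-1]:
    layer.trainable = False

unfrozen = [layer.name for layer in base_model.layers if layer.trainable_weights]
print(unfrozen)                           # ['base_3']
print(len(base_model.trainable_weights))  # 2 (kernel and bias of base_3)
```

As in the walkthrough above, the model needs to be compiled again after changing trainable flags before training continues.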

Security Angle: Model Robustness

def fgsm_examples(model, loss_fn, x, y, epsilon=0.01):
    """Craft adversarial inputs with the Fast Gradient Sign Method (FGSM)"""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)  # inputs are not variables, so watch them explicitly
        preds = model(x, training=False)
        loss = loss_fn(y, preds)
    grads = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grads)

# A callback never receives the batch tensors, so adversarial training
# runs here as an explicit loop instead of inside model.fit()
loss_fn = keras.losses.SparseCategoricalCrossentropy()  # same loss as in compile()
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(1024).batch(64)

for epoch in range(5):  # epoch count is illustrative
    for x_batch, y_batch in train_ds:
        adversarial_x = fgsm_examples(model, loss_fn, x_batch, y_batch, epsilon=0.02)
        # Train on the clean batch, then on its adversarial counterpart
        model.train_on_batch(x_batch, y_batch)
        model.train_on_batch(adversarial_x, y_batch)

DodaTech uses adversarial training to make Durga Antivirus Pro’s neural network classifiers resistant to evasion attacks, where malware authors subtly modify binary files to avoid detection.
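
The FGSM step itself, x_adv = x + epsilon * sign(gradient of the loss with respect to x), can be illustrated without a neural network. The NumPy sketch below uses a toy logistic-regression "detector" with random weights, for which the input gradient of binary cross-entropy has the closed form (p - y) * w. All values are synthetic and only meant to show that a small, bounded perturbation raises the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression "detector" with fixed random weights (illustrative)
w = rng.normal(size=20)
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def bce(y, p):
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

x = rng.normal(size=(64, 20))
y = (rng.random(64) < 0.5).astype(float)

# Per-sample gradient of BCE w.r.t. the input is (p - y) * w;
# the 1/N factor from the mean does not change the sign
p = predict_proba(x)
grad_x = (p - y)[:, None] * w[None, :]

epsilon = 0.02
x_adv = x + epsilon * np.sign(grad_x)

loss_clean = bce(y, p)
loss_adv = bce(y, predict_proba(x_adv))
print(f"clean loss={loss_clean:.4f}, adversarial loss={loss_adv:.4f}")
print(f"max perturbation={np.max(np.abs(x_adv - x)):.3f}")
```

Each feature moves by at most epsilon, yet every sample is pushed in the direction that increases its loss, which is why training on such examples can harden a classifier against small evasive edits.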

Common Mistakes Beginners Make

Forgetting to normalize input data: Neural networks expect inputs in a consistent range (typically [0,1] or [-1,1]). Raw pixel values (0-255) cause slow or failed convergence.
Using too many epochs without early stopping: Training past the point of convergence leads to overfitting. Use the EarlyStopping callback with restore_best_weights=True.
Not using validation data: Training accuracy alone is misleading. Always hold out validation data to monitor generalization.
Overfitting on small datasets: Small datasets (hundreds of samples) with large models (millions of parameters) almost always overfit. Use heavy regularization, dropout, data augmentation, or start with a smaller model.
Ignoring class imbalance: If one class dominates (e.g., 95% benign, 5% malware), accuracy is misleading. Use class weights, oversampling, or specialized loss functions.
Not freezing base layers in transfer learning: Training pre-trained layers with a high learning rate destroys learned features. Freeze first, then fine-tune with a low learning rate.
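
The class-imbalance point is easy to verify numerically. In this small NumPy sketch, a degenerate classifier that labels every sample benign reaches 95% accuracy on the 95/5 split mentioned above while catching no malware at all. The data is synthetic.

```python
import numpy as np

# 1,000 samples: 950 benign (0) and 50 malware (1)
y_true = np.array([0] * 950 + [1] * 50)

# A degenerate "classifier" that labels everything benign
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
true_positives = np.sum((y_pred == 1) & (y_true == 1))
recall = true_positives / np.sum(y_true == 1)

print(f"accuracy={accuracy:.2f}, malware recall={recall:.2f}")
# accuracy=0.95, malware recall=0.00
```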

Practice Questions

What is the difference between Sequential and Functional API?
What does EarlyStopping do and why is it important?
How does transfer learning work in Keras?
What is the purpose of the validation_data parameter in fit()?
How do you save and restore a trained Keras model?

Answers:

Sequential API is for linear layer stacks; Functional API supports branching, multi-input, multi-output, and shared layers. Functional API is more flexible but slightly more verbose.
EarlyStopping monitors a metric (usually validation loss) and stops training when it stops improving, preventing overfitting and saving training time.
Load a pre-trained model (trained on a large dataset like ImageNet), freeze its layers, add new classifier layers for your task, train the new layers, then optionally fine-tune the whole model with a low learning rate.
validation_data provides data for evaluating the model after each epoch. The validation metrics help detect overfitting and guide callbacks like EarlyStopping.
Use model.save('path.keras') to save the entire model (architecture, weights, optimizer state) and keras.models.load_model('path.keras') to restore it.

Challenge

Build a multi-class image classifier using the Functional API: create a model with a shared convolutional base and two output heads, one for object category and one for image quality (blurry/clear). Use data augmentation and callbacks (checkpointing, early stopping, learning rate reduction), and evaluate with both accuracy and a confusion matrix.

Real-World Task

Create a document classifier for security scanning: use transfer learning from EfficientNet to classify documents as “safe” or “suspicious”, add a second output that predicts document type (PDF, DOCX, HTML), implement a real-time inference pipeline with tf.data for efficient file loading, export the model as a TensorFlow SavedModel, and containerize it with TensorFlow Serving for deployment.

FAQ

What is the relationship between Keras and TensorFlow?

: Keras ships with TensorFlow as tf.keras and is its official high-level API. Keras provides the user-friendly interface while TensorFlow handles the low-level computation, graph optimization, and deployment. Since Keras 3, the same Keras code can also run on JAX or PyTorch backends.

When should I use Sequential vs Functional API?

: Use Sequential for simple linear stacks (most common). Use Functional for models with multiple inputs/outputs, shared layers, residual connections, or non-linear topology.

Does Keras support GPU acceleration?

: Yes. Keras automatically uses TensorFlow’s GPU support. Install TensorFlow with CUDA support, and GPU training happens automatically — no code changes needed.

How do I handle imbalanced datasets in Keras?

: Use the class_weight parameter in fit(), oversample minority classes with tf.data, or use a specialized loss function such as Focal Loss.
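
As a rough illustration of the class_weight option, one common heuristic is inverse-frequency weighting: n_samples / (n_classes * count_per_class), the same formula scikit-learn calls "balanced". The helper name balanced_class_weights and the label array are illustrative.

```python
import numpy as np

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Synthetic labels: 950 benign (0), 50 malware (1)
labels = np.array([0] * 950 + [1] * 50)
class_weight = balanced_class_weights(labels)

print(class_weight)  # class 0 gets about 0.53, class 1 gets 10.0
```

The resulting dictionary maps class index to weight and can be passed as the class_weight argument of fit(), so that errors on the rare class contribute more to the loss.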

Can I export Keras models to mobile or web?

: Yes. Convert to TensorFlow Lite for mobile/embedded, TensorFlow.js for web, or TensorFlow Serving for production servers.

Try It Yourself

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Build and train a simple model on random data
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(100,)),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

X = np.random.randn(1000, 100)
y = np.random.randint(0, 2, (1000, 1))

history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=1)
print(f"Accuracy: {history.history['accuracy'][-1]:.3f}")