How to write custom training loops in Keras with GradientTape

Content overview

  • Setup
  • Introduction
  • Using GradientTape: a first end-to-end example
  • Low-level handling of metrics
  • Speeding up your training step with tf.function
  • Low-level handling of losses tracked by the model
  • Summary
  • End-to-end example: a GAN training loop from scratch

Setup

import tensorflow as tf
import keras
from keras import layers
import numpy as np

Introduction

Keras provides default training and evaluation loops, fit() and evaluate(). Their usage is covered in the guide Training & evaluation with the built-in methods.

If you want to customize the learning algorithm of your model while still leveraging the convenience of fit() (for instance, to train a GAN using fit()), you can subclass the Model class and implement your own train_step() method, which is called repeatedly during fit(). This is covered in the guide Customizing what happens in fit().

Now, if you want very low-level control over training and evaluation, you should write your own training and evaluation loops from scratch. This is what this guide is about.

Using GradientTape: a first end-to-end example

Calling a model inside a GradientTape scope enables you to retrieve the gradients of the model's trainable weights with respect to a loss value. Using an optimizer instance, you can then use these gradients to update those variables (which you can retrieve using model.trainable_weights).
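To see the mechanism in isolation, here is a minimal sketch (not part of the original guide) that records one operation on a GradientTape, asks for the gradient, and applies a single hand-written gradient-descent update. The variable `w`, the toy loss `w * w`, and the learning rate of 0.1 are illustrative choices.

```python
import tensorflow as tf

# A single trainable variable standing in for a model weight (illustrative).
w = tf.Variable(3.0)
learning_rate = 0.1

# Operations executed inside the tape's scope are recorded for differentiation.
with tf.GradientTape() as tape:
    loss = w * w  # toy loss: L(w) = w^2

# dL/dw = 2w, which is 6.0 at w = 3.0.
grad = tape.gradient(loss, w)

# One step of plain gradient descent: w <- w - lr * grad.
w.assign_sub(learning_rate * grad)

print(float(grad))  # 6.0
print(float(w))     # about 2.4
```

An optimizer's apply_gradients() plays the role of the assign_sub() call here, for every variable in model.trainable_weights at once.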

Let's consider a simple MNIST model:

inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(10, name="predictions")(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

Let's train it using mini-batch gradient descent with a custom training loop.

First, we're going to need an optimizer, a loss function, and a dataset:


# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))

# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
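Because the model's last layer has no activation, the loss is created with `from_logits=True`. As a rough NumPy sketch (an illustration of the math, not Keras's actual implementation), that loss averages, over the batch, the log-sum-exp of each row of logits minus the logit of the true class. The function name `sparse_ce_from_logits` is hypothetical.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Hypothetical helper, not a Keras API: mean sparse categorical
    cross-entropy computed from raw logits.

    labels: int array of shape (batch,), class indices.
    logits: float array of shape (batch, num_classes).
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the row max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))
    # -log softmax(true class) = log-sum-exp - shifted true logit.
    true_logit = shifted[np.arange(len(labels)), labels]
    return float(np.mean(log_sum_exp - true_logit))

# Uniform logits over 10 classes give a loss of ln(10), about 2.3026.
print(sparse_ce_from_logits([3, 7], np.zeros((2, 10))))
```

A useful sanity check follows from this: an untrained 10-class classifier with small logits should start near ln(10). The first-step losses in the logs below are far above that, which is plausibly related to the inputs being raw 0-255 pixel intensities rather than rescaled values.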

Here's our training loop:

  • We open a for loop that iterates over epochs
  • For each epoch, we open a for loop that iterates over the dataset, in batches
  • For each batch, we open a GradientTape() scope
  • Inside this scope, we call the model (forward pass) and compute the loss
  • Outside the scope, we retrieve the gradients of the model's weights with regard to the loss
  • Finally, we use the optimizer to update the model's weights based on the gradients

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * batch_size))

Start of epoch 0
WARNING:tensorflow:5 out of the last 5 calls to  triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:6 out of the last 6 calls to  triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
Training loss (for one batch) at step 0: 131.3794
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.2871
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.2652
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.8800
Seen so far: 38464 samples

Start of epoch 1
Training loss (for one batch) at step 0: 0.8296
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.3322
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.0486
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.6610
Seen so far: 38464 samples
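The loop structure itself (epochs, then batches, then gradients, then an update) does not depend on TensorFlow. As a sketch under simplifying assumptions, here it is in NumPy for a one-feature linear regression, where the gradient of the mean squared error is derived by hand in place of GradientTape. The synthetic data, learning rate, batch size, and epoch count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = 2x + 1 plus a little noise.
x = rng.uniform(-1.0, 1.0, size=512)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=512)

w, b = 0.0, 0.0           # trainable parameters
learning_rate = 0.1
batch_size = 32
epochs = 20

for epoch in range(epochs):
    # Shuffle once per epoch, then iterate in mini-batches.
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        # Forward pass and loss (mean squared error).
        pred = w * xb + b
        err = pred - yb
        loss = np.mean(err ** 2)
        # Hand-derived gradients of the loss w.r.t. w and b
        # (the job GradientTape does automatically).
        grad_w = 2.0 * np.mean(err * xb)
        grad_b = 2.0 * np.mean(err)
        # Gradient-descent update (the job of optimizer.apply_gradients()).
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))
```

The learned parameters should end up close to the true slope 2 and intercept 1.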

Low-level handling of metrics

Let's add metrics monitoring to this basic loop.

You can readily reuse the built-in metrics (or custom ones you wrote) in such training loops written from scratch. Here's the flow:

  • Instantiate the metric at the start of the loop
  • Call metric.update_state() after each batch
  • Call metric.result() when you need to display the current value of the metric
  • Call metric.reset_state() when you need to clear the state of the metric (typically at the end of an epoch)
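The key property of this flow is that a metric is a running accumulator: result() reflects every batch seen since the last reset, not only the latest one. The hypothetical class below mimics that contract in plain NumPy for accuracy on integer labels and logits; it is a sketch of the idea, not Keras's implementation.

```python
import numpy as np

class StreamingAccuracy:
    """Hypothetical stand-in for a Keras accuracy metric's state handling."""

    def __init__(self):
        self.reset_state()

    def update_state(self, y_true, logits):
        # Predicted class = index of the largest logit in each row.
        preds = np.argmax(np.asarray(logits), axis=-1)
        self.correct += int(np.sum(preds == np.asarray(y_true)))
        self.count += len(preds)

    def result(self):
        # Accuracy over everything accumulated since the last reset.
        return self.correct / self.count if self.count else 0.0

    def reset_state(self):
        self.correct = 0
        self.count = 0

metric = StreamingAccuracy()
metric.update_state([0, 1], [[2.0, 1.0], [0.5, 3.0]])  # batch 1: 2 of 2 correct
metric.update_state([1, 1], [[2.0, 1.0], [0.0, 1.0]])  # batch 2: 1 of 2 correct
print(metric.result())  # 0.75, i.e. 3 correct out of 4
metric.reset_state()
print(metric.result())  # 0.0 after the reset
```

Forgetting the reset at the end of an epoch would silently mix every earlier epoch into the reported value.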

Let's use this knowledge to compute SparseCategoricalAccuracy on validation data at the end of each epoch:


# Get model
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

Here's our training and evaluation loop:


import time

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training metric.
        train_acc_metric.update_state(y_batch_train, logits)

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * batch_size))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_state()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val, training=False)
        # Update val metrics
        val_acc_metric.update_state(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_state()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))

Start of epoch 0
Training loss (for one batch) at step 0: 106.2691
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.9259
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.9347
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.7641
Seen so far: 38464 samples
Training acc over epoch: 0.7332
Validation acc: 0.8325
Time taken: 10.95s

Start of epoch 1
Training loss (for one batch) at step 0: 0.5238
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.7125
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.5705
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.6006
Seen so far: 38464 samples
Training acc over epoch: 0.8424
Validation acc: 0.8525
Time taken: 10.59s

Speeding up your training step with tf.function

The default runtime in TensorFlow 2 is eager execution. As such, our training loop above executes eagerly.

This is great for debugging, but graph compilation has a definite performance advantage. Describing your computation as a static graph enables the framework to apply global performance optimizations. This is impossible when the framework is constrained to greedily execute one operation after another, with no knowledge of what comes next.

You can compile any function that takes tensors as input into a static graph. Just add a @tf.function decorator on it, like this:


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

Let's do the same with the evaluation step:


@tf.function
def test_step(x, y):
    val_logits = model(x, training=False)
    val_acc_metric.update_state(y, val_logits)

Now, let's re-run our training loop with this compiled training step:


import time

epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        loss_value = train_step(x_batch_train, y_batch_train)

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * batch_size))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_state()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        test_step(x_batch_val, y_batch_val)

    val_acc = val_acc_metric.result()
    val_acc_metric.reset_state()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))

Start of epoch 0
Training loss (for one batch) at step 0: 0.5162
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.4599
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.3975
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.2557
Seen so far: 38464 samples
Training acc over epoch: 0.8747
Validation acc: 0.8545
Time taken: 1.85s

Start of epoch 1
Training loss (for one batch) at step 0: 0.6145
Seen so far: 64 samples
Training loss (for one batch) at step 200: 0.3751
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 0.3464
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 0.4128
Seen so far: 38464 samples
Training acc over epoch: 0.8919
Validation acc: 0.8996
Time taken: 1.34s

Much faster, isn't it? In these runs, an epoch took under 2 s with the compiled step versus about 11 s eagerly.
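Much of that speedup comes from tracing: the Python body of a tf.function runs once per new input signature to build a graph, and later calls with a matching signature reuse that graph. A minimal sketch that makes this visible with a Python-side counter (the names `square_sum` and `trace_count` are illustrative, not from the guide):

```python
import tensorflow as tf

trace_count = 0

@tf.function
def square_sum(x):
    global trace_count
    # Plain Python code like this only runs while TensorFlow traces the function.
    trace_count += 1
    return tf.reduce_sum(x * x)

square_sum(tf.constant([1.0, 2.0]))       # first call: traced
square_sum(tf.constant([3.0, 4.0]))       # same shape and dtype: graph reused
print(trace_count)  # 1
square_sum(tf.constant([1.0, 2.0, 3.0]))  # new shape: traced again
print(trace_count)  # 2
```

With the dataset used here, 50,000 training samples in batches of 64 leave a final batch of 16 samples, so train_step would be expected to trace once more for that smaller shape.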

Low-level handling of losses tracked by the model

Layers and models recursively track any losses created during the forward pass by layers that call self.add_loss(value). The resulting list of scalar loss values is available via the property model.losses at the end of the forward pass.

If you want to use these loss components, you should sum them and add them to the main loss in your training step.

Consider this layer, which creates an activity regularization loss:


@keras.saving.register_keras_serializable()
class ActivityRegularizationLayer(layers.Layer):
    def call(self, inputs):
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

Let's build a really simple model that uses it:


inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu")(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10, name="predictions")(x)

model = keras.Model(inputs=inputs, outputs=outputs)

Here's what our training step should look like now:


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
        # Add any extra losses created during the forward pass.
        loss_value += sum(model.losses)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value
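To check what the tracked losses actually contain, here is a small sketch that calls a layer equivalent to ActivityRegularizationLayer on a batch of ones. The layer is redefined under a new, illustrative name and without the serialization decorator so the snippet stands alone; it assumes eager execution with the TensorFlow backend, and that the losses are read right after the call.

```python
import tensorflow as tf
from keras import layers

class ActivityRegularizationSketch(layers.Layer):
    # Same behavior as ActivityRegularizationLayer above: record a loss
    # proportional to the sum of the inputs and pass the inputs through.
    def call(self, inputs):
        self.add_loss(1e-2 * tf.reduce_sum(inputs))
        return inputs

layer = ActivityRegularizationSketch()
outputs = layer(tf.ones((2, 4)))

# One scalar loss was recorded: 0.01 * (8 ones summed) = 0.08.
print(len(layer.losses), float(layer.losses[0]))

# As in the training step, extra losses are summed into the main loss.
main_loss = tf.constant(1.0)
total_loss = main_loss + sum(layer.losses)
print(float(total_loss))  # about 1.08
```

A model containing this layer exposes the same value through model.losses, which is what the training step sums.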

Summary

Now you know everything there is to know about using built-in training loops and writing your own from scratch.

To conclude, here's a simple end-to-end example that ties together everything you've learned in this guide: a DCGAN trained on MNIST digits.

End-to-end example: a GAN training loop from scratch

You may be familiar with Generative Adversarial Networks (GANs). GANs can generate new images that look almost real, by learning the latent distribution of a training dataset of images (the "latent space" of the images).

A GAN is made of two parts: a "generator" model that maps points in the latent space to points in image space, and a "discriminator" model, a classifier that can tell the difference between real images (from the training dataset) and fake images (the output of the generator network).

A GAN training loop looks like this:

  1. Train the discriminator.
     – Sample a batch of random points in the latent space.
     – Turn the points into fake images via the "generator" model.
     – Get a batch of real images and combine them with the generated images.
     – Train the "discriminator" model to classify generated vs. real images.
  2. Train the generator.
     – Sample random points in the latent space.
     – Turn the points into fake images via the "generator" network.
     – Train the "generator" model to "fool" the discriminator and classify the fake images as real.
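The label bookkeeping in these two steps is easy to get backwards. The sketch below builds the labels in NumPy using the same convention as the training step later in this section (1 for generated images, 0 for real ones, plus a small amount of uniform noise), along with the all-zero "misleading" labels used when training the generator. The batch sizes are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
num_fake, num_real = 4, 3  # illustrative batch sizes

# Step 1 (discriminator): generated images are labeled 1, real images 0.
labels = np.concatenate([np.ones((num_fake, 1)), np.zeros((num_real, 1))], axis=0)
# Small uniform noise in [0, 0.05) on every label.
labels += 0.05 * rng.uniform(size=labels.shape)

# Step 2 (generator): every generated image is labeled 0 ("real"), so the
# generator's loss is low when the discriminator is fooled.
misleading_labels = np.zeros((num_fake, 1))

print(labels.shape, misleading_labels.shape)  # (7, 1) (4, 1)
```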

For a more detailed overview of how GANs work, see Deep Learning with Python.

Let's implement this training loop. First, create the discriminator meant to classify fake vs. real digits:


discriminator = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.GlobalMaxPooling2D(),
        layers.Dense(1),
    ],
    name="discriminator",
)
discriminator.summary()

Model: "discriminator"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 14, 14, 64)        640       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 14, 14, 64)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 7, 7, 128)         73856     
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 7, 7, 128)         0         
                                                                 
 global_max_pooling2d (Glob  (None, 128)               0         
 alMaxPooling2D)                                                 
                                                                 
 dense_4 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 74625 (291.50 KB)
Trainable params: 74625 (291.50 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
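The parameter counts in this summary follow from the usual formulas: a Conv2D layer has kernel_h × kernel_w × in_channels × filters weights plus one bias per filter, and a Dense layer has inputs × units weights plus one bias per unit. The LeakyReLU and pooling layers have no weights. A quick check in Python, with hypothetical helper names:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Hypothetical helper: kernel weights plus one bias per output filter.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    # Hypothetical helper: weight matrix plus one bias per unit.
    return inputs * units + units

layer_params = [
    conv2d_params(3, 3, 1, 64),     # conv2d: grayscale input, 64 filters
    conv2d_params(3, 3, 64, 128),   # conv2d_1: 64 input channels, 128 filters
    dense_params(128, 1),           # final Dense after global max pooling
]
print(layer_params, sum(layer_params))  # [640, 73856, 129] 74625
```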

Next, let's create a generator network that turns latent vectors into outputs of shape (28, 28, 1) (representing MNIST digits):


latent_dim = 128

generator = keras.Sequential(
    [
        keras.Input(shape=(latent_dim,)),
        # We want to generate 7 * 7 * 128 coefficients to reshape into a 7x7x128 map
        layers.Dense(7 * 7 * 128),
        layers.LeakyReLU(alpha=0.2),
        layers.Reshape((7, 7, 128)),
        layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(1, (7, 7), padding="same", activation="sigmoid"),
    ],
    name="generator",
)
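Why does this produce 28×28 images? With padding="same", a stride-2 Conv2DTranspose doubles the spatial size, and a Conv2D outputs ceil(input size / stride), which leaves the size unchanged at stride 1. The sketch below traces the shapes through the generator; the helper names are hypothetical and only these two padding="same" rules are assumed.

```python
def transpose_same(size, stride):
    # Hypothetical helper: Conv2DTranspose with padding="same" -> size * stride.
    return size * stride

def conv_same(size, stride=1):
    # Hypothetical helper: Conv2D with padding="same" -> ceil(size / stride).
    return -(-size // stride)

sizes = [7]                                  # after Dense + Reshape((7, 7, 128))
sizes.append(transpose_same(sizes[-1], 2))   # first Conv2DTranspose
sizes.append(transpose_same(sizes[-1], 2))   # second Conv2DTranspose
sizes.append(conv_same(sizes[-1]))           # final Conv2D(1, (7, 7))
print(sizes)  # [7, 14, 28, 28]

# Hence the Dense layer must emit 7 * 7 * 128 values per latent vector.
print(7 * 7 * 128)  # 6272
```

The same Conv2D rule explains the discriminator's shapes in reverse: 28 becomes 14, then 7.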

Here's the key part: the training loop. As you can see, it is quite straightforward. The training step function only takes 17 lines.


# Instantiate one optimizer for the discriminator and another for the generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)

# Instantiate a loss function.
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)


@tf.function
def train_step(real_images):
    # Sample random points in the latent space
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
    # Decode them to fake images
    generated_images = generator(random_latent_vectors)
    # Combine them with real images
    combined_images = tf.concat([generated_images, real_images], axis=0)

    # Assemble labels discriminating real from fake images
    labels = tf.concat(
        [tf.ones((batch_size, 1)), tf.zeros((real_images.shape[0], 1))], axis=0
    )
    # Add random noise to the labels - important trick!
    labels += 0.05 * tf.random.uniform(labels.shape)

    # Train the discriminator
    with tf.GradientTape() as tape:
        predictions = discriminator(combined_images)
        d_loss = loss_fn(labels, predictions)
    grads = tape.gradient(d_loss, discriminator.trainable_weights)
    d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))

    # Sample random points in the latent space
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
    # Assemble labels that say "all real images"
    misleading_labels = tf.zeros((batch_size, 1))

    # Train the generator (note that we should *not* update the weights
    # of the discriminator)!
    with tf.GradientTape() as tape:
        predictions = discriminator(generator(random_latent_vectors))
        g_loss = loss_fn(misleading_labels, predictions)
    grads = tape.gradient(g_loss, generator.trainable_weights)
    g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))
    return d_loss, g_loss, generated_images
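The comment about not updating the discriminator during the generator step is enforced simply by which variables are passed to tape.gradient() and apply_gradients(). In the minimal sketch below, two scalar variables stand in for a generator weight and a discriminator weight (an illustrative simplification, with a manual update in place of an optimizer); both take part in the loss, but only the first is updated.

```python
import tensorflow as tf

g_weight = tf.Variable(1.0)  # stands in for generator.trainable_weights
d_weight = tf.Variable(2.0)  # stands in for discriminator.trainable_weights

with tf.GradientTape() as tape:
    # The loss depends on both variables, like discriminator(generator(z)).
    loss = (g_weight * d_weight) ** 2

# Ask only for the gradient with respect to the "generator" variable.
(g_grad,) = tape.gradient(loss, [g_weight])
g_weight.assign_sub(0.1 * g_grad)

print(float(g_grad))    # 8.0, since d/dg (g*d)^2 = 2*g*d^2
print(float(g_weight))  # about 0.2
print(float(d_weight))  # 2.0, unchanged
```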

Let's train our GAN by repeatedly calling train_step on batches of images.

Since our discriminator and generator are convnets, you're going to want to run this code on a GPU.


import os

# Prepare the dataset. We use both the training & test MNIST digits.
batch_size = 64
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
all_digits = np.concatenate([x_train, x_test])
all_digits = all_digits.astype("float32") / 255.0
all_digits = np.reshape(all_digits, (-1, 28, 28, 1))
dataset = tf.data.Dataset.from_tensor_slices(all_digits)
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)

epochs = 1  # In practice you need at least 20 epochs to generate nice digits.
save_dir = "./"

for epoch in range(epochs):
    print("\nStart epoch", epoch)

    for step, real_images in enumerate(dataset):
        # Train the discriminator & generator on one batch of real images.
        d_loss, g_loss, generated_images = train_step(real_images)

        # Logging.
        if step % 200 == 0:
            # Print metrics
            print("discriminator loss at step %d: %.2f" % (step, d_loss))
            print("adversarial loss at step %d: %.2f" % (step, g_loss))

            # Save one generated image
            img = keras.utils.array_to_img(generated_images[0] * 255.0, scale=False)
            img.save(os.path.join(save_dir, "generated_img" + str(step) + ".png"))

        # To limit execution time we stop after 10 steps.
        # Remove the lines below to actually train the model!
        if step > 10:
            break

Start epoch 0
discriminator loss at step 0: 0.72
adversarial loss at step 0: 0.72
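Both losses start close to ln 2 ≈ 0.693. That is roughly what you would expect from an untrained discriminator: if its logits are near 0, its sigmoid output is near 0.5, and binary cross-entropy is then ln 2 for any target label. A small NumPy check of that arithmetic, using the standard numerically stable formula for binary cross-entropy on logits (the function name and the label values are illustrative):

```python
import numpy as np

def bce_from_logits(labels, logits):
    # Illustrative helper: max(x, 0) - x * z + log(1 + exp(-|x|)),
    # averaged over the batch.
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return float(np.mean(np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))))

labels = np.array([1.02, 1.04, 0.01, 0.03])  # noisy fake/real labels
print(bce_from_logits(labels, np.zeros(4)))  # ln 2, about 0.6931
print(np.log(2.0))
```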

That's it! You'll get nice-looking fake MNIST digits after just ~30 s of training on a Colab GPU.


Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples are shared under the Apache 2.0 License.
