Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition: Chapter 13 Exercises
For a Chinese translation, refer to the first edition of this book.
Why would you want to use the Data API?
What are the benefits of splitting a large dataset into multiple files?
During training, how can you tell that your input pipeline is the bottleneck? What can you do to fix it?
Can you save any binary data to a TFRecord file, or only serialized protocol buffers?
Why would you go through the hassle of converting all your data to the Example protobuf format? Why not use your own protobuf definition?
When using TFRecords, when would you want to activate compression? Why not do it systematically?
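As a quick illustration of the two TFRecord questions above, here is a minimal sketch (the file name is made up for the example): any byte string can be written as a record, not only serialized protobufs, and compression must be requested explicitly on both the writing and the reading side, since it is not recorded in the file itself.

```python
import tensorflow as tf

# Hypothetical file name, for illustration only.
path = "my_data.tfrecord.gz"

# Any binary string can be stored as a record -- it need not be a
# serialized protocol buffer.
options = tf.io.TFRecordOptions(compression_type="GZIP")
with tf.io.TFRecordWriter(path, options) as writer:
    writer.write(b"raw binary payload")

# The reader must be told the compression type: it is not stored in the
# file's metadata, which is one reason not to compress systematically.
dataset = tf.data.TFRecordDataset([path], compression_type="GZIP")
for record in dataset:
    print(record.numpy())  # b'raw binary payload'
```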
Data can be preprocessed directly when writing the data files, or within the tf.data pipeline, or in preprocessing layers within your model, or using TF Transform. Can you list a few pros and cons of each option?
Name a few common techniques you can use to encode categorical features. What about text?
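As a pointer for the categorical part of this question, here is a minimal sketch using a `StringLookup` layer (available as `tf.keras.layers.StringLookup` in recent TF 2.x releases; older versions placed it under `tf.keras.layers.experimental.preprocessing`). The vocabulary is made up for the example; index 0 is reserved for out-of-vocabulary categories by default.

```python
import tensorflow as tf

# Hypothetical vocabulary, for illustration only.
vocab = ["cat", "dog", "bird"]

# Ordinal encoding: map each category to an integer index.
lookup = tf.keras.layers.StringLookup(vocabulary=vocab)
ids = lookup(tf.constant([["dog"], ["bird"], ["fish"]]))
# "fish" is unknown, so it maps to the OOV index 0: [[2], [3], [0]]

# One-hot encoding on top of the indices (depth = vocab + 1 OOV bucket).
onehot = tf.one_hot(ids, depth=lookup.vocabulary_size())
```

For text, the usual route is tokenization followed by a lookup table and an `Embedding` layer, which is exactly what the last exercise below walks through.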
Load the Fashion MNIST dataset (introduced in Chapter 10); split it into a training set, a validation set, and a test set; shuffle the training set; and save each dataset to multiple TFRecord files. Each record should be a serialized Example protobuf with two features: the serialized image (use tf.io.serialize_tensor() to serialize each image) and the label. Then use tf.data to create an efficient dataset for each set. Finally, train a Keras model on these datasets, including a preprocessing layer to standardize each input feature. Try to make the input pipeline as efficient as possible, using TensorBoard to visualize profiling data.
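The writing and parsing sides of this exercise can be sketched as follows (the feature names "image" and "label" are my own choice for the example):

```python
import tensorflow as tf

def make_example(image, label):
    # Serialize the image tensor to a byte string, as the exercise suggests.
    data = tf.io.serialize_tensor(image)
    return tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[data.numpy()])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }))

# Parsing side, meant to be used inside dataset.map():
feature_desc = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(serialized):
    example = tf.io.parse_single_example(serialized, feature_desc)
    image = tf.io.parse_tensor(example["image"], out_type=tf.uint8)
    return tf.reshape(image, [28, 28]), example["label"]
```

Writing `make_example(image, label).SerializeToString()` through a `tf.io.TFRecordWriter`, then mapping `parse` over a `TFRecordDataset`, round-trips each image and label.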
In this exercise you will download a dataset, split it, create a tf.data.Dataset to load it and preprocess it efficiently, then build and train a binary classification model containing an Embedding layer:
- Use tf.data to create an efficient dataset for each set.
- Build a binary classification model, using a TextVectorization layer to preprocess each review. If the TextVectorization layer is not yet available (or if you like a challenge), try to create your own custom preprocessing layer: you can use the functions in the tf.strings package, for example lower() to make everything lowercase, regex_replace() to replace punctuation with spaces, and split() to split words on spaces. You should use a lookup table to output word indices, which must be prepared in the adapt() method.
- Add an Embedding layer and compute the mean embedding for each review, multiplied by the square root of the number of words (see Chapter 16). This rescaled mean embedding can then be passed to the rest of your model.
- Try loading the same dataset with tfds.load("imdb_reviews").
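The custom tf.strings preprocessing suggested in this exercise can be sketched like this (the sample reviews are made up for the example):

```python
import tensorflow as tf

def preprocess(reviews):
    # Lowercase, replace punctuation with spaces, then split on whitespace,
    # following the steps suggested in the exercise.
    reviews = tf.strings.lower(reviews)
    reviews = tf.strings.regex_replace(reviews, "[^a-z ]", " ")
    return tf.strings.split(reviews)

tokens = preprocess(tf.constant(["Great movie!", "It was SO bad..."]))
# tokens is a RaggedTensor:
# [[b'great', b'movie'], [b'it', b'was', b'so', b'bad']]
```

The resulting ragged token sequences would then be mapped to word indices by a lookup table whose vocabulary is built in the layer's adapt() method, before being fed to the Embedding layer.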