Create TensorFlow Dataset from TFRecord files

Thiago G. Martins
1 min read · Nov 5, 2021

Prototyping with YouTube 8M video-level features

This post experiments with the YouTube 8M video-level dataset. The goal is to create a tf.data.Dataset from a set of .tfrecord files.

Link to the original notebook used to create this post.

Requirements

This code works with TensorFlow 2.6.0.
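A quick way to confirm which version is installed is to print it before running anything else. This is only a minimal check; the version string shown next is the one this post was run with.

```python
import tensorflow as tf

# Print the installed TensorFlow version to compare with the one used in this post.
print(tf.__version__)
```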

2.6.0

Load data

The sample data were downloaded following the instructions available on the YouTube 8M dataset download page.

Load raw dataset

Import libraries and specify data_folder.
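A minimal sketch of this setup step. The folder path is an assumption inferred from the file paths printed in the listing step of this post, and the choice of `os` and `glob` for file handling is also an assumption; adjust both to your environment.

```python
import glob  # pattern-based file listing
import os    # path manipulation

import tensorflow as tf

# Assumption: inferred from the file paths shown in this post.
# Point this at the folder where the .tfrecord files were downloaded.
data_folder = "/home/default/video"
```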

List .tfrecord files to be loaded.
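One way to build the list is a glob pattern over the data folder. The helper name `list_tfrecord_files` is hypothetical, and the folder path is an assumption taken from the paths printed in this post.

```python
import glob
import os


def list_tfrecord_files(folder):
    """Return the sorted paths of all .tfrecord files directly inside `folder`."""
    return sorted(glob.glob(os.path.join(folder, "*.tfrecord")))


data_folder = "/home/default/video"  # assumption: adjust to your download location
filenames = list_tfrecord_files(data_folder)
print(*filenames, sep="\n")
```

Sorting keeps the file order stable between runs, which makes later debugging easier.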

/home/default/video/train0093.tfrecord
/home/default/video/train3749.tfrecord

Load .tfrecord files into a raw (not parsed) dataset.
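This step can be sketched with `tf.data.TFRecordDataset`, which reads one or more files and yields each record as a serialized byte string. The wrapper name `load_raw_dataset` is hypothetical.

```python
import tensorflow as tf


def load_raw_dataset(filenames):
    """Read TFRecord files into a dataset of serialized records.

    Each element is a scalar tf.string tensor holding one serialized
    record; nothing is decoded at this stage.
    """
    return tf.data.TFRecordDataset(filenames)
```

With the list of file paths from the previous step, the call would be `raw_dataset = load_raw_dataset(filenames)`.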

Parse raw dataset

According to the YouTube 8M dataset download section, the video-level data are stored as TensorFlow Example protocol buffers with the following text format:
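In Python terms, a record with this layout can be sketched as a `tf.train.Example` carrying four features. The feature names, dtypes and lengths used here are taken from the parsed output shown later in this post; the helper `make_example` is hypothetical and the values are synthetic.

```python
import tensorflow as tf


def make_example(video_id, labels, mean_rgb, mean_audio):
    """Build a synthetic Example with the four video-level features."""
    feature = {
        "id": tf.train.Feature(bytes_list=tf.train.BytesList(value=[video_id])),
        "labels": tf.train.Feature(int64_list=tf.train.Int64List(value=labels)),
        "mean_rgb": tf.train.Feature(float_list=tf.train.FloatList(value=mean_rgb)),
        "mean_audio": tf.train.Feature(float_list=tf.train.FloatList(value=mean_audio)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Synthetic values only: a made-up id, two labels, and zero-filled feature vectors.
example = make_example(b"abcd", [0, 12], [0.0] * 1024, [0.0] * 128)
print(sorted(example.features.feature.keys()))
```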

Create a function to parse the raw data:
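A parse function along these lines would produce the shapes and dtypes reported in this post. The feature specification is an assumption inferred from that output: fixed lengths for `id`, `mean_audio` and `mean_rgb`, and a variable-length `labels` feature converted to a dense tensor.

```python
import tensorflow as tf

# Assumed feature spec, inferred from the parsed output shown in this post.
feature_description = {
    "id": tf.io.FixedLenFeature([1], tf.string),
    "labels": tf.io.VarLenFeature(tf.int64),
    "mean_audio": tf.io.FixedLenFeature([128], tf.float32),
    "mean_rgb": tf.io.FixedLenFeature([1024], tf.float32),
}


def parse_function(serialized_example):
    """Decode one serialized Example into a dict of tensors."""
    parsed = tf.io.parse_single_example(serialized_example, feature_description)
    # VarLenFeature yields a SparseTensor; convert it to a dense 1-D tensor.
    parsed["labels"] = tf.sparse.to_dense(parsed["labels"])
    return parsed
```

`labels` uses `VarLenFeature` because the number of labels differs from video to video, while the other three features have a fixed length.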

Apply the parse function to each record contained in the raw_dataset:
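The mapping step itself is a single `Dataset.map` call. So that the sketch runs on its own, it uses a reduced stand-in parser (`parse_id_only`, hypothetical) that decodes only the `id` feature, and an in-memory stand-in for the raw dataset; in the post, the full parse function and the dataset read from the .tfrecord files take their place.

```python
import tensorflow as tf


def parse_id_only(serialized_example):
    """Stand-in parser that decodes only the `id` feature."""
    spec = {"id": tf.io.FixedLenFeature([1], tf.string)}
    return tf.io.parse_single_example(serialized_example, spec)


def serialize_id(video_id):
    """Serialize a synthetic Example holding a single `id` feature."""
    feature = {"id": tf.train.Feature(bytes_list=tf.train.BytesList(value=[video_id]))}
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()


# In-memory stand-in for a dataset of serialized records.
raw_dataset = tf.data.Dataset.from_tensor_slices([serialize_id(b"aaaa"), serialize_id(b"bbbb")])

# map applies the parser lazily, record by record.
parsed_dataset = raw_dataset.map(parse_id_only)
print(parsed_dataset.element_spec)
```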

<MapDataset shapes: {
id: (1,),
labels: (None,),
mean_audio: (128,),
mean_rgb: (1024,)
}, types: {
id: tf.string,
labels: tf.int64,
mean_audio: tf.float32,
mean_rgb: tf.float32
}>

Check parsed dataset
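A simple way to inspect a dataset is to take its first element and print it. The helper `peek` is hypothetical, and the toy dataset only stands in for the parsed dataset so the sketch runs by itself.

```python
import tensorflow as tf


def peek(dataset, n=1):
    """Return the first `n` elements of a dataset as NumPy-backed structures."""
    return list(dataset.take(n).as_numpy_iterator())


# Toy stand-in for a parsed dataset: dict elements with one fixed-length feature.
toy_dataset = tf.data.Dataset.from_tensor_slices({"mean_audio": [[0.1, 0.2], [0.3, 0.4]]})
first = peek(toy_dataset)[0]
print(first)
```

Iterating directly, as in `for element in parsed_dataset.take(1): print(element)`, shows the tensors themselves, including shape and dtype.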

{
'id': <tf.Tensor:
shape=(1,),
dtype=string,
numpy=array([b'eXbF'],
dtype=object)>,
'labels': <tf.Tensor:
shape=(2,),
dtype=int64,
numpy=array([ 0, 12])>,
'mean_audio': <tf.Tensor:
shape=(128,),
dtype=float32,
numpy=array([-1.2556146 , 0.17297305, ..., 0.81667864], dtype=float32)>,
'mean_rgb': <tf.Tensor:
shape=(1024,),
dtype=float32,
numpy=array([ 0.5198898 , 0.30175963, ..., -0.48050806], dtype=float32)>
}

Written by Thiago G. Martins

Working on Vespa.ai. Follow me on Twitter @Thiagogm
