How to deal with MNIST image data in Tensorflow.js

news/2024/7/7 21:12:15

by Kevin Scott

How to deal with MNIST image data in Tensorflow.js

There’s the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data … data cleaning is a much higher proportion of data science than an outsider would expect. Actually training models is typically a relatively small proportion (less than 10 percent) of what a machine learner or data scientist does.

— Anthony Goldbloom, CEO of Kaggle

Manipulating data is a crucial step for any machine learning problem. This article will take the MNIST example for Tensorflow.js (0.11.1), and walk through the code that handles the data loading line-by-line.

MNIST example

18 import * as tf from '@tensorflow/tfjs';
19
20 const IMAGE_SIZE = 784;
21 const NUM_CLASSES = 10;
22 const NUM_DATASET_ELEMENTS = 65000;
23
24 const NUM_TRAIN_ELEMENTS = 55000;
25 const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;
26
27 const MNIST_IMAGES_SPRITE_PATH =
28     'https://storage.googleapis.com/learnjs-data/model-builder/mnist_images.png';
29 const MNIST_LABELS_PATH =
30     'https://storage.googleapis.com/learnjs-data/model-builder/mnist_labels_uint8';

First, the code imports Tensorflow (make sure you’re transpiling your code!), and establishes some constants, including:

  • IMAGE_SIZE – the number of pixels in an image (28 x 28 = 784)

  • NUM_CLASSES – number of label categories (a digit can be 0-9, so there are 10 classes)

  • NUM_DATASET_ELEMENTS – total number of images (65,000)

  • NUM_TRAIN_ELEMENTS – number of training images (55,000)

  • NUM_TEST_ELEMENTS – number of test images (10,000, aka the remainder)

  • MNIST_IMAGES_SPRITE_PATH & MNIST_LABELS_PATH – paths to the images and the labels
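
As a rough illustration of how these constants fit together, the sketch below splits a flat array of pixel values into a training part and a test part. The `splitDataset` helper and the scaled-down demo sizes are assumptions made for this example; they are not part of the MNIST example itself.

```javascript
// Constants as defined in the example.
const IMAGE_SIZE = 784;
const NUM_DATASET_ELEMENTS = 65000;
const NUM_TRAIN_ELEMENTS = 55000;
const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;

// Hypothetical helper: split a flat pixel array into train and test parts.
// Each image occupies `imageSize` consecutive values.
function splitDataset(pixels, imageSize, numTrain) {
  const boundary = imageSize * numTrain;
  return {
    train: pixels.subarray(0, boundary), // a view, nothing is copied
    test: pixels.subarray(boundary),
  };
}

// Scaled-down demo: 5 "images" of 4 pixels each, 3 of them for training.
const demoPixels = Float32Array.from({length: 20}, (_, i) => i);
const {train, test} = splitDataset(demoPixels, 4, 3);

console.log(NUM_TEST_ELEMENTS);         // 10000
console.log(train.length, test.length); // 12 8
```

With the real constants, the boundary would fall at `IMAGE_SIZE * NUM_TRAIN_ELEMENTS` values into the array.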

The images are concatenated into one huge sprite image. Judging by the loading code that follows, each row of the sprite holds the 784 pixels of a single digit.

MNISTData

Next up, starting on line 38, is MnistData, a class that exposes the following functions:

  • load – responsible for asynchronously loading the image and label data

  • nextTrainBatch – load the next training batch

  • nextTestBatch – load the next test batch

  • nextBatch – a generic function to return the next batch, depending on whether it is in the training set or test set

For the purposes of getting started, this article will only go through the load function.

load

44 async load() {
45   // Make a request for the MNIST sprited image.
46   const img = new Image();
47   const canvas = document.createElement('canvas');
48   const ctx = canvas.getContext('2d');

async is a relatively new language feature in JavaScript; if you need to support environments that lack it, you will need a transpiler.

The Image object is a native DOM object that represents an image in memory. It provides callbacks for when the image is loaded, along with access to the image attributes. canvas is another DOM element that provides easy access to pixel arrays and processing by way of its context.

Since both of these are DOM elements, if you’re working in Node.js (or a Web Worker) you won’t have access to these elements. For an alternative approach, see below.

imgRequest

49 const imgRequest = new Promise((resolve, reject) => {
50   img.crossOrigin = '';
51   img.onload = () => {
52     img.width = img.naturalWidth;
53     img.height = img.naturalHeight;

The code initializes a new promise that will be resolved once the image is loaded successfully. This example does not explicitly handle the error state.
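
Since the example leaves the error state unhandled, here is a minimal sketch of one way the promise could reject when loading fails. The `loadImage` helper, its `ImageCtor` parameter, and the `FakeImage` stand-in are hypothetical names introduced for this illustration; passing the constructor in simply lets the sketch run outside the browser.

```javascript
// Hypothetical helper: wrap image loading in a promise that can also reject.
// `ImageCtor` defaults to the browser's Image constructor.
function loadImage(src, ImageCtor = globalThis.Image) {
  return new Promise((resolve, reject) => {
    const img = new ImageCtor();
    img.crossOrigin = '';
    img.onload = () => resolve(img);
    img.onerror = () => reject(new Error(`Failed to load image: ${src}`));
    img.src = src; // setting src starts the request
  });
}

// Stand-in for Image so the sketch runs without a DOM (demo assumption):
// it "fails" for any URL that contains the word "missing".
class FakeImage {
  set src(value) {
    const handler = value.includes('missing') ? this.onerror : this.onload;
    setTimeout(() => handler(), 0);
  }
}

loadImage('sprite.png', FakeImage)
  .then(() => console.log('loaded'));
loadImage('missing.png', FakeImage)
  .catch(err => console.log(err.message));
```

In a browser, calling `loadImage(url)` with no second argument would use the real `Image` constructor.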

crossOrigin is an img attribute that makes the browser request the image with CORS (cross-origin resource sharing). Without it, drawing an image from another domain taints the canvas, and reading its pixels back with getImageData fails with a security error. naturalWidth and naturalHeight refer to the original dimensions of the loaded image, and serve to enforce that the image's size is correct when performing calculations.

55     const datasetBytesBuffer =
56         new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);
57
58     const chunkSize = 5000;
59     canvas.width = img.width;
60     canvas.height = chunkSize;

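The code initializes a new buffer to contain every pixel of every image. It multiplies the total number of images by the size of each image by 4. Although 4 is also the number of channels in an RGBA pixel, the buffer is only ever read through Float32Array views that store one value per pixel, so the 4 here accounts for the four bytes of each 32-bit float.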

I believe that chunkSize is used to prevent the UI from loading too much data into memory at once, though I'm not 100% sure.
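
To make the buffer and chunk arithmetic concrete, the sketch below works through the sizes involved. The values mirror the constants in the example; the names `bytesPerValue`, `totalBytes`, `numChunks`, and `chunkByteOffset` are introduced here for illustration only, and the factor of 4 is read as `Float32Array.BYTES_PER_ELEMENT`, which matches how the buffer is consumed through `Float32Array` views.

```javascript
const IMAGE_SIZE = 784;
const NUM_DATASET_ELEMENTS = 65000;
const chunkSize = 5000;

// Each pixel is stored as one 32-bit float, i.e. 4 bytes.
const bytesPerValue = Float32Array.BYTES_PER_ELEMENT;

// Total size of the buffer that holds the whole dataset.
const totalBytes = NUM_DATASET_ELEMENTS * IMAGE_SIZE * bytesPerValue;

// Number of loop iterations needed to cover the sprite.
const numChunks = NUM_DATASET_ELEMENTS / chunkSize;

// Byte offset at which chunk `i` starts inside the buffer.
const chunkByteOffset = i => i * IMAGE_SIZE * chunkSize * bytesPerValue;

console.log(bytesPerValue);      // 4
console.log(totalBytes);         // 203840000
console.log(numChunks);          // 13
console.log(chunkByteOffset(1)); // 15680000
```

So the full dataset occupies roughly 204 MB as floats, while the canvas only ever holds one 5,000-row chunk at a time.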

62     for (let i = 0; i < NUM_DATASET_ELEMENTS / chunkSize; i++) {
63       const datasetBytesView = new Float32Array(
64         datasetBytesBuffer, i * IMAGE_SIZE * chunkSize * 4,
65         IMAGE_SIZE * chunkSize);
66       ctx.drawImage(
67         img, 0, i * chunkSize, img.width, chunkSize, 0, 0, img.width,
68         chunkSize);
69
70       const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);

This code loops through the sprite one chunk at a time (5,000 images per chunk, so 13 iterations for 65,000 images) and creates a new TypedArray view for that iteration, positioned at the chunk's offset in the buffer. Then that chunk of the sprite is drawn onto the canvas. Finally, the drawn chunk is turned into image data using the context's getImageData function, which returns an object representing the underlying pixel data.

72       for (let j = 0; j < imageData.data.length / 4; j++) {
73         // All channels hold an equal value since the image is grayscale, so
74         // just read the red channel.
75         datasetBytesView[j] = imageData.data[j * 4] / 255;
76       }
77     }

We loop through the pixels and divide each value by 255 (the maximum possible value of a channel) to scale the values into the range 0 to 1. Only the red channel is necessary, since it’s a grayscale image.
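
The same read-the-red-channel-and-scale step can be tried without a canvas. The sketch below uses a hand-made RGBA byte array in place of `imageData.data`; the `redChannelToFloats` helper name and the sample pixel values are assumptions for illustration.

```javascript
// Hypothetical helper: copy the red channel of RGBA bytes into `target`,
// scaling each value from the 0-255 range into 0-1.
function redChannelToFloats(rgba, target) {
  for (let j = 0; j < rgba.length / 4; j++) {
    target[j] = rgba[j * 4] / 255;
  }
  return target;
}

// Three grayscale pixels (black, dark gray, white) in RGBA order.
const rgba = new Uint8ClampedArray([
  0, 0, 0, 255,
  51, 51, 51, 255,
  255, 255, 255, 255,
]);

const floats = redChannelToFloats(rgba, new Float32Array(3));
console.log(floats[0], floats[2]); // 0 1
```

The middle pixel ends up close to 0.2 (51 / 255), subject to 32-bit float rounding.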

78     this.datasetImages = new Float32Array(datasetBytesBuffer);
79
80     resolve();
81   };
82   img.src = MNIST_IMAGES_SPRITE_PATH;
83 });

Line 78 takes the buffer and wraps it in a new TypedArray that holds our pixel data, and then the code resolves the Promise. The last statement (setting the src) is what actually begins loading the image; once the image has loaded, the onload handler above runs.

One thing that confused me at first was the behavior of TypedArray in relation to its underlying data buffer. You might notice that datasetBytesView is set within the loop, but is never returned.

Under the hood, datasetBytesView is referencing the buffer datasetBytesBuffer (with which it is initialized). When the code updates the pixel data, it is indirectly editing the values of the buffer itself, which in turn is recast into a new Float32Array on line 78.
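
This view-versus-buffer behavior can be reproduced on a tiny scale. The sketch below is a simplified stand-in for the loop in the example, with made-up sizes and values: each iteration writes through a short-lived view, and a fresh view over the whole buffer then sees every write.

```javascript
// One buffer with room for four 32-bit floats.
const buffer = new ArrayBuffer(4 * Float32Array.BYTES_PER_ELEMENT);

// Two views over different halves of the same buffer, created and
// discarded much like datasetBytesView inside the loop.
for (let i = 0; i < 2; i++) {
  const view = new Float32Array(
      buffer, i * 2 * Float32Array.BYTES_PER_ELEMENT, 2);
  view[0] = i + 0.5;
  view[1] = i + 0.25;
}

// A fresh view over the whole buffer sees every write made above.
const all = new Float32Array(buffer);
console.log(Array.from(all)); // [0.5, 0.25, 1.5, 1.25]
```

Nothing is copied at any point: the views are just windows onto the same underlying bytes.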

Fetching image data outside of the DOM

If you’re in the DOM, you should use the DOM. The browser (through canvas) takes care of figuring out the format of images and translating buffer data into pixels. But if you're working outside the DOM (say, in Node.js, or a Web Worker), you'll need an alternative approach.

fetch provides a mechanism, response.arrayBuffer, which gives you access to a file's underlying buffer. We can use this to read the bytes manually, avoiding the DOM entirely. Here's an alternative approach to writing the above code (this code requires fetch, which can be polyfilled in Node with something like isomorphic-fetch):

const imgRequest = fetch(MNIST_IMAGES_SPRITE_PATH)
  .then(resp => resp.arrayBuffer())
  .then(buffer => {
    return new Promise((resolve, reject) => {
      const reader = new PNGReader(buffer);
      reader.parse((err, png) => {
        // Surface parsing failures instead of silently continuing.
        if (err) {
          reject(err);
          return;
        }
        // Scale each pixel value from 0-255 into 0-1.
        const pixels = Float32Array.from(png.pixels).map(pixel => {
          return pixel / 255;
        });
        this.datasetImages = pixels;
        resolve();
      });
    });
  });

This returns an array buffer for the particular image. When writing this, I first attempted to parse the incoming buffer myself, which I wouldn’t recommend. (If you are interested in doing that, here’s some information on how to read an array buffer for a png.) Instead, I elected to use pngjs, which handles the png parsing for you. When dealing with other image formats, you'll have to figure out the parsing functions yourself.

Just scratching the surface

Understanding data manipulation is a crucial component of machine learning in JavaScript. By understanding our use cases and requirements, we can use a few key functions to elegantly format our data correctly for our needs.

The Tensorflow.js team is continuously changing the underlying data API in Tensorflow.js. This can help accommodate more of our needs as the API evolves. This also means that it’s worth staying abreast of developments to the API as Tensorflow.js continues to grow and be improved.

Originally published at thekevinscott.com

Special thanks to Ari Zilnik.

Translated from: https://www.freecodecamp.org/news/how-to-deal-with-mnist-image-data-in-tensorflow-js-169a2d6941dd/

