glTF: An Image Format for AR/VR

Augmented Reality and Virtual Reality (AR/VR) are cool technologies that are becoming more and more mainstream. Today they are found primarily in games, and some experiences require dedicated headsets like Oculus, but other technologies are bringing AR/VR to a wider audience. For example, A-Frame is a web framework that brings VR experiences into the browser. Its syntax is similar to HTML, which gives it a low barrier to entry for many developers. If you are not familiar with A-Frame and want to learn more, the tutorials are great.

Building a simple world in A-Frame is easy, and once you’ve finished the tutorial linked above, you’ll have one. But it would be cool to populate this world with more than just boxes, spheres and cylinders. Luckily, there are hundreds (if not thousands) of 3D artists creating models of objects that you can use in your AR/VR worlds. Even better, there is an open, royalty-free standard for these files that is well supported across the ecosystem: glTF.

The sample glTF files in this post are from the KhronosGroup GitHub repository of sample files, and from Sketchfab, where I downloaded “Ship in a Bottle.”

These files, and their optimized versions, are all in a GitHub repository.

glTF

Billed by its creators as the “JPEG of 3D”, glTF is based on JSON, with supporting files alongside it that supply the 3D structure. From the glTF page linked above:

The glTF Structure

The .gltf file (as shown above) is plain JSON that links the supporting files and geometries together into a 3D model. For example, here is the file structure for the Avocado glTF file:

[Screenshot: file structure of the Avocado glTF folder]
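To make the "JSON that links to other files" idea concrete, here is a minimal Python sketch that lists the external files a .gltf document points to. It relies on the glTF 2.0 layout, where the top-level `buffers` and `images` arrays carry `uri` fields. The sample document and its file names are trimmed-down illustrations, not the real Avocado file, and the function name `external_resources` is my own.

```python
import json


def external_resources(gltf: dict) -> dict:
    """Return the external file URIs referenced by a parsed .gltf document."""
    def uris(key: str) -> list:
        # Entries may omit "uri" (e.g. images stored in a bufferView), and
        # "data:" URIs are embedded content rather than external files.
        return [item["uri"] for item in gltf.get(key, [])
                if "uri" in item and not item["uri"].startswith("data:")]

    return {"buffers": uris("buffers"), "images": uris("images")}


# Hypothetical, heavily trimmed document for illustration only.
sample = json.loads("""
{
  "asset": {"version": "2.0"},
  "buffers": [{"uri": "Avocado.bin", "byteLength": 1}],
  "images": [{"uri": "Avocado_baseColor.png"}, {"uri": "Avocado_normal.png"}]
}
""")

resources = external_resources(sample)
print(resources)
```

Running something like this against a real .gltf file (loaded with `json.load`) shows which .bin and image files need to ship alongside it.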

glTF in action

Let’s show off that 3D Avocado goodness in A-Frame. To do that, we register the Avocado glTF as an asset and then display it as an entity, scaled up 10x because this avocado displays very small by default:

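<a-scene>
  <!-- Preload the model so it can be referenced by id -->
  <a-assets>
    <a-asset-item id="avocado" src="Avocado/glTF/Avocado.gltf"></a-asset-item>
  </a-assets>
  <!-- Display the model in the scene, scaled up 10x -->
  <a-entity position="0 1.5 -1" scale="10 10 10" rotation="0 0 0" gltf-model="#avocado"></a-entity>
</a-scene>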

If you open this link with your favorite developer tools, you’ll see that this simple Avocado downloads 8.2 MB to render on the page, and 7.9 MB of that is images. Of an avocado. I think we can do better.

[Screenshot: developer tools view of the Avocado page download]

In all of the models tested, the images used to wrap the models account for more than 95% of the glTF model's total size.
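If you want to check the image share for your own model, a rough Python sketch like the following works on a folder of glTF files. The helper names and the extension list are my own choices. In the example, the 8.2 MB total and 7.9 MB of images come from the Avocado numbers above, but the split between the .gltf and .bin files is made up for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def folder_sizes(folder: str) -> dict:
    """Map each file name in a folder to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}


def image_share(sizes: dict) -> float:
    """Fraction of the total bytes that belongs to image files."""
    total = sum(sizes.values())
    images = sum(size for name, size in sizes.items()
                 if Path(name).suffix.lower() in IMAGE_SUFFIXES)
    return images / total if total else 0.0


# Illustrative sizes: only the 8.2 MB total and 7.9 MB of images are from the post.
sizes = {"model.gltf": 10_000, "model.bin": 290_000, "texture.png": 7_900_000}
share = image_share(sizes)
print(f"Images are {share:.0%} of the download")
```

For a real model, pass `folder_sizes("path/to/model")` to `image_share` instead of the hand-written dictionary.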

Image Optimization

The images associated with the glTF files can be found in the associated folders in the GitHub repo. They appear to be the raw output of the 3D rendering software, exported with little thought to file size. Since these models are designed to be distributed in applications or websites, the size of these images should be a concern for load time (and for the general tonnage of data).

It is a fair argument that VR applications typically run on devices with fast network connections, but these very large files will still add appreciable delay before the scene loads on the page. On slow networks, the file size may prevent the 3D model from loading at all.

To optimize the images in the glTFs I studied, I chose two image optimizations: changing the image format, and reducing the image quality. I uploaded all of the raw files to Cloudinary and used its image transformation tools to choose the optimal format for the browser. I am testing in Chrome, so the PNGs and JPGs are mostly converted to WebP. I also lowered the image quality, using structural similarity, to a level where the human eye cannot tell the difference. Both optimizations are applied by adding the parameters q_auto,f_auto to the Cloudinary image URL.
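As a rough illustration of where those parameters go, here is a small Python sketch that builds a Cloudinary delivery URL. It assumes the standard `res.cloudinary.com/<cloud name>/image/upload/<transformations>/<public id>` layout. The cloud name `demo-cloud` and the file name are placeholders, not the account or assets used in this post.

```python
def cloudinary_url(cloud_name: str, public_id: str,
                   transformations: str = "q_auto,f_auto") -> str:
    """Build a delivery URL with the transformation segment placed after /upload/."""
    return (f"https://res.cloudinary.com/{cloud_name}"
            f"/image/upload/{transformations}/{public_id}")


# "demo-cloud" and the file name are hypothetical placeholders.
url = cloudinary_url("demo-cloud", "Avocado_baseColor.png")
print(url)
```

Because the format is chosen per request with f_auto, the same URL can return WebP to Chrome and a different format to a browser that does not support it.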

To apply these new images to the glTF files, we must replace the image references in the glTF JSON. Here they move from a local reference to the third-party Cloudinary URI:

[Screenshot: image entries in the glTF JSON with local file references]

to:

[Screenshot: the same image entries pointing to Cloudinary URLs]
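I made this change by editing the JSON, but the same replacement could be scripted. The Python sketch below is one way to do it, assuming the glTF 2.0 `images` array with `uri` fields. The function name, the base URL and the sample file name are hypothetical.

```python
import copy
import json


def point_images_at_cdn(gltf: dict, base_url: str) -> dict:
    """Return a copy of a parsed .gltf document with external image URIs prefixed by base_url."""
    updated = copy.deepcopy(gltf)
    for image in updated.get("images", []):
        uri = image.get("uri")
        # Leave embedded images (no "uri", or a data: URI) and absolute URLs alone.
        if uri and not uri.startswith(("data:", "http://", "https://")):
            image["uri"] = base_url.rstrip("/") + "/" + uri
    return updated


# Hypothetical base URL and document, for illustration only.
BASE = "https://res.cloudinary.com/demo-cloud/image/upload/q_auto,f_auto"
original = {"images": [{"uri": "Avocado_baseColor.png"}]}
optimized = point_images_at_cdn(original, BASE)
print(json.dumps(optimized, indent=2))
```

Working on a copy keeps the original document untouched, which makes it easy to write the result to a separate "_opt" file.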

Again, the gltf files are all in the GitHub repo, and the optimized gltf and html pages all have “_opt” appended to the filename.

Results

Here are two similar views of the Ship in the Bottle:

Visually, there are no major differences between the two sails. Using Chrome DevTools and a simulated 5 Mbps downlink connection, I estimated the download size and load time of four different models:

Model	Original (MB)	Optimized (MB)	Load, original (s)	Load, optimized (s)	Data savings	Load time savings
Cube	1.1	0.33	3.7	1.3	70.00%	64.86%
Avocado	8.2	0.584	14	1.85	92.88%	86.79%
Camera	43.5	4.7	80	8.5	89.20%	89.38%
Ship	31.2	9.1	51	15.1	70.83%	70.39%

Data savings range from 70% to 93%, and load times are 65% to 89% shorter with the optimized images.
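The savings columns are simple arithmetic: the difference between the original and optimized values, divided by the original. As a quick check, this Python sketch recomputes them from the numbers in the table (the function name is my own).

```python
def percent_saved(original: float, optimized: float) -> float:
    """Percentage reduction going from original to optimized."""
    return (original - optimized) / original * 100


# (original MB, optimized MB, original load time, optimized load time) from the table
rows = {
    "Cube": (1.1, 0.33, 3.7, 1.3),
    "Avocado": (8.2, 0.584, 14, 1.85),
    "Camera": (43.5, 4.7, 80, 8.5),
    "Ship": (31.2, 9.1, 51, 15.1),
}

for name, (mb_before, mb_after, load_before, load_after) in rows.items():
    data = percent_saved(mb_before, mb_after)
    load = percent_saved(load_before, load_after)
    print(f"{name}: {data:.2f}% data savings, {load:.2f}% load time savings")
```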

You can compare them yourself:

Cube	https://dougsillars.github.io/glTF_optimization/gltf_cube.html
Cube Optimized	https://dougsillars.github.io/glTF_optimization/gltf_cube_opt.html
Avocado	https://dougsillars.github.io/glTF_optimization/gltf_avocado.html
Avocado Optimized	https://dougsillars.github.io/glTF_optimization/gltf_avocado_opt.html
Ship	https://dougsillars.github.io/glTF_optimization/gltf_ship.html
Ship Optimized	https://dougsillars.github.io/glTF_optimization/gltf_ship_opt.html
Camera	https://dougsillars.github.io/glTF_optimization/gltf_camera.html
Camera Optimized	https://dougsillars.github.io/glTF_optimization/gltf_camera_opt.html

State of glTF Optimization Today

From a quick reading of the documentation, there is work underway on smaller image formats, but it appears that little is being done to reduce the file size of the JPG or PNG files used today.

Another focus is Draco compression, which speeds up reading the gltf and bin files that build the geometries. This is great work (check out the demo of how much faster Draco makes rendering Manhattan), and the performance improvement is phenomenal.

When I ran the AntiqueCamera model through Draco compression using glTF Pipeline, the entire glTF came out as one JSON file. That file is 59 MB, and all of the images are Base64-encoded in the JSON. I recently researched Base64 as an anti-pattern for images: Base64-encoded images grow 10-20% larger than the original JPG or PNG files, and rendering is slower from Base64 than from raw files. Let’s hope that this pattern is not one that continues in glTF optimization.
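For reference, the raw size penalty of Base64 is easy to check: every 3 bytes of input become 4 output characters, so the encoded text is about 33% larger before any transfer compression such as gzip. The 10-20% figure above presumably reflects sizes after such compression; that reading is an assumption, not something measured here. The Python sketch below uses random bytes as a stand-in for already-compressed image data.

```python
import base64
import os

# Random bytes stand in for an already-compressed JPG or PNG payload.
raw = os.urandom(30_000)
encoded = base64.b64encode(raw)

# 3 input bytes map to 4 output characters, so the overhead is about one third.
overhead = (len(encoded) - len(raw)) / len(raw) * 100
print(f"raw: {len(raw)} bytes, Base64: {len(encoded)} bytes, overhead: {overhead:.1f}%")
```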

Conclusion

AR/VR usage is growing, and it can now be easily added to mainstream games, apps and websites. There are thousands of 3D objects that can be added to these worlds, and there is an awesome, royalty-free, open standard called glTF that allows for easy sharing and reuse of these objects.

However, these objects are far from optimized for delivery over the internet. The images tend to be PNG or JPG files with no quality reduction and no attempt to choose a smaller format. In this simple experiment, I was able to shrink four glTF models by 70-93% using JPG, WebP and structural similarity, with very little loss in image clarity.

Thanks to @tanay1337 for introducing me to glTF during his talk at DevFest St. Petersburg.