Basics
WebGPU is a very simple system. All it does is run 3 types of functions on the GPU.
Vertex Shaders: A vertex shader computes vertices. The shader returns vertex positions. For every group of 3 vertices the vertex shader returns, a triangle is drawn between those 3 positions.
Fragment Shaders: A fragment shader computes colors. When a triangle is drawn, for each pixel to be drawn the GPU calls your fragment shader and the fragment shader returns a color. (Fragment shaders indirectly write data to textures. Colors in WebGPU are usually specified as floating point values from 0.0 to 1.0, but the data does not have to be colors. For example, it's common to output the direction of the surface that pixel represents.)
Compute Shaders: It’s effectively just a function you call and say “execute this function N times”. The GPU passes the iteration number each time it calls your function so you can use that number to do something unique on each iteration.
The shaders reference resources (buffers, textures, samplers) indirectly through Bind Groups.
To execute shaders on the GPU, you need to create all of these resources and set up this state. Creation of resources is relatively straightforward. One interesting thing is that most WebGPU resources can not be changed after creation. You can change their contents but not their size, usage, format, etc. If you want to change any of that you create a new resource and destroy the old one.
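For example, a minimal sketch (assuming a `device` already exists): a buffer's contents can be updated, but its size and usage are fixed at creation.

```js
// A buffer's size and usage are fixed at creation time.
const buffer = device.createBuffer({
  size: 16,                                                 // bytes, can not change later
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,  // can not change later
});

// The contents can be updated ...
device.queue.writeBuffer(buffer, 0, new Float32Array([1, 2, 3, 4]));

// ... but to "resize" you create a new buffer and destroy the old one.
const bigger = device.createBuffer({
  size: 32,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});
buffer.destroy();
```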
Drawing triangles
WebGPU can draw triangles to textures. The <canvas> element represents a texture on a webpage. In WebGPU we can ask the canvas for a texture and then render to that texture.
There are actually 5 primitive modes:
- 'point-list': for each position, draw a point
- 'line-list': for each 2 positions, draw a line
- 'line-strip': draw lines connecting the newest point to the previous point
- 'triangle-list': for each 3 positions, draw a triangle (default)
- 'triangle-strip': for each new position, draw a triangle from it and the last 2 positions
Steps:
- create a shader module
- create a render pipeline
- create a command encoder, encode a render pass, and finish it to get a command buffer
- submit the command buffer
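A minimal sketch of those steps, assuming `device`, a configured canvas `context`, and `presentationFormat` have already been set up:

```js
// 1. create a shader module (WGSL for both the vertex and fragment stage)
const module = device.createShaderModule({
  code: `
    @vertex fn vs(@builtin(vertex_index) i: u32) -> @builtin(position) vec4f {
      // hard-coded triangle in clip space
      var pos = array(vec2f(0.0, 0.5), vec2f(-0.5, -0.5), vec2f(0.5, -0.5));
      return vec4f(pos[i], 0.0, 1.0);
    }
    @fragment fn fs() -> @location(0) vec4f {
      return vec4f(1.0, 0.0, 0.0, 1.0);  // red
    }
  `,
});

// 2. create a render pipeline that uses the module
const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module, entryPoint: 'vs' },
  fragment: { module, entryPoint: 'fs', targets: [{ format: presentationFormat }] },
});

// 3. encode a render pass into a command buffer
const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass({
  colorAttachments: [{
    view: context.getCurrentTexture().createView(),
    clearValue: [0.3, 0.3, 0.3, 1],
    loadOp: 'clear',
    storeOp: 'store',
  }],
});
pass.setPipeline(pipeline);
pass.draw(3);  // call the vertex shader 3 times
pass.end();
const commandBuffer = encoder.finish();

// 4. submit the command buffer
device.queue.submit([commandBuffer]);
```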
WebGPU takes every 3 vertices we return from our vertex shader and uses them to rasterize a triangle. It does this by determining which pixels’ centers are inside the triangle. It then calls our fragment shader for each pixel to ask what color to make it.
Positions in WebGPU need to be returned in clip space where X goes from -1.0 on the left to +1.0 on the right, and Y goes from -1.0 at the bottom to +1.0 at the top. This is true regardless of the size of the texture we are drawing to.
Inter-stage Variables
Inter-stage variables come into play between a vertex shader and a fragment shader. When a vertex shader outputs 3 positions, a triangle gets rasterized. The vertex shader can output extra values at each of those positions and, by default, those values will be interpolated between the 3 points (every time the GPU calls the fragment shader, it passes in values interpolated between all 3 points).
An important point, like nearly everything in WebGPU, the connection between the vertex shader and the fragment shader is by index. For inter-stage variables, they connect by location index.
For inter-stage variables, all that matters is the @location(?). So, it's common to declare different structs for a vertex shader's output vs a fragment shader's input.
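For example, a minimal sketch (struct and field names are made up); only the @location numbers have to match between the two stages:

```wgsl
struct VSOutput {
  @builtin(position) position: vec4f,
  @location(0) color: vec4f,
};

@vertex fn vs(@builtin(vertex_index) i: u32) -> VSOutput {
  var pos = array(vec2f(0.0, 0.5), vec2f(-0.5, -0.5), vec2f(0.5, -0.5));
  var color = array(vec4f(1, 0, 0, 1), vec4f(0, 1, 0, 1), vec4f(0, 0, 1, 1));
  var out: VSOutput;
  out.position = vec4f(pos[i], 0.0, 1.0);
  out.color = color[i];
  return out;
}

// a different struct for the fragment shader's input; it lines up by @location, not by name
struct FSInput {
  @location(0) color: vec4f,
};

@fragment fn fs(fsInput: FSInput) -> @location(0) vec4f {
  return fsInput.color;  // interpolated between the 3 vertices
}
```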
Interpolation Settings
The outputs from a vertex shader are interpolated when passed to the fragment shader. There are 2 sets of settings that can be changed for how the interpolation happens. Setting them to anything other than the defaults is not very common but there are use cases.
Interpolation type:
- perspective: Values are interpolated in a perspective correct manner (default)
- linear: Values are interpolated in a linear, non-perspective correct manner.
- flat: Values are not interpolated. Interpolation sampling is not used with flat interpolation; the value passed to the fragment shader is the value of the inter-stage variable for the first vertex in that triangle.
Interpolation sampling:
- center: Interpolation is performed at the center of the pixel (default)
- centroid: Interpolation is performed at a point that lies within all the samples covered by the fragment within the current primitive. This value is the same for all samples in the primitive.
- sample: Interpolation is performed per sample. The fragment shader is invoked once per sample when this attribute is applied.
You specify these as attributes. For example:
```wgsl
@location(2) @interpolate(linear, center) myVariableFoo: vec4f;
```
Textures
Flipping texture data vertically is common enough that there are even options, when loading textures from images, videos, and canvases, to flip the data for you.
To draw something with a texture we have to create the texture, put data in it, bind it to a bind group with a sampler, and reference it from a shader.
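A minimal sketch of that setup (assuming a `device` and a `pipeline` whose shader declares a sampler at @binding(0) and a texture at @binding(1); the 2x2 checkerboard data is made up):

```js
// a 2x2 rgba8unorm texture
const texture = device.createTexture({
  size: [2, 2],
  format: 'rgba8unorm',
  usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST,
});

// put data in it (4 bytes per texel, 2 texels per row -> bytesPerRow = 8)
device.queue.writeTexture(
  { texture },
  new Uint8Array([
    255,   0,   0, 255,     0, 255,   0, 255,
      0,   0, 255, 255,   255, 255,   0, 255,
  ]),
  { bytesPerRow: 8 },
  [2, 2],
);

const sampler = device.createSampler({ magFilter: 'nearest' });

// bind the texture and sampler so the shader can reference them
const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: sampler },
    { binding: 1, resource: texture.createView() },
  ],
});
```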
Texture Types and Texture Views
There are 3 types of textures
- 1d
- 2d
- 3d
The dimension can be passed when creating a texture; see device.createTexture.
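For example (an illustrative sketch; the size and format are arbitrary):

```js
const tex3d = device.createTexture({
  dimension: '3d',     // '1d', '2d' (the default), or '3d'
  size: [64, 64, 64],  // [width, height, depthOrArrayLayers]
  format: 'rgba8unorm',
  usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST,
});
```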
In some ways you can consider a "2d" texture just a "3d" texture with a depth of 1, and a "1d" texture just a "2d" texture with a height of 1. There are two actual differences. One is that textures are limited in their maximum allowed dimensions, and the limit is different for each type of texture: "1d", "2d", and "3d".
Another is speed. At least for a 3d texture vs a 2d texture, with all the sampler filters set to linear, sampling a 3d texture requires looking at 16 texels and blending them all together, while sampling a 2d texture only needs 8 texels.
There are 6 types of texture views
- “1d”
- “2d”
- “2d-array”
- “3d”
- “cube”
- “cube-array”
A “2d-array” is an array of 2d textures. You can then choose which texture of the array to access in your shader. They are commonly used for terrain rendering among other things.
3d textures can be used in cases like 3D LUTs. Each type of texture has its own corresponding type in WGSL.
| type | WGSL types |
|---|---|
| 1d | texture_1d or texture_storage_1d |
| 2d | texture_2d or texture_storage_2d or texture_multisampled_2d, as well as texture_depth_2d and texture_depth_multisampled_2d in certain special cases |
| 2d-array | texture_2d_array or texture_storage_2d_array and sometimes texture_depth_2d_array |
| 3d | texture_3d or texture_storage_3d |
| cube | texture_cube and sometimes texture_depth_cube |
| cube-array | texture_cube_array and sometimes texture_depth_cube_array |
Texture Formats
“unorm” is unsigned normalized data (0 to 1), meaning the data in the texture goes from 0 to N where N is the maximum integer value for that number of bits. That range of integers is then interpreted as a floating point range of (0 to 1). In other words, for an 8-bit unorm texture, that's 8 bits (so values from 0 to 255) that get interpreted as values from 0 to 1.
“snorm” is signed normalized data (-1 to +1), so the range of data goes from the most negative integer represented by the number of bits to the most positive. For example, 8-bit snorm is 8 bits. As a signed integer the lowest number would be -128 and the highest is +127. That range gets converted to (-1 to +1).
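As a quick sanity check of those conversions (an illustrative sketch, not WebGPU API calls):

```js
// 8-bit unorm: integer 0..255 -> float 0..1
const unorm8ToFloat = v => v / 255;                // 255 -> 1.0, 128 -> ~0.502

// 8-bit snorm: integer -128..127 -> float -1..1
// (-128 clamps so that both -128 and -127 map to -1.0)
const snorm8ToFloat = v => Math.max(v / 127, -1);  // 127 -> 1.0, -64 -> ~-0.504
```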
Texture Atlas
A texture atlas is a fancy name for a texture with multiple images in it. We then use texture coordinates to select which parts go where.
Using Video Effectively
copyExternalImageToTexture: this function copies the current frame of the video into a pre-existing texture that we created. WebGPU has another method for using video. It's called importExternalTexture and, like the name suggests, it provides a GPUExternalTexture. This external texture represents the data in the video directly. No copy is made. (What actually happens is up to the browser implementation. The WebGPU spec was designed in the hope that browsers would not need to make a copy.)
There are a few big caveats to using a texture from importExternalTexture:
- The texture is only valid until you exit the current JavaScript task. An implication of this is that you must make a new bind group each time you call importExternalTexture so that you can pass the new texture into your shader.
- You must use texture_external in your shaders.
- You must use textureSampleBaseClampToEdge in your shaders. Like the name suggests, textureSampleBaseClampToEdge will only sample the base texture mip level (level 0). In other words, external textures can not have a mipmap. Further, the function clamps to the edge, meaning setting a sampler to addressModeU: 'repeat' will be ignored.
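A per-frame sketch of that pattern (assuming `video`, `sampler`, a compatible `pipeline`, and render-pass encoding elsewhere; names are illustrative):

```js
function render() {
  // the external texture is only valid for the current task,
  // so import it and build a new bind group every frame
  const externalTexture = device.importExternalTexture({ source: video });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: sampler },
      { binding: 1, resource: externalTexture },
    ],
  });

  // ... encode a render pass that uses bindGroup and submit it ...

  requestAnimationFrame(render);
}
requestAnimationFrame(render);

// WGSL side, for reference:
//   @group(0) @binding(1) var ourTexture: texture_external;
//   ... textureSampleBaseClampToEdge(ourTexture, ourSampler, uv) ...
```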
Storage Texture
Multi-Sampling Anti-aliasing
Setting colorAttachment[0].resolveTarget says to WebGPU, "when all the drawing in this render pass has finished, downscale (resolve) the multisample texture into the texture set on resolveTarget". If you have multiple render passes you probably don't want to resolve until the last pass. While it's fastest to resolve in the last pass, it's also perfectly acceptable to make an empty last render pass that does nothing but resolve. Just make sure you set the loadOp to 'load' and not 'clear' in all the passes except the first pass, otherwise the contents will be cleared.
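A sketch of the setup (assuming `encoder` and `canvasTexture = context.getCurrentTexture()` exist, and that the render pipeline was created with `multisample: { count: 4 }`):

```js
// a 4-sample texture to draw into (4 is the sample count WebGPU guarantees)
const multisampleTexture = device.createTexture({
  size: [canvasTexture.width, canvasTexture.height],
  format: canvasTexture.format,
  sampleCount: 4,
  usage: GPUTextureUsage.RENDER_ATTACHMENT,
});

const pass = encoder.beginRenderPass({
  colorAttachments: [{
    view: multisampleTexture.createView(),      // render to the multisample texture
    resolveTarget: canvasTexture.createView(),  // resolve into the canvas texture
    clearValue: [0.3, 0.3, 0.3, 1],
    loadOp: 'clear',
    storeOp: 'store',
  }],
});
```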
Pipeline-Overridable constants
pipeline-overridable constants are a type of constant you declare in your shader but you can change when you use that shader to create a pipeline.
Pipeline-overridable constants can only be scalar values: booleans (true/false), integers, and floating point numbers. They can not be vectors or matrices.
If you don’t specify a value in the shader then you must supply one in the pipeline. You can also give them a numeric id and then refer to them by their id.
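A sketch of both forms (`presentationFormat` is assumed; the constant names, id, and values are made up):

```js
const module = device.createShaderModule({
  code: `
    override red = 0.0;            // has a default, so overriding it is optional
    @id(123) override green: f32;  // no default, the pipeline must supply a value

    @vertex fn vs(@builtin(vertex_index) i: u32) -> @builtin(position) vec4f {
      var pos = array(vec2f(0.0, 0.5), vec2f(-0.5, -0.5), vec2f(0.5, -0.5));
      return vec4f(pos[i], 0.0, 1.0);
    }

    @fragment fn fs() -> @location(0) vec4f {
      return vec4f(red, green, 0.0, 1.0);
    }
  `,
});

const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module, entryPoint: 'vs' },
  fragment: {
    module,
    entryPoint: 'fs',
    targets: [{ format: presentationFormat }],
    constants: {
      red: 1,    // referenced by name
      123: 0.5,  // referenced by numeric id
    },
  },
});
```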
Canvas alphaMode
By default a WebGPU canvas is opaque; its alpha channel is ignored. To make it not be ignored we have to set its alphaMode to 'premultiplied' when we call configure. The default is 'opaque'.

```js
context.configure({
  device,
  format: presentationFormat,
  alphaMode: 'premultiplied',
});
```
alphaMode: 'premultiplied' means the colors you put in the canvas must have their color values already multiplied by the alpha value.
Blending
In the blend settings on a render pipeline's color target, color is what happens to the rgb portion of a color and alpha is what happens to the a (alpha) portion.
operation can be one of
- add
- subtract
- reverse-subtract
- min
- max
srcFactor and dstFactor can each be one of
- zero
- one
- src
- one-minus-src
- src-alpha
- one-minus-src-alpha
- dst
- one-minus-dst
- dst-alpha
- one-minus-dst-alpha
- src-alpha-saturated
- constant
- one-minus-constant
Most of them are relatively straightforward to understand. Think of it as

```
result = operation((src * srcFactor), (dst * dstFactor))
```
Of the blend factors above, 2 mention a constant, 'constant' and 'one-minus-constant'. The constant referred to here is set in a render pass with the setBlendConstant command and defaults to [0, 0, 0, 0]. This lets you change it between draws.
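A sketch of where these settings live (`module` and `presentationFormat` are assumed; the factors shown are the classic premultiplied-alpha "source-over" blend):

```js
const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module, entryPoint: 'vs' },
  fragment: {
    module,
    entryPoint: 'fs',
    targets: [{
      format: presentationFormat,
      blend: {
        color: { operation: 'add', srcFactor: 'one', dstFactor: 'one-minus-src-alpha' },
        alpha: { operation: 'add', srcFactor: 'one', dstFactor: 'one-minus-src-alpha' },
      },
    }],
  },
});

// later, inside a render pass, if any factor uses 'constant':
// pass.setBlendConstant([0.5, 0.5, 0.5, 1]);
```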
Data Copying
writeBuffer copies data from a TypedArray or ArrayBuffer in JavaScript to a buffer. This is arguably the most straightforward way to get data into a buffer.

```js
device.queue.writeBuffer(
  destBuffer,  // the buffer to write to
  destOffset,  // where in the destination buffer to start writing
  srcData,     // a TypedArray or ArrayBuffer
  srcOffset?,  // offset in **elements** in srcData to start copying
  size?,       // size in **elements** of srcData to copy
)
// If srcOffset is not passed it's 0.
// If size is not passed it's the size of srcData.
// srcOffset and size are in elements of srcData, for example,
// if srcData is a Float32Array and srcOffset is 6, it will copy
// starting at 24 bytes in.
```

writeTexture copies data from a TypedArray or ArrayBuffer in JavaScript to a texture.

```js
device.queue.writeTexture(
  // details of the destination
  { texture, mipLevel: 0, origin: [0, 0, 0], aspect: "all" },
  // the source data
  srcData,
  // details of the source data
  { offset: 0, bytesPerRow, rowsPerImage },
  // size:
  [ width, height, depthOrArrayLayers ]
)
```

- texture must have a usage of GPUTextureUsage.COPY_DST
- mipLevel, origin, and aspect all have defaults so they often do not need to be specified
- bytesPerRow: This is how many bytes to advance to get to the next block row of data. This is required if you are copying more than 1 block row. It is almost always true that you're copying more than 1 block row so it is therefore almost always required.
- rowsPerImage: This is the number of block rows to advance to get from the start of one image to the next image. This is required if you are copying more than 1 layer. In other words, if depthOrArrayLayers in the size argument is > 1 then you need to supply this value.
- aspect really only comes into play when copying data to a depth-stencil format. You can only copy to one aspect at a time, either the depth-only or the stencil-only aspect.
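As a worked example of bytesPerRow and rowsPerImage (the texture here is hypothetical: a 4x2 rgba8unorm texture with 3 array layers, created with COPY_DST):

```js
// write 3 layers of a 4x2 rgba8unorm texture (4 bytes per texel)
const width = 4, height = 2, layers = 3;
const bytesPerRow = width * 4;  // 16: bytes from the start of one row to the next
const rowsPerImage = height;    //  2: rows from the start of one layer to the next
const data = new Uint8Array(bytesPerRow * rowsPerImage * layers);

device.queue.writeTexture(
  { texture },
  data,
  { offset: 0, bytesPerRow, rowsPerImage },
  [width, height, layers],
);
```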
copyBufferToBuffer, like the name suggests, copies data from one buffer to another.

```js
encoder.copyBufferToBuffer(
  source,        // buffer to copy from
  sourceOffset,  // where to start copying from
  dest,          // buffer to copy to
  destOffset,    // where to start copying to
  size,          // how many bytes to copy
)
```

- source must have a usage of GPUBufferUsage.COPY_SRC
- dest must have a usage of GPUBufferUsage.COPY_DST
- size must be a multiple of 4
copyBufferToTexture, like the name suggests, copies data from a buffer to a texture.

```js
encoder.copyBufferToTexture(
  // details of the source buffer
  { buffer, offset: 0, bytesPerRow, rowsPerImage },
  // details of the destination texture
  { texture, mipLevel: 0, origin: [0, 0, 0], aspect: "all" },
  // size:
  [ width, height, depthOrArrayLayers ]
)
```

- texture must have a usage of GPUTextureUsage.COPY_DST
- buffer must have a usage of GPUBufferUsage.COPY_SRC
- bytesPerRow must be a multiple of 256
copyTextureToBuffer, like the name suggests, copies data from a texture to a buffer.

```js
encoder.copyTextureToBuffer(
  // details of the source texture
  { texture, mipLevel: 0, origin: [0, 0, 0], aspect: "all" },
  // details of the destination buffer
  { buffer, offset: 0, bytesPerRow, rowsPerImage },
  // size:
  [ width, height, depthOrArrayLayers ]
)
```

- texture must have a usage of GPUTextureUsage.COPY_SRC
- buffer must have a usage of GPUBufferUsage.COPY_DST
- bytesPerRow must be a multiple of 256
copyTextureToTexture copies a portion of one texture to another. The two textures must either be the same format, or they must only differ by the suffix '-srgb'.

```js
encoder.copyTextureToTexture(
  // src: details of the source texture
  { texture, mipLevel: 0, origin: [0, 0, 0], aspect: "all" },
  // dst: details of the destination texture
  { texture, mipLevel: 0, origin: [0, 0, 0], aspect: "all" },
  // size:
  [ width, height, depthOrArrayLayers ]
);
```

- src.texture must have a usage of GPUTextureUsage.COPY_SRC
- dst.texture must have a usage of GPUTextureUsage.COPY_DST
- width must be a multiple of the block width
- height must be a multiple of the block height
- src.origin[0] or .x must be a multiple of the block width
- src.origin[1] or .y must be a multiple of the block height
- dst.origin[0] or .x must be a multiple of the block width
- dst.origin[1] or .y must be a multiple of the block height
Shaders: Shaders can write to storage buffers, storage textures, and indirectly they can render to textures. Those are all ways of getting data into buffers and textures. In other words you can use shaders to generate data.
mapping buffers: You can map a buffer. Mapping a buffer means making it available to read or write from JavaScript. At least in version 1 of WebGPU, mappable buffers have severe restrictions, namely, a mappable buffer can only be used as a temporary place to copy data to or from. A mappable buffer can not be used as any other type of buffer (like a uniform buffer, vertex buffer, index buffer, storage buffer, etc.)
You can create a mappable buffer with 2 combinations of usage flags.
- GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST: This is a buffer you can use the copy commands above to copy data to from another buffer or a texture, then map it to read the values in JavaScript.
- GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC: This is a buffer you can map in JavaScript, put data in from JavaScript, and finally unmap and use the copy commands above to copy its contents to another buffer or texture.
The process of mapping a buffer is asynchronous. You call buffer.mapAsync(mode, offset = 0, size?) where offset and size are in bytes. If size is not specified it's the size of the entire buffer. mode must be either GPUMapMode.READ or GPUMapMode.WRITE and must of course match the MAP_ usage flag you passed in when you created the buffer. mapAsync returns a Promise. When the promise resolves the buffer is mappable. You can then view some or all of the buffer by calling buffer.getMappedRange(offset = 0, size?) where offset is a byte offset into the portion of the buffer you mapped. getMappedRange returns an ArrayBuffer.

Once mapped, the buffer is not usable by WebGPU until you call unmap. The moment unmap is called the data disappears from JavaScript (the ArrayBuffer returned by getMappedRange is detached).

mappedAtCreation: true is a flag you can add when you create a buffer. In this case, the buffer does not need the usage flag GPUBufferUsage.MAP_WRITE. This is a special parameter to let you put data in the buffer on creation. You add the flag mappedAtCreation: true when you create the buffer and the buffer is created already mapped for writing.

```js
const buffer = device.createBuffer({
  size: 16,
  usage: GPUBufferUsage.UNIFORM,
  mappedAtCreation: true,
});
const arrayBuffer = buffer.getMappedRange(0, buffer.size);
const f32 = new Float32Array(arrayBuffer);
f32.set([1, 2, 3, 4]);
buffer.unmap();
```
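A typical read-back sketch using the first combination above (`storageBuffer` is assumed to be an existing buffer created with GPUBufferUsage.COPY_SRC):

```js
// a mappable buffer to read results back into
const resultBuffer = device.createBuffer({
  size: storageBuffer.size,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

// copy from the GPU-only buffer into the mappable one
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(storageBuffer, 0, resultBuffer, 0, resultBuffer.size);
device.queue.submit([encoder.finish()]);

// map it and read the values in JavaScript
await resultBuffer.mapAsync(GPUMapMode.READ);
const result = new Float32Array(resultBuffer.getMappedRange()).slice();  // copy before unmap
resultBuffer.unmap();  // after this the mapped ArrayBuffer is detached
console.log(result);
```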
Optional Features and limits
When you request an adapter with
```js
const adapter = await navigator.gpu?.requestAdapter()
```
The adapter will have a list of limits on adapter.limits and a set of feature names on adapter.features.
By default, when you request a device, you get the minimum limits and you get no optional features. The hope is, if you stay under the minimum limits, then your app will run on all devices that support WebGPU.
But, given the available limits and features listed on the adapter, you can request them when you call requestDevice by passing your desired limits as requiredLimits and your desired features as requiredFeatures.

```js
const adapter = await navigator.gpu?.requestAdapter();
```
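For example, a sketch continuing from the adapter above (the feature and limit chosen here, 'timestamp-query' and maxBufferSize, are just examples; pick whatever your app actually needs):

```js
const canTimestamp = adapter.features.has('timestamp-query');

const device = await adapter?.requestDevice({
  // only ask for the feature if the adapter has it
  requiredFeatures: canTimestamp ? ['timestamp-query'] : [],
  // ask for more than the default minimum, up to what the adapter supports
  requiredLimits: {
    maxBufferSize: adapter.limits.maxBufferSize,
  },
});
```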
The recommended way to use features and limits is to decide on what you absolutely must have and throw errors if the user's device can not support those features.
WGSL
attributes
The word attributes has 2 meanings in WebGPU. One is vertex attributes. The other is in WGSL where an attribute starts with @.
For a vertex shader, inputs are defined by the @location attributes of the entry point function of the vertex shader.
```wgsl
@vertex fn vs1(@location(0) foo: f32, @location(1) bar: vec4f) ...
```
For inter-stage variables, @location attributes define the location where the variables are passed between shaders.
```wgsl
struct VSOut {
  // example fields
  @builtin(position) pos: vec4f,
  @location(0) color: vec4f,      // inter-stage variable at location 0
  @location(1) texcoords: vec2f,  // inter-stage variable at location 1
};
```
For fragment shaders, @location specifies which GPURenderPassDescriptor.colorAttachment to store the result in.
```wgsl
struct FSOut {
  // example fields: each @location maps to a color attachment of the render pass
  @location(0) albedo: vec4f,  // written to colorAttachments[0]
  @location(1) normal: vec4f,  // written to colorAttachments[1]
};
```
The @builtin attribute is used to specify that a particular variable's value comes from a built-in feature of WebGPU.
| Builtin Name | Stage | IO | Type | Description |
|---|---|---|---|---|
| vertex_index | vertex | input | u32 | Index of the current vertex within the current API-level draw command, independent of draw instancing. For a non-indexed draw, the first vertex has an index equal to the firstVertex argument of the draw, and the index is incremented by one for each additional vertex. For an indexed draw, the index is equal to the index buffer entry for the vertex, plus the baseVertex argument of the draw. |
| instance_index | vertex | input | u32 | Instance index of the current vertex within the current API-level draw command. The first instance has an index equal to the firstInstance argument of the draw, and the index is incremented by one for each additional instance. |
| position | vertex | output | vec4<f32> | Output position of the current vertex, using homogeneous coordinates. After homogeneous normalization (where each of the x, y, and z components are divided by the w component), the position is in the WebGPU normalized device coordinate space. See WebGPU § 3.3 Coordinate Systems. |
| position | fragment | input | vec4<f32> | Framebuffer position of the current fragment in framebuffer space. (The x, y, and z components have already been scaled such that w is now 1.) See WebGPU § 3.3 Coordinate Systems. |
| front_facing | fragment | input | bool | True when the current fragment is on a front-facing primitive. False otherwise. |
| frag_depth | fragment | output | f32 | Updated depth of the fragment, in the viewport depth range. See WebGPU § 3.3 Coordinate Systems. |
| local_invocation_id | compute | input | vec3<u32> | The current invocation's local invocation ID, i.e. its position in the workgroup grid. |
| local_invocation_index | compute | input | u32 | The current invocation's local invocation index, a linearized index of the invocation's position within the workgroup grid. |
| global_invocation_id | compute | input | vec3<u32> | The current invocation's global invocation ID, i.e. its position in the compute shader grid. |
| workgroup_id | compute | input | vec3<u32> | The current invocation's workgroup ID, i.e. the position of the workgroup in the workgroup grid. |
| num_workgroups | compute | input | vec3<u32> | The dispatch size, vec3<u32>(group_count_x, group_count_y, group_count_z), of the compute shader dispatched by the API. |
| sample_index | fragment | input | u32 | Sample index for the current fragment. The value is at least 0 and at most sampleCount-1, where sampleCount is the MSAA sample count specified for the GPU render pipeline. See WebGPU § 10.3 GPURenderPipeline. |
| sample_mask | fragment | input | u32 | Sample coverage mask for the current fragment. It contains a bitmask indicating which samples in this fragment are covered by the primitive being rendered. See WebGPU § 23.3.11 Sample Masking. |
| sample_mask | fragment | output | u32 | Sample coverage mask control for the current fragment. The last value written to this variable becomes the shader-output mask. Zero bits in the written value will cause corresponding samples in the color attachments to be discarded. See WebGPU § 23.3.11 Sample Masking. |
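As a quick illustration of the compute builtins above, a minimal sketch (buffer layout and workgroup size are arbitrary choices, and the dispatch is assumed to cover the array exactly):

```wgsl
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64) fn doubleIt(
  @builtin(global_invocation_id) id: vec3u,
) {
  let i = id.x;  // which iteration this invocation is
  data[i] = data[i] * 2.0;
}
```

Dispatching with pass.dispatchWorkgroups(N) runs 64 × N invocations; global_invocation_id.x tells each one which element it is responsible for.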