The whole volume is considered at once, not plane by plane, and the strategy is identical in 2D or 3D. I’ll describe the strategy first, then explain the key point in more detail.

First, the image is converted to a binary mask: is each pixel “on” or “off”? Typically you do this by thresholding the fluorescence intensity.
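Here’s a minimal sketch of that thresholding step with NumPy. The volume is synthetic (a bright cube on a noisy background), and the cutoff of 50 is an arbitrary value picked for this fake data. On real images you’d choose it from the intensity histogram or with an automatic method such as skimage.filters.threshold_otsu.

```python
import numpy as np

# Synthetic 3D "fluorescence" volume, axes ordered (Z, Y, X): dim noisy
# background plus one bright cube standing in for a cell.
rng = np.random.default_rng(0)
volume = rng.normal(loc=10.0, scale=2.0, size=(20, 64, 64))
volume[5:15, 20:40, 20:40] += 100.0

# Arbitrary cutoff chosen for this synthetic data only.
threshold = 50.0
mask = volume > threshold  # True = "on", False = "off"

print(mask.dtype, mask.sum())  # boolean mask and number of "on" voxels
```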

Then, for each voxel that’s “on” (a voxel is a pixel plus its plane, i.e. a 3D pixel), you measure the 3D distance to the nearest voxel that’s “off” and assign that distance as the voxel’s new intensity. This is called a distance transform, and it’s what the skimage link visualizes as “Distances”.
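A minimal sketch of the distance transform, assuming a synthetic mask (a solid ball of radius 10 voxels) and using scipy.ndimage.distance_transform_edt, one scipy function that computes the Euclidean distance from each nonzero voxel to the nearest zero voxel:

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic binary mask: a solid ball of radius 10 centered in a 31^3 volume.
z, y, x = np.ogrid[:31, :31, :31]
mask = (z - 15) ** 2 + (y - 15) ** 2 + (x - 15) ** 2 <= 10 ** 2

# Each "on" voxel gets its distance to the nearest "off" voxel; "off" voxels get 0.
distance = ndi.distance_transform_edt(mask)

print(distance[15, 15, 15])  # center of the ball: the largest value (about 10.05)
print(distance[15, 15, 5])   # on the ball's surface: 1.0
print(distance[0, 0, 0])     # background: 0.0
```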

Next, you look for the local maxima in your distance-transformed image. Assuming your objects are roughly spherical, the voxels farthest from the edge (i.e. with the highest distance from the “off” voxels) will be at the centers of your objects. We’ll identify each local maximum and call it a seed.
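A sketch of seed finding, assuming a synthetic mask of two overlapping balls (like two touching nuclei). Here a voxel counts as a local maximum when it equals the largest distance within a 7x7x7 window around it; that window size is an arbitrary choice for this example, and skimage.feature.peak_local_max (with its min_distance parameter) is a ready-made alternative.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic mask: two balls of radius 8 whose edges overlap.
z, y, x = np.ogrid[:24, :24, :40]
ball_a = (z - 12) ** 2 + (y - 12) ** 2 + (x - 12) ** 2 <= 8 ** 2
ball_b = (z - 12) ** 2 + (y - 12) ** 2 + (x - 26) ** 2 <= 8 ** 2
mask = ball_a | ball_b

distance = ndi.distance_transform_edt(mask)

# Local maximum = equal to the largest distance in a 7x7x7 window (arbitrary size).
window_max = ndi.maximum_filter(distance, size=7)
peaks = (distance == window_max) & mask  # exclude the flat "off" background

# Touching peak voxels are merged into one seed; each seed gets its own label.
seeds, n_seeds = ndi.label(peaks)
print(n_seeds, seeds[12, 12, 12], seeds[12, 12, 26])
```

On noisy real data, smoothing the image first or enforcing a minimum spacing between peaks helps avoid splitting one object into several seeds.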

Finally, you start from the seeds and push outward, adding voxels to each object as you go, until every “on” voxel has been assigned to some object.
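To make the “push outward” step concrete, here’s a simplified, hand-written version of the flooding idea (not skimage’s actual implementation). Seeds go into a priority queue and the queued voxel with the largest distance is always expanded first, so each object grows from its center toward its edges and neighboring objects meet along the low-distance neck between them. The seeds are placed by hand for this synthetic example; `seeded_flood` is a hypothetical helper name. In practice you’d call skimage.segmentation.watershed, which the skimage example runs on the negated distance transform with the mask limiting the flood.

```python
import heapq

import numpy as np
from scipy import ndimage as ndi

# Face-connected neighbors in 3D (one step along Z, Y or X).
NEIGHBORS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def seeded_flood(distance, seeds, mask):
    """Grow each labeled seed through `mask`, always expanding the queued voxel
    with the largest distance first. Every reachable "on" voxel ends up with the
    label of the seed that claimed it."""
    labels = seeds.copy()
    heap = []
    order = 0  # tie-breaker: equal distances pop in insertion order
    for idx in zip(*np.nonzero(seeds)):
        heapq.heappush(heap, (-distance[idx], order, idx))
        order += 1
    while heap:
        _, _, (pz, py, px) = heapq.heappop(heap)
        for dz, dy, dx in NEIGHBORS:
            n = (pz + dz, py + dy, px + dx)
            inside = all(0 <= c < s for c, s in zip(n, mask.shape))
            if inside and mask[n] and labels[n] == 0:
                labels[n] = labels[pz, py, px]  # claim the voxel for this object
                heapq.heappush(heap, (-distance[n], order, n))
                order += 1
    return labels


# Synthetic example: two touching balls, one hand-placed seed at each center.
z, y, x = np.ogrid[:24, :24, :40]
mask = ((z - 12) ** 2 + (y - 12) ** 2 + (x - 12) ** 2 <= 64) | (
    (z - 12) ** 2 + (y - 12) ** 2 + (x - 26) ** 2 <= 64
)
distance = ndi.distance_transform_edt(mask)
seeds = np.zeros(mask.shape, dtype=int)
seeds[12, 12, 12] = 1
seeds[12, 12, 26] = 2

labels = seeded_flood(distance, seeds, mask)
print(np.bincount(labels.ravel()))  # voxel counts: background, object 1, object 2
```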

The only difference between 2D and 3D is in the calculation of the distance transform, which is done by this scipy function: in 2D, you calculate the distance to the nearest boundary in X and Y only; in 3D, you calculate it in X, Y, and Z. Otherwise, the algorithm is identical.
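A small sketch of why the 3D calculation matters, using a synthetic ball and scipy.ndimage.distance_transform_edt: it compares the distance transform computed plane by plane (X and Y only) with the one computed on the whole volume (X, Y, and Z). Near the top of the ball, a voxel looks deep inside the object when only its own plane is considered, but in 3D it sits right next to the “off” voxels above it.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic mask: a ball of radius 8 in a 21^3 volume, axes (Z, Y, X).
z, y, x = np.ogrid[:21, :21, :21]
mask = (z - 10) ** 2 + (y - 10) ** 2 + (x - 10) ** 2 <= 8 ** 2

# 3D: distance to the nearest "off" voxel using X, Y and Z together.
dist_3d = ndi.distance_transform_edt(mask)

# Plane by plane: each Z plane treated as an independent 2D image (X and Y only).
dist_2d = np.stack([ndi.distance_transform_edt(plane) for plane in mask])

# Plane z = 3 is a small disk near the top of the ball. Its center is 4 pixels
# from the disk's edge in 2D, but only sqrt(2) voxels from "off" voxels in 3D.
print(dist_2d[3, 10, 10], dist_3d[3, 10, 10])
```

If your Z step is larger than your XY pixel size, distance_transform_edt also accepts a sampling argument giving the spacing along each axis, so the distances come out in physical units.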

Did that help at all?