clEsperanto development/ideas

Hi
@haesleinhuepf suggested I post my thoughts/questions about development of clEsperanto on the forum which can hopefully initiate discussions. I did post this on github, but it should gain more visibility here. For those not familiar, info on clEsperanto can be found here. The idea is to provide GPU-accelerated image processing across languages and platforms using the same workflow commands.

Details on the Proposed Roadmap

Following on from the roadmap with regard to core development and the two proposed approaches:

  • I like the idea of translating ClearCL to Python by exploiting PyOpenCL and/or gputools. I am assuming this means all other platforms would have to adopt a similar approach, using platform-specific libraries?

  • However, if ClearCL were translated to C++, would it be easier to write wrappers around this in different platforms (Python, Matlab…)? Would it be easier to troubleshoot, since everything would work from a common C++ codebase? Or would it be difficult if developers in each platform are not as familiar with C++, i.e., will it be harder to troubleshoot? Also, how does this approach affect maintenance and development in the long run?

Disclaimer: I am not experienced with software development, so pardon my naivety; I am more than happy to hear criticisms/thoughts on the below…

Thanks
Pradeep

5 Likes

Hey @pr4deepr,

thanks for your initiative!

In this aspect, I would love to hear the opinion of @mweigert (gputools author): Martin, do you think it’s possible to make gputools execute these kernels? I assume we would need to construct a layer around gputools which implements this OpenCL dialect, e.g. replacing IMAGE_src_TYPE by float* src or byte* src or … depending on image type at run time. gputools can do something like that already, right? Is there maybe some example code showing us how to customize gputools in that way?

I personally think this is the right way to go, as we prevent code-duplication in python and Java.
However, it’s also more effort. I’m also afraid we haven’t found a C++ expert yet who could help us with this. I do have some experience with C++, but I would be happy to talk to a real expert to make sure we head in the right direction from the very beginning :wink:

Cheers,
Robert

1 Like

Hi @haesleinhuepf & @pr4deepr,

A few vague thoughts:

  • Moving the general chat to this forum is a good idea. One of the main tasks eventually should be striving for wide community adoption, and this helps make more people aware.
  • Translating to a common C++ core with language-specific wrappers also seems the most sensible approach to me, although we are potentially limited by the number of developers with lots of C++ experience (not me). This would also allow wrappers to be written in whatever new high-level languages of choice spring up.
  • Being able to copy and paste code between languages would be nice, but IMHO, not that useful. Even if I want to port someone’s analysis to another language, it’s unlikely that I’d want to do it in exactly the same way. I’d be happy to look up the specific Python syntax etc. What’s important to me is knowing that the underlying computation is the same.
  • Would a poll be useful to see what languages bio-image analysts are using, and importantly, which of them would use this library? E.g. in my experience, Python users are more likely to use external libraries compared to MATLAB users.
  • IMO documentation and examples are critical. Not just toy examples, but real world examples showing why I would use this library, not just how. Napari does this well.
  • Testing and cross language testing is also very important. I’d love to be able to trust a Python implementation of something because the Java implementation was validated (at least to some extent).
  • Is there a reason for potentially investigating CUDA support? Are there CUDA/OpenCL performance benchmarks? Or is the aim just to allow existing CUDA code to be executed across languages? It would be a shame IMO to have any functionality that is restricted to a subset of hardware (memory requirements etc. excepted).
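On the cross-language testing point above, one lightweight option would be language-agnostic reference files that every implementation checks its results against. A hypothetical sketch (the helper name and the JSON reference format are made up for illustration; a real setup would generate the reference from the validated Java implementation):

```python
import json
import os
import tempfile

import numpy as np

def check_against_reference(result, reference_path, atol=1e-6):
    """Compare a computed result against a language-agnostic JSON reference."""
    with open(reference_path) as f:
        expected = np.array(json.load(f))
    return np.allclose(result, expected, atol=atol)

# The Java side would write the same JSON file; here we fake one for illustration.
ref = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump([1.0, 2.0, 3.0], ref)
ref.close()
print(check_against_reference(np.array([1.0, 2.0, 3.0]), ref.name))  # True
os.unlink(ref.name)
```

Because the reference file is plain JSON, any language with a JSON parser could run the same check.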

Very excited about clesperanto. I don’t think a programming language is ever going to win in bioimage analysis, so this is a great way to bring people together.

Cheers,
Adam

2 Likes

Hi all!

In gputools you can build an OpenCL program directly from a string containing the code:

import numpy as np
from gputools import OCLProgram, OCLArray

code = """kernel void foo(global float * dst){
    uint i = get_global_id(0);
    dst[i] = i;
}"""
prog = OCLProgram(src_str=code)
baz = OCLArray.empty((128,), np.float32)
prog.run_kernel("foo", baz.shape, None, baz.data)

So any preprocessing of code e.g. substitution of image types and read/write functions could be easily integrated.
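To make that concrete, a minimal sketch of such a preprocessing step, using plain string substitution before the source is handed to OCLProgram (the `DTYPE_TO_CL` mapping and `preprocess` helper are hypothetical; the `IMAGE_<key>_TYPE` macro name follows the CLIJ dialect convention mentioned above):

```python
import numpy as np

# Hypothetical mapping from numpy dtypes to OpenCL scalar types.
DTYPE_TO_CL = {
    np.dtype("float32"): "float",
    np.dtype("uint8"): "uchar",
    np.dtype("uint16"): "ushort",
}

def preprocess(kernel_source, images):
    """Prepend #define lines resolving IMAGE_<key>_TYPE for each input image."""
    defines = ""
    for key, array in images.items():
        cl_type = DTYPE_TO_CL[array.dtype]
        defines += "#define IMAGE_{}_TYPE __global {}*\n".format(key, cl_type)
    return defines + kernel_source

kernel = "kernel void foo(IMAGE_src_TYPE src){}"
src = preprocess(kernel, {"src": np.zeros((4,), np.float32)})
print(src.splitlines()[0])  # #define IMAGE_src_TYPE __global float*
```

The resulting `src` string could then be passed to `OCLProgram(src_str=src)` as in the example above.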

I tentatively agree with that - it will be hard to produce an API that is idiomatic in all the target languages. E.g. in python, having the same function signatures as scikit-image (with keyword arguments, etc.) but operating on and returning GPU images/buffers would IMO be more useful. But I can as well see the benefits of having standardised signatures (maintenance, testing). So it’s a trade-off.

2 Likes

Thanks Martin! This will help us get started!

I discussed that with @jni at some point and there are several ideas in the air: my favorite is building clEsperanto as an optional layer below scikit-image (available via a CPU → GPU context switch). Thus, it would have all the kernels from #clij, and scikit-image could then use some of them internally. The challenge is then making the OpenCL kernels do the same as the CPU would (e.g. float versus double precision).

Thanks everyone for your ideas! :heart::green_heart::blue_heart:

I put it there as I was assuming somebody will ask for it anyway. There are quite some examples online demonstrating that CUDA can be faster than OpenCL. However, as you pointed out, hardware compatibility is an issue.

Do you have good use cases in mind? I spent quite some effort on examples and benchmarking (speed is the “why”) for #clij. Feedback is very welcome! :wink:

Yeees! :partying_face:

Thanks for your thoughts!

I like this idea. Adding an option like this makes it easier to use plus ensures wider adoption.

With the CUDA support, if/when considering it, is it a good idea to look at NVIDIA RAPIDS?

1 Like

Huh, maybe it’s a bit early as we don’t have clEsperanto yet. Extending it towards CUDA or whatever is currently not an option :wink: But it’s good to think about potential extensions. E.g. HIP - it also has a C++ layer and wrappers for python and matlab. I added both links to the roadmap. Btw you all should have write access to it. Feel free to add your thoughts! I don’t want to be the dictator :wink:

Hey all,

I managed to write a prototype for clEsperanto in python using gputools (Thanks @mweigert for the example code!). It executes CLIJ2 opencl kernels under the hood. An example script looks like this:

If a python expert could have a look at the implementation and provide some honest feedback that would be nice. I would especially be interested in how to modularize methods. E.g. in CLIJ2, all operations have an individual .java file and corresponding .cl file(s) which makes maintenance easier.

Furthermore, I realised that kernel-caching in CLIJ and gputools is done differently.

I’m wondering what @mweigert says: Does it make sense to extend gputools or should we better do this on clEsperanto side?

I’m just also CCing @dwaithe because I know he was interested in the python side as well.

Feedback is highly appreciated! Thanks!

Cheers,
Robert

On C++, could we not use the OpenCL C++ wrappers?

The initial phase of this project should probably stick to using the already-implemented OpenCL core. If the scope were simply having a common API vocabulary, then this might become more of a translation project.

I am interested in building a version of clEsperanto for Julia, CLEsperanto.jl, using OpenCL.jl:


It looks like I should be able to follow @haesleinhuepf’s lead since this is similar to pyopencl.

An alternative is to take advantage of Julia’s LLVM backend. There was an earlier, no-longer-maintained effort to build a transpiler that would allow one to write kernels in Julia and translate them to OpenCL for compilation.


The idea would be to also interface all of this with the fantastic Images.jl work led by @timholy:

1 Like

Hey @markkitt,

cool to see you chiming in here. :slight_smile:

Yes, absolutely. I also would build something on top of this.

AWESOME!! Before you go ahead, would you mind adding Julia and your ideas on how to approach it to the roadmap document? You should have write access to this repository.
If you start coding a prototype, feel free to create a repository in the clEsperanto organisation as I just did for the python prototype.

I’m not sure, but I think OpenCL.jl is the Julia counterpart to pyopencl and/or gputools. Thus, the layer you may want to write for Julia would be the Julia counterpart to the python prototype linked above. In particular, we might need a Julia version of the kernel executor I wrote for Java and/or the python prototype. Its major task is to translate types: if the input image is of type float, for example, a define statement must be added to the OpenCL code: #define IMAGE_input_TYPE __global float*. I called this an OpenCL dialect, which is documented here (note: it’s on the development branch).

If you find some time for exploring how to do this in Julia, that would be fantastic. A small prototype might be enough for now as we just want to see if/how things are possible. Let us know how it goes and if you need support of any kind. IMHO the more popular languages/platforms we can offer, the better.

Good to have you on board!

Cheers,
Robert

For the python version, have you considered building defines with a multiline (f-)string?

https://github.com/clEsperanto/pyclesperanto_prototype/blob/8e86281b9078142a8804f818706a0b57a2306f1a/clesperanto/init.py#L65

For example

    defines = """
              #define GET_IMAGE_WIDTH(image_key) IMAGE_SIZE_ ## image_key ## _WIDTH
              #define GET_IMAGE_HEIGHT(image_key) IMAGE_SIZE_ ## image_key ## _HEIGHT
              #define GET_IMAGE_DEPTH(image_key) IMAGE_SIZE_ ## image_key ## _DEPTH
              """
    ...

    defines += f"""
               #define CONVERT_{key}_PIXEL_TYPE clij_convert_float_sat
               #define IMAGE_{key}_TYPE __global float*
               #define IMAGE_{key}_PIXEL_TYPE float
               """

Since f-strings are relatively new (introduced in Python 3.6, 2016) you could also use

    defines += """
               #define CONVERT_{key}_PIXEL_TYPE clij_convert_float_sat
               #define IMAGE_{key}_TYPE __global float*
               #define IMAGE_{key}_PIXEL_TYPE float
               """.format(key=key)

I just thought it might make the string construction easier to read.

1 Like

Hi,

I guess the easiest would simply be to write a “translation” module/class from the clEsperanto side that is able to generate the appropriate OpenCL kernels from a requested function signature (datatype etc).
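As a rough illustration of such a translation layer, here is a hypothetical sketch that generates a kernel source per requested type signature and caches it, so each variant is only built (and would only be compiled) once. Everything here, including the placeholder syntax, is made up; a real version would also handle the read/write functions and pixel-type conversions:

```python
class KernelTranslator:
    """Hypothetical sketch: generate and cache OpenCL sources per type signature."""

    def __init__(self, template):
        self.template = template
        self._cache = {}

    def source_for(self, **type_map):
        # One generated source per unique type signature, reused on later calls.
        key = tuple(sorted(type_map.items()))
        if key not in self._cache:
            src = self.template
            for name, cl_type in type_map.items():
                src = src.replace("{" + name + "}", cl_type)
            self._cache[key] = src
        return self._cache[key]

translator = KernelTranslator("kernel void foo(global {dtype}* dst){ }")
print(translator.source_for(dtype="float"))  # kernel void foo(global float* dst){ }
```

Calling `source_for(dtype="float")` a second time returns the cached string, which also touches the kernel-caching question raised above.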

1 Like

I have seen many projects (e.g. pyopencl) that use template libraries (e.g. jinja2 or mako) for such things, as they provide additional goodies such as loop construction, conditionals, etc. So I would maybe go down that path…

Example:

from jinja2 import Template 

kernel = """kernel void foo(global {{buffertype}} dst){ 
    const uint i = get_global_id(0);  
    float res=0; 
    {% for n in range(1,niter+1) %} 
        res += 1.f/{{n*n}}; 
    {%- endfor %} 

    dst[i] = res; 
    }"""    

src=Template(kernel).render(buffertype="float *", niter=4) 

print(src)
# kernel void foo(global float * dst){
# const uint i = get_global_id(0);
# float res=0;
#
# res += 1.f/1;
# res += 1.f/4;
# res += 1.f/9;
# res += 1.f/16;
#
# dst[i] = res;
# }
1 Like

@haesleinhuepf :wave: hello again! =)

Firstly, we’ll need to translate names to be Pythonic, using snake_case. =)

Secondly, I think we would want to automate the transfer, and output creation, to make the code more friendly for Python users. Something like

def ensure_ocl(array):
    # push to the GPU only if the input is not already an OCLArray
    if not isinstance(array, OCLArray):
        array = push(array)
    return array


def add_image_and_scalar(image, scalar, out=None):
    image = ensure_ocl(image)
    if out is not None:
        if not isinstance(out, OCLArray):
            ocl_out = create_like(image)
            pull(addImageAndScalar(image, scalar, ocl_out), out)
        else:
            addImageAndScalar(image, scalar, out)
    else:
        out = create_like(image)
        addImageAndScalar(image, scalar, out)
    return out

You can probably simplify that logic but it’s late and I hope that’s enough to get the gist. =) The point is:

  • autogenerate a functional interface whatever the raw interface that clEsperanto expects
  • return an OCLArray by default, which end users can convert via the __array__ interface, but which can also be passed straight on to other GPU operations.
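For illustration, the __array__ protocol mentioned above looks roughly like this (the `FakeGPUArray` class below is a hypothetical stand-in for an OCLArray-like type, backed by a plain numpy array instead of a GPU buffer):

```python
import numpy as np

class FakeGPUArray:
    """Hypothetical stand-in for an OCLArray-like type supporting __array__."""

    def __init__(self, data):
        self._data = np.asarray(data)  # a real implementation holds a GPU buffer

    def __array__(self, dtype=None, copy=None):
        # A real implementation would pull the buffer from the device here.
        if dtype is not None:
            return self._data.astype(dtype)
        return self._data

result = FakeGPUArray([1.0, 2.0, 3.0])
print(np.asarray(result))  # numpy converts it transparently
```

With this, `np.asarray(result)` (or passing `result` to most numpy functions) transparently triggers the device-to-host transfer only when a user actually needs the values on the CPU.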
3 Likes

No problem. As you may know, the API is generated. We can make it in aNy_Case :wink:

I also like your code snippet a lot!

One more question: in CLIJ I have hundreds of Java files, one per operation. I think this should also be the way to go for python, right? Is there some guide/resource available on how to build a “good” library in python? I’ve seen many different ways of achieving this. The only thing I would like to avoid is a hand-written 6000-liner. Any advice is welcome!

Thanks again @jni Good to see you here :slight_smile:

1 Like

I don’t think there’s a right and wrong way here. Some Python packages go for the monofile, others go for lots of modularity. I think it’d be nice to somehow group related operations together, whether in directories or in files, but otherwise I tend to favour the modularity approach.
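For example, one pattern that avoids a hand-written 6000-liner while keeping one function (or group) per module is to collect the public callables of each operation module programmatically and re-export them at the package top level. A hypothetical sketch (the module and function names are made up):

```python
import types

def collect_public(module):
    """Return {name: obj} for all public callables defined in a module."""
    return {name: obj for name, obj in vars(module).items()
            if callable(obj) and not name.startswith("_")}

# Stand-in for a module such as clesperanto._arithmetic (names are made up).
fake_module = types.ModuleType("_arithmetic")
fake_module.add_image_and_scalar = lambda image, scalar: image + scalar

ops = collect_public(fake_module)
print(sorted(ops))  # ['add_image_and_scalar']
```

A package `__init__.py` could loop over its operation modules this way, so adding a new operation only means adding a new file, never editing a central list by hand.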

1 Like