
Einstein@Home: OpenCL issues

OpenCL (in general)

  • Missing equivalent to memset() or cuMemset8()
    • Current workaround: custom OpenCL kernel
  • OpenCL lacks API call to query the amount of used/free global memory
    • This is used by BOINC (or even Condor in cluster environments) for task scheduling
    • This is used by developers during application design
    • Proposal: add this to clGetDeviceInfo() (e.g.: CL_DEVICE_GLOBAL_MEM_FREE or CL_DEVICE_GLOBAL_MEM_USED)
    • Note: this feature is available in CAL, but it seems to be buggy right now (David Anderson knows the details). It needs fixing as long as CAL is still used by BOINC projects (currently 6-8).
  • OpenCL lacks API call to query the version of the underlying GPU driver
    • This is used by BOINC (or even Condor in cluster environments) for task scheduling (min driver version required, e.g. because of a known driver bug)
    • This is used by developers during application design (min required driver because of SDK version used during build)
    • Proposal: add this to clGetDeviceInfo() (e.g.: CL_DEVICE_DRIVER_VERSION)
    • Alternatively this could be implemented as a vendor extension by NVIDIA and AMD
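
To illustrate why a scheduler needs both values, here is a minimal Python sketch of the kind of eligibility check BOINC-style scheduling could perform if the two proposed queries existed. Everything in it is hypothetical: DeviceInfo, TaskRequirements, parse_version and can_run are illustrative names, and the free_mem_bytes and driver_version fields merely stand in for what the proposed CL_DEVICE_GLOBAL_MEM_FREE and CL_DEVICE_DRIVER_VERSION queries might return. It also assumes the driver version is a dotted numeric string such as "11.11".

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn a dotted version string like '11.11' into a tuple of ints.

    Comparing tuples avoids the trap of comparing versions as floats or
    strings, where '11.2' would wrongly sort after '11.11'.
    """
    return tuple(int(part) for part in text.strip().split("."))


@dataclass
class DeviceInfo:
    # Hypothetical values; OpenCL offers no query for either (see list above).
    free_mem_bytes: int   # what CL_DEVICE_GLOBAL_MEM_FREE might report
    driver_version: str   # what CL_DEVICE_DRIVER_VERSION might report


@dataclass
class TaskRequirements:
    mem_bytes: int           # global memory the task will allocate
    min_driver_version: str  # oldest driver without the known bug


def can_run(task, device):
    """Return True if the device has enough free memory and a new enough driver."""
    if device.free_mem_bytes < task.mem_bytes:
        return False
    return parse_version(device.driver_version) >= parse_version(task.min_driver_version)
```

With such a check, a task that requires driver 11.3 or later would be withheld from a host reporting 11.2 and sent to one reporting 11.11, provided the device also has enough free memory.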

FFT library


  • OPEN: The OpenCL FFT code sample (link) fails on Mac OS X 10.6.x when using an AMD GPU (here: HD 4870).
    • Works when using an NVIDIA GPU
    • Works when using the 10.7 preview
    • Apple bug report ID: 9141177 (closed as won't fix)


  • OPEN: Lack of a viable FFT library
    • The available library supports Windows and Linux only (link) (also required: Mac OS X version)
    • The available library supports power-of-2 FFT sizes only (required: currently 1.5 * 2^23, can be factorized into: 2^22 * 3^1)
    • The available library supports complex-to-complex FFTs only (required: real-to-complex)
    • AMD's reply: will take stated requirements as feature requests for existing library
      • Latest news (clAmdFft 1.4):
        • Support for sizes that factor into powers of 2, 3 and 5
        • Support for OpenCL 1.0 devices
        • Still no support for R2C transforms
        • Still no support for Mac OS X
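
The size requirement can be checked with a few lines of arithmetic. The following Python sketch (factorize and is_supported are illustrative names, not part of any FFT library) splits a transform length into powers of a given set of radices and reports whether anything is left over. It shows that the required length fails a power-of-2-only restriction but passes once factors of 3 are allowed.

```python
def factorize(n, radices=(2, 3, 5)):
    """Split n into powers of the given radices.

    Returns (exponents, remainder): exponents maps each radix to its power,
    and remainder is whatever is left over (1 if n factors completely).
    """
    if n < 1:
        raise ValueError("FFT length must be a positive integer")
    exponents = {}
    for radix in radices:
        count = 0
        while n % radix == 0:
            n //= radix
            count += 1
        exponents[radix] = count
    return exponents, n


def is_supported(n, radices=(2, 3, 5)):
    """True if n is a product of powers of the given radices only."""
    _, remainder = factorize(n, radices)
    return remainder == 1


# The length required according to the list above: 1.5 * 2^23 = 3 * 2^22.
required_length = 3 * 2**22
```

Here is_supported(required_length, radices=(2,)) is False, while is_supported(required_length) with the default radices 2, 3 and 5 is True.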


AMD / NVIDIA / Apple

  • OPEN: Missing unique device identification when enumerating devices (i.e. GPUs) using CUDA and OpenCL
    • BOINC needs a way to distinguish/identify all devices on a given host in order to schedule tasks across them without over-booking available resources.
      • CUDA enumerates devices differently than OpenCL (with respect to device order)
      • OpenCL implementations might enumerate devices differently across vendors (with respect to device order)
      • Proposal: provide an API call that returns the (PCI) bus ID. For OpenCL this could be clGetDeviceInfo() (e.g.: CL_DEVICE_BUS_ID)
        • Potential caveat: how could this be done for devices like AMD's Fusion where CPU and GPU might not even have individual bus IDs?
    • There's a feature request by Rom Walton (BOINC developer) with the Khronos Group
      • Task for David: please ask Rom to include the proposed solution of using the (PCI) bus ID as a possible ordinal
    • AMD's reply:
      1. AMD is going to check if device enumeration order is consistent between CAL and AMD's OpenCL
      2. AMD will discuss the feature request (for returning bus ID) internally
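
As a rough sketch of how a bus ID would solve the ordering problem, the Python snippet below pairs up device indices from two enumerations that list the same GPUs in different orders. It assumes each API could report a bus ID per device, which is exactly what the proposal asks for. The function name map_by_bus_id and the bus ID strings in the usage note are made up for illustration.

```python
def map_by_bus_id(cuda_devices, opencl_devices):
    """Pair up CUDA and OpenCL device indices that refer to the same GPU.

    Each argument is a list of bus IDs in that API's enumeration order
    (list index = device index). Returns a dict mapping each bus ID to a
    (CUDA index, OpenCL index) pair, with None where an API does not list
    the device at all.
    """
    cuda_index = {bus: i for i, bus in enumerate(cuda_devices)}
    opencl_index = {bus: i for i, bus in enumerate(opencl_devices)}
    all_ids = sorted(set(cuda_index) | set(opencl_index))
    return {bus: (cuda_index.get(bus), opencl_index.get(bus)) for bus in all_ids}
```

For example, map_by_bus_id(["03:00.0", "04:00.0"], ["04:00.0", "03:00.0"]) yields {"03:00.0": (0, 1), "04:00.0": (1, 0)}, so a scheduler can tell that CUDA device 0 and OpenCL device 1 are the same GPU and avoid booking it twice. Devices without an individual bus ID (the Fusion caveat above) would need a different key.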

Drivers / SDK


  • SOLVED: The APPSDK (up to 2.3) doesn't follow standard naming conventions for shared libraries on Linux
    • On Linux AMD provides only. It's common to use a versioned filename and a symlink without a version so that apps can find the lib: ->
    • The problem is that if you compile an OpenCL app using NVIDIA's toolchain, the app will look for and won't find it on an AMD-based host. This should be fixed such that apps can be built with and run on any OpenCL implementation - the very idea behind an open vendor-independent standard like this.
    • I talked to Himanshu Gautam (, somehow related to AMD) about this and got a positive reply that this will be looked into
      • This got solved in the latest APPSDK 2.4 release
  • SOLVED: Current AMD drivers (up to 11.2) don't seem to contain the OpenCL runtime components (like on Linux)
    • This makes app deployment unnecessarily hard because users have to install the SDK (in addition to the driver) in order to run OpenCL apps (the driver alone should be sufficient)
    • I talked to Himanshu Gautam (, somehow related to AMD) about this and got a positive reply that this will be looked into
      • The latest SDK update mentions that this got solved: "Commencing with Catalyst 11.3, the AMD Accelerated Parallel Processing (APP) OpenCL runtime is included in the GPU drivers. More frequent updates to the run-time may be obtained by updating the drivers" (link)
      • AMD's reply:
        • This is solved on Windows but not yet on Linux (now fixed in 11.11)
        • AMD is going to announce an ETA for this being addressed in the Linux driver
  • OPEN (11.11): Current AMD drivers require a running X server on Linux in order to use the GPU.
    • This introduces unnecessary management overhead for large Linux GPU clusters, where hosts usually don't run an X server.
    • NVIDIA doesn't have this requirement
    • AMD's reply:
      • This is going to be addressed by AMD, ETA is going to be announced
      • Note: AMD's FireStream is the equivalent of NVIDIA's Tesla line (might be that this will be solved for the FireStream line only)
  • OPEN (11.11): Current Catalyst driver returns 0 when calling clGetDeviceInfo() with CL_DEVICE_MAX_CLOCK_FREQUENCY
    • Tested on Linux (Debian Lenny, x86_64)
    • Tested with HD 6970
  • OPEN (11.11): Linux driver installer crashes (double free() issue?) during install (fglrx build)
    • Workaround: export MALLOC_CHECK_=0
    • It's a known issue, search the web
    • Bug report sent to Mark Ireton

Related Topics: CategoryRadioPulsar

Topic revision: r29 - 13 Jul 2018, OliverBock