# Clomp-Without-MP **Repository Path**: cputests/clomp-without-mp ## Basic Information - **Project Name**: Clomp-Without-MP - **Description**: 1111111111 - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2023-06-09 - **Last Updated**: 2024-12-31 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README CLOMP README (Updated for Version 1.2 23Dec2013) ------------------------------------------------ CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. A detailed description of CLOMP and how to build/run CLOMP is can be found under the CLOMP link at: https://asc.llnl.gov/CORAL-benchmarks/ Please see that web page for much more complete and recent information. New in Version 1.2: The serial section of CLOMP (calc_deposit()) now takes practically no time, allowing much larger number of parts and threads to be used. Performance results will be better than previous versions of CLOMP. CLOMP now can use MPI. By default, CLOMP does not use MPI but if compiled with -DWITH_MPI (referred to as CLOMP_MPI from now on), CLOMP_MPI will report the slowest run over all MPI tasks for each measurement. In our Hybrid MPI/OpenMP runs, the slowest task determines overall performance. CLOMP's new last line is comma-separated values requested for the CORAL RFP. CLOMP consists of just one source file, clomp.c, and should be built with your favorite OpenMP compiler and optimization levels. The Makefile supports just a few compilers by default (icc, bgq, gcc) and typing 'make icc', 'make bgq', or 'make gcc' with invoke those build lines. Feel free to add or tweak the Makefile and the CLOMP benchmark assumes the code will be highly optimized. The sample makefile will also attempt to build CLOMP_MPI. On most systems we have tried, at least one environment variable needs to be set in order to get good OpenMP/threading performance. Usually this parameter tells the threading library to spin wait at locks/barriers and to never yield the processor. Here are the minimum settings (the web page describes others) for two OS/compiler combinations to get you started: BG/Q/xlc: setenv BG_SMP_FAST_WAKEUP YES setenv OMP_WAIT_POLICY ACTIVE Linux/icc: setenv KMP_BLOCKTIME 10000 CLOMP is highly configurable (as described on the CLOMP web page) and a wide variety of runs can be done. Running CLOMP with around 6400 total zones with 16 threads is highly representative for many interesting applications when running large parallel simulations. Here is how to run CLOMP in what we consider one of the most important configurations to make run well: clomp 16 1 16 400 32 1 100 The above command runs CLOMP with the 16 threads allowed on the node, uses 16 zone partitions with 400 zones each, and uses the standard settings for the other parameters. See the web page or run CLOMP with no arguments for more details. On all current systems we have tested, OpenMP overheads usually make the above 6400 total zone run not scale well. We typically see less than a 4X speedup with 16 threads for the "Static OMP Speedup" result reported by CLOMP. Our hope is that threading overheads can be reduced so that so that near ideal speedups can be achieved for the 6400 zone case in the future. We know it is possible since BG/Q's Lightweight OpenMP Research Prototype compiler gets a 11.8X speedup with this input on BG/Q. The provided run_clomp.bgq example script that shows how CLOMP and CLOMP_MPI was run on BG/Q for the CORAL RFP. This script generates a comma delimited data file run_clomp.CORAL_RFP that can be loaded into the RFP spreadsheet. This script can be modified to run on other architectures if desired. /* CLOMP Benchmark to measure OpenMP overheads with typical usage situations. * Also measures cache effects and memory bandwidth affects on performance. * Provides memory layout options that may affect performance. * Removes earlier variations that didn't provide expected insight. * Version 1.2 Drastically reduces serial overhead for high part count * inputs, allowing reasonable messages with large thread counts. * Also added -DWITH_MPI to measure OpenMP overheads when run * with multiple MPI tasks per node, defaults to no MPI. * Reduced digits reported after decimal point for most measurments. * Last output line prints CORAL RFP data for easy importing. * Changes by John Gyllenhaal at LLNL 12/23/13 * Version 1.1 calculates error bounds (but not yet used) and dramatically * reduces chance of underflows in math. John Gyllenhaal 6/16/10 * Version 1 (first public release) of the CLOMP benchmark. * Written by John Gyllenhaal and Greg Bronevetsky at LLNL 5/25/07 * Based on version 0.2 (4/24/07) and version 0.1 (12/14/07) by John and Greg. ****************************************************************************** COPYRIGHT AND LICENSE Copyright (c) 2007, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory Written by John Gyllenhaal (gyllen@llnl.gov) and Greg Bronevetsky (bronevetsky1@llnl.gov) UCRL-CODE-233406. All rights reserved. This file is part of CLOMP. For details, see www.llnl.gov/asc/sequoia/benchmarks Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the disclaimer below. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the disclaimer (as noted below) in the documentation and/or other materials provided with the distribution. * Neither the name of the UC/LLNL nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE U.S. DEPARTMENT OF ENERGY OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ADDITIONAL BSD NOTICE 1. This notice is required to be provided under our contract with the U.S. Department of Energy (DOE). This work was produced at the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-ENG-48 with the DOE. 2. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. 3. Also, reference herein to any specific commercial products, process, or services by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes. ******************************************************************************/