add x86_64 optimization to Makefile #402
|
What's the rationale behind adding OpenMP et al.? |
|
Please provide rationale and performance comparison |
|
The reason for using OpenMP is platform-independent SIMD instruction set and compiler support. Compile the following arbitrary-length SIMD dot product for any OpenMP-supported target platform, using any supported compiler. Compile with CFLAGS="-march=native -ftree-vectorize -fopenmp -fopenmp-simd -std=std2y", adding -O0 to -O3, and view the assembler generated with -S for different target platforms and optimization levels:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include "q_shared.h"
#ifdef __clang__
#define addr(a) (__typeof__((a)[0])*) __builtin_addressof((a))
#else
#define addr(a) &(a)[0]
#endif
/* dup - non-parallel memcpy supporting array and vector types */
#define dup(n,dst,src) \
({ \
_Pragma("omp simd") \
for(size_t i = 0; i < n; i++) \
(dst)[i] = (src)[i]; \
})
/* dot - non-parallel scalar product supporting array and vector types */
#define dot(n,a,b,i,t) \
({ \
t dst = (t)i; \
_Pragma("omp simd reduction(+:dst)") \
for(size_t j = 0; j < n; j++) \
dst += (t)(a)[j] * (t)(b)[j]; \
dst; \
})
int main(int argc, char** argv)
{
evec(4,vec_t) a = { 0, 1, 2, 3 };
vec_t b[3];
dup(3,b,a);
vec_t c = dot(3,a,b,0,vec_t);
printf("%f\n", c);
exit(EXIT_SUCCESS);
} |
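For readers without the PR's q_shared.h at hand, the same idea can be sketched with plain functions. This is a minimal illustration under stated assumptions, not the PR's code: dot_simd and dup_simd are hypothetical names, and the element type is fixed to float instead of vec_t or evec(N,T).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PR: dot product of two float arrays.
 * The pragma asks the compiler to vectorize the loop and to treat `sum`
 * as a reduction variable. A compiler that does not handle OpenMP pragmas
 * ignores it and the loop stays a plain scalar loop. */
static float dot_simd(size_t n, const float *a, const float *b)
{
    float sum = 0.0f;
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical helper: element-wise copy, the function form of dup(). */
static void dup_simd(size_t n, float *dst, const float *src)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

With a = { 0, 1, 2, 3 } and b filled by dup_simd(3, b, a), dot_simd(3, a, b) evaluates to 0*0 + 1*1 + 2*2 = 5, the same inputs as the sample above. Functions are used here instead of macros only to keep the sketch free of GNU statement expressions.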
|
Did you try compiling the code from the PR? |
Adding target-platform-dependent SIMD restrictions to the RELEASE_CFLAGS is recommended, e.g. -mno-avx512f |
You're just changing the ceiling, whose floor is at SSE2 for x86_64; the same goes for non-x86 architectures. Please fix the compilation errors first, and do not submit non-working code here |
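Which ceiling a given set of flags selects can be checked from the compiler's predefined macros. The sketch below assumes GCC- or Clang-style macros such as __SSE2__ and __AVX2__ (MSVC defines only some of them); simd_ceiling is a hypothetical helper, not code from the PR.

```c
/* Returns the name of the highest x86 SIMD level the compiler was allowed
 * to use for this translation unit, judged from predefined macros.
 * Assumption: GCC/Clang macro names; _M_X64 covers MSVC x64 builds,
 * where SSE2 is always available. */
static const char *simd_ceiling(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__) || defined(_M_X64)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

On a typical x86_64 toolchain with default flags this is expected to report SSE2, while -march=native raises the result to whatever the build host supports.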
|
ioquake/ioq3#869 You can try compiling with a local build before the merge
Only .gitignore is usable at these links. |
|
The workflow should complete now that the #warning has been removed |
|
It breaks the VC2005 build as well, which is not covered by GitHub Actions though. Also, could you measure the actual performance difference? |
|
I've checked in-game performance with an Intel UHD Graphics 630 and an NVIDIA GeForce RTX 2060/PCIe/SSE2 card, using the opengl1 renderer and map Q3DM1. Intel/Nvidia framerate comparison:

seta r_fbo 1
seta r_vbo 1
seta r_bloom 1
seta r_hdr 1
seta r_ext_supersample 1
seta r_ext_multisample 8

seta r_fbo 0
seta r_vbo 0
seta r_bloom 0
seta r_hdr 0
seta r_ext_supersample 0
seta r_ext_multisample 0

seta r_fbo 1
seta r_vbo 1
seta r_bloom 0
seta r_hdr 0
seta r_ext_supersample 0
seta r_ext_multisample 0

I get +20 when using the aligned one. |
Please submit and post a link to the error log. |
|
Please test changes on your own GitHub Actions / compilers first, I will not approve test runs anymore |
+20 from what base, what those |
|
FYI, OpenMP support is excluded from the 2005 and 2008 Express editions, and may need to be an extra download with the regular ones too. |
|
Yes, Visual Studio 2005/2008 Express Editions need the extra download, while the Standard Editions include it. |
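Since OpenMP availability differs between toolchains, code can test for it at compile time instead of assuming it. The sketch below relies on the standard _OPENMP macro, which an OpenMP-enabled compiler defines as a yyyymm version date; openmp_version and sum_floats are hypothetical helpers, not code from the PR.

```c
#include <stddef.h>

/* Returns the OpenMP version date (yyyymm) the compiler advertises, or 0
 * when this translation unit was built without OpenMP enabled. */
static long openmp_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Hypothetical helper: sums n floats. The simd pragma is emitted only when
 * the compiler reports OpenMP 4.0 (201307) or later, the version that
 * introduced `omp simd`; otherwise the loop is compiled as ordinary C. */
static float sum_floats(size_t n, const float *v)
{
    float total = 0.0f;
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(+:total)
#endif
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Guarding the pragma this way keeps the source compiling on toolchains that ship without OpenMP or with an older OpenMP level, at the cost of falling back to whatever auto-vectorization the compiler does on its own.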
|
Please approve another workflow run. |
You haven't even launched actions in your GitHub repository once in all this time... |
|
|
#if defined( _MSC_VER ) && _MSC_VER >= 1400 // MSVC++ 8.0 at least
#define OS_STRING "win_msvc"
#define ID_INLINE __forceinline __flatten inline
Fixed misspelled MSVC syntax here!
Wow. Now start building it in your repo and check it.
No, there are no action workflow builds possible in a fork.
None of the scripts are set up.
Ah, I see the run workflow now.
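The ID_INLINE line reviewed in this thread combines an MSVC keyword with other specifiers in a single definition. One common way to choose a force-inline spelling per compiler is sketched below. EX_INLINE and ex_add are hypothetical names, and this is not the definition used by the PR or by the project.

```c
/* Hypothetical force-inline macro, selected per compiler.
 * MSVC spells it __forceinline; GCC and Clang use the always_inline
 * attribute; anything else falls back to a plain inline hint. */
#if defined(_MSC_VER)
#define EX_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define EX_INLINE static inline __attribute__((always_inline))
#else
#define EX_INLINE static inline
#endif

/* Example use: a tiny function that should always be inlined. */
EX_INLINE int ex_add(int a, int b)
{
    return a + b;
}
```

Keeping each compiler's spelling in its own branch avoids handing MSVC tokens it does not understand, which appears to be the kind of breakage the VC2005 report points at.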
- ensure ID_INLINE inlines everything
- add vec(4,byte) and reindent not dependent on ts
- inline ColorBytes and move to q_shared.h
- add avec(N,T) and use it for avec3_t/avec5_t
- add clang support
- add evec(N,T) SIMD vector extension type
- use __typeof__ instead of typeof
|
Sorry, that's enough |
- `-march=native` if not cross-compiling
- `-mfpmath=sse` if target is `x86_64`
- `-O3 -ftree-vectorize -fopenmp -fopenmp-simd` to `OPTIMIZE`
- `-fdata-sections -ffunction-sections` to `RELEASE_CFLAGS`