Skip to content

add x86_64 optimization to Makefile#402

Closed
joneqdaniel wants to merge 3 commits into
ec-:mainfrom
joneqdaniel:main
Closed

add x86_64 optimization to Makefile#402
joneqdaniel wants to merge 3 commits into
ec-:mainfrom
joneqdaniel:main

Conversation

@joneqdaniel

@joneqdaniel joneqdaniel commented May 19, 2026

Copy link
Copy Markdown
  • use -march=native if not cross-compiling
  • use -mfpmath=sse if target is x86_64
  • add -O3 -ftree-vectorize -fopenmp -fopenmp-simd to OPTIMIZE
  • add -fdata-sections -ffunction-sections to RELEASE_CFLAGS

@ensiform

Copy link
Copy Markdown
Contributor

What's the rationale behind adding openmp et all?

@ec-

ec- commented May 20, 2026

Copy link
Copy Markdown
Owner

Please provide rationale and performance comparison

@joneqdaniel

joneqdaniel commented May 20, 2026

Copy link
Copy Markdown
Author

The reason for using OpenMP is platform independent SIMD instruction set and compiler support.
OpenMP is the only platform independent SIMD that is supported by MSVC compilers.

Compile the following arbitrary length SIMD dot product on any OpenMP supported target platform using any compiler supported.

Compile with CFLAGS="-march=native -ftree-vectorize -fopenmp -fopenmp-simd -std=std2y" adding -O0 -to -O3 and view the assembler generated with -S for different target platforms and optimization levels"

#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#include "q_shared.h"

#ifdef __clang__
#define addr(a) (__typeof__((a)[0])*) __builtin_addressof((a))
#else
#define addr(a) &(a)[0]
#endif

/* dup - non-parallel memcpy supporting array and vector types */
#define dup(n,dst,src) \
({ \
    _Pragma("omp simd") \
    for(size_t i = 0; i < n; i++) \
        (dst)[i] = (src)[i]; \
})

/* dot - non-parallel scalar product supporting array and vector types */
#define dot(n,a,b,i,t) \
({ \
    t dst = (t)i; \
    _Pragma("omp simd reduction(+:dst)") \
    for(size_t j = 0; j < n; j++) \
        dst += (t)(a)[j] * (t)(b)[j]; \
    dst; \
})

int main(int argc, char** argv)
{
        evec(4,vec_t) a = { 0, 1, 2, 3 };
        vec_t b[3];
        dup(3,b,a);
        vec_t c = dot(3,a,b,0,vec_t);
        printf("%f\n", c);
        exit(EXIT_SUCCESS);
}

@ec-

ec- commented May 20, 2026

Copy link
Copy Markdown
Owner

Did you tried compiling code from PR?
Not counting compilation errors, passing -march=native is simply not acceptable because it will make release binaries incompatible for A LOT of systems if compiled with AVX512 on a host for example.
Also I'd like to see performance measurements for an actual Quake3e code

@joneqdaniel

joneqdaniel commented May 20, 2026

Copy link
Copy Markdown
Author

passing -march=native is simply not acceptable because it will make release binaries incompatible for A LOT of systems if compiled with AVX512 on a host for example.

Adding target platform dependent SIMD restrictions to the RELEASE_CFLAGS is recommended, e.g. -mno-avx512

@ec-

ec- commented May 20, 2026

Copy link
Copy Markdown
Owner

Adding target platform dependent SIMD restrictions to the RELEASE_CFLAGS is recommended, e.g. -mno-avx512

You're just changing ceiling which bottom is at SSE2 for x86_64, same for non-x86 architectures.

Please fix compilation errors in a first order, do not submit non-working code here

@KG7x

KG7x commented May 20, 2026

Copy link
Copy Markdown
Contributor

ioquake/ioq3#869
ioquake/ioq3#870

Y can try compile use build in local version before merge

@joneqdaniel

joneqdaniel commented May 20, 2026

Copy link
Copy Markdown
Author

Y can try compile use build in local version before merge

only .gitignore is usable at these links.

@joneqdaniel

Copy link
Copy Markdown
Author

workflow should complete now with #warning being removed

@ec-

ec- commented May 20, 2026

Copy link
Copy Markdown
Owner

It breaks VC2005 build as well, not present in GitHub Actions though.

Also could you measure actual performance difference?

@joneqdaniel

joneqdaniel commented May 20, 2026

Copy link
Copy Markdown
Author

I've checked in game performance with Intel UHD Graphics 630 and NVIDIA GeForce RTX 2060/PCIe/SSE2 card using opengl1 renderer and map Q3DM1.

Intel/Nvidia framerate compare

  • High: 23/140
seta r_fbo 1
seta r_vbo 1
seta r_bloom 1
seta r_hdr 1
seta r_ext_supersample 1
seta r_ext_multisample 8 
  • Low: 500/200
seta r_fbo 0
seta r_vbo 0
seta r_bloom 0
seta r_hdr 0
seta r_ext_supersample 0
seta r_ext_multisample 0
  • FBO/VBO only: 300/200
seta r_fbo 1
seta r_vbo 1
seta r_bloom 0
seta r_hdr 0
seta r_ext_supersample 0
seta r_ext_multisample 0

I'll get +20 with using the aligned one.
I still need to compare the vector type differences and write a profiling benchmark.

@joneqdaniel

Copy link
Copy Markdown
Author

It breaks VC2005 build as well

Please submit and post a link to the error log.
Could you please trigger workflow run again. So I am able to check that MSVC compiles except VC2005.

@ec-

ec- commented May 20, 2026

Copy link
Copy Markdown
Owner

Please test changes on your own GitHub Actions / compilers first, I will not approve test runs anymore

@ec-

ec- commented May 20, 2026

Copy link
Copy Markdown
Owner

I'll get +20 with using the aligned one.

+20 from what base, what those High: 23/140 numbers means?
Did you measured difference that came with -O3 -mfpmath=sse + forced inlining alone?

@ensiform

Copy link
Copy Markdown
Contributor

Fyi openmp support is excluded in 2005 and 2008 express editions, may need to be extra download with regular too.

@joneqdaniel

Copy link
Copy Markdown
Author

Yes, Visual Studio 2005/2008 Express Editions need the extra download, while the Standard Editions include it.

@joneqdaniel

Copy link
Copy Markdown
Author

Please approve another workflow run.
I'll look into coding a benchmark example soon.

@KG7x

KG7x commented May 21, 2026

Copy link
Copy Markdown
Contributor

Please approve another workflow run. I'll look into coding a benchmark example soon.

You havent even launched actions in your GitHub repository once in all this time...
How y can сonfirm that you fixed this and there will be no warnings or errors?

Comment thread code/qcommon/q_platform.h Outdated

#if defined( _MSC_VER ) && _MSC_VER >= 1400 // MSVC++ 8.0 at least
#define OS_STRING "win_msvc"
#define ID_INLINE __forceinline __flatten inline

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed mispelled MSVC syntax here!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow now start build it in you repo and check it

@joneqdaniel joneqdaniel May 21, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there are no action workflow builds possible in a fork.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the scripts are not setup.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see the run workflow now.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you.

lndpj added 2 commits May 21, 2026 18:44
- assure ID_INLINE inlines everything
- add vec(4,byte) and reindent not dependent on ts
- inline ColorBytes and move to q_shared.h
- add avec(N,T) and use it for avec3_t/avec5_t
- add clang support
- add evec(N,T) SIMD vector extension type
- use __typeof__ instead of typeof
@ec-

ec- commented May 21, 2026

Copy link
Copy Markdown
Owner

Sorry, that's enough

@ec- ec- closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants