Hello,
I probably got unlucky and hit a rare bug while running vllm with TP 8. The quark library raised the following error:
FileExistsError: [Errno 17] File exists: 'quark_logs'
It comes from quark/shares/utils/log.py:20:
    if not os.path.exists(debug_file_dir):
        os.makedirs(debug_file_dir)
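I think this check-then-create pattern is not safe under concurrency (a potential race condition).
Several threads or worker processes probably passed the if guard before any of them had created the folder, so every os.makedirs call after the first one failed with FileExistsError.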
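One possible fix, as a minimal sketch: drop the separate existence check and pass exist_ok=True to os.makedirs, so the call succeeds even if another worker created the directory first. The names ensure_log_dir and demo below are hypothetical and are not quark's actual code; the demo only calls the helper from several threads on a temporary path to show that concurrent calls no longer raise.

```python
import os
import tempfile
import threading


def ensure_log_dir(debug_file_dir):
    # Hypothetical helper, not quark's actual code.
    # exist_ok=True makes os.makedirs succeed when the directory already
    # exists, so no separate os.path.exists() check is needed.
    os.makedirs(debug_file_dir, exist_ok=True)
    return debug_file_dir


def demo(num_workers=8):
    # Call the helper from several threads at once and collect any errors.
    errors = []
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "quark_logs")

        def worker():
            try:
                ensure_log_dir(target)
            except OSError as exc:
                errors.append(exc)

        threads = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        created = os.path.isdir(target)
    return created, errors


if __name__ == "__main__":
    print(demo())
```

Note that exist_ok=True still raises FileExistsError if the path exists but is not a directory, which seems like the right behavior for a log folder.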