Issue

I tried to install apex and got the following error.

$ pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  (omitted)
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/home/users/projects/apex/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' --cpp_ext --cuda_ext install --record /tmp/pip-record-fw79azy1/install-record.txt --single-version-externally-managed --compile --install-headers /home/users/projects/translator/venv/include/site/python3.8/apex
  cwd: /home/users/projects/apex/
  Running setup.py install for apex ... error
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.


Solution

The build seems to fail because the CUDA version that apex sees is not configured correctly.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

The CUDA version installed on my machine is 11.6, but nvcc reports 10.1.
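A quick way to see where the stale version comes from is to check which nvcc the shell actually resolves and to pull the release number out of its output. This is only a minimal sketch: the helper name `cuda_release` is hypothetical, and it assumes the `release X.Y` line format shown in the nvcc output above.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and
# print only the release number (for example "10.1").
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Report which nvcc binary the shell would run, if one is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc resolves to: $(command -v nvcc)"
  echo "nvcc release:     $(nvcc --version | cuda_release)"
else
  echo "nvcc is not on PATH"
fi
```

If the reported path is not under the toolkit directory you expect, the shell is picking up a different nvcc first, which is consistent with the mismatch seen here.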

I remembered that setting the PATH correctly makes nvcc --version report the right version, so I tried that.

$ vi ~/.bashrc

Open .bashrc and add the following lines.

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
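The `${VAR:+:${VAR}}` form in these lines expands to a colon plus the old value only when the variable is already set and non-empty, so the new directory goes first without leaving a dangling separator. A small sketch of that behavior, where `prepend_dir` is a hypothetical helper used only for illustration:

```shell
# Hypothetical helper: print "$1" prepended to the colon-separated list "$2".
# ${2:+:$2} expands to ":$2" only when $2 is non-empty, otherwise to nothing.
prepend_dir() {
  printf '%s\n' "$1${2:+:$2}"
}

prepend_dir /usr/local/cuda/bin ""               # prints /usr/local/cuda/bin
prepend_dir /usr/local/cuda/bin "/usr/bin:/bin"  # prints /usr/local/cuda/bin:/usr/bin:/bin
```

Because the CUDA directory comes first, its nvcc is found before any other nvcc located later on the search path.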

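After reloading the shell (for example with source ~/.bashrc, or by opening a new terminal), check the nvcc version again: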

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0

Then run the apex install command again.

$ pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

(omitted)
running install_scripts
  writing list of installed files to '/tmp/pip-record-cntwtaqo/install-record.txt'
  Running setup.py install for apex ... done
Successfully installed apex-0.1

apex finally installed successfully!


I was installing apex because Megatron-LM needs it, and Megatron-LM installed fine afterwards as well.


References