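Linux memory usage problem

I was testing a vLLM deployment on CPU, where the amount of memory is configurable and vLLM really does occupy whatever you give it. On a machine with 64 GB of RAM, setting VLLM_CPU_KVCACHE_SPACE (the value is in GiB) to 30 or higher kept failing with the following error: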

Traceback (most recent call last):
  File "/opt/code/vllm_reranker_test.py", line 89, in <module>
    main()
  File "/opt/code/vllm_reranker_test.py", line 80, in main
    llm = get_llm()
          ^^^^^^^^^
  File "/opt/code/vllm_reranker_test.py", line 28, in get_llm
    return LLM(
           ^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/entrypoints/llm.py", line 272, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 492, in from_engine_args
    return engine_cls.from_vllm_config(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/llm_engine.py", line 127, in from_vllm_config
    return cls(vllm_config=vllm_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/llm_engine.py", line 104, in __init__
    self.engine_core = EngineCoreClient.make_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/core_client.py", line 80, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/core_client.py", line 600, in __init__
    super().__init__(
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/core_client.py", line 446, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/root/anaconda3/envs/py311/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/utils.py", line 733, in launch_core_engines
    wait_for_engine_startup(
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/vllm-0.10.2rc1+cpu-py3.11-linux-x86_64.egg/vllm/v1/engine/utils.py", line 786, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': -9}
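
The `-9` at the end of the traceback is worth decoding. The sketch below assumes the value follows the Python multiprocessing convention, where a negative exit code `-N` means the child was terminated by signal `N`. The helper name `describe_exit_code` is made up for this illustration and is not part of vLLM.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Explain a multiprocessing-style exit code.

    Assumption: a negative value -N means "terminated by signal N",
    which is how multiprocessing.Process.exitcode reports it.
    """
    if exit_code < 0:
        # Map the signal number back to its symbolic name.
        return f"killed by {signal.Signals(-exit_code).name}"
    return f"exited with status {exit_code}"


print(describe_exit_code(-9))
```

On Linux, signal 9 is SIGKILL. A process that receives SIGKILL cannot print its own error, which is consistent with there being no Python exception from the engine core itself. One common sender of SIGKILL is the kernel's OOM killer, though the traceback alone does not prove that.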

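At first I assumed the machine was short on resources, so I kept raising the setting and retrying, or freeing buff/cache memory. It only succeeded once I lowered the setting instead, and that was when I finally spotted the clue.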

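Watching memory with free -h, I saw that with less than half of the RAM in use, the system switched to consuming swap instead, and as soon as swap was exhausted the error appeared.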
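
For logging these numbers while the model loads, the same information that `free` shows can be read from `/proc/meminfo`. This is a minimal sketch, not what I ran at the time. The function names `parse_meminfo` and `swap_used_kb` are made up for this illustration.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kb(info: dict[str, int]) -> int:
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        print(f"MemAvailable: {info.get('MemAvailable', 0)} kB")
        print(f"Swap used:    {swap_used_kb(info)} kB")
```

Calling this in a loop with a short sleep gives a rough timeline of when swap starts to grow relative to available RAM.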

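Looking into it, I realized this was a gap in my knowledge, so here are some notes.

The kernel parameter swappiness influences how readily Linux uses swap.

For example:

  • swappiness=0 tells the kernel to keep data in physical memory as long as it can and to avoid swapping out.
  • swappiness=100 tells the kernel to treat swapping out anonymous memory as no more costly than dropping page cache, so data in memory is moved to swap much more readily. (Kernels from 5.8 onward accept values up to 200.)

Check the current value with: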

cat /proc/sys/vm/swappiness

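The value was 60, which is the Linux default. It is tempting to read this as "swap kicks in once 40% of total memory is in use", but swappiness is not a usage threshold. It is a weight that tells the kernel how strongly to prefer swapping out anonymous pages over dropping page cache when it has to reclaim memory. At 60 the kernel is fairly willing to swap under memory pressure, which fits what I observed, although the value alone does not determine the exact point at which swapping begins.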
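
The same check can be done from Python, for instance to log the setting before launching vLLM. This is a minimal sketch; `parse_swappiness` is a name invented for this illustration, and the accepted range of 0 to 200 assumes a kernel of 5.8 or newer (older kernels cap it at 100).

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of /proc/sys/vm/swappiness into an int.

    Assumption: valid values are 0..200 (kernels >= 5.8).
    """
    value = int(text.strip())
    if not 0 <= value <= 200:
        raise ValueError(f"unexpected swappiness value: {value}")
    return value


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():  # only present on Linux
        print(f"vm.swappiness = {parse_swappiness(SWAPPINESS_PATH.read_text())}")
```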


Title: Linux memory usage problem
Author: linrty
URL: https://blog.linrty.top/articles/2025/09/01/1756719842487.html
