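Background: our lab's compute cluster runs the ROCKS cluster management software on CentOS, with TORQUE as the PBS implementation.
Problem: after I submit a job, it stays in the Q (queued) state indefinitely, waiting to be scheduled.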
[bgb@cluster test]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
25.cluster                0.01-0.00001     bgb                    0 Q default
I then tried to force the job to run:
[bgb@cluster test]$ qrun 25.cluster
pbs_iff: Access from host not allowed, or unknown host MSG=request not authorized from host cluster.local
pbs_iff: Access from host not allowed, or unknown host MSG=request not authorized from host cluster.local
qrun: Unknown Job Id MSG=cannot locate job 25.cluster.local
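I looked up pbs_iff: it is apparently related to user authentication and supplies PBS credentials to the pbs server. However, opening pbs_iff with vi shows nothing but garbled characters.
I have spent a whole day on this and still cannot work out the cause.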
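One thing that may be worth checking (a guess, not a confirmed cause): the error rejects requests from host cluster.local, while the server's acl_hosts entry in the qmgr -c 'p s' output further down in this post names a different host. The sketch below reads a saved copy of that output and reports whether a given host name is listed. The function name host_allowed and the file name live.conf are made up for illustration, and the check is a plain string match that ignores any wildcard entries.

```shell
# host_allowed DUMP_FILE HOST   (hypothetical helper, not part of TORQUE)
# Exit status 0: host ACLs are disabled, or HOST is listed in acl_hosts.
# Exit status 1: host ACLs are enabled and HOST is not listed.
host_allowed() {
    awk -v host="$2" '
        # Remember whether the server-level host ACL is switched on.
        $1 == "set" && $2 == "server" && $3 == "acl_host_enable" { enabled = ($NF == "True") }
        # Matches both "acl_hosts = name" and "acl_hosts += name" lines.
        $1 == "set" && $2 == "server" && $3 == "acl_hosts" && $NF == host { listed = 1 }
        END { if (enabled && !listed) exit 1 }
    ' "$1"
}

# Example, run on the head node:
#   qmgr -c 'p s' > live.conf
#   host_allowed live.conf cluster.local || echo "cluster.local is not in acl_hosts"
```

If the host does turn out to be missing, adding it as root with qmgr -c 'set server acl_hosts += cluster.local' might be worth trying, though whether that resolves the pbs_iff error here is an assumption.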
I also have a question about PBS queue configuration:
I found a file named pbs.default under /opt/troque (default is a queue I defined myself). Its contents are:
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default keep_completed = 120
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server managers = [email protected]
set server managers += [email protected]
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server allow_node_submit = True
set server moab_array_compatible = True
This differs from the configuration reported by the qmgr -c 'p s' command:
[bgb@cluster test]$ qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default acl_host_enable = True
set queue default acl_user_enable = True
set queue default acl_users = bgb
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = cluster.hpc.org
set server acl_users = root@*
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server poll_jobs = True
set server mom_job_sync = True
set server auto_node_np = True
set server next_job_number = 26
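So which of the two configurations is actually in effect? I wrote part of the second one myself, so it probably contains quite a few mistakes; I would be grateful if anyone could point them out.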
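For reference, qmgr -c 'p s' prints the settings held by the running pbs_server, whereas a file such as pbs.default would only matter if it were fed to qmgr at some point (how ROCKS uses that file is not confirmed here). To see exactly where the two differ, a minimal sketch along these lines could help. The function name config_diff and the file name live.conf are hypothetical.

```shell
# config_diff FILE_A FILE_B   (hypothetical helper)
# Prints the directives that appear in only one of the two files:
# lines unique to FILE_A start at column 1, lines unique to FILE_B are tab-indented.
config_diff() {
    cd_a=$(mktemp)
    cd_b=$(mktemp)
    # Drop comment lines and blank lines, then sort so comm can compare.
    grep -v '^#' "$1" | grep -v '^[[:space:]]*$' | LC_ALL=C sort > "$cd_a"
    grep -v '^#' "$2" | grep -v '^[[:space:]]*$' | LC_ALL=C sort > "$cd_b"
    # comm -3 suppresses the lines common to both inputs.
    LC_ALL=C comm -3 "$cd_a" "$cd_b"
    rm -f "$cd_a" "$cd_b"
}

# Example, run on the head node:
#   qmgr -c 'p s' > live.conf
#   config_diff /opt/troque/pbs.default live.conf
```

Sorting first means the comparison ignores the order of directives, which suits a list of independent set statements like these.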
Thanks.
