RIKEN CCJ (PHENIX Computing Center in Japan)

RIKEN CCJ Users Guide (2016.01.01)

  1. Overview
    The PHENIX Computing Center in Japan (RIKEN CC-J) is located at the RIKEN Wako campus and is intended as a principal site of computing for PHENIX simulation, a regional PHENIX computing center especially for Japanese and Asian collaborators, and a center for the analysis of RHIC spin physics. CC-J is operated by the local Planning and Coordination Office (PCO), headed by Takashi Ichihara. In addition, the CC-J Working Group has been organized by PHENIX-J members to prepare and construct CC-J. Because CC-J is still in its three-year construction phase, users are encouraged to keep in close contact with these people. CC-J has its own web site; please visit http://ccjsun.riken.go.jp/ccj/.
  2. How to start
    1. Get an account.
    2. Log in to ccjgw.riken.go.jp or ccjsun.riken.go.jp from the WAN, and from there log in to linux1/2/3/4/5, the Linux nodes assigned for interactive work (editing, compiling, testing, submitting jobs, etc.). ccjgw and ccjsun run Linux. linux4 and linux5 run Scientific Linux 5.3, the same as PHENIX/RCF; for compatibility, linux1 runs SL3 and linux2 and linux3 run SL4.4. Since 2007/09/01, ssh public-key authentication has been required to log in to the login servers.
    3. Please use ccjsun.riken.go.jp to transfer your files to/from other sites, because the Linux farm cannot be seen from the WAN. "scp", "ssh-protected ftp" and "bbftp" can be used to/from rftpexp.rcf.bnl.gov; on rcas20??, ccjsun can be reached using scp. The bbftp performance achieved at CCJ can be found here. The 'bbftp' command should be run on the CCJ side, not on rftpexp or rcas at RCF. If you need to transfer more than 50 GB of files, please consult phenix-ccj-admin -at- ribf.riken.go.jp.
    4. Build binaries for the Linux Farm on the interactive hosts above. The PHENIX software environment is also built here using AFS.
    5. Users must use LSF to submit large-scale jobs that use many CPUs. A sample LSF submission script is here: http://ccjsun.riken.go.jp/ccj/doc/LSF/Primer.html.
    6. The large-scale HPSS storage at CCJ was removed in 2015/Mar. User data and DSTs are archived on the new HSM in HOKUSAI, operated by RIKEN ACCC (the IT division), and can be accessed using 'scp'. The directory is /arc/CCJ/(users)/. To access the disk, a user account in HOKUSAI is required. Account information was sent to most users on 2015/5/25 by the RIKEN IT division; if you have lost it, please consult phenix-ccj-admin -at- ribf.riken.go.jp.
    7. The temporary disk on each Linux node (/job_tmp) is used as a read/write buffer for jobs on that node. Users have to clear the disk at the end of each job running on the node; any file found after a job may be removed by the System Manager. Please use the "/opt/ccj/bin/rcpx" command instead of rcp to upload/download large files between the local disk and non-local disks. It limits the maximum number of simultaneous disk accesses and prevents the NFS server from being crowded out.
    8. The backup policy is here. Users are recommended to back up their own important source files themselves via the network.
    9. Tutorials
    10. local software
    11. technical local rules/tips (in Japanese)
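    The login and file-transfer steps above (items 2 and 3) can be sketched as a short shell session. Hostnames are taken from this guide; the username "myuser" and the file name are placeholders. Since the commands contact remote hosts, the sketch only prints them rather than executing them.

```shell
# Typical CCJ session sketch. Hostnames are from this guide; "myuser" and
# the file name are placeholders. The commands contact remote hosts, so
# they are printed here, not executed.
SESSION='
ssh myuser@ccjsun.riken.go.jp              # gateway login (ssh public-key auth required)
ssh linux5                                 # hop to an interactive node: edit, compile, submit
scp myuser@rftpexp.rcf.bnl.gov:dst.root .  # file transfer must go through ccjsun
'
printf '%s' "$SESSION"
```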
  3. System Configuration
    The system configuration is summarized in this figure. The key components are:
    • Login Server (ccjgw.riken.go.jp, ccjsun.riken.go.jp)
    • Linux CPU Farm (28 nodes)
    • Disk Server (ccjnfs16, ccjnfs17, ccjnfs30)
    • Network

    1. Login Server (ccjgw.riken.go.jp and ccjsun.riken.go.jp)
      Users have to log in to these machines from the WAN, because the Linux Farm has only private IP addresses. Users can receive e-mail at ccjsun, especially from the mailing lists for user support, but Japanese input and TeX are not supported.
    2. Linux CPU Farm
      The Linux CPU Farm consists of 18 dual quad-core Xeon nodes (ap001-ap018) and 10 dual hexa-core nodes (ap019-ap028), 264 cores in total. The OS has been Scientific Linux 5.3 since April 2010. The nodes are in private IP address space and can be reached only via ccjgw/ccjsun. Each node has local disk space, which should be used as a temporary work area for each job and must be cleaned after each job. Currently five nodes are assigned for interactive use and aliased linux1-5, while the others are for batch jobs, which users should submit via LSF. The interactive nodes are subject to change, so users are recommended to use the aliases linux1-5.
    3. Disk Server
      The Disk Servers ccjnfs16/17/30 are NFS servers for /ccj/u and /ccj/w, and ccjnfs30 is also the NIS server. Users cannot log in to these servers, except for large-scale data transfer.
    4. Network
      The Disk Servers, login servers, and computation nodes are connected with Gigabit Ethernet through a Gigabit Ethernet switch. Super-SINET/SINET4 (10 Gbps) connects the Internet and RIKEN-NET. RIKEN-NET and CCJ (disk servers and CPUs), as well as CCJ and HOKUSAI (dedicated CPUs), are connected with 10-Gigabit fibers.
    5. HOKUSAI
      HOKUSAI is a computing system operated by the RIKEN IT division. Ten nodes, mpc2001-mpc2010, are dedicated to CCJ use, and the PHENIX environment is shared by the mpc cluster. Because the NIS domains are separate, users have to submit another application form to log in to the mpc cluster; please consult phenix-ccj-admin -at- ribf.riken.go.jp to get the account. After the application is processed by the RIKEN IT division, you can use the same login name and the same home directory as on CCJ on the mpc cluster. mpc2001 is the interactive node of the cluster; you can log in to it from ccjgw or ccjsun, as for linux1-5. If you want to submit batch jobs to the mpc cluster, you have to submit them from mpc2001. Condor is used as the batch queueing system on the cluster. The archive system is also served by HOKUSAI, replacing the old HPSS.
    See also http://ccjsun.riken.go.jp/ccj/Wako-Sys/ for more detailed information.
  4. User Accounts
    Accounts are issued to the persons responsible for large-scale computing projects authorized by the PHENIX Physics Working Groups (PWGs). In principle, at most 3 accounts are allowed for each PWG. Users have to fill out the account request form and e-mail it to phenix-ccj-admin -at- ribf.riken.go.jp.
    A CC-J user should also hold a visiting position at RIKEN. If you don't have any position at RIKEN, please read here (Japanese || English) and fill out another form.
    When you lose your RIKEN position, your account is subject to suspension. In such a case, please consult us about access to your account.
    For ssh public-key login, the fingerprint of your key should be submitted with the account request form. If you have a web page at RCF, the public key itself should be placed in your WWW/p/draft-region and the URL should be notified to us. If you don't have an account at RCF, please consult us.
    The initial disk quota is 4 GB/5 GB (soft/hard limit) on /ccj/u/ (the user home region) and 40 GB/50 GB on /ccj/w/r01 (the work region), both served by ccjnfs30. Users can check their own quotas with the command 'quota -v' on any node, via xfs. When a project is finished, the account may be suspended until the next project starts.
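    Usage can be checked against these quotas with a small script like the following sketch. 'quota -v' is the command given above; the 'du' fallback and the guard are illustrative additions so the script also runs on hosts without quotas configured.

```shell
#!/bin/sh
# Report disk usage against the /ccj/u and /ccj/w quotas. 'quota -v' is the
# command from this guide; the guard and 'du' fallback are illustrative so
# the sketch also runs on hosts without quotas configured.
if command -v quota >/dev/null 2>&1; then
    quota -v || true
else
    echo "quota command not available; showing home usage instead"
    du -sh "${HOME:-.}" 2>/dev/null || true
fi
```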
    Storage usage guidelines:
    • /ccj/u/ and /ccj/w/ are quota-limited.
    • /job_tmp on each Linux node can be used as a temporary working area. Users should make their own directory, named after their username, under /job_tmp on each node and use it. Temporary files under that directory should be removed by the end of each job on the node.
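    The /job_tmp rules above can be sketched as the skeleton of a job script. The /job_tmp path and the clean-up requirement are from this guide; the stand-in directory and file names are illustrative so the sketch runs anywhere.

```shell
#!/bin/sh
# Skeleton of a job honoring the /job_tmp rules in this guide. On a CCJ node
# JOB_TMP would be /job_tmp; a stand-in directory is used here so the sketch
# can run anywhere.
JOB_TMP=${JOB_TMP:-$(mktemp -d)}
WORKDIR="$JOB_TMP/${USER:-myuser}"   # own directory named after the username

mkdir -p "$WORKDIR"
echo "intermediate output" > "$WORKDIR/stage1.dat"
# ... the actual job would read and write only under $WORKDIR ...

rm -rf "$WORKDIR"                    # mandatory clean-up at the end of the job
```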
  5. Proposal of large scale computing project
    CCJ-PCO expects that proposals for computing projects using the CCJ computing resources are approved in, and submitted from, the Physics Working Groups in PHENIX. The proposals are submitted to CCJ-PCO from the Physics Working Groups either directly or through the PHENIX simulation coordinator. CCJ-PCO allocates the available computing resources to each project. If necessary, job priorities are defined by CCJ-PCO, reflecting the PHENIX decisions made at the collaboration meeting. CCJ-PCO reports regularly to the PHENIX collaboration on the status of the computing projects, including the allocation of computing resources, priorities, and completion. A conflict among projects that is beyond the coordination of CCJ-PCO is referred to the PHENIX spokesperson and Executive Council.
    The template of the proposal is here: http://ccjsun.riken.go.jp/ccj/forms/app.html . Fill it out and send it by e-mail to ccj-pco -at- ribf.riken.go.jp.
  6. Batch Queues (LSF)
    LSF (Load Sharing Facility), which is also used at RCF, is the batch system at CCJ.
    Many queues (short, long, bg, etc.) are currently configured in CCJ. See http://ccjsun.riken.go.jp/ccj/doc/LSF/index.html.
    For PWG work, additional queues for each group will be created so that CCJ-PCO can control job priorities.
    Submit jobs from the interactive nodes linux1-5; the jobs run on ap001-028.
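    A minimal LSF job script along these lines might look as follows. The queue name "short" is from the queue list above, while the job name and output file are placeholders. On CCJ it would be submitted from linux1-5 with "bsub < myjob.sh".

```shell
#!/bin/sh
# Minimal LSF job script sketch. The queue name "short" is from this guide's
# queue list; the job name and output file are placeholders.
#BSUB -q short            # queue to run in
#BSUB -J mysim            # job name
#BSUB -o mysim.%J.out     # stdout file; %J is the LSF job ID

# Job body: on CCJ, stage input to /job_tmp, run, copy results back,
# and clean /job_tmp at the end.
echo "running on $(uname -n)"
```

Since the #BSUB directives are shell comments, the body also runs stand-alone for testing before submission.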
  7. PHENIX software environment
    The PHENIX Linux software environment at CC-J is very similar to that of RCF: you will find the "/afs/rhic/phenix" and "/opt/phenix" directories in your Linux environment.
    -----> If you are an experienced RCF/PHENIX user, this is enough to get started.

    In order to get the standard PHENIX environment, DO source /opt/phenix/bin/phenix_setup.csh. However, there is one point to remember: the CC-J Linux nodes do not use AFS, neither Transarc's original AFS nor OpenAFS. Instead, the directories "/afs/rhic/...", "/opt/phenix/..." and "/cern" are served via NFS, and the "sys name" (@sys) is resolved with symbolic links at every switchyard.
    No klog is necessary for a CVS checkout. Of course, a CVS checkin makes no sense here, because it will not be reflected on the real AFS server at RCF.

    -----> If you are a PHENIX software beginner, follow the very basic tutorial, To BUILD and RUN PISA99 (the PHENIX version of GEANT3)!!

    Objectivity/DB is not supported yet. A copy of the PHENIX PostgreSQL database is operated here. See http://ccjsun.riken.go.jp/ccj/doc/phenix-data/phenix-db/index.html
  8. PHENIX data file storage
    When you transfer PHENIX data files (raw data or DSTs) from RCF to CC-J, please consult phenix-ccj-admin -at- ribf.riken.go.jp. We assign some disk space for public use, and a part of the nDSTs is available on these disks. See also the page: PHENIX DATA Location at CCJ
  9. Backup policy
    /ccj/u and /ccj/w are on RAID disks, while /job_tmp on each Linux node is non-RAID. /ccj/w and /job_tmp are NOT backed up. /ccj/u is 'rsync'ed to ccjnfs16:/ccj/sys-bkp/ccj-u-bkp once a day, so users can retrieve their own data from there if they notice soon after removing it. /ccj/u is also backed up so that it can be recovered after a disk crash; however, restore requests from users who have lost files will NOT be approved.
  10. System Monitoring tool
  11. mailing lists and announce/information for users
    Users should subscribe to the mailing list ccj-users. Notices concerning shutdowns and other information are announced on this list, and users can report any CC-J troubles to it. Use English.
    PHENIX-J members are recommended to subscribe to the mailing list ccj-users-j, which is for discussion of CC-J as a regional center. Use Japanese.
    The mailing list ccj-pco is for administrative issues.
    Users (PHENIX-J members) will be added to the list ccj-users (and ccj-users-j) when they get their accounts. Mail will be delivered to your CCJ account; please set up your .forward if you want to receive mail on your own mail server.
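    Setting up the forwarding mentioned above can be sketched as follows. The ~/.forward mechanism is the standard one; the address is a placeholder, and a stand-in directory substitutes for the real home directory so the sketch runs anywhere.

```shell
#!/bin/sh
# Forward mail from your CCJ account to another address via ~/.forward.
# The address is a placeholder; a stand-in directory substitutes for the
# real home directory on ccjsun so this runs anywhere.
HOME_DIR=${HOME_DIR:-$(mktemp -d)}    # on ccjsun this would be $HOME
echo "yourname@example.org" > "$HOME_DIR/.forward"
cat "$HOME_DIR/.forward"
```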

Last modified: Fri Mar 1 09:38:51 2019
S. Yokkaichi/T. Nakamura