The GFI computing system

The Geophysical Institute uses a department compute server cluster named "cyclone". Cyclone consists of three separate compute nodes organized in a cluster that access the same centralized network storage. Two of the nodes are for regular computational tasks, and one is set up with a GPU for machine learning (ML) applications. The new cyclones replace the older system "cyclone.hpc.uib.no", acquired in Summer 2018, which at the time of writing this documentation is still accessible as well (14.03.2026).

The server cluster is maintained by UiB ITA, and the compute resources are located in NREC as virtual servers, currently running Linux (Rocky 8).

All users of the cyclones should subscribe to the mailing list linux@gfi.uib.no to receive updates about maintenance, downtime, etc.

Please contact UiB support (https://hjelp.uib.no) with any problems or questions. Mark your support request with a reference to the "Group of scientific computing" and include all details necessary for UiB support to understand your issue.

The following sections describe the intended users, the requirements for the system, system characteristics and good user practice, the storage environment, the computational resources, and GPU acceleration.

Users and intended use

The intended users at GFI for the cyclones are:

  • MSc students for course and thesis work.
  • PhD students and PostDocs for scientific work.
  • all other researchers and technical staff for data analysis, data storage, and routine computations.

The cyclone compute system has been set up to enable users to perform smaller-scale data analysis work in a low-threshold environment. This means that there is no queue system for resource allocation. Users can use the system interactively, running applications with a GUI or via the Linux shell.
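A minimal sketch of starting an interactive session from a terminal (the username is a placeholder; the -X flag enables X11 forwarding so that GUI applications can display locally):

    ssh -X your_uib_username@cyclone1.gfi.uib.no   # interactive session with X11 forwarding for GUI applications
    ssh your_uib_username@cyclone2.gfi.uib.no      # plain shell session on the second CPU node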

All users share the same resources, without having to deal with a queue system. This requires users to exercise discipline when using resources, to prevent system instability due to excessive load.

Common use cases include data analysis connected to the shared data files in the storage environment, model simulations lasting up to several days, and tests of larger-scale computations before moving them to the national computing infrastructure NRIS.

Requirements

Users should be able to use the system interactively (point and click, typing) and in batch mode (using shell scripts). Typical tasks are data analysis and plotting. GFI will also run routine data processing of observations and forecasts on this system.

Therefore, we required the system to be or have:

  • low threshold access (easy to use)
  • interactive use
  • safe usage (against unintentional overuse)
  • optimized for serial I/O (as is typical for post-processing tasks)
  • single-node parallelization jobs
  • lifetime of 5-7 years

System characteristics and good user practice

The system has been created as a low-threshold, multi-user system. Resources are limited and shared among all users. This implies that users need to keep an overview of which resources they are using.

Users are expected to read and act on the "MOTD" message shown at SSH login, which reports the current load and memory consumption. Please also monitor the server load during your session using the top/htop and free ('free -h -t') utilities. Do not start another heavy multi-core compute process if the compute node is already under heavy load or memory pressure; check the alternate server instead.
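A minimal sketch of checking the current load before starting a heavy job (all standard Linux utilities; which thresholds to apply is a judgment call):

    uptime        # 1-, 5- and 15-minute load averages, to compare against the number of cores
    htop          # interactive per-process overview of CPU and memory use (or 'top')
    free -h -t    # human-readable summary of used and available memory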

The system characteristics, and the good user practice that follows along with these, are as follows:

  • There is (currently) no queue system on any of the "cyclones". Computational tasks are started immediately.
  • Computational tasks will be terminated if users log out. To keep jobs running while logged out, start jobs in the background using the Linux commands 'tmux', 'screen' or 'nohup' (see the example sketch after this list).
  • There is a resource limitation per user. Each session can use up to 8 out of the 64 CPU cores. In addition, each session can allocate up to 120 GB of memory. This is meant to prevent individual users from accidentally bringing down the system.
  • Software packages are activated using the 'module load' command. Use 'module spider <software>' to find available software; the sketch after this list shows an example. The available software stack is identical on all cyclones.
  • Users have access to a home directory, a work storage, and long-term project and shared resource storage. The same storage environment is connected to all of the cyclones. Use the work storage for all non-permanent input/output. Files on the work environment are deleted after 60 days. Use NIRD for long-term storage of larger output.
  • The system is maintained by the experts from UiB's HPC group. There is bi-weekly maintenance planned, typically on Wednesdays. Sometimes this will involve a reboot of the system. Plan longer jobs so that they don't get interrupted or interfere with maintenance scheduling.
  • Subscribe to the mailing list linux@gfi.uib.no to receive updates about problems and maintenance.
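A minimal sketch illustrating the points above (the script name and module name are placeholders; use 'module spider' to find the exact modules available on the cyclones):

    # Keep a job running after logout with nohup ('my_analysis.sh' is a placeholder script):
    nohup ./my_analysis.sh > my_analysis.log 2>&1 &
    # ...or run it inside a detachable tmux session:
    tmux new -s analysis        # detach with Ctrl-b d, reattach with 'tmux attach -t analysis'
    # Find and activate software:
    module spider python        # list matching modules and versions
    module load Python          # placeholder; use the exact name reported by 'module spider'
    # Optionally cap thread counts so a multi-threaded job stays within the per-session core limit:
    export OMP_NUM_THREADS=8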

Storage environment

GFI&SKD have at their disposal approx. 450 TB of network storage quota from UiB ITA (May 2024). This storage is organized via the ITA standard NetApp solution and distributed to the cyclone servers, GFI&SKD Windows clients, macOS clients and Linux clients. The storage is also available for GFI&SKD users connecting via the UiB VPN.

  • The GFI&SKD storage is organized into larger shared areas for model data, more limited group and project areas, areas for individual users, and common areas for short-term storage. The different storage areas are in general maintained via ITA backup procedures and governed by group or individual storage quotas.
  • GFI&SKD storage is organized in two main folders:

    Linux: /Data/gfi   - Win11/Mac: \\klient.uib.no\felles\matnat\gfi

    Linux: /Data/skd   - Win11/Mac: \\klient.uib.no\felles\matnat\skd
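A quick way to check that the shared areas are visible (a sketch; the subfolder layout below these top-level paths varies by group and project):

    ls /Data/gfi /Data/skd      # on the cyclones and other Linux clients
    # On Windows 11 or macOS clients, open the \\klient.uib.no\felles\matnat\... paths in the
    # file browser or map them as a network drive / mounted volume.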

Computational resources

The compute system consists of three separate nodes. Two nodes (cyclone1 and cyclone2) are dedicated to computing tasks without GPU use. A third node (cyclone3) is dedicated to GPU applications. In detail, the specifications of the three compute nodes are:

Cyclone 1 - CPU applications (cyclone1.gfi.uib.no):

    AMD EPYC Processor, 64 physical cores and 512 GB memory per node, 2.3 GHz - no GPU

Cyclone 2 - CPU applications (cyclone2.gfi.uib.no):

    AMD EPYC Processor, 64 physical cores and 512 GB memory per node, 2.3 GHz - no GPU

Cyclone 3 - GPU applications (cyclone3.gfi.uib.no):

    Intel CPU, 16 physical cores, 128 GB memory

    1 NVIDIA GPU, L40S-24Q, 24 GB GPU memory

GPU acceleration

One of the cyclones (cyclone3) is equipped with an advanced Graphics Processing Unit (GPU) of type NVIDIA L40S-24Q, and is reserved for GPU applications only. The GPU is shared between all users on the system.

GPU usage can be monitored using the command 'nvidia-smi'.

Programming on this GPU is done using CUDA, and modules are available for programming with CUDA. Python users can load the PyTorch module.
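A minimal sketch for confirming that the GPU is visible before running ML workloads (the module name is a placeholder; confirm the exact name with 'module spider pytorch'):

    nvidia-smi                  # current GPU utilization and memory use
    module load PyTorch         # placeholder; use the exact name reported by 'module spider'
    python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"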