The systems described in this document are a set of Linux-based resources intended to support the computational needs of faculty members in the Economics Department or affiliated Social Sciences Research groups. This document describes how to access and begin using these systems. They consist of three major components: 1) storage, 2) interactive nodes, and 3) batch nodes.
These systems are the NFS/SLURM backed research systems. If you are trying to utilize the older AFS/SGE backed systems, please visit the documentation here.
Super Short Version
Use ssh to connect to login.econ.duke.edu
Jobs can be run interactively with the following executables:
Any other generally available RHEL application.
Jobs can be submitted to the cluster with the following submission scripts:
Custom scripts can be written to be used with sbatch
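A custom submission script is an ordinary shell script whose #SBATCH comment lines pass options to SLURM. The following is a minimal sketch, not a site-provided template: the file name myjob.sh, the R script analysis.R, and the resource values are all hypothetical and should be adjusted to the job at hand.

```shell
# Write a small SLURM job script into the current directory.
# myjob.sh, analysis.R, and the resource values are placeholders (assumptions).
cat > myjob.sh <<'EOF'
#!/bin/bash
# Name shown in the queue.
#SBATCH --job-name=myjob
# Output file; %j expands to the SLURM job ID.
#SBATCH --output=myjob-%j.out
# One task using one CPU core.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory requested, in megabytes.
#SBATCH --mem=4096
# Wall-clock limit (hh:mm:ss).
#SBATCH --time=01:00:00

Rscript analysis.R
EOF

# On the login node, submit the script and then check on it:
#   sbatch myjob.sh
#   squeue -u "$USER"
```

The sbatch and squeue lines are left as comments because they only work where SLURM is installed, such as the login node.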
The operating system is Scientific Linux version 6. It is binary compatible with, and contains the same packages as, Red Hat Enterprise Linux 6. Any software generally available from RHEL can be installed easily upon request.
Additionally, the following software packages are available:
Matlab version R2018b
Stata version 15 (SE and MP)
R version 3.5.2
Older versions of some software may be available under /econ/sw. For example, /econ/sw/matlab/R2014b is available in that path.
Other software not available through RPM package management will be installed under /econ/sw into a directory named after the package.
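To run one of the older versions under /econ/sw in place of the default, its directory can be placed at the front of PATH for the current shell session. This is a minimal sketch, assuming the install directory contains a bin subdirectory holding the matlab launcher; that layout is typical of Matlab installs but is not confirmed here for /econ/sw.

```shell
# Install directory taken from the example path in this document.
MATLAB_ROOT="/econ/sw/matlab/R2014b"

# Assumption: the launcher lives in a bin/ subdirectory of the install.
PATH="${MATLAB_ROOT}/bin:${PATH}"
export PATH

# Show which matlab the shell would now run (prints nothing if none is found).
command -v matlab || true
```

The change lasts only for the current session; a new login starts again with the default version.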
The username for the Economics Research Computing Cluster is the same as your University NetID username. The password is separate from your NetID password. If you do not know your password, please contact ECS: firstname.lastname@example.org. All faculty have accounts on this system. Master's and PhD students should request access by sending email to email@example.com. Guest access may be provided for others outside of the department; contact firstname.lastname@example.org for more information.
The Economics Research Computing Cluster currently has one front-end node. This may be referred to as a login node, front-end node, or interactive node. This system performs two main functions: it allows jobs to be run interactively, and it is used to submit jobs to batch nodes.
The front-end node on the Economics Research Computing Cluster can be accessed using the host name faculty.econ.duke.edu, or, after 7/4/2016, login.econ.duke.edu.
To access the front-end node, use an ssh client to connect to faculty.econ.duke.edu. Further details for OS X and Windows are available below.
Please note that the hostname login-01.econ.duke.edu is used in the screenshots for the examples provided. That hostname will also work, but faculty.econ.duke.edu is the preferred hostname, as additional login nodes may be available. Using faculty.econ.duke.edu as the host name allows the system to distribute logins across the nodes in a somewhat balanced fashion.
All versions of OS X include an application called Terminal. This will get you command line access to the interactive node. It is possible to run applications like Matlab and Stata with their graphical interfaces, but later versions of OS X require the installation of XQuartz to enable the graphical user interface.
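From Terminal, the connection is made with the ssh command. The sketch below wraps it in a small shell function; the function name econ_login and the ECON_HOST variable are hypothetical conveniences, and the NetID is whatever username was issued to you. The -X option asks ssh to forward X11 so that graphical interfaces can display on your own machine.

```shell
# Host name from this document; faculty.econ.duke.edu may be used instead.
ECON_HOST="login.econ.duke.edu"

# Hypothetical helper: connect to the front-end node with X11 forwarding.
# Usage: econ_login your_netid
econ_login() {
    if [ -z "$1" ]; then
        echo "usage: econ_login NETID" >&2
        return 1
    fi
    # -X enables X11 forwarding for graphical applications.
    ssh -X "$1@${ECON_HOST}"
}
```

The equivalent one-off command is ssh -X followed by your_netid@login.econ.duke.edu.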
For Windows, at minimum, you will need a terminal application. To use the graphical interfaces of applications such as Matlab, Stata and SAS, you will also need to install an X11 emulator.
Accessing the interactive Linux servers from a computer running Windows requires PuTTY, available for free on the Duke OIT Website. Assuming PuTTY is installed, the following steps are required to access the interactive cluster:
After installing PuTTY:
Double-click PuTTY to launch it.
Enter the following connection settings:
Host Name: login.econ.duke.edu or faculty.econ.duke.edu
Connection Type: SSH (default)
On the left side, under Category:
Click the “+” next to SSH
Click X11
Check the box: Enable X11 forwarding
Go back to Session on the left pane
Under Saved Sessions, Type “Econ” or any name of your choosing
The most important setting is X11 forwarding. Without it, the X Window System cannot find your PC for display. Save the configuration by typing a name (e.g. econ) in the box under 'Saved Sessions' on the Session screen, then press the Save button. Click Open to open the terminal window or Cancel to close PuTTY.
One way to think about the cluster is to divide it into two parts: storage and computational resources. Storage is where the files go; computational resources are the CPUs and system memory (distinct from storage) that are used to process data.
There are two main places to store files on the cluster: your home directory and a research directory. Home directories have tighter size limits; home directory quotas are currently set at 10G. Keeping home directory sizes controlled eases the administrative burden of the cluster. Research directories can be more generous in size.
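To see roughly how close a home directory is to the 10G limit, its size can be measured with du. This is a minimal sketch, assuming that 10G means 10 x 1024^3 bytes and that du gives a fair approximation of what the quota system counts; neither is confirmed by this document, and the function name check_quota is hypothetical.

```shell
# 10G quota expressed in kilobytes (assumption: G means 1024^3 bytes).
QUOTA_KB=$((10 * 1024 * 1024))

# Report how much space a directory uses relative to the quota.
# Usage: check_quota [DIRECTORY]   (defaults to the home directory)
# Returns non-zero when the measured size exceeds the quota.
check_quota() {
    dir="${1:-$HOME}"
    # du -sk prints the total size in kilobytes, a tab, then the path.
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    echo "Using ${used_kb} KB of ${QUOTA_KB} KB"
    [ "$used_kb" -le "$QUOTA_KB" ]
}
```

Running check_quota with no argument measures the home directory; passing a path measures that directory instead.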
The paths for home directories follow this pattern:
For example, if your username is tsefhg123, then your home directory would be located at the following path:
Research directories are located as subdirectories of /econ/research. The more specific path will vary according to function, but generally, the next level of the directory would use the netid of the primary researcher for that space. For example, if the netid of the primary researcher were ‘j_rt32’ the research space would be:
For broader projects, the space would take on a directory name reflective of the project.
The best way to transfer files to the cluster is to use an scp or sftp client. If you are familiar with command line usage, on OS X you can open the Terminal application and use scp directly from there. Otherwise, you will need to obtain a client such as FileZilla, which supports SFTP.
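For command line transfers, scp takes a source and a destination, either of which may be a remote location written as user@host:path. The sketch below wraps the two directions in small functions; the names econ_put and econ_get are hypothetical, and the NetID and file names are supplied by you.

```shell
# Host name from this document; faculty.econ.duke.edu may be used instead.
ECON_HOST="login.econ.duke.edu"

# Copy a local file to your home directory on the cluster.
# Usage: econ_put NETID LOCAL_FILE
econ_put() {
    # An empty path after the colon means the remote home directory.
    scp "$2" "$1@${ECON_HOST}:"
}

# Copy a file from the cluster into the current local directory.
# Usage: econ_get NETID REMOTE_PATH
econ_get() {
    scp "$1@${ECON_HOST}:$2" .
}
```

A remote path that does not start with a slash is interpreted relative to the remote home directory.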
Sometimes it is useful to start up screen or tmux. These applications let you disconnect from a running command line session on the remote system and then reconnect at a later time or from another system. After you log into an interactive node, you invoke them with either the ‘screen’ or ‘tmux’ command. The full scope of using these applications is beyond this documentation, but googling ‘screen linux tutorial’ or ‘tmux linux tutorial’ should get you started.
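As a starting point for tmux, the sketch below reattaches to a named session if one is already running on the login node and creates it otherwise. The function name econ_tmux and the default session name econ are hypothetical. Inside tmux, pressing Ctrl-b and then d detaches from the session and leaves it running.

```shell
# Hypothetical helper: attach to a named tmux session, creating it if needed.
# Usage: econ_tmux [SESSION_NAME]   (defaults to "econ")
econ_tmux() {
    session="${1:-econ}"
    if tmux has-session -t "$session" 2>/dev/null; then
        # The session already exists, so reconnect to it.
        tmux attach-session -t "$session"
    else
        # No such session yet, so start a new one with that name.
        tmux new-session -s "$session"
    fi
}
```

The rough screen equivalents are screen -S name to start a named session, Ctrl-a then d to detach, and screen -r name to reattach.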
batch node
A computer that is not accessed directly, but rather runs jobs that are distributed by SLURM from interactive nodes.
cluster
A collection of computers logically grouped to meet a goal. In the context of the faculty cluster, this collection is targeted to facilitate faculty computational designs.
command line / command prompt
The command line is a user interface typically used on UNIX and UNIX-like operating systems. It consists of a window in which you type commands to execute programs and perform tasks. This is in contrast to Graphical User Interfaces (GUI), which are better known and typical of desktop operating systems. A GUI typically uses a combination of mouse and keyboard to input instructions to the computer or to the program being used. The command prompt, or just prompt, is where you input commands for the Linux system.
GUI
Graphical User Interface: when a program presents an interface that uses not just typed commands but also mouse and menu driven methods of interacting with the program, it is referred to as having a GUI or graphical user interface.
interactive node / login node
A system that is configured to allow direct logins and has the ability to run jobs directly on its own resources. Typically, in Econ, interactive nodes are also where jobs are submitted to batch nodes.
job
Specifically, the code that needs to be run on the computational systems. Loosely, the term can also include the data that such a program needs in order to perform its calculation and (even more loosely) the results/output the program generates.
memory, or system memory, or RAM
On each individual computer, small amounts of very fast storage are used directly by the CPUs to cache results and store intermediate computations until they are ready to be output. This can be referred to as memory, system memory, or RAM. It is not to be confused with storage, or with any discussion of home directory or research directory storage.
node
An individual computer that is a part of a set of systems. In our case, any individual computer in the cluster is referred to as a node. It may be a head node/login node, computational node, management node, etc.
storage
Storage refers to where files are saved for long term reference. Within the economics cluster, storage is a networked resource with certain redundancies built in to guard against a single disk failure resulting in a loss of data. Storage is distinct from system memory. It can also be referred to as disk space. If you are speaking about quota, you are speaking about storage, not memory.