The Settings app enables you to configure how Datatailr operates for all users.

You may refer to Configuring Datatailr Settings for a description of the settings to be defined on day one of using Datatailr. We recommend that after using Datatailr for a week or so, you scroll through all the settings (according to the descriptions below) and adjust them as needed.

Note – The Settings app can be accessed by a global Datatailr administrator (Admin) or by any member of the admin group.

Click the Settings icon to open the Settings app.

The Settings app displays an icon for each Datatailr developer and Admin app (tool) that has settings, together with its settings. Additional icons represent various Datatailr background processes. Note that not all Datatailr apps (tools) have settings.

Datatailr IDE

The following describes the fields –

  • CPU – The Default CPU (MHz) field specifies the default amount of CPU assigned to a developer’s environment if they do not make a selection. The Min CPU and Max CPU fields specify the limits of the range of CPU that developers can choose for their environment (container).
  • Memory – The Default Memory (MiB) field specifies the default amount of memory assigned to a developer's environment if they do not make a selection. The Min Memory and Max Memory fields specify the limits of the range of memory that developers can choose for their environment.
  • Default Image – Specifies the default image (fallback image) that is used to launch the Datatailr IDE when a user has not made their own selection and no default image has been assigned to the group to which the user belongs.
  • Inactivity Timeout – To save cloud resource costs, this value specifies how long the IDE container (and all programs running inside it) can remain idle, with no user interaction, before it is automatically terminated. No data is lost, and the IDE can be restarted at any time by clicking the Datatailr IDE icon. The default is three days.
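The CPU, Memory and Default Image fields combine into a simple resolution rule. The following is a minimal sketch of that rule as described above, not Datatailr's implementation: the names `ResourceLimits`, `resolve_resource` and `resolve_image` are hypothetical, the precedence (user selection, then group default, then the Default Image field) is inferred from the field descriptions, and the numeric values in the usage example are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceLimits:
    """Hypothetical mirror of the Default / Min / Max fields for CPU or memory."""
    default: int
    minimum: int
    maximum: int


def resolve_resource(requested: Optional[int], limits: ResourceLimits) -> int:
    """Use the default when the developer made no selection; otherwise
    accept the selection only if it lies within [minimum, maximum]."""
    if requested is None:
        return limits.default
    if not limits.minimum <= requested <= limits.maximum:
        raise ValueError(
            f"{requested} is outside the allowed range "
            f"{limits.minimum}-{limits.maximum}"
        )
    return requested


def resolve_image(user_choice: Optional[str],
                  group_default: Optional[str],
                  global_default: str) -> str:
    """Assumed fallback chain: user's own selection, then the group's
    default image, then the Default Image from the Settings app."""
    return user_choice or group_default or global_default


# Illustrative values: CPU in MHz.
cpu = ResourceLimits(default=2000, minimum=500, maximum=8000)
print(resolve_resource(None, cpu))            # developer chose nothing
print(resolve_image(None, "team-image", "base-image"))
```

Rejecting out-of-range requests (rather than silently clamping them) matches the description of Min and Max as the limits of what developers "can choose".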

Jupyter

Developers can create Jupyter Notebooks for end-users.

  • CPU – The Default CPU (MHz) field specifies the default amount of CPU assigned to a developer’s environment if they do not make a selection. The Min CPU and Max CPU fields specify the limits of the range of CPU that developers can choose for their environment (container).

  • Memory – The Default Memory (MiB) field specifies the default amount of memory assigned to a developer's environment if they do not make a selection. The Min Memory and Max Memory fields specify the limits of the range of memory that developers can choose for their environment.

  • Default Image – Specifies the default image (fallback image) that is used to launch the JupyterLab development environment when a user has not made their own selection and no default image has been assigned to the group to which the user belongs.

  • Inactivity Timeout – To save cloud resource costs, this value specifies how long the personal Jupyter Notebook instance assigned to each end-user can remain idle before it is automatically terminated. Any transient state held in the Jupyter Notebook (such as variables in a running kernel) is discarded permanently when the instance terminates. Saved data is not lost, and the Jupyter environment can be restarted at any time. The default is three days.
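The inactivity timeouts in this section (and the similar ones for the IDE, Image Builder and Package Builder) reduce to comparing idle time against a threshold. This is a minimal sketch under stated assumptions: `should_terminate` is a hypothetical name, the three-day default comes from the text, and whether Datatailr terminates at exactly the threshold (`>=`) or only after it is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Default taken from the text: three days without user interaction.
DEFAULT_INACTIVITY_TIMEOUT = timedelta(days=3)


def should_terminate(last_interaction: datetime, now: datetime,
                     timeout: timedelta = DEFAULT_INACTIVITY_TIMEOUT) -> bool:
    """Return True once the instance has been idle for at least `timeout`
    (hypothetical helper; the >= boundary is an assumption)."""
    return now - last_interaction >= timeout


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(should_terminate(now - timedelta(days=3), now))
print(should_terminate(now - timedelta(days=2, hours=23), now))
```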

Image Builder

The following settings define the defaults for the Image Builder instance assigned to each user.

  • CPU – Specifies the default amount of CPU assigned to your instance of the Image Builder app.

  • Memory – Specifies the default amount of memory assigned to your instance of the Image Builder app.

  • Inactivity Timeout – Specifies how long your Image Builder instance can remain unused before it is terminated. No data is lost.

  • Block/Allowlist – Restricts which Python packages can be selected while building a container image. The Blocklist excludes the specified packages, meaning that they cannot be selected when building a container image. The Allowlist permits only the specified packages, meaning that only they can be selected when building a container image.

  • Click the Configure button.

  • Select the Block List or the Allow List option on the left of the window.

  • Select each Python package to be excluded or included. Each selected package is moved to the right side of the window.

  • Click OK.
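The effect of the Block List and Allow List options on the packages offered in the Image Builder can be sketched as a filter. This is an illustration, not Datatailr's code: `selectable_packages` and `normalize` are hypothetical names, and comparing names after PEP 503 normalization (so that `Scikit_Learn` matches `scikit-learn`) is an assumption about how matching works.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """PEP 503 project-name normalization (assumed matching rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def selectable_packages(available: Iterable[str], listed: Iterable[str],
                        mode: str) -> List[str]:
    """Return the packages a developer may pick.

    mode="block": everything except the listed packages.
    mode="allow": only the listed packages.
    """
    listed_set = {normalize(n) for n in listed}
    if mode == "block":
        return [p for p in available if normalize(p) not in listed_set]
    if mode == "allow":
        return [p for p in available if normalize(p) in listed_set]
    raise ValueError("mode must be 'block' or 'allow'")


available = ["numpy", "pandas", "scikit-learn", "requests"]
print(selectable_packages(available, ["Scikit_Learn"], "block"))
print(selectable_packages(available, ["numpy", "pandas"], "allow"))
```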

Package Builder

  • CPU – Specifies the default amount of CPU assigned to your instance of the Package Builder app.

  • Memory – Specifies the default amount of memory assigned to your instance of the Package Builder app.

  • Inactivity Timeout – Specifies how long your Package Builder instance can remain unused before it is terminated. No data is lost.

Auto Scaling

Datatailr’s Auto Scaling feature dynamically allocates just enough compute when it is required, as configured in the Auto Scaling settings of the Datatailr Settings app. It closely monitors compute demand and allocates additional VMs as needed. As soon as the additional compute is no longer needed, Datatailr automatically shuts down the relevant VMs in order to save costs. You can mark certain VMs as exempt from Auto Scaling. Datatailr does not terminate VMs that were scheduled manually or VMs that their owners have configured not to be terminated.

In the Image Manager, hovering over the name of a VM shows whether the VM is managed by Auto Scaling, indicated as AutoscalerManaged = True or False.

The following settings are provided for Auto Scaling –

  • Memory Instance Type – Specifies the type of memory instance that the Auto-scaling app dynamically allocates when running jobs run out of memory and require an additional instance. This can happen because many small jobs are running or because a single very large job is running. Click this dropdown menu to see the available options.
  • The option names are not self-explanatory, so they are explained below.

Each option in the dropdown menu represents a box (a VM instance type).

The name of each box starts with the box type (for example, r5 or r5a), followed by a dot (.) and then a size, for example, r5a.large, r5a.xlarge, r5a.2xlarge and so on.
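The naming scheme above splits cleanly at the first dot. The following sketch shows that split; `BoxName` and `parse_box_name` are hypothetical helpers for illustration, not part of Datatailr.

```python
from typing import NamedTuple


class BoxName(NamedTuple):
    box_type: str  # the range, e.g. "r5a"
    size: str      # e.g. "large", "xlarge", "2xlarge"


def parse_box_name(name: str) -> BoxName:
    """Split '<box type>.<size>' into its two parts."""
    box_type, sep, size = name.partition(".")
    if not sep or not box_type or not size:
        raise ValueError(f"expected '<type>.<size>', got {name!r}")
    return BoxName(box_type, size)


print(parse_box_name("r5a.2xlarge"))  # BoxName(box_type='r5a', size='2xlarge')
```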

To view the specifications of a box, hover over it in the dropdown menu to display a pop-up of its specifications. For example, r5a.large has 16 GB of memory.

Note – An extremely large job requires an extremely large box on which to run. When there are many small jobs, Datatailr favors spawning smaller boxes and later terminating individual boxes when they are no longer needed.

IMPORTANT! Therefore, when defining these settings, it is highly recommended to select the smallest box (menu option) in each range (type), for example, r5.large in the r5 range or r5a.large in the r5a range. The Auto-scaling app then dynamically adds this amount of memory when required. In addition, Datatailr automatically detects whether adding this amount is enough; if it is not, Datatailr instead adds a box from the same range that is as large as needed. For example, if you selected r5a.large (which has only 16 GB) and then run a runnable that requires a larger chunk of additional memory, the Auto-scaling app automatically selects the next larger r5a box that is large enough, such as r5a.xlarge. If r5a.xlarge is still not large enough, the Auto-scaling app selects a larger one, such as r5a.2xlarge.

Note – The Auto-scaling app always spawns a box that is at least 10% larger than what is required.

To extend this example, if a runnable requires an additional 220 GB of memory, a larger box in the r5a range is selected, such as r5a.8xlarge, which has 256 GB. With the 10% margin, at least 242 GB is needed, so a 128 GB box such as r5a.4xlarge would be too small.
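The selection rule described above (start from the configured smallest box, stay in the same range, and pick the first box with at least 10% more memory than required) can be sketched as follows. This is an illustration of the documented behavior, not Datatailr's implementation: `pick_box` and `R5A_MEMORY_GB` are hypothetical names, the memory sizes are AWS's published figures for the r5a family, and applying the 10% as a multiplier on the required memory is an assumption.

```python
from typing import Dict

# Memory (GB) per r5a box, per AWS's published EC2 specifications.
R5A_MEMORY_GB: Dict[str, int] = {
    "r5a.large": 16,
    "r5a.xlarge": 32,
    "r5a.2xlarge": 64,
    "r5a.4xlarge": 128,
    "r5a.8xlarge": 256,
    "r5a.12xlarge": 384,
    "r5a.16xlarge": 512,
    "r5a.24xlarge": 768,
}

HEADROOM = 1.10  # a box must be at least 10% larger than the requirement


def pick_box(required_gb: float, configured: str,
             catalog: Dict[str, int]) -> str:
    """Return the smallest box in the range, no smaller than the configured
    one, whose memory covers the requirement plus the 10% headroom."""
    needed = required_gb * HEADROOM
    floor = catalog[configured]
    for mem, name in sorted((m, n) for n, m in catalog.items() if m >= floor):
        if mem >= needed:
            return name
    raise ValueError(f"no box in this range offers {needed:.0f} GB")


print(pick_box(220, "r5a.large", R5A_MEMORY_GB))  # r5a.8xlarge
```

Configuring the smallest box as the starting point keeps the search space as wide as possible; configuring a larger one would only raise the floor.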

Note – When the Auto-scaling app spawns additional boxes, they are added to the boxes that are already running. If a box is not running any processes and is too small to run the required process, it is terminated.

  • CPU Instance Type – Specifies the type of CPU box that is automatically added when the Auto-scaling app detects that the running jobs have exhausted their CPU. The same logic as for the Memory Instance Type field (described above) applies here, but to CPU boxes.
  • ARM CPU Instance Type / ARM Memory Instance Type / GPU Instance Type – Specify the type and size of the box that is automatically added when the Auto-scaling app detects that the running jobs require more ARM CPU, ARM memory or GPU resources, respectively.

  • Max #AMD, Max #ARM and Max #GPU – Specifies the maximum number of these types of boxes that Datatailr will automatically spawn. If additional resources over and above this maximum are required, then the job must wait until resources become available.

  • Min Instance Lifetime – Specifies the minimum number of minutes that a newly spawned instance remains running, even if it is not needed, before it can be terminated.

  • Min Instance Grace Time – Specifies the frequency at which Datatailr checks whether an instance is needed or should be terminated, such as every 10 minutes.
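Taken together, the Max #, Min Instance Lifetime and Min Instance Grace Time settings, plus the exemption rules described earlier, suggest a scale-up cap and a periodic scale-down pass. The sketch below is a hypothetical model of that behavior, not Datatailr's code: `Instance`, `can_spawn`, `eligible_for_termination` and `scale_down_pass` are invented names, treating "not needed" as "no running processes" is an assumption, and the `autoscaler_managed` flag mirrors the AutoscalerManaged value shown in the Image Manager.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Instance:
    """Hypothetical view of a VM tracked by Auto Scaling."""
    name: str
    started_at: datetime
    running_processes: int
    autoscaler_managed: bool  # mirrors AutoscalerManaged = True/False


def can_spawn(current_count: int, max_count: int) -> bool:
    """Max #AMD / #ARM / #GPU: at the cap, jobs wait instead of spawning."""
    return current_count < max_count


def eligible_for_termination(inst: Instance, now: datetime,
                             min_lifetime: timedelta) -> bool:
    if not inst.autoscaler_managed:
        return False  # exempt VMs are never terminated by Auto Scaling
    if inst.running_processes > 0:
        return False  # assumed meaning of "still needed"
    return now - inst.started_at >= min_lifetime


def scale_down_pass(instances: Iterable[Instance], now: datetime,
                    min_lifetime: timedelta) -> List[str]:
    """One check; a scheduler would run this once per Min Instance Grace Time."""
    return [i.name for i in instances
            if eligible_for_termination(i, now, min_lifetime)]


now = datetime(2024, 1, 1, 12, 0)
fleet = [
    Instance("idle-old", now - timedelta(minutes=30), 0, True),
    Instance("idle-new", now - timedelta(minutes=5), 0, True),
    Instance("exempt", now - timedelta(hours=2), 0, False),
    Instance("busy", now - timedelta(hours=2), 3, True),
]
print(scale_down_pass(fleet, now, timedelta(minutes=15)))  # ['idle-old']
```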

Cost Manager

  • Enabled – Enables or disables Datatailr’s Cost Manager app.

  • IDE Cost Center – Specifies the cost center to which all developer costs are assigned. Datatailr creates the IDE cost center by default.

You can define additional cost centers using the Cost Manager app.

Approvals

  • Transition – Enables you to specify that approval is required for transitioning images and packages developed in Datatailr from the Dev environment to the Pre environment, or from the Pre environment to the Prod environment.

  • Users/Groups – Enables you to specify one or more users and/or groups whose members must approve the transition of images and packages, as described above.
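The approval settings can be modeled as a gate on the Dev to Pre and Pre to Prod transitions. This is a sketch under loudly stated assumptions: all function names are hypothetical, and the rule that one approval from any designated user or any member of a designated group suffices is an assumption (Datatailr may instead require every designated approver).

```python
from typing import Dict, Set

# Allowed transitions described in the text.
NEXT_ENV = {"Dev": "Pre", "Pre": "Prod"}


def eligible_approvers(users: Set[str], groups: Set[str],
                       members: Dict[str, Set[str]]) -> Set[str]:
    """Designated users plus every member of each designated group."""
    result = set(users)
    for group in groups:
        result |= members.get(group, set())
    return result


def can_transition(source: str, target: str, approval_required: bool,
                   approved_by: Set[str], users: Set[str], groups: Set[str],
                   members: Dict[str, Set[str]]) -> bool:
    if NEXT_ENV.get(source) != target:
        raise ValueError(f"unsupported transition {source} -> {target}")
    if not approval_required:
        return True
    # Assumption: one approval from any eligible approver is enough.
    return bool(approved_by & eligible_approvers(users, groups, members))


members = {"risk": {"alice", "bob"}}
print(can_transition("Pre", "Prod", True, {"bob"}, {"dave"}, {"risk"}, members))
```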

Min/Max Versions

Enables you to limit which Python, Julia and Rust versions developers can select when they build a container image in the Image Builder. Click Save to store your definitions.
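A minimum/maximum version limit is easy to get wrong with plain string comparison ("3.9" sorts after "3.10" as text). The sketch below compares numeric components instead; `parse_version` and `version_allowed` are hypothetical helpers, and they assume plain dotted numeric versions (no pre-release tags such as `rc1`).

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'3.10.2' -> (3, 10, 2); assumes purely numeric components."""
    return tuple(int(part) for part in v.split("."))


def version_allowed(v: str, min_v: str, max_v: str) -> bool:
    """True if min_v <= v <= max_v, comparing component by component."""
    return parse_version(min_v) <= parse_version(v) <= parse_version(max_v)


print(version_allowed("3.10", "3.9", "3.12"))  # True
```

With tuple comparison, a maximum of "3.12" excludes "3.12.1", because (3, 12) sorts before (3, 12, 1); if patch releases of the maximum should be allowed, compare only the leading components.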

PyPI Server

Datatailr provides a default storage repository in which to store Python packages. This option enables you to specify your own repository by entering its protocol, host and port.

Cargo Server

Datatailr provides a default Cargo server. This option enables you to specify your own Cargo server by entering its protocol, host and port.
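Both the PyPI Server and Cargo Server settings take a protocol, host and port. The following sketch shows one way to assemble and sanity-check those three values into a base URL; `server_url` is a hypothetical helper, it assumes the protocol is http or https and the host is a hostname or IPv4 address (IPv6 literals would need brackets), and how Datatailr itself combines these fields is not described here.

```python
from urllib.parse import urlsplit


def server_url(protocol: str, host: str, port: int) -> str:
    """Build '<protocol>://<host>:<port>' and check that it parses back."""
    protocol = protocol.lower()
    if protocol not in ("http", "https"):
        raise ValueError(f"unsupported protocol {protocol!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port {port} is out of range")
    url = f"{protocol}://{host}:{port}"
    # Reject hosts containing '/', ':' or similar characters that would
    # change how the URL is parsed.
    parts = urlsplit(url)
    if parts.hostname != host.lower() or parts.port != port:
        raise ValueError(f"invalid host {host!r}")
    return url


print(server_url("https", "pypi.example.com", 8443))
```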

Email

Set the Production Use field to Enabled to allow Datatailr to send batch job email notifications.

In the Sender field, enter the email address to appear as the sender of batch job email notifications sent by the Job Scheduler.

Make sure that email is also enabled for production use in the cloud service that you are using. The Datatailr user interface may provide a link to that cloud service's documentation. For example, for AWS, click the AWS SES Documentation link in the user interface; Amazon SES must be moved out of its sandbox to production access, and Amazon may take a few days to activate this mail server feature.