
Streaming Data Platform 1.4 Installation and Administration Guide

Version 1.4

September 2022 Rev. 1.2

Notes, cautions, and warnings

NOTE: A NOTE indicates important information that helps you make better use of your product.

CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.

WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

© 2022 Dell Inc. or its subsidiaries. All rights reserved. Dell Technologies, Dell, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.

Contents

Chapter 1: Product Description
    Product summary
    Product components
        About Pravega
        About analytic engines and Pravega connectors
        Management plane
    Deployment options
    Component deployment matrix
    Architecture and supporting infrastructure
    Product highlights
    More features
    Basic terminology
    Interfaces
        User Interface (UI)
        Grafana dashboards
        Apache Flink Web UI
        Apache Spark Web UI
        JupyterHub overview
        APIs
    What you get with SDP
    Use case examples
    Documentation resources

Chapter 2: SDP ESE Integration
    Overview
    Configure SupportAssist
    SupportAssist port requirements
    Connect To SupportAssist
    View Connection
    View Support Contacts
    View Advanced settings

Chapter 3: Pre Installation Steps
    Installing SDP 1.4 from scratch
    Performing multiple installations of SDP 1.4

Part I: Install SDP Edge or SDP Micro

Chapter 4: Deploy with an OVA
    Deploy the ova on vSphere
    Configure an additional disk
    Power on the VM and set up the SDP network
    Reload Kubespray dependencies
    Set up MetalLB


    Add a license
    Install SDP
    Obtain SDP UI credentials and log in
    Provision additional disk
    Create an SDP Project

Chapter 5: Install Ubuntu on Bare Metal
    Install Ubuntu on bare metal
    Fix the drive used for the boot disk

Chapter 6: Deploy on Linux
    Overview
    Assumptions and prerequisites
    SDP Micro plan descriptions
    Required customer information
    Required preinstallation steps on Ubuntu
    Required preinstallation steps on Red Hat Enterprise Linux
    Install Kubespray and SDP
    Install the GPU operator in Kubespray environments
        GPU overview
        Prerequisites to GPU operator installation
        Install the GPU Operator on Red Hat Enterprise Linux 8.6
        Install GPU Operator on Ubuntu 20.04
        Verify GPU Operator
        Uninstall the GPU Operator
    Configure UI access
    Get SDP URL and login credentials

Chapter 7: SDP Edge with Longhorn storage
    SDP Edge with Longhorn storage

Chapter 8: Manage SDP Edge and SDP Micro
    Add trusted CA to browser
    Add new user in SDP Edge
    Create a project
    Set retention size on streams
    Shutdown and restart the Kubespray cluster
    Add a node
    Remove a node
    Backup
    Recover the control plane

Part II: Install SDP Core

Chapter 9: Site Prerequisites
    Obtain and save the license file
    Configure SupportAssist
    Set up local DNS server
    Provision long-term storage on PowerScale


    Provision long-term storage on ECS

Chapter 10: Configuration Values
    About configuration values files
    Prepare configuration values files
    Source control the configuration values files
    Validate the configuration values
    Configure global platform settings
    TLS configuration
        Configure TLS version
        Application requirements when using strict TLSv1.3 installation
        Certificate authority configuration
    Configure connections to a local DNS
    Configure long-term storage on PowerScale
    Configure long-term storage on ECS
    Configure or remove connection to SupportAssist
    Configure remote support information
    Configure passwords for the default administrative accounts

Chapter 11: Install SDP Core
    Download installation files
    Install required infrastructure
    Unzip installation files
    Prepare the working environment
    Push images into the registry
    Run the prereqs.sh script
    Prepare self-signed SSL certificate
    Run pre-install script
    Run the validate-values script
    Install SDP
    Run the post-install script
    (Optional) Validate self-signed certificates
    Obtain connection URLs
    Install the GPU operator in OpenShift environments
        Red Hat Entitlement
        GPU Operator Installation
        Verify GPU Operator installation
        Uninstall the GPU Operator

Part III: Manage SDP
    Top-level navigation in the UI

Chapter 12: Post-install Configuration and Maintenance
    Obtain default admin credentials
    Configure an LDAP identity provider
    Verify telemetry cron job
    Update the default password for ESE remote access
    Ensure system availability when a node is down
    Update the applied configuration


    Graceful shutdown and startup
    Uninstall applications
    Reinstall into existing cluster
    Change ECS credentials after installation

Chapter 13: Manage Connections and Users
    Obtain connection URLs
    Connect and login to the web UI
    Log in to OpenShift for cluster-admins
    Log in to OpenShift command line for non-admin users
    Create a user
        Add new local user on the Keycloak UI
    Assign roles
    User password changes
        Change password in Keycloak
    Password policy for SDP user accounts

Chapter 14: Manage Projects
    Naming requirements
    Manage projects
        About projects
        Create a project
        Create a project manually
        Delete a project
        Add or remove project members
        List projects and view project contents
        What's next with projects
    Manage scopes and streams
        About scopes and streams
        Create and manage streams
        Stream configuration attributes
        Manage cross project scope sharing
        Start and stop stream ingestion
        Monitor stream ingestion

Chapter 15: Manage runtime images
    The RuntimeImage resource
    View runtime images
    Create a runtime on the SDP UI
    Create runtime on the command line

Chapter 16: Monitor Health
    Monitor licensing
    Monitor and manage issues
    Monitor and manage events
    View detailed system-wide metrics
    Run health-check
    Monitor Pravega health
    Monitor stream health


    Monitor Apache Flink clusters and applications
    Monitor Pravega Search resources and health
    Logging

Chapter 17: Use Pravega Grafana Dashboards
    Grafana dashboards overview
    Connect to the Pravega Grafana UI
    Retention policy and time range
    Pravega Alerts dashboard
    Pravega Controller Dashboard
    Pravega Operation Dashboard
    Pravega Scope dashboard
    Pravega Segment Store Dashboard
    Pravega Stream dashboard
    Pravega System dashboard
    Custom queries and dashboards
    InfluxDB Data

Chapter 18: Expand and Scale the Infrastructure
    Difference between expansion and scaling
    Expansion
        Determine expansion requirements
        Add new rack
        Add nodes to the OpenShift cluster
        Add supporting storage
    Scaling
        Get scaling recommendations
        Scale the K8s cluster
        Scale SDP
        Scale Apache Flink resources
        Impact of cluster expansion and scaling

Chapter 19: Troubleshooting
    View versions of system components
    Troubleshooting tools
    Access the troubleshooting tools
    Kubernetes resources
        Namespaces
        Components in the nautilus-system namespace
        Components in the nautilus-pravega namespace
        Components in project namespaces
        Components in cluster-monitoring namespace
        Components in the catalog namespace
    Log files
    Useful troubleshooting commands
        OpenShift client commands
        helm commands
        kubectl commands
    FAQs


    Application connections when TLS is enabled
    Online and remote support

Part IV: Reference Information

Chapter 20: Configuration Values File Reference
    Template of configuration values file

Chapter 21: Summary of Scripts
    Summary of scripts

Chapter 22: Installer command reference
    Prerequisites
    Command summary
    decks-install apply
    decks-install config set
    decks-install push
    decks-install sync
    decks-install unapply


Product Description

Topics:

Product summary
Product components
Deployment options
Component deployment matrix
Architecture and supporting infrastructure
Product highlights
More features
Basic terminology
Interfaces
What you get with SDP
Use case examples
Documentation resources

Product summary

Dell Technologies Streaming Data Platform (SDP) is an autoscaling software platform for ingesting, storing, and processing continuously streaming unbounded data. The platform can process both real-time and collected historical data in the same application.

SDP ingests and stores streaming data from sources such as Internet of Things (IoT) devices, web logs, industrial automation, financial data, live video, social media feeds, and applications. It also ingests and stores event-based streams. It can process multiple data streams from multiple sources while ensuring low latency and high availability.

The platform manages stream ingestion and storage and hosts the analytic applications that process the streams. It dynamically distributes processing related to data throughput and analytical jobs over the available infrastructure. It also dynamically autoscales storage resources to handle requirements in real time as the streaming workload changes.

SDP supports the concept of projects and project isolation or multi-tenancy. Multiple teams of developers and analysts all use the same platform, but each team has its own working environment. The applications and streams that belong to a team are protected from write access by other users outside of the team. Cross-team stream data sharing is supported in read-only mode.

SDP integrates the following capabilities into one software platform:

Stream ingestion: The platform is an autoscaling ingestion engine. It ingests all types of streaming data, including unbounded byte streams and event-based data, in real time.

Stream storage: Elastic tiered storage provides instant access to real-time data, access to historical data, and near-infinite storage.

Stream processing: Real-time stream processing is possible with an embedded analytics engine. Your stream processing applications can perform functions such as:
  Process real-time and historical data.
  Process a combination of real-time and historical data in the same stream.
  Create and store new streams.
  Send notifications to enterprise alerting tools.
  Send output to third-party visualization tools.

Platform management: Integrated management provides data security, configuration, access control, resource management, an easy upgrade process, stream metrics collection, and health and monitoring features.

Run-time management: A web-based User Interface (UI) allows authorized users to configure stream properties, view stream metrics, run applications, view job status, and monitor system health.

Application development: The product distribution includes APIs. The web UI supports application deployment and artifact storage.



Product components

SDP is a software-only platform consisting of integrated components, supporting APIs, and Kubernetes Custom Resource Definitions (CRDs). This product runs in a Kubernetes environment.

Pravega

Pravega is the stream store in SDP. It handles ingestion and storage for continuously streaming unbounded byte streams. Pravega is an Open Source Software project, which is sponsored by Dell Technologies.

Unified Analytics

SDP includes the following embedded analytic engines for processing your ingested stream data.

Apache Flink

Apache Flink is an embedded stream processing engine in SDP. Dell Technologies distributes Docker images from the Apache Flink Open Source Software project.

SDP ships with images for Apache Flink. It also supports custom Flink images.

Apache Spark

Apache Spark is a unified analytics engine for large-scale data processing.

SDP ships with images for Apache Spark.

Pravega Search

Pravega Search provides query features on the data in Pravega streams. It supports filtering and tagging incoming data as it is ingested, as well as searching stored data.

For supported analytic engine image versions for this SDP release, see Component deployment matrix later in this chapter.

Management platform

The management platform is Dell Technologies proprietary software. It integrates the other components and adds security, performance, configuration, and monitoring features.

User interface

The management plane provides a comprehensive web-based user interface for administrators and application developers.

Metrics stacks

SDP deploys InfluxDB databases and Grafana instances for metrics visualization. Separate stacks are deployed for Pravega and for each analytic project.

Pravega schema registry

The schema registry provides a serving and management layer for storing and retrieving schemas for Pravega streams.

APIs

Various APIs are included in the SDP distributions. APIs for Spark, Flink, Pravega, Pravega Search, and Pravega Schema Registry are bundled in this SDP release.

About Pravega

The Open Source Pravega project was created specifically to support streaming applications that handle large amounts of continuously arriving data.

In Pravega, the stream is a core primitive. Pravega ingests unbounded streaming data in real time and coordinates permanent storage.

Pravega user applications are known as Writers and Readers. Pravega Writers are applications using the Pravega API to ingest collected streaming data from several data sources into SDP. The platform ingests and stores the streams. Pravega Readers read data from the Pravega store.
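
As a minimal sketch, the following Java example shows what a Pravega Writer can look like with the Pravega client API. The controller URI, scope name, stream name, and routing key are placeholders; substitute the values for your deployment.

```java
import java.net.URI;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;

public class SensorWriter {
    public static void main(String[] args) {
        // Placeholder controller endpoint, scope, and stream; use your SDP values.
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller.example.com:9090"))
                .build();

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("my-scope", config);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "sensor-stream", new UTF8StringSerializer(),
                     EventWriterConfig.builder().build())) {
            // The routing key keeps events from the same sensor ordered.
            writer.writeEvent("sensor-42", "{\"temperature\": 21.5}").join();
        }
    }
}
```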

Pravega streams are based on an append-only log data structure. By using append-only logs, Pravega rapidly ingests data into durable storage. Pravega handles all types of streams, including:

Unbounded or bounded streams of data
Streams of discrete events or a continuous stream of bytes
Sensor data, server logs, video streams, or any other type of information

Pravega seamlessly coordinates a two-tiered storage system for each stream. Apache BookKeeper (called Tier 1) temporarily stores the recently ingested tail of a stream. Long-term storage (sometimes called Tier 2) occurs in a configured alternate location. You can configure streams with specific data retention periods.
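
As a hedged illustration of stream configuration, the sketch below uses the Pravega Java admin client to create a stream with an autoscaling policy and a time-based retention period. The endpoint, scope, stream name, and policy values are assumptions chosen for the example, not SDP defaults.

```java
import java.net.URI;
import java.time.Duration;

import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.RetentionPolicy;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;

public class CreateSensorStream {
    public static void main(String[] args) {
        // Placeholder controller endpoint; use the Pravega controller URI of your SDP cluster.
        URI controller = URI.create("tcp://pravega-controller.example.com:9090");

        try (StreamManager streamManager = StreamManager.create(controller)) {
            streamManager.createScope("my-scope");

            StreamConfiguration config = StreamConfiguration.builder()
                    // Autoscale segments toward 100 events/second per segment,
                    // splitting by a factor of 2, with at least 1 segment.
                    .scalingPolicy(ScalingPolicy.byEventRate(100, 2, 1))
                    // Keep roughly seven days of data (an example value, not a default).
                    .retentionPolicy(RetentionPolicy.byTime(Duration.ofDays(7)))
                    .build();

            streamManager.createStream("my-scope", "sensor-stream", config);
        }
    }
}
```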

An application, such as a Java program reading from an IoT sensor, writes data to the tail of the stream. Apache Flink applications can read from any point in the stream. Multiple applications can read and write the same stream in parallel.

Some of the important design features in Pravega are:
Elasticity, scalability, and support for large volumes of streaming data
Preserved ordering and exactly-once semantics
Data retention based on time or size
Durability
Transaction support

Applications can access data in real time or past time in a uniform fashion. The same paradigm (the same API call) accesses both real-time and historical data in Pravega. Applications can also wait for data that is associated with any arbitrary time in the future.
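
The following sketch illustrates that uniform access pattern, assuming the same placeholder scope and stream as the earlier examples: a reader group created at the head of the stream first drains historical data from long-term storage and then continues with the real-time tail, using the same read call throughout.

```java
import java.net.URI;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.impl.UTF8StringSerializer;

public class SensorReader {
    public static void main(String[] args) {
        // Placeholder endpoint, scope, stream, and reader names; use your SDP values.
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller.example.com:9090"))
                .build();

        // By default the reader group starts at the head of the stream, so it reads
        // all historical data first and then keeps up with the real-time tail.
        try (ReaderGroupManager groupManager = ReaderGroupManager.withScope("my-scope", config)) {
            groupManager.createReaderGroup("sensor-readers", ReaderGroupConfig.builder()
                    .stream(Stream.of("my-scope", "sensor-stream"))
                    .build());
        }

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("my-scope", config);
             EventStreamReader<String> reader = factory.createReader(
                     "reader-1", "sensor-readers",
                     new UTF8StringSerializer(), ReaderConfig.builder().build())) {
            String event;
            // The same readNextEvent() call returns historical and real-time events;
            // this sketch simply stops after two seconds with no new events.
            while ((event = reader.readNextEvent(2000).getEvent()) != null) {
                System.out.println(event);
            }
        }
    }
}
```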

Specialized software connectors provide access to Pravega. For example, a Flink connector provides Pravega data to Flink jobs. Because Pravega is an Open Source project, it can potentially connect to any analytics engine with community-contributed connectors.

Pravega is unique in its ability to handle unbounded streaming bytes. It is a high-throughput, autoscaling real-time store that preserves key-based ordering of continuously streaming data and guarantees exactly-once semantics. It infinitely tiers ingested data into long-term storage.

For more information about Pravega, see http://www.pravega.io.

About analytic engines and Pravega connectors

SDP includes analytic engines and connectors that enable access to Pravega streams.

Analytic engines run applications that analyze, consolidate, or otherwise process the ingested data.

Apache Flink

Apache Flink is a high throughput, stateful analytics engine with precise control of time and state. It is an emerging market leader for processing stream data. Apache Flink provides a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It performs computations at in-memory speed and at any scale. Preserving the order of data during processing is guaranteed.

The Flink engine accommodates many types of stream processing models, including:

Continuous data pipelines for real-time analysis of unbounded streams
Batch processing
Publisher/subscriber pipelines

The SDP distribution includes Apache Flink APIs that can process continuous streaming data, sets of historical batch data, or combinations of both.
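
As one possible sketch of such an application, the example below uses the open-source Pravega Flink connector to expose a Pravega stream as a Flink source. The connector class names come from the public Pravega connector project; the controller URI, scope, and stream names are placeholders, and the connector version must match the Flink images shipped with SDP.

```java
import java.net.URI;

import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SensorAnalytics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder controller URI and scope; in SDP these come from the project configuration.
        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://pravega-controller.example.com:9090"))
                .withDefaultScope("my-scope");

        // Expose the Pravega stream as a Flink source.
        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("sensor-stream")
                .withDeserializationSchema(new SimpleStringSchema())
                .build();

        DataStream<String> events = env.addSource(source, "pravega-sensor-stream");

        // A trivial transformation: keep only events that mention a temperature field.
        events.filter(e -> e.contains("temperature")).print();

        env.execute("sensor-analytics");
    }
}
```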

For more information about Apache Flink, see https://flink.apache.org/.

Apache Spark

Apache Spark provides a dataflow engine in which the user expresses the required flow using transformations and actions. Data is handled through a Resilient Distributed Dataset (RDD), an immutable, partitioned dataset that transformations and actions operate on. An application's dataflow graph is broken down into stages. Each stage creates a new RDD, but the RDD is not a materialized view on disk; it is an in-memory representation of the data held within the Spark cluster that later stages can process.

The SDP distribution includes Apache Spark APIs that can process streaming data, sets of historical batch data, or combinations of both. Spark has two processing modes: batch processing and streaming micro-batch.
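
The following minimal Java sketch illustrates the transformation-and-action model described above using plain Spark RDDs; it does not involve Pravega, and the data and application names are purely illustrative.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddSketch {
    public static void main(String[] args) {
        // local[*] is for illustration only; on SDP the job runs on the cluster.
        SparkConf conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> readings = sc.parallelize(Arrays.asList(18, 21, 35, 40, 22));

            // Transformations build new RDDs lazily; nothing executes yet.
            JavaRDD<Integer> hot = readings.filter(t -> t > 30);

            // collect() is an action: it triggers the staged dataflow and returns results.
            List<Integer> result = hot.collect();
            System.out.println(result); // prints [35, 40]
        }
    }
}
```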

For more information about Apache Spark, see https://spark.apache.org.

Other analytic engines and Pravega connectors

You may develop custom Pravega connectors to enable client applications to read from and write real-time data to Pravega streams. For more information, see the SDP Code Hub.

Management plane

The SDP management plane coordinates the interoperating functions of the other components.

The management plane deploys and manages components in the Kubernetes environment. It coordinates security, authentication, and authorization. It manages Pravega streams and the analytic applications in a single platform.


The web-based UI provides a common interface for all users. Developers can upload and update application images. All project members can manage streams and processing jobs. Administrators can manage resources and user access.

Some of the features of the management plane are:

Integrated data security, including TLS encryption, multilevel authentication, and role-based access control (RBAC)
Project-based isolation for team members and their respective streams and applications
Possibility of sharing stream data in a read-only mode across streams
Flink cluster and application management
Spark application management
Pravega streams management
Stream data schema management and evolution history with a Schema Registry
DevOps oriented platform for modern software development and delivery
Integrated Kubernetes container environment
Application monitoring and direct access to the Apache Flink or Apache Spark web-based UIs
Direct access to predefined Grafana dashboards for Pravega
Direct access to project-specific predefined Grafana dashboards showing operational metrics for Flink, Spark, and Pravega Search clusters

Deployment options

Streaming Data Platform supports options for deployment at the network edge and in the data center core.

SDP Core

SDP Core provides all the advantages of on-premises data collection, processing, and storage. SDP Core is intended for data center deployment with full-size servers. It handles larger data ingestion needs and also accepts data that is collected by SDP Edge and streamed up to the Core.

High availability (HA) is built into all deployments. Deployments start with a minimum of three nodes and can expand up to 12 nodes, with integrated scaling of added resources.

Recommended servers have substantially more resources than SDP Edge. Long-term storage is provided by Dell Technologies PowerScale clusters or Dell Technologies ECS appliances. Multitenant use cases are typical. Other intended use cases build models across larger datasets and ingest large amounts of data.

SDP Edge

Deploying at the Edge, near gateways or sensors, has the advantage of local ingestion, transformation, and alerting. Data is processed, filtered, or enriched before transmission upstream to the Core.

SDP Edge is a small footprint deployment, requiring fewer minimum resources (CPU cores, RAM, and storage) than SDP Core. SDP Edge supports configurations of one or three nodes.

Single-node deployment is a low-cost option for development, proof of concept, or for production use cases where high availability (HA) is not required. Single-node deployment operates without any external long-term storage, using only node disks for storage.

Three-node deployments provide HA at the Edge for local data ingestion that cannot tolerate downtime. This deployment can use node disks or PowerScale for long-term storage.

SDP Edge requires a license.

SDP Edge Container Native Storage (CNS)

NOTE: CNS is not a generally supported feature in SDP 1.4 and is available only on an RPQ basis.

SDP 1.4 supports Longhorn as the CNS solution.

CNS is a software defined data storage solution that runs in containers on Kubernetes environments. Everything that is needed within an enterprise storage environment is isolated in the container without dependencies.

In a CNS environment, underlying hardware, such as HDDs and NVMe SSDs, is virtualized into a pool that provides persistent storage for applications running inside the containers.

Benefits of CNS:

Portability: When running in containers, storage units can be transported between data center environments.

Isolation: Because containers do not have any dependencies, they run applications with everything the workload already needs. This includes storage volumes.


Container native storage enables stateful workloads to run within containers by providing persistent volumes.

Combined with Kubernetes primitives such as StatefulSets, it delivers the reliability and stability to run mission-critical workloads in production environments.

SDP Edge Starter Pack

SDP Edge Starter Pack is a new low-cost deployment option that is based on the SDP Edge deployment option. The option is limited to a maximum throughput of 1 MB per second.

The minimum Starter Pack license option available is one core. The one-core SDP Edge Starter Pack license uses as many cores as are assigned to K8s, but throughput is limited to 1 MB per second.

This deployment option is designed for:

Users getting started with SDP.
Users who want to deploy large numbers of inexpensive HA deployments (hence the allowance of as many cores) that are limited in capability (hence the throughput limit) to reduce usable functionality (for example, no real-time streaming video analysis).

If these systems need to be expanded later, you can upgrade the license to the full version of SDP Edge.

SDP Micro

SDP Micro is a lightweight version of SDP Edge. It is intended for low-volume data ingestion and movement, and it offers limited analytics capabilities. Here is a summary of SDP Micro deployment and functionality:

SDP Micro is supported as a single-node deployment only, with a small number of cores.
Long-term storage (LTS) is on local storage only.
Kubespray is the only supported Kubernetes environment.
For data handling and analytics, SDP Micro supports only Pravega and Flink. SDP Micro can perform basic data transformations with Flink.

SDP Micro can act as a low-cost introduction into the SDP environment. It has a smaller download package than SDP Edge. You can upgrade from SDP Micro to SDP Edge.

SDP Micro requires a license.

Component deployment matrix

This matrix shows the differences between SDP Core, SDP Edge, and SDP Micro deployments.

Table 1. Component deployment matrix for SDP Core and SDP Edge

Apache Flink
  SDP Core 1.4: Ships with versions 1.12.7 and 1.13.6
  SDP Edge 1.4: Ships with versions 1.12.7 and 1.13.6

Apache Spark
  SDP Core 1.4: Ships with versions 2.4.8 and 3.2.1
  SDP Edge 1.4: Ships with versions 2.4.8 and 3.2.1

Kubernetes Platform
  SDP Core 1.4: OpenShift 4.10.9
  SDP Edge 1.4: Kubespray 2.18.1
  SDP Micro 1.4: Kubespray

Number of nodes supported
  SDP Core 1.4: 3 to 12
  SDP Edge 1.4: 1 to 3
  SDP Micro 1.4: 1-node bare metal or VM

Number of processor cores supported
  SDP Edge 1.4: SDP Edge: unlimited; SDP Edge Starter Pack: 1; SDP Edge Container Native Storage (CNS): 3
  SDP Micro 1.4: See SDP Micro sizes below

Maximum throughput
  SDP Core 1.4: Unlimited
  SDP Edge 1.4: SDP Edge: unlimited; SDP Edge Starter Pack: 1 MB per second of throughput
  SDP Micro 1.4: Unlimited

Container Runtime
  SDP Core 1.4: CRI-O version 1.19.0
  SDP Edge 1.4: Docker version 19.03

Operating System
  SDP Core 1.4: RHEL 8.6, CoreOS 4.10.3
  SDP Edge 1.4: Ubuntu 20.04
  SDP Micro 1.4: Ubuntu 18.04.x, RHEL 8.3 or higher

Long-term storage option: Dell Technologies PowerScale
  SDP Core 1.4: Gen5 or later hardware; OneFS 8.2.x or 9.x software with NFSv4.0 enabled
  SDP Edge 1.4: Gen5 or later hardware; OneFS 8.2.x or 9.x software with NFSv4.0 enabled; single-node deployment supports local long-term storage
  SDP Micro 1.4: See SDP Micro sizes below

Long-term storage option: Dell Technologies ECS
  SDP Core 1.4: ECS object storage appliance with ECS 3.5.1.4 and later, ECS 3.6.1.1 and later, and 3.7.0.9 and later
  SDP Edge 1.4: Not supported
  SDP Micro 1.4: See SDP Micro sizes below

Embedded Service Enabler (ESE)
  ESE 2.4.10.0

Table 2. SDP Micro sizes

Size      Memory    Long-term storage (local disk or vSAN)   Total virtual cores   vCPUs reserved for analytics
Small     48 GB     1x 1 TB                                   6                     1
Medium    64 GB     2x 1 TB                                   12                    2
Large     128 GB    2x 1 TB                                   24                    4
X-Large   192 GB    2x 1 TB                                   36                    12

Architecture and supporting infrastructure

The supporting infrastructure for SDP supplies storage and compute resources and the network infrastructure.

SDP is a software-only solution. The customer obtains the components for the supporting infrastructure independently.

For each SDP version, the reference architecture includes specific products that are tested and verified. The reference architecture is an end-to-end enterprise solution for stream processing use cases. Your Dell Technologies sales representative can provide appropriate reference architecture solutions for your expected use cases.

A general description of the supporting infrastructure components follows.

Reference Hardware

SDP runs on bare metal servers using custom operating system software provided in the SDP distribution. SDP Edge runs on Ubuntu. SDP Core runs on Red Hat Enterprise Linux CoreOS.

Network

A network is required for communication between the nodes in the SDP cluster and for the external clients to access cluster applications.

Local storage

Local storage is required for various system functions. The Dell Technologies support team helps size storage needs based on intended use cases.

Long-term storage

Long-term storage for stream data is required and is configured during installation. Long-term storage is any of the following:
SDP Core production solutions require an elastic scale-out storage solution. You may use either of the following for long-term storage:
  A file system on a Dell Technologies PowerScale cluster
  A bucket on the Dell Technologies ECS appliance
SDP Edge production solutions use a file system on a Dell Technologies PowerScale cluster.
For testing, development, or use cases where only temporary storage is needed, long-term storage may be defined as a file system on a local mount point.


Kubernetes container environment included with SDP

SDP runs in a Kubernetes container environment. The container environment isolates projects, efficiently manages resources, and provides authentication and RBAC services. The required Kubernetes environments are provided with SDP distributions and are installed and configured as part of SDP installation. They are: SDP Edge runs in Kubespray. SDP Core runs in Red Hat OpenShift.
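Once SDP is installed, a minimal, hedged way to confirm that the provided Kubernetes environment is healthy is to run standard kubectl commands from a workstation configured for the cluster; nothing here is SDP-specific, and namespace names vary by installation.

kubectl get nodes -o wide   # every cluster node should report a Ready status
kubectl get pods -A         # spot-check that platform pods are Running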

The following figures show the supporting infrastructure in context with SDP.

Figure 1. SDP Core architecture

Figure 2. SDP Edge architecture


Product highlights SDP includes the following major innovations and unique capabilities.

Enterprise-ready deployment

SDP is a cost effective, enterprise-ready product. This software platform, running on a recommended reference architecture, is a total solution for processing and storing streaming data. With SDP, an enterprise can avoid the complexities of researching, testing, and creating an appropriate infrastructure for processing and storing streaming data. The reference architecture consists of both hardware and software. The resulting infrastructure is scalable, secure, manageable, and verified. Dell Technologies defines the infrastructure and provides guidance in setting it up. In this way, SDP dramatically reduces time to value for an enterprise.

SDP provides integrated support for a robust and secure total solution, including fault tolerance, easy scalability, and replication for data availability.

With SDP, Dell Technologies provides the following deployment support:

Recommendations for the underlying hardware infrastructure
Sizing guidance for compute and storage to handle your intended use cases
End-to-end guidance for setting up the reference infrastructure, including switching and network configuration (trunks, VLANs, management and data IP routes, and load balancers)
Comprehensive image distribution, consisting of customized images for the operating system, supporting software, SDP software, and API distributions for developers
Integrated installation and configuration for underlying software components (Docker, Helm, Kubernetes) to ensure alignment with SDP requirements

The result is an ecosystem ready to ingest and store streams, and ready for your developers to code and upload applications that process those streams.

Unbounded byte stream ingestion, storage, and analytics

Pravega was designed from the outset to handle unbounded byte stream data.

In Pravega, the unbounded byte stream is a primitive structure. Pravega stores each stream (any type of incoming data) as a single persistent stream, from ingestion to long-term storage, like this:

Recent tail: The real-time tail of a stream exists on Tier 1 storage.
Long-term: The entire stream is stored on long-term storage (also called Tier 2 storage in Pravega).

Applications use the same API call to access real-time data (the recent tail on Tier 1 storage) and all historical data on long-term storage.

In Apache Flink or Spark applications, the basic building blocks are streams and transformations. Conceptually, a stream is a potentially never-ending flow of data records. A transformation is an operation that takes one or more streams as input and produces one or more output streams. In both applications, non-streaming data is treated internally as a stream.

By integrating these products, SDP creates a solution that is optimized for processing unbounded streaming bytes. The solution is similarly optimized for bounded streams and more traditional static data.

High throughput stream ingestion

Pravega enables the ingestion capacity of a stream to grow and shrink according to workload. During ingestion Pravega splits a stream into partitions to handle a heavy traffic period, and then merges partitions when traffic is less. Splitting and merging occurs automatically and continuously as needed. Throughout, Pravega preserves order of data.

Stream filtering on ingestion

PSearch continuous queries process data as it is ingested, providing a way to filter out unwanted data before it is stored, or to enrich the data with tagging before it is stored.

Stream search PSearch queries can search an entire stored stream of structured or unstructured data.

Exactly-once semantics

Pravega is designed with exactly-once semantics as a goal. Exactly-once semantics means that, in a given stream processing application, no event is skipped or duplicated during the computations.

Key-based guaranteed order

Pravega guarantees key-based ordering. Information in a stream is keyed in a general way (for example, by sensor or other application-provided key). SDP guarantees that values for the same key are stored and processed in order. The platform, however, is free to scale the storage and processing across keys without concern for ordering.

The ordering guarantee supports use cases that require order for accurate results, such as in financial transactions.


Massive data volume

Pravega accommodates massive data ingestion. In the reference architecture, Dell Technologies hardware solutions support the data processing and data storage components of the platform. All of the processing and storage reference hardware is easily scaled out by adding nodes.

Batch and publish/ subscribe models supported

Pravega, Apache Spark, and Apache Flink support the more traditional batch and publish/subscribe pipeline models. Processing for these models includes all the advantages and guarantees that are described for the continuous stream models.

Pravega ingests and stores any type of stream, including:

Unbounded byte streams, such as data streamed from IoT devices
Bounded streams, such as movies and videos
Unbounded append-type log files
Event-based input, streaming or batched

In Apache Flink and Apache Spark, all input is a stream. Both process table-based input and batch input as a type of stream.

ACID-compliant transaction support

The Pravega Writer API supports Pravega transactions. The Writer can collect events, persist them, and decide later whether to commit them as a unit to a stream. When the transaction is committed, all data that was written to the transaction is atomically appended to the stream.

The Writer might be an Apache Flink or other application. As an example, an application might continuously process data and produce results, using a Pravega transaction to durably accumulate the results. At the end of a time window, the application might commit the transaction into the stream, making the results of the processing available for downstream processing. If an error occurs, the application cancels the transaction and the accumulated processing results disappear.

Developers can combine transactions and other features of Pravega to create a chain of Flink jobs. The Pravega-based sink for one job is the source for a downstream Flink job. In this way, an entire pipeline of Flink jobs can have end-to-end exactly once, guaranteed ordering of data processing.

In addition, applications can coordinate transactions across multiple streams. A Flink job can use two or more sinks to provide source input to downstream Flink jobs.

Pravega achieves ACID compliance as follows:

Atomicity and Consistency are achieved in the basic implementation. A transaction is a set of events that is collectively either added into a stream (committed) or discarded (aborted) as a batch.

Isolation is achieved because the transactional events are never visible to any readers until the transaction is committed into a stream.

Durability is achieved when an event is written into the transaction and acknowledged back to the writer. Transactions are implemented in the same way as stream segments. Data that is written to a transaction is as durable as data written directly to a stream.

Security Access to SDP and the data it processes is strictly controlled and integrated throughout all components.

Authentication is provided through both Keycloak and LDAP.
Kubernetes and Keycloak role-based access control (RBAC) protect resources throughout the platform.
TLS controls external access.
Within the platform, the concept of a project defines and isolates resources for a specific analytic purpose. Project membership controls access to those resources.

For information about these and other security features, see the Dell Technologies Streaming Data Platform Security Configuration Guide.

More features Here are additional important capabilities in SDP.

Fault tolerance The platform is fault tolerant in the following ways:

All components use persistent volumes to store data.
Kubernetes abstractions organize containers in a fault-tolerant way. Failed pods restart automatically, and deleted pods are re-created automatically.


Certain key components, such as Keycloak, are deployed in "HA" mode by default. In the Keycloak case, three Keycloak pods are deployed, clustered together, to provide near-uninterrupted access even if a pod goes down.

Data retention and data purge

Pravega includes the following ways to purge data, per stream:

A manual trigger in an API call specifies a point in a stream beyond which data is purged. An automatic purge may be based on size of stream. An automatic purge may be based on time.

Historical data processing

Historical stream processing supports:

Stream cuts: Set a reading start point.

Apache Flink job management

Authorized users can monitor, start, stop, and restart Apache Flink jobs from the SDP UI. The Apache Flink savepoint feature permits a restarted job to continue processing a stream from where it left off, guaranteeing exactly-once semantics.

Apache Spark job management

Authorized users can monitor, start, stop, and restart Apache Spark jobs from the SDP UI.

Monitoring and reporting

From the SDP UI, administrators can monitor the state of all projects and streams. Other users (project members) can monitor their specific projects.

Dashboard views on SDP UI show recent Pravega ingestion metrics, read and write metrics on streams, and long-term storage metrics.

Heat maps of Pravega streams show segments as they are split and merged, to help with resource allocation decisions.

Stream metrics show throughput, reads and writes per stream, and transactional metrics such as commits and aborts.

Latencies at the segment store host level are available, aggregated over all segment stores.

The following additional UIs are linked from the SDP UI.

Project members can browse directly to the Flink Web UI that shows information about their jobs. The Apache Flink Web UI monitors Flink jobs as they are running.

Project members can browse directly to the Spark Web UI that shows information about their jobs. The Apache Spark Web UI monitors Spark jobs as they are running.

In SDP, there are predefined Grafana dashboards for Pravega. Administrators can view Pravega JVM statistics, and examine stream throughputs and latency metrics. To view the Pravega Grafana UI, go to SDP UI > Pravega Metrics.

Administrators can browse Grafana dashboards in the Monitoring Grafana UI to see the Kubernetes cluster metrics for SDP components. To view the UI, use the Monitoring Metrics link in the SDP UI.

Project members can browse directly to the OpenSearch UI from a Pravega Search cluster page.

Logging Kubernetes logging is implemented in all SDP components.

Remote support SupportAssist is supported for SDP.

Event reporting Services in SDP collect events and display them in the SDP UI. The UI offers search and filtering on the events, including a way to mark them as acknowledged. In addition, some critical events are forwarded to the SRS or SCG Gateway.

Issues are tracked along with Events.

An Issue can be in one of multiple states. The state of an Issue is determined by the type of the last Event corresponding to that Issue.

Some Issues can be auto-cleared after a predetermined time interval (no manual acknowledgment required).

Important Events and Issues are reported to the notifiers set up during installation (for example, streamingdata-snmp-notifier, streamingdata-supportassist-ese, or nautilus-notifier).

SDP Code Hub The SDP Code Hub is a centralized portal that helps application developers get started with SDP applications. Developers can browse and download example applications and code templates, download Pravega connectors, and view demos. Applications and templates from Dell Technologies teams include Pravega samples, Flink samples, Spark samples, and API templates. See the Code Hub at https://streamingdataplatform.github.io/code-hub/.

Schema Registry Schema Registry provides a serving and management layer for storing and retrieving schemas for application metadata. A shared repository of schemas allows applications to flexibly interact with each other and store schemas for Pravega streams.

GStreamer support

GStreamer is a pipeline-based multimedia framework that links together video processing elements. The GStreamer Plugin for Pravega is open-source software that is used to capture video, perform video compression, and read and write Pravega stream data, and it enables NVIDIA DeepStream inference for object detection.

GPU support for the GStreamer framework

A GPU (Graphics Processing Unit) is a specialized processor with dedicated memory that conventionally performs the floating point operations required for rendering graphics. SDP takes advantage of GPUs for image and video processing, stream processing, and machine learning. GPU-accelerated workloads are supported for both Apache Flink and Apache Spark applications, which enables data scientists to use machine learning on analytic workloads. During installation, the NVIDIA GPU operator updates the SDP node base operating system and the Kubernetes environment with the appropriate drivers and configurations for GPU access. The GPU Operator deploys Node Feature Discovery (NFD) to identify nodes that contain GPUs and installs the GPU driver on GPU-enabled nodes.

MQTT support MQ Telemetry Transport (MQTT) is a lightweight publish/subscribe messaging transport protocol that is used for connecting remote IoT devices and optimizing network bandwidth. MQTT makes it easy to encrypt messages using Transport Layer Security (TLS) and to authenticate clients using protocols such as OAuth. Pravega MQTT, introduced in SDP 1.3 and later versions, implements an MQTT broker that MQTT clients can use to publish events to Pravega streams. MQTT subscribers are not supported in SDP 1.3. MQTT is intended for ingesting events into SDP with high throughput using Quality of Service (QoS) 0 (at-most-once) semantics, and it supports only TLS connections.

TLS 1.3 support In SDP 1.3 and later versions, you can specify which Transport Layer Security (TLS) protocol versions to enable when you install SDP to use SSL/TLS for communication over public endpoints.

Basic terminology The following terms are basic to understanding the workflows supported by SDP.

Pravega scope The Pravega concept for a collection of stream names. RBAC for Pravega operates at the scope level.

Pravega stream A durable, elastic, append-only, unbounded sequence of bytes that has good performance and strong consistency. A stream is uniquely identified by the combination of its name and scope. Stream names are unique within their scope.

Pravega event A collection of bytes within a stream. An event has identifying properties, including a routing key, so it can be referenced in applications.

Pravega writer A software application that writes data to a Pravega stream.

Pravega reader A software application that reads data from a Pravega stream. Reader groups support distributed processing.

Flink application An analytic application that uses the Apache Flink API to process one or more streams. Flink applications may also be Pravega Readers and Writers, using the Pravega APIs for reading from and writing to streams.

Flink job Represents an executing Flink application. A job consists of many executing tasks.

Flink task A Flink task is the basic unit of execution. Each task is executed by one thread.

Spark application An analytic application that uses the Apache Spark API to process one or more streams.

Spark job Represents an executing Spark application. A job consists of many executing tasks.

Spark task A Spark task is the basic unit of execution. Each task is executed by one thread.

RDD Resilient Distributed Dataset. The basic abstraction in Spark that represents an immutable, partitioned collection of elements that can be operated on in parallel.

Project An SDP concept. A project defines and isolates resources for a specific analytic purpose, enabling multiple teams of people to work within SDP in separate project environments.


Project member An SDP user with permission to access the resources in a specific project.

Kubernetes environment

The underlying container environment in which all SDP services run. The Kubernetes environment is abstracted from end-user view. Administrators can access the Kubernetes layer for authentication and authorization settings, to research performance, and to troubleshoot application execution.

Schema registry A registry service that manages schemas & codecs. It also stores schema evolution history. Each stream is mapped to a schema group. A schema group consists of schemas & codecs that are associated with applications.

Pravega Search cluster

Resources that process Pravega Search indexing, searches, and continuous queries.

Interfaces SDP includes the following interfaces for developers, administrators, and data analysts.

Table 3. Interfaces in SDP

SDP User Interface: Configure and manage streams and analytic jobs. Upload analytic applications.
Pravega Grafana custom dashboards: Drill into metrics for Pravega.
Apache Flink Web User Interface: Drill into Flink job status.
Apache Spark Web User Interface: Drill into Spark job status.
Keycloak User Interface: Configure security features.
Pravega and Apache Flink APIs: Application development.
Project-specific Grafana custom dashboards: Drill into metrics for Flink, Spark, and Pravega Search clusters.
Project-specific OpenSearch Web User Interface: Submit Pravega Search queries.
Monitoring Grafana dashboards: Administrators can browse Grafana dashboards in the Monitoring Grafana UI to see the Kubernetes cluster metrics for SDP components.
JupyterHub: JupyterHub provides an interactive web-based environment for data scientists and data engineers to write and immediately run Python code.

In addition, users may download the Kubernetes CLI (kubectl) for research and troubleshooting for the SDP cluster and its resources. This includes support for the SDP custom resources, such as projects.
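As a hedged illustration of that kubectl usage, the commands below first discover the custom resource definitions registered in the cluster and then query project resources; the exact resource name that SDP uses for projects is an assumption here, so substitute the name reported by the first command.

kubectl get crds | grep -i project       # discover the CRD that backs SDP projects
kubectl get projects -A                  # assumed resource name; use the name reported above
kubectl describe project <project-name> -n <namespace>   # placeholders for a specific project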

User Interface (UI)

The Dell Technologies Streaming Data Platform provides the same user interface for all personas interacting with the platform.

The views and actions available to a user depend on that user's RBAC role. For example:

Logins with the admin role see data for all existing streams and projects. In addition, the UI contains buttons that let them create projects, add users to projects, and perform other management tasks. Those options are not visible to other users.

Logins with specific project roles can see their projects and the streams, applications, and other resources that are associated with their projects.

Here is a view of the initial UI window that administrators see when they first log in. Administrators see all metrics for all the streams in the platform.


Figure 3. Initial administrator UI after login

Project members (non-admin users) do not see the dashboard. They only see the Analytics and the Pravega tabs for the streams in their projects.

Grafana dashboards

SDP includes the collection, storage, and visualization of detailed metrics.

SDP deploys one or more instances of metrics stacks. One instance is for gathering and visualizing Pravega metrics. There is a separate metrics stack (Prometheus + Grafana) for Kubernetes cluster monitoring. Additional project-specific metrics stacks are optionally deployed.

A metrics stack consists of an InfluxDB database and Grafana.

InfluxDB is an open-source database for storing time series data.
Grafana is an open-source metrics visualization tool. Grafana deployments in SDP include predefined dashboards that visualize the collected metrics in InfluxDB.

Developers can create their own custom Grafana dashboards as well, accessing any of the data stored in InfluxDB.

Pravega metrics

In SDP, InfluxDB stores metrics that are reported by Pravega. The Dashboard page on the SDP UI shows some of these metrics. More details are available on the Pravega Grafana dashboards. Links to the Pravega Metrics and Monitoring Metrics Grafana dashboards are available to SDP administrators from any page of the SDP UI. Administrators can use these dashboards to drill into problems or to identify developing memory problems, stream-related inefficiencies, or problems with storage interactions.

The SDP UI Dashboards page contains a link to the Pravega Grafana instance. The Dashboards page and the Pravega Grafana instance are available only to administrators.

The navigation bar contains the link to Pravega Grafana Instance. It is available to admin users from any SDP UI page.

Project metrics

Metrics is a project feature. It can be added to (or removed from) a project at any time, either at project creation or later.

If the Metrics feature is removed from a project, all previous data stored in its InfluxDB databases is lost.


If the project is created through the UI, the Metrics feature is selected by default, and the user must clear that feature if it is not required.

The Metrics feature is available to both project members and administrators.

Application specific analytics

For projects that have metrics enabled, developers can add new metrics collections into their applications, and push the metrics to the project-specific InfluxDB instance. Any metric in InfluxDB is available for use on customized Grafana dashboards.

Apache Flink Web UI

The Apache Flink Web UI shows details about the status of Flink jobs and tasks. This UI helps developers and administrators to verify Flink application health and troubleshoot running applications.

To view the Apache Flink Web UI from the SDP UI:

Analytics > Project > Flink Clusters > Overview > Flink cluster name.

Figure 4. Apache Flink Web UI

Apache Spark Web UI

The Apache Spark Web UI shows details about the status of Spark jobs and tasks. This UI helps developers and administrators to verify Spark application health and troubleshoot running applications.

The SDP UI contains direct links to the Apache Spark Web UI. From the Analytics Project page, go to a project and click Spark Apps (on the Overview tab). The application name is a link to the Spark Web UI, which opens in a new browser tab and displays the Overview screen for the Spark application you selected. From here, you can drill into status for all jobs and tasks.


Figure 5. Apache Spark Web UI


JupyterHub overview

JupyterHub provides an interactive web-based environment for data scientists and data engineers to write and immediately run Python code.

Jupyter Notebook

Figure 6. Jupyter Notebook

Jupyter Notebook is a single-user web-based interactive development environment for code and data. It supports a wide range of workflows in data science, scientific computing, and machine learning.

JupyterHub

JupyterHub is a multiuser version of Jupyter Notebook that is designed for companies, classrooms, and research labs. JupyterHub spawns, manages, and proxies multiple instances of the single-user Jupyter Notebook server.

Jupyter Notebook Single User Pod

The Jupyter Notebook Single User Pod runs a Jupyter Notebook server for a single user. All Python code that a user runs in their notebooks runs in this pod.


JupyterHub on SDP

JupyterHub can be enabled for an Analytics Project in SDP with a few clicks. It uses the single-sign-on functionality of SDP.

Each Analytics Project has its own deployment of JupyterHub. Each user can have up to one Jupyter pod per Analytics Project.

APIs

The following developer resources are included in an SDP distribution.

SDP includes these application programming interfaces (APIs):

Pravega APIs, required to create the following Pravega applications:
  Writer applications, which write stream data into the Pravega store.
  Reader applications, which read stream data from the Pravega store.
Apache Flink APIs, used to create applications that process stream data.
Apache Spark APIs, used to create applications that process stream data.
PSearch APIs, used to register continuous queries or process searches against the stream.
Schema Registry APIs, used to retrieve and perform schema registry operations.

Stream processing applications typically use these APIs to read data from Pravega, process or analyze the data, and perhaps even create new streams that require writing into Pravega.

What you get with SDP The SDP distribution includes the following software, integrated into a single platform.

Kubernetes environments
Dell Technologies Streaming Data Platform management plane software
Keycloak software and an integrated security model
Pravega data store and API
Schema registry for managing schemas and codecs
Pravega Search (PSearch) framework, query processors, and APIs
Apache Flink framework, processing engine, and APIs
Apache Spark framework, processing engine, and APIs
InfluxDB for storing metrics
Grafana UI for presenting metrics
SDP installer, scripts, and other tools
JupyterHub, an interactive web-based environment for data scientists and data engineers to write and immediately run Python code

Use case examples Following are some examples of streaming data use cases that Dell Technologies Streaming Data Platform is especially designed to process.

Industrial IoT Detect anomalies and generate alerts. Collect operational data, analyze the data, and present results to real-time dashboards and trend analysis reporting. Monitor infrastructure sensors for abnormal readings that can indicate faults, such as vibrations or high temperatures, and recommend proactive maintenance. Collect real-time conditions for later analysis. For example, determine optimal wind turbine placement by collecting weather data from multiple test sites and analyzing comparisons.

Streaming Video Store and analyze streaming video from drones in real time. Conduct security surveillance. Serve on-demand video.


Automotive Process data from automotive sensors to support predictive maintenance. Detect and report on hazardous driving conditions that are based on location and weather. Provide logistics and routing services.

Financial Monitor for suspicious sequences of transactions and issue alerts. Monitor transactions for legal compliance in real-time data pipelines. Ingest transaction logs from market exchanges and analyze for real-time market trends.

Healthcare Ingest and save data from health monitors and sensors. Feed dashboards and trigger alerts for patient anomalies.

High-speed events

Collect and analyze IoT sensor messages. Collect and analyze Web events. Collect and analyze logfile event messages.

Batch applications

Batch applications that collect and analyze data are supported.

Documentation resources Use these resources for additional information.

Table 4. SDP documentation set

Dell Technologies Streaming Data Platform documentation:
  Dell Technologies Streaming Data Platform Documentation InfoHub: https://www.dell.com/support/article/us/en/19/sln319974/dell-emc-streaming-data-platform-infohub
  Dell Technologies Streaming Data Platform support site: https://www.dell.com/support/home/us/en/04/product-support/product/streaming-data-platform/overview
  Dell Technologies Streaming Data Platform Developer's Guide
  Dell Technologies Streaming Data Platform Installation and Administration Guide
  Dell Technologies Streaming Data Platform Security Configuration Guide
  Dell Technologies Streaming Data Platform Release Notes
  NOTE: You must log onto a Dell support account to access release notes.

SDP Code Hub:
  Community-supported public Github portal for developers and integrators. The SDP Code Hub includes Pravega connectors, demos, sample applications, API templates, and Pravega and Flink examples from the open-source SDP developer community: https://streamingdataplatform.github.io/code-hub/

Pravega concepts, architecture, use cases, and Pravega API documentation:
  Pravega open-source project documentation: http://www.pravega.io

Apache Flink concepts, tutorials, guidelines, and Apache Flink API documentation:
  Apache Flink open-source project documentation: https://flink.apache.org/

Apache Spark concepts, tutorials, guidelines, and Apache Spark API documentation:
  Apache Spark open-source project documentation: https://spark.apache.org
  https://github.com/StreamingDataPlatform/workshop-samples/tree/master/spark-examples


SDP ESE Integration

Topics:

Overview
Configure SupportAssist
SupportAssist port requirements
Connect To SupportAssist
View Connection
View Support Contacts
View Advanced settings

Overview

Embedded Service Enabler (ESE) is the successor of Secure Remote Services. ESE provides a connectivity platform that serves as a unified communication point between products and Dell Technologies. A product can interact with ESE to send telemetry and events, among other capabilities.

To send telemetry and events, enable the ESE notifier by adding the following configuration to the values file:

global:
  kahmNotifiers: [ streamingdata-supportassist-ese ]

To install ESE, SDP uses the supportassist Helm chart to install SupportAssist.

SDP includes the supportassist chart as part of its local Helm repository component.

NOTE: During ESE installation, the installer creates a ConfigMap with data required to configure the HelmRelease.
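A hedged way to confirm that the chart and its generated configuration landed in the cluster is sketched below; the resource names and the presence of a HelmRelease resource type are assumptions based on the note above, since the guide does not list them explicitly.

kubectl get helmreleases -A | grep -i supportassist   # HelmRelease configured by the installer (name assumed)
kubectl get configmaps -A | grep -i supportassist     # ConfigMap described in the note above (name assumed)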

Configure SupportAssist

You must obtain an access key and PIN from Dell EMC to configure SupportAssist for the first time. The access key and PIN ensure the accuracy of contact and other customer values and access to Dell Support.

Prerequisites

1. For SupportAssist connectivity, you can connect directly with access to the FQDN esrs3-core.emc.com:443.

2. You are logged in to the SDP Portal UI.

3. You have applied a valid license.

4. You are an active Dell EMC customer with login access to https://www.dell.com/support/home/.

5. To obtain an access key and PIN, go to https://www.dell.com/support/home/en-us/product-support/product/streaming-data-platform/overview, and click Generate Access Key. After completing the required form, Dell EMC sends an email to the email address set up for the Dell portal login. The email is from the Dell | Services Connectivity Team and contains the site ID, access key, and PIN for the selected customer.

NOTE: The generated access key is valid for seven days.

6. See the SupportAssist port requirements listed in the Streaming Data Platform 1.4 Installation and Administration Guide and validate that the required ports are configured properly before configuring SupportAssist.

7. A Dell EMC gateway server must already be configured on site if you plan to connect via a Gateway Server (ESE or Secure Connect Gateway (SCG)).


SupportAssist port requirements

Dedicated ports are required for SDP SupportAssist and other network traffic.

Table 5. SupportAssist port requirements

Port 22, TCP, inbound from the SRS or SCG Gateway to SDP: SSH, Secure Copy (SCP), and Secure File Transfer Protocol (SFTP)
Port 9443, TCP, outbound from SDP to the SRS or SCG Gateway: SRS or SCG V3 Gateway or later
Port 443, TCP, outbound from SDP to Direct Connect: SRS Direct Connect
Port 8443, TCP, outbound from SDP to Direct Connect: SRS Direct Connect
Port 8443, TCP, inbound from the SRS or SCG Gateway to SDP: SRS or SCG V3 Gateway or later; SCG V5.0 Gateway or later
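Before configuring SupportAssist, a quick, hedged connectivity check against these ports can be run from an SDP node; the gateway hostname is a placeholder, the use of the esrs3-core.emc.com FQDN for port 8443 is an assumption, and nc (netcat) availability on the node is also assumed.

nc -vz esrs3-core.emc.com 443    # outbound direct connect (FQDN from the prerequisites above)
nc -vz esrs3-core.emc.com 8443   # second outbound direct connect port from Table 5 (same FQDN assumed)
nc -vz <gateway-host> 9443       # outbound to an SRS or SCG gateway (placeholder hostname)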

Connect To SupportAssist

SDP supports ESE configuration during installation, or as the recommended Day 2 (post-installation) configuration using the SDP UI.

About this task

To configure ESE after installation (Day 2) using the SDP UI, follow these steps:

Steps

1. Log in to the SDP UI and click Settings > SupportAssist tab.

The SupportAssist screen opens.


Figure 7. SupportAssist

2. In Connect to Dell Services, select the Connection Type.

Connect Directly to Dell Services:
  Direct connection; no gateway required.
  Remote Support cannot be enabled for a direct connection.

Connect via Secure Gateway Server:
  A gateway allows Remote Support to be enabled.
  You can add up to eight gateways for failover.
  Gateway inputs:
    Host Name or IP: A valid hostname or IP address
    Port: A number between 1 and 65535. The default gateway port is 9443.
    Priority: A number between 1 and 99. The lower the number, the higher the priority (Primary Gateway, then Secondary Gateway, and so on).


Figure 8. Connect via Secure Gateway Server

3. In Enter Access Key & PIN, enter the Site ID, Access Key, and PIN.

Figure 9. Enter Access Key & PIN

Site ID: Specific to the customer (a number)
Access Key: Generated hex value based on the Site ID (eight hex characters)
PIN: Four digits chosen by the customer

4. In Enter Support Contacts (optional), add a Primary and a Secondary contact.

This step is optional.


Figure 10. Enter Support Contacts (optional)

5. Click Save.

View Connection

After you connect to SupportAssist, you can view the SupportAssist Kubernetes resource details from the SDP UI.

Steps

To view Connection, click Settings > SupportAssist.

Figure 11. View connection

Remote Support: Enable or Disable. NOTE: This is disabled by default when Connect Directly to Dell Services is configured, and it cannot be modified.

Edit: Opens the Configure Dell Services screens to modify the Connection type, Access key info, and Support contacts.
Test Connectivity: Updates the Status Message and Last Connected timestamp.
Disable: Disables the connection.
Delete: Deletes the entire configuration.


Figure 12. Configure Dell Services

View Support Contacts

Steps

1. To view Support Contacts, click Settings > SupportAssist > Support Contacts.

Figure 13. View Support Contacts

2. To edit support contacts, click Edit Support Contacts.

View Advanced settings

The Advanced UI screen allows the administrator to control the advanced SupportAssist settings and to update the access key and PIN by linking to the UI that generates a new key.

Steps

To view Advanced Settings, click Settings > SupportAssist > Advanced.


Figure 14. Advanced settings

System Mode: PreProduction, Maintenance, or Normal
Automatic Support Request: Disabled or Enabled
Site ID: Specific to the customer
Access Key: Generated based on the Site ID
PIN: Four digits chosen by the customer

NOTE:

With a valid license, the installation default for System Mode is PreProduction and Automatic Support Request is Disabled. When the SDP cluster is put into production, these two settings should be changed to Normal and Enabled.

System Mode and Automatic Support Request can only be modified with a valid license.

Re-authentication with Dell connectivity services requires that a new access key and PIN be created based on the iSWID serial number.


Pre Installation Steps

Topics:

Installing SDP 1.4 from scratch
Performing multiple installations of SDP 1.4

Installing SDP 1.4 from scratch When installing SDP from scratch, you must run ./scripts/pre-install.sh from the root of the distribution.

This script generates secure passwords for internal components of the system. As part of this, it generates a values file called gen-values-1.4.yaml.
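A minimal sketch of that step follows, run from the root of the distribution as described above; the distribution root path is a placeholder, and the generated file path matches the example later in this section.

cd <sdp-distribution-root>                         # placeholder for where the distribution was extracted
./scripts/pre-install.sh                           # generates secure passwords and the values file
ls -l ./scripts/pre-install/gen-values-1.4.yaml    # confirm that the generated values file exists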

When installing SDP, you must specify this file at the end of the list of value files. For example:

./decks-install-linux-amd64 apply -k ./manifests/ --repo ./build/charts/ \
    -f ./environment1-values.yaml,./scripts/pre-install/gen-values-1.4.yaml

NOTE: Upgrades to SDP 1.4 are not supported.

Performing multiple installations of SDP 1.4

It may be necessary to run ./decks-install-linux-amd64 again against the same SDP 1.4 installation, for example to update certain settings in your values file. This is referred to as a Configuration Update, or updating SDP 1.4 to 1.4 itself.

Pass the gen-values-1.4.yaml file used for the initial install at the end of your list of value files.

./decks-install-linux-amd64 apply -k ./manifests/ --repo ./build/charts/ \
    -f ./environment1-values.yaml,./scripts/pre-install/gen-values-1.4.yaml

NOTE: You can change the values in environment1-values.yaml, but you must supply gen-values-1.4.yaml or a secret reference to the generated values at the end.

As an alternative to passing ./scripts/pre-install/gen-values-1.4.yaml, you can pass a reference to a secret containing the generated values. This is useful if you lose the file or do not want to keep a local file, since it stores some sensitive information. The value secrets are created when you run the pre-install script (./scripts/pre-install.sh) and are shown at the end of its output. For example:

secret/gen-values-2022-05-16.20-37-37 created

The secret can then be given to the installer via the --values-from-secrets flag:

./decks-install-linux-amd64 apply -k ./manifests/ --repo ./build/charts/ \
    -f ./environment1-values.yaml \
    --values-from-secrets=nautilus-private/gen-values-

You can also reuse any value secrets during an upgrade the same way. To view the saved values:

> kubectl get secrets -n nautilus-private
NAME                                         TYPE    DATA  AGE
decks-installer-values-2022-05-13.17-20-15   Opaque  1     3d3h
decks-installer-values-2022-05-13.19-15-48   Opaque  1     3d1h
gen-values-2022-04-27.19-45-17               Opaque  1     19d
gen-values-2022-04-27.19-53-02               Opaque  1     19d
...

NOTE:

gen-values- secrets store the generated values from running the pre-install script.

decks-installer-values- secrets store the final merged values, which are saved and rotated (up to ten) every time you run the decks installer's apply. The --use-last-values flag can be specified to refer to the last saved value secret (see the example after this note).

The nautilus-private namespace stores the value secrets as well as other Kubernetes resources used or generated by the installer, such as log configmaps. This namespace persists across installations.
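As a hedged example of the --use-last-values flag mentioned in the note, a re-run of the installer might look like the following; whether the flag can be combined with -f exactly this way is an assumption based on the commands shown earlier in this chapter.

./decks-install-linux-amd64 apply -k ./manifests/ --repo ./build/charts/ \
    -f ./environment1-values.yaml \
    --use-last-values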


Install SDP Edge or SDP Micro

Topics:

Deploy with an OVA
Install Ubuntu on Bare Metal
Deploy on Linux
SDP Edge with Longhorn storage
Manage SDP Edge and SDP Micro


Deploy with an OVA You can deploy SDP on VMware using an Open Virtual Appliance (OVA) file. The OVA creates a VM, creates the Kubespray cluster, and deploys SDP Edge or SDP Micro in the cluster.

Topics:

Deploy the ova on vSphere
Configure an additional disk
Power on the VM and set up the SDP network
Reload Kubespray dependencies
Set up MetalLB
Add a license
Install SDP
Obtain SDP UI credentials and log in
Provision additional disk
Create an SDP Project

Deploy the ova on vSphere

Prerequisites

SDP OVA deployment requires the following minimum resources:

Storage: 2 disks of 1 TB each
CPUs: 8
RAM: 32 GB

Steps

1. Log in to vSphere.

2. Go to Datacenter.

3. Click Actions > Deploy OVF Template.

4. On the Select an OVF template page, specify a URL or a local location where the Dell Technologies SDP ova resides. Click Next.

5. On the Select a name and folder page, provide a name for the VM to create. Then choose a location. Click Next.

6. On the Select a compute resource page, select a resource. Click Next. Wait for the system to validate.

7. On the Review details page, verify the template information and click Next.

8. On the Select storage page, select the desired storage resource for SDP and click Next.

9. On the Select networks page, select the network destination and click Next.

10. On the Ready to complete page, verify all information and click Finish.


Configure an additional disk Some use cases might require additional persistent storage for third-party applications running on the same VM as SDP.

About this task

NOTE: Configure an additional disk before starting the SDP VM.

The SDP OVA Template defines two default disks. You can choose to add a third disk. When SDP starts, it provides the tools to partition the disk and create a storage class in Kubernetes.

Steps

1. Right-click the VM name and select Edit Settings.

2. On the Virtual Hardware tab, in the upper right, click Add New Device > Hard Disk.

3. In the form under New Hard disk, choose the size of the disk, and click OK.

Power on the VM and set up the SDP network

Prerequisites

Obtain the following credentials from your Dell Technologies account team.

The login password for the Ubuntu desktop. The default username is sdpadmin.

The login credentials for Rundeck.

Steps

1. Right-click the SDP VM name and select Power On.

2. Log in to the Ubuntu desktop. See Prerequisites above for credentials.

3. Open Firefox and go to the following URL, which accesses the Rundeck utility.

http://localhost:4440/

NOTE: Rundeck might take a few minutes to load.

4. Log in to Rundeck. See Prerequisites above for credentials.

5. Select Projects > SDP-Project.

6. In the left panel, select Jobs.

7. From the list of jobs, select Setup Network.

8. Type the VM Static IP, DNS server IP, and the gateway information. Click Run Job Now. Wait for the job to complete successfully.

9. Close Firefox and reboot the SDP VM.

Reload Kubespray dependencies

Steps

1. After the reboot, log in to the SDP VM and then log in to Rundeck.

See Power on the VM and set up the SDP network on page 38 for more information, including credential information.

2. Select Projects > SDP-Project.

3. Select Jobs in the left pane.

4. Select Reload kubespray images from the Jobs list.

5. Click Run Job Now.


6. Wait for the job to complete successfully before proceeding.

Set up MetalLB Configure the load balancer.

Steps

1. Log in to the Ubuntu desktop.

2. Open Firefox and go to this URL:

http://localhost:4440/

NOTE: The URL accesses the Rundeck utility. Rundeck might take a few minutes to load.

3. Log in to Rundeck.

4. Select Projects > SDP-Project.

5. In the left panel, select Jobs.

6. From the list of jobs, select Setup MetalLB.

7. Provide the IP range for MetalLB.

The IP Range should be consecutive IPs that can be allocated for the SDP load balancers. For example: 192.168.10.100-192.168.10.120

8. Click Run Job Now.

Wait for the job to complete successfully.

9. Remain logged in to Rundeck, and go to Install SDP on page 39.

Add a license Optional: Add your customer-specific license file to SDP. During installation, SDP is activated using the license file.

About this task

NOTE: If you skip these steps, SDP installs with an evaluation license. Later, you can reapply the installation using a permanent license.

Steps

1. Download your license file (license.xml) from the Software Licensing Central. To learn more about obtaining the license file, see Obtain and save the license file.

2. Copy the license file into the required location in the ~/desdp/ directory. Do not rename this file; the license file name must be license.xml.

cp license.xml ~/desdp/sdp-auto-installer/ansible/roles/sdpinstaller/files/

Install SDP

Steps

1. Select Projects > SDP-Project.

2. On the left, click Jobs.

3. In the list of jobs, select Install SDP.

4. In sdp_micro_plan, select the size of SDP Micro to install.


Values are: small, medium, large, or x-large. For descriptions of the plans, see SDP Micro plan descriptions on page 47.

5. In NodeIP, type the SDP VM static IP that you used in Network Setup.

6. Click Run Job Now.

This job takes 13 minutes to complete.

Obtain SDP UI credentials and log in Retrieve the password to use for the SDP default administrator.

About this task

The SDP default administrator username is desdp. Use these steps to retrieve the password.

Steps

1. Click Projects > SDP-Project.

2. Click Jobs > SDP Information.

3. Click Run Job Now.

4. In the job results, take note of the following information.

SDP URL: In the Node section, copy the SDP ingress value.
Password for the default SDP user: Under "Extract SDP UI login password for user desdp", copy the password value.

5. Go to the SDP URL.

6. Log in using desdp as username and the copied value as password.
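If you prefer to verify reachability from a shell before opening a browser, a hedged check of the SDP ingress endpoint might look like the following; the URL is a placeholder for the SDP ingress value copied from the SDP Information job output, and -k skips certificate verification for installations that use self-signed certificates.

curl -k -I https://<sdp-ingress-value>   # expect an HTTP response, for example 200 or a redirect to the login page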

Provision additional disk Use Rundeck to set up an additional disk. This step is optional.

Steps

1. Click Projects > SDP-Project.

2. Click Jobs > Partition additional disk.

3. In Additional disk, type the additional disk name.

For example, to add a second disk to the default disk, type sdc. The tool supports adding one disk at a time.

4. Click Run Job Now.

Create an SDP Project Use Rundeck to create a project.

Steps

1. Click Projects > SDP-Project.

2. Click Jobs > Setup SDP Project.

3. Type the new project name and click Run Job Now.


Install Ubuntu on Bare Metal

Topics:

Install Ubuntu on bare metal
Fix the drive used for the boot disk

Install Ubuntu on bare metal This procedure installs Ubuntu from the provided ISO file, configures the boot disk, configures the network, and sets other required Ubuntu settings. Perform these steps on each bare-metal node.

Steps

1. Apply the iso to the node:

a. In a browser on a workstation, go to the iDRAC IP for the bare-metal node.
b. Attach the ISO file to the node as a CD/DVD media.
c. Select the boot option to boot from virtual CD/DVD.
d. Select Save.
e. Reboot the node using a chassis warm reset.

For more detail, see https://www.dell.com/support/kbdoc/000124001/using-the-virtual-media-function-on-idrac-6-7-8-and-9

2. On the Language screen, select your language.

3. On the Keyboard screen, select your keyboard layout.

4. On the Network Connections screen:

a. Move the cursor to select the first eth interface, and press Enter.
b. Scroll to select Edit IPv4 and press Enter.
c. On the Edit IPv4 configuration screen, press Enter, and then select Manual.
d. Complete the IPv4 configuration screen, providing all the requested information, including the subnet, address, gateway, DNS, and search domain.
e. Select Save.
f. Scroll down and select Done.

5. On the Misc Configuration screens:

a. Configure the Proxy screen, and select Done.
b. Configure the Ubuntu archive mirror, and select Done.

6. On the Storage Configuration screen, do the following to install the OS so that the / file system goes on disk /dev/sda. This configuration uses the maximum capacity of the disk.

a. Select Guided Configuration.
b. Select Use an entire disk and Set up this disk as an LVM group. Typically, select the disk with an uneven amount of disk space. For example, in the figure that follows, notice that sda has 371 GB compared to the other disks, which have 894 GB.
c. Press Enter.
d. On the Storage configuration screen, verify that the / file system is an LVM logical volume.


7. Check the boot disk configuration as follows:

a. Scroll to the USED DEVICES section.
b. Select the disk that shows a partition for either bios_grub or /boot/efi.

Here is an example that uses bios_grub.

Here is an example that uses /boot/efi.

c. Select Info.
d. Verify that the Info screen shows /dev/sda for Path, as shown here:


e. If Path contains a value other than /dev/sda, correct this situation before continuing. See Fix the drive used for the boot disk on page 44 to fix the Path value.

8. Update the main disk partition size.

NOTE: If Path for partition 1 is not /dev/sda, do not proceed with this step. See the previous step and fix the boot disk now. Then return here and perform this step.

a. In the Storage Configuration screen, under USED DEVICES, select the ubuntu-lv partition, and select Edit.

b. On the Edit screen, change the Size value to match the max value, and select Save.

9. Continue with installation, as follows:

a. On the Storage Configuration screen, select Done.


The Confirm destructive action dialog appears.

b. Select Continue.

10. On the Profile screen, enter your profile information, and select Done.

11. On the SSH setup screen, press Space to select Install OpenSSH server. Then scroll down and select Done.

12. For Featured Server Snaps, scroll down and select Done.

13. Wait for installation to complete.

The Install Complete! screen appears. When installation is complete, the Reboot option appears.

14. Select Reboot.

15. After reboot, validate connections as follows:

a. Connect to the node using ssh.
b. Ping the DNS servers and the network gateway. The gateway is the switch through which the node communicates with peer nodes and the outside network.

Fix the drive used for the boot disk If the boot disk is not using the /dev/sda drive, use this procedure to assign the correct drive.

Steps

1. Delete all partitions from the boot device, except for partition 1 (bios_grub).

NOTE: Typically, the system will not allow you to delete partition 1 (bios_grub).

a. On the Storage Configuration screen, in the USED DEVICES section, select a partition and then select Delete.
b. Continue to select and delete partitions.

2. Scroll to the AVAILABLE DEVICES section, select the device with type of LVM volume group device, and then select Delete.

3. Under USED DEVICES, select the LVM device and then select Delete.

4. Remove partition 1 by selecting the disk associated with it, and then select Reformat.


5. Repeat deletions until there are no hard drives remaining in the USED DEVICES section.

6. Locate the disk with path name /dev/sda:

a. In the AVAILABLE DEVICES section, select a disk, and then select Info.
b. On the Info screen that appears, check the Path field.
c. If the path value is /dev/sda, make a note of the disk ID, and select Close. Proceed to Step 7.
d. If the path is not /dev/sda, select Close and repeat the steps with another disk. Continue until you find the disk with the /dev/sda path. Make a note of that disk ID.

7. Go back to the Guided Storage configuration screen.

8. Check Use an entire disk, and choose the drive ID associated with /dev/sda that you discovered above.

9. Select Done.

10. Return to the main configuration procedures in Install Ubuntu on bare metal on page 41 and continue configuration at Step 8, Update the main disk partition size.


Deploy on Linux

Use the SDPspray tool to install SDP Edge or SDP Micro onto any supported Linux platform. SDPspray configures the required Kubernetes cluster using Kubespray and then installs SDP Edge or SDP Micro in the cluster. It assumes that a supported Linux platform is already installed.

Topics:

Overview
Assumptions and prerequisites
SDP Micro plan descriptions
Required customer information
Required preinstallation steps on Ubuntu
Required preinstallation steps on Red Hat Enterprise Linux
Install Kubespray and SDP
Install the GPU operator in Kubespray environments
Configure UI access
Get SDP URL and login credentials

Overview The supplied SDPspray tool installs Kubespray, creates the Kubernetes cluster for SDP, and installs SDP into the cluster.

SDPspray installs SDP Edge or SDP Micro. Variables that you provide as input to SDPspray control all the following:

Whether SDP Edge or SDP Micro is installed
If SDP Edge is chosen, whether the configuration is a 1-node or 3+-node installation
If SDP Micro is installed, whether the configuration is for a small, medium, large, or extra-large plan

These variables are specified in the env.yaml and ini files that are described in Required customer information on page 47.

Assumptions and prerequisites The SDPspray tool depends on certain assumptions about the operating system environment.

Supported environments

The following environments are supported for SDP Edge and SDP Micro installations:

Supported operating systems are:
  Ubuntu 18.04.5, 20.04.02
  Red Hat Enterprise Linux 8.4

The installed Kubespray version is: 2.14.2

The SDPspray tool supports online and offline installation. For offline installation, it is recommended to use the ISO with all the required packages preinstalled. Use the offline ISO provided by your Dell Technologies Support team. An offline ISO is available for Ubuntu 18.04.5.

Prerequisites

The following prerequisites apply to all nodes in the SDP deployment:


The nodes are bare metal nodes or VMs that are installed with one of the supported operating systems as described above.
Each node has two or more disks.
Each node has a configured network.
SSH access to the nodes is enabled (see the sketch after this list).
All steps in the following sections are completed:
  Required preinstallation steps on Red Hat Enterprise Linux on page 51
  Required preinstallation steps on Ubuntu on page 51
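A hedged sketch of preparing the SSH access prerequisite from the install node follows; the provision user and node addresses are taken from the examples later in this chapter and are placeholders for your environment.

ssh-keygen -t rsa -b 4096            # create a key pair on the install node if one does not already exist
ssh-copy-id sdpadmin@192.10.10.1     # provision_user and node IPs are examples from this chapter
ssh-copy-id sdpadmin@192.10.10.2
ssh-copy-id sdpadmin@192.10.10.3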

SDP Micro plan descriptions You choose the SDP Micro plan size in the env.yaml file that is used by the SDPspray installer tool.

Table 6. SDP Micro sizes

SDP Micro Small: 48 GB memory; 1x 1-TB long-term storage (local disk or vSAN); 6 total virtual cores; 1 vCPU reserved for analytics
SDP Micro Medium: 64 GB memory; 2x 1-TB long-term storage (local disk or vSAN); 12 total virtual cores; 2 vCPUs reserved for analytics
SDP Micro Large: 128 GB memory; 2x 1-TB long-term storage (local disk or vSAN); 24 total virtual cores; 4 vCPUs reserved for analytics
SDP Micro X-Large: 192 GB memory; 2x 1-TB long-term storage (local disk or vSAN); 36 total virtual cores; 12 vCPUs reserved for analytics

Required customer information The installation process requires you to edit two files to provide customer-specific information and configuration choices. It may be useful to gather this information before starting the installation process.

The two files are: The inventory.ini file

The env.yaml file

The inventory.ini file

This file defines the hosts in the cluster. In single-node clusters, you would have one entry. The interpreter has a different path on Ubuntu and Red Hat Enterprise Linux.

Here is an example for Ubuntu for a 3-node SDP Edge installation:

[kubespray]
node01 ansible_ssh_host=192.10.10.1 ansible_python_interpreter=/usr/local/bin/python3
node02 ansible_ssh_host=192.10.10.2 ansible_python_interpreter=/usr/local/bin/python3
node03 ansible_ssh_host=192.10.10.3 ansible_python_interpreter=/usr/local/bin/python3

Here is an example for Red Hat Enterprise Linux for a 3-node SDP Edge installation:

[kubespray]
node01 ansible_ssh_host=192.10.10.1 ansible_python_interpreter=/usr/libexec/platform-python
node02 ansible_ssh_host=192.10.10.2 ansible_python_interpreter=/usr/libexec/platform-python
node03 ansible_ssh_host=192.10.10.3 ansible_python_interpreter=/usr/libexec/platform-python
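Before running the installer, a hedged way to confirm that Ansible can reach every host in the [kubespray] group is the standard ping module; this assumes that ansible is installed on the install node and that the inventory file is named inventory.ini.

ansible -i inventory.ini kubespray -m ping   # every node should reply with "pong"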


The env.yaml file

This file defines SDP configuration values. A table of field descriptions follows this example.

--- # variables

##################### Deploy ssh key ##################### provision_user: sdpadmin #provision_password: mypassword disable_password_auth: false disable_root_login: false add_new_user: false

##################### Ntp service ##################### ntp_enabled: true ntp_servers: - "1.ubuntu.pool.ntp.org iburst" - "2.ubuntu.pool.ntp.org iburst" - "3.ubuntu.pool.ntp.org iburst"

# Installation node install_node: node01

# Installation mode offline_installation: false

sdp_micro: false # Available SDP micro plans: small, medium, large, x-large sdp_micro_plan: small

# Primary network primary_network: network_enabled: false

# upstream dns CoreDNS upstream_dns: - 192.0.4.17 - 192.0.4.19

# MetalLB configuration metallb: true metallb_primary_range: - "192.10.10.2-192.10.10.9" metallb_protocol: "layer2"

#metallb_additional_pools: # services_pool: # ip_range: # - "192.10.10.10-192.10.10.25" # # Alternative network alternative_network: network_enabled: false

# enable socks_proxy socks_proxy_enabled: false

# SDP variables base_domain_name: sdp-demo.org sdpdir: /desdp/decks-installer sdp_values: values.yaml sdp_version: 1.4 decks_installer_version: 1.4.0.0-xxxxx sdp_domain_name: "sdp.{{base_domain_name}}" sdp_tls_enabled: true # docker registry dockerport: 31001 sdpregistry: "sdp-registry:{{dockerport}}/desdp" cadir: desdp/ca

48 Deploy on Linux

#Disks configuration # When installing in single node single disk mode, Bookkeeper ledger and LTS share a disk in single node single disk installation # single_node_single_disk_mode -- enable or disable installing with single node single disk # single_node_single_disk -- shared ledger and LTS disk # bookeeper_ledger_size, bookeeper_journal_size, bookeeper_index_size -- bookkeeper partition sizes in GiB # dedicated_bookkeeper_disks -- dedicated bookkeeper disk lables (comma separated) # dedicated_lts_disk -- disk label if a dedicated disk is allocated for Long Term Storage # dedicated_deault_sc_disk -- Add disk label if a dedicated disk is allocated for Default storage class # powerflex_nodes -- set to true when installing in powerflex nodes disk_settings: single_node_single_disk_mode: true single_node_shared_disk: 'sdd' bookeeper_ledger_size: 100 bookeeper_journal_size: 100 bookeeper_index_size: 10 dedicated_bookkeeper_disks: '' dedicated_lts_disk: '' dedicated_deault_sc_disk: '' powerflex_nodes: false

local_nfs: true
# use nfs_server and nfs_path when setting local_nfs to false
#nfs_server: 10.10.10.10
#nfs_path: '/nfs/path'

# Service and pod subnet. Don't change these networks unless there is a conflict with the node network.
pod_network: 192.0.2.0/18
service_network: 192.0.3.0/18

Update the env.yaml file:

# sdp_serviceability
sdp_serviceability:
  # support assist
  supportassist_access_key: 12345678
  supportassist_pin: 1234
  supportassist_siteID: 123456

  # decks config
  decks_srsgateway_login: srsgateway@dell.com
  decks_srsgateway_hostname: srsgateway.dell.com

Table 7. Descriptions of variables in env.yaml

Variable Default Description

ntp_enabled required Enable or disable setting up ntp server

ntp_servers required List of ntp_servers

provision_user required A user with sudo access on all 1 or 3 nodes; Ansible uses this user to provision the nodes

upstream_dns required List of upstream DNS servers

install_node required The node that runs SDP Kubespray. Do not change it from node01 unless needed.

base_domain_name required Base domain for SDP deployment

sdp_micro false Enable or disable SDP Micro based installation with SDPspray


sdp_micro_plan small Values are: small, medium, large, or x-large. For descriptions of the plans, see SDP Micro plan descriptions on page 47.

primary network Primary network is the external network for your SDP Edge system. This network is used for data collection and administrative functions, including direct browser access to the SDP UI (without the Socks5 proxy).

sdp_values required SDP values file

sdp_version required SDP version

decks_installer_version required decks installer version

sdp_domain_name required SDP domain name. Example format edge.{{ base_domain_name }}

sdp_tls_enabled required Enable or disable TLS during SDP installation

sdpregistry required SDP registry FQDN.

dockerport required SDP registry port

cadir required CA certificate folder

disk_settings See notes in the example file above.

local_nfs required Enable or disable setting up a local NFS server. Enable for single-node installations when an external NFS server is not available.

nfs_server optional External NFS server for multinode cluster

nfs_path optional External NFS server path

offline_installation false Set to true for offline installation

pod_network 10.233.64.0/18 Kubespray cluster pod network

service_network 10.233.0.0/18 Kubespray cluster service network

metallb true Enable or disable metallb

metallb_primary_range 10.10.10.1-10.10.10.100 Primary metallb range

metallb_protocol layer2 Metallb protocol

metallb_additional_pools optional Additional metallb pools

alternative_network Typical SDP Edge installations do not require an alternative network. In special cases where the primary network is not suitable for all functionality, you can configure this optional network as an administrative network. For example, if the primary network is a Wi-Fi hotspot that is collecting data at the Edge, you might want an alternative network for management functions.


ldap_keycloak_authentication optional Enable or disable keycloak LDAP auth

ldap_certs optional List of LDAP certificate and common name

static_coredns_hosts optional List of static FQDNs and IPs to add to the CoreDNS hosts file.

Required preinstallation steps on Ubuntu

Steps

1. Log in with ssh as the user that has sudo access.

2. Add passwordless sudo to all nodes.

echo " ALL=(root) NOPASSWD:ALL" | sudo tee -a /etc/sudoers.d/ sudo chmod 0440 /etc/sudoers.d/

3. Create ssh key for login user on node one only.

ssh-keygen -t rsa -b 4096 -f /home/<user>/.ssh/id_rsa -N ''

4. Perform the following on all nodes.

sudo apt update
sudo apt install sshpass
sudo apt upgrade -y
sudo apt install python3-pip
sudo pip3 install --force-reinstall ansible==3.4.0

5. Perform on node one.

ssh-copy-id <user>@<node-ip>

Required preinstallation steps on Red Hat Enterprise Linux

Steps

1. Using ssh, log in to the node as root or as sudo user.

2. Create a nonroot user (sdpadmin) and provide that user with sudo privileges.

useradd -m sdpadmin -s /bin/bash
passwd sdpadmin
echo "sdpadmin ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/sdpadmin
chmod 0440 /etc/sudoers.d/sdpadmin

3. Create an ssh key for the new user.

su - sdpadmin -c "ssh-keygen -t rsa -b 4096 -f /home/sdpadmin/.ssh/id_rsa -N ''"


4. Log in as the new user on the provisioner node.

su - sdpadmin

5. Run the rhel-prerequisites script, which is shown below, to register nodes, subscribe necessary repos, and install the packages that are required to run the SDPspray tool.

cat <<-"EOF" > rhel-prereqs.sh #!/usr/bin/env bash set -e EXIT_CODE=0 echo "Registering node" read -p 'Enter RedHat subscription manager username: ' username sudo subscription-manager register --username=${username} --auto-attach || EXIT_CODE=$? if [ ${EXIT_CODE} -eq 0 ] || [ ${EXIT_CODE} -eq 64 ] then echo "Successfully registered node" else echo "failed to register node" exit 1 fi echo echo "subscribing to required repos" sudo subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms \ --enable=rhel-8-for-x86_64-baseos-rpms \ --enable=ansible-2.9-for-rhel-8-x86_64-rpms || \ { echo 'failed to subscribe required repos' ; exit 1; } sudo yum -y update || { echo 'failed to update packages' ; exit 1; } sudo yum -y install ansible || { echo 'failed to install ansible' ; exit 1; } EOF

chmod +x rhel-prereqs.sh
./rhel-prereqs.sh

NOTE: If you are creating a 3+N Kubespray cluster for SDP Edge, repeat Steps 1 through 5 on each node.

6. Copy the ssh public key to the nodes.

ssh-copy-id sdpadmin@<node-ip>

Example:

for node in 10.10.10.1 10.10.10.2 10.10.10.3; do ssh-copy-id sdpadmin@$node; done
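As an optional check (not part of the documented procedure), you can confirm that passwordless login works from the provisioner node before continuing. The node IPs below reuse the example addresses above.

for node in 10.10.10.1 10.10.10.2 10.10.10.3; do ssh -o BatchMode=yes sdpadmin@$node hostname; done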

Install Kubespray and SDP

Steps

1. Download the SDP installer from the Streaming Data Platform page on the Dell Technologies Support site.

a. Go to https://www.dell.com/support/home/en-us/product-support/product/streaming-data-platform/drivers.
b. Log in with your Dell support account.
c. Go to 1.4 > 1.4 Edge.
d. Download all files in the list.

2. Extract and untar sdpspray.

NOTE: Download the latest SDPspray version: 2.18.

unzip sdp-1-4.zip
cd sdp-1-4
tar xvzf sdpspray-2.18.1-*

3. Download and copy the SDP images to the roles/sdpimages folder.

pushd sdpspray/ansible/roles/sdpimages/files
# copy SDP images
popd

4. Download and copy the SDP installer to the roles/sdpinstaller folder.

pushd sdpspray/ansible/roles/sdpinstaller/files
# copy SDP installer
popd

5. Go to the ansible directory.

cd ansible

6. Open the inventory.ini file and modify it to add the cluster nodes. Ensure that node01 is your primary node IP. Save and exit the file after you update it.

7. Open the env.yaml file and modify it based on your cluster and network configuration.

See Required customer information on page 47 for guidance. Save and exit the file after you update it.

8. Run the installer script.

./run_sdp_auto_installer.sh

NOTE: The script stores the state of the last successful step. If you rerun the script, it resumes from the last successful step to avoid any repetition in the installation process. To restart the script from the beginning, first run the following commands:

rm -f /tmp/auto_installer_state
./run_sdp_auto_installer.sh

Install the GPU operator in Kubespray environments

If any applications that run on SDP Edge or SDP Micro use GPU functionality, you must install the GPU operator.

GPU overview

You must install the NVIDIA GPU Operator before SDP can access GPUs.

The NVIDIA GPU Operator updates the SDP node base operating system and the Kubernetes environment with the appropriate drivers and configurations for GPU access. It automatically configures nodes that contain GPUs and validates the installation.

In summary, the GPU Operator performs the following tasks in SDP:

1. Deploys Node Feature Discovery (NFD) to identify nodes that contain GPUs
2. Installs the GPU driver on GPU-enabled nodes
3. Installs the nvidia-docker runtime on GPU-enabled nodes
4. Installs the NVIDIA Device Plugin on GPU-enabled nodes
5. Launches a validation pod to ensure that the installation is successful

For information about the NVIDIA GPU Operator, see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html.


Prerequisites to GPU operator installation

Before installing the GPU Operator, ensure that these prerequisites are met:
Ensure that the Kubespray cluster is in the Ready state.
Ensure that nodes are configured with a container engine such as Docker.
Node Feature Discovery (NFD) is a dependency for the Operator on each node. By default, the Operator automatically deploys the NFD master and worker. If NFD is already running in the cluster before the deployment of the operator, you can configure the Operator to skip NFD installation.

You must have the kubectl CLI tool, version 1.19.x or later, installed on your workstation and configured to communicate with the SDP cluster. Download kubectl at https://kubernetes.io/docs/tasks/tools/install-kubectl.

You must have the helm CLI tool, version 3.5.2 or later, installed on your workstation. Download helm at https://helm.sh/docs/intro/install.

Install the GPU Operator on Red Hat Enterprise Linux 8.6

Prerequisites

Supported Environment:
KubeSpray: 2.18
Kubernetes Version: v1.23.0
Operating System: Red Hat Enterprise Linux 8.6
Container Runtime: Docker

Client:
 Version:           18.09.9
 API version:       1.39
 Go version:        go1.11.13
 Git commit:        039a7df9ba
 Built:             Wed Sep 4 16:51:21 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.9
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.11.13
  Git commit:       039a7df
  Built:            Wed Sep 4 16:22:32 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Steps

1. Install the NVIDIA driver for Red Hat Enterprise Linux version 8 manually as described in https://docs.nvidia.com/datacenter/tesla/pdf/NVIDIA_Driver_Installation_Quickstart.pdf.

NOTE: The NVIDIA driver image is not available for Red Hat Enterprise Linux 8.X versions in https://ngc.nvidia.com/catalog/containers/nvidia:driver/tags.

The following commands show the installation process.

sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
ARCH=$( /bin/arch )
sudo subscription-manager repos --enable codeready-builder-for-rhel-8-${ARCH}-rpms --enable rhel-8-for-${ARCH}-baseos-rpms --enable rhel-8-for-${ARCH}-appstream-rpms
distribution=$(. /etc/os-release;echo $ID`rpm -E "%{?rhel}%{?fedora}"`)
sudo dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-rhel8.repo
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
sudo dnf module install nvidia-driver:latest-dkms/default


Post Installation Steps (before reboot). For details, see: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions

sudo systemctl enable nvidia-persistenced
systemctl status nvidia-persistenced
sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
sudo sed -i 's/SUBSYSTEM!="memory",.*GOTO="memory_hotplug_end"/SUBSYSTEM=="*", GOTO="memory_hotplug_end"/' /etc/udev/rules.d/40-redhat.rules

NOTE: You may need to reboot the node after installing the driver.

2. Install the NVIDIA GPU operator.

Use helm as follows:

a. Add the NVIDIA Helm Repository.

helm repo add nvidia https://nvidia.github.io/gpu-operator && helm repo update

b. Install using the Helm chart and skip the driver installation.

helm install --wait --generate-name nvidia/gpu-operator --set driver.enabled=false

c. Check for the following error.

open failed: /sbin/ldconfig.real: no such file or directory

To correct this situation, create the symlink that is described in https://github.com/NVIDIA/nvidia-docker/issues/614.
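The workaround in that issue amounts to creating a symlink on each GPU node so that the path the toolkit expects exists. A minimal sketch, assuming the standard ldconfig location on Red Hat Enterprise Linux (verify against the issue before applying):

sudo ln -s /sbin/ldconfig /sbin/ldconfig.real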

Install GPU Operator on Ubuntu 20.04

Prerequisites

Supported Environment:
KubeSpray: 2.18
Kubernetes Version: v1.23.0
Operating System: Ubuntu 20.04
Container Runtime: Docker

NOTE: If the HWE kernel (for example, kernel 5.x) is used with Ubuntu 18.04 LTS or Ubuntu 20.04 LTS, you must block the nouveau driver for NVIDIA GPUs before starting the GPU Operator. Follow the steps in the CUDA Installation Guide to disable the nouveau driver and update initramfs.

The NVIDIA GPU Operator supports Ubuntu 18.04.

Steps

1. Add the NVIDIA Helm Repository.

helm repo add nvidia https://nvidia.github.io/gpu-operator && helm repo update

2. Install using the Helm chart. Skip the driver installation.

helm install --wait --generate-name nvidia/gpu-operator


Verify GPU Operator

These steps verify the GPU Operator installation and test correct functioning with a workload.

Steps

1. Run the following command:

kubectl get pods -A

Here is example output:

NAMESPACE                NAME                                                               READY   STATUS      RESTARTS   AGE
default                  gpu-operator-1616579493-node-feature-discovery-master-74dc7krj6   1/1     Running     0          2m18s
default                  gpu-operator-1616579493-node-feature-discovery-worker-mb7wk       1/1     Running     0          2m18s
default                  gpu-operator-1616579493-node-feature-discovery-worker-qtn78       1/1     Running     0          2m18s
default                  gpu-operator-1616579493-node-feature-discovery-worker-sddj4       1/1     Running     0          2m18s
default                  gpu-operator-74c595fc57-6bnnn                                      1/1     Running     0          2m18s
gpu-operator-resources   gpu-feature-discovery-b2895                                        1/1     Running     0          2m7s
gpu-operator-resources   gpu-feature-discovery-bcnhs                                        1/1     Running     0          2m7s
gpu-operator-resources   gpu-feature-discovery-hj762                                        1/1     Running     0          2m7s
gpu-operator-resources   nvidia-container-toolkit-daemonset-8pfsk                           1/1     Running     0          2m7s
gpu-operator-resources   nvidia-container-toolkit-daemonset-shg88                           1/1     Running     0          2m7s
gpu-operator-resources   nvidia-container-toolkit-daemonset-xpf6f                           1/1     Running     0          2m7s
gpu-operator-resources   nvidia-dcgm-exporter-8hpgz                                         1/1     Running     0          2m7s
gpu-operator-resources   nvidia-dcgm-exporter-9kkh2                                         1/1     Running     0          2m7s
gpu-operator-resources   nvidia-dcgm-exporter-zzd5q                                         1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-daemonset-7qwfj                               1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-daemonset-qggkm                               1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-daemonset-tznzv                               1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-validation                                    0/1     Completed   0          2m7s
gpu-operator-resources   nvidia-driver-daemonset-5n2g4                                      1/1     Running     0          2m7s
gpu-operator-resources   nvidia-driver-daemonset-gswhw                                      1/1     Running     0          2m7s
gpu-operator-resources   nvidia-driver-daemonset-jg6fd                                      1/1     Running     0          2m7s

2. In the output, ensure that all the containers are in a Running status.

3. In the output, ensure that the correct pods exist.

On each Kubernetes node, there should be one pod for each of the following services:
In the default namespace, there should be several gpu-operator services.
In the gpu-operator-resources namespace, there should be one pod per node for each of the following services. The example shows output for a 3-node SDP Kubespray cluster, so there are three replicas of each service (a supplementary check for per-node distribution follows this list).
gpu-feature-discovery service
nvidia-container-toolkit-daemonset
nvidia-dcgm-exporter
nvidia-device-plugin-daemonset
nvidia-driver-daemonset
gpu-operator-resource
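Supplementary check (not part of the documented steps): the wide output format adds a NODE column, which makes it easy to confirm that each daemonset runs one pod per node.

kubectl get pods -n gpu-operator-resources -o wide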

4. Verify a GPU workload.

a. Examine the pod log for the nvidia-device-plugin-validation pod.

The service runs a vecadd example for the validation, and a successful installation results in the following logged entry.

device-plugin-validation device-plugin validation is successful

5. Optionally, you can validate by manually running the following Kubernetes pod specs.

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.2.1
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

The pod should complete without error and log the following:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
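When the validation output looks correct, you can remove the test pod. This cleanup step is not part of the documented procedure.

kubectl delete pod vectoradd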

Uninstall the GPU Operator

If the GPU Operator is not needed in your environment, you can uninstall it.

Steps

1. Delete the GPU operator helm chart.

helm delete gpu-operator -n default

2. Delete clusterpolicy crd.

kubectl delete crd clusterpolicies.nvidia.com

For details, see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/clean-up.html.

Configure UI access

Configure access to the SDP browser-based user interface.

About this task

You may configure access to the SDP UI in either of the following ways:
Enable access through browsers on the SDP LAN.
Enable access through a VPN using the Socks5 proxy in a Chrome browser.


The kubeconfig necessary to access the cluster is set automatically during Kubespray installation. You do not need to log in to kubectl separately to run the following commands.

Enable access through browsers on the SDP LAN

About this task

This configuration is optional and typically not needed. It configures a browser connected to the Primary Network (wireless) to view the SDP UI.

Steps

1. Get the SDP external ingress details.

The following command gets the ingress details in the appropriate format for the /etc/hosts file:

kubectl get ingress -A | awk '{ print $4, $3 }'

2. Copy the output from the previous command.

3. On each host where the browsers are installed, edit the corresponding hosts file and paste the copied information (an example of the pasted lines follows these steps).

The hosts files are located here:

Linux: /etc/hosts
Windows: C:\Windows\System32\Drivers\Etc\Hosts

4. Go to Get SDP URL and login credentials on page 58.
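For illustration only, the lines pasted in step 3 map each ingress FQDN to its external IP. The addresses and host names below are hypothetical placeholders; use the actual output from step 1.

192.10.10.2 sdp.sdp-demo.org
192.10.10.2 keycloak.sdp.sdp-demo.org
192.10.10.2 grafana.sdp.sdp-demo.org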

Get SDP URL and login credentials

Steps

1. Determine the URL: The SDP UI URL is:

https://${SDP_DOMAIN_NAME}

where SDP_DOMAIN_NAME is the value you specified in the environment variable file before installing the product.

You can get the domain name with the following command:

kubectl get ing -n nautilus-system nautilus-ui

In the output, the FQDN in the HOSTS field is the SDP_DOMAIN_NAME to use in the URL.

2. Get the login credentials for the default admin account. The default admin user name is desdp.

To get the password value, ask a cluster admin or other user with access to the nautilus-system namespace to run this command:

kubectl get secret keycloak-desdp -n nautilus-system -o \ jsonpath='{.data.password}' | base64 -d ; echo

3. Go to the SDP UI URL and log in.


SDP Edge with Longhorn storage

Topics:

SDP Edge with Longhorn storage

SDP Edge with Longhorn storage

Steps

Create a special storage class with one replica for Bookkeeper.

Figure 15. Create a special storage class
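The figure content is not reproduced here. As a minimal sketch only, a one-replica Longhorn storage class of the kind the figure describes might look like the following; it assumes Longhorn's CSI provisioner (driver.longhorn.io) and uses the longhorn-bk class name referenced in the values below. Adapt it to your Longhorn version and settings.

cat << EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-bk
provisioner: driver.longhorn.io
parameters:
  # one replica per volume, as Bookkeeper already replicates its own data
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"
EOF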

Values file changes for specifying the Bookkeeper storage class:

bookkeeper-cluster:
  replicas: 3
  storage:
    ledger:
      volumeSize: 100Gi
      className: longhorn-bk
    journal:
      volumeSize: 100Gi
      className: longhorn-bk
    index:
      volumeSize: 30Gi
      className: longhorn-bk

Figure 16. Values file changes for specifying the Tier 2 and components that require shared storage.


Manage SDP Edge and SDP Micro

Topics:

Add trusted CA to browser
Add new user in SDP Edge
Create a project
Set retention size on streams
Shutdown and restart the Kubespray cluster
Add a node
Remove a node
Backup
Recover the control plane

Add trusted CA to browser

Each user who needs access to the SDP UI must add the trusted Certificate Authority to their browser.

Steps

1. Copy the certificate (.crt) from ~/desdp/certs.

2. Add the certificate to the browser or operating system as a trusted certificate authority (CA) for identifying web sites.

3. If you are using an intermediate CA, users may need to trust both the root CA and the intermediate (technical) CA.

Add new user in SDP Edge

Add users to the Keycloak instance to provide them with access to the SDP UI.

About this task

1. Add a new user account on the Keycloak UI, as described below.
2. Give that user access to a project by making the user a project member.

NOTE: It is not possible to give a Keycloak local user access to the Kubernetes command line. Only cluster-admin users have access to the Kubernetes command line.

Steps

1. In a browser window, go to the Keycloak endpoint in the SDP cluster.

To list connection endpoints, see Obtain connection URLs on page 96. If the SDP UI is open, prepend keycloak to the UI endpoint. For example, https://keycloak.sdp.lab.myserver.com. Depending on your configuration, this might not always work.

2. On the Keycloak UI, click Administration Console.

3. Log in using the keycloak administrator username (admin) and password.

To get the password value, ask a cluster admin or other user with access to the nautilus-system namespace to run this command:

kubectl get secret keycloak-admin-creds -n nautilus-system -o \ jsonpath='{.data.password}' | base64 -d ; echo

4. Click Manage > Users.


5. On the Users screen, click Add User on the right.

6. Complete the form.

NOTE: The username must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 118.

7. Optionally click the Credentials tab to create a simple initial password for the new user.

Create a temporary password. Enable Temporary, which prompts the user to change the password on the next login.

8. To authorize the new user to perform actions and see data in SDP, make the user a member of projects.

Create a project

Create a project on the SDP UI.

Steps

1. Log in to SDP as an admin.

2. Click the Analytics icon.

The Analytic Projects table appears.

3. Click Create Project at the top of the table.

4. In the Name field, type a name that conforms to Kubernetes naming conventions.

The project name is used for the following:
The project name in the SDP UI
The Kubernetes namespace for the project
An Artifact repository for hosting artifacts for applications defined in the project
The project-specific Pravega scope
Security constructs that allow any Flink applications in the project to have access to all the Pravega streams in the project-specific scope

5. In the Description field, optionally provide a short phrase to help identify the project.

6. Configure storage for the project.

For SDP Edge, long-term storage is NFS on either a PowerScale cluster or on node disks. The medium is configured during installation.

NOTE: For single-node deployments, long-term storage is always on node disks.

Long-term storage type: NFS
Field name: Storage Volume Size
Description: Provide the size of the persistent volume claim (PVC) to create for the project. This value is the anticipated space requirement for storing streams that are associated with the project. SDP provisions this space in the configured PowerScale file system or on node disks, depending on how SDP Edge was configured during installation.

7. Under Features, choose the features to enable in the project.

Field name: Artifact Volume Size
Description: Provide the size of the PVC to create for the Artifact repository for the project. This value is the anticipated space requirement for storing application artifacts that are associated with the project.

Your selections depend on the intended applications that developers plan to deploy in the project.

A Metrics link is available in the Features tab.


8. Click Save. The new project appears in the Analytic Projects table in a Deploying state. It may take a few minutes for the system to create the underlying resources for the project and change the state to Ready.

9. Create streams in the project.

a. Go to Pravega > project-name.
b. Click Create stream.
c. Provide stream configuration details, such as name, type, and retention, then save.
d. Best practice is to set retention sizes on all streams.

Retention size is especially important when long-term storage is on node disks. By setting appropriate retention sizes, you ensure against space problems and resulting system downtime.

If you skip the retention size setting now, you can edit the stream later to set retention size.

For retention size calculation suggestions, see the next section called Set retention size on streams on page 64.

Set retention size on streams

When long-term storage is defined on node disks, space is a limited resource and must be managed appropriately. Retention size enforces a limit on the space that a stream can consume.

About this task

When SDP Edge uses node disks for long-term storage, the following best practices are recommended:
Realize that the disk space is shared by all projects.
Allocate 50% of the available disk space for streams. The 50% should be distributed across all the streams for all the projects on the platform.
Set retention sizes on each stream. Set sizes that enable the system to enforce the preferred 50% allocation. For example, if the disk size is 100 GB and you have two projects, each with one stream, then we recommend setting retention sizes of 25 GB for stream1 and 25 GB for stream2.

Retention size is the number of MBytes to retain in the stream. The remainder at the older end of the stream is discarded.

Retention size can be set when a stream is created. However, developers typically create streams and may skip that step. Administrators should edit each stream definition to set retention sizes.
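As an illustrative aid only (not part of the product), the following shell sketch computes a per-stream retention size that keeps the total at roughly 50% of the long-term storage disk. It assumes LTS is mounted at /desdp/lts and that NUM_STREAMS is the total number of streams across all projects.

# total streams across all projects (adjust to your environment)
NUM_STREAMS=4
# total size of the LTS disk in MB, taken from the second line of df output
DISK_MB=$(df -m /desdp/lts | awk 'NR==2 {print $2}')
# half the disk, divided evenly across the streams
PER_STREAM_MB=$(( DISK_MB / 2 / NUM_STREAMS ))
echo "Set retention size to about ${PER_STREAM_MB} MB per stream"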


Steps

1. To find the size of your configured long-term storage when that storage is on node disks:

df -h /desdp/lts

2. On the SDP UI, click Pravega.

A list of Pravega scopes appears. A scope represents a project in Pravega.

3. Click a Scope.

A list of all the streams in the scope appears.

4. Click the Edit action for a stream.

5. Scroll to view the Retention Policy section.

6. Make sure the toggle at the top is set to on (green).

7. Click Retention Size.

8. Type the size in MB, according to the guidelines above.

9. Click Save.

10. Continue the process for each stream in each scope. Make sure that the total of all retention sizes equals about 50% of the total disk size.

Shutdown and restart the Kubespray cluster

Steps

1. To stop Kubespray on a three-node deployment, use the shutdown command on each node.

sudo shutdown

2. Restart the nodes to restart Kubespray.

3. To restart one node, you can use the reboot command:

sudo reboot

Add a node You can add a node to a Kubespray cluster.

About this task

Add a new node to an existing cluster in the following scenarios:
Replace a node: To prevent disruption of current processing, add the new node and then remove the old node that is in a failed or down state. See Remove a node on page 66.
NOTE: Do not use this process to replace the node in a single-node deployment. If the replacement process failed for any reason, the entire system would be down. To replace the node in a single-node deployment, start from the beginning with a new deployment.

Steps

1. In a 3-node cluster, connect to the primary node. Run all steps on the primary node.

2. Set up the node inventory.

cp -r inventory/sample inventory/mycluster
declare -a IPS=(192.10.1.3 192.10.1.4 192.10.1.5 192.10.1.6)

Where:


The node IPs identify all the nodes in the cluster, including the new one you are adding. In the above example, the first node is the primary node. The next two are also existing nodes in the cluster. The last node is the new one to add.

3. Set required environment variable.

CONFIG_FILE=inventory/mycluster/hosts.yml python3 contrib/inventory_builder/.py ${IPS[@]}

4. Verify that passwordless ssh is possible from the primary node to the node being added.

5. On the primary node, run the Ansible playbook, specifying that execution should occur on the new node only.

ansible-playbook -i inventory/mycluster/hosts.yml scale.yml -l <nodeN>

where:

-l <nodeN> limits the playbook run to the identified node. Continuing with the example in step 2, which lists four IPs, the new node is node4.
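After the playbook completes, you can confirm that the new node joined the cluster and is in the Ready state. This verification is a supplementary check, not part of the documented steps.

kubectl get nodes -o wide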

Remove a node Remove a failed or unreachable node from a 3-node Kubespray cluster.

Steps

Run the remove-node Ansible playbook, specifying the node to remove in the command line.

ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml --extra-vars node=<nodeN>

Where:

<nodeN> identifies the node to remove. If there are four nodes in the inventory and you want to remove the second one in the list, the command would be:

ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml --extra-vars node=node2

Backup Use etcd to back up the SDP Edge control plane.

Steps

Run etcd backup on SDP endpoints.

sudo ETCDCTL_API=3 etcdctl --endpoints=https://192.10.1.3:2379,https://192.10.1.4:2379,https://192.10.1.5:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/ca.pem --key=/etc/ssl/etcd/ssl/ca-key.pem snapshot save snapshotdb
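Optionally, you can inspect the saved snapshot with etcdctl before copying it off the node. This check is not part of the documented procedure and assumes the snapshotdb file is in the current directory.

sudo ETCDCTL_API=3 etcdctl snapshot status snapshotdb --write-out=table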

Recover the control plane

Steps

Run the recover-control-plane.yml playbook.

ansible-playbook -i inventory/mycluster/hosts.yml -e override_system_hostname=false -e dashboard_token_ttl=0 -e kubectl_localhost=true \
-e kubeconfig_localhost=true -e kube_version_min_required=n-2 --become --become-user=root recover-control-plane.yml \
--limit=etcd,kube-master -e ignore_assert_errors=yes --ask-pass --ask-become-pass -vvv -e etcd_snapshot=/home/ubuntu/snapshotdb


Install SDP Core

Topics:

Site Prerequisites
Configuration Values
Install SDP Core


Site Prerequisites

Topics:

Obtain and save the license file
Configure SupportAssist
Set up local DNS server
Provision long-term storage on PowerScale
Provision long-term storage on ECS

Obtain and save the license file

SDP is a licensed product and requires the license file for installation. An evaluation license is included in the distributed images by default. The license file is an XML file.

Prerequisites

Obtain the license activation code (LAC) for SDP. Customers typically receive the LAC in an email when they purchase a license.

Have your support.dell.com account credentials available.

Steps

1. In a browser, search for Dell Software Licensing Center.

2. Log in using your support.dell.com account credentials.

3. In the Software Licensing Center (SLC), perform a search for your LAC number.

4. Follow the SLC wizards to assign your license to a system and activate the license.

5. Use the SLC feature to download the license file.

6. Save the license file to a location accessible to the host on which you will run the installation commands.

The license file path is required information during installation.

7. Do not alter the license file in any way. Altering the file invalidates the signature and the product will not be licensed.

NOTE: Be careful if you FTP the license file. Always use binary FTP; otherwise, FTP may alter the file and invalidate the license.

Configure SupportAssist

See Configure SupportAssist for details.

Set up local DNS server

A local DNS server enables convenient connections to SDP endpoints from external requests.

In a production deployment, a local DNS server enhances the user experience. The local DNS server resolves requests that originate outside the cluster to endpoints running inside the Kubernetes cluster. Although it is not technically required to install SDP, a local DNS server is recommended. The installation and connection instructions later in this guide assume that you have a local DNS server set up.

NOTE: Do not use the corporate DNS server for this purpose.


Local DNS installed in the internal network (CoreDNS or BIND): Set up the local DNS server on the vSphere environment used by SDP, or use another local DNS server elsewhere in your network. Save the connection information for the server for later use in the configuration values file.

Cloud DNS: You may use a cloud DNS. To use cloud DNS solutions, such as AWS Route53 or Google Cloud DNS, you must have an account with the cloud provider. Save the account name and credentials to the account for later use in the configuration values file.

Provide the connection details for the local DNS server in the configuration values file, in the external-dns: section. See Configure connections to a local DNS on page 80.

Provision long-term storage on PowerScale

To use a PowerScale cluster for SDP long-term storage, provision the file system and gather the relevant information.

Provision a file system on the PowerScale cluster using standard PowerScale procedures. Then gather the following information, which is required to configure SDP:

PowerScale cluster IP address
Mount point
Mount options

Add this information to the configuration values file, in the nfs-client-provisioner: section. See Configure long-term storage on PowerScale on page 81.

Provision long-term storage on ECS

To use an ECS appliance for SDP long-term storage, provision the namespace and a bucket for Pravega, and gather the relevant information.

Prerequisites

Check with your Dell Technologies Support team to ensure that the ECS version you are using has incorporated the STORAGE-27535 fix.

SDP supports long-term storage on an ECS appliance, in a namespace containing S3 buckets.

You may need a load balancer for HTTP traffic to the ECS nodes. ECS consists of data nodes that can handle HTTP requests from external clients. Pravega uses the ECS Smart Client feature to balance traffic to the data nodes and does not need a load balancer. Apache Flink is not compatible with the ECS Smart Client feature. If you intend to use SDP to run Apache Flink applications, there must be a load balancer in front of ECS. Any Layer-4 load balancer (either hardware or software) may be used to load balance the HTTP traffic.

NOTE: For a software load balancer, such as HAProxy, the load balancer must be configured outside of the SDP cluster. SDP does not provide a node for the ECS load balancer.

Steps

1. Define an ECS namespace for SDP.

Using standard ECS procedures, provision the namespace with the following attributes:

a. Assign a name that indicates the namespace is for an SDP installation.
b. Enable replication on the namespace and add it to a Replication Group.

NOTE: GeoReplication is not supported.

c. Define an ECS administrator account with admin privileges to the namespace.

Internal components in SDP use these credentials to create buckets in the namespace as users create projects.

2. Define one bucket in the namespace.

Using standard ECS procedures, provision the bucket with the following attributes:

a. It must be an S3 bucket.


b. The bucket must not have any data in it prior to SDP installation.
c. Name the bucket. Pravega and its segment store use this bucket for general purposes. A name that includes pravega provides context.
d. Create SecretKey credentials to control access to the bucket.

NOTE: IAM credentials are not supported.

3. Gather configuration information for later use in the configuration values file. See Configure long-term storage on ECS on page 82.

Namespace name
Replication group name
ECS Object API endpoint (for the pravega.ecs_s3.uri field in the configuration values file)
ECS Management API endpoint
ECS admin credentials
Bucket name that you provisioned for Pravega
SecretKey credentials to the Pravega bucket: access key and secret key
If the ECS management or Object API endpoints use custom trusts (self-signed certificates), download the certificates.


Configuration Values

Topics:

About configuration values files
Prepare configuration values files
Source control the configuration values files
Validate the configuration values
Configure global platform settings
TLS configuration
Configure connections to a local DNS
Configure long-term storage on PowerScale
Configure long-term storage on ECS
Configure or remove connection to SupportAssist
Configure remote support information
Configure passwords for the default administrative accounts

About configuration values files

Configuration values files contain configuration settings for SDP. These files are required input to the installation command.

Purpose
SDP configuration and deployment options must be planned for and specified in configuration files before the installation takes place. The installer tool uses the configuration values during the installation process.

NOTE: Some settings cannot be changed after installation, requiring an uninstall and reinstall.

The configuration values serve the following purposes:
Enable and disable features.
Set high-level customer-specific values such as server host names and required licensing files.
Set required secrets for component communication.
Configure required storage for the platform.
Configure features.
Override default values that the installer uses for sizing and performance resources.
Override default names that the installer uses for some components.

Template
See Template of configuration values file on page 171. The template contains the configuration settings for the SDP installer.

File format
A configuration values file contains key-value pairs in YAML syntax. Spacing and indentation are important in YAML files.

The sections in the values file are named according to the component that they are configuring. For example, the section that contains configuration values for the SRS Gateway is named srs-gateway.

If you copy from the template, notice that the entire template comments out all the sections. Be sure to remove the # characters from the beginnings of lines to uncomment sections that you copy.

Multiple configuration values files

The SDP installer accepts a comma-separated string of configuration value file names. Some sites prefer using one large file that contains all the values, and others prefer multiple files. With multiple files, you can isolate sensitive values and separate permanent values from values that might require more frequent updates.

Override values during installation

The SDP installation command provides several options that override the values in configuration values files. See the --set, --set-file, and --config options in the decks-install apply command description.


NOTE:

The password must be the same during installation and update.

The following values must be present in the values file at install and update time:

sdp-serviceability:
dellemc-streamingdata-license:
  hooks:
    repository: nautilus-kubectl
    tag: 1.22.7
kahm:
  postgresql-ha:
    pgpool:
      adminPassword:

Prepare configuration values files

This procedure describes the values that are essential to a successful SDP installation. You may optionally add other values that you see documented in the templates or elsewhere throughout the documentation.

Steps

1. Create one or more text files to hold the configuration values.

The installation command accepts multiple file names for values.

2. Add information to the configuration values files as described in the following sections.

3. Save the values files in a secure location. These files are required input to the installer command.

The files are typically named values.yaml or similar, but that name is not required.

Some secrets may be in plain text. For this reason, Dell Technologies recommends that you source-control the values files.

You can split the secrets into separate files and strictly control access to them. The installer tool accepts multiple configuration value file names in a comma-separated string.

Source control the configuration values files

We recommend using your enterprise source control procedures to protect the configuration values files.

Access to the configuration values must be limited and protected for the following reasons:

The values files are your record of your current configuration. To adjust your configuration, edit the current configuration values and apply the needed changes.

NOTE:

You can reuse the configuration values from a previous install by using the two installer flags --use-last-values and --values-from-secrets.

Every time the installer runs an apply command, it saves the configuration values as a secret in the Kubernetes cluster.

The saved secret can be reused at a later time when running updates on a cluster.

A running record of changes that were made to the configuration might be useful for research purposes when you are fine-tuning some of the operational values.

The values files may contain secrets.

Validate the configuration values

SDP includes a script that validates the configuration values files before you include them in an installation command.

The validate-values.py script checks that required values are included and that values are specified in an acceptable format. The script indicates whether anything required is missing or whether you can continue with the installation. Resolve all missing items that the script identifies and rerun it until it indicates that you are ready to proceed with the installation.


You cannot use the validate-values.py script until you set up your local environment with required tools and extract the product files, as described in the Installation chapter. That chapter also includes the procedure for validating configuration values by running the validate-values.py script.

See Run the validate-values script on page 93 for information about running validate-values.py on demand.

Configure global platform settings

The global section of the configuration values file sets platform-wide installation choices.

This configuration is required to set the external connection to the platform UI, the type of long-term storage for the cluster, and other platform-wide settings.

Copy the global: section from the template, or copy the following example:

#global:
#  storageType: nfs | ecs_s3
#  external:
#    host: ""  #tld that services will use
#    clusterName: ""
#    tls: true
#    darksite: true | false
#  ingress:
#    annotations:
#      kubernetes.io/ingress.class: nautilus-nginx
#      kubernetes.io/tls-acme: "true"
#  # Custom CA trust certificates in PEM format. These certificates are injected into certain Java components.
#  # The main use for them at the moment is when communicating with an ECS Object endpoint that uses custom trust, i.e. Self Signed Certificates
#  tlsCABundle:
#    ecsObject: |-
#      -----BEGIN CERTIFICATE-----
#      MIIDnDCCAoSgAwIBAgIJAOlxdGLbM3vBMA0GCSqGSIb3DQEBCwUAMBYxFDASBgNV
#      BAMTC0RhdGFTZXJ2aWNlMB4XDTIwMDIxOTE5MzMzNVoXDTMwMDIxNjE5MzMzNVow
#      ...

Table 8. Configure global settings

Name Description

storageType The long-term storage solution (Pravega Tier 2 storage) for this instance of SDP. Changing the storage type after installation is not supported.

NOTE: If this parameter is not in the configuration values files, the value defaults to nfs.

Choose one of the following values:
nfs for a Dell Technologies PowerScale cluster. Then configure the nfs-client-provisioner: section with connection details.

ecs_s3 for an S3 namespace on an ECS appliance. Then configure the pravega:ecs_s3: section with connection details.

external.host: Required. The top-level domain (TLD) name you want to assign to SDP master node. This value is visible to end users in the URLs they use to access the UI and other endpoints running in the cluster.

The format is "<name>.<host-fqdn>" Where:

<name> is your choice.

<host-fqdn> is the fully qualified domain name of the server hosting SDP.

For example, in xyz.desdp.example.com, xyz is <name> and desdp.example.com is the host-fqdn.


This field sets the top-level domain name (TLD) from the perspective of SDP. The product UI is served off https://<tld>. The Grafana UI is served off https://grafana.<tld>, and so on, for the other endpoints.

For example, a TLD of xyz.desdp.example.com serves the UI off https://xyz.desdp.example.com and Grafana off https://grafana.xyz.desdp.example.com. The DNS server has authority to serve records for *.xyz.desdp.example.com.

external.clusterName: Required. The name that you plan to use for the SDP Kubernetes cluster. This value is the name of the cluster to create in Kubernetes.

NOTE: The SDP installer propagates this value into the Helm charts.

external.tls: NOTE: external.tls: value must always be set to true.

true: Values are required that specify the type of certificates to use and configure those certificates. See TLS configuration on page 75 for all TLS options and how to configure them.

external.darksite: true Defaults to false. Add this line and set to true if your installation does not have an ESE Gateway.

ingress: Leave the default annotations as shown. The first annotation specifies which ingress controller can act on platform ingress. The SDP installer deploys the Nginx Ingress Controller with --ingress-class=nginx-nautilus. The controller handles all ingresses that have this annotation.

The second annotation specifies that the platform Cert Manager should automatically provision the TLS certificate for the ingress endpoint.

The second annotation specifies that the platform Cert Manager should automatically provision the TLS certificate for the ingress endpoint.

tlsCABundle: This section contains a collection of custom CA or self-signed CA trust certificates in PEM format.

tlsCABundle.ecsObject: This field is required if your site uses a custom CA trust certificate for the object API endpoint for long-term storage on ECS. Copy the entire certificate contents and paste here. See the last step in Configure long-term storage on ECS on page 82.

tlsCABundle.internalCa This internal CA is used to sign certificates internal to the cluster that some SDP services communicate with.

TLS configuration

SDP supports Transport Layer Security (TLS) for external connections to the platform.

TLS is mandatory.

These configuration items may be changed after initial installation by applying an installation update.

TLS version
Type of certificate authority

Configure TLS version

When you install SDP to use TLS for communication over public endpoints, you can also specify which TLS protocol versions to enable.

About this task

The TLS version affects communications for the following components in SDP deployments:


Server components that allow incoming connections into the installation: These components include the Pravega Controllers, the Pravega Segment Stores, the Schema Registry API, Keycloak, and all the UIs that are served to the browser.

Client component applications that run in the cluster: Clients include Flink and Spark applications and stand-alone Pravega client applications.

The following options are available for TLS version configuration:

Strict TLSv1.2
Strict TLSv1.3
Mixed mode, which supports both TLSv1.2 and TLSv1.3. Mixed mode is the default installation mode when you enable TLS but do not configure a TLS mode.

Here are guidelines for choosing the TLS protocol version.

Mixed mode (default setting)

This mode provides a temporary path for upgrading applications. Browsers can take advantage of TLSv1.3 right away, and developers can recompile applications with the latest connectors at a convenient time.

Some environments do not have a requirement for all inbound traffic to be TLSv1.3.

Strict TLSv1.3 mode

In this mode, all clients are forced to use TLSv1.3 for a successful connection. This mode is appropriate in the following circumstances:
For security-conscious customers
To future-proof SDP operations for when TLSv1.2 is retired

Strict TLSv1.2 mode (less common)

This mode is useful in some specific crypto export situations.

To configure the TLS version for the SDP platform:

Steps

Set global.external.tlsProtocols in the values.yaml file.

Example

The following entry sets mixed mode.

global: external: host: "myserver.abc.com" clusterName: "mycluster" tls: true tlsProtocols: "TLSv1.3,TLSv1.2"

The following entry sets strict TLSv1.3.

global: external: host: "myserver.abc.com" clusterName: "mycluster" tls: true tlsProtocols: "TLSv1.3"

The following entry sets strict TLSv1.2.

global: external: host: "myserver.abc.com" clusterName: "mycluster" tls: true tlsProtocols: "TLSv1.2"


Switch the TLS version

You can switch between protocol versions on a running system after installation.

About this task

The typical scenario is to install SDP using mixed mode, and then switch to a strict mode later. If required, you can also switch from a strict mode to mixed mode.

Steps

1. In the values.yaml file, update the global.external.tlsProtocols field to the wanted value.

For valid values, see Configure TLS version on page 75.

2. Rerun the decks apply command with the updated values.yaml file.

This action changes the current configuration of the running system. For detailed instructions about changing the current configuration, see Update the applied configuration on page 107.

Changing the TLS version field causes the following:
NGINX restarts, creating a brief connection loss for all exposed APIs and UIs.
The Pravega Controller and segment stores restart, causing some data unavailability (DU).

3. If your update is switching from mixed mode to strict TLSv1.3, the following additional actions are required:

a. Recompile all Flink, Spark, and Pravega client applications using SDP libraries from SDP 1.3 or later. These libraries contain connectors that are TLSv1.3-compliant.

b. Manually restart Flink, Spark, and Pravega client applications.

Application requirements when using strict TLSv1.3 installation

If you configure SDP for strict TLSv1.3 mode, all client applications must be compiled with TLSv1.3-compliant libraries.

The SDP 1.4 release includes updated libraries that are TLSv1.3 compliant for the Flink connector, the Spark connector, and the Pravega client. All three of these libraries are labeled as version 0.11.x because SDP 1.4 includes Pravega 0.11. For strict TLSv1.3 mode, your SDP applications must be compiled (or recompiled) using these 0.11.x versions.

For the Pravega Rust Client, 0.3.1 version is supported.

For Java8 stand-alone Pravega Client applications, the following requirements apply:

The minimum JRE versions are:
Oracle: Build 261 and above (https://www.oracle.com/java/technologies/javase/8u261-relnotes.html).
OpenJDK: Build 272 and above (https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-October/012817.html).

The following properties must be passed into the JVM:

-Djdk.tls.client.cipherSuites=TLS_AES_128_GCM_SHA256 \
-Djdk.tls.client.protocols="TLSv1.3"
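For illustration only, a hypothetical launch of a stand-alone Java 8 Pravega client application that passes these properties to the JVM; the my-pravega-app.jar name is a placeholder, not part of the product.

java -Djdk.tls.client.cipherSuites=TLS_AES_128_GCM_SHA256 \
     -Djdk.tls.client.protocols="TLSv1.3" \
     -jar my-pravega-app.jar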

Certificate authority configuration

You have the following certificate authority (CA) options.

Table 9. TLS certificate authority options

Option Description

Private CA You can generate a private certificate key and certificate, add the certificate to a trust store and make it available to the SDP installer.

Enterprise or well-known CA For these options, you extract the Certificate Signing Requests (CSRs) from the SDP installer and send them to the CA. The CA issues the signed certificates which you use to install. SDP provides a cli-issuer tool to facilitate handling of CSRs and installing signed certificates.


NOTE: When deploying SDP with an enterprise CA, add the following to the values file under cert-manager-resources to disable the cert issuer verification test:

cert-manager-resources:
  verifyCertificateIssuing:
    skip: true

NOTE: Certain components need a certificate signed by an internally trusted CA in the K8s cluster. A public/private key pair must be generated and configured in the values file to enable this. The public part of the certificate must be provided in the global.tlsCABundle.internalCa setting in the values file as well as in cert-manager-resources.internalCASecrets. Both tls.crt and tls.key secrets are required.

The next sections provide configuration details for each of the CA options.

Self-signed certificates

About this task

You must be further along in the installation process before you can complete all the recommended steps for using self-signed certificates. You can optionally look ahead for the relevant instructions:

Steps

1. See Prepare self-signed SSL certificate on page 92 for steps to create the certificates, add them to the configuration values file, and push them into the registry with the other SDP images.

2. See (Optional) Validate self-signed certificates on page 95 for steps to validate that the certificates are working after installation, and remove the certificate from the registry.

Configure TLS using signed certificates from a Certificate Authority

This procedure obtains a signed certificate from a CA and configures SDP to use the certificate.

About this task

The SDP installer creates certificate signing requests (CSRs). Dell Technologies provides a tool to extract the CSRs from the cluster and then later, to import the signed certificates into the cluster.

This task describes the following process:

1. Starts the installer so it can create the CSRs.
2. Stops the installer so you can extract the CSRs.
3. Instructs you to submit the CSRs to the CA and wait for the signed certificates.
4. Imports the signed certificates into the SDP cluster.
5. Restarts the installer.

Steps

1. Prepare the configuration values file as follows:

global:
  external:
    certOrganization: dellemc.com
    certOrganizationalUnit: uds-sdp
    tls: true
    certManagerIssuerName: cli
    wildcardSecretName: cluster-wildcard-tls-secret
    certManagerIssuerGroup: nautilus.dellemc.com


cert-manager-resources:
  clusterIssuer:
    name: cli
cert-manager:
  webhook:
    enabled: false
  ingressShim:
    defaultIssuerName: cli
    defaultIssuerKind: ClusterIssuer
  extraArgs: ['--dns01-recursive-nameservers-only=true']

2. Download the cli-issuer-<version>.zip from the Dell Technologies Support Site.

Extract the cli-issuer-<version>.zip archive and navigate into the expanded directory. There are three binary executables for different platforms, named cli-issuer-<platform>. For convenience, create a symlink or rename the appropriate executable to cli-issuer.

3. Start the installation using the decks-install apply command as described in the Install SDP on page 94.

4. In another window, monitor for CSR generation.

Enter the following command. The watch command before the cli-issuer command monitors every two seconds.

In the output, you are looking for messages that state Certificate signing request (CSR) is available.

$ ./cli-issuer list -A --cluster --insecure-skip-tls-verify
NAMESPACE          NAME                              SUBJECT                                  STATUS    MESSAGE
nautilus-pravega   pravega-native-tls-certificate-   *.pravega.stane.fr.sdp.hop.lab.emc.com   Pending   Certificate signing request (CSR) is available
nautilus-pravega   wildcard-tls-certificate-         *.stane.fr.sdp.hop.lab.emc.com           Pending   Certificate fetched from issuer successfully
nautilus-system    wildcard-tls-certificate-         *.stane.fr.sdp.hop.lab.emc.com           Pending   Certificate signing request (CSR) is available

You need a CSR for each of the SDP namespaces: nautilus-pravega and nautilus-system.

5. When the two CSRs are available, return to the install window and stop the installation by using CTRL-C.

6. Use the cli-issuer export command to export the three CSRs from the cluster.

./cli-issuer export <name> -n <namespace> -f <filename>

Where <name> and <namespace> are from the output of the cli-issuer list command. For example:

$ ./cli-issuer export pravega-native-tls-certificate-763176170 -n nautilus-pravega -f pravega-native.csr
$ ./cli-issuer export wildcard-tls-certificate-80022116 -n nautilus-system -f wildcard.csr

7. Submit or upload the two CSR files to your selected well-known CA or follow internal procedures for an enterprise CA.

8. When you receive the signed certificates from the CA, save them locally. The files include:

Signed certificate (.pem) files: There is a file for each CSR that you submitted. You should have a pravega file and a wildcard file.

The root certificate: The root is the end of the chain of certificates on the customer side.

9. Validate the certificates. The filename should match the certificate CN.

$ openssl x509 -in pravega-native.pem -noout -text | grep CN # the CN in the output should match the pem filename

$ openssl x509 -in wildcard-sabretooth.pem -noout -text | grep CN # the CN in the output should match the pem filename

10. Use the cli-issuer tool to import the signed certificates into the cluster.

$ cli-issuer import -A -f <path/to/cert> --ca <path/to/ca-cert> -n <namespace>


Where:

<path/to/cert> is the location where the certificate issued by the well-known CA or internal CA is saved on your desktop.

<path/to/ca-cert> is the location where the ca-cert is saved on your desktop. The <path/to/cert> and <path/to/ca-cert> are typically the same value because the ca-cert is typically bundled with the issued certificate.

<namespace> is the namespace that is listed in the output from the cli-issuer list command. (See step 4.)

For example:

$ ./cli-issuer import -A -f pravega-native.pem --ca ../certs/lab.cacert.pem -n nautilus-pravega
Imported a certificate for resource "nautilus-pravega/pravega-native-tls-certificate-763176170"
$ ./cli-issuer import -A -f wildcard.pem --ca ../certs/lab.cacert.pem -n nautilus-system
Imported a certificate for resource "nautilus-system/wildcard-tls-certificate-80022116"

11. Validate that certificates are successfully imported.

$ ./cli-issuer list -A
NAMESPACE          NAME                                        SUBJECT                             STATUS   MESSAGE
nautilus-pravega   pravega-native-tls-certificate-763176170    *.pravega.cluster1.desdp.dell.com   Issued   Certificate fetched from issuer successfully
nautilus-system    wildcard-tls-certificate-80022116           *.cluster1.desdp.dell.com           Issued   Certificate fetched from issuer successfully

12. Resume the install using the same command that you used to start the install.

a. Update the SDP values to include all three PEM files with their chains under external.tlsCABundle.sdp.

b. Run the decks-install apply command again.
c. When reapplying with the enterprise CA certificates, run the following commands after the SDP-operator deployment pods are ready.

kubectl get cm -A --field-selector metadata.name=linux-trust-store --no-headers | awk '{ print $1, $2 }' | xargs -n 2 sh -c 'echo "Deleting config map $1 in namespace $0"; kubectl delete cm "$1" -n "$0"'
kubectl get cm -A --field-selector metadata.name=java-trust-store --no-headers | awk '{ print $1, $2 }' | xargs -n 2 sh -c 'echo "Deleting config map $1 in namespace $0"; kubectl delete cm "$1" -n "$0"'

Next steps

The signed certificates are imported into the corresponding TLS secrets in SDP. Dell certificates can be imported into the monitoring namespace using cli-issuer.

Configure connections to a local DNS

Configure connections to the local DNS server.

This configuration is required.

You should have a local DNS server that is already set up. For information about the local DNS server and the various options for setup, see Set up local DNS server on page 69 in the Site Prerequisites chapter.

The following examples show configuration settings for three types of local DNS server. Copy one of the following external-dns: section examples as appropriate for your setup and supply the required values.

AWS Route53 option

external-dns:
  aws:
    credentials:
      secretKey: " "
      accessKey: " "

CoreDNS option

external-dns:
  provider: coredns
  coredns:
    etcdEndpoints: "http://192.142.NN.NNN:2379"
  extraArgs: ['--source=ingress','--source=service','--provider=coredns','--log-level=debug']
  rbac:
    # create & use RBAC resources
    create: true
    apiVersion: v1
  # Registry to use for ownership (txt or noop)
  registry: "txt"
  # When using the TXT registry, a name that identifies this instance of ExternalDNS
  txtOwnerId: " . "
  ## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only)
  policy: sync
  domainFilters: [ . ]
  logLevel: debug

Bind option

external-dns:
  provider: rfc2136
  rfc2136:
    host: "192.142.NN.NNN"
    port: 53
    zone: "mytest-ns.lss.dell.com"
    tsigSecret: "ooDG+GsRmsrryL5g9eyl4g=="
    tsigSecretAlg: hmac-md5
    tsigKeyname: externaldns-key
    tsigAxfr: true
  rbac:
    # create & use RBAC resources
    create: true
    apiVersion: v1
  # Registry to use for ownership (txt or noop)
  registry: "txt"
  # When using the TXT registry, a name that identifies this instance of ExternalDNS
  txtOwnerId: " . "
  ## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only)
  policy: sync
  domainFilters: [ . ]
  logLevel: debug

Configure long-term storage on PowerScale

Configure the connection to a PowerScale cluster for long-term storage for SDP.

This configuration is required if you are using a PowerScale cluster for long-term storage. You can configure only one source for long-term storage, either PowerScale or ECS.

You should already have the file system configured on the PowerScale cluster. For reference, see Provision long-term storage on PowerScale on page 70 in the Site Prerequisites chapter.

Make sure that the global: storageType: value is set to nfs.

global:
  storageType: nfs


Then copy the nfs-client-provisioner: section from the template, or start with the following example:

nfs-client-provisioner:
  nfs:
    server: 1.2.3.4
    path: /data/path
    mountOptions:
      - nfsvers=4.0
      - sec=sys
      - nolock
  storageClass:
    archiveOnDelete: "false"

Table 10. Configure NFS storage

nfs.server
  The NFS server hostname or address. This is the PowerScale cluster IP address.

nfs.path
  The NFS export path.

nfs.mountOptions
  The NFS mount options (in fstab format).

storageClass.archiveOnDelete
  Indicates how to handle existing data in the following circumstances:
  If SDP is uninstalled, whether to delete all SDP data, including stream data.
  If a project is deleted, whether to delete all the project data, including stream data.
  Values are:
  false does not save any data.
  true archives the data. However, this archive is not readable in a new installation of SDP or in a new project.
  The default is true.

Configure long-term storage on ECS

This configuration is required if you are using an ECS appliance for long-term storage. It configures the connection to an ECS S3 appliance and the bucket plans for project-specific buckets.

Prerequisites

You can configure only one source for long-term storage, either PowerScale or ECS.
You should already have the namespace and one S3 bucket (the Pravega bucket) configured on the ECS appliance. For reference, see Provision long-term storage on ECS on page 70 in the Site Prerequisites chapter.

About this task

The ECS namespace contains the following S3 buckets:

Pravega bucket: As mentioned above, this bucket is preprovisioned before SDP installation. Pravega connects to this bucket on startup using credentials that you configure in this task. The Pravega segmentstore component uses this bucket.

Project-specific buckets: When a user creates a project, SDP provisions a project-specific bucket. The project streams are stored in its project bucket. Each project bucket has unique credentials that are autogenerated for it. The ECS Broker performs the provisioning. The ECS Broker gains access to ECS based on connection information that you configure in this task.

Other supporting buckets: SDP provisions additional supporting buckets as needed. For example, it provisions a registry bucket to help manage the project buckets.

SSL certificates might be required:

If ECS uses standard CAs for connection to both its management port and its object API port, certificates are not required.
If either the management or the object API endpoints require custom trusts (self-signed certificates), you must provide the certificates in the configuration values file. The steps to do so are in this task.


In the ecs-service-broker: section of the configuration values file, you configure attributes of the project-specific buckets.

bucket plans
  When users create a project, they select a bucket plan for the project-specific bucket that the broker provisions. A bucket plan defines policies for managing the bucket, such as size, quotas, warning settings, and access type.
  Bucket plans are optional because there is a default bucket plan that is defined internally in the product. You can redefine the default plan and define additional bucket plans in the ecs-service-broker: section of the configuration values file.

bucket reclaim policy
  A reclaim policy is the action that the ECS Broker takes on a project bucket when the project is deleted. The reclaim policy is set per bucket plan. The available reclaim policies are:
  Detach: (The default if you do not override it with another value.) The broker leaves the project bucket and all data intact in ECS but removes the bucket from SDP.
  Delete: Wipes data from the bucket and deletes the bucket from ECS and SDP.
  CAUTION: The Delete policy is dangerous for data safety. Consider using Fail, which only deletes empty buckets.
  Fail: The broker attempts to delete the ECS bucket, but the operation fails if the bucket contains data.

default reclaim policy
  The default reclaim policy for all buckets is Detach. You may override that default in the configuration values file by adding the following setting: ecs-service-broker.DefaultReclaimPolicy: <policy>.
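For example, a values-file snippet along the following lines overrides the platform-wide default. The Fail value is only an illustration; substitute the policy that you want:

ecs-service-broker:
  DefaultReclaimPolicy: Fail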

allowed reclaim policy
  When users create projects using the command line, they can specify extra parameters, one of which is reclaim-policy. This reclaim-policy overrides the reclaim policy as defined (or defaulted) in the bucket plan configuration. The AllowedReclaimPolicies setting in the ecs-service-broker configuration limits the reclaim policies that users are permitted to specify on the command line. For example, you can ensure that the Delete reclaim policy is not allowed for any project.

Some important points about configuring plans:

SDP comes preconfigured with a default plan. You may change the definition of the default plan by configuring a plan using the name default.

You may configure additional plans. The plan names that you configure appear in a drop-down menu on the UI when users create a project.

You cannot change, add, or delete plan definitions after installation.

Use the following steps to configure ECS connections, the ECS broker, S3 bucket plans, and bucket reclaim policies.

Steps

1. Set the global.StorageType: value to ecs_s3.

global:
  StorageType: ecs_s3

2. Configure the pravega.ecs_s3: section.

Pravega connects to the preconfigured ECS namespace and bucket that you describe in this section. Copy the section from the template, or start with the following example:

pravega:
  ecs_s3:
    uri: https://192.0.5.1:9021/
    bucket: pravega-tier2
    namespace: "sdp-pravega"
    prefix: "/"
    accessKey: green
    secretKey: XXXX

Table 11. Configure pravega.ecs_s3

uri
  The ECS Object API endpoint, in the format <protocol>://<IP or FQDN>:<port>. Typical port values are 9020 for HTTP endpoints and 9021 for HTTPS endpoints.

bucket
  The bucket name that was previously provisioned on ECS for this SDP installation instance.

namespace
  The ECS namespace that was previously provisioned on ECS for this SDP installation instance.

prefix
  A prefix to use on the Pravega bucket name.

accessKey, secretKey
  The access key and secret that were previously provisioned on ECS for the namespace.
  NOTE: Pravega uses these credentials to gain access. However, each project has its own unique bucket. Unique system-generated credentials protect those buckets.

3. Configure the ecs-service-broker: section.

The ECS Service Broker connects to ECS using the information configured in this section. This section also configures S3 bucket plans. Copy the ecs-service-broker: section from the template, or start with the following example.

NOTE: This example overrides the default bucket plan and defines two additional plans. Plan definitions are optional.

See the table for more information.

ecs-service-broker:
  namespace: mysdp
  prefix: green-
  replication-group: RG
  api:
    endpoint: "http://192.0.5.1:9020"
  ecsConnection:
    endpoint: "https://192.0.5.1:4443"
    username: mysdp-green
    password: ChangeMe
    # certificate required only for self-signed certs
    certificate: |-
      -----BEGIN CERTIFICATE-----
      MIIDCTCCAfGgAwIBAgIJAJ1g36y+tM0RMA0GCSqGSIb3DQEBCwUAMBQxEjAQBgNV
      BAMTCWxvY2FsaG9zdDAeFw0yMDAyMTkxOTMzMjVaFw0zMDAyMTYxOTMzMjVaMBQx
      EjAQBgNVBAMTCWxvY2FsaG9zdDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC
      ...
      -----END CERTIFICATE-----
  s3Plans:
    # Unique UUID is required
    - id: 7e777d49-0a78-4cf4-810a-b5f5173b019d
      name: default
      settings:
        quota:
          limit: 10
          warn: 7
    - id: 9e777d49-0a78-4cf4-810a-b5f5173b019d
      name: small
      settings:
        quota:
          limit: 5
          warn: 4
        allowed-reclaim-policies: Fail
        reclaim-policy: Fail
    - id: 10777d49-0a78-4cf4-810a-b5f5173b019d
      name: noQuota
      settings:
        allowed-reclaim-policies: Delete, Fail, Detach


Table 12. Configure the ECS Service Broker

namespace
  The namespace on ECS that was provisioned for SDP.

prefix
  Optional prefix to use when naming project buckets that the ECS Broker provisions. The broker names project buckets using the format <prefix><project name>-<uid>. The <uid> is system-generated.
  NOTE: The broker inserts a dash between the <project name> and the <uid>.
  For example, if prefix is sdp/, the bucket name for project videoproject2 would look similar to: sdp/videoproject2-4a7f2baf-4226-4bfc-8b7a-57609ce450b6

replication-group
  The ECS Replication Group name that the namespace was assigned to during provisioning.

api.endpoint
  The ECS API endpoint.

ecsConnection.endpoint
  The ECS Management endpoint. The port is typically 4443 for HTTPS.

ecsConnection.username, ecsConnection.password
  Admin credentials into ECS. This user account must have permission to create buckets in the namespace. These credentials are converted into Kubernetes secrets before they are passed to the ECS Broker.

ecsConnection.certificate
  Required if the ECS management endpoint requires a custom self-signed certificate. Obtain the certificate and add it here.
  NOTE: For trusted CAs, no configuration is required.
  One way to obtain a certificate is in a browser:
  a. Enter the ECS management endpoint in the browser URL field, click the lock icon, click View Certificate, and then the Details tab.
  b. Click Copy to File, Next, and then choose to export in Base-64 format.
  c. Copy the contents of the downloaded certificate into this field. Ensure that you preserve all indents as they exist in the exported certificate.

s3Plans
  Configures the S3 bucket plans that are available to users when they create a project.
  This section is optional. Comment out the s3Plans: section if you want to use the default plan that comes with the product for all buckets. The default plan has no quota and only allows the DefaultReclaimPolicy.
  To redefine the default plan, include a plan with the name default in this section.

s3Plans: - id:
  Unique ID for the plan. The unique ID must stay constant. UUIDs work well for this purpose. For an easy way to generate a unique ID, see https://www.uuidgenerator.net/. V1 or V4 is acceptable.

s3Plans: - id: name:
  Name for the plan. This name appears in the Plan drop-down list on the UI when a user creates a project.

s3Plans: - id: settings.quota.limit:
  Quotas are optional. Without a quota, there is no limit on the bucket size. The quota information is sent to ECS and used by ECS to configure the bucket. ECS enforces the bucket quota. This value sets the hard limit on the number of GB in the bucket. Specify the number of GB. For example, the value 5 sets a hard limit of 5 GB on each bucket that uses this plan.

s3Plans: - id: settings.quota.warn:
  Optional. This value sets a soft limit on the number of GB in the bucket. When the bucket reaches this size, ECS starts generating warning messages in the logs. For example, if warn is set to 4 and limit is set to 5, ECS generates warning messages when a bucket reaches 4 GB in size and enforces the hard limit at 5 GB.

s3Plans: - id: settings: allowed-reclaim-policies:
  Optional. Defines the reclaim policies that users are allowed to specify when they create a project on the command line. The default is to allow users to specify any of the reclaim policies. A typical setting is to allow only Detach and Fail, disallowing the use of Delete. See the introduction to this task for more about reclaim policies.

s3Plans: - id: settings: reclaim-policy:
  Optional. Sets the default reclaim policy for the plan. If not provided, the platform-wide default reclaim-policy applies to the plan. See the introduction to this task for more about the default reclaim-policy.

4. Configure the pravega-cluster section.

The installer passes these settings to the Pravega cluster. The settings tune the cluster appropriately for interaction with ECS long-term storage. Most pravega-cluster settings default to pretested values. Depending on your use case, you might want to add the pravega_options shown here.

pravega-cluster:
  pravega_options:
    writer.flushThresholdMillis: "60000"
    extendeds3.smallObjectSizeLimitForConcat: "104857600"
    writer.maxRolloverSizeBytes: "6442450944"

5. If the ECS object API endpoint requires a self-signed certificate, obtain the certificate and add it into the values file.

NOTE: This certificate goes into the global section of the configuration values file because several platform components require it.

a. To export the certificate from a browser, enter the ECS object API endpoint in the browser, click the lock icon, click View Certificate, and then the Details tab.

b. Click Copy to File, Next, and then choose to export in Base-64 format.
c. Copy the contents of the downloaded certificate into the global.tlsCABundle.ecsObject: field. Ensure that you preserve all indents as they exist in the exported certificate. See Configure global platform settings on page 74 for an example.

Configure or remove connection to SupportAssist

Most production deployments depend on SupportAssist for support from Dell Technologies and for telemetry collections. A dark site in production or deployments for testing purposes may not have SupportAssist.

This configuration is required. Do one of the following:


Configure SupportAssist connection information, or
Remove SupportAssist from the configuration.

Both tasks are described below.

Configure SupportAssist

NOTE: There can be multiple (up to eight) gateways with varying priority.

Copy the following example:

sdp-serviceability:
  supportassist:
    gateways:
      - hostname: <FQDN or IP address>
        port: <9443>
        priority: <1>

Table 13. Configure SupportAssist Gateway

Name Description

hostname: The fully qualified domain name or IP address

port: The value must be 9443.

priority: Priority of the gateway (the lower the value, the higher the priority).
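For example, a configuration with two gateways might look like the following. The hostnames are placeholders; only the port value 9443 is fixed:

sdp-serviceability:
  supportassist:
    gateways:
      - hostname: gateway1.example.com
        port: 9443
        priority: 1
      - hostname: gateway2.example.com
        port: 9443
        priority: 2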

Disable SupportAssist remote access

Insert the global.external.darksite: true value into the configuration values file.

For example:

global:
  external:
    darksite: true

Configure remote support information

Configure the login credentials for a Dell Technologies customer support representative to perform a remote login to your SDP cluster.

Copy the following example and add it to the configuration values file:

sdp-serviceability:
  service-pod:
    sshCred:
      user: "svcuser"
      group: "users"
      password: "ChangeMe"

Table 14. Configure SupportAssist remote login password

Name Description

password: Password that a Dell Technologies support representative can use to gain access to the cluster for troubleshooting purposes. For password value requirements, see Password policy for SDP user accounts on page 117.

Configure passwords for the default administrative accounts

Decide how to assign passwords for the default administrative accounts. You can explicitly configure the passwords, or you can allow the system to generate passwords.

The two default administrative accounts are:


Keycloak nautilus realm administrator (user name: desdp)
  This user is authorized to create analytic projects. This user has wildcard access to all projects and all their associated resources in SDP. Access to those resources is granted for both the Kubernetes cluster and the SDP UI for this user.
  This user is Admin in the nautilus realm. The nautilus realm is where SDP users and service accounts exist.
  This user is not permitted to log in to the Keycloak Administrator Console for the master realm.

Keycloak master realm administrator (user name: admin)
  This user is authorized to log in to the Keycloak Administration Console in either the master or nautilus realm, and create users in Keycloak.

Your options are:

You can allow the installer to autogenerate passwords. After installation, see Obtain default admin credentials on page 104 to retrieve the generated values.

You can provide the initial password values by adding the keycloak.secrets section into the configuration values file, as described here.

To provide specific password values, copy the keycloak: section from the template, or copy the following example:

keycloak:
  secrets:
    admin-creds:
      stringData:
        user: admin
        password:
    desdp-creds:
      stringData:
        user: desdp
        password:

Table 15. Provide initial password values

admin-creds.stringData.password: ""
  Add the password value for the admin user account. Enclose the value in double quotes.

desdp-creds.stringData.password: ""
  Add the password value for the desdp user account. Enclose the value in double quotes.

For password policy rules, see Password policy for SDP user accounts on page 117.


Install SDP Core

Topics:

Download installation files
Install required infrastructure
Unzip installation files
Prepare the working environment
Push images into the registry
Run the prereqs.sh script
Prepare self-signed SSL certificate
Run pre-install script
Run the validate-values script
Install SDP
Run the post-install script
(Optional) Validate self-signed certificates
Obtain connection URLs
Install the GPU operator in OpenShift environments

Download installation files

This procedure includes links for downloading the Red Hat CoreOS and the Dell Technologies SDP installation files.

Prerequisites

You need 16 GB free disk space to download these files.
You need a valid Dell Technologies support account linked to your customer site.
You need a Red Hat OpenShift account with valid credentials.

Steps

1. Go to https://www.dell.com/support/home/en-us/product-support/product/streaming-data-platform/drivers.

2. Log in with your Dell Technologies support account.

3. Navigate to 1.4 > 1.4 Core.

4. Download all files in the list.

5. Use the following links to download the required Red Hat Core OS files.

OpenShift installer and client download link:

https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.10.9/

OpenShift ISO download link:

https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.10/4.10.3/rhcos-4.10.3-x86_64-live.x86_64.iso

Install required infrastructure

SDP Core requires Red Hat Enterprise Linux and Red Hat OpenShift (Core OS) installed on bare metal hardware.

Steps

1. Read an installation overview at https://docs.openshift.com/container-platform/4.10/architecture/architecture-installation.html.



2. Perform the installation using instructions at https://docs.openshift.com/container-platform/4.10/installing/installing_bare_metal/installing-bare-metal.html.

3. Perform the following configuration on the OpenShift cluster.

a. Perform ingress patch for nautilus:

$ oc patch -n openshift-ingress-operator ingresscontrollers.operator.openshift.io default --type=merge -p '{"spec":{"namespaceSelector":{"matchExpressions":[{"key":"app.kubernetes.io/managed-by","operator":"NotIn","values":["nautilus"]}]}}}'

b. Make sure that time service is installed on all nodes of OpenShift.

For details, see https://docs.openshift.com/container-platform/4.10/post_installation_configuration/machine-configuration-tasks.html#installation-special-config-chrony_post-install-machine-configuration-tasks.

c. Configure networking on OpenShift for external access.

For MetalLB configuration, see https://docs.openshift.com/container-platform/4.10/networking/metallb/metallb-operator-install.html.

d. Deploy OpenShift data foundation.

See https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10/html/installing/index.

e. Make the ocs-storagecluster-cephfs storage class the default.

$ oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class":"true"}}}'

Unzip installation files

Steps

1. Move all SDP files that were downloaded from the Dell Technologies Support site to one of the nodes.

This node will become the management node.

2. Unzip the decks-installer-<version>.zip file into a separate folder.

3. Create a link to the executable and set permissions.

For example, on Linux:

$ link decks-install-linux-amd64 decks-install $ chmod +x decks-install

On Windows, create an alias.

4. Unzip the nautilus-baremetal-master file into a separate folder.

Prepare the working environment

Your working environment requires some software tools and interfaces before you can begin the installation steps. The working environment is the command-line environment that you intend to use for the installation. It could be your laptop or workstation, one of the SDP nodes, or a management node.

Steps

1. Install a local Docker daemon and CLI.

See Docker documentation at https://docs.docker.com/install/ to get the correct daemon for your local operating system.

2. Install a modern desktop browser such as Chrome, Firefox, or Edge for accessing the SDP UI.

3. Install PuTTY or other software that provides a way to connect to the intended SDP host machine and establish a Linux shell.


Push images into the registry

This procedure uploads the installation images into the Docker registry.

About this task

The last step in this procedure (uploading images to the registry) can take some time (up to an hour) to complete.

Steps

1. Create a repository in Docker to hold SDP images.

2. Navigate to the location where you extracted and created a link to the SDP installer tool.

3. Configure the installer to use the Docker registry and the repository you just created:

$ ./decks-install config set registry <registry-path>

where <registry-path> identifies the registry server and the repository that you created for SDP images.
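For example, if your registry is reachable at registry.example.com and you created a repository named sdp (both are placeholder names), the command would look similar to:

$ ./decks-install config set registry registry.example.com/sdp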

4. Verify the configured registry path:

$ ./decks-install config list

5. Push the required images to the registry.

NOTE: Multiple tar files are provided for SDP 1.4. The tar files follow the naming schema sdp-...-1.4.0.0-...-images.tar.gz.

$ ./decks-install push --input <path-to-tar-file>

where:

<path-to-tar-file> is the path to the decks-images-<version>.tar file included in the original set of installation files. This is a separate file, not part of the zip file that was extracted previously.

NOTE: Do not extract the contents from decks-images-<version>.tar. The installer tool works directly with the .tar file.

This push operation may take up to an hour to complete.

Run the prereqs.sh script

The prereqs.sh script ensures that your local environment and the Kubernetes cluster have all the tools that are required for a successful installation.

About this task

The decks-install apply command runs this script automatically. Regardless, Dell Technologies recommends that you run this script before running the decks-install apply command the first time or the first time on a new local machine. You may run the script at any time.

The script does the following types of checks:

It checks your local environment for the required tools and the required minimum versions of those tools. For some tools, the script attempts to install the missing software.
It checks the SDP cluster for a default storage class definition.
It generates messages describing what is missing.

Steps

1. Go to the folder where you extracted the decks-installer-<version>.zip file.


2. Run the script using either of the following commands.

This command saves the output to a log for later use:

$ ./scripts/prereqs.sh > prereqs.log 2>&1

This command prints the output on the command line:

$ ./scripts/prereqs.sh

3. Check the script output.

If the output contains errors about incorrect minimum versions of components or missing software, you must correct the condition before proceeding with SDP installation.

Prepare self-signed SSL certificate

This procedure describes how to create self-signed SSL certificates for TLS connections and add them to SDP.

About this task

This procedure generates the key and self-signed certificate, adds them into the configuration values file, and loads them into your SDP registry.

Steps

1. Generate a private key.

Here is an example using openssl.

openssl genrsa -out tls.key 2048

2. Create a certificate.

The following example uses openssl to create a certificate:

openssl req -nodes -new -x509 -keyout tls.key -subj "/CN=<common name>" -out tls.crt

3. In the configuration values file, configure the relevant fields as shown in the example below.

Add the key information from above into the tls.key entry.

Add the certificate information from above into the tls.crt entry.

global:
  external:
    tls: true
    certManagerIssuerName: selfsigned-ca-issuer
cert-manager-resources:
  clusterIssuer:
    name: selfsigned-ca-issuer
  certManagerSecrets:
    - name: tls.crt
      value: |
        -----BEGIN CERTIFICATE-----
        -----END CERTIFICATE-----
    - name: tls.key
      value: |
        -----BEGIN PRIVATE KEY-----
        -----END PRIVATE KEY-----
cert-manager:
  webhook:
    enabled: false
  ingressShim:
    defaultIssuerName: selfsigned-ca-issuer
    defaultIssuerKind: ClusterIssuer
  extraArgs: ['--dns01-recursive-nameservers-only=true','--feature-gates=CertificateRequestControllers=true']

4. Copy the crt file to a certificates directory and rename the file to ca.crt.

NOTE: The filename must be ca.crt. Otherwise, the image push command does not work.

$ cp tls.crt mycerts/ca.crt

5. Push the ca.crt to the truststore in the SDP images in the Docker registry.

./decks-install push --input ../decks-images-xxxx.tar --ca-certs-dir ../mycerts/

Run pre-install script

The pre-install.sh script must be run one time before installing SDP.

About this task

This script creates credentials required for the internal communication of SDP components. It creates a values.yaml file containing these credentials. This yaml file is a required input to every execution of the decks-install apply command. The generated yaml file must be listed as one of the values files in the --values parameter of decks-install apply.

Steps

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the pre-install.sh script.

$ ./scripts/pre-install.sh

3. Verify that the script ran successfully and that the values.yaml file exists.

The output shows the pathname of the generated values.yaml file. It exists in a directory called pre-install. For example, scripts/pre-install/values.yaml.

The output also shows a passwd file which you may safely delete.

The output initially shows results from Pravega indicating user names and passwords that will look unfamiliar. You can ignore this output. This script is, in essence, replacing hardcoded Pravega credentials with secure credentials.

4. Consider renaming the generated values.yaml file and moving it to the same location where all the other configuration values files are stored.

For example, rename values.yaml to preinstall.yaml.
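For example, a rename and move along these lines works; the destination directory is only an illustration, so use the location where you keep your other values files:

$ mv scripts/pre-install/values.yaml ~/desdp/preinstall.yaml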

Results

The generated yaml file must be listed as one of the values files in the --values parameter of decks-install apply, along with all of your other configuration values files.

Run the validate-values script

The validate-values.py script reads the configuration values files provided to it and validates the values over certain criteria. The script validates the values used for external connectivity and serviceability, in addition to many other validations.

About this task

The decks-install apply command runs this script automatically. We recommend that you run it independently before running the decks-install apply command, so you have an opportunity to resolve any errors and correct any warnings prior to installation. Warnings found by the installer will not stop the installer from continuing. You should review the output of the validate-values script at least once prior to running the installation.


You may run this script at any time.

Steps

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the script, providing file path names for all of the configuration values files that you plan to use in the actual installation command.

For example:

$ ./scripts/validate-values.py preinstall.yaml,values.yaml

Note the following:
Separate the file path names with commas, no spaces.
The yaml file generated by the pre-install script is required.
The files are processed in the order listed. When the same field is included in more than one of the values files, the value in the later file overrides the value in any earlier files in the list.

3. If the script indicates errors or warnings in your configuration values files, edit the files to correct the problems and rerun the script.

Install SDP

Install SDP into the OpenShift environment.

Prerequisites

This procedure assumes that you prepared configuration values as described in Configuration Values on page 72. One or more values.yaml files are required by the SDP installer.

Steps

1. Save your customer-specific permanent SDP license file in the ~/desdp/ directory. The file name must be license.xml.

cp <path-to-downloaded-license-file> ~/desdp/license.xml

See Obtain and save the license file on page 69 for information about obtaining your SDP license file.

2. Run the SDP installation command.

cd ~/desdp
./decks-install-linux-amd64 apply --kustomize ./manifests/ --repo ./charts/ \
--values <list of values files>

Where the <list of values files> includes:

The pre-install.yaml file that was generated by the pre-install script during initial installation.

Other values files that you have prepared according to instructions in Configuration Values on page 72. Separate the values file path names with commas and no spaces.

The files are processed in the order listed. When the same field is included in more than one of the values files, the value in the later file overrides the value in any earlier files in the list.

If the license content is not provided in the values.yaml file, you can use the following option to provide license.xml on the command line:

./decks-install-linux-amd64 apply --kustomize ./manifests/ --repo ./charts/ --values pre-install.yaml,values.yaml --set-file sdp-serviceability.dellemc-streamingdata-license.licensefile=license.xml

The Apply Update screen appears, and continuously redisplays to show progress. The installation takes about 10 to 30 minutes to complete.

When the command is finished, the Apply Update screen stops refreshing and shows the final state for all components.


Run the post-install script

Run this script after you run the decks-install apply command.

About this task

This script confirms that your latest run of the decks-install apply command left the cluster in a healthy state. This script invokes the health check script.

Steps

1. Wait for all pods to report a status of Completed or Succeeded.

It may take some time (up to 10 minutes) for the installation and synchronization of components to complete. If you proceed before the system settles into a stable state, the post-install script is likely to generate false errors. False errors disappear if you wait for the system to synchronize.
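One way to spot pods that have not yet settled is to filter out the stable states. This filter is only a convenience and the state names may vary slightly in your environment:

$ kubectl get pods -A --no-headers | grep -vE 'Running|Completed|Succeeded'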

2. Go to the folder where you extracted the decks-installer-<version>.zip file.

3. Run the script.

$ ./scripts/post-install.sh

4. If the script indicates errors in your cluster, fix the issues, rerun decks-install apply, and then rerun this script.

(Optional) Validate self-signed certificates

Use this procedure to validate that self-signed certificates are correctly installed and ready to handle connection requests.

Steps

1. Validate that the certificates are ready.

$ kubectl get certificate -A
NAMESPACE          NAME                             READY   SECRET                         AGE
nautilus-pravega   nautilus-pravega-grafana-tls     True    nautilus-pravega-grafana-tls   46h
nautilus-pravega   pravega-controller-api-tls       True    pravega-controller-api-tls     46h
nautilus-pravega   pravega-controller-tls           True    pravega-controller-tls         46h
nautilus-pravega   pravega-native-tls-certificate   True    pravega-tls                    46h
nautilus-pravega   selfsigned-cert                  True    selfsigned-cert-tls            46h
nautilus-system    keycloak-tls                     True    keycloak-tls                   46h
nautilus-system    nautilus-ui-tls                  True    nautilus-ui-tls                46h

2. Get the certificate from the configuration values file.

$ kubectl get secret -n nautilus-system cert-manager-secrets -o jsonpath="{.data.tls\.crt}" | base64 -d
-----BEGIN CERTIFICATE-----
MIIDKTCCAhGgAwIBAgIUTBlCLINSVvL0zFUzngveXeKL2scwDQYJKoZIhvcNAQEL
BQAwJDEiMCAGA1UEAwwZc2FicmV0b290aC5zYW1hZGFtcy5sb2NhbDAeFw0yMDA1
...
-----END CERTIFICATE-----

3. Check that you can connect to Keycloak.


a. Get the keycloak endpoint for use in other commands.

$ kubectl get ingress -n nautilus-system keycloak
NAME       HOSTS                    ADDRESS         PORTS     AGE
keycloak   keycloak.mycluster.com   10.243.42.132   80, 443   45h

b. Connect to Keycloak.

$ openssl s_client -showcerts -servername keycloak.mycluster.com -connect 192.2.0.7:443

c. Get the certificate from Keycloak.

$ kubectl get secret -n nautilus-system keycloak-tls -o jsonpath="{.data.tls\.crt}" | base64 -d
-----BEGIN CERTIFICATE-----
MIIDfzCCAmegAwIBAgIRAJxV4jFmB9HXULTTwrbwQZgwDQYJKoZIhvcNAQELBQAw
JDEiMCAGA1UEAwwZc2FicmV0b290aC5zYW1hZGFtcy5sb2NhbDAeFw0yMDA1MTkx
OTU3MDlaFw0yMDA4MTcxOTU3MDlaMFIxFTATBgNVBAoTDGNlcnQtbWFuYWdlcjE5
MDcGA1UEAxMwa2V5Y2xvYWsuc2FicmV0b290aC5zYW1hZGFtcy5zZHAuaG9wLmxh
Yi5lbWMuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqthOws62
6rkRO6P1MMB9SCZgJGU3o0wtW/I/HpPlIbqrRkJsbTBG4a0MjpiJmFlUOM5neDyN
qzcWPy2r9alYX7SS1cv3oBufHTRTpTtZVJ4RXQvBPtfo9+x0VgxrFkwhhhia0hgw
ZLHSQXhrxBh2fD5vTmYL9y0E28mm9Rt1dnhawa07Vr0ajdQLJ0stFi8Q0C4I3x7B
GlYOzBL8u4XzvHzGERXLdbO/RLrRPQ24WpYNtrfsrKtC4Zz3nhSMVPdq7rWwJ7OL
mXGF0bufsSrXdg0jhM+ns0MvUPf25irG/imgqbWa5uswW+6/3nTUejngZq9UbIwq
Dz5riHdU9oIxRwIDAQABo34wfDAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIw
ADAfBgNVHSMEGDAWgBSkuPFCdT341Xhl6GU+WaGQAH4ZhzA7BgNVHREENDAygjBr
ZXljbG9hay5zYWJyZXRvb3RoLnNhbWFkYW1zLnNkcC5ob3AubGFiLmVtYy5jb20w
DQYJKoZIhvcNAQELBQADggEBADhdDefyjQJgqhXRAG3ITlWgHv0JCIWUdNkatKun
unrSoJPwqRddCZeTr2o8BoMnvMZwNEoYqKVV/lYIhKnZKjqRwbOqcupTCP27Ejby
U3DaiRa0aGWHp6tm9XWdDeZ0lzzbIURE27+GFFkd0m+0+iq1NLFUQsziZN72prIr
zF1ygzb4cGVOglTh0Ma8nWO0VW4/opCks1fLLELFpoLPPPeHv8NpxGqGY2uj07KY
ptV8OuaI3PIp7ELjWHZ7OZm/WuhkPK0YGvIWERtgHZLk7kkafXZH7ZtOabmtKroK
OfYGOidSIzcFlKfgkySsa1f2PJjFFw5I7J7O/Iu9zhFcjao=
-----END CERTIFICATE-----

d. Check if the certificate that is returned in the secret is the same as the certificate that Keycloak returns.
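A quick way to compare them is to compare SHA-256 fingerprints. The hostname and IP address below are the example values from the previous substeps:

$ kubectl get secret -n nautilus-system keycloak-tls -o jsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -fingerprint -sha256
$ openssl s_client -showcerts -servername keycloak.mycluster.com -connect 192.2.0.7:443 </dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256

The two fingerprints should match.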

4. Update the root certificates in your browser.

Obtain connection URLs

Cluster administrators can obtain the connection URLs using kubectl.

Steps

1. Log in as described in Log in to OpenShift for cluster-admins on page 114.

2. List all access points into the cluster by running kubectl get ingress --all-namespaces.

For example:

kubectl get ingress --all-namespaces
NAMESPACE            NAME                         HOSTS                                             ADDRESS        PORTS     AGE
my-project           my-flinkcluster              my-flinkcluster.my-project.test-psk.abc-lab.com   192.0.2.8...   80, 443   6d
my-project           repo                         repo.my-project.test-psk.abc-lab.com              192.0.2.8...   80, 443   6d
nautilus-pravega     nautilus-pravega-grafana     grafana.test-psk.abc-lab.com                      192.0.2.8...   80, 443   8d
nautilus-pravega     pravega-controller           pravega-controller.test-psk.abc-lab.com           192.0.2.8...   80        8d
nautilus-pravega     pravega-controller-api       pravega-controller-api.test-psk.abc-lab.com       192.0.2.8...   80        8d
nautilus-system      keycloak                     keycloak.test-psk.abc-lab.com                     192.0.2.8...   80, 443   8d
nautilus-system      nautilus-ui                  test-psk.abc-lab.com                              192.0.2.8...   80, 443   8d
cluster-monitoring   cluster-monitoring-grafana   monitoring.test-psk.abc-lab.com                   190.0.2.8...   80, 443   8d

All the values in the HOSTS column are valid access points for authorized users.

In the NAME column, locate nautilus-ui, and take note of the value in the HOSTS column. That value is the URL for external connections to the User Interface and is the value to use in the configuration values file.

For example, from the list above, users can connect to the UI from external locations with the following URL:

https://test-psk.abc-lab.com

Install the GPU operator in OpenShift environments

If any applications that run on SDP Core use GPU functionality, you must install the GPU operator.

The NVIDIA GPU Operator updates the SDP node base operating system and the Kubernetes environment with the appropriate drivers and configurations for GPU access. It automatically configures nodes that contain GPUs and validates the installation.

In summary, the GPU Operator performs the following tasks in SDP:

1. Deploys Node Feature Discovery (NFD) to identify nodes that contain GPUs
2. Installs the GPU Driver on GPU-enabled nodes
3. Installs the nvidia-docker runtime on GPU-enabled nodes
4. Installs the NVIDIA Device Plugin onto GPU-enabled nodes
5. Launches a validation pod to ensure that installation is successful.

For information about the NVIDIA GPU Operator, see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html.

Supported environment

SDP supports OpenShift on the following operating system:

Red Hat Enterprise Linux CoreOS 4.10.3

The required environment for installing the GPU Operator with OpenShift OperatorHub is:

OpenShift: 4.10.9
Kubernetes Version: v1.19 to v1.23
Operating System: Red Hat Enterprise Linux
Version: 8.4
Container Runtime: Docker

Red Hat Entitlement

About this task

NOTE: Red Hat Entitlement is not required for OpenShift 4.10.

Steps

To install the GPU Operator on OpenShift, see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/install-gpu-ocp.html#install-nvidiagpu.


GPU Operator Installation

Steps

1. Connect to the OpenShift web console UI at https://console-openshift-console.apps.<cluster-name>.<basedomain>.

2. Install the Node Feature Discovery operator.

a. On the Administrator menu, click Operators > OperatorHub.
b. Under All Items, search for NFD.
c. Click the Node Feature Discovery tile.
d. In the right pane, click Install.

3. Create an instance of NFD.

a. On the Administrator menu, click Operators > Installed Operators > Node Feature Discovery.
b. Click the Node Feature Discovery tab, and then click the Create NodeFeatureDiscovery button on the right.
c. On the Create NodeFeatureDiscovery page, click Create.

4. Verify that the NFD operator and daemon set pods are running. You can use the UI or the CLI. On the Administrator menu, click Workloads > Pods. On the OpenShift CLI, enter oc get pods -n openshift-operators.

oc get pods -n openshift-operators
NAME                            READY   STATUS    RESTARTS   AGE
nfd-master-49vr8                1/1     Running   0          57s
nfd-master-hmk67                1/1     Running   0          57s
nfd-master-jd7r4                1/1     Running   0          57s
nfd-operator-5748ddd66c-qfvgj   1/1     Running   0          9m48s
nfd-worker-k4744                1/1     Running   0          58s
nfd-worker-nplkd                1/1     Running   0          58s
nfd-worker-z56j8                1/1     Running   0          58s

5. Create a namespace for the GPU resources.

a. On the Administrator menu, click Home > Projects.


b. In the right pane, click Create Project.
c. Use the project name gpu-operator-resources.

6. Install the NVIDIA GPU Operator.

a. On the Administrator menu, click Operators > OperatorHub.
b. Under All Items, search for nvidia.
c. Click the NVIDIA GPU tile.
d. In the right pane, click Install.

7. Create the cluster policy for the NVIDIA GPU Operator.

a. Click Operators > Installed Operators > NVIDIA GPU Operator.
b. Click the ClusterPolicy tab, and then click Create ClusterPolicy.
c. On the Create ClusterPolicy page, click Create.

8. Verify that the GPU Operator pods are running or completed. On the Administrator menu, click Workloads > Pods. On the OpenShift CLI, enter oc get pods -n gpu-operator-resources.

oc get pods -n gpu-operator-resources
NAME                                       READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-bsxr4                1/1     Running     0          9m10s
gpu-feature-discovery-cpxqp                1/1     Running     0          9m10s
gpu-feature-discovery-plkdc                1/1     Running     0          9m10s
nvidia-container-toolkit-daemonset-5jt29   1/1     Running     0          9m10s
nvidia-container-toolkit-daemonset-7ddfl   1/1     Running     0          9m10s
nvidia-container-toolkit-daemonset-md2qv   1/1     Running     0          9m10s
nvidia-cuda-validator-26tgh                0/1     Completed   0          5m35s
nvidia-cuda-validator-79m8f                0/1     Completed   0          5m35s
nvidia-cuda-validator-q6w5l                0/1     Completed   0          5m37s
nvidia-dcgm-exporter-24g9z                 1/1     Running     0          9m10s
nvidia-dcgm-exporter-knp2s                 1/1     Running     0          9m10s
nvidia-dcgm-exporter-r86g9                 1/1     Running     0          9m11s
nvidia-device-plugin-daemonset-9z756       1/1     Running     0          9m11s
nvidia-device-plugin-daemonset-m7c59       1/1     Running     0          9m11s
nvidia-device-plugin-daemonset-xzpgr       1/1     Running     0          9m11s
nvidia-device-plugin-validator-dxmqj       0/1     Completed   0          5m19s
nvidia-device-plugin-validator-fp478       0/1     Completed   0          5m12s
nvidia-device-plugin-validator-tf8gl       0/1     Completed   0          5m19s
nvidia-driver-daemonset-2smjs              1/1     Running     0          9m11s
nvidia-driver-daemonset-kb76q              1/1     Running     0          9m11s
nvidia-driver-daemonset-xrwlz              1/1     Running     0          9m11s
nvidia-operator-validator-4djbd            1/1     Running     0          9m11s
nvidia-operator-validator-5qbsn            1/1     Running     0          9m11s
nvidia-operator-validator-sr6g4            1/1     Running     0          9m11s

Verify GPU Operator installation

These steps verify the GPU Operator installation and test correct functioning with a workload.

Steps

1. Run the following command:

kubectl get pods -A

Here is example output:

NAMESPACE                NAME                                                              READY   STATUS      RESTARTS   AGE
default                  gpu-operator-1616579493-node-feature-discovery-master-74dc7krj6   1/1     Running     0          2m18s
default                  gpu-operator-1616579493-node-feature-discovery-worker-mb7wk       1/1     Running     0          2m18s
default                  gpu-operator-1616579493-node-feature-discovery-worker-qtn78       1/1     Running     0          2m18s
default                  gpu-operator-1616579493-node-feature-discovery-worker-sddj4       1/1     Running     0          2m18s
default                  gpu-operator-74c595fc57-6bnnn                                     1/1     Running     0          2m18s
gpu-operator-resources   gpu-feature-discovery-b2895                                       1/1     Running     0          2m7s
gpu-operator-resources   gpu-feature-discovery-bcnhs                                       1/1     Running     0          2m7s
gpu-operator-resources   gpu-feature-discovery-hj762                                       1/1     Running     0          2m7s
gpu-operator-resources   nvidia-container-toolkit-daemonset-8pfsk                          1/1     Running     0          2m7s
gpu-operator-resources   nvidia-container-toolkit-daemonset-shg88                          1/1     Running     0          2m7s
gpu-operator-resources   nvidia-container-toolkit-daemonset-xpf6f                          1/1     Running     0          2m7s
gpu-operator-resources   nvidia-dcgm-exporter-8hpgz                                        1/1     Running     0          2m7s
gpu-operator-resources   nvidia-dcgm-exporter-9kkh2                                        1/1     Running     0          2m7s
gpu-operator-resources   nvidia-dcgm-exporter-zzd5q                                        1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-daemonset-7qwfj                              1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-daemonset-qggkm                              1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-daemonset-tznzv                              1/1     Running     0          2m7s
gpu-operator-resources   nvidia-device-plugin-validation                                   0/1     Completed   0          2m7s
gpu-operator-resources   nvidia-driver-daemonset-5n2g4                                     1/1     Running     0          2m7s
gpu-operator-resources   nvidia-driver-daemonset-gswhw                                     1/1     Running     0          2m7s
gpu-operator-resources   nvidia-driver-daemonset-jg6fd                                     1/1     Running     0          2m7s

2. In the output, ensure that all the containers are in a Running status.

3. In the output, ensure that the correct pods exist.

On each Kubernetes node, there should be one pod for each of the services listed below.
In the default namespace, there should be several gpu-operator services.
In the gpu-operator-resources namespace, there should be one pod per node for each of the following services. The example shows output for a 3-node SDP Kubespray cluster, so there are three replicas for each service:
gpu-feature-discovery
nvidia-container-toolkit-daemonset
nvidia-dcgm-exporter
nvidia-device-plugin-daemonset
nvidia-driver-daemonset

4. Verify a GPU workload.

a. Examine the pod log for the nvidia-device-plugin-validation pod.

The service runs a vecadd example for the validation, and a successful installation results in the following logged entry.

device-plugin-validation device-plugin validation is successful

5. Optionally, you can validate by manually running the following Kubernetes pod specs.

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.2.1
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

The pod should complete without error and log the following:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Uninstall the GPU Operator

You can uninstall the GPU Operator if it is not needed in your environment.

Steps

1. Delete the GPU operator helm chart.

helm delete gpu-operator -n default

2. Delete clusterpolicy crd.

kubectl delete crd clusterpolicies.nvidia.com

For details, see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/clean-up.html.


Manage SDP

Topics:

Top-level navigation in the UI
Post-install Configuration and Maintenance
Manage Connections and Users
Manage Projects
Manage runtime images
Monitor Health
Use Pravega Grafana Dashboards
Expand and Scale the Infrastructure
Troubleshooting

Top-level navigation in the UI

The SDP UI banner contains icons for browsing the major portions of the interface.

Table 16. Summary of top-level navigation in the UI

Overview
  For administrators only.

Analytics
  The Analytics icon shows the Project page. This page lists the projects that your user account has permission to access. From this page, you can:
  View summary information about your analytic projects.
  If you are an administrator, create, edit, or delete projects.
  Go to a project page. From there, you can:
    Create Spark, Flink, and Pravega Search clusters.
    If the project has Metrics as a Feature, go to the Grafana dashboards associated with the project.
    If the project has Artifact Repository as a Feature, upload files and Maven Artifacts to the repository.
    If the project has Video Server as a Feature, create, edit, delete, start, or stop Camera Recorder Pipelines and Gstreamer Pipelines, and play video streams.
    If the project has Pravega MQTT Broker as a feature, connect MQTT clients to publish events to Pravega streams.
    If the project has JupyterHub as a feature, use a web-based interactive development environment for Jupyter notebooks.
    The Zookeeper Project Feature is required for the project to support Flink.
    The Pravega Ingest Gateway allows you to ingest events using a REST interface instead of GRPC.
  See Manage projects on page 118.

Pravega
  The Pravega icon shows the Scopes page. This page lists the scopes that your login account has permission to access. From this page, you can:
  View summary information about scopes.
  Drill into a scope page, and from there, into streams.
  Create streams.
  Create a schema group for a stream.
  If you are an administrator, manage Cross Project Scope Sharing.
  See Manage scopes and streams on page 124.




System
  The System icon shows the following tabs:
  Components: This tab lists all software components in the platform, their state, namespace, K8s resources, and version numbers.
  Node: This tab shows information about each node in the SDP cluster. You can view information about nodes, CPUs, memory, ephemeral storage, GPUs, and the CUDA Driver that resides on the node.
  Storage: This tab shows information about configured storage for Bookkeeper, NFS, and Pravega long-term storage.
  NOTE: The Storage tab is not accessible from the Storage page. To access the Storage tab, go to https://<SDP UI hostname>/system/storage.
  Runtimes: This tab lets you manage runtime images for Flink, GStreamer, and Spark. Image names and version numbers are visible. Action buttons let you add runtimes, edit existing runtime attributes, or delete runtimes. Runtimes can have environment variables and properties, and you can edit these values. See Manage runtime images on page 129.
  Issues: This tab shows the current state of issues that are present in the system.
  Events: This tab lists events. The page includes filtering and search features. See Monitor and manage events on page 134.

Settings
  The Settings icon shows the following tabs:
  License: Shows information and status for Streaming Platform Cores.
  SupportAssist: Shows information about ESE connection details. Allows you to create, edit, or delete a SupportAssist deployment and enable or disable RemoteAccess.
  Pravega Cluster: Shows Pravega segmentstore, bookkeeper, controller, and zookeeper version and status.

Pravega Metrics
  Available to SDP admins from any page in the SDP UI at the navigation bar. Click the Pravega Metrics link.
  This link opens a new browser tab showing the Grafana UI. The graphs in the predefined Grafana Dashboards provide more detail, for longer time spans, than the summary dashboard on the SDP Dashboard. The predefined Grafana Dashboards show metrics about platform activity, storage, and alerts. See Use Pravega Grafana Dashboards on page 139.

Monitoring Metrics
  Available to SDP admins from any page in the SDP UI at the navigation bar. Click the Monitoring Metrics link.
  This link opens a new browser tab showing the Grafana UI. The graphs provide health details about the system, K8s nodes, GPUs, and projects that have the metrics feature enabled.

User
  The User icon shows a drop-down menu with these options:
  The username that you used to log in to the current session. (This item is not clickable.)
  Edit Account: Opens the Keycloak UI (for testing situations only). See Change password in Keycloak on page 116.
  NOTE: When LDAP federation is configured, users manage their accounts in LDAP.
  Product Support: Opens the SDP Documentation Hub. From there, you can access:
    The product documentation
    The Product Support page, where you can download product code and get help.
    The SDP Code Hub, where you can download sample applications, code templates, Pravega connectors, and view demos.
  Logout


Post-install Configuration and Maintenance

Topics:

Obtain default admin credentials
Configure an LDAP identity provider
Verify telemetry cron job
Update the default password for ESE remote access
Ensure system availability when a node is down
Update the applied configuration
Graceful shutdown and startup
Uninstall applications
Reinstall into existing cluster
Change ECS credentials after installation

Obtain default admin credentials

About this task

The installation process creates two default administrator accounts.

Keycloak nautilus realm administrator (user name: desdp)
  This user is authorized to create analytic projects. This user has wildcard access to all scopes and projects and all their associated resources in SDP.
  This user is Admin in the nautilus realm. The nautilus realm is where SDP users and service accounts exist.
  This user is not permitted to log in to the Keycloak Administrator Console for the master realm.

Keycloak master realm administrator (user name: admin)
  This user is authorized to log in to the Keycloak Administration Console in either the master or nautilus realm, and create users in Keycloak.

Passwords for these accounts are either defined in the configuration values file or are generated by the installer, as follows:

If password values are specified in the configuration values file, the installer uses those values. You can skip steps 1 and 2 in the procedure below. The remaining steps are still important.

If password values are not specified in the configuration values file, the installer automatically generates passwords and inserts those values into secrets. The following steps describe how to obtain the secrets and extract the passwords.

Steps

1. Obtain the autogenerated password for the desdp user in the nautilus realm:

kubectl get secret keycloak-desdp-creds -n nautilus-system -o jsonpath='{.data.password}' | base64 --decode

2. Obtain the autogenerated password for the admin user in the master realm:

kubectl get secret keycloak-admin-creds -n nautilus-system -o jsonpath='{.data.password}' | base64 --decode



3. Verify that you can log in to both the Keycloak Administrator Console and the SDP UI.

See Obtain connection URLs on page 96.

4. As a security precaution, discard the two secrets that contain the passwords. Do this only after you have verified that you can log in to both Keycloak realms.

The two Kubernetes secrets that contain the admin and desdp user passwords are only created once at install time. Any modifications of the user accounts (such as changing their passwords, deleting them, or renaming them) and product upgrades do not update these secrets. They are only used as an initial means to retrieve the passwords for bootstrapping purposes.
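A minimal sketch of the cleanup, assuming the secret names shown in steps 1 and 2:

$ kubectl delete secret keycloak-desdp-creds -n nautilus-system
$ kubectl delete secret keycloak-admin-creds -n nautilus-system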

5. (Optional) Change the passwords.

You may change the default passwords later in Keycloak. See Change password in Keycloak on page 116. You can use that change procedure for passwords that were system-generated or explicitly provided.

Configure an LDAP identity provider

Federation with an external LDAP identity provider is supported.

For information about federation options and how to configure SDP and Kubernetes to integrate with an external LDAP identity provider, see the Dell Technologies Streaming Data Platform Security Configuration Guide.

Verify telemetry cron job

When telemetry is enabled, the SDP cluster contains a cron job that runs every 12 hours and uploads configuration information to Dell Support through the configured SupportAssist connection.

Steps

To verify the cron job schedules, run the following command:

$ kubectl get cronjob -n nautilus-system
NAME                                       SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
monitoring-gathering                       * * * * *      False     0        31s             34h
monitoring-sending                         0 * * * *      False     0        21m             34h
supportassist-streamingdata-capacity       0 * * * *      False     0        21m             31h
supportassist-streamingdata-health         */10 * * * *   False     0        91s             31h
supportassist-streamingdata-insideiq       57 23 * * *    False     0        15h             31h
supportassist-streamingdata-inventory      0 * * * *      False     0        21m             31h
supportassist-streamingdata-outgoing       */30 * * * *   False     0        21m             31h
supportassist-streamingdata-performance    */10 * * * *   False     0        91s             31h

Update the default password for ESE remote access

Steps

Set the credentials for remote access by adding the following values to the installer configuration values file:

sdp-serviceability:
  service-pod:
    sshCred:
      user: svcuser
      group: users
      password: ChangeMe

NOTE:

To get the external IP for remote access, run:

kubectl get svc -n nautilus-system streamingdata-remote-service

To get the current credentials, run:

kubectl get secret -n nautilus-system service-pod-service-pod-secrets -o go-template='{{ index .data "credentials.conf" | base64decode }}'

Use the above credentials and the assigned external IP to access the service-pod via SSH:

ssh <user>@<external-IP>
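The following minimal sketch ties these pieces together. It assumes the remote-access service is exposed through a LoadBalancer and that the user is the svcuser account defined above; if your environment differs, read the EXTERNAL-IP column from the kubectl get svc output instead.

# Assumes a LoadBalancer-type service; adjust the jsonpath if the external IP is published differently.
EXTERNAL_IP=$(kubectl get svc -n nautilus-system streamingdata-remote-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Show the current credentials, then connect as the configured service user.
kubectl get secret -n nautilus-system service-pod-service-pod-secrets \
  -o go-template='{{ index .data "credentials.conf" | base64decode }}'
ssh svcuser@"${EXTERNAL_IP}"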

Ensure system availability when a node is down

An alert with an id of KCLUSTER-005 or KCLUSTER-006 requires immediate attention to ensure system availability.

About this task

The following procedure ensures that pods that were running on a node that goes down are scheduled to run on another available node.

Steps

1. Run the following command and check the STATUS column to see if any node is down.

kubectl get nodes

2. If a node is down, attempt to restart it immediately.

3. If a node remains down and restarting does not bring it online immediately, check to see if any pods are stuck in a Terminating state.

kubectl get po -A -o wide | grep Terminating

NOTE: Pods do not move into Terminating state until 330 seconds after the node goes down.

4. Evict the pods stuck in Terminating state.

kubectl drain <node-name> --delete-local-data --ignore-daemonsets --force

5. Force delete the pods in Terminating state.

NOTE: Do this step only for pods in the following namespaces: nautilus-system, catalog, nautilus-pravega, and customer-created project namespaces.

kubectl delete po <pod-name> -n <namespace> --grace-period=0 --force

Deleted pods are automatically scheduled to start up on available nodes.

6. Verify that the pods are running and that they are bound with persistent volumes.

kubectl get po -n <namespace>


Update the applied configuration

You can change some configuration values after installation by changing and reapplying the values files. This configuration change process is called an update.

Prerequisites

Consult with Dell Technologies support personnel about the values that you want to change. Some values cannot or should not be changed using these steps.

You must have cluster-admin role on the SDP Kubernetes cluster.

Typically, you start with the configuration values files that were used for the last configuration and edit those files with the changes you want to make. Values are never carried over from the existing configuration.

Include the gen-values-1.4.yaml file in the list of values files in the commands. The pre-install.sh script, which you ran before installing SDP, generates the gen-values-1.4.yaml file.

About this task

To change the current configuration, run the decks-install apply command to reapply edited configuration values to the cluster. Every time that you run the decks-install apply command, the entire configuration is reconstructed using:

Override values that you supply in the configuration values files
Default values in the installer

If a value is not supplied in the configuration values files, the default values are used.

NOTE: The edited configuration files become your new configuration. Use the edited files as the baseline for future configuration changes.

NOTE: If the original installation used multiple values files, be sure to specify all the files in the reapply procedure.

While it is possible to use kubectl or other Kubernetes tools to update the resources running on the cluster, that process is not recommended. When you use tools outside of the installer and its values file, you have no record of the current configuration. The next decks-install apply command overrides whatever changes you made using other tools.

Kubernetes handles the reconfiguration. The administrator does not have to manually stop or restart anything. The changed configuration is applied across the cluster as fast as the Kubernetes reconcile loop can apply it. The results may take some time to complete.

Depending on which values you change for which components, some services may be restarted for reconfiguration. As a result, short interruptions in service may occur. For example, if a configuration change causes some Pravega components to restart, Pravega stream ingestion could stop processing while the reconfiguration occurs.

Use the following procedure to change the configuration.

Steps

1. Prepare the configuration files, remembering that the new configuration is entirely reconstructed from the values files that you provide.

2. Run the validate-values.py script.

a. Go to the folder where you extracted the decks-installer-<version>.zip file.

b. Run the following command:

$ ./scripts/validate-values.py <list of values files>

Where the <list of values files> includes:

Values files that you prepared according to instructions in Configuration Values on page 72. Be sure to include all the values files that you plan to use in the decks-install apply command, in the same order.

The gen-values-1.4.yaml file that was generated by the pre-install script before initial installation. This file contains nondefault credentials for Pravega components. It is required and must be the last values file in the list.

Separate the values file path names with commas and no spaces. When the same field occurs in more than one of the values files, the value in the right-most file overrides the value in files to its left.


For example:

$ ./scripts/validate-values.py values.yaml,gen-values-1.4.yaml

c. If the script indicates errors in your configuration values file, edit the files to correct the problems and rerun the script.

3. Log in to the cluster. See Log in to OpenShift for cluster-admins on page 114.

4. Run the decks-install apply command.

The --values option specifies the configuration values files. Include all the same values files as described in step 2 above.

For example:

$ ./decks-install apply --kustomize ./manifests/ --repo ./charts/ --values values.yaml,gen-values-1.4.yaml

5. Run the post-install.sh script.

a. Go to the folder where you extracted the decks-installer-<version>.zip file.

b. Run the script.

$ ./scripts/post-install.sh

c. If the script indicates errors in your cluster, fix the issues, rerun decks-install apply, and then rerun this script.

Graceful shutdown and startup

Use this procedure to shut down SDP in an orderly manner.

Prerequisites

The following utilities are required:
awk
kubectl (with context set to the SDP cluster)
sh
xargs

Steps

1. (Optional but recommended) Stop all running Flink applications with a savepoint.

Use either kubectl or the Streaming Data Platform UI to issue savepoints.

2. Save a copy of the existing PodDisruptionBudget (PDB) resources.

kubectl get pdb --all-namespaces -o json > pre-shutdown-pdb.json

3. Create a file with substitute availability controls.

SDP attempts to maintain a level of availability. For a graceful shutdown, you do not want the system to enforce the availability criteria. This step patches the PodDisruptionBudget resources with different content that permits the shutdown.

a. Create a file with the following content:

{ "spec":{ "maxUnavailable":null, "minAvailable":0 } }

b. Save the file with the name patch-pdb.json.
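For example, a quick way to create patch-pdb.json with exactly this content is a shell here-document:

cat > patch-pdb.json <<'EOF'
{
  "spec": {
    "maxUnavailable": null,
    "minAvailable": 0
  }
}
EOF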

4. Patch the PodDisruptionBudget resources with the patch content.


Run the following command:

$ kubectl get pdb --all-namespaces --no-headers | awk '{print $1,$2}' | xargs -n2 sh -c \
  'kubectl patch pdb $2 -n $1 --patch "$(cat patch-pdb.json)"' sh

The command lists all PodDisruptionBudget resources, gets the name and namespace of each, and then patches them.

5. Transition all Kubernetes worker nodes into maintenance mode.

Use the following commands:

$ NUM_NODES=$(kubectl get nodes --no-headers | grep -c ".")

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -n1 kubectl cordon

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -n1 -P $NUM_NODES sh -c \
  'kubectl drain $1 --delete-local-data --ignore-daemonsets' sh

NOTE: The command uses the option --ignore-daemonsets. In a later step that shuts down the cluster, the daemonsets are drained.

6. Shut down the cluster following the recommendations of the Kubernetes service provider that you are running.

7. Restart the cluster following the recommendations of the Kubernetes service provider that you are running.

8. Enable the nodes to schedule pods (uncordon them).

NOTE: This step is not required if you used the force commands in the previous steps. The nodes are already uncordoned.

$ NUM_NODES=$(kubectl get nodes --no-headers | grep -c ".")

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -n1 -P $NUM_NODES sh -c \
  'kubectl uncordon $1' sh

9. Monitor the startup.

$ watch kubectl get pods --all-namespaces

Give the cluster some time to become stable.

10. When the cluster is stable, restore the original PodDisruptionBudget resources.

$ kubectl replace --force -f pre-shutdown-pdb.json

Uninstall applications

Use the decks-install unapply command to uninstall specified platform applications and their associated resources from the SDP cluster. These are applications mentioned in the Kubernetes manifests.

Prerequisites

Consult with Dell Technologies support personnel about your intended outcomes before uninstalling applications from the SDP cluster.

WARNING: If you need to delete the Flink, Spark, or Pravega application, be aware that existing Flink, Spark, or Pravega data will be marked for deletion as well.

When you delete a project, all its resources (such as Flink, Spark, and PSearch clusters) are deleted with the project. To perform these deletions, use either of the following methods:
The SDP user interface: go to the Analytics page and delete projects.
The /scripts/uninstall/clean_projects.sh script, supplied with the distribution, which deletes all projects.
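For example, to remove all projects with the supplied script, run it from the folder where you extracted the installer (a sketch; confirm the intended outcome with Dell Technologies support first, as noted in the prerequisites):

$ ./scripts/uninstall/clean_projects.sh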


If the Pravega application is listed for removal, be aware that existing streams in Pravega will not be readable by a newly installed Pravega instance. Even if the nfs-client-provisioner.storageClass.archiveOnDelete setting is "true" in the current configuration, the archived data will not be readable by a new installation of the Pravega application.

About this task

The decks-install unapply command marks applications for removal from the Kubernetes cluster, based on a specified manifest bundle. One reason to perform an unapply for an application is to prepare to reinstall it with a different set of configuration values.

To uninstall all SDP applications and resources from the cluster, so that you can start over with a new installation, use the decks-install unapply command with the same manifest that was used for the installation.

NOTE: Only resources that were initially created by SDP are removed. Other resources are not affected by the uninstall procedure.

Steps

1. Identify or edit the manifest bundle.

If you are uninstalling all SDP applications and resources from the cluster, so that you can start over with a new installation, there is no need to update the manifest bundle. Use the same manifest bundle that you used with the decks-install apply command.

If you are uninstalling a few selected applications, you need a different manifest bundle. However, contact Dell Technologies support for advice. Some SDP resources depend on other resources.

2. Run the decks-install unapply command.

$ ./decks-install unapply --kustomize <manifest-path>

For example:

$ ./decks-install unapply --kustomize ./unapplymanifest/

The decks-install unapply command does the following:

Marks applications and resources in the provided manifest bundle for deletion, in a pending state.
By default, starts the synchronization process, which reconciles the cluster to the desired terminal state. An optional parameter can defer the synchronization.

See decks-install unapply on page 190 for optional command parameters.

3. Check to ensure that the synchronization completes successfully.

4. If the synchronization procedure fails for whatever reason, use the following command to start it again. It is safe to restart the synchronization procedure at any time.

$ ./decks-install sync --kustomize <manifest-path>

Reinstall into existing cluster

In testing and development scenarios, you may want to uninstall the SDP software from the Kubernetes cluster and start over with a fresh software installation.

Steps

1. Run the decks-install unapply command using all configuration values files that were used for the installation.

2. If long-term storage is on an ECS appliance, manually clean the Pravega bucket before performing another install that uses that bucket.

3. Clear old DNS entries from the external DNS server.

This step applies only if you are using a local DNS provider for external connections (such as CoreDNS). With those types of DNS providers, old entries are not automatically updated. This results in the DNS query returning both old and new entries. The workaround is to manually delete the old DNS entries for the cluster. Use the etcdctl tool.


NOTE: If you are using a cloud DNS provider for external connections (such as AWS Route53), the removal of old entries is done for you. However, the installation is likely to take more time than the first installation. Propagating new entries takes time, and depends on the DNS cache configuration in intermediate DNS servers.

a. Identify the entries:

> ETCDCTL_API=3 etcdctl get --prefix=true "" --endpoints http://<etcd-IP>:2379 | grep <cluster-name>

b. Delete the entries:

> ETCDCTL_API=3 etcdctl del <key> --endpoints http://<etcd-IP>:2379

Here is an example session:

> ETCDCTL_API=3 etcdctl get --prefix=true "" --endpoints http://10.247.XX.XXX:2379 | grep gladiator
/skydns/com/dell/desdp/gladiator/keycloak/194bb2b2 {"host":"10.247.NNN.NNN","text":"\"heritage=external-dns,external-dns/owner=gladiator.desdp.dell.com,external-dns/resource=ingress/nautilus-system/keycloak\"","targetstrip":1}
> ETCDCTL_API=3 etcdctl del /skydns/com/dell/desdp/gladiator/keycloak/194bb2b2 --endpoints http://<etcd-IP>:2379

4. Run the decks-install apply command using the configuration values files for the new installation.

Change ECS credentials after installation

If ECS credentials are compromised, an SDP administrator can change the credentials.

About this task

Run the commands in the next two tasks in a UNIX environment or in Windows Subsystem for Linux (WSL) on Windows.

Update ECS credential used by the ecs-service-broker

This task updates the Management User password that the ecs-service-broker uses to access ECS. For minimal interruption to service, use the following steps.

Steps

1. Calculate the base64 representation of the new Management User password that you intend to use in Step 3.

$ echo -n ChangeMe2 | base64
Q2hhbmdlTWUy

2. Update the secret value using the base64 value.

$ kubectl patch secret -n nautilus-system ecs-broker-connection-auth \ --type='json' -p='[{"op":"replace","path":"/data/password","value":"Q2hhbmdlTWUy"}]'

3. In the ECS UI, update the password for the Management User.

4. Restart the ecs-service-broker pods.

$ kubectl get pods -n nautilus-system | grep ecs-service
ecs-service-broker-55d545785c-vxsxd   1/1   Running   0   3d6h

$ kubectl delete pod ecs-service-broker-55d545785c-vxsxd -n nautilus-system
pod "ecs-service-broker-55d545785c-vxsxd" deleted


$ kubectl get pods -n nautilus-system
ecs-service-broker-55d545785c-qdpfv   1/1   Running   0   9m53s   <<<<<<<<<<<

Update ECS credential used by Pravega

This task updates the Object User Secret Key that Pravega uses.

Steps

1. Using the ECS UI, add a new SecretKey for the Object User.

NOTE: Do not set an expiration on the old key.

2. Calculate the base64 representation of the new SecretKey value.

$ echo -n w32cVABwifvVwwAm2HvIwAHsDn0mtvBCDlMMuggD | base64

dzMyY1ZBQndpZnZWd3dBbTJIdkl3QUhzRG4wbXR2QkNEbE1NdWdnRA==

3. Update the Pravega Secret.

$ kubectl patch secret -n nautilus-pravega nautilus-pravega-tier2-ecs \
  --type='json' -p='[{"op":"replace","path":"/data/SECRET_KEY","value":"dzMyY1ZBQndpZnZWd3dBbTJIdkl3QUhzRG4wbXR2QkNEbE1NdWdnRA=="}]'

secret/nautilus-pravega-tier2-ecs patched

4. Restart each SegmentStore pod, one at a time.

NOTE: Wait for each SegmentStore pod to fully start up, validating each pod with the logs command, before attempting to restart the next pod.

$ kubectl get pods -n nautilus-pravega | grep segment-store
nautilus-pravega-segment-store-0   1/1   Running   0   7m8s

$ kubectl delete pod nautilus-pravega-segment-store-0 -n nautilus-pravega
pod "nautilus-pravega-segment-store-0" deleted

# The pod is re-created. Wait until it has fully started up.
$ kubectl logs nautilus-pravega-segment-store-0 -n nautilus-pravega
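The loop below is a minimal sketch of the same one-at-a-time restart, using kubectl wait to block until each re-created pod reports Ready. The sleep gives the controller time to recreate the pod before waiting on it; adjust names and timeouts for your environment.

for pod in $(kubectl get pods -n nautilus-pravega -o name | grep segment-store); do
  kubectl delete -n nautilus-pravega "$pod"
  sleep 15   # allow the controller to recreate the pod under the same name
  kubectl wait -n nautilus-pravega "$pod" --for=condition=Ready --timeout=10m
  kubectl logs -n nautilus-pravega "$pod" --tail=20   # spot-check startup logs before moving on
done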

5. When all SegmentStore pods are using the new SecretKey, you may delete the old SecretKey in the ECS UI.


Manage Connections and Users

Topics:

Obtain connection URLs
Connect and log in to the web UI
Log in to OpenShift for cluster-admins
Log in to OpenShift command line for non-admin users
Create a user
Assign roles
User password changes
Password policy for SDP user accounts

Obtain connection URLs

Cluster administrators can obtain the connection URLs using kubectl.

Steps

1. Log in as described in Log in to OpenShift for cluster-admins on page 114.

2. List all access points into the cluster by running kubectl get ingress --all-namespaces.

For example:

kubectl get ingress --all-namespaces
NAMESPACE            NAME                         HOSTS                                             ADDRESS       PORTS     AGE
my-project           my-flinkcluster              my-flinkcluster.my-project.test-psk.abc-lab.com   192.0.2.8...  80, 443   6d
my-project           repo                         repo.my-project.test-psk.abc-lab.com              192.0.2.8...  80, 443   6d
nautilus-pravega     nautilus-pravega-grafana     grafana.test-psk.abc-lab.com                      192.0.2.8...  80, 443   8d
nautilus-pravega     pravega-controller           pravega-controller.test-psk.abc-lab.com           192.0.2.8...  80        8d
nautilus-pravega     pravega-controller-api       pravega-controller-api.test-psk.abc-lab.com       192.0.2.8...  80        8d
nautilus-system      keycloak                     keycloak.test-psk.abc-lab.com                     192.0.2.8...  80, 443   8d
nautilus-system      nautilus-ui                  test-psk.abc-lab.com                              192.0.2.8...  80, 443   8d
cluster-monitoring   cluster-monitoring-grafana   monitoring.test-psk.abc-lab.com                   190.0.2.8...  80, 443   8d

All the values in the HOSTS column are valid access points for authorized users.

In the NAME column, locate nautilus-ui, and take note of the value in the HOSTS column. That value is the URL for external connections to the User Interface, and it is the value to use in the configuration values file.

For example, from the list above, users can connect to the UI from external locations with the following URL:

https://test-psk.abc-lab.com
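If you prefer to script this lookup, the host can be read directly from the nautilus-ui ingress. This sketch assumes the ingress carries a single rule:

kubectl get ingress nautilus-ui -n nautilus-system -o jsonpath='{.spec.rules[0].host}'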



Connect and log in to the web UI

The SDP User Interface is a web interface, available for external connections over HTTPS.

Steps

1. Type the URL of the SDP User Interface in a web browser. The SDP login window appears.

2. Log in to SDP.

If your administrator provided local user credentials, use those credentials. If LDAP is integrated, use your enterprise credentials.

3. Click Log In.

If your username and password are valid, you are authenticated to SDP. One of the following windows appears:

The Overview page on the UI: The username has authorizations that are associated with it. The Overview page is displayed to SDP administrators.

The Analytics page: The Analytics page is displayed to users with authorized access to at least one project.

A welcome message: The welcome message is displayed to users without authorization to see any data.

4. If you need authorizations, ask an Administrator to make you a member of one or more projects.

Log in to OpenShift for cluster-admins

This procedure is for admin users to manage the SDP Kubernetes cluster.

Prerequisites

You must have installed the OpenShift CLI on your local working platform. See https://docs.openshift.com/container-platform/4.6/cli_reference/openshift_cli/getting-started-cli.html.

You must have credentials with cluster-admin role for the SDP cluster in OpenShift.

Steps

1. Run the login command.

$ oc login -u kubeadmin -p $(cat ~/startline/ocp/auth/kubeadmin-password)
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y

Login successful.

You have access to 68 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "default".
$

Where:
The file used in the -p option is standard. Every OpenShift installation creates a kubeadmin password in <openshift installation folder>/auth/kubeadmin-password.
To use certificates, find them in the kubeconfig file here: <openshift installation folder>/auth/kubeconfig

2. Answer the prompts for the server URL and your credentials.


See https://docs.openshift.com/container-platform/4.6/cli_reference/openshift_cli/getting-started-cli.html#cli-logging-in_cli-developer-commands for an example.

Results

After successful login, you can manage the SDP cluster. Underlying scripts have copied the kube config files for you.

Log in to OpenShift command line for non-admin users

This procedure is for non-admin users (project members) to interact with the SDP K8s cluster on the OpenShift command line.

Steps

1. The non-admin user who wants to use the OpenShift command line may need to log in to the SDP UI one time with their LDAP account.

A Welcome screen indicates that the user account was authenticated but there are no authorizations associated with the account.

In the background, the user account is added into Keycloak. This addition enables the next step.

NOTE: This step is not required if LDAP federation is configured with the Synchronize all users option.

2. An SDP admin makes the user a member of projects. See Add or remove project members on page 122.

When users are assigned to projects, RBAC rules are created. Those rules apply to the SDP UI and OpenShift command line access.

3. The user can log in to the OpenShift command line with the LDAP credentials.

oc login -u <username>

Create a user

When LDAP federation is enabled, create a new user in LDAP.

When federation is not enabled, you may create local users in Keycloak and make them project members. However, these users can access project resources only through the SDP UI. Local users cannot log in to OpenShift.

NOTE: When federation is not enabled, there is no access by non-admin users to the kubectl plane. Federation is required for that to be possible.

Add new local user on the Keycloak UI

Without LDAP federation, administrators use the Keycloak dashboard to create usernames for access to the SDP UI.

Steps

1. In a browser window, go to the Keycloak endpoint in the SDP cluster.

To list connection endpoints, see Obtain connection URLs on page 96. If the SDP UI is open, you can try prepending keycloak. to the UI endpoint. For example, http://keycloak.sdp.lab.myserver.com. Depending on your configuration, this might not always work.

2. On the Keycloak UI, click Administration Console.

3. Log in using the Keycloak administrator username (admin) and password.

See Obtain default admin credentials on page 104.

4. Click Manage > Users.

5. On the Users screen, click Add User on the right.


6. Complete the form.

NOTE: The username must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 118.

7. Optionally click the Credentials tab to create a simple initial password for the new user.

Create a temporary password. Enable Temporary, which prompts the user to change the password on the next login.

8. To authorize the new user to perform actions and see data in SDP, make the user a member of projects.

Assign roles

SDP supports admin and project member roles. Admins assign roles, as follows.

Become a project member: Administrators add users to a project. See Add or remove project members on page 122. If federation is enabled, the project member role is granted in Keycloak and in Kubernetes.

Become an Administrator: Administrators assign the admin role to other users.
1. Create the new user in Keycloak.
2. Assign the admin realm role to the user, using the Keycloak UI.
3. Consider whether you want to give this user the cluster-admin role in Kubernetes. With or without federation, the corresponding role on the Kubernetes side is not granted.

User password changes

Federation is enabled: All user accounts are managed in the identity provider, such as LDAP. The user changes a password in LDAP. Even though there is a shadow account in Keycloak, the user can ignore it. A password is not needed there.

Federation not used: All user accounts are managed in the local Keycloak instance. The user changes a password in Keycloak.

Change password in Keycloak

Use this procedure to change a password or other profile attributes in the local Keycloak system.

Steps

1. Log in to the SDP UI with the username whose password or other profile attributes you want to change.

2. In the banner, click the User icon.

3. Verify that the username at the top of the menu is the username whose profile you want to change.

4. Choose Edit Account.

5. To change the password, complete the password-related fields.

6. Edit other fields in the profile if needed.

7. Click Save.

8. To verify the password change:

a. Click the User icon and choose Logout. b. Log back in using the new password.


Password policy for SDP user accounts

The password policy applies to all SDP user accounts, regardless of where the user account is defined. The rules apply to all the following accounts:

Local user accounts defined in Keycloak
Default administrative accounts defined by the system at installation

The default administrative accounts are desdp and admin. The passwords that are associated with these accounts may be explicitly specified in a values.yaml file or permitted to default to system-generated values. For more information about these accounts, see Configure passwords for the default administrative accounts on page 87.

The password policy rules are:

A password must have a minimum of 12 characters.
A password must include the following:
At least one lowercase alpha character
At least one uppercase alpha character
At least one number
At least one special character

Special characters are limited to the following character set:

@#$%^&+=

An example password is:

myPa$$w0rd123
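The following shell sketch checks a candidate password against these rules; it is an illustration only, not an SDP tool:

pw='myPa$$w0rd123'
echo "$pw" | grep -Eq '.{12,}'    && \
echo "$pw" | grep -q '[a-z]'      && \
echo "$pw" | grep -q '[A-Z]'      && \
echo "$pw" | grep -q '[0-9]'      && \
echo "$pw" | grep -q '[@#$%^&+=]' && \
echo "password meets the policy" || echo "password does not meet the policy"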


Manage Projects

Administrators create, delete, and configure analytic projects.

Topics:

Naming requirements
Manage projects
Manage scopes and streams

Naming requirements

These requirements apply to user names and resource names in SDP.

User name and project name requirements

User names and project names must conform to the following Pravega naming conventions:

The characters allowed in project names and user names are: digits (0-9), lowercase letters (a-z), and hyphen (-).
The names must start and end with an alphanumeric character (not a hyphen).
Project names are limited to 15 characters in the UI.
For user names, the first two points apply to any user name that will become a project member or admin, regardless of the registry in which they are defined (Keycloak, LDAP database, or other user database).

Other resource names

All other resource names must conform to Kubernetes naming conventions:

The characters allowed in names are: digits (0-9), lowercase letters (a-z), hyphen (-), and period (.).

The UI enforces character limitations on some resource names.

Manage projects

This section defines the SDP concept of projects and describes administrative tasks for managing them.

About projects

Projects are important resources in SDP.

All analytic processing capabilities are contained within projects. Projects provide support for multiple teams working on the same platform, while isolating each team's resources from the others. Project members can collaborate in a secure way. SDP manages resources for each project (each team) separately.

An SDP project is a Kubernetes custom resource of kind Project. The Project resource is a Kubernetes namespace enhanced with the following resources and services:

Maven repository: Stores artifacts for analytic applications in the project.

Project storage: A persistent volume claim (PVC) for analytic applications.

Pravega credentials: Allows analytic jobs to communicate with Pravega.

Pravega scope: Represents a top-level construct for grouping all the project streams. The Pravega credentials are configured to have access to this scope.

Features: Artifact repository, Metrics, Zookeeper, JupyterHub, Pravega Ingest Gateway, Pravega MQTT Broker, Video Server.

Developers and data analysts work within projects. Each project has its own Maven repo, its own set of cluster resources for analytic processing, its own scope and streams, and its own set of project members. Only project members (and platform administrators) are authorized to view the assets in a project, access the streams, upload application artifacts, and run analytic jobs. Project isolation is one way that SDP implements data protection and isolation of duties.

Project features are components that can be added to a Project, such as the Artifact Repository, Metrics Stack, Jupyter Hub, Pravega Ingestion Gateway. If a feature is selected, SDP will provision the components within the Project namespace to provide that service.

A project must be created by an SDP administrator. The administrator can use either of the following methods to create a project:

SDP UI: This is the quickest and most convenient method.
Kubernetes commands and a resource file: Use this method if the default configurations employed by the UI do not satisfy the project team's needs.

Create a project

Create a project on the SDP UI.

Steps

1. Log in to the UI as an admin.

2. Click the Analytics icon.

The Analytic Projects table appears.

3. Click Create Project at the top of the table.

4. In the Name field, type a name that conforms to Kubernetes naming conventions.

The project name is used for the following:
Project name in the SDP UI
The Kubernetes namespace for the project
A local Maven repository for hosting artifacts for applications defined in the project
The project-specific Pravega scope
Security configurations and settings that allow all analytic applications in the project to have access to all the Pravega streams in the project scope

5. In the Description field, optionally provide a short phrase to help identify the project.

6. Provide storage attributes for the project.

The fields that appear are different depending on the type of long-term storage that is configured for SDP.

NFS long-term storage, field Storage Volume Size: Provide the size of the persistent volume claim (PVC) to create for the project. This value is the anticipated space requirement for storing all the streams in the project. SDP provisions this space in the configured PowerScale file system or node disks.

Namespace on ECS, field Bucket Plan: Choose the plan for provisioning the S3 bucket for the project. Plans are defined in the configuration values file. There is always a default plan. The system provisions the bucket in the configured ECS namespace. You can view the project bucket name on the project page in the UI.


7. Under Metrics, choose whether to enable or disable project-level analytic metrics collection.

The option is enabled by default. Data duration policy is set to two weeks.

For more information about Metrics, see the Dell Technologies Streaming Data Platform Developer's Guide .

8. Under Features, choose the features to enable in the project.

Your selections depend on the intended applications that developers plan to deploy in the project.

9. Click Save. The new project appears in the Analytic Projects table in a Deploying state. It may take a few minutes for the system to create the underlying resources for the project and change the state to Ready.

Create a project manually

Use this task to create a project on the command line. With this method, you can alter more of the configuration settings.

About this task

An SDP project is a Kubernetes namespace with a single Project resource in it. Project is a custom Kubernetes resource.

In this task, you first create a namespace and then add the resource of kind Project to that namespace. The Project resource triggers deployment of project-related artifacts and services, such as zookeeper, maven, and so on.

NOTE: The following rules are important:

The names of the namespace and the Project resource must match.

Only one Project resource can exist in a namespace.

Steps

1. On the command line, log in to the SDP Kubernetes cluster as an administrator (admin role).

2. Create a Kubernetes namespace, using the project name as namespace name.

$> kubectl create namespace <project-name>

Where <project-name> conforms to the Kubernetes naming conventions.

3. Create a yaml file that defines a new resource of kind Project.

a. Copy the following resource file as a template.

apiVersion: nautilus.dellemc.com/v1alpha1
kind: Project
metadata:
  name: <project-name>
spec:
  maven:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ""

  #for PowerScale long-term storage
  storage:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      storageClassName: nfs

  #for ECS long-term storage
  storage:
    plan: default
    parameters:
      reclaim-policy: Delete

  zookeeper:
    size: 3
  metrics:
    enabled: true

b. Edit the following values in the yaml file.

metadata.name:
The name that you assigned to the namespace. This name and the namespace name must match.

spec.maven.persistentVolumeClaim.resources.requests.storage:
The size of PVC storage for the Maven repository. This repository holds artifacts for the project. The UI uses a default of 10Gi. Increase this value if a large number of artifacts are expected for the project.

spec.maven.persistentVolumeClaim.storageClassName:
The PVC storage class name created during infrastructure setup for Maven. You may leave this setting blank.

storage (PowerScale long-term storage block):
Defines long-term storage. resources.requests.storage is the size of PVC storage for shared storage between all clusters in the project. This space stores all checkpoints and savepoints. Consider the expected state size. If you create a Project on the UI, the default value is 10Gi.

storage (ECS long-term storage block):
Defines long-term storage. plan is optional; if not provided, the default plan is used. parameters is optional.

zookeeper.size:
The number of nodes in the Zookeeper cluster. The Flink clusters use Zookeeper to provide high availability. Under typical conditions, a setting of 3 is sufficient.

metrics.enabled: true
Enables the creation of project-specific metrics stacks.

c. Check that the syntax is valid yaml and save the file.

4. Apply the resource file.

$> kubectl create -n <project-name> -f <filename>.yaml

5. Check the project for readiness.

$> kubectl get Project -n <project-name>

The response indicates the status of the resource. The Status:Ready: flag changes to true when the project resource is ready for use. It may take several minutes for the framework to prepare the supporting infrastructure for the project.
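Optionally, you can watch the resource until it becomes ready instead of polling (the project name here is hypothetical):

$> kubectl get Project myproject -n myproject -w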


Delete a project

When an administrator deletes a project, SDP cleans up all resources associated with the project.

Steps

1. Log in to the UI as an admin.

2. Click the Analytics icon.

The Analytic Projects table appears, listing all projects.

3. Click Delete in the project row.

SDP deletes all resources in the project namespace, including:
Flink applications and clusters
Spark applications
PSearch queries and the PSearch cluster
The project's Maven repository
The project's analytic metrics stack
The project's scope and all streams in it
Jupyter Notebooks
Video applications

The actual stream data might be archived on the storage media, depending on configuration settings. Regardless, the data is not accessible by SDP after the project is deleted.

For ECS, archival depends on the Reclaim Policy setting for the bucket plan, which is established when the project is created. If the reclaim policy is Detach, the project bucket is left intact even though SDP cannot access it. Plans are set up in the configuration values file. The default used is Detach. (The data is archived.)

For NFS, the underlying Kubernetes PVC is deleted. Archival depends on the nfs-client-provisioner.storageClass.archiveOnDelete: setting in the configuration values file. The default used is true. (The data is archived.)

For more information about archive settings, see Provision long-term storage on PowerScale on page 70 and Provision long-term storage on ECS on page 70.

Add or remove project members

Project members have permission to create, modify, or view the streams and applications within the project.

Prerequisites

The username to add to a project must exist as an SDP username. This task does not create user accounts.

About this task

NOTE: Never add admin users as project members. Admin users can always access all projects.

Steps

1. Log in to the SDP UI as an admin.

2. Go to Analytics > <project-name> > Members.

A table of existing project members appears, with a textbox for entering a new username in the table header.

3. To add a user to the project, type an existing SDP username in the Username textbox, and click Add Member.

NOTE: Do not add admin users as project members. They have full access to all projects.

The username appears in the refreshed table of members.

4. To remove a member, locate the member name in the table, and click Remove in that row.


List projects and view project contents

Administrators can view summary information about all projects. Other users can view information only about the projects of which they are members.

Steps

1. Click the Analytics icon.

The project table lists the projects that your user credentials give you permission to view.

2. Click a project name to drill into that project.

The page header includes identifying information about the project:
Creation Date
Description
Storage information:
  For ECS long-term storage, the system-generated bucket name for the project and the access credential are available.
  For PowerScale long-term storage, the project volume size is displayed.
  For more information about long-term storage configuration, go to System > Storage.

The remainder of the page is a dashboard showing:
Number of Flink clusters and applications that are defined in the project
Number of Spark applications defined in the project
Number of camera secrets, camera pipelines, and GStreamer Pipelines in the project
Number of application artifacts uploaded (for admins only)
Number of members that are assigned to the project
Number of streams in the project
Number of PSearch clusters defined, number of streams that are configured as searchable, and number of continuous queries registered

Under the icons, a Messages section shows Kubernetes event messages pertaining to the project, if any are available.

NOTE: Kubernetes events are short-lived. They only remain for one hour.
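Project members with kubectl access can also view these events directly on the command line; for example (the namespace equals the project name):

kubectl get events -n <project-name> --sort-by=.lastTimestamp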

3. To drill further into aspects of the project, click the tabs along the top of the dashboard:

Dashboard: Returns you to the dashboard of summary tiles.

Artifacts: Manage project artifacts. There are two types of artifacts:
  Maven: Upload, download, and delete artifacts in the project Maven repository.
  Files: Upload, download, and delete files directly.

Members: Manage project members. Administrators can view current members, add members, and delete members. This tab does not appear for nonadmin users.

Flink: View, create, and manage Flink clusters and Flink applications.

Spark: View, create, and manage Spark applications.

Pravega Search: View, create, and manage PSearch clusters. Make streams searchable. Register continuous queries. This screen also includes a link to the Pravega Search Kibana integration.

Video: View, create, and manage GStreamer Pipelines, Camera Recorder Pipelines, and Camera secrets, and manage video streams.

Features: View information about the features that are deployed in the project. The features were selected during project creation.

What's next with projects

After project creation, users have various interests and responsibilities towards projects, depending on their roles:
Administrators maintain the project's member list. A project team usually consists of developers and data analysts. Platform administrators have access to all projects by default.
Administrators should also monitor resources associated with application processing and stream storage. They may also monitor stream ingestion and scaling.
Developers typically create Flink clusters, Spark applications, Video applications, and Jupyter notebooks, upload their application artifacts, create the required streams associated with the application, and run and monitor applications.
Data analysts may run and monitor applications. They may also need to monitor or analyze metrics for the project.
Developers may create PSearch queries.
Administrators and project members may use the metrics in the project's Grafana UI for analysis and troubleshooting.

More information

The Dell Technologies Streaming Data Platform Developer's Guide describes how to add Flink and Spark applications to SDP, associate streams with applications, and start, stop, restart, and monitor applications. It also describes how to use the embedded Flink and Spark UIs.

Monitor Health on page 133 and Use Pravega Grafana Dashboards on page 139 describe administrative tasks for ensuring that adequate storage and processing resources are available to handle stream volume and analytic jobs.

Manage scopes and streams

This section defines Pravega scopes and streams and describes administrative tasks for managing them.

About scopes and streams

Scopes and streams are Pravega constructs that exist within the context of SDP projects.

Pravega streams: A Pravega stream is an unbounded stream of bytes or stream of events. Pravega writer REST APIs write the streaming data to the Pravega store. Before that can happen, the stream must be created and configured.

Administrators or project members can create and configure streams in the SDP UI. Applications may also create new streams or be associated with existing streams.

A stream must be created within a scope.


Pravega scopes: A Pravega scope provides a name for a collection of streams. The full name for a stream is scope-name/stream-name.

In SDP, each Analytic project has its own Pravega scope. The scope name is the same as the project name. SDP creates the scope for the project when a project is created, and the new scope appears in the list of scopes on the Pravega page of the UI.

The new scope is registered as a protected resource in Keycloak. Also, a project-wide service account is created with authorization to access this scope and all the streams under it.

Access to scopes and streams

Project members and applications running in a project have READ and UPDATE access to the streams in their project's scope.

The cross project scope sharing feature provides applications with READ access to scopes in other projects. In that way, applications in one project can read stream data from other projects. A project scope can be shared in a read only mode with many other projects.

Administrators manage cross project scope sharing on the UI, granting and removing access.

Pravega schema registry

The Pravega schema registry manages schemas, policies and codecs associated with Pravega streams. The registry supports schema groups that let you evolve schemas as streams are enhanced or changed over time. All schemas in the group are associated with the stream and Pravega can apply them as appropriate. For more about using the schema registry, see the Dell Technologies Streaming Data Platform Developer's Guide .

Create and manage streams

Use this procedure to define a new stream.

Steps

1. Log in to the SDP UI as admin or a project member.

2. Click the Pravega icon in the banner.

A list of scopes appears. Scope names are the same as project names.

3. Click a scope.

4. Click Create Stream.

5. Complete the configuration screen as described in Stream configuration attributes on page 125.

Red boxes indicate errors.

6. When there are no red boxes, click Save. The new stream appears in the Pravega Streams table. The new entry includes the following available actions:
Edit: Change the stream configuration.
Delete: Delete the stream from the scope (project).

Stream configuration attributes

The following tables describe stream configuration attributes, including segment scaling attributes and retention policy.

General

Name: Identifies the stream. The name must be unique within the scope and conform to Kubernetes naming conventions. The stream's identity is scopename/streamname.

Scope: The scope field is preset based on the scope you selected on the previous screen and cannot be changed.


Segment Scaling

A stream is divided into segments for processing efficiency. Segment scaling controls the number of segments used to process a stream.

There are two scaling types:
Dynamic: With Dynamic scaling, the system determines when to split and merge segments for optimal performance. Choose dynamic scaling if you expect the incoming data flow to vary significantly over time. This option lets the system automatically create additional segments when data flow increases and decrease the number of segments when the data flow slows down.
Static: In Static scaling, the number of segments is always the configured value. Choose static scaling if you expect a uniform incoming data flow.

You can edit any of the segment scaling attributes at any time. It takes some minutes for changes to affect segment processing. Scaling is based on recent averages over various time spans, with cool down periods built in.

Dynamic scaling attributes:

Trigger: Choose one of the following as the trigger for scaling action:
  Incoming Data Rate: Looks at incoming bytes to determine when segments need splitting or merging.
  Incoming Event Rate: Looks at incoming events to determine when segments need splitting or merging.

Minimum number of segments: The minimum number of segments to maintain for the stream.

Segment Target Rate: Sets a target processing rate for each segment in the stream.
  When the incoming rate for a segment consistently exceeds the specified target, the segment is considered hot, and it is split into multiple segments.
  When the incoming rate for a segment is consistently lower than the specified target, the segment is considered cold, and it is merged with its neighbor.
  Specify the rate as an integer. The unit of measure is determined by the trigger choice:
  KB/sec when Trigger is Incoming Data Rate. The default value in the UI is 5120 KB/sec. You can refine your target rate after performance testing.
  Events/sec when Trigger is Incoming Event Rate. Settings depend on the size of your events, calculated with the KB/sec guidelines above in mind.
  To determine an optimal segment target rate (either KB/sec or events/sec), consider the needs of the Pravega writer and reader applications:
  For writers, you can start with a setting and watch latency metrics to make adjustments.
  For readers, consider how fast an individual reader thread can process the events in a single stream. If individual readers are slow and you need many of them to work concurrently, you want enough segments so that each reader can own a segment. In this case, you need to lower the segment target rate, basing it on the reader rate, and not on the capability of Pravega. Be aware that the actual rate in a segment may exceed the target rate by 50% in the worst case.

Scaling Factor: Specifies how many colder segments to create when splitting a hot segment. The scaling factor should be 2 in nearly all cases. The only exception is when the event rate can increase 4 times or more in 10 minutes; in that case, a scaling factor of 4 might work better. Enter a value higher than 2 only after performance testing shows problems.

Static scaling attribute:

Number of segments: Sets the number of segments for the stream. The number of segments used for processing the stream does not change over time unless you edit this attribute. The value can be increased or decreased at any time. We recommend starting with 1 segment and increasing only when the segment write rate is too high.
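As a rough, hypothetical illustration of the reader-based sizing logic described above:

# Hypothetical numbers: each reader handles ~2000 events/sec; expected peak ingest is ~10000 events/sec.
PER_READER_RATE=2000
PEAK_RATE=10000
MIN_SEGMENTS=$(( (PEAK_RATE + PER_READER_RATE - 1) / PER_READER_RATE ))   # ceiling division = 5
echo "Target rate ~${PER_READER_RATE} events/sec per segment; expect at least ${MIN_SEGMENTS} segments at peak."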


Retention Policy

The toggle button at the beginning of the Retention Policy section turns retention policy On or Off. It is Off by default.

Off (Default): The system retains stream data indefinitely.
On: The system discards data from the stream automatically, based on either time or size.

Retention Time (attribute: Days): The number of days to retain data. Stream data older than Days is discarded.

Retention Size (attribute: MBytes): The number of MBytes to retain. The remainder at the older end of the stream is discarded.

Manage cross project scope sharing

Administrators can grant a project read access to scopes in other projects.

Steps

1. Log in to the SDP UI as admin.

2. Click the Pravega icon in the banner.

A list of scopes appears. Scope names are the same as project names.

3. Click the scope that needs read access added to (or removed from) other projects.

4. Click Cross Project Access.

5. To grant read access to other projects:

a. In Grant READ access on ..., choose one or more project names from the drop-down list.

Your selections appear in the text box.

b. Click Save.

6. To remove read access to one or all previously granted projects:

a. Navigate back to the Cross Project Access page. b. In Grant READ access on ..., click the X next to a project name to remove it from the list. c. Click Save.

Next steps

These administrative actions complete the setup for read only access for cross project scope sharing.

Project applications must be configured to talk to a particular stream or set of streams in a shared scope. For more information, see the Dell Technologies Streaming Data Platform Developer's Guide .


Start and stop stream ingestion

Stream ingestion is controlled by applications: native Pravega, Flink, Spark, or GStreamer applications.

The SDP UI creates and deletes scope and stream entities, and monitors various aspects of streams. The UI does not control stream ingestion.

Monitor stream ingestion

You can monitor performance of stream ingestion and storage statistics using the Pravega stream page in the SDP UI.

Steps

1. Log into the SDP UI as a project member or admin.

2. Go to Pravega > <scope-name> > <stream-name>.

This page shows:
Ingestion rates
General stream parameter settings
Segment heat charts, which show segments that are hotter or colder than the current trigger rate for segment splits. For streams with a fixed scaling policy, the colors on the heat chart can indicate ingestion rates: the redder the segment, the higher the ingestion rate. The heat chart provides visualization of the data flow based on the routing key.


Manage runtime images

Many components in SDP, such as Flink and Spark, require specific runtime images. SDP administrators can manage the available runtime images, including adding new images, deleting images, and editing attributes of existing images.

Topics:

The RuntimeImage resource
View runtime images
Create a runtime on the SDP UI
Create runtime on the command line

The RuntimeImage resource

The Kubernetes RuntimeImage resource represents runtime images in SDP. SDP administrators can manage RuntimeImage resources using the SDP UI or the Kubernetes command line.

View runtime images

Steps

1. On the SDP UI, click System > Runtimes. The screen shows all RuntimeImage resources that are defined in SDP.

2. On the command line, use kubectl commands on the RuntimeImages resource.

$ kubectl get RuntimeImages
NAME               DISPLAY NAME       VERSION   SDP REPO   IMAGE                              DESCRIPTION
flink-1.12.5       Flink 1.12.5       1.12.5    true       flink:1.12.5-2.12-1.2-14-57c8cc5   Flink 1.12.5 with Scala 2.12
flink-1.13.2       Flink 1.13.2       1.13.2    true       flink:1.13.2-2.12-1.2-14-57c8cc5   Flink 1.13.2 with Scala 2.12
gstreamer-1.18.4   GStreamer 1.18.4   1.18.4    true       gstreamer:1.3-W15-3-c8b8ac7        GStreamer 1.18.4
spark-2.4.8        Spark 2.4.8        2.4.8     true       spark:2.4.8-2.7.3-1.2-14-57c8cc5   Spark 2.4.8 with Python 3.7
spark-3.1.2        Spark 3.1.2        3.1.2     true       spark:3.1.2-3.2.0-1.2-14-57c8cc5   Spark 3.1.2 with Python 3.7

Create a runtime on the SDP UI

You can make a new runtime image available in SDP using the SDP UI.

Steps

1. Click System > Runtimes.

2. Click Create Runtime in the upper right of the screen.

3. Complete the screen using the following information, and click Save.

Field Description

Name Required. The resource name. This value is the name to use on the Kubernetes command line to access the runtime.

Type Required. Indicates the intention of this runtime resource. For example, SDP looks for RuntimeImages marked flink or spark when validating the image creating a cluster

Version Required. The action version of whatever the RuntimeImage represents. For example, in Flink this value is the Flink version that is used in application scheduling decisions.

Display Name Required. The name to show on dropdown lists in the SDP UI.

Description Optional. A description of the runtime.

Add Environment Variable Optional. The RuntimeImages resources can inject environment variables into target containers that use the {runtime: } image. Any environment variables that are specified are injected.

Add Property Optional. A RuntimeImage can contain properties. For example:

spec: ... ... properties: JVM_OPTIONS: "- Djdk.tls.client.protocols=TLSv1.3,TLSv1.2"

When it is injecting the RuntimeImage docker image, SDP looks for {property: } tags in container arguments and in existing environment variables. If the property value is found, SDP replaces the tag with the value from the RuntimeImage. If the property value is not found, SDP removes the tag.

Use properties to inject values from a RuntimeImage as arguments. JVM arguments are an example.

Specifying JVM Arguments

SDP uses a custom CA Certificate injection mechanism which requires the exact location of JAVA_HOME for Java-based images. For this reason, Java-based runtimes must contain the JAVA_HOME environment variable. The variable defines the JAVA_HOME on the underlying Docker image.

NOTE: Failure to specify JAVA_HOME can cause communication problems between a pod and Pravega.

Docker: Image Required. The image name in Docker.


Docker: SDP Registry Required. Determines where to look for the image. The default is true. True: Use the image that is in the SDP Docker Registry. False: Use the image from Docker Hub.
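For illustration, the following is a minimal sketch of how a property such as JVM_OPTIONS might be referenced from a container argument. The pod name and the RuntimeImage name (my-java-runtime) are hypothetical, and the sketch assumes that RuntimeImage defines the JVM_OPTIONS property shown above.

apiVersion: v1
kind: Pod
metadata:
  name: my-java-app
spec:
  containers:
  - name: app
    # SDP resolves this tag to the Docker image defined in the named RuntimeImage.
    image: "{runtime: my-java-runtime}"
    args:
    # SDP replaces this tag with the JVM_OPTIONS property value from the RuntimeImage.
    # If the property is not defined, SDP removes the tag.
    - "{property: JVM_OPTIONS}"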

Create runtime on the command line

Runtimes are expressed in SDP using the RuntimeImage resource. Use kubectl commands to create a RuntimeImage resource.

Following is an example definition of a RuntimeImage. See field descriptions in the previous section.

apiVersion: nautilus.dellemc.com/v1beta1
kind: RuntimeImage
metadata:
  name: test-java
spec:
  type: flink
  version: 1.10.1
  displayName: Flink 1.10.1 with Scala 2.11
  description: Flink 1.10.1
  docker:
    sdpRegistry: false
    image: openjdk:8-jre
  environment:
    FLINK_VERSION: 1.10.1
    API_VERSION: v1
    DEPLOYER_VERSION: v2
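To create the resource from a definition like this, save it to a file and apply it with kubectl, then confirm that it appears. The filename below is only an example.

$ kubectl apply -f runtime-test-java.yaml
$ kubectl get RuntimeImages test-java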

How Runtimes work

Any pod in SDP can use a RuntimeImage by specifying the special value of {runtime: } in the image section of a container pod. SDP intercepts all pod creation requests and replaces the tag with the correctly resolved Docker image URL from the specified RuntimeImage.

The docker section of the RuntimeImage is used to determine the target Docker image location.

If the docker.sdpRegistry flag is true, the specified image is resolved to the same Docker registry that SDP was installed from. This Docker registry address is appended to the Docker image field before it is replaced in the pod. For example, if the docker.image field specifies mything:v1.0 and the docker.sdpRegistry flag is true, the final image URL could be similar to harbor.dell.com/mything:v1.0.

If the docker.sdpRegistry flag is false or not specified, the docker.image field is used as is. For example, if the image field specifies openjdk:8-jre and the docker.sdpRegistry field is false, the final image in the pod container is: openjdk:8-jre. Kubernetes attempts to pull the image from Docker Hub.

Example

Consider the following RuntimeImage:

apiVersion: nautilus.dellemc.com/v1beta1
kind: RuntimeImage
metadata:
  name: java-app-1.2.1
spec:
  type: java-app
  version: 1.2.1
  displayName: Custom Java App 1.2.1
  description: Custom Java App
  docker:
    sdpRegistry: true
    image: openjdk:8-jre
  environment:
    JAVA_HOME: /usr/bin/java


Consider the following pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: myflink
spec:
  containers:
  - name: flink
    image: "{runtime: java-app-1.2.1}"

The final pod definition is:

apiVersion: v1
kind: Pod
metadata:
  name: myflink
spec:
  containers:
  - name: flink
    image: harbor.dell.com/openjdk:8-jre
    environment:
      JAVA_HOME: /usr/bin/java


Monitor Health

Topics:

Monitor licensing
Monitor and manage issues
Monitor and manage events
View detailed system-wide metrics
Run health-check
Monitor Pravega health
Monitor stream health
Monitor Apache Flink clusters and applications
Monitor Pravega Search resources and health
Logging

Monitor licensing

The SDP UI shows licensing status.

To view the status of your SDP licenses, log onto the UI and go to Settings > License.

NOTE: If you installed the product with an evaluation license, no licenses are listed.

The following table describes information on the License screen.

Section Field Name Description

Header Entitlement SWID The SDP product Software ID

Instance SWID The SupportAssist Activation Software ID

Body Name One type of license is tracked within the SDP product license: Streaming Platform Cores: Tracks the number of virtual CPUs dedicated to processing within the platform.

Type Licenses for SDP are subscription licenses.

Start Date Shows the date when the license was obtained.

End Date Shows the date when the subscription ends. On this date, you begin to receive warning events about an expired license. Contact Dell Technologies to renew a subscription.

Grace Period Shows the date when the grace period ends. On this date, you begin to see only critical events collected on the events screen. The product does not shut down. Dell Technologies contacts you about subscription renewal.

Quantity Shows the number of cores in your subscription

Usage This metric tracks usage on the cores in each category. If your usage rises above the threshold, you may be required to increase the number of cores in the subscription.

NOTE: The product does not shut down because of an expired subscription. However, if you upload an expired license or alter the license file, the signature is invalidated and your product is no longer licensed.

NOTE: Be careful when performing network transfers of the license file with tools such as FTP. To avoid any signature changes when using FTP, use the binary option.


Monitor and manage issues

The SDP UI displays issues and provides convenient features for managing them.

To see issues, log in to the UI and go to System > Issues.

The issues can be filtered by Severity, Component, Application, and Reason.

You can use the Search button to perform the search operation.

The Severity levels for Issues are:

Critical
Warning
Info
Normal
Error

NOTE: PRAVEGA-1008 is an issue that is raised when Pravega ingestion goes over the preset threshold that is determined by the license. The issue is auto-cleared when ingestion remains below the threshold for at least an hour.

Monitor and manage events

The SDP UI displays collected events and provides convenient features for managing events.

To show collected events, log in to the UI and go to System > Events. Events are messages that are collected from the Streaming Data Platform applications and their associated k8s resources.

The events can be filtered by Severity, Component, Application, and Reason.

You can use the Search button to perform the search operation.

The Severity levels for Events are:

Critical
Warning
Info
Normal
Error

View detailed system-wide metrics

Administrators can view detailed metrics about the entire system, the K8s nodes, all projects, all GPUs, and all Node Exporters.

Steps

1. Log on to the SDP UI as an administrator.

2. Click Dashboard > Monitoring Metrics. A Grafana homepage opens in a new tab.

3. On the Grafana page, click Home.

4. In the list of dashboard names, click the one you want to investigate.

The following dashboards are available:

Name Description

System Shows health and status metrics for the K8S cluster, Pravega, and SDP.

K8S Nodes Shows various usage and network I/O metrics for the K8S nodes. You can select to view metrics for all nodes or by host.

Projects Shows health and status metrics for the K8S cluster and for individual projects. You can select the project name to view.

GPUs Shows metrics for GPUs for the entire system, or by project and workload (Flink or Spark applications).

5. To return to the SDP UI, click the SDP tab in your browser.

Run health-check

This script checks the state of various components in the SDP cluster. It may be run at any time after SDP is installed.

Steps

1. Navigate to the folder where you unzipped the decks-installer- .zip file.

2. Run the script.

$ ./scripts/health-check.py


The output looks similar to the following:

Starting health check...
- Checking pod health
- Checking pod health for namespace : nautilus-system
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pod health for namespace : longevity-0
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pod health for namespace : catalog
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pod health for namespace : nautilus-pravega
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pravega cluster health
- Pravega-cluster state is healthy
- Check for failed helm deployments
- No failed helm deployments were detected
- Check Tier2
- Tier2 check succeeded

Monitor Pravega health

Dashboards in the SDP UI and predefined dashboards in the Pravega Grafana UI provide information about Pravega operations.

Monitor for the following issues concerning Pravega:

Network issues Slow or stopped throughput without an obvious reason might indicate a network issue. You can monitor throughput on these dashboards:

The first chart on the Dashboard screen on the SDP UI
The Pravega Operation Dashboard on the Grafana UI

Adequate memory

The Pravega System Dashboard on the Grafana UI shows various memory-related metrics.

NOTE: Pravega InfluxDB only keeps data for a month. Data older than one month is not available.

Pravega Alerts The Pravega Alerts Dashboard shows the metrics on any Exceptions and Warnings logged by Pravega, and generates Grafana alerts if Pravega components get into certain states. When any of the alerts are raised, they are displayed in the SDP UI on the Events page. The licensing alert (Pravega ingestion surpassing the preset threshold) is displayed on the Issues page of the SDP UI.

Monitor stream health

Monitor streams for the following issues.

Hot streams Segment heat charts show segments that are hotter or colder than the current trigger rate for segment splits.

For streams with a fixed scaling policy, the colors on the heat chart can indicate ingestion rates. Red segments indicate high ingestion rates.

To view the segment heat chart, on the SDP UI, click Pravega > scope-name > stream-name and then View Stream.

Unusual activity Monitor streams for unusual fluctuations on the following Grafana dashboards:


The Pravega Stream Dashboard shows stream-specific metrics at a higher level than the heat charts.

The Pravega Scope Dashboard lets you compare streams in the same scope.

Pravega storage View metrics that identify problems with Pravega interacting with storage on the Grafana UI:

Pravega Alerts Dashboard
Pravega Operation Dashboard

Monitor Apache Flink clusters and applications

Monitor the status and details of Apache Flink applications running in SDP.

Application events

To monitor applications, in the SDP UI, click Analytics > project-name > Dashboard. Events for all applications in the project appear in the Message section below the dashboard tiles. Events include scheduled, started, savepointed, stopped, and canceled.

Job status Details about running Flink applications are available on the Apache Flink Web UI. SDP contains direct links to the Apache Flink Web UI in two locations:

Click Analytics > project-name > Flink Clusters > cluster-name. The cluster name is a link to the Apache Flink Web UI, which opens in a new browser tab. It displays the Overview screen for the Flink cluster you clicked. From here, you can drill into status for all jobs and tasks.

Figure 17. Apache Flink Web UI

Analytics > project-name > Apps > Flink > application-name. Each application name is a link to a Flink Web UI page that shows the running Flink Jobs in that application. These pages also appear in a new browser tab.

Flink cluster health

For projects with Metrics enabled, you can monitor Flink cluster health with the help of Grafana dashboards available in the project metrics stack.

1. In the SDP UI, click Analytics > project-name > Metrics.
2. On the Dashboard that appears, click Home. Then click the Flink cluster name that you want to investigate.

There is a dashboard for each Flink cluster, each Spark application, and each Pravega Search cluster.


Monitor Pravega Search resources and health

Monitor the resource availability and other health metrics of a Pravega Search cluster.

Efficiency and resources

See the Dell Technologies Streaming Data Platform Developer's Guide for details about checking on the health of the cluster, allocating resources for efficiency, and scaling the cluster.

Pravega Search cluster health

For projects with Metrics enabled, you can monitor Pravega Search cluster health with the help of Grafana dashboards available in the project metrics stack.

1. In the SDP UI, click Analytics > project-name > Pravega Search.
2. In the Pravega Search tab, click Search Metrics.
3. On the Dashboard that appears, click the dashboard for the Pravega Search cluster.

Logging

Administrators can access platform logs.

Pravega Errors and warnings that are logged by Pravega are reported as metrics and are available on the Pravega Alerts Dashboard in the Grafana UI.

Kubernetes logs SDP generates all the standard logs in native Kubernetes. Users with cluster-admin role on the SDP cluster can access these logs using native Kubernetes commands.


Use Pravega Grafana Dashboards

Topics:

Grafana dashboards overview
Connect to the Pravega Grafana UI
Retention policy and time range
Pravega Alerts dashboard
Pravega Controller Dashboard
Pravega Operation Dashboard
Pravega Scope dashboard
Pravega Segment Store Dashboard
Pravega Stream dashboard
Pravega System dashboard
Custom queries and dashboards
InfluxDB Data

Grafana dashboards overview

The Grafana dashboards show metrics about the operation and efficiency of Pravega.

The Streaming Data Platform installer deploys the following metrics stack in the same Kubernetes namespace (nautilus-pravega) with Pravega.

InfluxDB is an open-source database product. Grafana is an open-source metrics visualization tool.

The InfluxDB instance that is deployed in SDP contains a preconfigured pravega database. The database is defined with four retention policies and a set of continuous queries to move aggregated data from shorter retention policies to the longer ones.

Pravega reports metrics automatically and continuously into InfluxDB. SDP adds processes to continuously aggregate and delete the metrics according to the defined retention policies. The result is a self-managing database.
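To confirm that the metrics stack is deployed, one quick check is to list the pods in that namespace and look for the InfluxDB and Grafana pods. This is a general-purpose check; the exact pod names vary by release.

$ kubectl get pods -n nautilus-pravega | grep -Ei 'influxdb|grafana'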

The predefined dashboards are:

Dashboard Description

Pravega Alerts Dashboard Monitors the health of Pravega in the cluster

Pravega Controller Dashboard Shows operational metrics for the Pravega Controller

Pravega Operation Dashboard Shows various operational latencies and read/write throughput

Pravega Scope Dashboard Shows scope total throughput rates, throughput by stream, and maximum per segment rates

Pravega Segment Store Dashboard Shows segment store operational metrics

Pravega Stream Dashboard Shows stream-specific throughput, segment metrics, and transaction metrics

Pravega System Dashboard Shows details about heap and non-heap memory, buffer memory, garbage collection memory, and threads

You may create additional customized dashboards using any of the metrics that are stored in InfluxDB.

Some of the Pravega metrics are shown in the SDP UI, on the main Dashboard and the Pravega Stream pages. Administrators can inspect the reported data in more detail on the Grafana dashboards. Administrators can identify developing storage and memory problems by monitoring the dashboards. The dashboards also help identify stream-related inefficiencies, and provide a way to drill into problems.


The dashboards are available only to users with admin role.

Connect to the Pravega Grafana UI

The Grafana dashboards are available to SDP users with admin role.

Steps

1. Choose one of the following ways to access the Grafana dashboards:

If you are already logged on to the SDP UI as an admin, click Pravega Metrics.

NOTE: The link appears only for admin users.

Use the Grafana endpoint URL in your browser. See Obtain connection URLs on page 96. On the login screen that appears, enter your SDP admin credentials.

The Grafana UI appears.

2. In the Tools strip on the left, click Dashboards > Manage. A list of predefined dashboards appears. The dashboard names are links.

3. Click a dashboard name to display that dashboard.

4. Most dashboards have controls, in the form of drop-down menus, that let you fine-tune the data to display.

For example, some dashboards have a Retention control that lets you choose the retention policy from which to pull the data.

Retention policy and time range

On the Pravega dashboards, the time range and retention policy settings work together to define the data that is displayed.

Time range

The time range control is a standard Grafana feature. In any dashboard banner, click the clock icon on the right side of the banner to display the time range choices. Click a range to select it, or define your own absolute range.


Figure 18. Time range on Grafana dashboards

Retention

The retention control is specific to SDP. It selects the aggregation level of the data to display. The following table shows the internally defined retention policies and associated aggregation levels.

Retention policy Aggregation level Description

two_hour Original metrics reported by Pravega every 10 seconds. The original 10-second metrics are deleted after 2 hours. Use with time ranges that are between 10 seconds and 2 hours. If you want to examine metrics older than 2 hours, use one of the other retention choices.

one_day 1-minute periods, aggregated from the 10-second metrics. The 1-minute aggregated metrics are deleted after 1 day. Use with time ranges that are between 1 minute and 1 day.

one_week 30-minute periods, aggregated from the 1-minute metrics. The 30-minute metrics are deleted after 1 week. Use with time ranges that are between 30 minutes and 1 week.

one_month 3-hour periods, aggregated from the 30-minute metrics. The 3-hour aggregated metrics are deleted after 1 month. Use with time ranges that are between 3 hours and 1 month.

Interactions between time range and retention

Some time range and retention combinations may not show any data. If the time range specified is less than the aggregation period in the retention choice, the combination results in no data. As examples:

The two_hour retention choice shows data that exists in the database for a maximum of two hours. A time range of Last 12 hours can only show data for the last two hours.

The one_week retention choice shows data in 30-minute periods. A time range of Last 5 minutes does not show any data. Any range of 30 minutes or less will not show any data. A time range of Last month can only show data for the last week.

The one_month retention choice shows data in 3-hour periods. A time range of Last hour does not show any data. Any range of 3 hours or less does not show any data. A time range of Last year can only show data for the last month.

Pravega Alerts dashboard

The Pravega Alerts Dashboard reports critical and warning conditions from various checks on Pravega health.

Controls

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

NOTE: The retention control applies only to the Pravega Errors and Warnings charts. The Critical Alerts and Warning Alerts charts always use the two_hours retention policy.

retention aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 140 for more information.

Description

This dashboard has three sections that you can expand and contract with drop-down arrows. The Pravega Logged Errors and Warnings section shows summary metrics about numbers of logged errors and warnings. The Critical Alerts section shows counts for specific critical messages that were issued over the time range. The Warning Alerts section shows counts for specific warnings that were issued over the time range.


Obtain condition descriptions for the alert charts

In the Critical Alerts and Warning Alerts sections, each chart includes an information icon in its upper left corner. Hover your cursor over the icon to view a description of the condition that the chart is monitoring.

Pravega Controller Dashboard

Use this dashboard to view metrics about the Pravega Controller.

Controls

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 140 for more information.

Description

This dashboard has four sections that you can expand and contract with drop-down arrows.

The Controller Transaction Operations section shows counts for created transactions, commits, cancels, and open transactions.

The Controller Stream Operations section shows metrics for stream creation, stream seals, stream deletes, stream truncation, and stream retention.

The Controller Scope Operations section shows metrics for create scope count, create scope latency, delete scope count, and delete scope latency.


The Controller Container Operations section shows metrics for segment containers within segment stores, including segment store instance failures.

Pravega Operation Dashboard

The Pravega Operation Dashboard shows the operational metrics for multiple components that are involved in Pravega operations, including segment stores and the Pravega Bookkeeper client.

Controls

host Choose a specific segment store or choose All.

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 140 for more information.

Description

This dashboard has six sections that you can expand and contract with drop-down arrows.

The Current Latency Stats section shows the current values of different levels of Read/Write latencies. These values are color-coded and turn red if their value goes above 50 milliseconds.

NOTE: Monitoring for red values can help you catch problems.

The Throughput Rates section shows the total throughput rates for Tier 1 (Bookkeepers) and long-term storage. For more information about Pravega tiered storage, see the Pravega documentation. This section includes both the user-created streams and the system streams needed for Pravega operation.

The Segmentstore - Segment Latencies section reports Tier 1 read/write latencies.

The latency graphs show percentile groups, as follows:


Legend indicator Percentile

p0.1 10th percentile

p0.5 50th percentile

p0.9 90th percentile

p0.99 99th percentile

p0.999 99.9th percentile

p0.9999 99.99th percentile

The Segmentstore - Storage Latencies section shows Read/Write latencies about long-term storage.

NOTE: Monitoring these metrics can provide hints about communication problems with long-term storage.

The Segmentstore - Container Latencies section shows metrics for Pravega segment container entities (not to be confused with the Docker containers running the segment stores). The following metrics are included:

Container Processors in Flight Distribution
Container Operation Queue Size Distribution
Container Batch Size Distribution
Container Operation Commit Count Distribution
Container Operation Processor Delay Latency (grouped by Throttler)
Container Queue Wait Time Latency
Container Operation Commit Latency
Container Operation Commit Memory Latency
Container Operation Latency

The Segmentstore - Bookkeeper section contains Bookkeeper client metrics. The native Bookkeeper metrics are not available here.

Pravega Scope dashboard

The Pravega Scope dashboard shows the total throughput rates and maximum per segment rates for user streams in a Pravega scope.

Controls

scope Choose the scope name that you want to see metrics for.

stream type Choose to show metrics for system streams, user-defined streams, or all streams.

retention The retention choice defines the aggregation level of the displayed data. The default retention is two_hours. It shows data in 10-second intervals.

Also choose a compatible time range.

See Retention policy and time range on page 140 for more information.

Description

This dashboard has three sections that you can expand and contract using the drop-down arrows.

Write bytes
Read bytes
Write events

All three sections are organized in a similar way.

The panels on the left show individual throughput rates for each stream in the scope, plus a total for the scope.

NOTE: These charts show which streams have high load and which ones do not have any load.


The panels on the right show the write or read rate for the segment with the highest rate within the scope.

NOTE: If you see something alarming at the scope level, you can drill down into the problem on the Pravega Stream dashboard.

Pravega Segment Store Dashboard

Use this dashboard to view segment store activity.

Controls

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 140 for more information.

Description

This dashboard has the following sections that you can expand and contract with drop-down arrows.

The Segment Store Cache Metrics section provides insight into space usage in the segment store cache.
The Segment Store Storage Writer Metrics section shows the activity of segment stores moving data to long-term storage.
The Segment Store Table Segments Metrics section shows the performance and rate of requests that are related to table segments. (The Controller uses table segments to store Pravega metadata.)
The Segment Store Container Metrics section shows segment activity, such as segment counts, segment creation, deletion, and merges, and log file size.
The Segment Store SLTS Metrics section.


Pravega Stream dashboard

The Pravega Stream dashboard shows details about specific streams.

Controls

stream Choose a stream name within the selected scope. When the scope selection changes, the stream dropdown menu is repopulated with appropriate stream names.

stream type Choose to show metrics for system streams, user-defined streams, or all streams.

scope Choose a scope name.

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 140 for more information.

Description

This dashboard contains a row of metrics followed by five sections that you can expand and contract using the drop-down arrows.

The row of metrics at the top shows the latest values available for this stream in the chosen retention policy. For example, if you choose one_month retention, the values can be as old as three hours ago because the data points are aggregated only every three hours for that retention policy.

The Segments section shows the number of segments, segment splits, and segment merges over time.

NOTE: The Pravega controller reports these metrics. When no changes are happening, the controller does not report metrics, and this could be reflected in the charts if there are no metrics reported during the time period selected. You can always view the current metrics on the Stream page in the SDP UI. Those metrics are collected using the REST API rather than relying on reported metrics from the controller. Another advantage of the SDP UI's Stream page is the heat charts for stream segments. Those are not available in Grafana.

The following three sections appear next.

Write Bytes
Read Bytes
Write Events


These sections are all organized in the same way. The panels on the left show totals for the stream. The panels on the right show maximum segment rates.

NOTE: Inspecting the maximum per segment rate is complementary to using the heat charts in the SDP UI.

The Transactions section appears last. This section contains data only if the stream performs transactional writes.

NOTE: In the left panel, monitor the number of aborted transactions. Too many aborted transactions could indicate a networking problem or a problem in the business logic of the Flink or Pravega application.

Pravega System dashboard

The Pravega System Dashboard shows the JVM metrics for Pravega controllers and segment stores, one host container at a time.

Controls

host Choose the reporting container.

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 140 for more information.

Description

This dashboard has three sections that you can expand and contract using the drop-down arrows.

The Totals section shows the memory usage by the host JVM for heap and non-heap areas.

NOTE: Watch for Used or Committed memory approaching the Max memory. If this happens, you might need to tweak the Pravega deployment parameters. Either increase the memory per container or increase the number of the component replicas, as your K8s environment permits.


The GC section of the dashboard shows garbage collector metrics.

The Threads section shows thread counts and states.


Custom queries and dashboards

You can create custom queries or new dashboards using any data in InfluxDB.

Queries

You can explore the Pravega metrics available in InfluxDB by creating ad hoc queries. This feature gives you a quick look at metrics without having to define an entire dashboard.

Click the Explore icon on the left panel of the Grafana UI. For datasource, choose pravega-influxdb.

Create your query against any measurement available in the database.

You cannot save the queries. Create a custom dashboard to save queries.

Custom Dashboards

You may create new, custom dashboards from any data available in the pravega-influxdb datasource. See the next section for an introduction to the metrics structure.

If you want to customize the predefined dashboards, Dell Technologies strongly recommends that you save the changes as custom dashboards, rather than overwriting the original ones. You are logged in as a Grafana Editor, which enables you to edit and overwrite the dashboards.

NOTE: If you overwrite the original dashboards, your changes are lost if the Pravega Dashboards are updated in a subsequent SDP release.

InfluxDB Data

This section provides an overview of the metrics that are stored in InfluxDB.

Pravega metrics

Pravega metrics are stored in InfluxDB according to the naming conventions described in the MetricsNames.java file, with underscores (_) replacing the periods (.). For example, segmentstore.segment.write_bytes is stored as segmentstore_segment_write_bytes. All metrics are tagged with their host, which is the Pravega pod reporting the metric. Some of the metrics are tagged with scope, stream, segment, or container (if applicable).

The original metrics from Pravega are prefixed with pravega_. Most of the metrics on the Grafana dashboards do not have that prefix because they represent an aggregation over the original Pravega metrics. For example, typical metrics in the dashboards are rates that are calculated on the originally reported counts.

For more information about Pravega metrics, see Pravega documentation.


Calculated rates

In addition to the original Pravega metrics, the database contains some precalculated rates to enable faster InfluxDB queries for certain inquiries.

Segment Read/Write rates are tagged with scope, stream, and segment. They are stored in the following measurements with the Rate field:

segmentstore_segment_read_bytes
segmentstore_segment_write_bytes
segmentstore_segment_write_events

Stream-level Read/Write rate aggregates are tagged with scope and stream and stored in the following:

segmentstore_stream_read_bytes
segmentstore_stream_write_bytes
segmentstore_stream_write_events

Global Read/Write rate aggregates over all segments, streams, and scopes are tagged with the segmentstore instance in the host tag. They are stored in the following:

segmentstore_global_read_bytes
segmentstore_global_write_bytes
segmentstore_global_write_events

Pravega long-term storage Read/Write rates are available as storage rates:

segmentstore_storage_read_bytes
segmentstore_storage_write_bytes

Bookkeeper client write rate is stored here:

segmentstore_bookkeeper_write_bytes

Transactional rates are available at the stream level. They are tagged with scope and stream. They are reported only if transactional writes are happening on the stream.

controller_transactions_aborted
controller_transactions_created
controller_transactions_committed

There are also two gauges for transactions:

controller_transactions_opened
controller_transactions_timedout


Expand and Scale the Infrastructure

Topics:

Difference between expansion and scaling
Expansion
Scaling

Difference between expansion and scaling

As your SDP accommodates more projects, more streams, and more analytic applications, it may require expansion and scaling.

NOTE: Dell Technologies recommends that you engage Dell Technologies Customer Support before using these instructions. The support team can clearly determine the bottlenecks in your current setup and your expansion needs.

Expansion Expansion means to add resources to the underlying infrastructure. For SDP, expansion activities are:

1. Determine expansion needs.
2. Add capacity by adding new hosts (or racks) to the underlying cluster of hosts.
3. Add the new hosts to the supporting distributed switches.
4. You may also add capacity vertically by adding more disks if slots are available.

Scaling Scaling means to configure the Kubernetes cluster and SDP components to take advantage of the new resources. Scaling tasks are:

1. Determine scaling recommendations.
2. Scale the Kubernetes cluster.
3. Scale components in SDP.

Expansion

The following sections describe how to determine SDP expansion needs and how to perform the expansion on an existing platform.

Determine expansion requirements

If the SDP infrastructure needs expanding with additional resources, contact Dell Technologies technical support to discuss specific requirements for your use cases.

Some indicators that the underlying infrastructure may need additional resources are:

At the pod level: All the pods may have high utilization (CPU, memory, or disk) and scaling up replicas fails.
At the host level: Hosts show high utilization when you compare current and past utilizations.

The OpenShift cluster administrator should engage with Dell Technologies technical support to determine the capacity to add. The technical support team uses current usage reports from SDP to analyze resource usage and make recommendations.


Add new rack

To add a rack, contact the Dell Technologies support team for guidance. The support team can help determine sizing and configuration requirements.

Add nodes to the OpenShift cluster

You can adjust the number of worker machines in your OpenShift Container Platform cluster. You scale the worker machines by increasing the number of replicas that are defined in the worker machine set.

About this task

To add nodes to a Kubespray cluster, see Manage SDP Edge and SDP Micro on page 62.

Steps

1. Engage with the Dell Technologies support team for guidance. The support team can help determine sizing and configuration requirements.

2. See https://docs.openshift.com/container-platform/4.6/scalability_and_performance/recommended-cluster-scaling-practices.html.
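As a rough sketch of what increasing the worker machine set replicas can look like with the OpenShift CLI, the commands below list the machine sets and scale one of them. The machine set name and replica count are examples only; follow the sizing guidance from Dell Technologies support and the OpenShift documentation above.

# List the worker machine sets and their current replica counts (cluster-admin access required).
$ oc get machinesets -n openshift-machine-api

# Increase the replica count on the chosen machine set; name and count are illustrative.
$ oc scale machineset sdp-worker-0 --replicas=6 -n openshift-machine-api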

Add supporting storage

You can expand the capacity of the storage that SDP uses for project artifacts by expanding the OpenShift cluster.

Steps

See https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal_ipi/ipi-install-expanding-the-cluster.html.

Scaling

The following sections describe how to determine appropriate scaling values and how to perform the scaling tasks.

Get scaling recommendations

The provisioner.py script makes scaling recommendations for the OpenShift cluster and the SDP internal components. You provide information about newly added hosts, and the script outputs appropriate scaling recommendations.

Prerequisites

Expand the underlying infrastructure with additional hosts before using this procedure.

Steps

1. Go to the folder where you extracted the decks-installer- .zip file.

2. Run the provisioner.py script.

NOTE: The provisioner.py script must run from inside the scripts directory.

Change directories to the scripts directory and then run the script with the --help option to see all arguments.

$ cd scripts
$ python3 provisioner.py --help


Run the script with only the --outfile argument, and receive prompts for all the other arguments. The log file saves the prompts and answers.

$ cd scripts
$ python3 provisioner.py --outfile mydir/provisioner

Run the script with all or some arguments specified. The --outfile argument is always required.

You receive prompts only for the missing arguments. Skip the input prompts completely by supplying all arguments. Skipping prompts is useful for automating the script.

Option Description

--num-hosts-present n Number of hosts initially in the OC cluster.

--num-hosts-added n Number of hosts added.

--num-host-failures n Number of host failures to tolerate among newly added hosts. 0 is typical and acceptable. During the initial installation, some failures to tolerate were already considered. If you are doubling the size of the cluster, then 1 is recommended.

--num-physical-cpu n Number of physical CPU cores per each host. This value depends on the underlying infrastructure deployment plan used.

--mem-gb n Memory in GB per each host.

--local-disk-count n Number of local disks used for Bookkeeper per host.

--percent-analytics n Provide the percentage of added resources to allocate to the analytics engine. The system assigns the remainder of added resources to Pravega. Enter a number from 0 to 100.

--outfile pathname Required. Provide the pathname for the output file from this command.

3. Save the output for later use with the scaling.py script.

The output is similar to the following:

bookkeeper: 1
controller: 1
failures_to_tolerate: 0
metadata_heavy_workload: false
segment_store: 1
segment_store_cache_max_size: 28991029248
segment_store_jvm_options_direct_memory: 28
segment_store_jvm_options_mx: 4
vm_cpus: 8
vm_local_drives: 0
vm_ram_gb: 32
vms: 3
worker_vm_count_for_applications: 3
worker_vm_count_for_pravega: 3
zookeeper: 1

Scale the K8s cluster

You can scale the cluster by changing the number of worker nodes in the cluster.

About this task

First determine the optimal number of worker nodes to configure and then resize.

Steps

1. Calculate the new number of worker nodes to configure.


The new number of worker nodes is the current number plus the recommended increases. The increases are for Pravega and for applications as reported in the output from the provisioner.py script.

new number of worker nodes = existing + worker_vm_count_for_applications + worker_vm_count_for_pravega
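For example (using illustrative numbers only), if the cluster currently has 6 worker nodes and the provisioner output recommends worker_vm_count_for_applications: 3 and worker_vm_count_for_pravega: 3, the new number of worker nodes is 6 + 3 + 3 = 12.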

NOTE: For baremetal deployments, read worker_vm_count as worker_count (worker nodes in Kubernetes).

2. Run the following command to change the number of worker nodes in the Kubernetes cluster:

oc resize <cluster-name> --num-nodes

Where cluster-name is the SDP Kubernetes cluster name, and the number of nodes is the value calculated in the previous step.

Scale SDP

Run the scale.py script to scale internal components in SDP.

About this task

This script uses the following as input:

The sizing recommendations file that the provisioner.py script generated.
The values.yaml files containing the current configuration settings.

The script generates a file with adjusted values for some configurations. Dell Technologies recommends that you always run the script twice:

The first time, use the --dry-run option. This option generates a file of proposed changes but does not apply them. You can review the changes.

The next time, omit the --dry-run option. In addition to generating a file of proposed changes, the script creates a job on the cluster to track scaling. If scaling is blocked, it sends alerts.

Using the output file from this script, the decks-install apply command performs the scaling work.

Steps

1. Go to the folder where you extracted the decks-installer- .zip file.

2. Run the script with the --help option to view all options.

$ python3 ./scripts/scale.py --help

3. Run the scale.py script with the --dry-run option.

$ python3 ./scripts/scale.py --dry-run -p -i <yamlfile1,yamlfile2,...

Where:

The -p value is the sizing output file from the provisioner.py script.

The -i values are the configuration values filenames that were used during platform installation. Provide all the filenames. Separate filenames with commas and do not include spaces.

4. Review the script output.

The output is a summary of resources that would occur after scaling. If the output is not as you expected, rerun the provisioner.py script with different values.

5. When the dry-run output is acceptable, rerun the scale.py script, this time omitting the --dry-run option.

$ python3 ./scripts/scale.py -p -i <yamlfile1,yamlfile2,...


In addition to the resource summary, the scale.py script without the --dry-run option produces the following output:

The resource summary as in the dry run A yaml file that contains the scaling changes to apply to your configuration.

On-screen information about the scaling changes and the location of the generated yaml file.

INFO - Current configuration, zookeeper: 3, bookkeeper: 6, segmentstore: 4, controller: 4
INFO - Proposed configuration, zookeeper: 3, bookkeeper: 7, segmentstore: 5, controller: 5
INFO - {'zookeeper_status': 'None', 'bookkeeper_status': 'None', 'segment_store_status': 'None', 'controller_status': 'None'}
INFO - Run decks-install with file /Users/mydir/Downloads/decks-installer-1/scaling_values_04-19-2020_06-17-24_PM.yaml added at the end of your values file list

6. Run decks-install apply, adding the generated file from the last step onto the end of your standard list of yaml filenames.

For example:

./decks-install-darwin-amd64 apply -k ./manifests/ --repo ./charts/ -f ./my-values1.yaml,./my-values2.yaml,./my-values3.yaml,/Users/mydir/Downloads/decks-installer-1/scaling_values_04-19-2020_06-17-24_PM.yaml

This command applies the configurations that trigger scaling actions.

7. Monitor the configuration changes on the UI.

The changes take time. Give Kubernetes time to cycle through the synchronizations and settle. To monitor the changes, go to Settings > Pravega cluster.

Scale Apache Flink resources

You can scale the resources available for processing Apache Flink jobs.

Prerequisites

This procedure assumes the following:

The underlying infrastructure was expanded by adding additional hosts.
The underlying SDP cluster was sized and scaled up, creating additional worker nodes.

About this task

Apache Flink supports changing the parallelism of a streaming job. It provides this support by restoring the job from a savepoint using a different parallelism. It supports changing the parallelism for the entire job and the operator parallelism.

You can change any of the following attributes. You can change these attributes while jobs are running.

The number of Task Managers (replica count) for a Flink cluster
The default parallelism for a Flink application
The parallelism specification for a Flink application


Steps

1. To change the number of Task Managers (replica count) for a Flink cluster:

a. Log in to the SDP UI as an admin or project member.
b. Click Flink Clusters > Flink > Clusters.
c. Locate the cluster name and click Edit in the Action column.
d. In the Task Managers section, change the Number of Replicas.
e. Click Save.

The Flink operator scales the Task Managers to the requested number.

2. To update the default parallelism for an application:

NOTE: Scaling applications interrupts service.

a. Log in to the SDP UI as an admin or project member.
b. Click Apps > Flink > Apps.
c. Locate the application name and click Edit in the Action column.
d. Click Properties.
e. In the Configuration section, change the Parallelism field.
f. Click Save.

NOTE: An application does not necessarily use the changed default parallelism. Usage depends completely on how the user application is developed. A suggestion is that the application accepts a parameter that defines the parallelism for a particular step. Changing the parameter would force the application to redeploy.

After changing any properties, Flink automatically does the following:

Stops affected applications. If required, uses a Savepoint.
Redeploys the affected applications using the new values. If a Savepoint was used, redeploys from the Savepoint.

3. To update the parallelism defined in an application specification (an uploaded artifact):

a. Edit the application artifact.
b. Log in to the SDP UI as an admin or project member.
c. Click Analytics > project-name > Apps.
d. Locate the application name and click Edit in the Action column.
e. Click Properties.
f. In the Configuration section, upload the updated specification.

After uploading the new specification, Flink automatically does the following:

Stops affected applications. If required, uses a Savepoint.
Redeploys the affected applications, using the new specification. If a Savepoint was used, redeploys from the Savepoint.

Impact of cluster expansion and scaling

Scaling Pravega

Scaling Pravega segment stores results in a rebalance of segment containers. This rebalance moves some segment containers to segment stores on new hosts.

The impact on ingest is limited to the client timeout value. The timeout occurs only for connections to segment stores that are handling a stream whose containers were moved to a new segment store. The retry after a timeout succeeds, because the client would reconnect to the new segment store.

Scaling Analytics

Scaling Pravega stores has some impact on readers. For example, cache on a new segment store does not have any entries from the previous segment store. Tail readers become historical readers (for a short time).

To leverage newly added resources, you may change the number of replicas in a Flink cluster or the application parallelism. These updates cause applications running in the affected Flink cluster to restart.


Troubleshooting

Topics:

View versions of system components
Troubleshooting tools
Access the troubleshooting tools
Kubernetes resources
Log files
Useful troubleshooting commands
FAQs
Application connections when TLS is enabled
Online and remote support

View versions of system components

The SDP UI shows version numbers for the installed software components in the platform.

About this task

For troubleshooting and maintenance tasks, it is useful to know the SDP version and versions of software components.

Steps

1. Log in to the SDP UI as an admin.

2. To view the SDP version, click System > Product.

3. To view versions of components, click System > Components.

The Version column shows the installed version for each service, broker, and other software components.

Troubleshooting tools

sdp-debug The sdp-debug command is a diagnostic tool that helps to uncover issues on the SDP cluster. The tool runs a list of available checks and reports detected anomalies.

Use sdp-debug --help for usage information.

sdp-support The sdp-support command is a data collection tool. It creates a support bundle that includes information about the SDP cluster state, configuration, logs, and heap dumps. Redaction rules are applied to the collected data. Then the data is archived to sdp-support- .zip. The support bundle can assist in troubleshooting an issue.

Use sdp-support --help for usage information.

Other tools Other tools and scripts that are available inside the service container include:

kubectl: Kubernetes command-line tool that can be used to inspect and edit the cluster resources.
helm: Helm command-line tool for managing Helm releases.

See Useful troubleshooting commands on page 163 for some help in getting started with kubectl and helm commands.


Access the troubleshooting tools

Use any of the following methods to run troubleshooting tools.

Steps

1. From outside the cluster:

a. ssh to the remote access service:

SERVICE_POD_IP=$(kubectl get svc -l=app=streamingdata-service-pod -n=nautilus-system -o=jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

ssh svcuser@$SERVICE_POD_IP

b. Respond to the prompt for a password based on service-pod configuration.

2. Start a service container in your local Docker using the sdp-tools.sh tool.

a. Locate sdp-tools.sh in the script directory inside the decks-installer bundle:

./scripts/sdp-tools.sh -r

NOTE: The docker registry must contain the remote access image that is shipped with the distribution.

b. An alternate way to obtain the sdp-tools.sh script is to load the image from sdp- -decks_kahm-images.tar.gz.

./scripts/sdp-tools.sh -i

c. After the image is pulled or loaded successfully to your local Docker, run ./scripts/sdp-tools.sh.

Kubernetes resources

This section describes the Kubernetes resources in an SDP cluster.

Namespaces

The SDP cluster contains the following namespaces.

catalog Contains the service catalog for the cluster.

cluster-monitoring Contains services that monitor Kubernetes cluster components and SDP nodes. Sends alerts through the KAHM monitoring service to Grafana dashboards.

nautilus-system Contains SDP software.

nautilus-pravega Contains the Pravega store and Pravega software.

Project-specific namespaces

Each user-created project has its own namespace. The namespace name is the project name.

Kubernetes-specific namespaces There are many other namespaces specific to the Kubernetes environment.

nautilus-system-operators Contains the SDP operator.

logging Contains the logging component that saves the logs from SDP resources to persistent storage.
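To see these namespaces on a running cluster, you can list them with kubectl; project-specific namespaces vary by deployment.

$ kubectl get namespaces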


Components in the nautilus-system namespace

The nautilus-system namespace contains components to support SDP functions.

Components in nautilus-system

Subsystem Name Description

Core SDP Operator

Cert Manager Provisions and manages TLS certificates.

External DNS Dynamically registers DNS names for platform services and ingress connections.

Metrics Operator Manages InfluxDB and Grafana metrics stack.

Nautilus UI Provides the web UI for managing the platform.

NFS Client Provisioner or ECS service broker Provisions persistent volumes within the configured NFS server (NFS Client Provisioner), or provisions ECS storage buckets (ECS service broker).

NGINX Ingress Ingress controller and load balancer

Zookeeper Zookeeper-operator Manages the Pravega Zookeeper cluster and all the Zookeeper clusters for all projects.

Security Keycloak Provides identity and access management for applications and services.

Keycloak-webhook Injects Keycloak credentials into relevant pods.

Keycloak-postgresql Handles Keycloak roles and Keycloak clients.

Flink services Flink-operator Manages Flink clusters and Flink applications.

Project-operator Manages analytic projects.

Spark services spark-operator Runs the Spark engine in the cluster.

Serviceability DECKS Manages SupportAssist registration, call-home, and licensing.

KAHM Provides event and health management services.

Monitoring Provides monitoring of resource usage.

PSearch psearch-operator Creates PSearch resources as needed.

GStreamer gstreamer-operator Creates GStreamer resources if needed.

CRs in nautilus-system

The nautilus-system namespace defines the following custom resources (CRs). Their operators are included in the list of components above.

ProjectSystem
ZookeeperCluster
Project
FlinkCluster
FlinkApplication
FlinkClusterImage
InfluxDB
Grafana
Telegraf
InfluxDBDatabase
GrafanaDashboard
GrafanaDashboardTemplate
SparkApplication
PravegaSearch
FlinkSavepoint
ProjectSystems
ProjectFeatures
CAbundles
ArtifactRepositories
RuntimeImages
CameraRecorderPipelines
PravegaVideoServers
GStreamerPipelines
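One way to list the SDP custom resource types on a running cluster is to filter kubectl api-resources by the API group used in the RuntimeImage examples earlier in this guide (nautilus.dellemc.com); the exact set returned depends on the SDP release.

$ kubectl api-resources --api-group=nautilus.dellemc.com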

Components in the nautilus-pravega namespace

The nautilus-pravega namespace contains components to support Pravega functions within the SDP platform.

Components in nautilus-pravega

Component name Description

pravega-operator Manages Pravega clusters

pravega-cluster Pravega software

pravega-service-broker Provisions Pravega scopes

bookkeeper-operator Manages the bookkeeper resource

bookkeeper-cluster Manages the bookies

schema-registry Manages the Pravega schema registry

schema registry pods

Pravega InfluxDB pod

Pravega Grafana pod

Pravega Grafana gatekeeper pod

zookeeper-cluster Manages zookeeper pods needed by pravega.

Custom resources in nautilus-pravega

PravegaCluster
BookkeeperCluster

Components in project namespaces

Each analytic project has a dedicated Kubernetes namespace.

A project's namespace name is the project name. For example, a project that you create with the name test has a namespace name of test.

For additional information about project namespaces, see Manage projects on page 118.


Components in cluster-monitoring namespace

The cluster-monitoring namespace supports SDP monitoring of node health and the Kubernetes cluster health.

Components in cluster-monitoring

Component name Description

cluster-monitoring Monitors the following components:

K8s deployments missing replicas
K8s statefulset missing replicas
K8s daemons missing replicas
K8s nodes disk health
K8s nodes CPU/mem and file system resources

It sends alerts through KAHM to the SRS or SCG backend.

Components in the catalog namespace

The catalog manages service instances and service bindings. It manages instances for resources such as keycloak client, keycloak role, a pravega scope, and so on. Service bindings provide access (including the URL and credentials) to those resources.

Components in catalog

Component name Description

service-catalog Manages service instances and service bindings.

CRs in catalog

clusterservicebrokers
clusterserviceclasses
clusterserviceplans
servicebindings
servicebrokers
serviceclasses
serviceinstances
serviceplans

Log files

This section describes how to obtain useful logs.

Get installation logs

To track installation progress, you can monitor the installation logs. From the installation folder, look at decks-install.logs.


Get pod logs for a namespace

List pod names:

kubectl get pods --all-namespaces

Get information about a pod:

kubectl describe pod -n

For example:

kubectl describe pod keycloak-service-broker-797849c678-52pnl -n nautilus-system

Get the logs for a pod in a namespace:

kubectl logs -n

Useful troubleshooting commands This section introduces CLI commands that can help you get started with researching problems in an SDP deployment.

OpenShift client commands on page 163
helm commands on page 163
kubectl commands on page 163

OpenShift client commands

Use OpenShift client (oc) commands to manage the environment in which your SDP cluster exists.

oc --help | grep cluster

List commands related to cluster information.

oc cluster-info Get information about the cluster.

oc adm ... Tools for managing a cluster

oc run ... Run a specified image on the cluster.

helm commands

Use helm commands to manage the Kubernetes packages that are installed in your cluster. You can also check the current helm version number.

The following helm commands are useful when getting started with troubleshooting SDP deployments. For descriptions of all helm commands and their syntax, see https://helm.sh/docs/.

helm version Shows the client and server versions for Helm and Tiller.

helm ls --all Lists all the releases that are installed in the cluster.

helm ls --all --short

Generates an abbreviated output of the above command.
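For example, you can combine the short listing with grep to narrow the output to a subset of releases (a sketch; the release names depend on your deployment):

helm ls --all --short | grep pravega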

kubectl commands

Use kubectl commands to investigate Kubernetes resources in the cluster.

The following kubectl commands and flags are useful for troubleshooting SDP deployments. For descriptions of all kubectl commands and their syntax, see https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands.


Common flags

The following flags apply to many commands.

--all-namespaces

Applies a command, such as kubectl get, to all namespaces rather than to a named namespace.

-o yaml Outputs a yaml formatted API object.

Useful commands

kubectl config use-context <context-name>

Switch your current command-line context to the named cluster. Use this command when you are logged into two clusters in the same session.
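For example, you can list the available contexts first and then switch (replace <context-name> with a name from the first command's output):

kubectl config get-contexts
kubectl config use-context <context-name>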

kubectl cluster-info

Display addresses of the master and services with the label kubernetes.io/cluster-service=true.

kubectl api-resources ...

Print supported API resources. Some useful flags in this command are:

--verbs=[verbs] to limit the output to resources that support the specified verbs.

--namespaced={true | false} to include or exclude namespaced resources. If false, only non-namespaced resources are returned.

-o {wide | name} to indicate an output format.

The following example displays all resources that support the kubectl list command. It includes namespaced resources and shows the output in the shortened name format.

kubectl api-resources --verbs=list --namespaced=true -o name

kubectl get ...

List resources of the specified resource type. Some useful kubectl get commands for SDP are:

kubectl get pods
kubectl get pods --all-namespaces
kubectl get services
kubectl get deployments
kubectl get deployment <deployment-name>
kubectl get nodes
kubectl get events
kubectl get storageclass
kubectl get serviceaccounts

kubectl describe ...

Show details of a specific resource or group of resources.

kubectl logs ...

Display logs for a container in a pod or specified resource. If the pod has only one container, the container name is optional.

For example:

kubectl logs <pod-name> -c <container-name> -n <namespace>

kubectl exec ...

Run a command in a container.
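For example, to open an interactive shell in a pod (a sketch; it assumes the container image includes /bin/sh):

kubectl exec -it <pod-name> -n <namespace> -- /bin/sh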

kubectl attach ...

Attach to a process running inside a container. For example, you might want to get output from the process.

kubectl run ...

Create a deployment or a job by running a specified image.


FAQs

These frequently asked questions (FAQs) include common installation conditions and operational observations.

My installation does not have all the components installed.

If you invoked the installer with the decks-install apply command, the decks-install sync command is safe to use to resume an existing installation. The command tries to install the remaining components. You can run the decks-install sync command more than once.
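For example, assuming the manifest and chart directories from the original installation:

./decks-install sync --kustomize ./manifests/ --repo ./charts/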

Run decks-install status. Results should look similar to the following:

[core@service-startline rc3]$ ./decks-install-linux-amd64 status
APPLICATION                      STATE      VERSION                   MANAGED BY
bookkeeper-cluster               Succeeded  0.7.1-54-29d5ca9          nautilus
bookkeeper-operator              Succeeded  0.1.3-54-29d5ca9          nautilus
catalog                          Succeeded  0.3.1                     nautilus
cert-manager                     Succeeded  v0.15.2                   nautilus
cert-manager-resources           Succeeded  1.2-RC3-0-f4a0565c8       nautilus
cluster-scaling                  Succeeded  1.1.0-HF1.RC2-2-3d4a82c   nautilus
decks                            Succeeded  1.2.5                     nautilus
default-runtime-images           Succeeded  1.2-RC3-0-f4a0565c8       nautilus
dellemc-streamingdata-license    Succeeded  1.2.5                     nautilus
ecs-service-broker               Succeeded  1.2-RC3-0-f4a0565c8       nautilus
external-dns                     Succeeded  3.2.4                     nautilus
external-dns-resources           Succeeded  1.2-RC3-0-f4a0565c8       nautilus
flink-default-resources          Succeeded  1.2-RC3-0-f4a0565c8       nautilus
flink-operator                   Succeeded  1.2-RC1-3-050c9ff00       nautilus
kahm                             Succeeded  1.2.5                     nautilus
keycloak                         Succeeded  1.2-W25-1-fcc4562         nautilus
keycloak-injection-hook          Succeeded  1.2-W14-2-addc3b3         nautilus
keycloak-service-broker          Succeeded  1.2-W20-4-23bb482         nautilus
metrics-operator                 Succeeded  1.2-RC1-1-70b9d2f         nautilus
monitoring                       Succeeded  1.2-W16-4-c899548         nautilus
nautilus-ui                      Succeeded  1.2-RC1-7-0cb1f6b0        nautilus
nginx-ingress                    Succeeded  1.40.3                    nautilus
openshift-sdp-resources          Succeeded  1.2-RC3-0-f4a0565c8       nautilus
pravega-cluster                  Succeeded  1.2-RC3-0-f4a0565c8       nautilus
pravega-operator                 Succeeded  0.5.2-221-5dce68d3        nautilus
pravega-service-broker           Succeeded  1.2-RC2-2-28c8432         nautilus
project-operator                 Succeeded  1.2-RC1-2-58878d7         nautilus
psearch-cluster                  Succeeded  latest                    nautilus
psearch-operator                 Succeeded  1.2-W26-4-75f8e393        nautilus
psearch-resources                Succeeded  1.2-RC3-0-f4a0565c8       nautilus
schema-registry                  Succeeded  0.0.1-61-f1b6734          nautilus
sdp-operator                     Succeeded  1.2-W26-2-31ac657         nautilus
spark-operator                   Succeeded  1.2-RC1-4-b0500de         nautilus
srs-gateway                      Succeeded  1.2.5                     nautilus
zookeeper-cluster                Succeeded  1.2-RC3-0-f4a0565c8       nautilus
zookeeper-operator               Succeeded  0.2.9-153-e100c87         nautilus
[core@service-startline rc3]$

My uninstall failed. I still see components in helm list.

If the decks-install unapply command fails, try it again. If the rerun does not solve your problem, you can manually remove the charts using helm del --purge --no-hooks <chart-name>. Then retry the decks-install unapply command. The decks-install unapply command is still necessary in the uninstall process because it deregisters the custom resource definitions that are used with the product.

My pods are showing that they cannot pull the images.

For sites that do not have access to public image repositories and registries, the installer ships with a tar bundle of Docker images. You can push those images to a local registry from which the pods can pull them successfully.


My pods are showing as running but the user interface displays errors.

The running status indicates the readiness of the pod. The status does not indicate anything about errors concerning the applications or services running within the pod. To research errors, check pod logs with the kubectl logs command. The logs include a timestamp that you can correlate with the logs in the other pods to sequence together the chain of actions.

Is there a way to see events for a specific project or application?

Yes. All projects have their corresponding Kubernetes namespace. You can use kubectl get events --namespace <project-namespace> to get events only from the namespace that corresponds to a project. Also, the user interface lists system events and logs that correspond to a single application.

My logs show intermittent failures but my pods are all healthy.

Check to see that your applications and services are reachable by both name and IP address. This check holds true for the ingress, proxy, load balancer, and all the pods. DNS records, TXT entries, and registrations must be accurate. The DNS provider may have listings of entries that were created during installation. Check with your system administrator or cloud services provider. You can also ping the pods and connect to the services from the containers to ensure that they are reachable within the cluster.

My pods complain that the volume mounts failed.

Delete the pod. Deleted pods come back, and volume mounts are refreshed.
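For example (the controlling Deployment or StatefulSet re-creates the pod automatically):

kubectl delete pod <pod-name> -n <namespace>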

My Keycloak service broker complains that Keycloak is not available.

Most likely, Keycloak is running, but is not resolving by name. Check to see that the Keycloak endpoint is accessible from the keycloak-service-broker. Otherwise, uninstall and reinstall the product.
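For example, one way to check reachability is to open a session in the keycloak-service-broker pod and resolve the Keycloak hostname from there (a sketch; it assumes nslookup is available in the container, so adjust to the tools that the image provides):

kubectl get pods -n nautilus-system | grep keycloak-service-broker
kubectl exec -it <keycloak-service-broker-pod-name> -n nautilus-system -- nslookup keycloak.<your-domain>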

My user interface does not install. I get a 503 service unavailable or 404 default backend error.

The User Interface is not installed properly. You can use the helm chart to delete the User Interface and install it again. Otherwise, uninstall and reinstall the product. A 404 error in the User Interface implies that the ingress named nautilus-ui in the nautilus-system namespace is not set up properly. When you uninstall and reinstall the product, the ingress is set up automatically.

My ingress and services do not have IP addresses.

This condition may occur if the public IP pool is exhausted. For example, the NSX-T environment defines a public IP address pool. To check for this issue, run the following command, and look for the value pending in the public IP column.

kubectl get svc -n nautilus-system
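For example, to list only the services that are still waiting for an address:

kubectl get svc -n nautilus-system | grep -i pending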

A Network Administrator may be able to resolve this issue.

My DNS records are not showing, or the installer is not adding these records.

1. The external-dns credentials for the DNS provider may be incorrect, or you may have exceeded the rate limit. For example, with a Route53 DNS service provider, there is a rate limit of 5 requests per second per account. To research the issue, check the logs for the external-dns pod, as shown in the example after this list.


2. Ensure that unique values are used for txtOwnerId and domainFilters in the external-dns configuration. If the same values are used across clusters and policy is set to sync, a new cluster could overwrite all entries with the same txtOwnerId.

txtOwnerId: "<cluster-name>.<domain>"
## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only)
# If the sync policy is used, make sure that txtOwnerId is unique to the cluster; using <cluster-name>.<domain> ensures uniqueness
policy: sync
domainFilters: [ <domain> ]
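To check the external-dns logs that step 1 mentions, you can locate the pod and then display its log (a sketch; it assumes external-dns runs in the nautilus-system namespace):

kubectl get pods -n nautilus-system | grep external-dns
kubectl logs <external-dns-pod-name> -n nautilus-system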

My DNS records are showing, but the DNS records are not propagated.

Run nslookup keycloak.<your-domain>. If it resolves, see if it is resolving from the pod network as well. Start the dnstools pod using this command:

kubectl run -it --rm --image infoblox/dnstools dnstools

From the dnstools pod spawned above, run nslookup keycloak.<your-domain>. If it is not resolving from the dnstools pod, contact an administrator to look into the network configuration of the cluster.

The cert-manager does not issue the certificates.

If the certificate issuer is Let's Encrypt, and you see an entry for Keycloak in the ingress with kubectl get ingress -n nautilus-system, then check whether the ingress has a certificate. If the certificate is issued, the issuer shown for the keycloak-tls certificate should look like the following:

kubectl get secret keycloak-tls -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text | grep Issuer

Issuer: C=US, O=Let's Encrypt, CN=Let's Encrypt Authority X3
CA Issuers - URI:http://cert.int-x3.letsencrypt.org/

If the output looks like the following instead, then check the cert-manager pod logs for error messages. A limit may exist, such as 50 certificates per week per domain.

Issuer: O=cert-manager, CN=cert-manager.local

If you are frequently installing and uninstalling the product, remember to reuse the hostname. Certificate reissues do not count towards the certificate limit. If the certificate was reissued already several times, and you see a message like the following, then use another domain that has not reached the certificate limit.

urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates already issued for exact set of domains: .com: see https://letsencrypt.org/docs/rate-limits/" "key"="nautilus-system/nautilus-ui-tls-2473631805"

If you are using self-signed certificates, you would see the following message instead, and you can use the same steps as above.

kubectl get secret cluster-wildcard-tls-secret -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text | grep Issuer

Issuer: CN = Self-Signed CA

Is Keycloak ready with TLS?

Check whether you can connect to Keycloak:

kubectl get ingress keycloak -n nautilus-system

openssl s_client -showcerts -servername <keycloak-hostname> -connect <keycloak-IP>:443

Then, check that the certificate in the secret is the same as the certificate that Keycloak returns:

kubectl get secret cluster-wildcard-tls-secret -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl s_client -showcerts -servername keycloak.cluster1.desdp.dell.com -connect 10.247.114.101:443

My pods are in the ContainerCreating state.

Check whether the issue is local to the pod. For example, the following message indicates that nautilus-ui is waiting for its secrets. It waits until the keycloak-service-broker starts servicing the service instance requests.

MountVolume.SetUp failed for volume "nautilus-ui" : secrets "nautilus-ui" not found

Check the keycloak-service-broker logs. Eventually after timeout, the keycloak-service-broker starts servicing and creates the secrets.

The cert-manager is not renewing certificates when the LetsEncrypt provider is used.

In certain cases, when cert-manager cannot connect to LetsEncrypt, certificate orders to LetsEncrypt get stuck in the pending state indefinitely. If you see that expired certificates are not renewing, delete the orders that are in a pending state. The cert-manager then creates new orders automatically and completes the certificate renewal process.

kubectl get orders -n nautilus-system

kubectl delete order <order-name> -n nautilus-system
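For example, to spot the stuck orders before deleting them (a sketch; it assumes the Order resources print a state column):

kubectl get orders -n nautilus-system | grep -i pending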

A reader job fails on a Flink cluster.

When you see that a reader job is not progressing (no longer reading), check whether it is stuck on UnknownHostException. Some of the symptoms you may see are:

1. Flink Dashboard shows jobs in a Restarting status instead of Running.
2. Flink job manager continuously throws UnknownHostException with the message "Temporary failure in name resolution".

When the job manager is in this state, it does not recover from exceptions even if you fix the DNS issues that caused the resolution errors.

Here is a workaround example:

kubectl get sts -n longevity-0 | grep jobmanager
longevity-0-jobmanager   1/1   8d

kubectl scale sts longevity-0-jobmanager -n longevity-0 --replicas=0
kubectl scale sts longevity-0-jobmanager -n longevity-0 --replicas=1

After using the workaround, check the logs and the Flink dashboard to see if jobs have a status of Running. Here is an example kubectl command that checks the logs:

kubectl logs -f longevity-0-jobmanager -n longevity-0 -c server

Application connections when TLS is enabled

This section describes TLS-related connection information in Pravega and Flink applications.

When TLS is enabled in SDP, Pravega applications must use a TLS endpoint to access the Pravega datastore. The URI used in the Pravega client application, in the ClientConfig class, must start with:

tls://<pravega-endpoint>:443


If the URI starts with tcp://, the application fails with a javax.net.ssl.SSLHandshakeException error.

To obtain the Pravega ingress endpoint, run the following command.

kubectl get ing pravega-controller -n nautilus-pravega

The HOST column in the output shows the Pravega endpoint.
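For example, a minimal sketch that builds the TLS URI from the ingress (it assumes the hostname is published in the first ingress rule):

PRAVEGA_HOST=$(kubectl get ing pravega-controller -n nautilus-pravega -o jsonpath='{.spec.rules[0].host}')
echo "tls://${PRAVEGA_HOST}:443"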

Online and remote support

The Dell Technologies Secure Remote Services (SRS) or Secure Connect Gateway (SCG) and call home features are available for SDP. These features require an SRS or SCG Gateway server configured on-site to monitor the platform. The SDP installation process configures the connection to the SRS or SCG Gateway.

Detected problems are forwarded to Dell Technologies as actionable alerts, and support teams can remotely connect to the platform to help with troubleshooting.

Online Support:

https://www.dell.com/support

Telephone Support:

United States: 800-782-4362 (800-SVC-4EMC)

Canada: 800-543-4782

Worldwide: +1-508-497-7901


Reference Information

Topics:

Configuration Values File Reference
Summary of Scripts
Installer command reference



Configuration Values File Reference

Topics:

Template of configuration values file

Template of configuration values file

The following template shows all configuration attributes for SDP.

global:
  ## Type of External storage, valid values are nfs or ecs_s3
  ## If `nfs` then a "nfs-client-provisioner" section is required
  ## If `ecs_s3` then a "pravega-cluster" section is required
  #storageType: nfs

  ## Whether to install the cluster in dev mode. Note this should not be used for production deployments.
  ## Among other things, this is going to enable the SOCKS5 proxy.
  #devel: false

  # distribution: openshift # or omitted completely
  ## When installing on atlantic, set platform to atlantic. This will influence the way certain
  ## components are installed. In some cases they will be skipped entirely.
  ## The reason why this is a distinct field, and why "distribution" is not used for this is
  ## because there may be combinations, for example: Atlantic with Openshift.
  ## Other values in the future may include vsphere7
  ## platform: Atlantic | VMware | OpenShift (case-sensitive)

  #kahmNotifiers: [ ]                      #List of shipped KAHM notifiers: [nautilus-notifier, streamingdata-snmp-notifier, streamingdata-supportassist-ese]
  #skipNFSClientProvisioner: true | false  #skip installing the nfs-client-provisioner (in case it already exists)
  #skipNginxIngress: true | false          #skip installing the nginx-ingress (in case it already exists)
  #skipPSearch: false                      #skip installing psearch
  #skipLogging: true | false               #skip installing the logging

  external:
    ## The full TLD to use for external connectivity. A blank string means no external connectivity.
    ## If you change the domain here, change the domain in domainFilters of the external-dns config below
    ## and hostedZoneID in cert-manager-resources
    #host: "<cluster-name>.<domain>"

    ## The name of your Kubernetes context. This is a required value.
    ## This can be found using `kubectl config current-context`
    #clusterName: "<cluster-name>"

    ## Whether to enable TLS or not.
    ## If `true` then TLS is going to be enabled
    ## If `false` then TLS is going to be disabled
    #tls: false

    ## The list of supported TLS protocols. If omitted, TLS1.2 and 1.3 will be enabled.
    # To explicitly set the list (for example to restrict to TLSv1.3 only), set it to:
    #tlsProtocols: "TLSv1.3" # (separate by commas, with no spaces, if several protocols are listed)

    ## Certificate issuers used by cert-manager, supported values: cli, selfsigned-ca-issuer, letsencrypt-production
    certManagerIssuerName:

    ## Uncomment the following when certManagerIssuerName: cli
    #certManagerIssuerGroup: nautilus.dellemc.com

    ## Uncomment the following when certManagerIssuerName: cli
    #certOrganization: dell.com #change it as per your company name

    ## Uncomment the following when certManagerIssuerName: cli
    #certOrganizationalUnit: uds #change as per your department name

    ## Uncomment the following to use wildcard certificates
    #wildcardSecretName: cluster-wildcard-tls-secret

    ## Whether the cluster is installed in a dark-site environment (i.e. no or partial external connectivity)
    ## Among other things, this is going to disable the SRS Gateway which requires external access.
    #darksite: false

    ## class name for Ingress Provider
    ##ingressClassName: nginx-nautilus #change if not using nginx chart as provided by SDP or you change the name the nginx chart will use
    # ingress:
    #   annotations: {}

    ## Custom CA trust certificates in PEM format. These certificates are injected into certain Java components.
    ## The main use for them at the moment is when communicating with an ECS Object endpoint that uses custom trust, i.e. Self Signed Certificates
    # This bundle MUST include the internalCA defined in the cert-manager-resources block, as well as additional CAs
    # as needed (ecsObject ca, and self signed ca if used in cert-manager-resources too)
    # Note: the certificate for ecsObject below can be obtained by querying a data node on ECS
    # example: openssl s_client -showcerts -connect <ECS-data-node-IP>:9021
    # If it is not possible to directly query via the node public ip, ssh to the ECS management node and then query using the data node ip.
    # tlsCABundle:
    #   ecsObject: |-
    #     -----BEGIN CERTIFICATE-----
    #     MIIDnDCCAoSgAwIBAgIJAOlxdGLbM3vBMA0GCSqGSIb3DQEBCwUAMBYxFDASBgNV
    #     BAMTC0RhdGFTZXJ2aWNlMB4XDTIwMDIxOTE5MzMzNVoXDTMwMDIxNjE5MzMzNVow
    #     ...
    #   internalCa: |-
    #     -----BEGIN CERTIFICATE-----
    #     MIIDbTCCAlWgAwIBAgIUOgtNlun2VkJR+Grk7dTs37IIAdAwDQYJKoZIhvcNAQEL
    #     BQAwRjELMAkG
    #     ...

catalog: # image:

default-project-resources: # # ArtifactRepository Customizations # artifactRepository: # defaultStorageClassName: standard # storageSize: "10Gi"

# #IngestGateway HelmRelease Customizations # ingestGateway: # chartVersion: 1.3-15-29363a5 # haProxyImagetag: 2.2.14

# #JupyterHub HelmRelease Customizations # jupyterHub: # chartVersion: 0.1.0 # extraProfileList: []

# metrics: # influxdbStorageSize: "10Gi" # grafanaStorageSize: "10Gi"


# # MQTT broker customizations # pravegaMQTTBroker: # chartVersion: 1.3-RC0-4-136c098 # useLoadBalancerIP: true

# videoServer: # rustBacktrace: full # cpu: 100m # memory: 256Mi

# # Zookeeper Customizations # zookeeper: # repository: devops-repo.isus.emc.com:8116/nautilus/zookeeper # tag: 0.2.15-237-6dea1e2 # replicas: 3

zookeeper-operator: # image: # repository: # tag:

metrics-operator: # image: # repository: # tag: # "kahmNotifiers": []

nautilus-ui: # image: # repository: # tag: # zookeeperPerProject: 3 #exposed for small env developement. LEAVE AS DEFAULT FOR PROD.

nginx-ingress: # controller: # image: # repository: # tag: # config: # proxy-buffer-size: "128k" # proxy-buffers: "4 256k" # fastcgi-buffers: "16 16k" # fastcgi-buffer-size: "32k" # max-worker-open-files: "16384" # podAnnotations: # ncp/ingress-controller: true # resources: # limits: # cpu: 1 # memory: 1Gi # requests: # cpu: 500m # memory: 500Mi # defaultBackend: # image: # repository # tag:

keycloak:
  # image:
  #   repository:
  #   tag:
  ## Keycloak admin password and UI admin password can be set at deployment time (not recommended for prod)
  ## Please follow the password policy for the password to have 12 characters minimum; at least one lower and one upper case,
  ## at least one number and one special char from @#$%^&+=
  ## sample password myPa$$w0rd123.
  ## If not set up, the keycloak chart would generate a random one
  ## (it outputs a kubectl command to retrieve the secret)
  #secrets:
  #  admin-creds:
  #    stringData:
  #      user: admin
  #      password: "..."
  #  desdp-creds:
  #    stringData:
  #      user: desdp
  #      password: "..."

keycloak-injection-hook: # image: # repository: # tag:

keycloak-service-broker: # image: # repository: # tag:

## To be used if storageType is nfs nfs-client-provisioner: # image: # repository: # tag: # nfs: # server: # path: # mountOptions: # - nfsvers=4.0 # - sec=sys # - nolock # storageClass: # archiveOnDelete: "false"

project-operator: # image: # repository: # tag: # mavenImage: # repository: # tag: # zkImage: # repository: # tag:

flink-operator: # image: # repository: # tag: # flinkImage: # repository: # tag_1_7_2:

gstreamer-operator: # image: # repository: # tag:

spark-operator: # image: # repository: # tag: # sparkImage: # repository: # tag_1_7_2:

bookkeeper-operator: # image: # repository: # tag: # testmode: # enabled: true | false # whether to enable test mode without minimum replicas check # webhookCert:


# crt: # key: # generate: true

pravega-operator: # image: # repository: # tag: # testmode: # enabled: true | false # whether to enable test mode without minimum replicas check # webhookCert: # crt: # key: # generate: true

pravega-service-broker: # image: # repository: # tag: # storage: # className: "" # size: 5Gi

zookeeper-cluster: # replicas: 3 # image: # repository: # tag: # storage: # volumeSize: 50Gi # storageClassName: # domainName: ""

bookkeeper-cluster: # image: # repository: # tag: # # Note: if deploying with 1 replica, update bk_client options in pravega-cluster # # and disable the minimumCount check in pravega_external_bk_placement_script_options # replicas: 6 # zookeeperUri: zookeeper-client:2181 # blockOwnerDeletion: true # pravegaClusterName: nautilus

# probes: {} # readiness: # initialDelaySeconds: 20 # periodSeconds: 10 # failureThreshold: 9 # successThreshold: 1 # timeoutSeconds: 30 # liveness: # initialDelaySeconds: 60 # periodSeconds: 15 # failureThreshold: 4 # successThreshold: 1 # timeoutSeconds: 5

# hooks: # backoffLimit: 100 # Might set to 10 with lesser number of BK replicas

# storage: # ledger: # volumeSize: 250Gi # className: # journal: # volumeSize: 250Gi # className: # index: # volumeSize: 10Gi # className:


## Overridable bookkeeper options # options: # useHostNameAsBookieID: "true" # minorCompactionThreshold: "0.4" # minorCompactionInterval: "1800" # majorCompactionThreshold: "0.8" # majorCompactionInterval: "43200" # isForceGCAllowWhenNoSpace: "true" # journalDirectories: "/bk/journal/j0,/bk/journal/j1,/bk/journal/j2,/bk/journal/j3" # ledgerDirectories: "/bk/ledgers/l0,/bk/ledgers/l1,/bk/ledgers/l2,/bk/ledgers/l3" # ledgerStorageClass: "org.apache.bookkeeper.bookie.InterleavedLedgerStorage" # flushEntrylogBytes: "134217728" # flushInterval: "60000" # enableStatistics: "false" # entryLogPerLedgerEnabled: "true" # gcWaitTime: "600000" # autoRecoveryDaemonEnabled: "true" # emptyDirVolumeMounts: "logs=/opt/bookkeeper/logs,heap-dump=/tmp/dumpfile/heap"

# jvmOptions: # # Note that `memoryOpts` is an array and gets replaced as a whole. # # Make sure to configure heap and direct memory to fit within memory limit. # memoryOpts: # - "-Xms2g" # - "-XX:MaxDirectMemorySize=8g" # - "-XX:+ExitOnOutOfMemoryError" # - "-XX:+CrashOnOutOfMemoryError" # - "-XX:+HeapDumpOnOutOfMemoryError" # - "-XX:HeapDumpPath=/tmp/dumpfile/heap"

# resources: # limits: # cpu: 8000m # memory: 16Gi # requests: # cpu: 2000m # memory: 4Gi

pravega-cluster: # pravega_debugLogging: false # pravega_version: 0.5.0-2269.6f8a820-0.9.0-019.007be9f # credentialsAndAcls: base64 encoded password file: https://github.com/pravega/pravega- tools/blob/2c2dcb327a289f1f861deb96e23c2bf29e6b7f6c/pravega-cli/src/main/java/io/pravega/ tools/pravegacli/commands/admin/PasswordFileCreatorCommand.java # pravega_security pravega.client.auth.token & credentialsAndAcls are coupled, if one changes, the other must

# pravega_controller_auth_handler: # image: # repository: "nautilus-pravega-auth-init" # tag: # segment_store_init_container: # image: # repository: "nautilus-pravega-auth-init" # tag:

## Below are the overridable settings that are being passed ## to the "options" block of pravega deployment ## (controllers & segment stores) pravega_container_count: 48 ## expect it to be 8 containers per segmentstore if not reducing memory

segment_store: # readIndexBlockSize: Not recommended to be set to lower than 256MB in Pravega 0.9.x # For intensive random reads workloads (PSearch) could be lowered to 128MB # however the following condition must be satisfied "write_size.rolloverSizeBytes / segment_store.readIndexBlockSize < 16K" # readIndexBlockSize: "268435456" # # cacheMaxSize: This value (in bytes) should always be below "segment_store_jvm_max_direct_memory" setting # cacheMaxSize: "11811160064" # cacheMaxTimeSeconds: "600"


# metadataRolloverSizeBytes: "134217728" # storageLayout: "ROLLING_STORAGE" | "CHUNKED_STORAGE" # enable_appends: "true" | "false" ## must be "false" if global.storageType is ecs_s3

write_size: # blockSize: "67108864" # flushThresholdBytes: "67108864" # maxFlushSizeBytes: "67108864" # rolloverSizeBytes: "4398046511104"

controller: # retention_bucketCount: "10" # service_asyncTaskPoolSize: "20" # retention_threadCount: "4" # transaction_lease_count_max: "120000" # transaction_execution_timeBound_days: "1"

bk_client: # bkEnsembleSize: "3" # bkWriteQuorumSize: "3" # bkAckQuorumSize: "3" # bkWriteTimeoutMillis: "60000" # maxOutstandingBytes: "33554432"

pravega_external_bk_placement_script_options: # Note that if BK-cluster is deployed with one replica, # the below option should be set to "false" # bookkeeper.write.quorum.racks.minimumCount.enable: "true"

## any extra pravega options that are not set via above overrides pravega_options: # log.level: "DEBUG"

## Two overridable segment store JVM options: # Note that total of heap and direct memory should be below the SS pod memory limit # segment_store_jvm_max_heap_size: "4g" #produces "-xmx4g" # segment_store_jvm_max_direct_memory: "12g" #produces "- XX:MaxDirectMemorySize=12g"

## Any extra segment store JVM options would go here: # segment_store_extra_jvm_options: # - "-xms1g" # example, not a default

## Controller JVM options, if any (default is none) # controller_jvm_options: # - "-xmx1g" # example, not a default

pravega_storage: # tier2: # size: 250Gi # class_name: "nfs" # cache: # size: 100Gi # class_name:

pravega_replicas: # controller: 1 # segment_store: 3

pravega_resources: controller: # limits: # cpu: 500m # memory: 1Gi # requests: # cpu: 250m # memory: 512Mi segment_store: # limits: # cpu: "1" # memory: 2Gi # requests: # cpu: 500m


# memory: 1Gi

pravega_security: # TOKEN_SIGNING_KEY: "..." # pravega.client.auth.method: "Basic" # pravega.client.auth.token: "..." #note, if this changes credentialsAndAcls needs to change # autoScale.security.auth.token.signingKey.basis: "..." # AUTHORIZATION_ENABLED: "true" # autoScale.controller.connect.security.auth.enable: "true"

pravega_externalAccess: # enabled: true # type: LoadBalancer | NodePort

# More details on service types # controllerExtServiceType: ClusterIP # controllerSvcAnnotations: {} # segmentStoreExtServiceType: # segmentStoreSvcAnnotations: {} # segmentStoreLoadBalancerIP: # segmentStoreExternalTrafficPolicy: Local

pravega_image: # repository:

grafana_image: # repository: # tag:

influxdb_image: # repository: # tag:

metrics_cluster_storage: # className: "" # influxdbSize: 10Gi # grafanaSize: 1Gi

external-dns-resources: # externalDNSSecrets: # - name: # value: | # { # .... # }

external-dns: ## see https://github.com/helm/charts/blob/8ab12e10303710ea3ad9d771acdd69d7658b7f47/ stable/external-dns/values.yaml

cert-manager-resources: # certManagerSecrets: # - name: # value: | # { # .... # }

# internalCASecrets: # - name: tls.crt # value: | # -----BEGIN CERTIFICATE----- # MIIDbTCCAlWgAwIBAgIUOgtNlun2VkJR+Grk7dTs37IIAdAwDQYJKoZIhvcNAQEL # BQAwRjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAlJJMRMwEQYDVQQHDApDdW1iZXJs # ... # -----END CERTIFICATE----- # - name: tls.key # value: | # -----BEGIN RSA PRIVATE KEY----- # MIIEogIBAAKCAQEA1V+pz/U6px4tfwsGiTU+FjXely0tC8UnUrT+zxCW3x6yLdSg # Mk8eY8AZys8HxWlt7//5BFTJEqSPMIQLxDJ9gREeXrnt8JRtlAZ+B7CeBbgVkGhE


# -----END RSA PRIVATE KEY-----

# clusterIssuer: # name: # server: # email: # acmeSecretKey: # solvers: # ## you can specify multiple solvers using labels and selectors. see: # ## https://docs.cert-manager.io/en/latest/tasks/issuers/setup-acme/index.html # - dns01: # clouddns: # serviceAccountSecretRef: # name: # key: # project: # - dns01: # route53: # # hosted zone id taken from route53 Hosted Zone Details # hostedZoneID: # region: # accessKeyID: # secretAccessKeySecretRef: # name: #TODO need to put this keey above in certManagerSecrets # key: #TODO need to put this keey above in certManagerSecrets

cert-manager: ## see https://github.com/jetstack/cert-manager/blob/v0.8.0/deploy/charts/cert-manager/ values.yaml

## Serviceability:

sdp-serviceability: decks: # storageClassName: ""

dellemc-streamingdata-license:

kahm: # storageClassName: ""

supportassist:
  # # The following can also be configured via the SDP UI:
  # enabled: boolean     # if true, enable SupportAssist
  # siteID: string       # siteID and pin are used to generate accessKey from the Dell support website
  # pin: string
  # accessKey: string
  #
  # remoteAccessEnabled: boolean  # if true, enable remote access via secure gateway (SupportAssist must also be enabled)
  # gateways:            # At least one gateway (up to eight) is required for remote access
  # - hostname: string   # gateway host to connect to
  #   port: integer      # gateway port to connect to, should be 9443
  #   priority: integer  # priority is used when multiple hosts are given

service-pod:
  # sshCred:             # user credentials to use for remote access
  #   user: svcuser
  #   group: users
  #   password: ChangeMe

snmp-notifier: # Requires adding "streamingdata-snmp-notifier" to global.kahmNotifiers
  # snmpServer:
  #   host: string       # SNMP server host or IP address

nautilus-notifiers:
  # slack:
  #   token: string          # Bot OAuth token with "chat:write" scope to send alerts
  #   conversationId: string # Slack conversation ID or name to receive alerts
  # email:
  #   service: string    # if provided (e.g. "gmail"), can skip host and port below; see https://nodemailer.com/smtp/well-known/.
  #   host: string       # hostname or IP address of the SMTP server to connect to (default is localhost)
  #   secure: boolean    # if true, use TLS when connecting to the server
  #   port: integer      # port to connect to (defaults to 587 if secure is false or 465 if secure is true)
  #   authName: string   # email used for login
  #   authPass: string   # password used for normal login
  #   sender: string     # sender email
  #   to: string         # receiver emails, comma-separated

monitoring: # image: # repository: devops-repo.isus.emc.com:8116/nautilus/monitoring # tag: latest # pullPolicy: Always

# license: # name: dellemc-streamingdata-license # namespace: nautilus-system

# schedule: "*/10 * * * *"

# storage: # storageClassName: ""

# subjects: # - name: Streaming Flink Cores # code: STRM_FLINK_CORES # uom_code: ZC # uom_name: Individual CPU Cores # niceName: Flink # selectors: # - component=taskmanager # - component=jobmanager # - name: Streaming Platform Cores # code: STRM_CORES # uom_code: ZC # uom_name: Individual CPU Cores # niceName: Platform # namespaces: # - nautilus-system # - nautilus-pravega

logging: # platform-logging: # rsyslog: # persistence: # storageClassName: "" # size: 200Gi

## Use cluster-monitoring stack cluster-monitoring: # Enable Smartmon daemon # nodeSmartonExporter: # enabled: true # # Default # of replicas is 2 # server: # replicaCount: 1 # service: # annotations: # alertmanager: # replicaCount: 1

## To be used if storageType is ecs_s3 ecs-service-broker: # namespace: nightshift # replicationGroup: RG # # api:


# endpoint: "http://10.243.86.70:9020" # ## Certificate for ecsConnection can be obtained by querying the ECS management node # example: openssl s_client -showcerts -connect :4443 # ecsConnection: # endpoint: "https://10.243.86.70:4443" # username: nightshift-sdp # password: ChangeMe # certificate: |- # -----BEGIN CERTIFICATE----- # MIIDCTCCAfGgAwIBAgIJAJ1g36y+tM0RMA0GCSqGSIb3DQEBCwUAMBQxEjAQBgNV # BAMTCWxvY2FsaG9zdDAeFw0yMDAyMTkxOTMzMjVaFw0zMDAyMTYxOTMzMjVaMBQx # EjAQBgNVBAMTCWxvY2FsaG9zdDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC # ggEBAKdvGtq912EKCBNsHTYjtbRlDkJyietThHw7p3WQCUq+TBxSisPJCshWRbNr # vpHjygnwZOSUcmvGdjpgXZeSUTWP0jgjCaOvhgeTHCRTKpkOd+tjEFpxCDGjZ61a # wHWSzxHKKBCgd8v6ERgkUyaD7S6RaNVjZ6E49raiDqeGmTd1uQRVq8RzRy2HePiT # G+rSRfS8qP3QzAjw1pIeJU100Ihy6wmMhUhmNJlU93INziF5mg0VYbnSW/bRb7zL # tNVHI0jkgPDQvqPtf26TZAhIO1bBiWebEk9UUhzGtaNgFE6/k9NTpu7Z32zkvOFe # vu7lw0x2c24p5mkPedycrGOGkHsCAwEAAaNeMFwwHwYDVR0jBBgwFoAU2us7uV9T # ZvUaai/9XGPrjhRbKNswGgYDVR0RBBMwEYIJbG9jYWxob3N0hwR/AAABMB0GA1Ud # DgQWBBTa6zu5X1Nm9RpqL/1cY+uOFFso2zANBgkqhkiG9w0BAQsFAAOCAQEAWcrd # +6xY62Ctn+ijphaUOuH9GO1VCUjLRPwZc1/QfMLsss/UhJVITt7JhMMUjGcV5UyV # JbQwoA+vso2DUbfehTceCWOa7QXLnO7si+ujVn0/vlCLvOj7MSJeRwOD3+3b2MpV # AJImqfo4J7ovY4iYfKPLVO6l1B5N9Cw3F3ztcA3HfmtKS7pTA0H/TLxaJ8GVCQO+ # d+StKXhM4dLC/uyvkPRQhXifMYJB5WM5G1X+kaiD30kZ5R0oEyWmaDrkuy2Xy0AP # 4xCeTPYMHUQlqaUYYG4iNnMfdwJuBiTpEpxKN/zt1M+ycWXSyGSW/EkItxztO49v # kQjNgkCo/artraBjvQ== # -----END CERTIFICATE----- # ## Custom plans for Bucket creation. Plan named "default" overrides standard default SDP bucket plan # s3Plans: # ## Unique UUID is required # - id: 9e777d49-0a78-4cf4-810a-b5f5173b019d # name: small # settings: # quota: # limit: 5 # warn: 4


Summary of Scripts

Topics:

Summary of scripts

Summary of scripts

The following scripts are included with SDP.

The scripts are in the extracted contents of the decks-installer-<version>.zip file, under the /scripts folder. Most scripts have software requirements for your local machine. See Prepare the working environment on page 90.
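For example, a minimal sketch of locating the scripts after extracting the installer bundle (the exact archive name depends on your SDP version):

unzip decks-installer-<version>.zip -d decks-installer
ls decks-installer/scripts/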

health-check.py

This script may be run at any time after SDP is installed. It checks the state of various components in the SDP cluster and generates a summary as output.

See Run health-check on page 135.

post-install.sh

Run this script after running the decks-install apply command. This script confirms that your latest run of the decks-install apply command left the cluster in a healthy state. This script invokes the health check script. You may run this script at any time.

See Run the post-install script on page 95. Also see Update the applied configuration on page 107.

post-upgrade.sh

Run this script after upgrading the SDP with a new distribution of manifests and charts. It confirms that the cluster was upgraded properly and is healthy. It runs the health checks.

prereqs.sh

This script ensures that your environment is ready for installation by verifying the following:

Checks your local environment for the required tools and versions of those tools
Checks the SDP cluster for a default storage class definition

The decks-install apply command runs this script automatically. Dell Technologies also recommends that you run this script before running the decks-install apply command for the first time (or the first time on a new local machine), so that your environment is ready when you want to run the installation. You may run this script at any time.

See Run the prereqs.sh script on page 91.

pre-install.sh

This script must be run one time before installing SDP.



It generates nondefault credentials for Pravega components. It also generates the gen-values-1.3.yaml file that contains those credentials. You must include that file with every run of the decks-install apply command, and this file must appear last in the list of values files.

See Run pre-install script on page 93.

pre-upgrade.sh

This script must be run before upgrading the SDP version with a new distribution of manifests and charts. The script ensures that the environment is healthy, including running the health checks. Do not update a cluster that is unhealthy.

Ensure that you use the pre-upgrade.sh script from the new SDP distribution. The script is version-specific.

provisioner.py

This script recommends configurations for scaling SDP. Use this script after adding new ESXi hosts to expand the cluster.

See Get scaling recommendations on page 153.

scale.py

This script scales SDP using recommended values from the provisioner.py script.

See Scale SDP on page 155.

validate-values.py

This script is part of the installation and change configuration processes. It reads the configuration values files and checks the values over certain criteria. For example, it validates the values that are configured for external connectivity and serviceability.

The decks-install apply command runs this script automatically. You may run this script independently at any time to verify the configuration values files.

See Run the validate-values script on page 93. Also see Update the applied configuration on page 107.


Installer command reference

Topics:

Prerequisites
Command summary
decks-install apply
decks-install config set
decks-install push
decks-install sync
decks-install unapply

Prerequisites

To use the installer, you must meet the following prerequisites.

The SDP Kubernetes cluster must exist.
You must have direct or network access to the Kubernetes cluster.
You must have authentication access rights to the Kubernetes cluster. The installer tool runs in the Kubernetes shell environment, outside of the Kubernetes cluster.
The user must have Kubernetes administrator privileges.
Your working environment must have kubectl installed.

A default registry must be configured. The installer applies the default registry pathname to any unqualified image names in the application manifest, producing a path name of registry-path/image-name.

The decks-install push command must be used first, to push images to the default registry. Then you can use the other decks-install commands.

NOTE: Every run of decks-install has its log saved in a time-stamped configmap in the nautilus-private namespace. This allows you to debug any installation issues at a later time, even when accessing the cluster from a different system.

The name of the configmap has the following format: installer-logs-<timestamp>.

$ kubectl get configmap --namespace nautilus-private
NAME                                 DATA   AGE
installer-logs-2022-06-29.06-23-41   1      6h26m
installer-logs-2022-06-29.06-43-33   1      6h6m
installer-logs-2022-06-29.06-43-59   1      6h6m
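For example, to view the contents of one of the saved logs:

kubectl get configmap installer-logs-2022-06-29.06-23-41 --namespace nautilus-private -o yaml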

Command summary

The installer tool is a command-line executable that installs and uninstalls applications and resources in a Kubernetes cluster, and configures the cluster.

decks-install apply

Applies a given manifest bundle (with optional overrides) to a remote Kubernetes cluster.

decks-install unapply

Unapplies a manifest bundle from a Kubernetes cluster.

decks-install sync

Starts a reconciliation loop between applications and Helm releases.

decks-install config list

Lists the current configuration values.

decks-install config set

Sets a config value.



decks-install push

Pushes an image bundle to a registry.

decks-install version

Shows the version and build information of the installer tool.

decks-install check

Runs health checks on the installed components.

decks-install status

Lists installed components and their state.

decks-install apply

Applies the custom resource definitions (CRDs) and applications that are specified in a manifest bundle to the SDP Kubernetes cluster. By default, this command also starts the synchronization process that installs the Helm charts for each application.

Syntax

decks-install apply --kustomize <manifest-path> --repo <charts-path> [--values <file>,<file>,...] [--dry-run] [--skip-sync] [--simple-output] [--set <key>=<value>] [--set-file <key>=<file-path>]

Or alternate syntax:

decks-install apply --config <config-file>

Options

--kustomize

Required. Specifies the location of the manifest bundle. Include the slash to indicate a directory. For example:

--kustomize ./manifests/

The manifest bundle is an artifact delivered in the root of the installer zip file, under manifests/. Manifest files must conform to Kubernetes Kustomize format, as described in the Kubernetes Kustomize documentation.

--repo

Required. Specifies the location of the Helm charts directory. Include the slash to indicate a directory. For example:

--repo ./charts/

The charts are artifacts delivered in the root of the installer zip file, under charts/.

--values <file>[,<file>...]

Specifies the pathnames of configuration values files. Separate multiple file names with commas and no spaces.

NOTE: SDP requires configuration files to define required attributes.

NOTE: SDP requires a configuration file named gen-values-1.3.yaml generated by the pre-install.sh script to be the last configuration values file in the list of files.


--dry-run Currently not used.

--skip-sync If specified, prevents the synchronization process from starting. You can start the synchronization process later, using the decks-install sync command.

The apply step adds CRDs and applications to the cluster in a pending state. The synchronization step reconciles the applications to the desired state.

--simple-output Displays logs to standard out and standard error, which are typically on the command line terminal.

If this flag is omitted, the command writes logs to decks-install.log and decks-install.stderr.

--set <key>=<value>

Provides the value for a configuration parameter, where:

<key> is the complete parameter name from the configuration values file, using periods to separate the components in a name.

<value> is the configuration value.

You can specify multiple key-value pairs in a comma-separated list.

The following example sets two configuration parameters on the command line:

--set global.storageType=ecs_s3,ecs-service-broker.namespace=mynamespace

The precedence for setting configuration values is:

Values that are provided on the command line with the --set or --set-file options take precedence.

Values that are provided in configuration values files are next. When multiple configuration values files are specified and the same parameter appears in more than one file, the value in the right-most file takes precedence.

The installer uses internal default values if no other value is provided.

--set-file <key>=<file-path>

Provides the path of a file that contains a configuration parameter value, where:

<key> is the complete parameter name from the configuration values file, using periods to separate the components in a name.

<file-path> is the path name of the file that contains the value for the parameter.

You can specify multiple key-file-path pairs in a comma-separated list.

The following example provides the path of the license file. The required configuration value in this case is the contents of the license file. The installer extracts the file contents.

--set-file dellemc-streamingdata-license.licensefile=<path-to-license-file>

--config <config-file>

Provides installer options in a YAML file.

With many values, overrides, and manual flag settings, it may be difficult to keep track of all flags and settings. For convenience, you can store some common flags in an installer configuration file rather than providing them manually every time you run the decks-install apply command.

Options that are set on the command line take precedence over those in the config file.

In the referenced YAML file, the keys are the command-line option names, and the values must be in standard YAML format.

option1: option-value
option2:
  - value 1
  - value 2

Usage:

decks-install apply [-k MANIFESTS] [-r REPO] [--values FILE[,FILE...]] [flags]


Flags:

-a, --all Select all managed application resources

--disable-openapi-validation if set, the rendered manifests will not be validated against the Kubernetes OpenAPI Schema

--dry-run Do not perform any actual changes

-h, --help help for apply

-k, --kustomize string Kustomization directory to apply to the cluster

-m, --managed-by string Select application resources managed by the specified value. The default value can be set in the config file using "decks-installer config set managed-by "

--max-history int Number of Helm secrets to preserve between releases (default: unlimited)

--redact-values Redact the sensitive values in logs (default true)

-r, --repo string Chart directory containing bundled charts

--set strings Set values on the command line (can specify multiple or separate values with commas: key1=val1,key2=val2)

--set-file strings Set values from respective files specified via the command line (can specify multiple or separate values with commas: key1=path1,key2=path2)

--simple-output Do not use a terminal emulator; use for non-POSIX terminals and emulators

--skip-sync Skip sync after apply

--use-last-values Use the last saved values from the previous applies

-f, --values strings Comma-separated YAML files with value overrides. Later files have higher precedence.

--values-from-secrets strings Values secret references (e.g. namespace/ values-secret1,namespace/values-secret2). Later references or values files have higher precedence.

--wait-timeout duration Maximum time to wait for resources like pods or jobs to complete or be in a ready state before retrying (default 168h0m0s)

Global Flags:

--accept-eula accept the Dell EMC EULA once

-c, --config string config file

--debug enable verbose output

--home string location of your DECKS config. Overrides $DECKS_HOME

Examples

Options on the command line

$ ./decks-install apply --kustomize ./manifests/ --repo ./charts/ \
  --values <path/to/values.yaml>,<path/to/pre-install/gen-values-1.3.yaml> \
  --set-file=dellemc-streamingdata-license.licensefile=<path/to/license.xml>


Options in a config file

The following command and YAML file combination is equivalent to the command line example above.

$ ./decks-install apply --config config.yaml

Where config.yaml contains the following:

kustomize: manifests/
repo: charts/
values:
  - <path/to/values.yaml>
  - <path/to/pre-install/gen-values-1.3.yaml>
set-file:
  - dellemc-streamingdata-license.licensefile=<path/to/license.xml>

Output

The command shows progress as components are installed. The default output lists the component name, status in the install process (for example: Pending, Updating, Succeeded), and explanatory information when the status is Pending.

You can change the contents of the third column by using the keys that appear at the bottom of the screen:

d changes the third column to a description of the component being installed.
v changes the third column to show the versions being installed.
i changes the third column back to the default view, which shows explanatory information about the reconciliation stage for a component. With this view, you can see when a component is waiting for dependencies.

decks-install config set

Sets a configuration value.

Usage

This command uses a key value pair to set a configuration value for an installation setting.

Syntax

decks-install config set key value

Options

key

A configuration field name. See the Template of configuration values file section in this guide.

value

The setting value.

Set the registry

The following example sets the container registry.

$ ./decks-install config set registry gcr.io/stream-platform/reg


decks-install push

Pushes an image bundle to a configured container registry.

Usage

The image bundle (a tar archive) for SDP contains several large images. The push operation may take hours to complete.

The registry is typically preconfigured, using the decks-install config command:

decks-install config set registry example-registry.com/some/path

You may override the configured registry URL with the --registry option.

Syntax

decks-install push --input <image-bundle.tar> [--registry <registry-URL>]

Options

--input

A .tar file of images. Do not extract files. The installer expects the .tar file as input.

--registry

Optional. Overrides the configured registry URL, to push the images to a different registry URL.

decks-install sync

Synchronizes the Kubernetes cluster to the wanted terminal state.

Usage

Synchronization consists of installing, upgrading, reconfiguring, or uninstalling components in the cluster as needed to match the wanted state for each application. The synchronization procedure ends when all components are installed, configured, or removed in accordance with the wanted configuration as recorded from previously applied or unapplied configurations. Synchronization usually takes a few minutes.

A synchronization process begins automatically after you use the decks-install apply or decks-install unapply command. If the synchronization process fails for whatever reason, use the decks-install sync command to resume the process.

It is safe to restart the synchronization process at any time. Be sure to specify the correct manifest and repo chart locations.

Syntax

decks-install sync [--kustomize <manifest-path>] [--repo <charts-path>]


Options

[--kustomize <manifest-path>]

Specifies the path and directory name of the manifest bundle that describes the applications and resources to synchronize. Include the final slash indicating the directory. For example:

--kustomize ./manifests/

This option is optional on the sync command. If omitted, the installer only synchronizes the application resources based on the information found on the cluster. Namespace and CRD cleanup operations are not run since they use the information from the manifest file. It is recommended to pass --kustomize if you have the manifests.

--repo <charts-path>

Required to synchronize application installations. Specifies the path and directory name of the Helm charts directory. Include the final slash indicating the directory. For example:

--repo ./charts/

decks-install unapply

Marks applications for removal from the Kubernetes cluster and starts the synchronization process. Use this command to uninstall SDP from a cluster.

Syntax

decks-install unapply --kustomize <manifest-path> --repo <charts-path> [--dry-run] [--skip-sync] [--simple-output]

Usage

Use this command if you need to start over with a completely new SDP installation due to corruption or a major system failure.

If you run decks-install unapply against the same manifest bundle used for installation, it uninstalls all SDP components and it deletes all SDP data. When command execution completes, you have an empty Kubernetes cluster. You can then start over with a new installation into that cluster.

WARNING: In SDP, this process deletes all user data that was ingested into Pravega.
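For example, a typical uninstall invocation, assuming the manifest and chart directories that were used for the original installation:

./decks-install unapply --kustomize ./manifests/ --repo ./charts/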

Options

--kustomize

Required. Specifies the location of the manifest bundle that defines the applications to uninstall. Include the slash to indicate a directory. For example:

--kustomize ./manifests/

Manifest files must conform to Kubernetes Kustomize format, as described in the Kubernetes Kustomize documentation.

--repo


Specifies the location of the Helm charts directory to reconcile with. Include the slash to indicate a directory. For example:

--repo ./charts/

The charts are artifacts originally delivered in the root of the installer zip file, under charts/.

--dry-run Currently not used.

--skip-sync If specified, prevents the synchronization process from starting. You can start the synchronization process later, using the decks-install sync command.

The apply step adds CRDs and applications to the cluster in a pending state. The synchronization step reconciles the applications to the desired state.

--simple-output Displays logs to standard out and standard error, which are typically on the command line terminal.


To be able to print Dell Streaming Data Platform 1.4 Software Installation Guide, simply download the document to your computer. Once downloaded, open the PDF file and print the Dell Streaming Data Platform 1.4 Software Installation Guide as you would any other document. This can usually be achieved by clicking on “File” and then “Print” from the menu bar.