
Dell EMC Streaming Data Platform Installation and Administration Guide

Version 1.2

July 2021

Notes, cautions, and warnings

NOTE: A NOTE indicates important information that helps you make better use of your product.

CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.

WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

© 2019 - 2021 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.

Contents

Chapter 1: Product Description  8
    Product summary  8
    Product components  9
        About Pravega  9
        About analytic engines and Pravega connectors  10
        Management plane  11
    Deployment options  11
    Component deployment matrix  12
    Architecture and supporting infrastructure  12
    Product highlights  14
    More features  15
    Basic terminology  17
    Interfaces  17
        User Interface (UI)  18
        Grafana dashboards  18
        Apache Flink Web UI  19
        Apache Spark Web UI  20
        APIs  20
    What you get with SDP  20
    Use case examples  21
    Documentation resources  21

Part I: Install SDP Edge  23

Chapter 2: Download installation files  24
    Supported SDP Edge deployment options  24
    Download files for SDP Edge installation  24
    Prerequisites for SDP Edge  24

Chapter 3: Install Ubuntu on Bare Metal  26
    Install Ubuntu on bare metal  26
    Fix the drive used for the boot disk  29

Chapter 4: Deploy SDP Edge  31
    Required customer information  31
    Installation  35
    Configure UI access  37
    Add trusted CA to browser  39
    Get SDP URL and login credentials  39

Chapter 5: Manage SDP Edge  41
    Add new user in SDP Edge  41
    Create a project  42
    Set retention size on streams  43
    Shutdown and restart the Kubespray cluster  43
    Add a node  44
    Remove a node  44
    Upgrade from single-node to 3-node deployment  45
    Backup  45
    Recover the control plane  45

Part II: Install SDP Core  47

Chapter 6: Site Prerequisites  48
    Obtain and save the license file  48
    Deploy SRS Gateway  48
    Set up local DNS server  49
    Provision long-term storage on PowerScale  49
    Provision long-term storage on ECS  49

Chapter 7: Configuration Values  51
    About configuration values files  51
    Prepare configuration values files  52
    Source control the configuration values files  52
    Validate the configuration values  52
    Configure global platform settings  52
    TLS configuration details  54
        Enable or disable TLS  54
        Configure TLS using certificates from Let's Encrypt  55
        Self-signed certificates  55
        Configure TLS using signed certificates from a Certificate Authority  56
    Configure connections to a local DNS  58
    Configure long-term storage on PowerScale  59
    Configure long-term storage on ECS  59
    Configure or remove connection to the SRS Gateway  64
    Enable periodic telemetry upload to SRS  64
    Configure default admin password  65

Chapter 8: Install SDP Core  67
    Download installation files  67
    Install required infrastructure (RHEL and CoreOS)  67
    Unzip installation files  68
    Prepare the working environment  68
    Push images into the registry  68
    Run the prereqs.sh script  69
    Prepare self-signed SSL certificate  69
    Run pre-install script  70
    Run the validate-values script  71
    Install SDP  72
    Run the post-install script  72
    (Optional) Validate self-signed certificates  73
    Obtain connection URLs  74

Part III: Manage SDP  76

Chapter 9: Post-install Configuration and Maintenance  77
    Obtain default admin credentials  77
    Configure federated user accounts  78
        Configure an LDAP identity provider  78
        Configure Keycloak for LDAP federation  79
    Add Pravega alerts to event collection  80
    Temporarily disable SRS connectivity or telemetry uploads  81
    Verify telemetry cron job  82
    Update the default password for SRS remote access  82
    Ensure system availability when a node is down  82
    Change applied configuration  83
    Graceful shutdown and startup  84
    Uninstall applications  86
    Reinstall into existing cluster  87
    Change ECS credentials after installation  87

Chapter 10: Manage Connections and Users  90
    Obtain connection URLs  90
    Connect and login to the web UI  91
    Log in to OpenShift for cluster-admins  91
    Log in to OpenShift command line for non-admin users  92
    Create a user  92
        Add new local user on the Keycloak UI  92
    Assign roles  93
    User password changes  93
        Change password in Keycloak  93

Chapter 11: Expand and Scale the Infrastructure  95
    Difference between expansion and scaling  95
    Expansion  95
        Determine expansion requirements  95
        Add new rack  96
        Add nodes to the OpenShift cluster  96
        Add supporting storage  96
    Scaling  96
        Get scaling recommendations  96
        Scale the K8s cluster  97
        Scale SDP  98
        Scale Apache Flink resources  99
        Impact of cluster expansion and scaling  100

Chapter 12: Manage Projects  102
    Top-level navigation in the UI  102
    Naming requirements  103
    Manage projects  104
        About projects  104
        Create a project  104
        Create a project manually  105
        Delete a project  107
        Add or remove project members  108
        List projects and view project contents  108
        What's next with projects  110
    Manage scopes and streams  110
        About scopes and streams  110
        Create and manage streams  111
        Stream configuration attributes  111
        Manage cross project scope sharing  112
        Start and stop stream ingestion  113
        Monitor stream ingestion  113

Chapter 13: Monitor Health  114
    Monitor licensing  114
    Temporarily disable SRS connectivity or telemetry uploads  115
    Monitor and manage events  116
    Run health-check  116
    Monitor Pravega health  117
    Monitor stream health  117
    Monitor Apache Flink clusters and applications  118
    Monitor Pravega Search resources and health  119
    Logging  119

Chapter 14: Use Pravega Grafana Dashboards  120
    Grafana dashboards overview  120
    Connect to the Pravega Grafana UI  121
    Retention policy and time range  122
    Pravega System dashboard  123
    Pravega Operation Dashboard  125
    Pravega Scope dashboard  127
    Pravega Stream dashboard  128
    Pravega Segment Store Dashboard  131
    Pravega Controller Dashboard  132
    Pravega Alerts dashboard  132
    Custom queries and dashboards  133
    InfluxDB Data  134

Chapter 15: Troubleshooting  136
    View versions of system components  136
    Kubernetes resources  136
        Namespaces  136
        Components in the nautilus-system namespace  137
        Components in the nautilus-pravega namespace  138
        Components in project namespaces  138
        Components in cluster-monitoring namespace  138
        Components in the catalog namespace  139
    Log files  139
    Useful troubleshooting commands  140
        OpenShift client commands  140
        helm commands  140
        kubectl commands  140
    FAQs  142
    Application connections when TLS is enabled  145
    Online and remote support  146

Part IV: Reference Information  147

Chapter 16: Configuration Values File Reference  148
    Template of configuration values file  148

Chapter 17: Summary of Scripts  156
    Summary of scripts  156

Chapter 18: Installer command reference  158
    Prerequisites  158
    Command summary  158
    decks-install apply  159
    decks-install config set  161
    decks-install push  162
    decks-install sync  162
    decks-install unapply  163

Chapter 1: Product Description

Topics:

● Product summary
● Product components
● Deployment options
● Component deployment matrix
● Architecture and supporting infrastructure
● Product highlights
● More features
● Basic terminology
● Interfaces
● What you get with SDP
● Use case examples
● Documentation resources

Product summary

Dell EMC Streaming Data Platform (SDP) is an autoscaling software platform for ingesting, storing, and processing continuously streaming unbounded data. The platform can process both real-time and collected historical data in the same application.

SDP ingests and stores streaming data from sources such as Internet of Things (IoT) devices, web logs, industrial automation, financial data feeds, live video, social media feeds, and applications. It also ingests and stores event-based streams. It can process multiple data streams from multiple sources while ensuring low latencies and high availability.

The platform manages stream ingestion and storage and hosts the analytic applications that process the streams. It dynamically distributes data-throughput processing and analytical jobs across the available infrastructure. It also dynamically autoscales storage resources to handle requirements in real time as the streaming workload changes.

SDP supports the concept of projects and project isolation or multi-tenancy. Multiple teams of developers and analysts all use the same platform, but each team has its own working environment. The applications and streams that belong to a team are protected from write access by other users outside of the team. Cross-team stream data sharing is supported in read-only mode.

SDP integrates the following capabilities into one software platform:

● Stream ingestion: The platform is an autoscaling ingestion engine. It ingests all types of streaming data, including unbounded byte streams and event-based data, in real time.

● Stream storage: Elastic tiered storage provides instant access to real-time data, access to historical data, and near-infinite storage.

● Stream processing: Real-time stream processing is possible with an embedded analytics engine. Your stream processing applications can perform functions such as:
  - Process real-time and historical data.
  - Process a combination of real-time and historical data in the same stream.
  - Create and store new streams.
  - Send notifications to enterprise alerting tools.
  - Send output to third-party visualization tools.

● Platform management: Integrated management provides data security, configuration, access control, resource management, an easy upgrade process, stream metrics collection, and health and monitoring features.

● Run-time management: A web-based User Interface (UI) allows authorized users to configure stream properties, view stream metrics, run applications, view job status, and monitor system health.

● Application development: The product distribution includes APIs. The web UI supports application deployment and artifact storage.



Product components

SDP is a software-only platform consisting of integrated components, supporting APIs, and Kubernetes Custom Resource Definitions (CRDs). This product runs in a Kubernetes environment.

Figure 1. SDP main components

Pravega Pravega is the stream store in SDP. It handles ingestion and storage for continuously streaming unbounded byte streams. Pravega is an Open Source Software project, which is sponsored by Dell EMC.

Unified Analytics SDP includes the following embedded analytic engines for processing your ingested stream data.

Apache Flink Apache Flink is an embedded stream processing engine in SDP. Dell EMC distributes Docker images from the Apache Flink Open Source Software project.

SDP ships with images for Apache Flink. It also supports custom Flink images.

Apache Spark Apache Spark is a unified analytics engine for large-scale data processing.

SDP ships with images for Apache Spark.

Pravega Search Pravega Search provides query features on the data in Pravega streams. It supports filtering and tagging incoming data as it is ingested as well as searching stored data.

For supported analytic engine image versions for this SDP release, see Component deployment matrix on page 12.

Management platform

The management platform is Dell EMC proprietary software. It integrates the other components and adds security, performance, configuration, and monitoring features.

User interface The management plane provides a comprehensive web-based user interface for administrators and application developers.

Metrics stacks SDP deploys InfluxDB databases and Grafana instances for metrics visualization. Separate stacks are deployed for Pravega and for each analytic project.

Pravega schema registry

The schema registry provides a serving and management layer for storing and retrieving schemas for Pravega streams.

APIs Various APIs are included in the SDP distributions. APIs for Spark, Flink, Pravega, Pravega Search, and Pravega Schema Registry are bundled in this SDP release.

About Pravega

The Open Source Pravega project was created specifically to support streaming applications that handle large amounts of continuously arriving data.

In Pravega, the stream is a core primitive. Pravega ingests unbounded streaming data in real time and coordinates permanent storage.


Pravega user applications are known as Writers and Readers. Pravega Writers are applications that use the Pravega API to ingest streaming data from one or more data sources into SDP. The platform ingests and stores the streams. Pravega Readers read data from the Pravega store.
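For illustration only, the following is a minimal sketch of a Pravega Writer built with the open-source Pravega Java client. The controller endpoint, scope name (examples), stream name (sensor-events), routing key, and payload are placeholder values, not settings defined by SDP.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class SensorWriter {
    public static void main(String[] args) {
        // Placeholder controller endpoint; the scope and stream must already exist.
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller:9090"))
                .build();

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "sensor-events", new UTF8StringSerializer(),
                     EventWriterConfig.builder().build())) {
            // The routing key ("sensor-42") keeps events from the same sensor in order.
            writer.writeEvent("sensor-42", "{\"temperature\": 21.5}").join();
        }
    }
}
```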

Pravega streams are based on an append-only log data structure. By using append-only logs, Pravega rapidly ingests data into durable storage. Pravega handles all types of streams, including:

● Unbounded or bounded streams of data
● Streams of discrete events or a continuous stream of bytes
● Sensor data, server logs, video streams, or any other type of information

Pravega seamlessly coordinates a two-tiered storage system for each stream. Apache BookKeeper (called Tier 1) temporarily stores the recently ingested tail of a stream. Long-term storage (sometimes called Tier 2) resides in a configured alternate location. You can configure streams with specific data retention periods.

An application, such as a Java program reading from an IoT sensor, writes data to the tail of the stream. Apache Flink applications can read from any point in the stream. Multiple applications can read and write the same stream in parallel. Some of the important design features in Pravega are:

● Elasticity, scalability, and support for large volumes of streaming data
● Preserved ordering and exactly-once semantics
● Data retention based on time or size
● Durability
● Transaction support

Applications can access data in real time or past time in a uniform fashion. The same paradigm (the same API call) accesses both real-time and historical data in Pravega. Applications can also wait for data that is associated with any arbitrary time in the future.
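As a companion sketch, a Pravega Reader built with the same open-source Java client reads historical data and newly arriving tail data through the same readNextEvent call. The reader group, scope, and stream names below are again illustrative placeholders.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventRead;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class SensorReader {
    public static void main(String[] args) {
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller:9090"))
                .build();

        // A reader group tracks the group's position in the stream.
        try (ReaderGroupManager rgManager =
                     ReaderGroupManager.withScope("examples", clientConfig)) {
            rgManager.createReaderGroup("sensor-readers",
                    ReaderGroupConfig.builder()
                            .stream(Stream.of("examples", "sensor-events"))
                            .build());
        }

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig);
             EventStreamReader<String> reader = factory.createReader(
                     "reader-1", "sensor-readers",
                     new UTF8StringSerializer(), ReaderConfig.builder().build())) {
            // readNextEvent blocks up to the timeout; historical and tail data
            // are returned through the same call. Stops when nothing arrives in time.
            EventRead<String> event;
            while ((event = reader.readNextEvent(2000)).getEvent() != null) {
                System.out.println("Read: " + event.getEvent());
            }
        }
    }
}
```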

Specialized software connectors provide access to Pravega. For example, a Flink connector provides Pravega data to Flink jobs. Because Pravega is an Open Source project, it can potentially connect to any analytics engine with community-contributed connectors.
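For example, a Flink job can consume a Pravega stream as a source through the open-source Pravega Flink connector. The sketch below assumes that connector's builder API; the controller URI, scope, and stream names are placeholders.

```java
import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.net.URI;

public class PravegaFlinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Connector configuration; controller URI and scope are placeholders.
        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://pravega-controller:9090"))
                .withDefaultScope("examples");

        // Source that reads the "sensor-events" stream as strings.
        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("sensor-events")
                .withDeserializationSchema(new SimpleStringSchema())
                .build();

        DataStream<String> events = env.addSource(source);
        events.print();

        env.execute("pravega-flink-example");
    }
}
```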

Pravega is unique in its ability to handle unbounded streaming bytes. It is a high-throughput, autoscaling real-time store that preserves key-based ordering of continuously streaming data and guarantees exactly-once semantics. It infinitely tiers ingested data into long-term storage.

For more information about Pravega, see http://www.pravega.io.

About analytic engines and Pravega connectors

SDP includes analytic engines and connectors that enable access to Pravega streams.

Analytic engines run applications that analyze, consolidate, or otherwise process the ingested data.

Apache Flink Apache Flink is a high-throughput, stateful analytics engine with precise control of time and state. It is an emerging market leader for processing stream data. Apache Flink provides a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It performs computations at in-memory speed and at any scale. The order of data is preserved during processing.

The Flink engine accommodates many types of stream processing models, including:

● Continuous data pipelines for real-time analysis of unbounded streams
● Batch processing
● Publisher/subscriber pipelines

The SDP distribution includes Apache Flink APIs that can process continuous streaming data, sets of historical batch data, or combinations of both.

For more information about Apache Flink, see https://flink.apache.org/.

Apache Spark™ Apache Spark provides a dataflow engine in which the user expresses the required flow using transformations and actions. Data is handled through a Resilient Distributed Dataset (RDD), an immutable, partitioned dataset that transformations and actions operate on. An application's dataflow graph is broken down into stages, with each stage creating a new RDD. Importantly, an RDD is not a materialized view on disk but an in-memory representation of the data held within the Spark cluster that later stages can process.
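To make the RDD model concrete, the following minimal sketch uses the standard Spark Java API (in local mode purely for illustration): each transformation derives a new in-memory RDD, and an action triggers execution of the staged dataflow. The values and application name are arbitrary.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("rdd-example")
                .setMaster("local[*]"); // local mode for illustration only

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> readings = Arrays.asList(18, 21, 25, 30, 27);

            // Each transformation derives a new (in-memory, not materialized) RDD.
            JavaRDD<Integer> rdd = sc.parallelize(readings);
            JavaRDD<Integer> high = rdd.filter(value -> value > 24);

            // Actions such as count() trigger execution of the staged dataflow.
            System.out.println("High readings: " + high.count());
        }
    }
}
```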


The SDP distribution includes Apache Spark APIs that can process streaming data, sets of historical batch data, or combinations of both. Spark has two processing modes: batch processing and micro-batch streaming.

For more information about Apache Spark, see https://spark.apache.org.

Other analytic engines and Pravega connectors

You may develop custom Pravega connectors to enable client applications to read from and write real-time data to Pravega streams. For more information, see the SDP Code Hub.

Management plane

The SDP management plane coordinates the interoperating functions of the other components.

The management plane deploys and manages components in the Kubernetes environment. It coordinates security, authentication, and authorization. It manages Pravega streams and the analytic applications in a single platform.

The web-based UI provides a common interface for all users. Developers can upload and update application images. All project members can manage streams and processing jobs. Administrators can manage resources and user access.

Some of the features of the management plane are:

● Integrated data security, including TLS encryption, multilevel authentication, and role-based access control (RBAC)
● Project-based isolation for team members and their respective streams and applications
● Read-only sharing of stream data across projects
● Flink cluster and application management
● Spark application management
● Pravega streams management
● Stream data schema management and evolution history with a Schema Registry
● DevOps-oriented platform for modern software development and delivery
● Integrated Kubernetes container environment
● Application monitoring and direct access to the Apache Flink or Apache Spark web-based UIs
● Direct access to predefined Grafana dashboards for Pravega
● Direct access to project-specific predefined Grafana dashboards showing operational metrics for Flink, Spark, and Pravega Search clusters

Deployment options

Streaming Data Platform supports options for deployment at the network edge and in the data center core.

SDP Edge Deploying at the Edge, near gateways or sensors, has the advantage of local ingestion, transformation, and alerting. Data is processed, filtered, or enriched before transmission upstream to the Core.

SDP Edge is a small footprint deployment, requiring fewer minimum resources (CPU cores, RAM, and storage). It supports configurations of 1 or 3 nodes.

● Single-node deployment is an extremely low cost option for development, proof of concept, or for production use cases where high availability (HA) is not required. Single-node deployment operates without any external long-term storage, using only node disks for storage.

● 3-node deployments provide HA at the Edge for local data ingestion that cannot tolerate downtime. This deployment can use node disks or PowerScale for long-term storage.

SDP Core SDP Core provides all the advantages of on-premise data collection, processing, and storage. SDP Core is intended for data center deployment with full size servers. It handles larger data ingestion needs and also accepts data collected by SDP Edge and streamed up to the Core.

HA is built into all deployments. Deployments start with a minimum of 3 nodes and can expand up to 12 nodes, with built-in scaling of added resources. Recommended servers have substantially more resources than those for SDP Edge. Long-term storage is provided by Dell EMC PowerScale clusters or Dell EMC ECS appliances. Multi-tenant use cases are typical. Other intended use cases build models across larger data sets and ingest large amounts of data.


Component deployment matrix

This matrix shows the differences between SDP Edge and SDP Core deployments.

Table 1. Component deployment matrix

Component | SDP Core 1.2 | SDP Edge 1.2
--------- | ------------ | ------------
Apache Flink | Ships with versions 1.11.2-2.12 | Ships with versions 1.11.2-2.12
Apache Spark | Ships with versions 2.4.7 and 3.0.1 | Ships with versions 2.4.7 and 3.0.1
Kubernetes platform | OpenShift 4.6 | Kubespray 2.14.2
Minimum number of nodes | 3 | 1
Maximum number of nodes | 12 | 3
Container runtime | CRI-O 1.19.0 | Docker 19.03
Operating system | RHEL 8.3, CoreOS 4.6 | Ubuntu 18.04
Long-term storage option: Dell EMC PowerScale | Gen5 or later hardware; OneFS 8.2.x or 9.x software with NFSv4.0 enabled | Gen5 or later hardware; OneFS 8.2.x or 9.x software with NFSv4.0 enabled. Single-node deployment supports local long-term storage
Long-term storage option: Dell EMC ECS | ECS object storage appliance with ECS 3.5.1.4 and later, 3.6.1.1 and later, or 3.7.0.9 and later | Not supported
Secure Remote Services (SRS) Gateway | SRS 3.38 (minimum) | Optional. If used, SRS 3.38 (minimum)

Architecture and supporting infrastructure

The supporting infrastructure for SDP supplies storage and compute resources and the network infrastructure.

SDP is a software-only solution. The customer obtains the components for the supporting infrastructure independently.

For each SDP version, the reference architecture includes specific products that are tested and verified. The reference architecture is an end-to-end enterprise solution for stream processing use cases. Your Dell EMC sales representative can provide appropriate reference architecture solutions for your expected use cases.

A general description of the supporting infrastructure components follows.

Reference Hardware

SDP runs on bare metal servers using custom operating system software provided in the SDP distribution. SDP Edge runs on Ubuntu. SDP Core runs on Red Hat Enterprise Linux CoreOS.

Network A network is required for communication between the nodes in the SDP cluster and for external clients to access cluster applications.

Local storage Local storage is required for various system functions. The Dell support team helps size storage needs based on intended use cases.

Long-term storage

Long-term storage for stream data is required and is configured during installation. Long-term storage is any of the following:

● SDP Core production solutions require an elastic scale-out storage solution. You may use either of the following for long-term storage:
  - A file system on a Dell EMC PowerScale cluster
  - A bucket on a Dell EMC ECS appliance
● SDP Edge production solutions use a file system on a Dell EMC PowerScale cluster.
● For testing, development, or use cases where only temporary storage is needed, long-term storage may be defined as a file system on a local mount point.


Kubernetes container environment included with SDP

SDP runs in a Kubernetes container environment. The container environment isolates projects, efficiently manages resources, and provides authentication and RBAC services. The required Kubernetes environments are provided with SDP distributions and are installed and configured as part of SDP installation:

● SDP Edge runs in Kubespray.
● SDP Core runs in Red Hat OpenShift.

The following figures show the supporting infrastructure in context with SDP.

Figure 2. SDP Core architecture

Figure 3. SDP Edge architecture


Product highlights

SDP includes the following major innovations and unique capabilities.

Enterprise-ready deployment

SDP is a cost effective, enterprise-ready product. This software platform, running on a recommended reference architecture, is a total solution for processing and storing streaming data. With SDP, an enterprise can avoid the complexities of researching, testing, and creating an appropriate infrastructure for processing and storing streaming data. The reference architecture consists of both hardware and software. The resulting infrastructure is scalable, secure, manageable, and verified. Dell EMC defines the infrastructure and provides guidance in setting it up. In this way, SDP dramatically reduces time to value for an enterprise.

SDP provides integrated support for a robust and secure total solution, including fault tolerance, easy scalability, and replication for data availability.

With SDP, Dell EMC provides the following deployment support:

● Recommendations for the underlying hardware infrastructure
● Sizing guidance for compute and storage to handle your intended use cases
● End-to-end guidance for setting up the reference infrastructure, including switching and network configuration (trunks, VLANs, management and data IP routes, and load balancers)
● Comprehensive image distribution, consisting of customized images for the operating system, supporting software, SDP software, and API distributions for developers
● Integrated installation and configuration for underlying software components (Docker, Helm, Kubernetes) to ensure alignment with SDP requirements

The result is an ecosystem ready to ingest and store streams, and ready for your developers to code and upload applications that process those streams.

Unbounded byte stream ingestion, storage, and analytics

Pravega was designed from the outset to handle unbounded byte stream data.

In Pravega, the unbounded byte stream is a primitive structure. Pravega stores each stream (any type of incoming data) as a single persistent stream, from ingestion to long-term storage, like this:

● Recent tail: The real-time tail of a stream exists on Tier 1 storage.
● Long-term: The entire stream is stored on long-term storage (also called Tier 2 storage in Pravega).

Applications use the same API call to access real-time data (the recent tail on Tier 1 storage) and all historical data on long-term storage.

In Apache Flink or Spark applications, the basic building blocks are streams and transformations. Conceptually, a stream is a potentially never-ending flow of data records. A transformation is an operation that takes one or more streams as input and produces one or more output streams. In both applications, non-streaming data is treated internally as a stream.
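As a concrete illustration of streams and transformations, the following minimal Flink sketch treats a small bounded collection as a stream and derives a new stream from it. The values and job name are arbitrary placeholders.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformationExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // A bounded collection is treated as a stream, just like unbounded input.
        DataStream<Integer> readings = env.fromElements(18, 21, 25, 30, 27);

        // Transformations take one stream as input and produce another stream.
        DataStream<String> alerts = readings
                .filter(value -> value > 24)
                .map(value -> "High temperature reading: " + value);

        alerts.print();
        env.execute("transformation-example");
    }
}
```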

By integrating these products, SDP creates a solution that is optimized for processing unbounded streaming bytes. The solution is similarly optimized for bounded streams and more traditional static data.

High throughput stream ingestion

Pravega enables the ingestion capacity of a stream to grow and shrink according to workload. During ingestion Pravega splits a stream into partitions to handle a heavy traffic period, and then merges partitions when traffic is less. Splitting and merging occurs automatically and continuously as needed. Throughout, Pravega preserves order of data.
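As an illustrative sketch only, a stream can be created with an auto-scaling policy through the open-source Pravega Java client. The event-rate target, scale factor, minimum segment count, and names below are placeholder values, not SDP recommendations.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import java.net.URI;

public class AutoScalingStream {
    public static void main(String[] args) {
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller:9090"))
                .build();

        // Split segments when a segment sustains more than 1000 events/second,
        // scale by a factor of 2, and never drop below 2 segments.
        StreamConfiguration config = StreamConfiguration.builder()
                .scalingPolicy(ScalingPolicy.byEventRate(1000, 2, 2))
                .build();

        try (StreamManager streamManager = StreamManager.create(clientConfig)) {
            streamManager.createScope("examples"); // no-op if the scope exists
            streamManager.createStream("examples", "sensor-events", config);
        }
    }
}
```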

Stream filtering on ingestion

PSearch continuous queries process data as it is ingested, providing a way to filter out unwanted data before it is stored, or to enrich the data with tagging before it is stored.

Stream search PSearch queries can search an entire stored stream of structured or unstructured data.

Exactly-once semantics

Pravega is designed with exactly-once semantics as a goal. Exactly-once semantics means that, in a given stream processing application, no event is skipped or duplicated during the computations.

Key-based guaranteed order

Pravega guarantees key-based ordering. Information in a stream is keyed in a general way (for example, by sensor or other application-provided key). SDP guarantees that values for the same key are stored and processed in order. The platform, however, is free to scale the storage and processing across keys without concern for ordering.

The ordering guarantee supports use cases that require order for accurate results, such as in financial transactions.


Massive data volume

Pravega accommodates massive data ingestion. In the reference architecture, Dell EMC hardware solutions support the data processing and data storage components of the platform. All the processing and storage reference hardware is easily scaled out by adding nodes.

Batch and publish/subscribe models supported

Pravega, Apache Spark, and Apache Flink support the more traditional batch and publish/subscribe pipeline models. Processing for these models includes all the advantages and guarantees that are described for the continuous stream models.

Pravega ingests and stores any type of stream, including:

● Unbounded byte streams, such as data streamed from IoT devices
● Bounded streams, such as movies and videos
● Unbounded append-type log files
● Event-based input, streaming or batched

In Apache Flink and Apache Spark, all input is a stream. Both process table-based input and batch input as a type of stream.

ACID-compliant transaction support

The Pravega Writer API supports Pravega transactions. The Writer can collect events, persist them, and decide later whether to commit them as a unit to a stream. When the transaction is committed, all data that was written to the transaction is atomically appended to the stream.

The Writer might be an Apache Flink or other application. As an example, an application might continuously process data and produce results, using a Pravega transaction to durably accumulate the results. At the end of a time window, the application might commit the transaction into the stream, making the results of the processing available for downstream processing. If an error occurs, the application cancels the transaction and the accumulated processing results disappear.

Developers can combine transactions and other features of Pravega to create a chain of Flink jobs. The Pravega-based sink for one job is the source for a downstream Flink job. In this way, an entire pipeline of Flink jobs can provide end-to-end exactly-once processing with guaranteed ordering of data.

In addition, applications can coordinate transactions across multiple streams. A Flink job can use two or more sinks to provide source input to downstream Flink jobs.
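A minimal sketch of this transactional pattern, using the open-source Pravega Java client, is shown below. The writer ID, scope, stream, and payloads are placeholders; events become visible to readers only when the transaction commits.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Transaction;
import io.pravega.client.stream.TransactionalEventStreamWriter;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class TransactionalWriter {
    public static void main(String[] args) throws Exception {
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller:9090"))
                .build();

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig);
             TransactionalEventStreamWriter<String> writer =
                     factory.createTransactionalEventWriter("writer-1", "sensor-events",
                             new UTF8StringSerializer(),
                             EventWriterConfig.builder().build())) {

            Transaction<String> txn = writer.beginTxn();
            try {
                // Events written to the transaction are durable but not yet
                // visible to readers.
                txn.writeEvent("sensor-42", "{\"temperature\": 21.5}");
                txn.writeEvent("sensor-42", "{\"temperature\": 22.0}");

                // Commit appends all accumulated events to the stream atomically.
                txn.commit();
            } catch (Exception e) {
                // Abort discards the accumulated events.
                txn.abort();
                throw e;
            }
        }
    }
}
```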

Pravega achieves ACID compliance as follows:

● Atomicity and Consistency are achieved in the basic implementation. A transaction is a set of events that is collectively either added into a stream (committed) or discarded (aborted) as a batch.

● Isolation is achieved because the transactional events are never visible to any readers until the transaction is committed into a stream.

● Durability is achieved when an event is written into the transaction and acknowledged back to the writer. Transactions are implemented in the same way as stream segments. Data that is written to a transaction is as durable as data written directly to a stream.

Security Access to SDP and the data it processes is strictly controlled and integrated throughout all components.

● Authentication is provided through both Keycloak and LDAP.
● Kubernetes and Keycloak role-based access control (RBAC) protect resources throughout the platform.
● TLS controls external access.
● Within the platform, the concept of a project defines and isolates resources for a specific analytic purpose. Project membership controls access to those resources.

For information about these and other security features, see the Dell EMC Streaming Data Platform Security Configuration Guide at https://dl.dell.com/content/docu103273.

More features

Here are additional important capabilities in SDP.

Fault tolerance The platform is fault tolerant in the following ways:

● All components use persistent volumes to store data.
● Kubernetes abstractions organize containers in a fault-tolerant way. Failed pods restart automatically, and deleted pods are created automatically.
● Certain key components, such as Keycloak, are deployed in "HA" mode by default. In the Keycloak case, three Keycloak pods are deployed, clustered together, to provide near-uninterrupted access even if a pod goes down.

Data retention and data purge

Pravega includes the following ways to purge data, per stream:

● A manual trigger in an API call specifies a point in a stream beyond which data is purged.
● An automatic purge may be based on the size of the stream.
● An automatic purge may be based on time.
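As a hedged sketch, size- or time-based retention can be expressed in the stream configuration through the open-source Pravega Java client. The 100 GB size and the names below are placeholder values, and the example assumes the stream already exists.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.RetentionPolicy;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import java.net.URI;

public class RetentionExample {
    public static void main(String[] args) {
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://pravega-controller:9090"))
                .build();

        try (StreamManager streamManager = StreamManager.create(clientConfig)) {
            // Size-based retention: keep roughly the most recent 100 GB.
            // A time-based policy would use RetentionPolicy.byTime(...) instead.
            streamManager.updateStream("examples", "sensor-events",
                    StreamConfiguration.builder()
                            .scalingPolicy(ScalingPolicy.fixed(1))
                            .retentionPolicy(
                                    RetentionPolicy.bySizeBytes(100L * 1024 * 1024 * 1024))
                            .build());
        }
    }
}
```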

Historical data processing

Historical stream processing supports:

● Stream cuts: Set a reading start point.

Apache Flink job management

Authorized users can monitor, start, stop, and restart Apache Flink jobs from the SDP UI. The Apache Flink savepoint feature permits a restarted job to continue processing a stream from where it left off, guaranteeing exactly-once semantics.

Apache Spark job management

Authorized users can monitor, start, stop, and restart Apache Spark jobs from the SDP UI.

Monitoring and reporting

From the SDP UI, administrators can monitor the state of all projects and streams. Other users (project members) can monitor their specific projects.

Dashboard views on SDP UI show recent Pravega ingestion metrics, read and write metrics on streams, and long-term storage metrics.

Heat maps of Pravega streams show segments as they are split and merged, to help with resource allocation decisions.

Stream metrics show throughput, reads and writes per stream, and transactional metrics such as commits and aborts.

Latencies at the segment store host level are available, aggregated over all segment stores.

The following additional UIs are linked from the SDP UI.

Project members can jump directly to the Flink Web UI that shows information about their jobs. The Apache Flink Web UI monitors Flink jobs as they are running.

Project members can jump directly to the Spark Web UI that shows information about their jobs. The Apache Spark Web UI monitors Spark jobs as they are running.

Administrators can jump directly to the Grafana UI with a predefined plug-in for Pravega metrics. Administrators can view Pravega JVM statistics, and examine stream throughputs and latency metrics.

Project members can jump to the project-specific Grafana UI (if the project was deployed with metrics) to see Flink, Spark, and Pravega Search operational metrics.

Project members can jump directly to the Kibana UI from a Pravega Search cluster page.

Logging Kubernetes logging is implemented in all SDP components.

Remote support Secure Remote Services and call home features are supported for SDP. These features require an SRS Gateway server that is configured to monitor the platform. Detected problems are forwarded to Dell Technologies as actionable alerts, and support teams can remotely connect to the platform to help with troubleshooting.

Event reporting Services in SDP collect events and display them in the SDP UI. The UI offers search and filtering on the events, including a way to mark them as acknowledged. In addition, some critical events are forwarded to the SRS Gateway.

SDP Code Hub The SDP Code Hub is a centralized portal to help application developers get started with SDP applications. Developers can browse and download example applications and code templates, get Pravega connectors, and view demos. Applications and templates from Dell EMC teams include Pravega samples, Flink samples, Spark samples, and API templates. See the Code Hub at https://streamingdataplatform.github.io/code-hub/.

Schema Registry Schema Registry provides a serving and management layer for storing and retrieving schemas for application metadata. A shared repository of schemas allows applications to flexibly interact with each other and store schemas for Pravega streams.


Basic terminology The following terms are basic to understanding the workflows supported by SDP.

Pravega scope The Pravega concept for a collection of stream names. RBAC for Pravega operates at the scope level.

Pravega stream A durable, elastic, append-only, unbounded sequence of bytes that has good performance and strong consistency. A stream is uniquely identified by the combination of its name and scope. Stream names are unique within their scope.

Pravega event A collection of bytes within a stream. An event has identifying properties, including a routing key, so it can be referenced in applications.

Pravega writer A software application that writes data to a Pravega stream.

Pravega reader A software application that reads data from a Pravega stream. Reader groups support distributed processing.

Flink application An analytic application that uses the Apache Flink API to process one or more streams. Flink applications may also be Pravega Readers and Writers, using the Pravega APIs for reading from and writing to streams.

Flink job Represents an executing Flink application. A job consists of many executing tasks.

Flink task A Flink task is the basic unit of execution. Each task is executed by one thread.

Spark application An analytic application that uses the Apache Spark API to process one or more streams.

Spark job Represents an executing Spark application. A job consists of many executing tasks.

Spark task A Spark task is the basic unit of execution. Each task is executed by one thread.

RDD Resilient Distributed Dataset. The basic abstraction in Spark that represents an immutable, partitioned collection of elements that can be operated on in parallel.

Project An SDP concept. A project defines and isolates resources for a specific analytic purpose, enabling multiple teams of people to work within SDP in separate project environments.

Project member An SDP user with permission to access the resources in a specific project.

Kubernetes environment

The underlying container environment in which all SDP services run. The Kubernetes environment is abstracted from end-user view. Administrators can access the Kubernetes layer for authentication and authorization settings, to research performance, and to troubleshoot application execution.

Schema registry A registry service that manages schemas & codecs. It also stores schema evolution history. Each stream is mapped to a schema group. A schema group consists of schemas & codecs that are associated with applications.

Pravega Search cluster

Resources that process Pravega Search indexing, searches, and continuous queries.

Interfaces SDP includes the following interfaces for developers, administrators, and data analysts.

Table 2. Interfaces in SDP

Interface Purpose

SDP User Interface Configure and manage streams and analytic jobs. Upload analytic applications.

Pravega Grafana custom dashboards Drill into metrics for Pravega.

Apache Flink Web User Interface Drill into Flink job status.

Apache Spark Web User Interface Drill into Spark job status.

Keycloak User Interface Configure security features.

Pravega and Apache Flink APIs Application development.


Project-specific Grafana custom dashboards Drill into metrics for Flink, Spark, and Pravega Search clusters.

Project-specific Kibana Web User Interface Submit Pravega Search queries.

In addition, users may download the Kubernetes CLI (kubectl) for research and troubleshooting for the SDP cluster and its resources. This includes support for the SDP custom resources, such as projects.
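For example, after downloading kubectl and obtaining the cluster kubeconfig, an administrator could list the SDP custom resource types and the projects defined on the cluster. The resource name project shown here is an assumption for illustration; use the resource names that your cluster actually reports.

# List custom resource types related to projects (names may differ by release)
kubectl api-resources | grep -i project

# List project resources across all namespaces (assumes the resource is named "project")
kubectl get projects --all-namespaces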

User Interface (UI)

The Dell EMC Streaming Data Platform provides the same user Interface for all personas interacting with the platform.

The views and actions available to a user depend on that user's RBAC role. For example:

Logins with the admin role see data for all existing streams and projects. In addition, the UI contains buttons that let them create projects, add users to projects, and perform other management tasks. Those options are not visible to other users.

Logins with specific project roles can see their projects and the streams, applications, and other resources that are associated with their projects.

Here is a view of the initial UI window that administrators see when they first log in. Admins see all metrics for all the streams in the platform.

Figure 4. Initial administrator UI after login

Project members (non-admin users) do not see the dashboard. They only see the Analytics and the Pravega tabs for the streams in their projects.

Grafana dashboards

SDP includes the collection, storage, and visualization of detailed metrics.

SDP deploys one or more instances of metrics stacks. One instance is for gathering and visualizing Pravega metrics. Additional project-specific metrics stacks are optionally deployed.

A metrics stack consists of an InfluxDB database and Grafana.

InfluxDB is an open-source database for storing time series data.

Grafana is an open-source metrics visualization tool. Grafana deployments in SDP include predefined dashboards that visualize the collected metrics in InfluxDB.

Developers can create their own custom Grafana dashboards as well, accessing any of the data stored in InfluxDB.


Pravega metrics

In SDP, InfluxDB stores metrics that are reported by Pravega. The Dashboards page on the SDP UI shows some of these metrics. More detail is available on the predefined Grafana dashboards. Administrators can use these dashboards to drill into problems or identify developing memory problems, stream-related inefficiencies, or problems with storage interactions.

The SDP UI Dashboards page contains a link to the Pravega Grafana instance. The Dashboards page and the Pravega Grafana instance are available only to administrators.

Project metrics

An optional Metrics choice is available when a project is created. For a project that has Metrics enabled, SDP deploys a project-specific metrics stack. The InfluxDB collects metrics for Spark applications, Flink clusters, and Pravega Search clusters in the project. Predefined Grafana dashboards exist for visualizing the collected metrics.

The SDP UI project page contains a link to that project's Grafana instance. These instances are available to project members and administrators.

Application specific analytics

For projects that have metrics enabled, developers can add new metrics collections into their applications, and push the metrics to the project-specific InfluxDB instance. Any metric in InfluxDB is available for use on customized Grafana dashboards.

Apache Flink Web UI

The Apache Flink Web UI shows details about the status of Flink jobs and tasks. This UI helps developers and administrators to verify Flink application health and troubleshoot running applications.

The SDP UI contains direct links to the Apache Flink Web UI in two locations:

From the Analytics Project page, go to a project and then click a Flink Cluster name. The name is a link to the Flink Web UI which opens in a new browser tab. It displays the Overview screen for the Flink cluster you clicked. From here, you can drill into status for all jobs and tasks.

Figure 5. Apache Flink Web UI

From the Analytics Project page, go to a project and then click a Flink Cluster name. Continue browsing the applications running in the cluster. On an application page, a Flink sub-tab opens the Apache Flink UI. That UI shows the running Flink jobs in the application.


Apache Spark Web UI

The Apache Spark Web UI shows details about the status of Spark jobs and tasks. This UI helps developers and administrators to verify Spark application health and troubleshoot running applications.

The SDP UI contains direct links to the Apache Spark Web UI. From the Analytics Project page, go to a project, click Spark, and click a Spark application name. The name is a link to the Spark Web UI which opens in a new browser tab. It displays the Overview screen for the Spark application you selected. From here, you can drill into status for all jobs and tasks.

Figure 6. Apache Spark Web UI

APIs

The following developer resources are included in an SDP distribution.

SDP includes these application programming interfaces (APIs):

Pravega APIs, required to create the following Pravega applications:
Writer applications, which write stream data into the Pravega store.
Reader applications, which read stream data from the Pravega store.

Apache Flink APIs, used to create applications that process stream data.
Apache Spark APIs, used to create applications that process stream data.
PSearch APIs, used to register continuous queries or process searches against the stream.
Schema Registry APIs, used to retrieve and perform schema registry operations.

Stream processing applications typically use these APIs to read data from Pravega, process or analyze the data, and perhaps even create new streams that require writing into Pravega.

What you get with SDP The SDP distribution includes the following software, integrated into a single platform.

Kubernetes environments
Dell EMC Streaming Data Platform management plane software
Keycloak software and an integrated security model
Pravega data store and API
Schema registry for managing schemas and codecs
Pravega Search (PSearch) framework, query processors, and APIs
Apache Flink framework, processing engine, and APIs
Apache Spark framework, processing engine, and APIs
InfluxDB for storing metrics
Grafana UI for presenting metrics
Kibana UI for presenting metrics
SDP installer, scripts, and other tools

Use case examples Following are some examples of streaming data use cases that Dell EMC Streaming Data Platform is especially designed to process.

Industrial IoT Detect anomalies and generate alerts. Collect operational data, analyze the data, and present results to real-time dashboards and trend analysis reporting. Monitor infrastructure sensors for abnormal readings that can indicate faults, such as vibrations or high temperatures, and recommend proactive maintenance. Collect real-time conditions for later analysis. For example, determine optimal wind turbine placement by collecting weather data from multiple test sites and analyzing comparisons.

Streaming Video Store and analyze streaming video from drones in real time. Conduct security surveillance. Serve on-demand video.

Automotive Process data from automotive sensors to support predictive maintenance. Detect and report on hazardous driving conditions that are based on location and weather. Provide logistics and routing services.

Financial Monitor for suspicious sequences of transactions and issue alerts. Monitor transactions for legal compliance in real-time data pipelines. Ingest transaction logs from market exchanges and analyze for real-time market trends.

Healthcare Ingest and save data from health monitors and sensors. Feed dashboards and trigger alerts for patient anomalies.

High-speed events

Collect and analyze IoT sensor messages. Collect and analyze Web events. Collect and analyze logfile event messages.

Batch applications

Batch applications that collect and analyze data are supported.

Documentation resources Use these resources for additional information.

Table 3. SDP documentation set

Subject Reference

Dell EMC Streaming Data Platform documentation

Dell EMC Streaming Data Platform Documentation InfoHub: https://www.dell.com/support/article/us/en/19/sln319974/dell-emc-streaming-data-platform-infohub

Dell EMC Streaming Data Platform support site: https://www.dell.com/support/home/us/en/04/product-support/product/streaming-data-platform/overview

Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/content/docu103272

(This guide) Dell EMC Streaming Data Platform Installation and Administration Guide at https://dl.dell.com/content/docu103271


Dell EMC Streaming Data Platform Security Configuration Guide at https://dl.dell.com/content/docu103273

Dell EMC Streaming Data Platform Release Notes 1.2 at https://dl.dell.com/content/docu103274

NOTE: You must log onto a Dell support account to access release notes.

SDP Code Hub Community-supported public Github portal for developers and integrators. The SDP Code Hub includes Pravega connectors, demos, sample applications, API templates, and Pravega and Flink examples from the open-source SDP developer community: https://streamingdataplatform.github.io/code-hub/

Pravega concepts, architecture, use cases, and Pravega API documentation

Pravega open-source project documentation:

http://www.pravega.io

Apache Flink concepts, tutorials, guidelines, and Apache Flink API documentation

Apache Flink open-source project documentation:

https://flink.apache.org/

Apache Spark concepts, tutorials, guidelines, and Apache Spark API documentation

Apache Spark open-source project documentation:

https://spark.apache.org

https://github.com/StreamingDataPlatform/workshop-samples/tree/master/spark-examples


Install SDP Edge

Topics:

Download installation files
Install Ubuntu on Bare Metal
Deploy SDP Edge
Manage SDP Edge


Download installation files

Topics:

Supported SDP Edge deployment options
Download files for SDP Edge installation
Prerequisites for SDP Edge

Supported SDP Edge deployment options Two deployment options are supported.

Install Ubuntu on bare metal. The Ubuntu iso file is included in the product download.

Install on a VMware Open Virtual Appliance (OVA). The ova file is included in the product download.

Download files for SDP Edge installation The download includes the iso for Ubuntu and all required files for SDP installation.

Prerequisites

You need 16 GB free disk space to download these files. You need a valid Dell Technologies support account linked to your customer site.

Steps

1. Go to https://www.dell.com/support/home/en-us/product-support/product/streaming-data-platform/drivers.

2. Log in with your Dell support account.

3. Navigate to 1.2 > 1.2 Edge.

4. Download all files in the list.

Prerequisites for SDP Edge Before starting any installations, make sure your environment satisfies these prerequisites.

Table 4. SDP Edge prerequisites

Requirement Description

Bare metal nodes Each node must have at least:
CPU: 16 cores
Memory: 128 GB
Local disks: Two 1.9-TB disks

Node disks There are two types of storage usage on node disks:
Kubernetes resources use persistent volumes, which in turn use associated disks.
Pravega long-term storage can optionally use an NFS configured on disks.

NOTE: SDP Edge single-node deployments must use node disks for Pravega long-term storage. The 3+ node deployments can use either node disks or a Dell EMC PowerScale cluster.


Make sure you have attached disks with enough space to satisfy your intended use cases. Contact Dell Support for sizing discussions.

NTP servers A list of NTP servers is passed into the automated installation procedure.

Valid Dell Support account You need a Dell Support account linked to a Customer Site ID to download the installation files.

Optional SRS Gateway

An SRS Gateway is optional for SDP Edge. The default installation assumes that an SRS Gateway is not configured.

An SRS Gateway provides remote support from Dell EMC and telemetry uploads. If you have a 3-node SDP Edge deployment that would benefit from those services, contact a Dell support representative for configuration information.


Install Ubuntu on Bare Metal

Topics:

Install Ubuntu on bare metal
Fix the drive used for the boot disk

Install Ubuntu on bare metal This procedure installs Ubuntu from the provided iso file, configures the boot disk, configures the network, and sets other required Ubuntu settings. Perform these steps on each bare metal node.

Steps

1. Apply the iso to the node:

a. In a browser on a workstation, go to the iDRAC IP for the bare metal node.
b. Attach the iso file to the node as a CD/DVD media.
c. Select the boot option to boot from virtual CD/DVD.
d. Select Save.
e. Reboot the node using a chassis warm reset.

For more detail, see https://www.dell.com/support/kbdoc/000124001/using-the-virtual-media-function-on-idrac-6-7-8-and-9.

2. On the Language screen, select your language.

3. On the Keyboard screen, select your keyboard layout.

4. On the Network Connections screen:

a. Move the cursor to select the first eth interface, and press Enter.
b. Scroll to select Edit IPv4 and press Enter.
c. On the Edit IPv4 configuration screen, press Enter, and then select Manual.
d. Complete the IPv4 configuration screen, providing all the requested information, including the subnet, address, gateway, DNS, and search domain.
e. Select Save.
f. Scroll down and select Done.

5. On the Misc Configuration screens,

a. Configure the Proxy screen, and select Done. b. Configure the Ubuntu archive mirror, and select Done.

6. On the Storage Configuration screen, do the following to install the OS so that the / file system goes on disk /dev/sda. This configuration uses the maximum capacity of the disk.

a. Select Guided Configuration.
b. Select Use an entire disk and Set up this disk as an LVM group.

Typically select the disk with an uneven amount of disk space. For example, in the figure that follows, notice that the sda has 371GB compared to other disks that have 894GB.

c. Press Enter.
d. On the Storage configuration screen, verify that the / file system is an LVM logical volume.


7. Check the boot disk configuration as follows:

a. Scroll to the USED DEVICES section. b. Select the disk that shows a partition for either bios_grub or /boot/efi.

Here is an example that uses bios_grub.

Here is an example that uses /boot/efi.

c. Select Info. d. Verify that the Info screen shows /dev/sda for Path, as shown here:


e. If Path contains a value other than /dev/sda, correct this situation before continuing. See Fix the drive used for the boot disk on page 29 to fix the Path value.

8. Update the main disk partition size.

NOTE: If Path for partition 1 is not /dev/sda, do not proceed with this step. See the previous step and fix the boot disk now. Then return here and perform this step.

a. In the Storage Configuration screen, under USED DEVICES, select the ubuntu-lv partition, and select Edit.

b. On the Edit screen, change the Size value to match the max value, and select Save.

9. Continue with installation, as follows:

a. On the Storage Configuration screen, select Done.


The Confirm destructive action dialog appears.

b. Select Continue.

10. On the Profile screen, enter your profile information, and select Done.

11. On the SSH setup screen, press Space to select Install OpenSSH server. Then scroll down and select Done.

12. For Featured Server Snaps, scroll down and select Done.

13. Wait for installation to complete.

The Install Complete! screen appears. When installation is complete, the Reboot option appears.

14. Select Reboot.

15. After reboot, validate connections as follows:

a. Connect to the node using ssh.
b. Ping the DNS Servers and the network gateway. The gateway is the switch through which the node will communicate to peer nodes and the outside network.

Fix the drive used for the boot disk If the boot disk is not using the /dev/sda drive, use this procedure to assign the correct drive.

Steps

1. Delete all partitions from the boot device, except for partition 1 (bios_grub).

NOTE: Typically, the system will not allow you to delete partition 1 (bios_grub).

a. On the Storage Configuration screen, in the USED DEVICES section, select a partition and then select Delete. b. Continue to select and delete partitions.

2. Scroll to the AVAILABLE DEVICES section, select the device with type of LVM volume group device, and then select Delete.

3. Under USED DEVICES, select the LVM device and then select Delete.

4. Remove partition 1 by selecting the disk associated with it, and then select Reformat.


5. Repeat deletions until there are no hard drives remaining in the USED DEVICES section.

6. Locate the disk with path name /dev/sda:

a. In the AVAILABLE DEVICES section, select a disk, and then select Info. b. On the Info screen that appears, check the Path field. c. If the path value is /dev/sda, make a note of the disk ID, and select Close. Proceed to Step 7.

d. If the path is not /dev/sda, select Close and repeat the steps with another disk. Continue until you find the disk with the /dev/sda path. Make a note of that disk ID.

7. Go back to the Guided Storage configuration screen.

8. Check Use an entire disk, and choose the drive ID associated with /dev/sda that you discovered above.

9. Select Done.

10. Return to the main configuration procedures in Install Ubuntu on bare metal on page 26 and continue configuration at Step 8, Update the main disk partition size.


Deploy SDP Edge

Topics:

Required customer information
Installation
Configure UI access
Add trusted CA to browser
Get SDP URL and login credentials

Required customer information The following tables show the information that is required to install SDP Edge.

The tables use the terms primary network and alternative network.

Primary network This is the external network for your SDP Edge system. This network is used for data collection and administrative functions, including direct browser access to the SDP UI (without the Socks5 proxy).

Alternative network

Typical SDP Edge installations do not require an alternative network. In special case situations for which the primary network is not suitable for all functionality, you can configure this optional network as an administrative network. For example, consider a configuration in which the primary network is a WIFI hotspot that is collecting data at the Edge. You might want an alternate network for management functions.

The customer information is passed to the SDP installer through the env.yaml file. A required step in the SDP Edge installation procedure is to prepare that file. We recommend that you work with your System Integrator or Dell Support to set up and validate the env.yaml file. An example is shown below.

This section contains:

Typical installation information on page 31
Special case: using primary interface as a wireless hotspot on page 33
Special case: configuring an alternative network on page 33
Configuring an external NFS server on page 34
Example env.yaml file on page 34

Table 5. Typical installation information

1. provisioner_user (string). Example: ubuntu. Required. A sudo user valid on all nodes, with sudo access. Used by Ansible to provision the nodes.
2. ntp_servers (list of strings). Example: "0.ntp.pool.org iburst". Required. List of NTP servers required for time sync. Used in the NTP configuration file.
3. upstream_dns (list of strings). Example: 192.10.100.101, 192.20.200.201. Required. A list of the upstream DNS servers.
4. bookeeper_ledger_disk (string). Example: sdb. Required. SDP bookeeper dedicated disk.
5. bookeeper_ledger_size (string). Example: 101. Required. SDP bookeeper disk size.
6. primary_network.interface_name (string). Example: ens160. Required. Primary network interface name.
7. primary_network.is_wifi (bool). Example: false. Required. Indicates whether the primary network is wifi or wired.
8. primary_network.network_enabled (bool). Example: true. Required. Primary network enabled flag.
9. primary_network.floating_ips (string). Example: 192.42.0.10-192.42.0.12. Required. Primary metalLB IP pool. This pool is used by SDP services that require a load balancer. The minimum number of IPs required in the range is 4 IPs for a single-node deployment and 7 IPs for a 3-node deployment.
10. primary_network.dhcp_enabled (bool). Example: false. Required. Enable or disable the DHCP client on the primary network interfaces. (This field does not refer to a DHCP server.)
11. primary_network.gateway4 (string). Example: 192.42.0.1. Required. Primary network gateway.
12. primary_network.interface_mask (integer). Example: 24. Required. Primary network interface mask.
13. primary_network.nameservers (list). Example: 192.42.0.4. Required. Primary network nameservers. The SDP installation creates a local DNS server on each SDP node. Therefore, this value is the same as the primary interface IPs. Provide one IP for single-node or a list of IPs for 3-node deployments.
14. alternative_network.network_enabled (bool). Example: false. Required. Alternative network enabled flag.
15. local_nfs (bool). Example: true. Required. Indicates whether to set up an NFS server on the SDP Edge node.
16. base_domain_name (string). Example: sdp-demo.org. Required. Base domain used for SDP deployment.
17. sdpdir (string). Example: /desdp/decks-installer. Required. Location where the playbook will set up the SDP installer.
18. sdp_values (string). Example: values.yaml. Required. Name of the SDP installer configuration values file.
19. sdp_version (string). Example: 1.2. Required. SDP version.
20. decks_installer_version. Example: 1.2.0.0-a0c68f9b. Required. Taken from the Decks installer zip file name.
21. sdp_domain_name. Example: edge.{{base_domain_name}}. Required. The {{base_domain_name}} is an Ansible variable that is initialized to the base_domain_name value (row 16 above) and is mandatory as shown. You may change the prefix.
22. dockerport. Example: 31001. Required. Local docker registry port for SDP installer.
23. sdpregistry (string). Example: sdp-registry:{{dockerport}}/desdp. Required. Docker registry URL for SDP installer.

Table 6. Special case: using primary interface as a wireless hotspot

1. primary_network.interface_name (string). Example: wlp3s0. Required. Primary network interface name.
2. primary_network.is_wifi (bool). Example: true. Required. Indicates whether the primary network is wifi or wired.
3. primary_network.interface_ip (string). Example: 192.42.0.1. Required. Primary network interface IP.
4. primary_network.network_enabled (bool). Example: true. Required. Primary network enabled flag.
5. primary_network.floating_ips (string). Example: 192.42.0.10-192.42.0.12. Required. Primary metalLB IP pool.
6. primary_network.dhcp_enabled (bool). Example: false. Required. Primary network DHCP enabled or disabled.
7. primary_network.enable_hot_spot (bool). Example: true. Required. Primary network hotspot enabled or disabled.
8. primary_network.hotspot_password (string). Example: 9781c142. Required. Primary network hotspot password.
9. primary_network.hotspot_access_point (string). Example: Edge_Hotspot. Required. Primary network hotspot access point.
10. primary_network.interface_mask (integer). Example: 24. Required. Primary network interface mask.
11. primary_network.nameservers (list). Example: 192.42.0.4. Optional. Primary network nameservers. The SDP installation creates a local DNS server on each SDP node. Therefore, this value is the same as the primary interface IPs. Provide one IP for single-node or a list of IPs for 3-node deployments.

Table 7. Special case: configuring an alternative network

1. alternative_network.interface_name (string). Example: enp184s0f0. Required. Alternative network interface name.
2. alternative_network.is_wifi (bool). Example: false. Required. Indicates whether the alternative network is wifi or wired.
3. alternative_network.network_enabled (bool). Example: true. Required. Alternative network enabled flag.
4. alternative_network.floating_ips (string). Example: 192.0.30.101-192.0.30.102. Required. Alternative metalLB IP pool.
5. alternative_network.dhcp_enabled (bool). Example: true. Required. Alternative network DHCP client enabled or disabled.
6. alternative_network.interface_mask (integer). Example: 24. Optional. Alternative network interface mask, needed for static IP configuration.
7. alternative_network.metallb_name (string). Example: wired. Required. Name of the alternative network metalLB address pool.
8. alternative_network.nameservers (list). Required. Alternative network DNS is the same IP as the alternative network interface.

Table 8. Configuring an external NFS server

1. local_nfs (bool). Example: false. Required. Set to false to use an external NFS server. Set to true to set up an NFS server on the SDP Edge node.
2. nfs_server (string). Example: 192.0.3.240. Optional; not needed if local_nfs is set to true. Identifies the external NFS server.
3. nfs_path (string). Example: /ifs/123/1234. Optional; not needed if local_nfs is set to true. The external NFS path.

Example env.yaml file

Here is an example env.yaml file. A step in the installation procedure is to edit this file.

# variables
##################### Deploy ssh key #####################
provision_user: edge
disable_password_auth: false
disable_root_login: false
add_new_user: false
##################### NTP service #####################
ntp_enabled: true
ntp_servers:
  - "0.ntp.pool.org iburst"
  - "192.2.2.2.2 iburst"
# kubespray deployment
# for single node, lts will be on the same disk
bookeeper_ledger_disk: sdb
bookeeper_ledger_size: 101
# Primary network
primary_network:
  interface_name: wlp3s0
  is_wifi: true
  interface_ip: 192.2.0.1
  interface_mask: 24
  network_enabled: true
  floating_ips: 192.2.0.21-192.2.0.23
  dhcp_enabled: false
  enable_hot_spot: true
  hotspot_password: "9781c142"
  hotspot_access_point: "Edge_Hotspot"
# The upstream DNS Servers.
upstream_dns:
  - 192.2.2.10
  - 192.2.2.20
# Alternative network
alternative_network:
  network_enabled: true
  is_wifi: false
  dhcp_enabled: true
  interface_name: enp184s0f0
  metallb_name: wired
  floating_ips: 192.2.10.81-192.2.10.82
  nameservers:
    - 192.2.10.85   # using the alternative network to bind the DNS server on the node
# socks5 proxy LB pool
socks5_address_pool: wired
# SDP variables
local_nfs: true
base_domain_name: cluster1.sdp-demo.org
sdpdir: /desdp/decks-installer
sdp_values: values.yaml
sdp_version: 1.2
decks_installer_version: 1.2.0.0-9a2c3df
sdp_domain_name: "sdp.{{base_domain_name}}"
# docker registry
dockerport: 31001
sdpregistry: "sdp-registry:{{dockerport}}/desdp"
cadir: desdp/ca

Installation Install SDP Edge.

Steps

1. Collect installation files on the host.

Place all downloaded files, except the Ubuntu ISO, in the ~/desdp directory of the SDP host.

The ~/desdp directory must not contain multiple versions of the same file. For example, make sure there is only one decks-installer file.
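For example, you might verify the directory contents before continuing; the exact file names in your environment depend on the downloaded release.

# Confirm that only one decks-installer archive is present
ls -1 ~/desdp | grep -i decks-installer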

2. Extract the installation files using the sdp-extract-all.sh script.

edge@edge1:~/desdp$ chmod a+x sdp-extract-all.sh
edge@edge1:~/desdp$ ./sdp-extract-all.sh


3. Copy your customer-specific SDP license file into the required location in the ~/desdp/ directory. The license file name must be license.xml.

edge@edge1:~$ cp license.xml ~/desdp/sdp-auto-installer/ansible/roles/sdpinstaller/files/

If you do not have a license file, see Obtain and save the license file on page 48 for information about obtaining your SDP license.

NOTE: An evaluation license file is included in the downloaded files. If needed, you may complete the installation with the temporary license and reapply the installation later using a permanent license.

4. Save a copy of the downloaded inventory.ini file, and then edit inventory.ini with your site details.

The inventory.ini file identifies the hosts and IPs for the kubespray cluster.

a. Save a copy of the original inventory.ini.

For example:

cd ~/desdp/sdp-auto-installer/ansible
cp inventory.ini inventory.ini.org

b. Edit inventory.ini in the ~/desdp/sdp-auto-installer/ansible directory.

c. Replace the IP with the Ansible ssh host IP.

For single-node deployments, you would have only one ansible_ssh_host listed. For 3-node deployments, you need one line per node.

For example:

cd ~/desdp/sdp-auto-installer/ansible
nano inventory.ini

[kubespray]
node01 ansible_ssh_host=10.243.55.52
node02 ansible_ssh_host=10.243.55.53
node03 ansible_ssh_host=10.243.55.54
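For a single-node deployment, the file contains a single entry. The following is an illustrative sketch only, reusing the first example IP shown above; substitute your own node IP.

[kubespray]
node01 ansible_ssh_host=10.243.55.52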

d. Save and exit.

5. Save a copy of the downloaded env.yaml file, and then edit env.yaml with details for your site.

a. Save a copy of the original env.yaml.

For example:

cd ~/desdp/sdp-auto-installer/ansible
cp env.yaml env.yaml.org

b. Edit the env.yaml file in the ~/desdp/sdp-auto-installer/ansible directory.

cd ~/desdp/sdp-auto-installer/ansible
nano env.yaml

The env.yaml file contains all the variables that are used in the installation process. For variable explanations and an example, see Required customer information on page 31.

6. Run the SDP run_sdp_auto_installer.sh script.

Run time is approximately 65 minutes in a medium sized VM setup. Your results might vary, depending on the hardware in your setup.

cd ~/desdp/sdp-auto-installer/ansible
./run_sdp_auto_installer.sh

NOTE: If you are prompted for an SSH password, provide the password for the user account that you are currently using. This prompt is specifically for Ansible to log in over SSH as the provisioning user.


This script does the following:
Configures the OS.
Configures the network.
Installs and configures a Bind9 DNS server.
Performs other required configurations.
Installs Kubespray.
Installs SDP.

7. Reboot the server.

The reboot ensures that all DHCP clients are serviced by the correct DNS server (the Bind9 DNS server).

sudo reboot

8. Wait about 4 minutes, and SSH into the server.

9. Confirm DNS operation.

a. Use the following command to check the DNS servers:

systemd-resolve --status | grep "DNS Servers"

b. Confirm that all DNS servers return the Bind9 IP address ${ALT_INTERFACE_IP}.

# expected output:
DNS Servers: 10.0.30.100
DNS Servers: 10.0.30.100
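Optionally, you can also confirm that names in your SDP domain resolve through the local DNS server. The host name below is the example sdp_domain_name used in this guide; substitute the value from your env.yaml file.

getent hosts sdp.cluster1.sdp-demo.org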

10. Confirm connection to the wireless network.

Ping a node on your wireless network.

ping 10.42.0.41

11. (Optional) Install a package from an online repo.

If you need to install an APT package from the Internet, perform these steps:

sudo cp /etc/apt/ubuntu_sources.list /etc/apt/sources.list
sudo apt update
sudo apt install <package-name>

To change back to offline repos, if required:

sudo cp /etc/apt/offline_sources.list /etc/apt/sources.list

Configure UI access Configure access to the SDP browser-based user interface.

About this task

You may configure access to the SDP UI in either of the following ways:
Enable access through browsers on the SDP LAN.
Enable access through a VPN using the Socks5 proxy in a Chrome browser.

The kubeconfig that is necessary to access the cluster is set automatically during Kubespray installation. There is no need to log in separately to run the following kubectl commands.

Enable access through browsers on the SDP LAN

About this task

This configuration is optional and typically not needed. It configures a browser connected to the Primary Network (wireless) to view the SDP UI.


Steps

1. Get the SDP external ingress details.

The following command gets the ingress details in the appropriate format for the /etc/hosts file:

kubectl get ingress -A | awk '{ print $4, $3 }'
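The output is a list of IP address and host name pairs. The following lines are an illustrative sketch only; the addresses and host names in your environment come from your own cluster and base_domain_name.

# example output (hypothetical values)
10.243.55.60 sdp.cluster1.sdp-demo.org
10.243.55.60 keycloak.sdp.cluster1.sdp-demo.org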

2. Copy the output from the previous command.

3. On each host where the browsers are installed, edit the corresponding hosts file and paste the copied information.

The hosts files are located here:

Linux: /etc/hosts
Windows: C:\Windows\System32\Drivers\Etc\Hosts

4. Go to Get SDP URL and login credentials on page 39.

Enable access through a VPN using the Socks5 proxy

Steps

1. Enable the Socks5 proxy.

kubectl apply -f ~/desdp/site-config/etc/socks5-service.yaml
kubectl get svc -n kube-system socks5-proxy

2. Make a note of the EXTERNAL-IP address from the above command.

This address is your SOCKS5 proxy IP address.
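The service listing might look similar to the following sketch; the names are standard kubectl output columns, and the addresses shown are hypothetical.

# example output (hypothetical values)
NAME           TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
socks5-proxy   LoadBalancer   10.233.12.34   192.42.0.11   1080:31234/TCP   2m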

3. Add Trusted Certificate Authority to your browser.

Copy the certificate (.crt) from ~/desdp/certs and add it to your browser or OS as a trusted certificate authority (CA).

4. Configure browsers to use the Socks5 proxy as follows:

a. Open Chrome.
b. Open the following link and install the extension:

https://chrome.google.com/webstore/detail/proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif?hl=en

c. Create a profile name.

In the upper right corner of the browser window, click the SwitchyOmega icon (a circle) and select + New Profile.

Type a new profile name. The new name appears in the list of profiles in the main window.

d. Select the profile name in the list, and configure that profile as follows:

Set the Protocol to SOCKS5.
Set the Server to your SOCKS5 proxy IP address from a prior step.
Set the Port to 1080.

Here is an example:


e. In the upper right corner of the browser window, click the SwitchyOmega icon (a circle) and select the profile name. This action activates the profile.

5. Go to Get SDP URL and login credentials on page 39.

Add trusted CA to browser Each user who needs access to the SDP UI must add the trusted Certificate Authority to their browser.

Steps

1. Copy the certificate (.crt) from ~/desdp/certs.

2. Add the certificate to the browser or operating system as a trusted certificate authority (CA) for identifying web sites.

3. If you are using an intermediate CA, users may need to trust both the root CA and the intermediate (technical) CA.
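On a Linux workstation, one way to trust the CA at the operating system level is sketched below. The node IP and certificate file name are placeholders, and some browsers maintain their own certificate store and still require a manual import through the browser settings.

# Copy the CA certificate from the SDP node, then add it to the OS trust store (Ubuntu/Debian)
scp edge@<node-ip>:~/desdp/certs/<ca-file>.crt .
sudo cp <ca-file>.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates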

Get SDP URL and login credentials Log into the SDP UI.

Steps

1. Determine the URL: The SDP UI URL is:

https://${SDP_DOMAIN_NAME}

where SDP_DOMAIN_NAME is the value you specified in the environment variable file before installing the product.

You can get the domain name with the following command:

kubectl get ing -n nautilus-system nautilus-ui

In the output, the FQDN in the HOSTS field is the SDP_DOMAIN_NAME value to use in the URL.
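The output might look similar to the following sketch; the host name and address shown are hypothetical.

# example output (hypothetical values)
NAME          CLASS    HOSTS                       ADDRESS       PORTS     AGE
nautilus-ui   <none>   sdp.cluster1.sdp-demo.org   192.42.0.10   80, 443   3h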

2. Get the login credentials for the default admin account.


The default admin user name is desdp.

To get the password value, ask a cluster admin or other user with access to the nautilus-system namespace to run this command:

kubectl get secret keycloak-desdp -n nautilus-system -o \
  jsonpath='{.data.password}' | base64 -d ; echo

3. Browse to the SDP UI URL and login.


Manage SDP Edge

Topics:

Add new user in SDP Edge
Create a project
Set retention size on streams
Shutdown and restart the Kubespray cluster
Add a node
Remove a node
Upgrade from single-node to 3-node deployment
Backup
Recover the control plane

Add new user in SDP Edge Add users to the Keycloak instance to provide them with access to the SDP UI.

About this task

1. Add a new user account on the Keycloak UI, as described below. 2. Give that user access to a project by making the user a project member.

NOTE: It is not possible to give a Keycloak local user access to the Kubernetes command line.

Only cluster-admin users have access to the Kubernetes command line.

Steps

1. In a browser window, go to the Keycloak endpoint in the SDP cluster.

To list connection endpoints, see Obtain connection URLs on page 74. If the SDP UI is open, you can try prepending keycloak. to the UI endpoint. For example, http://keycloak.sdp.lab.myserver.com. Depending on your configuration, this might not always work.

2. On the Keycloak UI, click Administration Console.

3. Log in using the keycloak administrator username (admin) and password.

To get the password value, ask a cluster admin or other user with access to the nautilus-system namespace to run this command:

kubectl get secret keycloak-http -n nautilus-system -o \
  jsonpath='{.data.password}' | base64 -d ; echo

4. Click Manage > Users.

5. On the Users screen, click Add User on the right.

6. Complete the form.

NOTE: The username must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 103.

7. Optionally click the Credentials tab to create a simple initial password for the new user.

Create a temporary password. Enable Temporary, which prompts the user to change the password on the next login.

8. To authorize the new user to perform actions and see data in SDP, make the user a member of projects.


Create a project Create a project on the SDP UI.

Steps

1. Log in to SDP as an admin.

2. Click the Analytics icon.

The Analytic Projects table appears.

3. Click Create Project at the top of the table.

4. In the Name field, type a name that conforms to Kubernetes naming conventions.

The project name is used for the following:
The project name in the SDP UI
The Kubernetes namespace for the project
A local Maven repository for hosting artifacts for applications defined in the project
The project-specific Pravega scope
Security constructs that allow any Flink applications in the project to have access to all the Pravega streams in the project-specific scope

5. In the Description field, optionally provide a short phrase to help identify the project.

6. Configure storage for the project.

For SDP Edge, long-term storage is NFS on either a PowerScale cluster or on node disks. The medium is configured during installation.

NOTE: For single-node deployments, long-term storage is always on node disks.

For the NFS long-term storage type, complete the following fields:

Storage Volume Size: Provide the size of the persistent volume claim (PVC) to create for the project. This value is the anticipated space requirement for storing streams that are associated with the project. SDP provisions this space in the configured PowerScale file system or node disks, depending on how SDP Edge was configured during installation.

Maven Volume Size: Provide the size of the PVC to create for the Maven repository for the project. This value is the anticipated space requirement for storing application artifacts that are associated with the project.

7. Under Metrics, choose whether to enable or disable project-level analytic metrics collection.

The option is enabled by default. Data duration policy is set to two weeks.

For more information about Metrics, see the Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/ content/docu103272.

8. Click Save. The new project appears in the Analytic Projects table in a Deploying state. It may take a few minutes for the system to create the underlying resources for the project and change the state to Ready.

9. Create streams in the project.

a. Go to Pravega > project-name.
b. Click Create stream.
c. Provide stream configuration details such as name, type, and retention, and then save.
d. Best practice is to set retention sizes on all streams.

Retention size is particularly important when long-term storage is on node disks. By setting appropriate retention sizes, you ensure against space problems and resulting system downtime.

If you skip retention size setting now, you can edit the stream later to set retention size.

For retention size calculation suggestions, see the next section called Set retention size on streams on page 43.


Set retention size on streams When long-term storage is defined on node disks, space is a limited resource and must be managed appropriately. Retention size enforces a limit on the space that a stream can consume.

About this task

When SDP Edge uses node disks for long-term storage, the following best practices are recommended:
Realize that the disk space is shared by all projects.
Allocate 50% of the available disk space for streams. The 50% should be distributed across all the streams for all the projects on the platform.
Set retention sizes on each stream. Set sizes that enable the system to enforce the preferred 50% allocation. For example, if the disk size is 100 GB and you have two projects, each with one stream, then we recommend setting retention sizes of 25 GB for stream1 and 25 GB for stream2.

Retention size is the number of MBytes to retain in the stream. The remainder at the older end of the stream is discarded.

Retention size can be set when a stream is created. However, developers typically create streams and may skip that step. Administrators should edit each stream definition to set retention sizes.

Steps

1. To find the size of your configured long-term storage when that storage is on node disks:

df -h /desdp/lts
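For example, the command might report something similar to the following; the device name and sizes are illustrative only. With roughly 1.9 TB available, the 50% guideline would leave about 950 GB to divide among all stream retention sizes across all projects.

# example output (hypothetical values)
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/lts--vg-lts  1.9T   12G  1.8T   1% /desdp/lts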

2. On the SDP UI, click Pravega.

A list of Pravega scopes appears. A scope represents a project in Pravega.

3. Click a Scope.

A list of all the streams in the scope appears.

4. Click the Edit action for a stream.

5. Scroll to view the Retention Policy section.

6. Make sure the toggle at the top is set to on (green).

7. Click Retention Size.

8. Type the size in MB, according to the guidelines above.

9. Click Save.

10. Continue the process for each stream in each scope. Make sure that the total of all retention sizes equals about 50% of the total disk size.

Shutdown and restart the Kubespray cluster

Steps

1. To stop Kubespray on a three-node deployment, use the shutdown command on each node.

sudo shutdown

2. Restart the nodes to restart Kubespray.

3. To restart one node, you can use the reboot command:

sudo reboot
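After the nodes come back up, you can optionally confirm that the cluster is healthy before resuming work; these are standard kubectl checks.

# All nodes should report Ready
kubectl get nodes

# Any pods that are not yet Running or Completed are listed here
kubectl get pods -A | grep -vE 'Running|Completed'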


Add a node You can add a node to a Kubespray cluster.

About this task

Add a new node to an existing cluster in the following scenarios:

Replace a node: To prevent disruption of current processing, add the new node and then remove the old node that is in a failed or down state. See Remove a node on page 44.

NOTE: Do not use this process to replace the node in a single-node deployment. If the replacement process failed for any reason, the entire system would be down. To replace the node in a single-node deployment, start from the beginning with a new deployment.

Upgrade from a single-node to a 3-node deployment: In this case, add the two new nodes first, and then run the upgrade playbook. See Upgrade from single-node to 3-node deployment on page 45.

Steps

1. In a 3-node cluster, connect to the primary node. Run all steps on the primary node.

2. Set up the node inventory.

cp -r inventory/sample inventory/mycluster
declare -a IPS=(192.10.1.3 192.10.1.4 192.10.1.5 192.10.1.6)

Where: the node IPs identify all the nodes in the cluster including the new one you are adding. In the above example, the first node is the primary node. The next two are also existing nodes in the cluster. The last node is the new one to add.

3. Set required environment variable.

CONFIG_FILE=inventory/mycluster/hosts.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

4. Verify that passwordless ssh is possible from the primary node to the node being added.

5. On the primary node, run the Ansible playbook, specifying that execution should occur on the new node only.

ansible-playbook -i inventory/mycluster/hosts.yml scale.yml -l <nodeN>

where:

-l <nodeN> limits the playbook run to the identified node. Continuing with the example in step 2, which lists four IPs, the new node is node4.
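After the playbook completes, you can optionally confirm that the new node joined the cluster; this is a standard kubectl check.

# The new node should appear in the list and report Ready
kubectl get nodes -o wide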

Remove a node Remove a failed or unreachable node from a 3-node Kubespray cluster.

About this task

Steps

Run the remove-node Ansible playbook, specifying the node to remove in the command line.

ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml --extra-vars node=<nodeN>

Where:


<nodeN> identifies the node to remove. If there are four nodes in the inventory and you want to remove the second one in the list, the command would be:

ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml --extra-vars node=node2

Upgrade from single-node to 3-node deployment Use the following playbook to change a single-node SDP Edge deployment to a 3-node deployment.

Steps

1. Add the two new nodes to the cluster as described in Add a node on page 44.

2. Ensure that pod disruption budgets are correct.

This procedure cordons, drains, and uncordons nodes. Draining fails if the pod disruption budgets are not correct.

3. Set up the node inventory.

cp -r inventory/sample inventory/mycluster
declare -a IPS=(192.10.1.3 192.10.1.4 192.10.1.5)

Where: The first node IP is the existing node. The next two IPs are the new nodes being added.

4. Set required environment variable.

CONFIG_FILE=inventory/mycluster/hosts.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

5. Run the upgrade-cluster Ansible playbook.

ansible-playbook -i inventory/mycluster/hosts.yml upgrade-cluster.yml

Backup Use etcd to back up the SDP Edge control plane.

About this task

Steps

Run etcd backup on SDP endpoints.

sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.10.1.3:2379,https://192.10.1.4:2379,https://192.10.1.5:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/ca.pem --key=/etc/ssl/etcd/ssl/ca-key.pem \
  snapshot save snapshotdb
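Optionally, you can verify the saved snapshot before relying on it; this is a standard etcdctl check run in the directory that contains the snapshot file.

sudo ETCDCTL_API=3 etcdctl snapshot status snapshotdb --write-out=table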

Recover the control plane Use the recover-control-plane.yml playbook.

About this task


Steps

Run the recover-control-plane.yml playbook.

ansible-playbook -i inventory/mycluster/hosts.yml -e override_system_hostname=false -e dashboard_token_ttl=0 -e kubectl_localhost=true \
  -e kubeconfig_localhost=true -e kube_version_min_required=n-2 --become --become-user=root recover-control-plane.yml \
  --limit=etcd,kube-master -e ignore_assert_errors=yes --ask-pass --ask-become-pass -vvv -e etcd_snapshot=/home/ubuntu/snapshotdb


Install SDP Core

Topics:

Site Prerequisites
Configuration Values
Install SDP Core


Site Prerequisites

Topics:

Obtain and save the license file
Deploy SRS Gateway
Set up local DNS server
Provision long-term storage on PowerScale
Provision long-term storage on ECS

Obtain and save the license file SDP is a licensed product and requires the license file for installation. An evaluation license is available in the distributed images as a default. The license file is an XML file.

Prerequisites

Obtain the license activation code (LAC) for SDP. Customers typically receive the LAC in an email when they purchase a license.

Have your support.dell.com account credentials available.

Steps

1. In a browser, search for Dell Software Licensing Center.

2. Log in using your support.dell.com account credentials.

3. In the Software Licensing Center (SLC), perform a search for your LAC number.

4. Follow the SLC wizards to assign your license to a system and activate the license.

5. Use the SLC feature to download the license file.

6. Save the license file to a location accessible to the host on which you will run the installation commands.

The license file path is required information during installation.

7. Do not alter the license file in any way. Altering the file invalidates the signature and the product will not be licensed.

NOTE: Be careful if you FTP the license file. Always use a binary FTP; otherwise, FTP may cause a file alteration and invalidate the license.

Deploy SRS Gateway For typical production deployments, you must have a Secure Remote Services Gateway (SRS Gateway) at your site. Dark sites and sites with evaluation licenses may skip this step.

The SRS Gateway supports the following features in SDP:

Collection and forwarding of events to Dell technical support
Call home features to alert Dell EMC technical support of problems
Remote access to your SDP cluster by Dell EMC technical support for log collection and other troubleshooting activities, if you authorize such access

SDP Core requires an SRS Gateway v3.38 or greater. Follow standard Dell EMC procedures for deploying the SRS Gateway. See the Secure Remote Services 3.38 Installation Guide at https://www.dellemc.com/en-us/collaterals/unauth/technical-guides-support-information/2019/08/docu95325.pdf.

SRS Gateway deployment requires a support.dell.com account to register the SRS Gateway to the Dell EMC SRS backend services.


Save the following information for later use in the configuration values files:

IP address or Fully Qualified Domain Name of the SRS Gateway Credentials (username and password) of the support.dell.com user account

Provide the above connection information in the SDP configuration values file, in the srs-gateway: section. See Configure or remove connection to the SRS Gateway on page 64 for details.

Set up local DNS server A local DNS server enables convenient connections to SDP endpoints from external requests.

In a production deployment, a local DNS server enhances your user experiences. The local DNS server resolves requests that originate outside the cluster to endpoints running inside the Kubernetes cluster. Although it is not technically required to install SDP, a local DNS server is recommended. The installation and connection instructions later in this guide assume that you have a local DNS server set up.

NOTE: Do not use the corporate DNS server for this purpose.

Local DNS installed in the internal network (CoreDNS or BIND)

Set up the local DNS server on the vSphere used by SDP or use another local DNS server elsewhere in your network. Save the connection information to the server for later use in the configuration values file.

Cloud DNS You may use a cloud DNS. To use cloud DNS solutions, such as AWS Route53 or Google Cloud DNS, you must have an account with the cloud provider. Save the account name and credentials to the account for later use in the configuration values file.

Provide the connection details for the local DNS server in the configuration values file, in the external-dns: section. See Configure connections to a local DNS on page 58.

Provision long-term storage on PowerScale To use a PowerScale cluster for SDP long-term storage, provision the file system and gather the relevant information.

Provision a file system on the PowerScale cluster using standard PowerScale procedures. Then gather the following information, which is required to configure SDP:

PowerScale cluster IP address
Mount point
Mount options

Add this information to the configuration values file, in the nfs-client-provisioner: section. See Configure long-term storage on PowerScale on page 59.
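Before installation, you can optionally confirm that the export is reachable from a host on the SDP network. This is a generic NFS sanity check, not an SDP command; the cluster IP, export path, and mount point below are placeholders for your own values.

sudo mkdir -p /mnt/sdp-lts-test
sudo mount -t nfs <powerscale-ip>:<export-path> /mnt/sdp-lts-test
df -h /mnt/sdp-lts-test
sudo umount /mnt/sdp-lts-test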

Provision long-term storage on ECS

To use an ECS appliance for SDP long-term storage, provision the namespace and a bucket for Pravega, and gather the relevant information.

Prerequisites

Check with your Dell EMC Support team to ensure that the ECS version you are using has incorporated the STORAGE-27535 fix.

SDP supports long-term storage on an ECS appliance, in a namespace containing S3 buckets.

You may need a load balancer for HTTP traffic to the ECS nodes:

ECS consists of data nodes that can handle HTTP requests from external clients.

Pravega uses the ECS Smart Client feature to balance traffic to the data nodes and does not need a load balancer.

Apache Flink is not compatible with the ECS Smart Client feature. If you intend to use SDP to run Apache Flink applications, there must be a load balancer in front of ECS. Any Layer-4 load balancer (either hardware or software) may be used to load balance the HTTP traffic.


NOTE: For a software load balancer, such as HAProxy, the load balancer must be configured outside of the SDP cluster. SDP does not provide a node for the ECS load balancer.

Steps

1. Define an ECS namespace for SDP.

Using standard ECS procedures, provision the namespace with the following attributes:

a. Assign a name that indicates the namespace is for an SDP installation.
b. Enable replication on the namespace and add it to a Replication Group.

NOTE: GeoReplication is not supported.

c. Define an ECS administrator account with admin privileges to the namespace.

Internal components in SDP use these credentials to create buckets in the namespace as users create projects.

2. Define one bucket in the namespace.

Using standard ECS procedures, provision the bucket with the following attributes:

a. It must be an S3 bucket.
b. The bucket must not have any data in it prior to SDP installation.
c. Name the bucket. Pravega and its segment store use this bucket for general purposes. A name that includes pravega provides context.
d. Create SecretKey credentials to control access to the bucket.

NOTE: IAM credentials are not supported.

3. Gather configuration information for later use in the configuration values file. See Configure long-term storage on ECS on page 59.

Namespace name

Replication group name

ECS Object API endpoint (for the pravega.ecs_s3.uri field in the configuration values file)

ECS Management API endpoint

ECS admin credentials

Bucket name that you provisioned for Pravega

SecretKey credentials to the Pravega bucket: access key and secret key

If the ECS management or Object API endpoints use custom trusts (self-signed certificates), download the certificates.


Configuration Values

Topics:

About configuration values files
Prepare configuration values files
Source control the configuration values files
Validate the configuration values
Configure global platform settings
TLS configuration details
Configure connections to a local DNS
Configure long-term storage on PowerScale
Configure long-term storage on ECS
Configure or remove connection to the SRS Gateway
Enable periodic telemetry upload to SRS
Configure default admin password

About configuration values files

Configuration values files contain configuration settings for SDP. These files are required input to the installation command.

Purpose

SDP configuration and deployment options must be planned for and specified in configuration files before the installation takes place. The installer tool uses the configuration values during the installation process.

NOTE: Some settings cannot be changed after installation, requiring an uninstall and reinstall.

The configuration values serve the following purposes:

Enable and disable features.

Set high-level customer-specific values such as server host names and required licensing files.

Set required secrets for component communication.

Configure required storage for the platform.

Configure features.

Override default values that the installer uses for sizing and performance resources.

Override default names that the installer uses for some components.

Template

See Template of configuration values file on page 148. The template contains the configuration settings for the SDP installer.

File format

A configuration values file contains key-value pairs in YAML syntax. Spacing and indentation are important in YAML files.

The sections in the values file are named according to the component that they are configuring. For example, the section that contains configuration values for the SRS Gateway is named srs-gateway.

If you copy from the template, notice that every section in the template is commented out. Be sure to remove the # characters from the beginning of each line to uncomment the sections that you copy, as shown in the example below.
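For example, a commented fragment copied from the template:

   #global:
   #  storageType: nfs

becomes active after you remove the leading # characters:

   global:
     storageType: nfs

The storageType setting is taken from the template excerpt shown later in this chapter and is used here only to illustrate the uncommenting step.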

Multiple configuration values files

The SDP installer accepts a comma-separated string of configuration value file names. Some sites prefer using one large file that contains all the values, and others prefer multiple files. With multiple files, you can isolate sensitive values and separate permanent values from values that might require more frequent updates.
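As a hedged sketch only, one possible way to split the values across files (the file names and the specific keys shown are illustrative, drawn from examples elsewhere in this chapter):

   # values-platform.yaml -- long-lived, non-sensitive settings
   global:
     storageType: nfs
     external:
       host: "xyz.desdp.example.com"

   # values-secrets.yaml -- sensitive settings, access strictly controlled
   keycloak:
     keycloak:
       password: "<admin password>"

You would then pass both files to the installer in one comma-separated string, for example --values values-platform.yaml,values-secrets.yaml.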

Override values during installation

The SDP installation command provides several options that override the values in configuration values files. See the --set, --set-file, and --config options in the decks-install apply command description.



Prepare configuration values files

This procedure describes the values that are essential to a successful SDP installation. You may optionally add other values that you see documented in the templates or elsewhere throughout the documentation.

Steps

1. Create one or more text files to hold the configuration values.

The installation command accepts multiple file names for values.

2. Add information to the configuration values files as described in the following sections.

3. Save the values files in a secure location. These files are required input to the installer command.

The files are typically named values.yaml or similar, but that name is not required.

Some secrets may be in plain text. For this reason, Dell recommends that you source-control the values files. You can split the secrets into separate files and strictly control access to them. The installer tool accepts multiple configuration value file names in a comma-separated string.

Source control the configuration values files

We recommend using your enterprise source control procedures to protect the configuration values files.

Access to the configuration values must be limited and protected for the following reasons:

The values files are your record of your current configuration. To adjust your configuration, you edit the current values and reapply them.

NOTE: Values are not carried over internally. Every reapply of the configuration uses the values that are provided in the values files that you use in the command.

A running record of changes that were made to the configuration might be useful for research purposes when you are fine-tuning some of the operational values.

The values files may contain secrets.

Validate the configuration values

SDP includes a script that validates the configuration values files before you include them in an installation command.

The validate-values.py script checks that required values are included and that values are specified in an acceptable format. The script reports anything required that is missing, or confirms that you are ready to continue with installation. Resolve all issues that the validate-values.py script identifies and rerun it until it indicates that you are ready to proceed with the installation.

You cannot use the validate-values.py script until you set up your local environment with required tools and extract the product files, as described in the Installation chapter. That chapter also includes the procedure for validating configuration values by running the validate-values.py script.

See Run the validate-values script on page 71 for information about running validate-values.py on demand.

Configure global platform settings

The global section of the configuration values file sets platform-wide installation choices.

This configuration is required. It sets the external connection to the platform UI, the type of long-term storage for the cluster, and other platform-wide settings.

Copy the global: section from the template, or copy the following example:

#global:
#  bookkeeperDeployment: k8s
#  storageType: nfs | ecs_s3
#  external:
#    host: ""            # tld that services will use
#    clusterName: ""
#    tls: true | false
#    darksite: true | false
#  ingress:
#    annotations:
#      kubernetes.io/ingress.class: nautilus-nginx
#      kubernetes.io/tls-acme: "true"
#  # Custom CA trust certificates in PEM format. These certificates are injected into certain Java components.
#  # The main use for them at the moment is when communicating with an ECS Object endpoint that uses custom
#  # trust, i.e. self-signed certificates.
#  tlsCABundle:
#    ecsObject: |-
#      -----BEGIN CERTIFICATE-----
#      MIIDnDCCAoSgAwIBAgIJAOlxdGLbM3vBMA0GCSqGSIb3DQEBCwUAMBYxFDASBgNV
#      BAMTC0RhdGFTZXJ2aWNlMB4XDTIwMDIxOTE5MzMzNVoXDTMwMDIxNjE5MzMzNVow
#      ...

Table 9. Configure global settings

Name Description

bookkeeperDeployment:k8s Controls how to deploy the Bookkeeper component (Tier 1 storage). Must be set to k8s, which deploys Bookkeeper in the SDP cluster.

storageType The long-term storage solution (Pravega Tier 2 storage) for this instance of SDP. Changing the storage type after installation is not supported.

NOTE: If this parameter is not in the configuration values files, the value defaults to nfs.

Choose one of the following values:

nfs for a Dell EMC PowerScale cluster. Then configure the nfs-client-provisioner: section with connection details.

ecs_s3 for an S3 namespace on an ECS appliance. Then configure the pravega:ecs_s3: section with connection details.

external.host: Required. The top-level domain (TLD) name that you want to assign to the SDP master node. This value is visible to end users in the URLs they use to access the UI and other endpoints running in the cluster.

The format is "<name>.<host-fqdn>" where:

<name> is your choice.

<host-fqdn> is the fully qualified domain name of the server hosting SDP.

For example, in xyz.desdp.example.com, xyz is <name> and desdp.example.com is the <host-fqdn>.

This field sets the top-level domain name (TLD) from the perspective of SDP. The product UI is served off https://<tld>. The Grafana UI is served off https://grafana.<tld>, and so on, for the other endpoints.

For example, a TLD of xyz.desdp.example.com serves the UI off https://xyz.desdp.example.com and Grafana off https://grafana.xyz.desdp.example.com. The DNS server has authority to serve records for *.xyz.desdp.example.com.

external.clusterName: Required. The name that you plan to use for the SDP Kubernetes cluster. This value is the name of the cluster to create in Kubernetes.

NOTE: The SDP installer propagates this value into the Helm charts.

external.tls: Controls whether TLS is enabled or disabled. Set to true or false.

If true, other values are required that specify the type of certificates to use and configures those certificates. See TLS configuration details on page 54 for all TLS options and how to configure them.


If false, TLS configurations are ignored.

external.darksite: true Defaults to false. Add this line and set to true if your installation does not have an SRS Gateway.

ingress: Leave the default annotations as shown. The first annotation specifies which ingress controller can act on platform ingress. The SDP installer deploys the Nginx Ingress Controller with --ingress-class=nginx-nautilus. The controller handles all ingresses that have this annotation.

The second annotation specifies that the platform Cert Manager should automatically provision the TLS certificate for the ingress endpoint.

tlsCABundle: This section contains a collection of custom CA trust certificates in PEM format. If you do not use custom CA certificates, leave blank.

tlsCABundle.ecsObject: This field is required if your site uses a custom CA trust certificate for the object API endpoint for long-term storage on ECS. Copy the entire certificate contents and paste here. See the last step in Configure long-term storage on ECS on page 59.
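For orientation, the following is a minimal, uncommented global: section populated with the example TLD used above and PowerScale (nfs) long-term storage; every value is illustrative and the nesting follows the commented template:

   global:
     bookkeeperDeployment: k8s
     storageType: nfs
     external:
       host: "xyz.desdp.example.com"
       clusterName: "cluster1"
       tls: true
       darksite: false
     ingress:
       annotations:
         kubernetes.io/ingress.class: nautilus-nginx
         kubernetes.io/tls-acme: "true"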

TLS configuration details

SDP supports Transport Layer Security (TLS) for external connections to the platform.

TLS is optional. The feature is enabled or disabled with a true/false setting in the configuration values file. You have the following certificate authority (CA) options.

Table 10. TLS options

Option Description

Let's Encrypt Let's Encrypt is an open certificate authority, provided by the Internet Security Research Group (ISRG). It provides free digital certificates for HTTPS and TLS access. If you configure this option in SDP, then obtaining, securing, and renewing certificates is automated.

Private CA You can generate a private certificate key and certificate, add the certificate to a trust store and make it available to the SDP installer.

Enterprise or well-known CA For these options, you extract the Certificate Signing Requests (CSRs) from the SDP installer and send them to the CA. The CA issues the signed certificates, which you then install. SDP provides a cli-issuer tool to facilitate handling of CSRs and installing signed certificates.

The following sections provide configuration details for each of the TLS options.

Enable or disable TLS

Enable or disable TLS in the initial installation.

About this task

You cannot reconfigure the tls: {true | false} setting after initial installation. The change would require an uninstall and reinstall procedure. You may change the type of TLS certificates that you are using after initial configuration.

Steps

1. In the configuration values file, set the following entry to either true or false.

global:external:tls: {true | false}


2. If tls: is true, also supply additional required information under global:external:tls:. The following sections describe how to configure the additional information for the different types of CAs.

Configure TLS using certificates from Let's Encrypt

The Let's Encrypt certificates are the easiest to use because certificate generation is automated within the platform.

Steps

1. In the configuration values file, configure the global section as follows:

global:
  external:
    # fqdn of this cluster, this has to be unique
    host: "<name>.abc-lab.com"
    tls: true
    certManagerIssuerName: letsencrypt-production

2. Configure the cert-manager-resources section as follows:

cert-manager-resources:
  certManagerSecrets:
    - name: <secret name>
      value: <secret value>
  clusterIssuer:
    name: letsencrypt-production
    server: https://acme-v02.api.letsencrypt.org/directory
    email: me@example.com
    acmeSecretKey: issuer-letsencrypt-dns-auth-secret
    solvers:
      - dns01: #see DNS section#

cert-manager:
  webhook:
    enabled: false
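The dns01 solver entry depends on your DNS provider and is covered in the DNS configuration referenced in the comment above. Purely as an assumption-laden illustration (this follows cert-manager's documented ACME Route53 solver schema, not a value taken from the SDP template), an AWS Route53 solver might look similar to the following:

   solvers:
     - dns01:
         route53:
           region: us-east-1
           accessKeyID: <AWS access key ID>
           secretAccessKeySecretRef:
             name: route53-credentials
             key: secret-access-key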

Self-signed certificates

About this task

You must be further along in the installation process before you can complete all the recommended steps for using self-signed certificates. You can optionally look ahead for the relevant instructions:

Steps

1. See Prepare self-signed SSL certificate on page 69 for steps to create the certificates, add them to the configuration values file, and push them into the registry with the other SDP images.

2. See (Optional) Validate self-signed certificates on page 73 for steps to validate that the certificates are working after installation, and remove the certificate from the registry.


Configure TLS using signed certificates from a Certificate Authority

This procedure obtains a signed certificate from a CA and configures SDP to use the certificate.

About this task

The SDP installer creates certificate signing requests (CSRs). Dell provides a tool to extract the CSRs from the cluster and then later, to import the signed certificates into the cluster.

This task describes the following process:

1. Starts the installer so it can create the CSRs.
2. Stops the installer so you can extract the CSRs.
3. Instructs you to submit the CSRs to the CA and wait for the signed certificates.
4. Imports the signed certificates into the SDP cluster.
5. Restarts the installer.

Steps

1. Prepare the configuration values file as follows:

global:
  external:
    tls: true
    certManagerIssuerName: cli
    wildcardSecretName: cluster-wildcard-tls-secret
    certManagerIssuerGroup: nautilus.dellemc.com

cert-manager-resources:
  clusterIssuer:
    name: cli

cert-manager:
  webhook:
    enabled: false
  ingressShim:
    defaultIssuerName: cli
    defaultIssuerKind: ClusterIssuer
  extraArgs: ['--dns01-recursive-nameservers-only=true','--feature-gates=CertificateRequestControllers=true']

2. Download the cli-issuer-<version>.zip file from the Dell Support Site.

Extract the cli-issuer-<version>.zip archive and navigate into the expanded directory. The archive contains three binary executables, one for each supported platform. For convenience, create a symlink or rename the appropriate executable to cli-issuer.

3. Start the installation using the decks-install apply command as described in Install SDP on page 72.

4. In another window, monitor for CSR generation.

Enter the following command. The watch command before the cli-issuer command monitors every two seconds.

In the output, you are looking for messages that state Certificate signing request (CSR) is available.

watch ./cli-issuer list -A --cluster --insecure-skip-tls-verify

NAMESPACE         NAME                                       SUBJECT                             STATUS    MESSAGE
nautilus-pravega  pravega-native-tls-certificate-763176170   *.pravega.cluster1.desdp.dell.com   Pending   Certificate signing request (CSR) is available
nautilus-system   wildcard-tls-certificate-80022116          *.cluster1.desdp.dell.com

You need a CSR for each of the SDP namespaces: nautilus-pravega and nautilus-system.

5. When the two CSRs are available, return to the install window and stop the installation by using CTRL-C.


6. Use the cli-issuer export command to export the two CSRs from the cluster.

./cli-issuer export <name> -n <namespace> -f <csr-filename>

Where <name> and <namespace> are from the output of the cli-issuer list command. For example:

$ ./cli-issuer export pravega-native-tls-certificate-763176170 -n nautilus-pravega -f pravega-native.csr
$ ./cli-issuer export wildcard-tls-certificate-80022116 -n nautilus-system -f wildcard.csr

7. Submit or upload the two CSR files to your selected well-known CA or follow internal procedures for an enterprise CA.

8. When you receive the signed certificates from the CA, save them locally. The files include:

Signed certificate (.pem) files: There is a file for each CSR that you submitted. You should have a pravega file and a wildcard file.

The root certificate: The root is the end of the chain of certificates on the customer side.

9. Validate the certificates. The filename should match the certificate CN.

$ openssl x509 -in pravega-native.pem -noout -text | grep CN # the CN in the output should match the pem filename

$ openssl x509 -in wildcard-sabretooth.pem -noout -text | grep CN # the CN in the output should match the pem filename

10. Use the cli-issuer tool to import the signed certificates into the cluster.

$ ./cli-issuer import -A -f <path/to/cert> --ca <path/to/ca-cert> -n <namespace>

Where:

<path/to/cert> is where the certificate issued by the well-known CA or internal CA is saved on your desktop.

<path/to/ca-cert> is where the ca-cert is saved on your desktop. The <path/to/cert> and <path/to/ca-cert> are typically the same value because the ca-cert is typically bundled with the issued certificate.

<namespace> is the namespace that is listed in the output from the cli-issuer list command. (See step 4.)

For example:

$ ./cli-issuer import -A -f pravega-native.pem --ca ../certs/lab.cacert.pem -n nautilus-pravega
Imported a certificate for resource "nautilus-pravega/pravega-native-tls-certificate-763176170"

$ ./cli-issuer import -A -f wildcard.pem --ca ../certs/lab.cacert.pem -n nautilus-system
Imported a certificate for resource "nautilus-system/wildcard-tls-certificate-80022116"

11. Validate that certificates are successfully imported.

$ ./cli-issuer list -A
NAMESPACE         NAME                                       SUBJECT                             STATUS   MESSAGE
nautilus-pravega  pravega-native-tls-certificate-763176170   *.pravega.cluster1.desdp.dell.com   Issued   Certificate fetched from issuer successfully
nautilus-system   wildcard-tls-certificate-80022116          *.cluster1.desdp.dell.com           Issued   Certificate fetched from issuer successfully

12. Resume the install using the same command that you used to start the install.

Next steps

The signed certificates are imported into the corresponding TLS secrets in SDP.


Configure connections to a local DNS

Configure connections to the local DNS server.

This configuration is required.

You should have a local DNS server that is already set up. For information about the local DNS server and the various options for setup, see Set up local DNS server on page 49 in the Site Prerequisites chapter.

The following examples show configuration settings for three types of local DNS server. Copy one of the following external-dns: section examples as appropriate for your setup and supply the required values.

AWS Route53 option

external-dns:
  aws:
    credentials:
      secretKey: " "
      accessKey: " "

CoreDNS option

external-dns:
  provider: coredns
  coredns:
    etcdEndpoints: "http://10.243.NN.NNN:2379"
  extraArgs: ['--source=ingress','--source=service','--provider=coredns','--log-level=debug']
  rbac:
    # create & use RBAC resources
    create: true
    apiVersion: v1
  # Registry to use for ownership (txt or noop)
  registry: "txt"
  # When using the TXT registry, a name that identifies this instance of ExternalDNS
  txtOwnerId: " . "
  ## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only)
  policy: sync
  domainFilters: [ . ]
  logLevel: debug

Bind option

external-dns:
  provider: rfc2136
  rfc2136:
    host: "10.243.NN.NNN"
    port: 53
    zone: "nautilus-lab-ns.lss.emc.com"
    tsigSecret: "ooDG+GsRmsrryL5g9eyl4g=="
    tsigSecretAlg: hmac-md5
    tsigKeyname: externaldns-key
    tsigAxfr: true
  rbac:
    # create & use RBAC resources
    create: true
    apiVersion: v1
  # Registry to use for ownership (txt or noop)
  registry: "txt"
  # When using the TXT registry, a name that identifies this instance of ExternalDNS
  txtOwnerId: " . "
  ## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only)
  policy: sync
  domainFilters: [ . ]
  logLevel: debug


Configure long-term storage on PowerScale

Configure the connection to a PowerScale cluster for long-term storage for SDP.

This configuration is required if you are using a PowerScale cluster for long-term storage. You can configure only one source for long-term storage, either PowerScale or ECS.

You should already have the file system configured on the PowerScale cluster. For reference, see Provision long-term storage on PowerScale on page 49 in the Site Prerequisites chapter.

Make sure that the global: StorageType: value is set to nfs.

global:
  StorageType: nfs

Then copy the nfs-client-provisioner: section from the template, or start with the following example:

nfs-client-provisioner:
  nfs:
    server: 1.2.3.4
    path: /data/path
    mountOptions:
      - nfsvers=4.0
      - sec=sys
      - nolock
  storageClass:
    archiveOnDelete: "false"

Table 11. Configure NFS storage

Name Description

nfs.server The NFS server hostname or address. This is the PowerScale cluster IP address.

nfs.path The NFS export path.

nfs.mountOptions The NFS mount options (in fstab format).

storageClass.archiveOnDelete Indicates how to handle existing data in the following circumstances:

If SDP is uninstalled, whether to delete all SDP data, including stream data.

If a project is deleted, whether to delete all the project data, including stream data.

Values are:

false does not save any data.

true archives the data. However, this archive is not readable in a new installation of SDP or in a new project.

The default is true.

Configure long-term storage on ECS

This configuration is required if you are using an ECS appliance for long-term storage. It configures the connection to an ECS S3 appliance and the bucket plans for project-specific buckets.

Prerequisites

You can configure only one source for long-term storage, either PowerScale or ECS.

You should already have the namespace and one S3 bucket (the Pravega bucket) configured on the ECS appliance. For reference, see Provision long-term storage on ECS on page 49 in the Site Prerequisites chapter.

About this task

The ECS namespace contains the following S3 buckets:


Pravega bucket: As mentioned above, this bucket is preprovisioned before SDP installation. Pravega connects to this bucket on startup using credentials that you configure in this task. The Pravega segmentstore component uses this bucket.

Project-specific buckets: When a user creates a project, SDP provisions a project-specific bucket. The project streams are stored in its project bucket. Each project bucket has unique credentials that are autogenerated for it. The ECS Broker performs the provisioning. The ECS Broker gains access to ECS based on connection information that you configure in this task.

Other supporting buckets: SDP provisions additional supporting buckets as needed. For example, it provisions a registry bucket to help manage the project buckets.

SSL certificates might be required:

If ECS uses standard CAs for connection to both its management port and its object API port, certificates are not required.

If either the management or the object API endpoints require custom trusts (self-signed certificates), you must provide the certificates in the configuration values file. The steps to do so are in this task.

In the ecs_service_broker: section of the configuration values file, you configure attributes of the project-specific buckets.

bucket plans

When users create a project, they select a bucket plan for the project-specific bucket that the broker provisions. A bucket plan defines policies for managing the bucket, such as size, quotas, warning settings, and access type.

Bucket plans are optional because there is a default bucket plan that is defined internally in the product. You can redefine the default plan and define additional bucket plans in the ecs-service-broker: section of the configuration values file.

bucket reclaim policy

A reclaim policy is the action that the ECS Broker takes on a project bucket when the project is deleted. Reclaim policy is set per bucket plan. The available reclaim policies are:

Detach: (The default if you do not override with another value.) The broker leaves the project bucket and all data intact in ECS but removes the bucket from SDP.

Delete: Wipes data from the bucket and deletes the bucket from ECS and SDP.

CAUTION: The Delete policy is dangerous for data safety. Consider using Fail, which only deletes empty buckets.

Fail: The broker attempts to delete the ECS bucket, but the operation fails if the bucket contains data.

default reclaim policy

The default reclaim policy for all buckets is Detach. You may override that default in the configuration values file by adding the following setting: ecs-service-broker.DefaultReclaimPolicy: <policy>.

allowed reclaim policy

When users create projects using the command line, they can specify extra parameters, one of which is reclaim-policy. This reclaim-policy would override the reclaim policy for the bucket plan as defined (or defaulted) in the bucket plan configuration. The AllowedReclaimPolicies setting in ecs_service_broker configuration limits the reclaim policies that users are permitted to specify on the command line. For example, you can ensure that the Delete reclaim policy is not allowed for any project.
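For example, a minimal hedged sketch of overriding the platform-wide default reclaim policy in the configuration values file (the key name comes from the setting named above; the Fail value is only an illustration):

   ecs-service-broker:
     DefaultReclaimPolicy: Fail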

Some important points about configuring plans:

SDP comes preconfigured with a default plan. You may change the definition of the default plan by configuring a plan using the name default.

You may configure additional plans. The plan names that you configure appear in a drop-down menu on the UI when users create a project.

You cannot change plan definitions after installation. You cannot add or delete plan definitions after installation.

Use the following steps to configure ECS connections, the ECS broker, S3 bucket plans, and bucket reclaim policies.

Steps

1. Set the global.StorageType: value to ecs_s3.

global:
  StorageType: ecs_s3

2. Configure the pravega.ecs_s3: section.


Pravega connects to the preconfigured ECS namespace and bucket that you describe in this section. Copy the section from the template, or start with the following example:

pravega:
  ecs_s3:
    uri: https://192.0.5.1:9021/
    bucket: pravega-tier2
    namespace: "sdp-pravega"
    prefix: "/"
    accessKey: green
    secretKey: XXXX

Table 12. Configure pravega.ecs_s3

Name Description

uri The ECS Object API endpoint, in the following format: <http or https>://<IP address or FQDN>:<port>/

Typical port values are 9020 for HTTP endpoints and 9021 for HTTPS endpoints.

bucket The bucket name that was previously provisioned on ECS for this SDP installation instance.

namespace The ECS namespace that was previously provisioned on ECS for this SDP installation instance.

prefix A prefix to use on the Pravega bucket name.

accessKey, secretKey The access key and secret key that were previously provisioned on ECS for the namespace.

NOTE: Pravega uses these credentials to gain access. However, each project has its own unique bucket. Unique system-generated credentials protect those buckets.

3. Configure the ecs-service-broker: section.

The ECS Service Broker connects to ECS using the information configured in this section. This section also configures S3 bucket plans. Copy the ecs-service-broker: section from the template, or start with the following example.

NOTE: This example overrides the default bucket plan and defines two additional plans. Plan definitions are optional. See the table for more information.

ecs-service-broker:
  namespace: mysdp
  prefix: green-
  replication-group: RG
  api:
    endpoint: "http://192.0.5.1:9020"
  ecsConnection:
    endpoint: "https://192.0.5.1:4443"
    username: mysdp-green
    password: ChangeMe
    # certificate required only for self-signed certs
    certificate: |-
      -----BEGIN CERTIFICATE-----
      MIIDCTCCAfGgAwIBAgIJAJ1g36y+tM0RMA0GCSqGSIb3DQEBCwUAMBQxEjAQBgNV
      BAMTCWxvY2FsaG9zdDAeFw0yMDAyMTkxOTMzMjVaFw0zMDAyMTYxOTMzMjVaMBQx
      EjAQBgNVBAMTCWxvY2FsaG9zdDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC
      ...
      -----END CERTIFICATE-----
  s3Plans:
    # Unique UUID is required
    - id: 7e777d49-0a78-4cf4-810a-b5f5173b019d
      name: default
      settings:
        quota:
          limit: 10
          warn: 7
    - id: 9e777d49-0a78-4cf4-810a-b5f5173b019d
      name: small
      settings:
        quota:
          limit: 5
          warn: 4
        allowed-reclaim-policies: Fail
        reclaim-policy: Fail
    - id: 10777d49-0a78-4cf4-810a-b5f5173b019d
      name: noQuota
      settings:
        allowed-reclaim-policies: Delete, Fail, Detach

Table 13. Configure the ECS Service Broker

Name Description

namespace The namespace on ECS that was provisioned for SDP.

prefix Optional prefix to use when naming project buckets that the ecs-broker provisions.

The broker names project buckets using this format:

<prefix><project name>-<uuid>

The <uuid> is system-generated.

NOTE: The broker inserts a dash between the <project name> and the <uuid>.

For example, if prefix is sdp/, the bucket name for project videoproject2 would look similar to:

sdp/videoproject2-4a7f2baf-4226-4bfc-8b7a-57609ce450b6

replication-group The ECS Replication Group name that the namespace was assigned to during provisioning.

api:endpoint The ECS API endpoint.

ecsConnection.endpoint: The ECS Management endpoint. The port is typically 4443 for HTTPS.

ecsConnection.username, ecsConnection.password Admin credentials into ECS. This user account must have permission to create buckets in the namespace. These credentials are converted into Kubernetes secrets before they are passed to the ECS Broker.

ecsConnection.certificate Required if the ECS management endpoint requires a custom self-signed certificate. Obtain the certificate and add it here.

NOTE: For trusted CAs, no configuration is required.

One way to obtain a certificate is through a browser.

a. Enter the ECS management endpoint in the browser URL field, click the lock icon, click View Certificate, and then click the Details tab.
b. Click Copy to File, click Next, and then choose to export in Base-64 format.
c. Copy the contents of the downloaded certificate into this field. Be sure to preserve all indents as they exist in the exported certificate.

s3Plans: Configures the S3 bucket plans that are available to users when they create a project.

This section is optional. Comment out the s3Plans: section if you want to use the default plan that comes with the product for all buckets. The default plan has no quota and only allows the DefaultReclaimPolicy.


To redefine the default plan, include a plan with the name default in this section.

s3Plans: - id:

Unique id for the plan.

The unique ID must stay constant. UUIDs work well for this purpose. For an easy way to generate a unique id, see https://www.uuidgenerator.net/. V1 or V4 is acceptable.

s3Plans: - id: name:

Name for the plan. This name appears in the Plan drop-down list on the UI when a user creates a project.

s3Plans: - id: settings.quota.limit:

Quotas are optional. Without a quota, there is no limit on the bucket size.

The quota information is sent to ECS and used by ECS to configure the bucket. ECS enforces the bucket quota.

This value sets the hard limit on the number of GB in the bucket. Specify the number of GB. For example, the value 5 sets a hard limit of 5 GB on each bucket that uses this plan.

s3Plans: - id: settings.quota.warn:

Optional. This value sets a soft limit on the number of GB in the bucket. When the bucket reaches this size, ECS starts generating warning messages in the logs.

For example, if warn is set to 4 and limit is set to 5, ECS generates warning messages when a bucket reaches 4 GB in size and enforces the hard limit at 5GB.

s3Plans: - id: settings: allowed-reclaim-policies:

Optional. Defines the reclaim policies that users are allowed to specify when they create a project on the command line.

The default is to allow users to specify any of the reclaim policies. A typical setting is to allow only Detach and Fail, disallowing the use of Delete. See the introduction to this task for more about reclaim policies.

s3Plans: - id: settings: reclaim-policy:

Optional. Sets the default reclaim policy for the plan. If not provided, the platform-wide default reclaim-policy applies to the plan. See the introduction to this task for more about the default reclaim-policy.

4. Configure the pravega-cluster section.

The installer passes these settings to the Pravega cluster. The settings tune the cluster appropriately for interaction with ECS long-term storage. Most pravega-cluster settings default to pretested values. Depending on your use case, you might want to add the pravega_options shown here.

pravega-cluster:
  pravega_options:
    writer.flushThresholdMillis: "60000"
    extendeds3.smallObjectSizeLimitForConcat: "104857600"
    writer.maxRolloverSizeBytes: "6442450944"

5. If the ECS object API endpoint requires a self-signed certificate, obtain the certificate and add it into the values file.

NOTE: This certificate goes into the global section of the configuration values file because several platform components require it.


a. To export the certificate from a browser, enter the ECS object API endpoint in the browser, click the lock icon, click View Certificate, and then the Details tab.

b. Click Copy to File, click Next, and then choose to export in Base-64 format.
c. Copy the contents of the downloaded certificate into the global.tlsCABundle.ecsObject: field. Be sure to preserve all indents as they exist in the exported certificate. See Configure global platform settings on page 52 for an example.

Configure or remove connection to the SRS Gateway

Most production deployments depend on an SRS Gateway for support from Dell and for telemetry collections. A dark site in production or deployments for testing purposes may not have an SRS Gateway.

This configuration is required. Do one of the following:

Configure SRS Gateway connection information, or

Remove SRS Gateway deployment from the configuration

Both tasks are described below.

Configure SRS Gateway connection

For reference, see Deploy SRS Gateway on page 48 in the Site Prerequisites chapter.

Copy the srs-gateway: section from the template, or copy the following example:

srs-gateway:
  gateway:
    hostname: <SRS Gateway FQDN or IP address>
    port: <9443>
    login: <username>:<password>
    product: STREAMINGDATA

Table 14. Configure SRS

Name Description

hostname: The fully qualified domain name or IP address of your SRS Gateway.

port: The value must be 9443.

login: Your support.dell.com account credentials.

product: The value must be STREAMINGDATA.

Remove SRS Gateway deployment from the configuration

Insert the global.external.darksite: true value into the configuration values file.

For example:

global:
  external:
    darksite: true

Enable periodic telemetry upload to SRS

By default, the SDP deployment does not upload any information to SRS. You need to enable this feature.

About this task

This section describes how to enable telemetry uploads.

Read the following E-EULA before proceeding with the changes.

TELEMETRY SOFTWARE NOTICE

If you are acting on behalf of a U.S. Federal Government agency or if Customer has an express written agreement in place stating that no remote support shall be performed for this machine, please stop attempting to enable the Software and contact your sales account representative.


By continuing to install this Software, you acknowledge that you understand the information stated below and accept it.

Privacy

Dell, Inc and its group of companies may collect, use and share information, including limited personal information from our customers in connection with the deployment of this telemetry software ("Software"). We will collect limited personal data when you register the Software and provide us with your contact details such as name, contact details and the company you work for. For more information on how we use your personal information, including how to exercise your data subject rights, please refer to our Dell Privacy Statement which is available online at www.dell.com/learn/policies-privacy-country-specific-privacy-policy.

Telemetry Software

This Software gathers system information related to this machine, such as diagnostics, configurations, usage characteristics, performance, and deployment location (collectively, "System Data"), and it manages the remote access and the exchange of the System Data with Dell Inc. or its applicable subsidiaries (together, "Dell"). By using the Software, Customer consents to Dell's connection to and remote access of the machine and acknowledges that Dell will use the System Data transmitted to Dell via the Software as follows ("Permitted Purposes"):

remotely access the machine and Software to install, maintain, monitor, remotely support, receive alerts and notifications from, and change certain internal system parameters of this machine and the Customer's environment, in fulfillment of applicable warranty and support obligations;

provide Customer with visibility to its actual usage and consumption patterns of the machine;

utilize the System Data in connection with predictive analytics and usage intelligence to consult with and assist Customer, directly or through a reseller, to optimize Customer's future planning activities and requirements; and

"anonymize" (i.e., remove any reference to a specific Customer) and aggregate System Data with that from machines of other Customers and use such data to develop and improve products.

Customer may disable the Software at any time, in which case all the above activities will stop. Customer acknowledges that this will limit Dell's ability and obligations (if any) to support the machine.

The Software does not enable Dell or their service personnel to access, view, process, copy, modify, or handle Customer's business data stored on or in this machine. System Data does not include personally identifiable data relating to any individuals.

Steps

1. Read the E-EULA above before proceeding with the changes.

2. Add the following section to the values.yaml file:

srs-gateway:
  configUpload:
    disable: false

Results

By installing with disable set to false, the SDP cluster contains a cronjob that runs every 12 hours and uploads configuration information to SRS.
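To confirm that the upload job exists, you can list cron jobs with kubectl. This is a hedged sketch that assumes cluster-admin access and that the job runs in the nautilus-system namespace; the exact job name and namespace may differ in your deployment:

   $ kubectl get cronjobs -n nautilus-system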

Configure default admin password

Decide how to assign passwords for the default administrator accounts. You can explicitly configure the passwords, or you can allow the system to generate passwords.

The two default admin accounts are:

Type: Keycloak nautilus realm administrator
User Name: desdp
Description: This user is authorized to create analytic projects. This user has wildcard access to all projects and all their associated resources in SDP. Access to those resources is granted for both the Kubernetes cluster and the SDP UI for this user. This user is Admin in the nautilus realm. The nautilus realm is where SDP users and service accounts exist. This user is not permitted to log in to the Keycloak Administrator Console for the master realm.

Type: Keycloak master realm administrator
User Name: admin
Description: This user is authorized to log in to the Keycloak Administration Console in either the master or nautilus realm, and create users in Keycloak.

Your options are:

You can allow the installer to autogenerate passwords. After installation, see Obtain default admin credentials on page 77 to retrieve the generated values.

You can provide the initial password values by adding the keycloak.keycloak section into the configuration values file, as described here.

To provide specific password values, copy the keycloak: section from the template, or copy the following example:

keycloak:
  keycloak:
    password:
    DESDPPassword:

Table 15. Provide initial password values

Name Description

password: "" Add the password value enclosed in double quotes.

DESDPPassword: "" Add the password value enclosed in double quotes.
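Putting the section and the table together, a filled-in sketch might look like the following; the values are placeholders, not recommendations:

   keycloak:
     keycloak:
       password: "<admin password>"
       DESDPPassword: "<desdp password>"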


Install SDP Core

Topics:

Download installation files
Install required infrastructure (RHEL and CoreOS)
Unzip installation files
Prepare the working environment
Push images into the registry
Run the prereqs.sh script
Prepare self-signed SSL certificate
Run pre-install script
Run the validate-values script
Install SDP
Run the post-install script
(Optional) Validate self-signed certificates
Obtain connection URLs

Download installation files

This procedure includes links for downloading the Red Hat CoreOS and the Dell EMC SDP installation files.

Prerequisites

You need 16 GB free disk space to download these files.

You need a valid Dell Technologies support account linked to your customer site.

You need a Red Hat OpenShift account with valid credentials.

Steps

1. Go to https://www.dell.com/support/home/en-us/product-support/product/streaming-data-platform/drivers.

2. Log in with your Dell support account.

3. Navigate to 1.2 > 1.2 Core.

4. Download all files in the list.

5. Use the following links to download the required Red Hat Core OS files.

Openshift installer and client download link:

https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.8/

Openshift ISO download link:

https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/4.6.8/

Install required infrastructure (RHEL and CoreOS)

SDP Core requires RedHat Enterprise Linux (RHEL) and RedHat OpenShift (Core OS) installed on bare metal hardware.

Steps

1. Read an installation overview at https://docs.openshift.com/container-platform/4.6/architecture/architecture-installation.html.



2. Perform the installation using instructions at https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html.

Unzip installation files

Steps

1. Move all SDP files that were downloaded from the Dell Support site to one of the nodes.

This node will become the management node.

2. Unzip the decks-installer-<version>.zip file into a separate folder.

3. Create a link to the executable and set permissions.

For example, on Linux:

$ link decks-install-linux-amd64 decks-install
$ chmod +x decks-install

On Windows, create an alias.

4. Unzip the nautilus-baremetal-master file into a separate folder.

Prepare the working environment

Your working environment requires some software tools and interfaces before you can begin the installation steps. The working environment is the command-line environment that you intend to use for the installation. It could be your laptop or workstation, one of the SDP nodes, or a management node.

Steps

1. Install a local Docker daemon and CLI.

See Docker documentation at https://docs.docker.com/install/ to get the correct daemon for your local operating system.

2. Install a modern desktop browser such as Chrome, Firefox, or Edge for accessing the SDP UI.

3. Install PuTTY or other software that provides a way to connect to the intended SDP host machine and establish a Linux shell.

Push images into the registry

This procedure uploads the installation images into the Docker registry.

About this task

The last step in this procedure (uploading images to the registry) can take some time (up to an hour) to complete.

Steps

1. Create a repository in Docker to hold SDP images.

2. Navigate to the location where you extracted and created a link to the SDP installer tool.

3. Configure the installer to use the Docker registry and the repository you just created:

$ ./decks-install config set registry <registry-path>

where <registry-path> identifies the registry server and the repository you created for SDP images.

4. Verify the configured registry path:

$ ./decks-install config list
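For example, as an illustration only (the registry host and repository name are placeholders, not values from your site):

   $ ./decks-install config set registry registry.example.com/sdp-images
   $ ./decks-install config list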


5. Push required images to the registry.

$ ./decks-install push --input <path-to-images-tar> [--ca-certs-dir <path-to-certs-directory>]

where:

<path-to-images-tar> is the path to the decks-images-<version>.tar file included in the original set of installation files. This is a separate file, not part of the zip file that was extracted previously.

NOTE: Do not extract the contents from decks-images-<version>.tar. The installer tool works directly with the .tar file.

The optional --ca-certs-dir option injects a custom certificate into each image. Use this option if your company requires a certificate bundle for security purposes within your internal network.

This push operation may take up to an hour to complete.

Run the prereqs.sh script

The prereqs.sh script ensures that your local environment and the Kubernetes cluster have all the tools that are required for a successful installation.

About this task

The decks-install apply command runs this script automatically. Regardless, Dell recommends that you run this script before running the decks-install apply command the first time or the first time on a new local machine. You may run the script at any time.

The script does the following types of checks:

It checks your local environment for the required tools and required minimum versions of those tools. For some tools, the script attempts to install the missing software.

The script checks the SDP cluster for a default storage class definition.

It generates messages describing what is missing.

Steps

1. Go to the folder where you extracted the decks-installer-<version>.zip file.

2. Run the script using either of the following commands:

This command saves the output (stdout and stderr) to a log for later use:

$ ./scripts/prereqs.sh > prereqs.log 2>&1

This command prints the output on the command line:

$ ./scripts/prereqs.sh

3. Check the script output.

If the output contains errors about incorrect minimum versions of components or missing software, you must correct the condition before proceeding with SDP installation.

Prepare self-signed SSL certificate

This procedure describes how to create SSL self-signed certificates for TLS connections and add them to SDP.

About this task

This procedure generates the key and self-signed certificate, adds them into the configuration values file, and loads them into your SDP registry.

Steps

1. Generate a private key.


Here is an example using openssl.

openssl genrsa -out tls.key 2048

2. Create a certificate.

The following example uses openssl to create a certificate:

openssl req -nodes -new -x509 -keyout tls.key -subj "/CN=<common name>" -out tls.crt

3. In the configuration values file, configure the relevant fields as shown in the example below.

Add the key information from above into the tls.key entry.

Add the certificate information from above into the tls.crt entry.

global:
  external:
    tls: true
    certManagerIssuerName: selfsigned-ca-issuer

cert-manager-resources:
  clusterIssuer:
    name: selfsigned-ca-issuer
  certManagerSecrets:
    - name: tls.crt
      value: |
        -----BEGIN CERTIFICATE-----
        -----END CERTIFICATE-----
    - name: tls.key
      value: |
        -----BEGIN PRIVATE KEY-----
        -----END PRIVATE KEY-----

cert-manager:
  webhook:
    enabled: false
  ingressShim:
    defaultIssuerName: selfsigned-ca-issuer
    defaultIssuerKind: ClusterIssuer
  extraArgs: ['--dns01-recursive-nameservers-only=true','--feature-gates=CertificateRequestControllers=true']

4. Copy the crt file to a certificates directory and rename the file to ca.crt.

NOTE: The filename must be ca.crt. Otherwise, the installation does not work.

$ cp tls.crt mycerts/ca.crt

5. Push the ca.crt to the truststore in the SDP images in the Docker registry.

./decks-install push --input ../decks-images-xxxx.tar --ca-certs-dir ../mycerts/

Run pre-install script

The pre-install.sh script must be run one time before installing SDP.

About this task

This script creates credentials required for the internal communication of SDP components. It creates a values.yaml file containing these credentials. This yaml file is a required input to every execution of the decks-install apply command. The generated yaml file must be listed as one of the values files in the --values parameter of decks-install apply.


Steps

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the pre-install.sh script.

$ ./scripts/pre-install.sh

3. Verify that the script ran successfully and that the values.yaml file exists.

The output shows the pathname of the generated values.yaml file. It exists in a directory called pre-install. For example, scripts/pre-install/values.yaml.

The output also shows a passwd file which you may safely delete.

The output initially shows results from Pravega indicating user names and passwords that will look unfamiliar. You can ignore this output. This script is, in essence, replacing hardcoded Pravega credentials with secure credentials.

4. Consider renaming the generated values.yaml file and moving it to the same location where all the other configuration values files are stored.

For example, rename values.yaml to preinstall.yaml.
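For example, a small hedged sketch; the destination shown here is illustrative, use whatever location holds your other values files:

   $ mv scripts/pre-install/values.yaml ~/desdp/preinstall.yaml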

Results

The generated yaml file must be listed as one of the values files in the --values parameter of decks-install apply, along with all of your other configuration values files.

Run the validate-values script

The validate-values.py script reads the configuration values files provided to it and validates the values against certain criteria. The script validates the values used for external connectivity and serviceability, in addition to many other validations.

About this task

The decks-install apply command runs this script automatically. We recommend that you run it independently before running the decks-install apply command, so you have an opportunity to resolve any errors and correct any warnings prior to installation. Warnings found by the installer will not stop the installer from continuing. You should review the output of the validate-values script at least once prior to running the installation.

You may run this script at any time.

Steps

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the script, providing file path names for all of the configuration values files that you plan to use in the actual installation command.

For example:

$ ./scripts/validate-values.py preinstall.yaml,values.yaml

Note the following:

Separate the file path names with commas, no spaces.

The yaml file generated by the pre-install script is required.

The files are processed in the order listed. When the same field is included in more than one of the values files, the value in the later file overrides the value in any earlier files in the list.

3. If the script indicates errors or warnings in your configuration values files, edit the files to correct the problems and rerun the script.


Install SDP

Install SDP into the OpenShift environment.

Prerequisites

This procedure assumes that you prepared configuration values as described in Configuration Values on page 51. One or more values.yaml files are required by the SDP installer.

Steps

1. Save your customer-specific permanent SDP license file in the ~/desdp/ directory. The file name must be license.xml.

cp <path-to-license-file> ~/desdp/license.xml

See Obtain and save the license file on page 48 for information about obtaining your SDP license file.

2. Run the SDP installation command.

cd ~/desdp
./decks-install-linux-amd64 apply --kustomize ./manifests/ --repo ./charts/ \
  --values <list of values files>

Where the <list of values files> includes:

The pre-install.yaml file that was generated by the pre-install script during initial installation.

Other values files that you have prepared according to instructions in Configuration Values on page 51. Separate the values file path names with commas and no spaces.

The files are processed in the order listed. When the same field is included in more than one of the values files, the value in the later file overrides the value in any earlier files in the list.

For example:

cd ~/desdp
./decks-install-linux-amd64 apply --kustomize ./manifests/ --repo ./charts/ \
  --values pre-install.yaml,values.yaml

The Apply Update screen appears, and continuously redisplays to show progress. The installation takes about 10 to 30 minutes to complete.

When the command is finished, the Apply Update screen stops refreshing and shows the final state for all components.

3. Set permissions in the OpenShift cluster to allow the SRS config upload job to run.

NOTE: For information about optional SRS uploads and how to enable them, see Enable periodic telemetry upload to SRS on page 64.

oc adm policy add-scc-to-user anyuid -z streamingdata-config-upload -n nautilus-system

Run the post-install script

Run this script after you run the decks-install apply command.

About this task

This script confirms that your latest run of the decks-install apply command left the cluster in a healthy state. This script invokes the health check script.

Steps

1. Wait for all pods to report a status of Completed or Succeeded.


It may take some time (up to 10 minutes) for the installation and synchronization of components to complete. If you proceed before the system settles into a stable state, the post-install script is likely to generate false errors. False errors disappear if you wait for the system to synchronize.
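One way to watch pod status while you wait is with kubectl; this is a hedged sketch that assumes kubectl access to the cluster:

   $ kubectl get pods --all-namespaces --watch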

2. Go to the folder where you extracted the decks-installer-<version>.zip file.

3. Run the script.

$ ./scripts/post-install.sh

4. If the script indicates errors in your cluster, fix the issues, rerun decks-install apply, and then rerun this script.

(Optional) Validate self-signed certificates

Use this procedure to validate that self-signed certificates are correctly installed and ready to handle connection requests.

Steps

1. Validate that the certificates are ready.

$ kubectl get certificate -A
NAMESPACE          NAME                             READY   SECRET                         AGE
nautilus-pravega   nautilus-pravega-grafana-tls     True    nautilus-pravega-grafana-tls   46h
nautilus-pravega   pravega-controller-api-tls       True    pravega-controller-api-tls     46h
nautilus-pravega   pravega-controller-tls           True    pravega-controller-tls         46h
nautilus-pravega   pravega-native-tls-certificate   True    pravega-tls                    46h
nautilus-pravega   selfsigned-cert                  True    selfsigned-cert-tls            46h
nautilus-system    keycloak-tls                     True    keycloak-tls                   46h
nautilus-system    nautilus-ui-tls                  True    nautilus-ui-tls                46h

2. Get the certificate from the configuration values file.

$ kubectl get secret -n nautilus-system cert-manager-secrets -o jsonpath="{.data.tls\.crt}" | base64 -d
-----BEGIN CERTIFICATE-----
MIIDKTCCAhGgAwIBAgIUTBlCLINSVvL0zFUzngveXeKL2scwDQYJKoZIhvcNAQEL
BQAwJDEiMCAGA1UEAwwZc2FicmV0b290aC5zYW1hZGFtcy5sb2NhbDAeFw0yMDA1
...
-----END CERTIFICATE-----

3. Check that you can connect to Keycloak.

a. Get the keycloak endpoint for use in other commands.

$ kubectl get ingress -n nautilus-system keycloak
NAME       HOSTS                    ADDRESS         PORTS     AGE
keycloak   keycloak.mycluster.com   10.243.42.132   80, 443   45h

b. Connect to Keycloak.

$ openssl s_client -showcerts -servername keycloak.mycluster.com -connect 192.2.0.7:443


c. Get the certificate from Keycloak.

$ kubectl get secret -n nautilus-system keycloak-tls -o jsonpath="{.data.tls\.crt}" | base64 -d
-----BEGIN CERTIFICATE-----
MIIDfzCCAmegAwIBAgIRAJxV4jFmB9HXULTTwrbwQZgwDQYJKoZIhvcNAQELBQAw
JDEiMCAGA1UEAwwZc2FicmV0b290aC5zYW1hZGFtcy5sb2NhbDAeFw0yMDA1MTkx
OTU3MDlaFw0yMDA4MTcxOTU3MDlaMFIxFTATBgNVBAoTDGNlcnQtbWFuYWdlcjE5
MDcGA1UEAxMwa2V5Y2xvYWsuc2FicmV0b290aC5zYW1hZGFtcy5zZHAuaG9wLmxh
Yi5lbWMuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqthOws62
6rkRO6P1MMB9SCZgJGU3o0wtW/I/HpPlIbqrRkJsbTBG4a0MjpiJmFlUOM5neDyN
qzcWPy2r9alYX7SS1cv3oBufHTRTpTtZVJ4RXQvBPtfo9+x0VgxrFkwhhhia0hgw
ZLHSQXhrxBh2fD5vTmYL9y0E28mm9Rt1dnhawa07Vr0ajdQLJ0stFi8Q0C4I3x7B
GlYOzBL8u4XzvHzGERXLdbO/RLrRPQ24WpYNtrfsrKtC4Zz3nhSMVPdq7rWwJ7OL
mXGF0bufsSrXdg0jhM+ns0MvUPf25irG/imgqbWa5uswW+6/3nTUejngZq9UbIwq
Dz5riHdU9oIxRwIDAQABo34wfDAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIw
ADAfBgNVHSMEGDAWgBSkuPFCdT341Xhl6GU+WaGQAH4ZhzA7BgNVHREENDAygjBr
ZXljbG9hay5zYWJyZXRvb3RoLnNhbWFkYW1zLnNkcC5ob3AubGFiLmVtYy5jb20w
DQYJKoZIhvcNAQELBQADggEBADhdDefyjQJgqhXRAG3ITlWgHv0JCIWUdNkatKun
unrSoJPwqRddCZeTr2o8BoMnvMZwNEoYqKVV/lYIhKnZKjqRwbOqcupTCP27Ejby
U3DaiRa0aGWHp6tm9XWdDeZ0lzzbIURE27+GFFkd0m+0+iq1NLFUQsziZN72prIr
zF1ygzb4cGVOglTh0Ma8nWO0VW4/opCks1fLLELFpoLPPPeHv8NpxGqGY2uj07KY
ptV8OuaI3PIp7ELjWHZ7OZm/WuhkPK0YGvIWERtgHZLk7kkafXZH7ZtOabmtKroK
OfYGOidSIzcFlKfgkySsa1f2PJjFFw5I7J7O/Iu9zhFcjao=
-----END CERTIFICATE-----

d. Check if the certificate that is returned in the secret is the same as the certificate that Keycloak returns.
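One way to make this comparison is to save each certificate to a file and diff them. The following is a sketch that reuses the endpoint from the example above:

$ kubectl get secret -n nautilus-system keycloak-tls -o jsonpath="{.data.tls\.crt}" | base64 -d > secret.crt
$ openssl s_client -servername keycloak.mycluster.com -connect 192.2.0.7:443 </dev/null 2>/dev/null | openssl x509 -outform PEM > live.crt
$ diff secret.crt live.crt && echo "Certificates match"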

4. Update the root certificates in your browser.

Obtain connection URLs

Cluster administrators can obtain the connection URLs using kubectl.

Steps

1. Log in as described in Log in to OpenShift for cluster-admins on page 91.

2. List access points into the cluster. Run kubectl get ingress --all-namespaces to list all access points into the cluster.

For example:

kubectl get ingress --all-namespaces
NAMESPACE          NAME                       HOSTS                                             ADDRESS        PORTS     AGE
my-project         my-flinkcluster            my-flinkcluster.my-project.test-psk.abc-lab.com   192.0.2.8...   80, 443   6d
my-project         repo                       repo.my-project.test-psk.abc-lab.com              192.0.2.8...   80, 443   6d
nautilus-pravega   nautilus-pravega-grafana   grafana.test-psk.abc-lab.com                      192.0.2.8...   80, 443   8d
nautilus-pravega   pravega-controller         pravega-controller.test-psk.abc-lab.com           192.0.2.8...   80        8d
nautilus-pravega   pravega-controller-api     pravega-controller-api.test-psk.abc-lab.com       192.0.2.8...   80        8d
nautilus-system    keycloak                   keycloak.test-psk.abc-lab.com                     192.0.2.8...   80, 443   8d
nautilus-system    nautilus-ui                test-psk.abc-lab.com                              192.0.2.8...   80, 443   8d

All the values in the HOSTS column are valid access points for authorized users.

In the NAME column, locate nautilus-ui, and take note of the value in the HOSTS column. That value is the URL for external connections to the User Interface, and it is the value to use in the configuration values file.


For example, from the list above, users can connect to the UI from external locations with the following URL:

https://test-psk.abc-lab.com


Manage SDP

Topics:

Post-install Configuration and Maintenance
Manage Connections and Users
Expand and Scale the Infrastructure
Manage Projects
Monitor Health
Use Pravega Grafana Dashboards
Troubleshooting



Post-install Configuration and Maintenance

Topics:

Obtain default admin credentials
Configure federated user accounts
Add Pravega alerts to event collection
Temporarily disable SRS connectivity or telemetry uploads
Verify telemetry cron job
Update the default password for SRS remote access
Ensure system availability when a node is down
Change applied configuration
Graceful shutdown and startup
Uninstall applications
Reinstall into existing cluster
Change ECS credentials after installation

Obtain default admin credentials

The installation process creates two default administrator accounts.

About this task

The two accounts are:

Type: Keycloak nautilus realm administrator
User Name: desdp
Description: This user is authorized to create analytic projects. This user has wildcard access to all scopes and projects and all their associated resources in SDP. This user is Admin in the nautilus realm. The nautilus realm is where SDP users and service accounts exist. This user is not permitted to log in to the Keycloak Administrator Console for the master realm.

Type: Keycloak master realm administrator
User Name: admin
Description: This user is authorized to log in to the Keycloak Administration Console in either the master or nautilus realm, and create users in Keycloak.

In the configuration values file, either password values were configured for the above accounts, or they were not.

If password values were specified in the configuration values file, use those values. You can skip steps 2 and 3 in the procedure below. The remaining steps are still important.

If password values were not specified in the configuration values file, the installer automatically generated passwords and inserted the values into secrets. This procedure describes how to obtain the secrets and extract the passwords.

Steps

1. Obtain the autogenerated password for the desdp user in the nautilus realm:

kubectl get secret keycloak-desdp -n nautilus-system -o jsonpath='{.data.password}' | base64 --decode



2. Obtain the autogenerated password for the admin user in the master realm:

kubectl get secret keycloak-http -n nautilus-system -o jsonpath='{.data.password}' | base64 --decode

3. Verify that you can log in to both the Keycloak Administrator Console and the SDP UI.

See Obtain connection URLs on page 74.

4. As a security precaution, discard the two secrets that contain the passwords. Do this only after you have verified that you can log in to both Keycloak realms.

The two Kubernetes secrets that contain the admin and desdp user passwords are only created once, at install time. Any modifications of the user accounts (such as changing their passwords, deleting them, or renaming them) and product upgrades do not update these secrets. They are only used as an initial means to retrieve the passwords for bootstrapping purposes.
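If you decide to remove them, a minimal sketch using the secret names from steps 1 and 2 might look like the following. Confirm first that you have verified both logins and that nothing else in your environment reads these secrets:

$ kubectl delete secret keycloak-desdp -n nautilus-system
$ kubectl delete secret keycloak-http -n nautilus-system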

5. (Optional) Change the passwords.

You may change the default passwords later in Keycloak. See Change password in Keycloak on page 93. You can use that change procedure for passwords that were system-generated or explicitly provided.

Configure federated user accounts

Users can interact with SDP on two planes: the SDP UI and the OpenShift command line. Federated accounts provide a unified authentication experience for users who need access to both planes.

Without federation

Test and development environments are typically acceptable without federation. Without federation, each plane uses its own authentication mechanism, with no coordination. The user accounts are known as local accounts.

The SDP UI uses Keycloak for authentication.
The Kubernetes container service uses its native Kubernetes authentication.
SDP administrators must maintain separate login accounts for the two planes. The accounts in Keycloak and the Kubernetes container service are known as "local" accounts.
Non-admin users (project members) are not allowed any access to the SDP cluster at the Kubernetes cluster level. They cannot research or troubleshoot their project resources on the command line. They may, however, manage their project resources (Flink clusters, applications, artifacts, and others) using the SDP UI.

With federation

Federation provides a unified authentication experience for users who need access to both the SDP UI and the OpenShift command line. With federation, users maintain one account.

Federation is strongly recommended in production environments. It provides the following advantages:

Federation enables use of the same credentials to authenticate on both the SDP UI and the OpenShift command line.
Federation is the only way that non-admin users (project members) can gain access to the cluster on the Kubernetes command line.
Federation ensures secure and consistent RBAC configurations that are applied to the user account. Regardless of whether a user is logged onto the SDP UI or the OpenShift command line, they can only access resources in projects for which they are members.

OpenShift and Keycloak allow configuration with many identity providers. LDAP is the recommended and verified identity provider for SDP.

Configure an LDAP identity provider

Federation with an outside identity provider is recommended in production environments.

To configure OpenShift to use an LDAP identity provider, see https://docs.openshift.com/container-platform/4.6/authentication/identity_providers/configuring-ldap-identity-provider.html.


Configure Keycloak for LDAP federation

Keycloak can authenticate accounts in a third-party identity provider.

Steps

1. In a browser window, go to the Keycloak endpoint in the SDP cluster.

If the SDP UI is open, you can prepend keycloak. to the UI endpoint. For example, http://keycloak.sdp.lab.myserver.com. Otherwise, see Obtain connection URLs on page 74.

2. Click the User federation tab.

3. Select Ldap.

4. Complete the configuration fields on the Ldap screen.

The form contains the usual LDAP settings, which vary depending on vendor and environment.

5. Use the Test Configuration button on the Ldap screen to validate the connection and credentials of the bind account.

6. Decide how you prefer to get the user accounts into Keycloak.

User accounts must exist in Keycloak before the user can be added to projects as a project member. There are two ways to get accounts into Keycloak. Choose either one.

Choice 1: Click Synchronize all users on the Ldap screen, and Save the configuration on the Ldap screen.

This action loads all the available users from LDAP into Keycloak. If you do not want or need all LDAP users in the SDP Keycloak, choose the next option.

NOTE: Even though the accounts are loaded into Keycloak, the authentication and password maintenance is always delegated through LDAP.

Choice 2: Save the configuration on the Ldap screen, and then add individual users into Keycloak.

Ask individual users to log in to the SDP UI with their LDAP accounts. The user is authenticated, but does not have any authorizations. Instead of seeing the SDP dashboard, they see a welcome screen asking them to talk to an Administrator. In the background, SDP adds the user account into Keycloak.

7. Make users members of projects. See Add or remove project members on page 108. Those users can access the project resources on the SDP UI and on the OpenShift command line. The command line login is:

oc login -u <username>


Add Pravega alerts to event collection

The Grafana Pravega Alerts Dashboard generates alerts that are related to Pravega health. This procedure adds the Grafana-generated alerts to the System > Events tab in SDP. This procedure also optionally adds the Grafana-generated alerts to the data that SRS uploads to Dell.

Steps

1. Log in to the SDP User Interface as an admin user.

2. On the Dashboard page, click the Pravega metrics link. The link appears above the charts, in the upper left corner of the page.

The Grafana UI opens.

3. In the list that appears, click the Pravega Alerts Dashboard.

4. From the Pravega Alerts Dashboard, click the Save Dashboard icon in the Grafana banner.

5. On the Save Dialog that appears, click Save.

6. On the Confirmation dialog that appears, click Override.

NOTE: The Override selection is required.

7. Wait about 10 to 20 seconds, until all the heart icons on the panels in the dashboard change to a green color. The colors change from black to green.

8. To verify rules creation, click the Alerting icon (bell-shaped icon) in the left banner of the Grafana window, and choose Alert Rules.

A list of alert rules appears.

9. Verify communication with SRS components.

a. Click the Notification Channels tab.
b. Ensure that kahm-notifier is listed.

The kahm-notifier passes alerts to the Kubernetes and Applications Health Monitoring service (KAHM). KAHM is the event collecting service for SRS.

Results

This procedure has the following results:


1. The System > Events tab in SDP lists events.
2. If the SRS telemetry feature is enabled, the uploads to Dell include the Grafana alerts about Pravega health. For information about enabling SRS telemetry, see Enable periodic telemetry upload to SRS on page 64.

Temporarily disable SRS connectivity or telemetry uploads

The SDP UI provides a way to temporarily disable SRS connectivity to Dell EMC or disable telemetry upload. These actions can be convenient for maintenance activities.

To use this procedure, telemetry uploads must have been enabled during installation.

These UI actions are temporary. If the decks-install apply command is run for any reason to change any configurations, the settings in the values.yaml files override these actions on the SDP UI.

To view or change SRS status, log in to the SDP UI and go to Settings > SRS Gateway.

Figure 7. SRS Gateway information in the UI

Field Description

FQDN/IP The fully qualified domain name or IP of the SRS Gateway

Port The port that is configured for communication with the SRS Gateway

Instance SWID The Software ID of the SRS license

Product The product name that is licensed for connection with SRS

Registered Shows whether the Dell EMC backend systems have registered this SRS

Test Dial Home Results Shows the results of the last dial home test

Test Dial Home Time Shows the time of the last dial home test

Telemetry Upload Enables or disables telemetry uploads to the SRS Gateway. Click Telemetry Upload and then click the Enable or Disable action that appears. When you click Enable, a legal agreement appears. You must then click Accept Agreement to enable telemetry uploads.

Actions:
Test: Test the dial home feature. Dial home connects the SRS Gateway at the customer site with the Dell EMC support systems and allows support teams to connect remotely.
Disable: Disable connectivity to the SRS Gateway. The events continue to queue and are delivered when the feature is enabled.
Enable: Enable SRS Gateway connectivity.


Verify telemetry cron job

When telemetry is enabled, the SDP cluster contains a cron job that runs every 12 hours and uploads configuration information to SRS.

Steps

To verify the cron job schedule, run the following command:

$ kubectl get cronjobs -n nautilus-system
NAME                          SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
monitoring                    */10 * * * *   False     0        9m33s           19h
streamingdata-config-upload   * */12 * * *   False     0        62m
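If you want to exercise the upload path without waiting for the next scheduled run, one option is to create a one-off job from the cron job. The job name config-upload-test below is arbitrary:

$ kubectl create job config-upload-test --from=cronjob/streamingdata-config-upload -n nautilus-system
$ kubectl get jobs -n nautilus-system | grep config-upload-test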

Update the default password for SRS remote access

Use this procedure to change the default password on the streamingdata-remote-access service pod, which Dell support services use for remote access.

Steps

1. Log in, if needed. See Log in to OpenShift for cluster-admins on page 91.

2. Locate the streamingdata-remote-access service pod:

## kubectl get pod | grep streamingdata-remote-access
streamingdata-remote-access-bd565d8ff-7fwfr   1/1   Running   0   3d2h

3. If the pod is running and configured, update the password:

## kubectl exec -it streamingdata-remote-access-bd565d8ff-7fwfr passwd root
Changing password for root
New password:
Retype password:
passwd: password for root changed by root

4. Store the password in a secure location.

Dell EMC support services need this password for remote access into the SDP cluster to provide support.

Ensure system availability when a node is down

An alert with an ID of KUBE-CLUSTER-0005 or KUBE-CLUSTER-0006 requires immediate attention to ensure system availability.

About this task

The following procedure ensures that pods that were running on a node that goes down are scheduled to run on another available node.

Steps

1. Run the following command and check the status to see if any node is down.

kubectl get nodes

2. If a node is down, attempt to restart it immediately.


3. If a node remains down and restarting does not bring it online immediately, check to see if any pods are stuck in a Terminating state.

kubectl get po -A -o wide | grep Terminating

NOTE: Pods do not move into Terminating state until 330 seconds after the node goes down.

4. Evict the pods stuck in Terminating state.

kubectl drain <node-name> --delete-local-data --ignore-daemonsets --force

5. Force delete the pods in Terminating state.

NOTE: Do this step only for pods in the following namespaces: nautilus-system, catalog, nautilus-pravega, and customer created project namespaces.

kubectl delete po <pod-name> -n <namespace> --grace-period=0 --force

Deleted pods are automatically scheduled to start up on available nodes.

6. Verify that the pods are running and that they are bound with persistent volumes.

kubectl get po -n <namespace>

Change applied configuration

Some configuration values can be changed after installation by changing and reapplying the values files.

Prerequisites

Consult with Dell EMC support personnel about the values that you want to change. Some values cannot or should not be changed using this procedure.

You must have cluster-admin role on the SDP Kubernetes cluster.

Typically, you start with the configuration values files that were used for the last configuration and edit those files with the changes you want to make. Values are never carried over from the existing configuration.

About this task

To change the current configuration, run the decks-install apply command to reapply edited configuration values to the cluster. Every time that you run the decks-install apply command, the entire configuration is reconstructed using:

Override values that you supply in the configuration values files
Default values in the installer

If a value is not supplied in the configuration values files, the default values are used.

NOTE: The edited configuration files become your new configuration. Use the edited files as the baseline for future configuration changes.

NOTE: If the original installation used multiple values files, be sure to specify all the files in the reapply procedure.

While it is possible to use kubectl or other Kubernetes tools to update the resources running on the cluster, that process is not recommended. When you use tools outside of the installer and its values file, you have no record of the current configuration. The next decks-install apply command overrides whatever changes you made using other tools.

Kubernetes handles the reconfiguration. The administrator does not have to manually stop or restart anything. The changed configuration is applied across the cluster as fast as the Kubernetes reconcile loop can apply it. The results may take some time to complete.

Depending on which values you change for which components, some services may be restarted for reconfiguration. As a result, short interruptions in service may occur. For example, if a configuration change causes some Pravega components to restart, then Pravega stream ingestion could stop processing while the reconfiguration occurs.


Use the following procedure to change the configuration.

Steps

1. Prepare the configuration files, remembering that the new configuration is entirely reconstructed from the values files that you provide.

2. Run the validate-values.py script.

a. Go to the folder where you extracted the decks-installer- .zip file.

b. Run the following command:

$ ./scripts/validate-values.py <list of values files>

Where the <list of values files> includes:

The pre-install.yaml file that was generated by the pre-install script during initial installation. This file contains non-default credentials for Pravega components and is required.

Other values files that you have prepared according to instructions in Configuration Values on page 51. Make sure to include all the values files that you plan to use in the decks-install apply command, in the same order.

Separate the values file path names with commas and no spaces. When the same field occurs in more than one of the values files, the value in the right-most file overrides the values in left-most files.

For example:

$ ./scripts/validate-values.py pre-install.yaml,values.yaml

c. If the script indicates errors in your configuration values file, edit the files to correct the problems and rerun the script.

3. Log in to the cluster. See Log in to OpenShift for cluster-admins on page 91.

4. Run the decks-install apply command.

The --values option specifies the configuration values files. Include all the same values files as described in step 2 above.

For example:

$ ./decks-install apply --kustomize ./manifests/ --repo ./charts/ \
  --values pre-install.yaml,values.yaml

5. Run the post-install script.

a. Go to the folder where you extracted the decks-installer- .zip file.

b. Run the script.

$ ./scripts/post-install.sh

c. If the script indicates errors in your cluster, fix the issues, rerun decks-install apply, and then rerun this script.

Graceful shutdown and startup

Use this procedure to shut down SDP in an orderly manner.

Prerequisites

The following utilities are required:
awk
kubectl (with context set to the SDP cluster)
sh
xargs

Steps

1. (Optional but recommended) Stop all running Flink applications with a savepoint.


Use either kubectl or the Streaming Data Platform UI to issue savepoints.

2. Save a copy of the PodDisruptionPolicy files.

kubectl get pdb --all-namespaces -o json > pre-shutdown-pdb.json

3. Create a file with substitute availability controls.

SDP attempts to maintain a level of availability. For a graceful shutdown, you do not want the system to enforce the availability criteria. This step updates the PodDisruptionPolicy files with different content that permits the shutdown.

a. Create a file with the following content:

{ "spec":{ "maxUnavailable":null, "minAvailable":0 } }

b. Save the file with the name patch-pdb.json.
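As a convenience, you could create the file directly from the shell, for example:

$ cat > patch-pdb.json <<'EOF'
{
  "spec": {
    "maxUnavailable": null,
    "minAvailable": 0
  }
}
EOF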

4. Edit the PodDisruptionPolicy files with the patch content.

Run the following command:

$ kubectl get pdb --all-namespaces --no-headers | awk '{print $1,$2}' | xargs -n2 sh -c \
  'kubectl patch pdb $2 -n $1 --patch "$(cat patch-pdb.json)"' sh

The command gets all PodDisruptionPolicy files, gets the name and namespace of each, and then updates them.
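To confirm that the patch took effect, you can list the policies again and check that MIN AVAILABLE shows 0 for each entry:

$ kubectl get pdb --all-namespaces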

5. Transition all Kubernetes worker nodes into maintenance mode.

Use the following commands:

$ NUM_NODES=$(kubectl get nodes --no-headers | grep -c ".")

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -n1 kubectl cordon

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -n1 -P $NUM_NODES sh -c \
  'kubectl drain $1 --delete-local-data --ignore-daemonsets' sh

NOTE: The command uses the option --ignore-daemonsets. In a later step that shuts down the cluster, the daemonsets are drained.

6. Shut down the cluster using the recommendations of the Kubernetes service provider that you are running.

7. Restart the cluster using the recommendations of the Kubernetes service provider that you are running.

8. Enable the nodes to schedule pods (uncordon them).

NOTE: This step is not required if you used the force commands in the previous steps. The nodes are already uncordoned.

$ NUM_NODES=$(kubectl get nodes --no-headers | grep -c ".")

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -n1 -P $NUM_NODES sh -c \
  'kubectl uncordon $1' sh

9. Monitor the startup.

$ watch kubectl get pods --all-namespaces

Give the cluster some time to become stable.


10. When the cluster is stable, restore the original PodDisruptionPolicy files.

$ kubectl replace --force -f pre-shutdown-pdb.json

Uninstall applications

Use the decks-install unapply command to uninstall specified platform applications and their associated resources from the SDP cluster. These are applications mentioned in the Kubernetes manifests.

Prerequisites

Consult with Dell EMC support personnel about your intended outcomes before uninstalling applications from the SDP cluster.

WARNING: If you need to delete the Flink, Spark, or Pravega application, be aware that existing Flink, Spark, or Pravega data will be marked for deletion as well.

If Flink is listed for removal, you must first delete all existing Flink projects. When you delete a project, all its resources (such as Flink, Spark, and PSearch clusters) are deleted with the project. To perform these deletions, use either of the following methods:
The SDP user interface: go to the Analytics page and delete projects.
The /scripts/project/clean_projects.sh script, supplied with the distribution, which deletes all projects.

If the Pravega application is listed for removal, be aware that existing streams in Pravega will not be readable by a newly installed Pravega instance. Even if the nfs-client-provisioner.storageClass.archiveOnDelete setting is "true" in the current configuration, the archived data will not be readable by a new installation of the Pravega application.

About this task

The decks-install unapply command marks applications for removal from the Kubernetes cluster, based on a specified manifest bundle. One reason to perform an unapply for an application is to prepare to reinstall it with a different set of configuration values.

To uninstall all SDP applications and resources from cluster, so that you can start over with a new installation, use the decks-install unapply command with the same manifest that was used for the installation.

NOTE: Only resources that were initially created by SDP are removed. Other resources are not affected by the uninstall procedure.

Steps

1. Identify or edit the manifest bundle.

If you are uninstalling all SDP applications and resources from the cluster, so that you can start over with a new installation, there is no need to update the manifest bundle. Use the same manifest bundle that you used with the decks-install apply command.

If you are uninstalling a few selected applications, you need a different manifest bundle. However, contact Dell EMC support for advice. Some SDP resources depend on other resources.

2. Run the decks-install unapply command.

$ ./decks-install unapply --kustomize <manifest directory>

For example:

$ ./decks-install unapply --kustomize ./unapplymanifest/

The decks-install unapply command does the following:

Marks applications and resources in the provided manifest bundle for deletion, in a pending state.
By default, starts the synchronization process, which reconciles the cluster to the desired terminal state. An optional parameter can defer the synchronization.

See decks-install unapply on page 163 for optional command parameters.

3. Check to ensure that the synchronization completes successfully.


4. If the synchronization procedure fails for whatever reason, use the following command to start it again. It is safe to restart the synchronization procedure at any time.

$ ./decks-install sync --kustomize <manifest directory>

Reinstall into existing cluster

In testing and development scenarios, you may want to uninstall the SDP software from the Kubernetes cluster and start over with a fresh software installation.

Steps

1. Run the decks-install unapply command using all configuration values files that were used for the installation.

2. If long-term storage is on an ECS appliance, manually clean the Pravega bucket before performing another install that uses that bucket.

3. Clear old DNS entries from the external DNS server.

This step applies only if you are using a local DNS provider for external connections (such as CoreDNS). With those types of DNS providers, old entries are not automatically updated. This results in the DNS query returning both old and new entries. The workaround is to manually delete the old DNS entries for the cluster. Use the etcdctl tool.

NOTE: If you are using a cloud DNS provider for external connections (such as AWS Route53), the removal of old entries is done for you. However, the installation is likely to take more time than the first installation. Propagating new entries takes time, and depends on the DNS cache configuration in intermediate DNS servers.

a. Identify the entries:

> ETCDCTL_API=3 etcdctl get --prefix=true "" --endpoints http://<etcd-server>:2379 | grep <cluster-name>

b. Delete the entries:

> ETCDCTL_API=3 etcdctl del <key> --endpoints http://<etcd-server>:2379

Here is an example session:

> ETCDCTL_API=3 etcdctl get --prefix=true "" --endpoints http://10.247.XX.XXX:2379 | grep gladiator
/skydns/com/dell/desdp/gladiator/keycloak/194bb2b2 {"host":"10.247.NNN.NNN","text":"\"heritage=external-dns,external-dns/owner=gladiator.desdp.dell.com,external-dns/resource=ingress/nautilus-system/keycloak\"","targetstrip":1}
> ETCDCTL_API=3 etcdctl del /skydns/com/dell/desdp/gladiator/keycloak/194bb2b2 --endpoints http://10.247.XX.XXX:2379

4. Run the decks-install apply command using the configuration values files for the new installation.

Change ECS credentials after installation

If ECS credentials are compromised, an SDP administrator can change the credentials.

About this task

It is required to run the commands in the next two tasks in a UNIX environment or in Windows Subsystem for Linux (WSL) on Windows.


Update ECS credential used by the ecs-service-broker

This task updates the Management User password that the ecs-service-broker uses to access ECS. For minimal interruption to service, use the following steps.

Steps

1. Calculate the base64 representation of the new Management User password that you intend to use in Step 3.

$ echo -n ChangeMe2 | base64
Q2hhbmdlTWUy

2. Update the secret value using the base64 value.

$ kubectl patch secret -n nautilus-system ecs-broker-connection-auth \
  --type='json' -p='[{"op":"replace","path":"/data/password","value":"Q2hhbmdlTWUy"}]'
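To confirm the patch, you can decode the secret and check that it matches the new password:

$ kubectl get secret -n nautilus-system ecs-broker-connection-auth -o jsonpath='{.data.password}' | base64 -d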

3. In the ECS UI, update the password for the Management User.

4. Restart the ecs-service-broker pods.

$ kubectl get pods -n nautilus-system | grep ecs-service
ecs-service-broker-55d545785c-vxsxd   1/1   Running   0   3d6h

$ kubectl delete pod ecs-service-broker-55d545785c-vxsxd -n nautilus-system
pod "ecs-service-broker-55d545785c-vxsxd" deleted

$ kubectl get pods -n nautilus-system
ecs-service-broker-55d545785c-qdpfv   1/1   Running   0   9m53s   <<<<<<<<<<<

Update ECS credential used by Pravega

This task updates the Object User Secret Key that Pravega uses.

About this task

Steps

1. Using the ECS UI, add a new SecretKey for the Object User.

NOTE: Do not set an expiration on the old key.

2. Calculate the base64 representation of the new SecretKey value.

$ echo -n w32cVABwifvVwwAm2HvIwAHsDn0mtvBCDlMMuggD | base64

dzMyY1ZBQndpZnZWd3dBbTJIdkl3QUhzRG4wbXR2QkNEbE1NdWdnRA==

3. Update the Pravega Secret.

$ kubectl patch secret -n nautilus-pravega nautilus-pravega-tier2-ecs \
  --type='json' -p='[{"op":"replace","path":"/data/SECRET_KEY","value":"dzMyY1ZBQndpZnZWd3dBbTJIdkl3QUhzRG4wbXR2QkNEbE1NdWdnRA=="}]'

secret/nautilus-pravega-tier2-ecs patched

4. Restart each SegmentStore pod, one at a time.


NOTE: Wait for each SegmentStore pod to fully start up, validating each pod with the logs command before attempting to restart the next pod.

$ kubectl get pods -n nautilus-pravega | grep segment-store
nautilus-pravega-segment-store-0   1/1   Running   0   7m8s

$ kubectl delete pod nautilus-pravega-segment-store-0 -n nautilus-pravega
pod "nautilus-pravega-segment-store-0" deleted

# The pod is re-created. Wait until it is fully started up.
$ kubectl logs nautilus-pravega-segment-store-0 -n nautilus-pravega

5. When all SegmentStore pods are using the new SecretKey, you may delete the old SecretKey in the ECS UI.


Manage Connections and Users

Topics:

Obtain connection URLs
Connect and log in to the web UI
Log in to OpenShift for cluster-admins
Log in to OpenShift command line for non-admin users
Create a user
Assign roles
User password changes

Obtain connection URLs

Cluster administrators can obtain the connection URLs using kubectl.

Steps

1. Log in as described in Log in to OpenShift for cluster-admins on page 91.

2. List access points into the cluster. Run kubectl get ingress --all-namespaces to list all access points into the cluster.

For example:

kubectl get ingress --all-namespaces
NAMESPACE          NAME                       HOSTS                                             ADDRESS        PORTS     AGE
my-project         my-flinkcluster            my-flinkcluster.my-project.test-psk.abc-lab.com   192.0.2.8...   80, 443   6d
my-project         repo                       repo.my-project.test-psk.abc-lab.com              192.0.2.8...   80, 443   6d
nautilus-pravega   nautilus-pravega-grafana   grafana.test-psk.abc-lab.com                      192.0.2.8...   80, 443   8d
nautilus-pravega   pravega-controller         pravega-controller.test-psk.abc-lab.com           192.0.2.8...   80        8d
nautilus-pravega   pravega-controller-api     pravega-controller-api.test-psk.abc-lab.com       192.0.2.8...   80        8d
nautilus-system    keycloak                   keycloak.test-psk.abc-lab.com                     192.0.2.8...   80, 443   8d
nautilus-system    nautilus-ui                test-psk.abc-lab.com                              192.0.2.8...   80, 443   8d

All the values in the HOSTS column are valid access points for authorized users.

In the NAME column, locate nautilus-ui, and take note of the value in the HOSTS column. That value is the URL for external connections to the User Interface, and it is the value to use in the configuration values file.

For example, from the list above, users can connect to the UI from external locations with the following URL:

https://test-psk.abc-lab.com



Connect and log in to the web UI

The SDP User Interface is a web interface, available for external connections over HTTPS.

Steps

1. Type the URL of the SDP User Interface in a web browser. The SDP login window appears.

2. Log in to SDP.

If your administrator provided local user credentials, use those credentials. If LDAP is integrated, use your enterprise credentials.

3. Click Log In.

If your username and password are valid, you are authenticated to SDP. One of the following windows appears:

Window Explanation

The Dashboard page on the UI. The username has authorizations that are associated with it.

A welcome message. The username is authenticated but not authorized to see any data.

4. If you need authorizations, ask an Administrator to make you a member of one or more projects.

Log in to OpenShift for cluster-admins

This procedure is for admin users to manage the SDP Kubernetes cluster.

Prerequisites

You must have installed the OpenShift CLI on your local working platform. See https://docs.openshift.com/container-platform/4.6/cli_reference/openshift_cli/getting-started-cli.html.

You must have credentials with cluster-admin role for the SDP cluster in OpenShift.

Steps

1. Run the login command.

$ oc login -u kubeadmin -p $(cat ~/startline/ocp/auth/kubeadmin-password)
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y

Login successful.

You have access to 68 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "default".
$

Where:
The file used in the -p option is standard. Every OpenShift installation creates a kubeadmin password in <openshift installation folder>/auth/kubeadmin-password.
To use certificates, find them in the kubeconfig file here: <openshift installation folder>/auth/kubeconfig
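As an alternative to the password login, you can point your session at that kubeconfig file directly; this is a sketch that assumes you have access to the OpenShift installation folder:

$ export KUBECONFIG=<openshift installation folder>/auth/kubeconfig
$ oc whoami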

2. Answer the prompts for the server URL and your credentials.

See https://docs.openshift.com/container-platform/4.6/cli_reference/openshift_cli/getting-started-cli.html#cli-logging-in_cli-developer-commands for an example.


Results

After successful login, you can manage the SDP cluster. Underlying scripts have copied the kube config files for you.

Log in to OpenShift command line for non-admin users

This procedure is for non-admin users (project members) to interact with the SDP Kubernetes cluster on the OpenShift command line.

Prerequisites

Steps

1. The non-admin user who wants to use the OpenShift command line may need to log into the SDP UI one time with their LDAP account.

A Welcome screen indicates that the user account was authenticated but there are no authorizations associated with the account.

In the background, the user account is added into Keycloak. This addition enables the next step.

NOTE: This step is not required if LDAP federation is configured with the Synchronize all users option.

2. An SDP admin makes the user a member of projects. See Add or remove project members on page 108.

When users are assigned to projects, RBAC rules are created. Those rules apply to the SDP UI and OpenShift command line access.

3. The user can log in to the OpenShift command line with the LDAP credentials.

oc login -u <username>

Create a user

When LDAP federation is enabled, create a new user in LDAP.

When federation is not enabled, you may create local users in Keycloak and make them project members. However, these users can access project resources only through the SDP UI. Local users cannot log in to OpenShift.

NOTE: When federation is not enabled, there is no access by non-admin users to the kubectl plane. Federation is required for that to be possible.

Add new local user on the Keycloak UI

Without LDAP federation, administrators use the Keycloak dashboard to create usernames for access to the SDP UI.

Steps

1. In a browser window, go to the Keycloak endpoint in the SDP cluster.

To list connection endpoints, see Obtain connection URLs on page 74. If the SDP UI is open, you can try prepending keycloak. to the UI endpoint. For example, http://keycloak.sdp.lab.myserver.com. Depending on your configuration, this might not always work.

2. On the Keycloak UI, click Administration Console.

3. Log in using the Keycloak administrator username (admin) and password.

See Obtain default admin credentials on page 77.

4. Click Manage > Users.

5. On the Users screen, click Add User on the right.


6. Complete the form.

NOTE: The username must conform to Kubernetes and Pravega naming requirements as described in Naming requirements on page 103.

7. Optionally click the Credentials tab to create a simple initial password for the new user.

Create a temporary password. Enable Temporary, which prompts the user to change the password on the next login.

8. To authorize the new user to perform actions and see data in SDP, make the user a member of projects.

Assign roles

SDP supports admin and project member roles. Admins assign roles, as shown in the following table.

User request: Become a project member
Role required: Administrators add users to a project.
Procedure: See Add or remove project members on page 108. If federation is enabled, the project member role is granted in Keycloak and in Kubernetes.

User request: Become an Administrator
Role required: Administrators assign the admin role to other users.
Procedure:
1. Create the new user in Keycloak.
2. Assign the admin realm role to the user, using the Keycloak UI.
3. Consider whether you want to give this user cluster-admin role in Kubernetes. With or without federation, the corresponding role on the Kubernetes side is not granted.

User password changes

Scenario Description

Federation is enabled. All user accounts are managed in the identity provider, such as LDAP. The user changes a password in LDAP. Even though there is a shadow account in Keycloak, the user can ignore it. A password is not needed there.

Federation not used. All user accounts are managed in the local Keycloak instance. The user changes a password in Keycloak.

Change password in Keycloak

Use this procedure to change a password or other profile attributes in the local Keycloak system.

Steps

1. Log in to the SDP UI with the username whose password or other profile attributes you want to change.

2. In the banner, click the User icon.

3. Verify that the username at the top of the menu is the username whose profile you want to change.

4. Choose Edit Account.


5. To change the password, complete the password-related fields.

6. Edit other fields in the profile if needed.

7. Click Save.

8. To verify the password change:

a. Click the User icon and choose Logout. b. Log back in using the new password.


Expand and Scale the Infrastructure

Topics:

Difference between expansion and scaling
Expansion
Scaling

Difference between expansion and scaling

As your SDP accommodates more projects, more streams, and more analytic applications, it may require expansion and scaling.

NOTE: Dell recommends that you engage Dell Customer Support before using these instructions. The support team can clearly determine the bottlenecks in your current setup and your expansion needs.

Expansion

Expansion means to add resources to the underlying infrastructure. For SDP, expansion activities are:
1. Determine expansion needs.
2. Add capacity by adding new hosts (or racks) to the underlying cluster of hosts.
3. Add the new hosts to the supporting distributed switches.
4. You may also add capacity vertically by adding more disks if slots are available.

Scaling

Scaling means to configure the Kubernetes cluster and SDP components to take advantage of the new resources. Scaling tasks are:
1. Determine scaling recommendations.
2. Scale the Kubernetes cluster.
3. Scale components in SDP.

Expansion

The following sections describe how to determine SDP expansion needs and how to perform the expansion on an existing platform.

Determine expansion requirements

If the SDP infrastructure needs expanding with additional resources, contact Dell EMC Technical Support to discuss specific requirements for your use cases.

Some indicators that the underlying infrastructure may need additional resources are:

At the pod level: All the pods may have high utilization (CPU, memory, or disk) and scaling up replicas fails.
At the host level: Hosts show high utilization when you compare current and past utilizations.

The OpenShift cluster administrator should engage with Dell EMC Technical Support to determine the capacity to add. The Technical Support team uses current usage reports from SDP to analyze resource usage and make recommendations.



Add new rack

To add a rack, contact the Dell EMC support team for guidance. The support team can help determine sizing and configuration requirements.

Add nodes to the OpenShift cluster

You can adjust the number of worker machines in your OpenShift Container Platform cluster. You scale the worker machines by increasing the number of replicas that are defined in the worker machine set.

About this task

To add nodes to a Kubespray cluster, see Manage SDP Edge on page 41.

Steps

1. Engage with the Dell EMC support team for guidance. The support team can help determine sizing and configuration requirements.

2. See https://docs.openshift.com/container-platform/4.6/scalability_and_performance/recommended-cluster-scaling-practices.html.

Add supporting storage

You can expand the capacity of the storage that SDP uses for project artifacts by expanding the OpenShift cluster.

Steps

See https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal_ipi/ipi-install-expanding-the-cluster.html.

Scaling

The following sections describe how to determine appropriate scaling values and how to perform the scaling tasks.

Get scaling recommendations

The provisioner.py script makes scaling recommendations for the OpenShift cluster and the SDP internal components. You provide information about newly added hosts, and the script outputs appropriate scaling recommendations.

Prerequisites

Expand the underlying infrastructure with additional hosts before using this procedure.

Steps

1. Go to the folder where you extracted the decks-installer- .zip file.

2. Run the provisioner.py script.

NOTE: The provisioner.py script must run from inside the scripts directory.

Change directories to the scripts directory and then run the script with the --help option to see all arguments.

$ cd scripts
$ python3 provisioner.py --help


Run the script with only the --outfile argument, and receive prompts for all the other arguments. The log file saves the prompts and answers.

$ cd scripts
$ python3 provisioner.py --outfile mydir/provisioner

Run the script with all or some arguments specified. The --outfile argument is always required.

You receive prompts only for the missing arguments. Skip the input prompts completely by supplying all arguments. Skipping prompts is useful for automating the script.

Option Description

--num-hosts-present n Number of hosts initially in the OC cluster.

--num-hosts-added n Number of hosts added.

--num-host-failures n Number of host failures to tolerate among newly added hosts. 0 is typical and acceptable. During the initial installation, some failures to tolerate were already considered. If you are doubling the size of the cluster, then 1 is recommended.

--num-physical-cpu n Number of physical CPU cores per each host. This value depends on the underlying infrastructure deployment plan used.

--mem-gb n Memory in GB per each host.

--local-disk-count n Number of local disks used for Bookkeeper per host.

--percent-analytics n Provide the percentage of added resources to allocate to the analytics engine. The system assigns the remainder of added resources to Pravega. Enter a number from 0 to 100.

--outfile pathname Required. Provide the pathname for the output file from this command.

3. Save the output for later use with the scaling.py script.

The output is similar to the following:

bookkeeper: 1
controller: 1
failures_to_tolerate: 0
metadata_heavy_workload: false
segment_store: 1
segment_store_cache_max_size: 28991029248
segment_store_jvm_options_direct_memory: 28
segment_store_jvm_options_mx: 4
vm_cpus: 8
vm_local_drives: 0
vm_ram_gb: 32
vms: 3
worker_vm_count_for_applications: 3
worker_vm_count_for_pravega: 3
zookeeper: 1

Scale the K8s cluster

You can scale the cluster by changing the number of worker nodes in the cluster.

About this task

First determine the optimal number of worker nodes to configure and then resize.

Steps

1. Calculate the new number of worker nodes to configure.


The new number of worker nodes is the current number plus the recommended increases. The increases are for Pravega and for applications as reported in the output from the provisioner.py script.

<new number of worker nodes> = existing + worker_vm_count_for_applications + worker_vm_count_for_pravega

NOTE: For baremetal deployments, read worker_vm_count as worker_count (worker nodes in Kubernetes).
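For example, assuming 6 existing worker nodes and the sample provisioner output shown earlier (worker_vm_count_for_applications: 3 and worker_vm_count_for_pravega: 3), the new total would be 6 + 3 + 3 = 12 worker nodes.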

2. Run the following command to change the number of worker nodes in the Kubernetes cluster:

oc resize <cluster-name> --num-nodes <new number of worker nodes>

Where:
cluster-name is the SDP Kubernetes cluster name.
<new number of worker nodes> is the value from the previous step.

Scale SDP

Run the scale.py script to scale internal components in SDP.

About this task

This script uses the following as input:
The sizing recommendations file that the provisioner.py script generated.
The values.yaml files containing the current configuration settings.

The script generates a file with adjusted values for some configurations. Dell recommends that you always run the script twice:

The first time, use the --dry-run option. This option generates a file of proposed changes but does not apply them. You can review the changes.

The next time, omit the --dry-run option. In addition to generating a file of proposed changes, the script creates a job on the cluster to track scaling. If scaling is blocked, it sends alerts.

Using the output file from this script, the decks-install apply command performs the scaling work.

Steps

1. Go to the folder where you extracted the decks-installer- .zip file.

2. Run the script with the --help option to view all options.

$ python3 ./scripts/scale.py --help

3. Run the scale.py script with the --dry-run option.

$ python3 ./scripts/scale.py --dry-run -p <provisioner output file> -i <yamlfile1,yamlfile2,...>

Where:

<provisioner output file> is the sizing output from the provisioner.py script.

The -i values are the configuration values filenames that were used during platform installation. Provide all the filenames. Separate filenames with commas and do not include spaces.

4. Review the script output.

The output is a summary of resources that would occur after scaling. If the output is not as you expected, rerun the provisioner.py script with different values.

5. When the dry-run output is acceptable, rerun the scale.py script, this time omitting the --dry-run option.

$ python3 ./scripts/scale.py -p <provisioner output file> -i <yamlfile1,yamlfile2,...>


In addition to the resource summary, the scale.py script without the --dry-run option produces the following output:

The resource summary as in the dry run
A yaml file that contains the scaling changes to apply to your configuration

On-screen information about the scaling changes and the location of the generated yaml file.

INFO - Current configuration, zookeeper: 3, bookkeeper: 6, segmentstore: 4, controller: 4
INFO - Proposed configuration, zookeeper: 3, bookkeeper: 7, segmentstore: 5, controller: 5
INFO - {'zookeeper_status': 'None', 'bookkeeper_status': 'None', 'segment_store_status': 'None', 'controller_status': 'None'}
INFO - Run decks-install with file /Users/mydir/Downloads/decks-installer-1/scaling_values_04-19-2020_06-17-24_PM.yaml added at the end of your values file list

6. Run decks-install apply, adding the generated file from the last step onto the end of your standard list of yaml filenames.

For example:

./decks-install-darwin-amd64 apply -k ./manifests/ --repo ./charts/ \
-f ./my-values1.yaml,./my-values2.yaml,./my-values3.yaml,/Users/mydir/Downloads/decks-installer-1/scaling_values_04-19-2020_06-17-24_PM.yaml

This command applies the configurations that trigger scaling actions.

7. Monitor the configuration changes on the UI.

The changes take time. Give Kubernetes time to cycle through the synchronizations and settle. To monitor the changes, go to Settings > Pravega cluster.

Scale Apache Flink resources

You can scale the resources available for processing Apache Flink jobs.

Prerequisites

This procedure assumes the following:
The underlying infrastructure was expanded by adding additional hosts.
The underlying SDP cluster was sized and scaled up, creating additional worker nodes.

About this task

Apache Flink supports changing the parallelism of a streaming job. It provides this support by restoring the job from a savepoint using a different parallelism. It supports changing the parallelism for the entire job and the operator parallelism.

You can change any of the following attributes. You can change these attributes while jobs are running.


The number of Task Managers (replica count) for a Flink cluster
The default parallelism for a Flink application
The parallelism specification for a Flink application

Steps

1. To change the number of Task Managers (replica count) for a Flink cluster:

a. Log in to the SDP UI as an admin or project member.
b. Click Analytics > project-name > Flink Clusters.
c. Locate the cluster name and click Edit in the Action column.
d. In the Task Managers section, change the Number of Replicas.
e. Click Save.

The Flink operator scales the Task Managers to the requested number.

2. To update the default parallelism for an application:

NOTE: Scaling applications interrupts service.

a. Log in to the SDP UI as an admin or project member.
b. Click Analytics > project-name > Apps.
c. Locate the application name and click Edit in the Action column.
d. Click Properties.
e. In the Configuration section, change the Parallelism field.
f. Click Save.

NOTE: An application does not necessarily use the changed default parallelism. Usage depends completely on how the user application is developed. A suggestion is that the application accepts a parameter that defines the parallelism for a particular step. Changing the parameter would force the application to redeploy.

After changing any properties, Flink automatically does the following:
Stops affected applications. If required, uses a Savepoint.
Redeploys the affected applications using the new values. If a Savepoint was used, redeploys from the Savepoint.

3. To update the parallelism defined in an application specification (an uploaded artifact):

a. Edit the application artifact.
b. Log in to the SDP UI as an admin or project member.
c. Click Analytics > project-name > Apps.
d. Locate the application name and click Edit in the Action column.
e. Click Properties.
f. In the Configuration section, upload the updated specification.

After uploading the new specification, Flink automatically does the following:
Stops affected applications. If required, uses a Savepoint.
Redeploys the affected applications, using the new specification. If a Savepoint was used, redeploys from the Savepoint.

Impact of cluster expansion and scaling

Scaling Pravega

Scaling Pravega segment stores results in a rebalance of segment containers. This rebalance moves some segment containers to segment stores on new hosts.

The impact on ingest is limited to the client timeout value. The timeout occurs only for connections to segment stores that are handling a stream whose containers were moved to a new segment store. The retry after a timeout succeeds, because the client would reconnect to the new segment store.


Scaling Analytics

Scaling Pravega stores has some impact on readers. For example, cache on a new segment store does not have any entries from the previous segment store. Tail readers become historical readers (for a short time).

To leverage newly added resources, you may change the number of replicas in a Flink cluster or the application parallelism. These updates cause applications running in the affected Flink cluster to restart.


Manage Projects

Administrators create, delete, and configure analytic projects.

Topics:

Top-level navigation in the UI
Naming requirements
Manage projects
Manage scopes and streams

Top-level navigation in the UI

The SDP UI banner contains icons for browsing the major portions of the interface.

Table 16. Summary of top-level navigation in the UI

Icon Description

Dashboard (for admins only): This dashboard also shows:
Recent stream activity. Unusual fluctuations or nonactivity may indicate a problem to investigate.
A Storage dashboard.
The Pravega metrics link.

The Pravega metrics link opens the Grafana UI with the Pravega Monitoring plug-in. The graphs in this plug-in provide more detail, for longer time spans, than the summary dashboard on the SDP. The plug-in shows metrics about platform activity, storage, and alerts. See Use Pravega Grafana Dashboards on page 120.

Analytics: The Analytics icon shows the Project page. This page lists the projects that your user account has permission to access. From this page, you can:
View summary information about your analytic projects.
If you are an administrator, create projects.
Jump to a project page. From there, you can create Spark, Flink, and Pravega Search clusters and, if the project was created with Metrics enabled, jump to the Grafana associated with the project.

See Manage projects on page 104.

Pravega: The Pravega icon shows the Scopes page. This page lists the scopes that your login account has permission to access. From this page, you can:
View summary information about scopes.
Drill into a scope page, and from there, into streams.
Create streams.
Create a schema group for a stream.
If you are an administrator, manage Cross Project Scope Sharing.

See Manage scopes and streams on page 110.

System: The System icon shows the following tabs:
Components: This tab lists all software components in the platform, their state, namespace, K8s resources, and version numbers.
Events: This tab lists events. The page includes filtering, search, and acknowledgment features. See Monitor and manage events on page 116.

12

102 Manage Projects

Table 16. Summary of top-level navigation in the UI (continued)

Icon Description

Storage ViewThis tab shows information about configured storage.

Settings The Settings icon shows the following tabs: LicenseShows information and status for all licenses:

Platform license Spark license Flink license Pravega Search license

SRS GatewayShows information about the SRS Gateway license and status of SRS telemetry.

Pravega ClusterShows Pravega segmentstore, bookkeeper, controller, and zookeeper status.

User The User icon shows a drop-down menu with these options: The username that you used to log in to the current session. (This item is not clickable.) Edit AccountOpens the Keycloak UI (for testing situations only). See Change password

in Keycloak on page 93.

NOTE: When LDAP federation is configured, users manage their accounts in LDAP.

Product SupportOpens the SDP Documentation Hub. From there, you can access: The product documentation. The Product Support page, where you can download product code and get help. The SDP Code Hub, where you can download sample applications, code templates,

Pravega connectors, and view demos. Logout.

Naming requirements

These requirements apply to user names and resource names in SDP.

User name and project name requirements

User names and project names must conform to the following Pravega naming conventions:

- The characters allowed in project names and user names are: digits ( 0-9 ), lower case letters ( a-z ), and hyphen ( - ).
- The names must start and end with an alphanumeric character (not a hyphen).
- Project names are limited to 15 characters in the UI.
- For user names, the first two points apply to any user name that will become a project member or admin, regardless of the registry in which they are defined (Keycloak, LDAP database, or other user database).

Other resource names

All other resource names must conform to Kubernetes naming conventions:

The characters allowed in names are: digits ( 0-9 ), lower case letters ( a-z ), hyphen ( - ), and period ( . ).

The UI enforces character limitations on some resource names.
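The naming rules for project and user names can also be checked from a shell before you create the resource. The following is a minimal sketch only; the variable and the script are hypothetical helpers that restate the rules above (including the 15-character UI limit for project names), not an SDP tool.

name="my-project-01"
# Lowercase letters, digits, and hyphens only; must start and end with an alphanumeric character
if [[ ${#name} -le 15 && "$name" =~ ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ ]]; then
  echo "valid name: $name"
else
  echo "invalid name: $name"
fi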


Manage projects

This section defines the SDP concept of projects and describes administrative tasks for managing them.

About projects

Projects are important resources in SDP.

All analytic processing capabilities are contained within projects. Projects provide support for multiple teams working on the same platform, while isolating each team's resources from the others. Project members can collaborate in a secure way. SDP manages resources for each project (each team) separately.

An SDP project is a Kubernetes custom resource of kind Project. The Project resource is a Kubernetes namespace enhanced with the following resources and services:

Resources and services and their functions:

- Maven repository: Stores artifacts for analytic applications in the project.
- Zookeeper cluster: Supports fault tolerant clusters for analytic processing.
- Project storage: A persistent volume claim (PVC) for analytic applications.
- Pravega credentials: Allows analytic jobs to communicate with Pravega.
- Pravega scope: Represents a top level construct for grouping all the project streams. The Pravega credentials are configured to have access to this scope.
- Project metric stack: If metrics is enabled for the project, InfluxDB and Grafana dashboards are created.

Developers and data analysts work within projects. Each project has its own Maven repo, its own set of cluster resources for analytic processing, its own scope and streams, and its own set of project members. Only project members (and platform administrators) are authorized to view the assets in a project, access the streams, upload application artifacts, and run analytic jobs. Project isolation is one way that SDP implements data protection and isolation of duties.
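Because each project is a Kubernetes namespace, an administrator with kubectl access can also inspect these per-project resources directly. A minimal sketch, assuming a project named my-project (a placeholder):

$> kubectl get pods -n my-project        # Zookeeper, Maven repository, and other project services
$> kubectl get pvc -n my-project         # project storage and Maven repository volumes
$> kubectl get Project -n my-project     # the Project custom resource itself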

A project must be created by an SDP administrator. The administrator can use either of the following methods to create a project:

- SDP UI: This is the quickest and most convenient method.
- Kubernetes commands and a resource file: Use this method if the default configurations employed by the UI do not satisfy the project team's needs.

Create a project

Create a project on the SDP UI.

Steps

1. Log in to the UI as an admin.

2. Click the Analytics icon.

The Analytic Projects table appears.

3. Click Create Project at the top of the table.

4. In the Name field, type a name that conforms to Kubernetes naming conventions.

The project name is used for the following:
- Project name in SDP UI
- The Kubernetes namespace for the project
- A local Maven repository for hosting artifacts for applications defined in the project
- The project-specific Pravega scope
- Security configurations and settings that allow all analytic applications in the project to have access to all the Pravega streams in the project's scope

5. In the Description field, optionally provide a short phrase to help identify the project.


6. Provide storage attributes for the project.

The fields that appear are different depending on the type of long-term storage that is configured for SDP.

The following fields appear for NFS (PowerScale) long-term storage:
- Storage Volume Size: Provide the size of the persistent volume claim (PVC) to create for the project. This value is the anticipated space requirement for storing all the streams in the project. SDP provisions this space in the configured PowerScale file system or node disks.
- Maven Volume Size: Provide the size of the PVC to create for the Maven repository for the project. This value is the anticipated space requirement for storing application artifacts for the project. SDP provisions this space in the configured default storage class.

The following fields appear for ECS long-term storage:
- Bucket Plan: Choose the plan for provisioning the S3 bucket for the project. Plans are defined in the configuration values file. There is always a default plan. The system provisions the bucket in the configured ECS namespace. You can view the project bucket name on the project page in the UI.
- Maven Volume Size: Provide the size of the PVC to create for the Maven repository for the project. This value is the anticipated space requirement for storing application artifacts for the project. SDP provisions this space in the configured default storage class.

7. Under Metrics, choose whether to enable or disable project-level analytic metrics collection.

The option is enabled by default. Data duration policy is set to two weeks.

For more information about Metrics, see the Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/content/docu103272.

8. Click Save. The new project appears in the Analytic Projects table in a Deploying state. It may take a few minutes for the system to create the underlying resources for the project and change the state to Ready.

Create a project manually

Use this task to create a project on the command line. With this method, you can alter more of the configuration settings.

About this task

An SDP project is a Kubernetes namespace with a single Project resource in it. Project is a custom Kubernetes resource.

In this task, you first create a namespace and then add the resource of kind Project to that namespace. The Project resource triggers deployment of project-related artifacts and services, such as zookeeper, maven, and so on.

NOTE: The following rules are important:

The names of the namespace and the Project resource must match.

Only one Project resource can exist in a namespace.

Steps

1. On the command line, log in to the SDP Kubernetes cluster as an administrator (admin role).


2. Create a Kubernetes namespace, using the project name as namespace name.

$> kubectl create namespace <project-name>

Where <project-name> conforms to the Kubernetes naming conventions.

3. Create a yaml file that defines a new resource of kind Project.

a. Copy the following resource file as a template.

apiVersion: nautilus.dellemc.com/v1alpha1
kind: Project
metadata:
  name:
spec:
  maven:
    persistentVolumeClaim:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ""

  #for PowerScale long-term storage
  storage:
    persistentVolumeClaim:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      storageClassName: nfs

  #for ECS long-term storage
  storage:
    plan: default
    parameters:
      reclaim-policy: Delete

  zookeeper:
    size: 3
  metrics:
    enabled: true

b. Edit the following values in the yaml file.

Label Description

metadata.name:
The name that you assigned to the namespace. This name and the namespace name must match.

spec.maven.persistentVolumeClaim.resources.requests.storage:
The size of PVC storage for the Maven repository. This repository holds artifacts for the project. The UI uses a default of 10Gi. Increase this value if a large number of artifacts are expected for the project.

spec.maven.persistentVolumeClaim.storageClassName:
The PVC storage class name created during infrastructure setup for maven. You may leave this setting blank.

storage (for PowerScale long-term storage):
Defines long-term storage. resources.requests.storage is the size of PVC storage for shared storage between all clusters in the project. This space stores all checkpoints and savepoints. Consider the expected state size. If you create a Project on the UI, the default value is 10Gi.

storage (for ECS long-term storage):
Defines long-term storage. plan is optional. If not provided, the default plan is used. parameters is optional.

zookeeper.size:
The number of nodes in the Zookeeper cluster. The Flink clusters use Zookeeper to provide high availability. Under typical conditions, a setting of 3 is sufficient.

metrics.enabled: true
Enables the creation of project-specific metrics stacks.

c. Check that the syntax is valid yaml and save the file.
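One way to check the file before applying it is a client-side dry run. This is generic kubectl behavior, not an SDP-specific step; the file name <file-name>.yaml is a placeholder:

$> kubectl create --dry-run=client -f <file-name>.yaml -n <project-name>

On older kubectl releases, use --dry-run instead of --dry-run=client. The command parses and validates the manifest without creating anything on the cluster.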

4. Apply the resource file.

$> kubectl create -n <project-name> -f <file-name>.yaml

5. Check the project for readiness.

$> kubectl get Project -n <project-name>

The response indicates the status of the resource. The Status:Ready: flag changes to true when the project resource is ready for use. It may take several minutes for the framework to prepare the supporting infrastructure for the project.
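If you prefer not to rerun the command manually, the standard kubectl watch flag reports changes to the resource as they occur. This is generic Kubernetes behavior, shown here as a sketch:

$> kubectl get Project -n <project-name> -w        # press Ctrl+C when the status shows Ready
$> kubectl get Project -n <project-name> -o yaml   # inspect the full status block of the resource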

Delete a project

When an administrator deletes a project, SDP cleans up all resources associated with the project.

Steps

1. Log in to the UI as an admin.

2. Click the Analytics icon.

The Analytic Projects table appears, listing all projects.

3. Click Delete in the project row.

SDP deletes all resources in the project namespace, including:
- Flink applications and clusters
- Spark applications
- PSearch queries and the PSearch cluster
- The project's Maven repository
- The project's analytic metrics stack
- The project's scope and all streams in it

The actual stream data might be archived on the storage media, depending on configuration settings. Regardless, the data is not accessible by SDP after the project is deleted.
- For ECS, archival depends on the Reclaim Policy setting for the bucket plan, which is established when the project is created. If the reclaim policy is Detach, the project bucket is left intact even though SDP cannot access it. Plans are set up in the configuration values file. The default used is Detach. (The data is archived.)


- For NFS, the underlying Kubernetes PVC is deleted. Archival depends on the nfs-client-provisioner.storageClass.archiveOnDelete: setting in the configuration values file. The default used is true. (The data is archived.)

For more information about archive settings, see Provision long-term storage on PowerScale on page 49 and Provision long-term storage on ECS on page 49.

Add or remove project members

Project members have permission to create, modify, or view the streams and applications within the project.

Prerequisites

The username to add to a project must exist as an SDP username. This task does not create user accounts.

About this task

NOTE: Never add admin users as project members. Admin users can always access all projects.

Steps

1. Log in to the SDP UI as an admin.

2. Go to Analytics > project-name > Members.

A table of existing project members appears, with a textbox for entering a new username in the table header.

3. To add a user to the project, type an existing SDP username in the Username textbox, and click Add Member.

NOTE: Do not add admin users as project members. They have full access to all projects.

The username appears in the refreshed table of members.

4. To remove a member, locate the member name in the table, and click Remove in that row.

List projects and view project contents

Administrators can view summary information about all projects. Other users can view information only about the projects of which they are members.

Steps

1. Click the Analytics icon.

The project table lists the projects that your user credentials give you permission to view.

2. Click a project name to drill into that project.

The page header includes identifying information about the project:
- Creation Date
- Description
- Storage information:
  - For ECS long-term storage, the system-generated bucket name for the project and the access credential are available.
  - For PowerScale long-term storage, the project volume size is displayed.


For more information about long-term storage configuration, go to System > Storage.

The remainder of the page is a dashboard showing:
- Number of Flink clusters and applications defined in the project
- Number of Spark applications defined in the project
- Number of application artifacts uploaded (for admins only)
- Number of members that are assigned to the project
- Number of streams in the project
- Number of PSearch clusters defined, number of streams that are configured as searchable, and number of continuous queries registered

Under the icons, a Messages section shows Kubernetes event messages pertaining to the project, if any are available.

NOTE: Kubernetes events are short-lived. They only remain for one hour.

3. To drill further into aspects of the project, click the tabs along the top of the dashboard:

Tab Description

Artifacts: Manage project artifacts. There are two types of artifacts:
- Maven: Upload, download, and delete artifacts in the project's Maven repository.
- Files: Upload, download, and delete files directly.

Members: Manage project members. Administrators can view current members, add members, and delete members. This tab does not appear for non-admin users.

Flink: View, create, and manage Flink clusters and Flink applications.

Spark: View, create, and manage Spark applications.

Pravega Search: View, create, and manage PSearch clusters. Make streams searchable. Register continuous queries. This screen also includes a link to the Pravega Search Kibana integration.

4. To view operational metrics about the project, click the Metrics link in the header.


What's next with projects

After project creation, users have various interests and responsibilities towards projects depending on their roles.
- Administrators maintain the project's member list. A project team usually consists of developers and data analysts. Platform administrators have access to all projects by default.
- Administrators should also monitor resources associated with application processing and stream storage. They may also monitor stream ingestion and scaling.
- Developers typically create Flink clusters and Spark applications, upload their application artifacts, create the required streams associated with the application, and run and monitor applications.
- Data analysts may run and monitor applications. They may also need to monitor or analyze metrics for the project.
- Developers may create PSearch queries.
- Administrators and project members may use the metrics in the project's Grafana UI for analysis and troubleshooting.

More information

The Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/content/docu103272 describes how to add Flink and Spark applications to SDP, associate streams with applications, and start, stop, restart, and monitor applications. It also describes how to use the embedded Flink and Spark UIs.

Monitor Health on page 114 and Use Pravega Grafana Dashboards on page 120 describe administrative tasks for ensuring that adequate storage and processing resources are available to handle stream volume and analytic jobs.

Manage scopes and streams

This section defines Pravega scopes and streams and describes administrative tasks for managing them.

About scopes and streams

Scopes and streams are Pravega constructs that exist within the context of SDP projects.

Pravega streams
A Pravega stream is an unbounded stream of bytes or stream of events. Pravega writer REST APIs write the streaming data to the Pravega store. Before that can happen, the stream must be created and configured.

Administrators or project members can create and configure streams in the SDP UI. Applications may also create new streams or be associated with existing streams.

A stream must be created within a scope.

Pravega scopes
A Pravega scope provides a name for a collection of streams. The full name for a stream is scope-name/stream-name.

In SDP, each Analytic project has its own Pravega scope. The scope name is the same as the project name. SDP creates the scope for the project when a project is created, and the new scope appears in the list of scopes on the Pravega page of the UI.

The new scope is registered as a protected resource in Keycloak. Also, a project-wide service account is created with authorization to access this scope and all the streams under it.

Access to scopes and streams

Project members and applications running in a project have READ and UPDATE access to the streams in their project's scope.

The cross project scope sharing feature provides applications with READ access to scopes in other projects. In that way, applications in one project can read stream data from other projects. A project scope can be shared in a read only mode with many other projects.

Administrators manage cross project scope sharing on the UI, granting and removing access.

Pravega schema registry

The Pravega schema registry manages schemas, policies and codecs associated with Pravega streams. The registry supports schema groups that let you evolve schemas as streams are enhanced or changed over time. All schemas in the group are associated with the stream and Pravega can apply them as appropriate. For more about using the schema registry, see the Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/content/docu103272.


Create and manage streams

Use this procedure to define a new stream.

Steps

1. Log in to the SDP UI as admin or a project member.

2. Click the Pravega icon in the banner.

A list of scopes appears. Scope names are the same as project names.

3. Click a scope.

4. Click Create Stream.

5. Complete the configuration screen as described in Stream configuration attributes on page 111.

Red boxes indicate errors.

6. When there are no red boxes, click Save.

The new stream appears in the Pravega Streams table. The new entry includes the following available actions:
- Edit: Change the stream configuration.
- Delete: Delete the stream from the scope (project).

Stream configuration attributes

The following tables describe stream configuration attributes, including segment scaling attributes and retention policy.

General

Property Description

Name Identifies the stream. The name must be unique within the scope and conform to Kubernetes naming conventions. The stream's identity is:

scopename/streamname

Scope The scope field is preset based on the scope you selected on the previous screen and cannot be changed.

Segment Scaling

A stream is divided into segments for processing efficiency. Segment scaling controls the number of segments used to process a stream.

There are two scaling types:
- Dynamic: With dynamic scaling, the system determines when to split and merge segments for optimal performance. Choose dynamic scaling if you expect the incoming data flow to vary significantly over time. This option lets the system automatically create additional segments when data flow increases and decrease the number of segments when the data flow slows down.
- Static: With static scaling, the number of segments is always the configured value. Choose static scaling if you expect a uniform incoming data flow.

You can edit any of the segment scaling attributes at any time. It takes some minutes for changes to affect segment processing. Scaling is based on recent averages over various time spans, with cool down periods built in.

Dynamic scaling attributes

Trigger
Choose one of the following as the trigger for scaling action:
- Incoming Data Rate: Looks at incoming bytes to determine when segments need splitting or merging.
- Incoming Event Rate: Looks at incoming events to determine when segments need splitting or merging.

Minimum number of segments
The minimum number of segments to maintain for the stream.

Segment Target Rate
Sets a target processing rate for each segment in the stream.
- When the incoming rate for a segment consistently exceeds the specified target, the segment is considered hot, and it is split into multiple segments.
- When the incoming rate for a segment is consistently lower than the specified target, the segment is considered cold, and it is merged with its neighbor.
Specify the rate as an integer. The unit of measure is determined by the trigger choice:
- KB/sec when Trigger is Incoming Data Rate. The default value in the UI is set to 5120 KB/s. You can refine your target rate after performance testing.
- Events/sec when Trigger is Incoming Event Rate. Settings would depend on the size of your events, calculated with the MB/sec guidelines above in mind.
To figure out an optimal segment target rate (either MB/sec or events/sec), consider the needs of the Pravega writer and reader applications:
- For writers, you can start with a setting and watch latency metrics to make adjustments.
- For readers, consider how fast an individual reader thread can process the events in a single stream. If individual readers are slow and you need many of them to work concurrently, you want enough segments so that each reader can own a segment. In this case, you need to lower the segment target rate, basing it on the reader rate, and not on the capability of Pravega.
Be aware that the actual rate in a segment may exceed the target rate by 50% in the worst case.

Scaling Factor
Specifies how many colder segments to create when splitting a hot segment. The scaling factor should be 2 in nearly all cases. The only exception would be if the event rate can increase 4 times or more in 10 minutes. In that case, a scaling factor of 4 might work better. A value higher than 2 should only be entered after performance testing shows problems.

Static scaling attributes

Number of segments
Sets the number of segments for the stream. The number of segments used for processing the stream will not change over time, unless you edit this attribute. The value can be increased and decreased at any time. We recommend starting with 1 segment and increasing only when the segment write rate is too high.
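The following worked example illustrates the reader-based sizing described above for the Segment Target Rate. All numbers are illustrative assumptions, not recommendations:

Expected ingest rate for the stream        50 MB/s (assumed)
Rate one reader thread can process         5 MB/s (assumed)
Segments needed so each reader owns one    50 / 5 = 10
Segment Target Rate                        about 5 MB/s, roughly 5120 KB/s (the UI default)
Worst-case rate in a segment               up to 1.5 x the target, about 7.5 MB/s

In this example, the target rate is based on the reader rate rather than on what a single Pravega segment could sustain, and the readers are given headroom for the 50% worst-case overshoot.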

Retention Policy

The toggle button at the beginning of the Retention Policy section turns retention policy On or Off. It is Off by default.

- Off (Default): The system retains stream data indefinitely.
- On: The system discards data from the stream automatically, based on either time or size.

- Retention Time (attribute: Days): The number of days to retain data. Stream data older than Days is discarded.
- Retention Size (attribute: MBytes): The number of MBytes to retain. The remainder at the older end of the stream is discarded.

Manage cross project scope sharing

Administrators can grant a project read access to scopes in other projects.

Steps

1. Log in to the SDP UI as admin.


2. Click the Pravega icon in the banner.

A list of scopes appears. Scope names are the same as project names.

3. Click the scope for which you want to grant (or remove) read access to other projects.

4. Click Manage Cross Project Access.

5. To grant read access to other projects:

a. In Grant READ access on ..., choose one or more project names from the drop-down list. Your selections appear in the text box.

b. Click Save.

6. To remove read access to one or all previously granted projects:

a. Navigate back to the Manage Cross Project Access page.
b. In Grant READ access on ..., click the X next to a project name to remove it from the list.
c. Click Save.

Next steps

These administrative actions complete the setup for read only access for cross project scope sharing.

Project applications must be configured to talk to a particular stream or set of streams in a shared scope. For more information, see the Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/content/docu103272.

Start and stop stream ingestion

Stream ingestion is controlled by native Pravega applications or Flink applications.

The SDP UI creates and deletes scope and stream entities, and monitors various aspects of streams. The UI does not control stream ingestion.

Monitor stream ingestion

You can monitor performance of stream ingestion and storage statistics using the Pravega stream page in the SDP UI.

Steps

1. Log into the SDP UI as a project member or admin.

2. Go to Pravega > scope-name > stream-name.

This page shows:
- Ingestion rates
- General stream parameter settings
- Segment heat charts, which show segments that are hotter or colder than the current trigger rate for segment splits. For streams with a fixed scaling policy, the colors on the heat chart can indicate ingestion rates. The redder the segment is, the higher the ingestion rate.

The heat chart provides visualization of the data flow based on the routing key.


Monitor Health

Topics:

Monitor licensing
Temporarily disable SRS connectivity or telemetry uploads
Monitor and manage events
Run health-check
Monitor Pravega health
Monitor stream health
Monitor Apache Flink clusters and applications
Monitor Pravega Search resources and health
Logging

Monitor licensing

The SDP User Interface shows licensing status.

To view the status of your SDP licenses, log in to the UI and navigate to Settings > License.

Figure 8. Licensing information in the UI

NOTE: If you installed the product with an evaluation license, no licenses are listed.

The following table describes information on the License screen.

Header section:
- Entitlement SWID: The SDP product Software ID.
- Instance SWID: The Secure Remote Services (SRS) activation Software ID.

Body section:
- Name: Two types of licenses are tracked within the SDP product license:
  - Streaming Flink Cores: Tracks the number of virtual CPUs dedicated to the Flink analytic engine.
  - Streaming Platform Cores: Tracks the number of virtual CPUs dedicated to other processing within the platform.
- Type: Licenses for SDP are subscription licenses.
- Start Date: Shows the date when the license was obtained.
- End Date: Shows the date when the subscription ends. On this date, you begin to receive warning events about an expired license. Contact Dell EMC to renew a subscription.
- Grace Period: Shows the date when the grace period ends. On this date, you begin to see only critical events collected on the events screen. The product does not shut down. Dell EMC contacts you about subscription renewal.
- Quantity: Shows the number of cores in your subscription.
- Usage: Tracks usage of the cores in each category. In the Flink Cores category, usage thresholds may apply. If your usage rises above the threshold, you may be required to increase the number of cores in the subscription.

NOTE: The product does not shut down because of an expired subscription. However, if you upload an expired license or alter the license file, the signature is invalidated and your product is no longer licensed.

NOTE: Be careful when performing network transfers of the license file with tools such as FTP. To avoid any signature changes when using FTP, use the binary option.

Temporarily disable SRS connectivity or telemetry uploads

The SDP UI provides a way to temporarily disable SRS connectivity to Dell EMC or disable telemetry upload. These actions can be convenient for maintenance activities.

To use this procedure, telemetry uploads must have been enabled during installation.

These UI actions are temporary. If the decks-install apply command is run for any reason to change any configurations, the settings in the values.yaml files override these actions on the SDP UI.

To view or change SRS status, log in to the SDP UI and go to Settings > SRS Gateway.

Figure 9. SRS Gateway information in the UI

Field Description

FQDN/IP The fully qualified domain name or IP of the SRS Gateway

Port The port that is configured for communication with the SRS Gateway

Instance SWID The Software ID of the SRS license

Product The product name that is licensed for connection with SRS

Registered Shows whether the Dell EMC backend systems have registered this SRS

Test Dial Home Results Shows the results of the last dial home test

Test Dial Home Time Shows the time of the last dial home test

Telemetry Upload Enables or disables telemetry uploads to the SRS Gateway. Click Telemetry Upload and then click the Enable or Disable action that appears. When you click Enable, a legal agreement appears. You must then click Accept Agreement to enable telemetry uploads.


Actions:
- Test: Test the dial home feature. Dial home connects the SRS Gateway at the customer site with the Dell EMC support systems and allows support teams to connect remotely.
- Disable: Disable connectivity to the SRS Gateway. The events continue to queue and are delivered when the feature is enabled.
- Enable: Enable SRS Gateway connectivity.

Monitor and manage events

The SDP UI displays collected events and provides convenient features for managing events.

To show collected events, log in to the UI and go to System > Events. Events are messages that are collected from the Streaming Data Platform applications and their associated k8s resources.

Figure 10. Events list in the UI

Filtering messages by type

You can filter the events that appear in your view by type. Select a type in the Type dropdown box.

Acknowledging and managing events

You can mark an event as Acknowledged, which can help to separate events that you can safely ignore from events that need action. To acknowledge an event, click the Acknowledge button.

You can filter events by whether they are acknowledged by making a selection in the Acknowledged dropdown box.

By combining type and acknowledged filters, you can, for example, display only critical events that are unacknowledged.

Searching events by text strings

Use the Search Text text box to filter by a text search. The search operates on the Component, App Name, and Reason fields. Wildcards are not supported.

Run health-check

This script checks the state of various components in the SDP cluster. It may be run at any time after SDP is installed.

Steps

1. Navigate to the folder where you unzipped the decks-installer-<version>.zip file.

2. Run the script.

$ ./scripts/health-check.py


The output looks similar to the following:

Starting health check...
- Checking pod health
- Checking pod health for namespace : nautilus-system
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pod health for namespace : longevity-0
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pod health for namespace : catalog
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pod health for namespace : nautilus-pravega
- Checking pod/container state
- All pods/containers seem healthy
- Checking container restarts
- No containers have high restart counts
- Checking pravega cluster health
- Pravega-cluster state is healthy
- Check for failed helm deployments
- No failed helm deployments were detected
- Check Tier2
- Tier2 check succeeded

Monitor Pravega health

Dashboards in the SDP UI and the Pravega Monitoring Plugin in Grafana provide information about Pravega operations.

Monitor for the following issues concerning Pravega:

Network issues: Slow or stopped throughput without an obvious reason might indicate a network issue. You can monitor throughput on these dashboards:
- The first chart on the Dashboard screen on the SDP UI
- The Pravega Operation Dashboard on the Grafana UI

Adequate memory: The Pravega System Dashboard on the Grafana UI shows various memory-related metrics.

NOTE: Pravega InfluxDB only keeps data for a month. Data older than one month is not available.

Monitor stream health

Monitor streams for the following issues.

Hot streams: Segment heat charts show segments that are hotter or colder than the current trigger rate for segment splits. For streams with a fixed scaling policy, the colors on the heat chart can indicate ingestion rates. Red segments indicate high ingestion rates.

To view the segment heat chart, on the SDP UI, click Pravega > scope-name > stream-name and then View Stream.

Unusual activity: Monitor streams for unusual fluctuations on the following Grafana dashboards:
- The Pravega Stream Dashboard shows stream-specific metrics at a higher level than the heat charts.
- The Pravega Scope Dashboard lets you compare streams in the same scope.

Pravega storage: View metrics that identify problems with Pravega interacting with storage on the Grafana UI:
- Pravega Alerts Dashboard
- Pravega Operation Dashboard

Monitor Apache Flink clusters and applications

Monitor the status and details of Apache Flink applications running in SDP.

Application events

To monitor applications, in the SDP UI, click Analytics > project-name > Dashboard. Events for all applications in the project appear in the Messages section below the dashboard tiles. Events include scheduled, started, savepointed, stopped, and canceled.

Job status: Details about running Flink applications are available on the Apache Flink Web UI. SDP contains direct links to the Apache Flink Web UI in two locations:
- Click Analytics > project-name > Flink Clusters > cluster-name. The cluster name is a link to the Apache Flink Web UI, which opens in a new browser tab. It displays the Overview screen for the Flink cluster you clicked. From here, you can drill into status for all jobs and tasks.

Figure 11. Apache Flink Web UI

- Click Analytics > project-name > Apps > Flink > application-name. Each application name is a link to a Flink Web UI page that shows the running Flink Jobs in that application. These pages also appear in a new browser tab.

Flink cluster health

For projects with Metrics enabled, you can monitor Flink cluster health with the help of Grafana dashboards available in the project metrics stack.
1. In the SDP UI, on the project page, click the Metrics link in the header.
2. On the Grafana main page that appears, click the dashboard for the Flink cluster you want to investigate. There is a dashboard for each Flink cluster, each Spark application, and each Pravega Search cluster.


Monitor Pravega Search resources and health

Monitor the resource availability and other health metrics of a Pravega Search cluster.

Efficiency and resources

See the Dell EMC Streaming Data Platform Developer's Guide at https://dl.dell.com/content/docu103272 for details about checking on the health of the cluster, allocating resources for efficiency, and scaling the cluster.

Pravega Search cluster health

For projects with Metrics enabled, you can monitor Pravega Search cluster health with the help of Grafana dashboards available in the project metrics stack.
1. In the SDP UI, on the project page, click the Metrics link in the header.
2. On the Grafana main page that appears, click the dashboard for the Pravega Search cluster.

Logging

Administrators can access platform logs.

Pravega: Errors and warnings that are logged by Pravega are reported as metrics and available on the Pravega Alerts Dashboard in the Grafana UI.

Kubernetes logs: SDP generates all the standard logs in native Kubernetes. Users with cluster-admin role on the SDP cluster can access these logs using native Kubernetes commands.
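A minimal sketch of the native commands involved, using the nautilus-system namespace as an example; the pod name is a placeholder taken from the output of the first command:

$> kubectl get pods -n nautilus-system
$> kubectl logs <pod-name> -n nautilus-system
$> kubectl logs <pod-name> -n nautilus-system --previous    # logs from the prior container instance, if it restarted
$> kubectl describe pod <pod-name> -n nautilus-system       # pod state and recent events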


Use Pravega Grafana Dashboards

Topics:

Grafana dashboards overview
Connect to the Pravega Grafana UI
Retention policy and time range
Pravega System dashboard
Pravega Operation Dashboard
Pravega Scope dashboard
Pravega Stream dashboard
Pravega Segment Store Dashboard
Pravega Controller Dashboard
Pravega Alerts dashboard
Custom queries and dashboards
InfluxDB Data

Grafana dashboards overview

The Grafana dashboards show metrics about the operation and efficiency of Pravega.

The Streaming Data Platform installer deploys the following metrics stack in the same Kubernetes namespace (nautilus-pravega) with Pravega.

InfluxDB is an open-source database product. Grafana is an open-source metrics visualization tool. The Pravega Monitoring Plugin is a custom plugin to Grafana and is not open source.

The InfluxDB instance that is deployed in SDP contains a preconfigured pravega database. The database is defined with four retention policies and a set of continuous queries to move aggregated data from shorter retention policies to the longer ones.

Pravega reports metrics automatically and continuously into InfluxDB. SDP adds processes to continuously aggregate and delete the metrics according to the defined retention policies. The result is a self-managing database.

The Pravega Monitoring Plugin contains predefined dashboards that visualize the collected Pravega metrics. The predefined dashboards are:

Dashboard Description

Pravega Alerts: Monitors the health of Pravega in the cluster.

Pravega System Dashboard: Shows details about heap and non-heap memory, buffer memory, garbage collection memory, and threads.

Pravega Operation Dashboard: Shows various operational latencies and read/write throughputs.

Pravega Scope Dashboard: Shows scope total throughput rates, throughput by stream, and maximum per segment rates.

Pravega Stream Dashboard: Shows stream-specific throughput, segment metrics, and transaction metrics.

Pravega Segment Store Dashboard: Shows segment store operational metrics.

Pravega Controller Dashboard: Shows operational metrics for the Pravega Controller.

You may create additional customized dashboards using any of the metrics that are stored in InfluxDB.

Some of the Pravega metrics are shown in the SDP UI, on the main Dashboard and the Pravega Stream pages. Administrators can inspect the reported data in more detail on the Grafana dashboards. Administrators can identify developing storage and memory problems by monitoring the dashboards. The dashboards also help identify stream-related inefficiencies and provide a way to drill into problems.

The dashboards are available only to users with admin role.

Connect to the Pravega Grafana UI

The Grafana dashboards are available to SDP users with admin role.

Steps

1. Choose one of the following ways to access the Grafana dashboards:
- If you are already logged in to the SDP UI as an admin, click the Dashboard icon and then click Pravega Metrics on the left in the banner.

NOTE: The link appears only for admin users.

- Use the Grafana endpoint URL in your browser. See Obtain connection URLs on page 74. On the login screen that appears, enter your SDP admin credentials.

The Grafana UI appears.

2. In the Tools strip on the left, click Dashboards > Manage. A list of predefined dashboards appears. The dashboard names are links.

3. Click a dashboard name to display that dashboard.

4. Most dashboards have controls, in the form of dropdown menus, that let you fine-tune the data to display.

For example, some dashboards have a Retention control that lets you choose the retention policy from which to pull the data.


Retention policy and time range

On the Pravega dashboards, the time range and retention policy settings work together to define the data that is displayed.

Time range

The time range control is a standard Grafana feature. In any dashboard banner, click the clock icon on the right side of the banner to display the time range choices. Click a range to select it, or define your own absolute range.

Figure 12. Time range on Grafana dashboards

Retention

The retention control is specific to SDP. It selects the aggregation level of the data to display. The following table shows the internally defined retention policies and associated aggregation levels.

two_hour
Aggregation level: Original metrics reported by Pravega every 10 seconds. The original 10-second metrics are deleted after 2 hours.
Use with time ranges that are between 10 seconds and 2 hours. If you want to examine metrics older than 2 hours, use one of the other retention choices.

one_day
Aggregation level: 1-minute periods, aggregated from the 10-second metrics. The 1-minute aggregated metrics are deleted after 1 day.
Use with time ranges that are between 1 minute and 1 day.

one_week
Aggregation level: 30-minute periods, aggregated from the 1-minute metrics. The 30-minute metrics are deleted after 1 week.
Use with time ranges that are between 30 minutes and 1 week.

one_month
Aggregation level: 3-hour periods, aggregated from the 30-minute metrics. The 3-hour aggregated metrics are deleted after 1 month.
Use with time ranges that are between 3 hours and 1 month.

Interactions between time range and retention

Some time range and retention combinations may not show any data. If the time range specified is less than the aggregation period in the retention choice, the combination results in no data. As examples:

- The two_hour retention choice shows data that exists in the database for a maximum of two hours. A time range of Last 12 hours can only show data for the last two hours.
- The one_week retention choice shows data in 30-minute periods. A time range of Last 5 minutes does not show any data. Any range of 30 minutes or less will not show any data. A time range of Last month can only show data for the last week.
- The one_month retention choice shows data in 3-hour periods. A time range of Last hour does not show any data. Any range of 3 hours or less does not show any data. A time range of Last year can only show data for the last month.

Pravega System dashboard

The Pravega System Dashboard shows the JVM metrics for Pravega controllers and segment stores, one host container at a time.

Controls

host Choose the reporting container.

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 122 for more information.

Description

This dashboard has three sections that you can expand and contract using the dropdown arrows.

The Totals section shows the memory usage by the host JVM for heap and non-heap areas.

NOTE: Watch for Used or Committed memory approaching the Max memory. If this happens, you might need to tweak the Pravega deployment parameters. Either increase the memory per container or increase the number of the component replicas, as your K8s environment permits.


The GC section of the dashboard shows garbage collector metrics.

124 Use Pravega Grafana Dashboards

The Threads section shows thread counts and states.

Pravega Operation Dashboard

The Pravega Operation Dashboard shows the operational metrics for multiple components that are involved in Pravega operations, including segmentstores and the Pravega Bookkeeper client.

Controls

host Choose a specific segmentstore or choose All.

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 122 for more information.


Description

This dashboard has six sections that you can expand and contract with dropdown arrows.

The Current Latency Stats section shows the current values of different levels of Read/Write latencies. These values are color-coded and turn red if their value goes above 50 milliseconds.

NOTE: Monitoring for red values can help you catch problems.

The Throughput Rates section shows the total throughput rates for Tier 1 (Bookkeepers) and long-term storage. For more information about Pravega tiered storage, see the Pravega documentation. This section includes both the user-created streams and the system streams needed for Pravega operation.

The Segmentstore - Segment Latencies section reports Tier 1 read/write latencies.

The latency graphs show percentile groups, as follows:

Legend indicator Percentile

p0.1 10% percentile

p0.5 50% percentile

p0.9 90% percentile

p0.99 99% percentile

p0.999 99.9% percentile

p0.9999 99.99% percentile

The Segmentstore - Storage Latencies section shows Read/Write latencies about long-term storage.


NOTE: Monitoring these metrics can provide hints about communication problems with long-term storage.

The Segmentstore - Container Latencies section shows metrics for Pravega segment container entities (not to be confused with the Docker containers running the segment stores). The following metrics are included:

- Container Processors in Flight Distribution
- Container Operation Queue Size Distribution
- Container Batch Size Distribution
- Container Operation Commit Count Distribution
- Container Operation Processor Delay Latency (grouped by Throttler)
- Container Queue Wait Time Latency
- Container Operation Commit Latency
- Container Operation Commit Memory Latency
- Container Operation Latency

The Segmentstore - Bookkeeper section contains Bookkeeper client metrics. The native Bookkeeper metrics are not available here.

Pravega Scope dashboard

The Pravega Scope dashboard shows the total throughput rates and maximum per segment rates for user streams in a Pravega scope.

Controls

scope Choose the scope name that you want to see metrics for.

stream type Choose to show metrics for system streams, user-defined streams, or all streams.

retention The retention choice defines the aggregation level of the displayed data. The default retention is two_hours. It shows data in 10-second intervals.

Also choose a compatible time range.

See Retention policy and time range on page 122 for more information.

Description

This dashboard has three sections that you can expand and contract using the dropdown arrows:
- Write bytes
- Read bytes
- Write events

All three sections are organized in a similar way.

The panels on the left show individual throughput rates for each stream in the scope, plus a total for the scope.

NOTE: These charts show which streams have high load and which ones do not have any load.

The panels on the right show the write or read rate for the segment with the highest rate within the scope.

NOTE: If you see something alarming at the scope level, you can drill down into the problem on the Pravega Stream dashboard.


Pravega Stream dashboard

The Pravega Stream dashboard shows details about specific streams.

Controls

stream Choose a stream name within the selected scope. When the scope selection changes, the stream dropdown menu is repopulated with appropriate stream names.

stream type Choose to show metrics for system streams, user-defined streams, or all streams.

scope Choose a scope name.

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hour 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours


Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 122 for more information.

Description

This dashboard contains a row of metrics followed by five sections that you can expand and contract using the dropdown arrows.

The row of metrics at the top shows the latest values available for this stream in the chosen retention policy. For example, if you choose one_month retention, the values can be as old as three hours ago because the data points are aggregated only every three hours for that retention policy.

The Segments section shows the number of segments, segment splits, and segment merges over time.

NOTE: The Pravega controller reports these metrics. When no changes are happening, the controller does not report metrics, and this could be reflected in the charts if there are no metrics reported during the time period selected. You can always view the current metrics on the Stream page in the SDP UI. Those metrics are collected using the REST API rather than relying on reported metrics from the controller. Another advantage of the SDP UI's Stream page is the heat charts for stream segments. Those are not available in Grafana.

The following three sections appear next.

- Write Bytes
- Read Bytes
- Write Events

These sections are all organized in the same way. The panels on the left show totals for the stream. The panels on the right show maximum segment rates.

NOTE: Inspecting the maximum per segment rate is complementary to using the heat charts in the SDP UI.


The Transactions section appears last. This section contains data only if the stream performs transactional writes.

NOTE: In the left panel, monitor the number of aborted transactions. Too many aborted transactions could indicate a networking problem or a problem in the business logic of the Flink or Pravega application.


Pravega Segment Store Dashboard

Use this dashboard to view segment store activity.

Controls

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 122 for more information.

Description

This dashboard has the following sections that you can expand and contract with dropdown arrows.

The Segment Store Cache Metrics section provides insight into space usage in the segment store cache.

The Segment Store Storage Writer Metrics section shows the activity of segment stores moving data to long-term storage.

The Segment Store Table Segments Metrics section shows the performance and rate of requests that are related to table segments. (The Controller uses table segments to store Pravega metadata.)

The Segment Store Container Metrics section shows segment activity, such as segment counts, segment creation, deletion, and merges, and log file size.

The Segment Store SLTS Metrics section.


Pravega Controller Dashboard

Use this dashboard to view metrics about the Pravega Controller.

Controls

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 122 for more information.

Description

This dashboard has three sections that you can expand and contract with dropdown arrows.

The Controller Transaction Operations section shows counts for created transactions, commits, cancels, and open transactions.

The Controller Stream Operations section shows metrics for stream creation, stream seals, stream deletes, stream truncation, and stream retention.

The Controller Container Operations section shows metrics for segment containers within segment stores, including segment store instance failures.

Pravega Alerts dashboard

The Pravega Alerts Dashboard reports critical and warning conditions from various checks on Pravega health.

Controls

retention Choose a retention policy, which controls the aggregation periods of the displayed data.

NOTE: The retention control applies only to the Pravega Errors and Warnings charts. The Critical Alerts and Warning Alerts charts always use the "two_hours" retention policy.

Retention Aggregation period

two_hours 10 seconds

one_day 1 minute

one_week 30 minutes

one_month 3 hours

Be sure to choose a time range that is compatible with the retention choice. See Retention policy and time range on page 122 for more information.


Description

This dashboard has three sections that you can expand and contract with dropdown arrows. The Pravega Logged Errors and Warnings section shows summary metrics about numbers of logged errors and warnings. The Critical Alerts section shows counts for specific critical messages that were issued over the time range. The Warning Alerts section shows counts for specific warnings that were issued over the time range.

Obtain condition descriptions for the alert charts

In the Critical Alerts and Warning Alerts sections, each chart includes an information icon in its upper left corner. Hover your cursor over the icon to view a description of the condition that the chart is monitoring.

Custom queries and dashboards

You can create custom queries or new dashboards using any data in InfluxDB.

Queries

You can explore the Pravega metrics available in InfluxDB by creating ad hoc queries. This feature gives you a quick look at metrics without having to define an entire dashboard.

Click the Explore icon on the left panel of the Grafana UI. For datasource, choose pravega-influxdb.

Create your query against any measurement available in the database.

You cannot save the queries. Create a custom dashboard to save queries.

Custom Dashboards

You may create new, custom dashboards from any data available in the pravega-influxdb datasource. See the next section for an introduction to the metrics structure.

If you want to customize the predefined dashboards, Dell strongly recommends that you save the changes as custom dashboards rather than overwriting the original ones. You are logged in as a Grafana Editor, which enables you to edit and overwrite the dashboards.


NOTE: If you overwrite the original dashboards, your changes are lost if the Pravega Monitoring Plugin is updated in a subsequent SDP release.

NOTE: There is one exception to the previous note. Dell recommends, as part of the installation or upgrade processes, to save and overwrite the Pravega Alerts Dashboard (without any changes). This step is required to integrate the Pravega alerts (if any) into the SDP events collection.

InfluxDB Data

This section provides an overview of the metrics that are stored in InfluxDB.

Pravega metrics

Pravega metrics are stored in InfluxDB according to the naming conventions described in the MetricsNames.java file, with underscores ( _) replacing the periods (.). For example, segmentstore.segment.write_bytes is stored as segmentstore_segment_write_bytes. All metrics are tagged with their host, which is the Pravega pod reporting the metric. Some of the metrics are tagged with scope, stream, segment, or container (if applicable).

The original metrics from Pravega are prefixed with pravega_. Most of the metrics on the Grafana dashboards do not have that prefix because they represent an aggregation over the original Pravega metrics. For example, typical metrics in the dashboards are rates that are calculated on the originally reported counts.

For more information about Pravega metrics, see Pravega documentation.

Calculated rates

In addition to the original Pravega metrics, the database contains some precalculated rates to enable faster InfluxDB queries for certain inquiries.

Segment Read/Write rates are tagged with scope, stream, and segment. They are stored in the following measurements with the Rate field:

segmentstore_segment_read_bytes
segmentstore_segment_write_bytes
segmentstore_segment_write_events

Stream-level Read/Write rate aggregates are tagged with scope and stream and stored in the following:

segmentstore_stream_read_bytes
segmentstore_stream_write_bytes
segmentstore_stream_write_events

Global Read/Write rate aggregates over all segments, streams, and scopes are tagged with the segmentstore instance in the host tag. They are stored in the following:

segmentstore_global_read_bytes
segmentstore_global_write_bytes
segmentstore_global_write_events

Pravega long-term storage Read/Write rates are available as storage rates:

segmentstore_storage_read_bytes
segmentstore_storage_write_bytes

Bookkeeper client write rate is stored here:

segmentstore_bookkeeper_write_bytes


Transactional rates are available at the stream level. They are tagged with scope and stream. They are reported only if transactional writes are happening on the stream.

controller_transactions_aborted
controller_transactions_created
controller_transactions_committed

There are also two gauges for transactions:

controller_transactions_opened
controller_transactions_timedout


Troubleshooting

Topics:

View versions of system components
Kubernetes resources
Log files
Useful troubleshooting commands
FAQs
Application connections when TLS is enabled
Online and remote support

View versions of system components

The SDP UI shows version numbers for the installed software components in the platform.

About this task

For troubleshooting and maintenance tasks, it is useful to know the SDP version and versions of software components.

Steps

1. Log in to the SDP UI as an admin.

2. To view the SDP version, click System > Product.

3. To view versions of components, click System > Components.

The Version column shows the installed version for each service, broker, and other software components.

Kubernetes resources

This section describes the Kubernetes resources in an SDP cluster.

Namespaces

The SDP cluster contains the following namespaces.

catalog    Contains the service catalog for the cluster.

cluster-monitoring    Contains services that monitor Kubernetes cluster components and SDP nodes. Sends alerts through the KAHM monitoring service to Grafana dashboards.

nautilus-system    Contains SDP software.

nautilus-pravega    Contains the Pravega store and Pravega software.

Project-specific namespaces    Each user-created project has its own namespace. The namespace name is the project name.

Kubernetes-specific namespaces    There are many other namespaces specific to the Kubernetes environment.


Components in the nautilus-system namespace

The nautilus-system namespace contains components to support SDP functions.

Components in nautilus-system

Subsystem Name Description

Core SDP Operator

Cert Manager Provisions and manages TLS certificates.

External DNS Dynamically registers DNS names for platform services and ingress connections.

Metrics Operator Manages InfluxDB and Grafana metrics stack.

Nautilus UI Provides the web UI for managing the platform.

NFS Client Provisioner or ECS service broker    Provisions persistent volumes within the configured NFS server (NFS Client Provisioner), or provisions ECS storage buckets (ECS service broker), depending on the configured storage type.

NGINX Ingress Ingress controller and load balancer

Zookeeper Zookeeper-operator Manages the Pravega Zookeeper cluster and all the Zookeeper clusters for all projects.

Security Keycloak Provides identity and access management for applications and services.

Keycloak-webhook Injects Keycloak credentials into relevant pods.

Keycloak-postgresql Handles Keycloak roles and Keycloak clients.

Flink services Flink-operator Manages Flink clusters and Flink applications.

Project-operator Manages analytic projects.

Spark services spark-operator Runs the Spark engine in the cluster.

Serviceability DECKS Manages SRS registration, call-home, and licensing.

KAHM Provides event and health management services.

Monitoring Provides monitoring of resource usage.

PSearch psearch-operator Creates PSearch resources as needed.

CRs in nautilus-system

The nautilus-system namespace defines the following custom resources (CRs). Their operators are included in the list of components above.

ProjectSystem
ZookeeperCluster
Project
FlinkCluster
FlinkApplication
FlinkClusterImage
InfluxDB
Grafana
Telegraf
InfluxDBDatabase
GrafanaDashboard
GrafanaDashboardTemplate
SparkApplication
PravegaSearch
FlinkSavepoint
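To see which of these custom resource definitions are registered in your cluster, and to list instances of a given CR, you can use standard kubectl commands. This is only a sketch: the grep pattern and the plural resource name (flinkclusters) are illustrative and may differ by release, so verify them with kubectl api-resources.

# List registered custom resource definitions related to SDP (pattern is illustrative)
kubectl get crd | grep -iE 'flink|pravega|influxdb|grafana|project'

# List instances of a custom resource across all namespaces, for example FlinkCluster
kubectl get flinkclusters --all-namespaces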

Components in the nautilus-pravega namespace

The nautilus-pravega namespace contains components to support Pravega functions within the SDP platform.

Components in nautilus-pravega

Component name Description

pravega-operator Manages Pravega clusters

pravega-cluster Pravega software

pravega-service-broker Provisions Pravega scopes

bookkeeper-operator Manages the bookkeeper resource

bookkeeper-cluster Manages the bookies

schema-registry Manages the Pravega schema registry

schema registry pods

Pravega InfluxDB pod

Pravega Grafana pod

Pravega Grafana gatekeeper pod

Custom resources in nautilus-pravega

PravegaCluster
BookkeeperCluster

Components in project namespaces

Each analytic project has a dedicated Kubernetes namespace.

A project's namespace name is the project name. For example, a project that you create with the name test has a namespace name of test.

For additional information about project namespaces, see Manage projects on page 104.
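For a quick look at everything running in a project namespace, point kubectl at the namespace that matches the project name. The project name test below comes from the example above; substitute your own project name.

# List workloads and services in the namespace of a project named "test"
kubectl get all -n test

# List events in the same project namespace
kubectl get events -n test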

Components in cluster-monitoring namespace

The cluster-monitoring namespace supports SDP monitoring of node health and the Kubernetes cluster health.

Components in cluster-monitoring

Component name Description

cluster-monitoring    Monitors the following conditions:
K8s deployments missing replicas
K8s statefulset missing replicas
K8s daemons missing replicas
K8s nodes disk health
K8s nodes CPU/mem and file system resources

It sends alerts through KAHM to the SRS backend.

Components in the catalog namespace

The catalog manages service instances and service bindings. It manages instances for resources such as a Keycloak client, a Keycloak role, a Pravega scope, and so on. Service bindings provide access (including the URL and credentials) to those resources.

Components in catalog

Component name Description

service-catalog Manages service instances and service bindings.

CRs in catalog

clusterservicebrokers
clusterserviceclasses
clusterserviceplans
servicebindings
servicebrokers
serviceclasses
serviceinstances
serviceplans

Log files

This section describes how to obtain useful logs.

Get installation logs

To track installation progress, you can monitor the installation logs. From the installation folder, look at decks-install.log.
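A minimal way to watch the installation log as it grows, assuming you run it from the installation folder where the installer writes its log files (the file names match the ones described in the installer command reference):

# Follow the installer log while decks-install apply runs
tail -f decks-install.log

# The installer also writes errors to a separate file
tail -f decks-install.stderr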

Get pod logs for a namespace

List pod names:

kubectl get pods --all-namespaces

Get information about a pod:

kubectl describe pod <pod-name> -n <namespace>

For example:

kubectl describe pod keycloak-service-broker-797849c678-52pnl -n nautilus-system


Get the logs for a pod in a namespace:

kubectl logs <pod-name> -n <namespace>

Useful troubleshooting commands

This section introduces CLI commands that can help you get started with researching problems in an SDP deployment.

OpenShift client commands on page 140
helm commands on page 140
kubectl commands on page 140

OpenShift client commands

Use OpenShift client (oc) commands to manage the environment in which your SDP cluster exists.

oc --help | grep cluster

List commands related to cluster information.

oc cluster-info Get information about the cluster.

oc adm ... Tools for managing a cluster

oc run ... Run a specified image on the cluster.

helm commands

Use helm commands to manage the Kubernetes packages that are installed in your cluster. You can also check the current helm version number.

The following helm commands are useful when getting started with troubleshooting SDP deployments. For descriptions of all helm commands and their syntax, see https://helm.sh/docs/.

helm version Shows the client and server versions for Helm and Tiller.

helm ls --all Lists all the releases that are installed in the cluster.

helm ls --all --short

Generates an abbreviated output of the above command.

kubectl commands

Use kubectl commands to investigate Kubernetes resources in the cluster.

The following kubectl commands and flags are useful for troubleshooting SDP deployments. For descriptions of all kubectl commands and their syntax, see https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands.

Common flags

The following flags apply to many commands.

--all-namespaces    Applies a command, such as kubectl get, to all namespaces rather than to a named namespace.

-o yaml    Outputs a YAML-formatted API object.


Useful commands

kubectl config use-context <context-name>

Switch your current command-line context to the named cluster. Use this command when you are logged into two clusters in the same session.

kubectl cluster-info

Display addresses of the master and services with the label kubernetes.io/cluster-service=true.

kubectl api-resources ...

Print supported API resources. Some useful flags in this command are:

--verbs=[verbs] to limit the output to resources that support the specified verbs.

--namespaced={true | false} to include or exclude namespaced resources. If false, only non-namespaced resources are returned.

-o {wide | name} to indicate an output format.

The following example displays all resources that support the list verb. It includes namespaced resources and shows the output in the shortened name format.

kubectl api-resources --verbs=list --namespaced=true -o name

kubectl get ...

List resources of the specified resource type. Some useful kubectl get commands for SDP are:

kubectl get pods
kubectl get pods --all-namespaces
kubectl get services
kubectl get deployments
kubectl get deployment
kubectl get nodes
kubectl get events
kubectl get storageclass
kubectl get serviceaccounts

kubectl describe ...

Show details of a specific resource or group of resources.

kubectl logs ...

Display logs for a container in a pod or specified resource. If the pod has only one container, the container name is optional.

For example:

kubectl logs <pod-name> -n <namespace>

kubectl exec ...

Run a command in a container.

kubectl attach ...

Attach to a process running inside a container. For example, you might want to get output from the process.

kubectl run ...

Create a deployment or a job by running a specified image.


FAQs

These frequently asked questions (FAQs) include common installation conditions and operational observations.

My installation does not have all the components installed.

If you invoked the installer with the decks-install apply command, the decks-install sync command is safe to use to resume an existing installation. The command tries to install the remaining components. You can run the decks-install sync command more than once.
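If you still have the original installation artifacts on hand, resuming looks like the following. The manifest and chart paths are the same ones used for the original decks-install apply run; adjust them to your environment.

# Resume a partially completed installation
./decks-install sync --kustomize ./manifests/ --repo ./charts/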

Run decks-install status. Results should look similar to the following:

[core@service-startline rc3]$ ./decks-install-linux-amd64 status
APPLICATION                     STATE      VERSION                   MANAGED BY
bookkeeper-cluster              Succeeded  0.7.1-54-29d5ca9          nautilus
bookkeeper-operator             Succeeded  0.1.3-54-29d5ca9          nautilus
catalog                         Succeeded  0.3.1                     nautilus
cert-manager                    Succeeded  v0.15.2                   nautilus
cert-manager-resources          Succeeded  1.2-RC3-0-f4a0565c8       nautilus
cluster-scaling                 Succeeded  1.1.0-HF1.RC2-2-3d4a82c   nautilus
decks                           Succeeded  1.2.5                     nautilus
default-runtime-images          Succeeded  1.2-RC3-0-f4a0565c8       nautilus
dellemc-streamingdata-license   Succeeded  1.2.5                     nautilus
ecs-service-broker              Succeeded  1.2-RC3-0-f4a0565c8       nautilus
external-dns                    Succeeded  3.2.4                     nautilus
external-dns-resources          Succeeded  1.2-RC3-0-f4a0565c8       nautilus
flink-default-resources         Succeeded  1.2-RC3-0-f4a0565c8       nautilus
flink-operator                  Succeeded  1.2-RC1-3-050c9ff00       nautilus
kahm                            Succeeded  1.2.5                     nautilus
keycloak                        Succeeded  1.2-W25-1-fcc4562         nautilus
keycloak-injection-hook         Succeeded  1.2-W14-2-addc3b3         nautilus
keycloak-service-broker         Succeeded  1.2-W20-4-23bb482         nautilus
metrics-operator                Succeeded  1.2-RC1-1-70b9d2f         nautilus
monitoring                      Succeeded  1.2-W16-4-c899548         nautilus
nautilus-ui                     Succeeded  1.2-RC1-7-0cb1f6b0        nautilus
nginx-ingress                   Succeeded  1.40.3                    nautilus
openshift-sdp-resources         Succeeded  1.2-RC3-0-f4a0565c8       nautilus
pravega-cluster                 Succeeded  1.2-RC3-0-f4a0565c8       nautilus
pravega-operator                Succeeded  0.5.2-221-5dce68d3        nautilus
pravega-service-broker          Succeeded  1.2-RC2-2-28c8432         nautilus
project-operator                Succeeded  1.2-RC1-2-58878d7         nautilus
psearch-cluster                 Succeeded  latest                    nautilus
psearch-operator                Succeeded  1.2-W26-4-75f8e393        nautilus
psearch-resources               Succeeded  1.2-RC3-0-f4a0565c8       nautilus
schema-registry                 Succeeded  0.0.1-61-f1b6734          nautilus
sdp-operator                    Succeeded  1.2-W26-2-31ac657         nautilus
spark-operator                  Succeeded  1.2-RC1-4-b0500de         nautilus
srs-gateway                     Succeeded  1.2.5                     nautilus
zookeeper-cluster               Succeeded  1.2-RC3-0-f4a0565c8       nautilus
zookeeper-operator              Succeeded  0.2.9-153-e100c87         nautilus
[core@service-startline rc3]$

My uninstall failed. I still see components in helm list.

If the decks-install unapply command fails, try it again. If the rerun does not solve your problem, you can manually remove the charts using helm del --purge --no-hooks <chart-name>. Then retry the decks-install unapply command. The final decks-install unapply run is necessary in the uninstall process because it deregisters the custom resource definitions that are used with the product.
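A minimal sequence for the manual cleanup path, assuming Helm 2 with Tiller as used by this SDP release; the chart name is a placeholder:

# List the releases that are still installed
helm ls --all --short

# Remove a stuck release, then rerun the uninstall
helm del --purge --no-hooks <chart-name>
./decks-install unapply --kustomize ./manifests/ --repo ./charts/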

My pods are showing that they cannot pull the images.

For sites that do not have access to public image repositories and registries, the installer is delivered with a tar archive of Docker images. You can push those images to your local registry, where the pods can pull them successfully.


My pods are showing as running but the user interface displays errors.

The running status indicates the readiness of the pod. The status does not indicate anything about errors concerning the applications or services running within the pod. To research errors, check pod logs with the kubectl logs command. The logs include timestamps that you can correlate with the logs in other pods to piece together the chain of actions.

Is there a way to see events for a specific project or application?

Yes. All projects have their corresponding Kubernetes namespace. You can use kubectl get events --namespace <project-name> to get events only from the namespace that corresponds to a project. Also, the user interface lists system events and logs that correspond to a single application.
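For example, a time-ordered view of the events in a project namespace (the --sort-by flag is standard kubectl; the project name is a placeholder):

kubectl get events --namespace <project-name> --sort-by=.metadata.creationTimestamp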

My logs show intermittent failures but my pods are all healthy.

Check to see that your applications and services are reachable by both names and IP addresses. This check holds true for the ingress, proxy, load balancer, and all the pods. DNS records, TXT entries, and registrations must be accurate. The DNS provider may have listings of entries that were created during installation. Check with your system administrator or cloud services provider. You can also ping the pods and connect to the services from the containers to ensure that they are reachable within the cluster.
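A quick way to run these reachability checks from inside the cluster network is to reuse the dnstools image that is referenced later in this chapter. The service name, namespace, and pod IP below are placeholders.

# Start a throwaway pod with DNS and network tools
kubectl run -it --rm --image infoblox/dnstools dnstools

# From the dnstools shell, check name resolution and connectivity
nslookup <service-name>.<namespace>.svc.cluster.local
ping <pod-ip>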

My pods complain that the volume mounts failed.

Delete the pod. Its controller recreates it, and the volume mounts are refreshed.
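For example, assuming the pod is managed by a Deployment or StatefulSet that recreates it (the pod and namespace names are placeholders):

kubectl delete pod <pod-name> -n <namespace>
# Watch the replacement pod come up and confirm that it mounts its volumes
kubectl get pods -n <namespace> -w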

My Keycloak service broker complains that Keycloak is not available.

Most likely, Keycloak is running, but is not resolving by name. Check to see that the Keycloak endpoint is accessible from the keycloak-service-broker. Otherwise, uninstall and reinstall the product.

My user interface does not install. I get a 503 service unavailable or 404 default backend error.

The User Interface is not installed properly. You can use the helm chart to delete the User Interface and install it again. Otherwise, uninstall and reinstall the product. A 404 error in the User Interface implies that the ingress named nautilus-ui in the nautilus-system namespace is not set up properly. When you uninstall and reinstall the product, the ingress is set up automatically.

My ingress and services do not have IP addresses.

This condition may occur if the public IP pool is exhausted. For example, the NSX-T environment defines a public IP address pool. To check for this issue, run the following command, and look for the value pending in the public IP column.

kubectl get svc -n nautilus-system

A Network Administrator may be able to resolve this issue.

My DNS records are not showing, or the installer is not adding these records.

1. The external-dns credentials for the DNS provider may be incorrect, or you may have exceeded the rate limit. For example, with a Route53 DNS service provider, there is a rate limit of 5 requests per second per account. To research the issue, check the logs for the external-dns pod.


2. Ensure that unique values are used for txtOwnerId and domainFilters in the external-dns configuration. If the same values are used across clusters and policy is set to sync, a new cluster could overwrite all entries with the same txtOwnerId.

txtOwnerId: "<cluster-name>.<domain>"
## Modify how DNS records are synchronized between sources and providers (options: sync, upsert-only).
## If the sync policy is used, make sure that txtOwnerId is unique to the cluster; using <cluster-name>.<domain> ensures uniqueness.
policy: sync
domainFilters: [ <domain> ]

My DNS records are showing, but the DNS records are not propagated.

Run nslookup keycloak.<domain>. If it resolves, see if it resolves from the pod network as well. Start the dnstools pod using this command: kubectl run -it --rm --image infoblox/dnstools dnstools. From the dnstools pod, run nslookup keycloak.<domain>. If it does not resolve from the dnstools pod, contact an administrator to look into the network configuration of the cluster.

The cert-manager does not issue the certificates.

If the certificate issuer is Let's Encrypt, and you see an entry for Keycloak in the ingress with kubectl get ingress -n nautilus-system, then check whether the ingress has a certificate. If the certificate is issued, the issuer of the keycloak-tls certificate should look like the following:

kubectl get secret keycloak-tls -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text | grep Issuer

Issuer: C=US, O=Let's Encrypt, CN=Let's Encrypt Authority X3
CA Issuers - URI:http://cert.int-x3.letsencrypt.org/

If the output looks like the following instead, then check the cert-manager pod logs for error messages. A limit may exist, such as 50 certificates per week per domain.

Issuer: O=cert-manager, CN=cert-manager.local

If you are frequently installing and uninstalling the product, remember to reuse the hostname. Certificate reissues do not count towards the certificate limit. If the certificate was reissued already several times, and you see a message like the following, then use another domain that has not reached the certificate limit.

urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates already issued for exact set of domains: .com: see https://letsencrypt.org/docs/rate-limits/" "key"="nautilus-system/nautilus-ui-tls-2473631805"

If you are using self-signed certificates, you would see the following message instead, and you can use the same steps as above.

kubectl get secret cluster-wildcard-tls-secret -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text | grep Issuer

Issuer: CN = Self-Signed CA

Is Keycloak ready with TLS?

Check whether you can connect to Keycloak:

kubectl get ingress keycloak -n nautilus-system

openssl s_client -showcerts -servername <keycloak-host> -connect <keycloak-ingress-address>:443

Then, check that the certificate in the secret is the same as the certificate that Keycloak returns:

kubectl get secret cluster-wildcard-tls-secret -n nautilus-system -o jsonpath="{.data.tls\.crt}" | base64 --decode | openssl x509 -text

openssl s_client -showcerts -servername keycloak.cluster1.desdp.dell.com -connect 10.247.114.101:443

My pods are in the containerCreating state.

Check whether the issue is local to the pod. For example, the following message indicates that nautilus-ui is waiting for its secrets. It waits until the keycloak-service-broker starts servicing the service instance requests.

MountVolume.SetUp failed for volume "nautilus-ui" : secrets "nautilus-ui" not found

Check the keycloak-service-broker logs. Eventually, after a timeout, the keycloak-service-broker starts servicing requests and creates the secrets.
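To find and follow those logs (the pod name prefix matches the earlier example in this chapter; the exact suffix differs in each deployment):

# Find the keycloak-service-broker pod
kubectl get pods -n nautilus-system | grep keycloak-service-broker

# Follow its logs while waiting for the secrets to be created
kubectl logs -f <keycloak-service-broker-pod> -n nautilus-system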

The cert-manager is not renewing certificates when the LetsEncrypt provider is used.

In certain cases, when cert-manager cannot connect to LetsEncrypt, certificate orders to LetsEncrypt get stuck in the pending state indefinitely. If you see that expired certificates are not renewing, delete the orders that are in a pending state. The cert-manager then creates new orders automatically and completes the certificate renewal process.

kubectl get orders -n nautilus-system
kubectl delete order <order-name> -n nautilus-system

A reader job fails on a Flink cluster.

When you see that a reader job is not progressing (no longer reading), check whether it is stuck on UnknownHostException. Some of the symptoms you may see are:

1. The Flink Dashboard shows jobs in a Restarting status instead of Running.
2. The Flink job manager continuously throws UnknownHostException with the message "Temporary failure in name resolution".

When the job manager is in this state, it does not recover from exceptions even if you fix the DNS issues that caused the resolution errors.

Here is a workaround example:

kubectl get sts -n longevity-0 | grep jobmanager
longevity-0-jobmanager   1/1   8d

kubectl scale sts longevity-0-jobmanager -n longevity-0 --replicas=0
kubectl scale sts longevity-0-jobmanager -n longevity-0 --replicas=1

After using the workaround, check the logs and the Flink dashboard to see if jobs have a status of Running. Here is an example kubectl command that checks the logs:

kubectl logs -f longevity-0-jobmanager -n longevity-0 -c server

Application connections when TLS is enabled

This section describes TLS-related connection information in Pravega and Flink applications.

When TLS is enabled in SDP, Pravega applications must use a TLS endpoint to access the Pravega datastore. The URI used in the Pravega client application, in the ClientConfig class, must start with:

tls://<pravega-controller-endpoint>:443


If the URI starts with tcp://, the application fails with a javax.net.ssl.SSLHandshakeException error.

To obtain the Pravega ingress endpoint, run the following command.

kubectl get ing pravega-controller -n nautilus-pravega

The HOST column in the output shows the Pravega endpoint.
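A small sketch for retrieving that endpoint and building the URI in one step. The jsonpath expression reads the first ingress rule's host, which is standard Kubernetes, but verify it against your own output.

PRAVEGA_HOST=$(kubectl get ing pravega-controller -n nautilus-pravega -o jsonpath='{.spec.rules[0].host}')
echo "tls://${PRAVEGA_HOST}:443"   # use this value as the Pravega ClientConfig URI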

Online and remote support

The Dell Technologies Secure Remote Services (SRS) and call home features are available for SDP. These features require an SRS Gateway server configured on-site to monitor the platform. The SDP installation process configures the connection to the SRS Gateway.

Detected problems are forwarded to Dell Technologies as actionable alerts, and support teams can remotely connect to the platform to help with troubleshooting.

Online Support:

https://www.dell.com/support

Telephone Support:

United States: 800-782-4362 (800-SVC-4EMC)

Canada: 800-543-4782

Worldwide: +1-508-497-7901


Reference Information

Topics:

Configuration Values File Reference
Summary of Scripts
Installer command reference


Configuration Values File Reference

Topics:

Template of configuration values file

Template of configuration values file

The following template shows all configuration attributes for SDP.

global:
  ## Type of External storage, valid values are nfs or ecs_s3
  ## If `nfs` then a "nfs-client-provisioner" section is required
  ## If `ecs_s3` then a "pravega-cluster" section is required
  #storageType: nfs

  ## Type of Bookkeeper deployment, valid value is k8s
  ## If `k8s` then a BookkeeperCluster resource is created
  #bookkeeperDeployment: "k8s"

  ## Whether to install the cluster in dev mode. Note this should not be used for production deployments.
  ## Among other things, this is going to enable the SOCKS5 proxy.
  #devel: false

  #distribution: openshift # or omitted completely
  #pravegaSearchEnabled: true | false # whether to deploy PravegaSearch or not
  #srsNotifier: "streamingdata-srs"
  #skipNFSClientProvisioner: true | false # skip installing the nfs-client-provisioner (in case it already exists)
  #skipNginxIngress: true | false # skip installing the nginx-ingress (in case it already exists)

external:
  ## The full TLD to use for external connectivity. A blank string means no external connectivity.
  ## If you change domain here, change domain in domainFilters of external-dns config below
  ## and hostedZoneID in cert-manager-resources
  #host: "<cluster-name>.<domain>"
  ## The name of your Kubernetes context. This is a required value.
  ## This can be found using `kubectl config current-context`
  #clusterName: "<cluster-name>"
  ## Whether to enable TLS or not.
  ## If `true` then TLS is going to be enabled
  ## If `false` then TLS is going to be disabled
  #tls: false
  ## Whether the cluster is installed in a dark-site environment (i.e. no or partial external connectivity)
  ## Among other things, this is going to disable the SRS Gateway which requires external access.
  #darksite: false
  # ingress:
  #   annotations:
  #     kubernetes.io/ingress.class: nginx
  #     kubernetes.io/tls-acme: "true"
  ## Custom CA trust certificates in PEM format. These certificates are injected into certain Java components.
  ## The main use for them at the moment is when communicating with an ECS Object endpoint that uses custom trust, i.e. Self Signed Certificates
  # tlsCABundle:
  #   ecsObject: |-
  #     -----BEGIN CERTIFICATE-----
  #     MIIDnDCCAoSgAwIBAgIJAOlxdGLbM3vBMA0GCSqGSIb3DQEBCwUAMBYxFDASBgNV
  #     BAMTC0RhdGFTZXJ2aWNlMB4XDTIwMDIxOTE5MzMzNVoXDTMwMDIxNjE5MzMzNVow
  #     ...

catalog:
  # image:

zookeeper-operator:
  # image:
  #   repository:
  #   tag:

metrics-operator:
  # image:
  #   repository:
  #   tag:
  # "srsNotifier": "streamingdata-srs"

nautilus-ui:
  # image:
  #   repository:
  #   tag:
  # zookeeperPerProject: 3 # exposed for small env development. LEAVE AS DEFAULT FOR PROD.

nginx-ingress:
  # controller:
  #   image:
  #     repository:
  #     tag:
  #   config:
  #     proxy-buffer-size: "128k"
  #     proxy-buffers: "4 256k"
  #     fastcgi-buffers: "16 16k"
  #     fastcgi-buffer-size: "32k"
  #     max-worker-open-files: "16384"
  #   podAnnotations:
  #     ncp/ingress-controller: true
  #   resources:
  #     limits:
  #       cpu: 1
  #       memory: 1Gi
  #     requests:
  #       cpu: 500m
  #       memory: 500Mi
  # defaultBackend:
  #   image:
  #     repository:
  #     tag:

keycloak:
  # keycloak:
  #   image:
  #     repository:
  #     tag:
  ## Keycloak admin password and UI admin password can be set at deployment time (not recommended for prod)
  ## If not set up, the keycloak chart would generate a random one
  ## (it outputs a kubectl command to retrieve the secret)
  #password: "..."
  #DESDPPassword: "..."

keycloak-injection-hook:
  # image:
  #   repository:
  #   tag:

keycloak-service-broker:
  # image:
  #   repository:
  #   tag:

## To be used if storageType is nfs
nfs-client-provisioner:
  # image:
  #   repository:
  #   tag:
  # nfs:
  #   server:
  #   path:
  #   mountOptions:
  #     - nfsvers=4.0
  #     - sec=sys
  #     - nolock
  # storageClass:
  #   archiveOnDelete: "false"

project-operator:
  # image:
  #   repository:
  #   tag:
  # mavenImage:
  #   repository:
  #   tag:
  # zkImage:
  #   repository:
  #   tag:

flink-operator:
  # image:
  #   repository:
  #   tag:
  # flinkImage:
  #   repository:
  #   tag_1_7_2:

spark-operator:
  # image:
  #   repository:
  #   tag:
  # sparkImage:
  #   repository:
  #   tag_1_7_2:

bookkeeper-operator:
  # image:
  #   repository:
  #   tag:
  # testmode:
  #   enabled: true | false # whether to enable test mode without minimum replicas check and with custom versions
  #   version: "" | "0.8.0"

pravega-operator:
  # image:
  #   repository:
  #   tag:
  # testmode:
  #   enabled: true | false # whether to enable test mode without minimum replicas check and with custom versions
  #   version: "" | "0.8.0"

pravega-service-broker:
  # image:
  #   repository:
  #   tag:
  # storage:
  #   className: ""
  #   size: 5Gi

zookeeper-cluster:
  # zookeeper_cluster:
  #   replicas: 3
  #   image:
  #     repository:
  #     tag:
  #   storage:
  #     volumeSize: 20Gi
  #     className:
  #   domainName: ""

## to be used with "global.bookkeeperDeployment: k8s"
bookkeeper-cluster:
  # image:
  #   repository:
  #   version:
  # replicas: 6
  # zookeeperUri: zookeeper-client:2181
  # blockOwnerDeletion: true
  # pravegaClusterName: nautilus

  # probes: {}
  #   readiness:
  #     initialDelaySeconds: 20
  #     periodSeconds: 10
  #     failureThreshold: 9
  #     successThreshold: 1
  #     timeoutSeconds: 5
  #   liveness:
  #     initialDelaySeconds: 60
  #     periodSeconds: 15
  #     failureThreshold: 4
  #     successThreshold: 1
  #     timeoutSeconds: 5

  # storage:
  #   ledger:
  #     volumeSize: 250Gi
  #     className:
  #   journal:
  #     volumeSize: 250Gi
  #     className:
  #   index:
  #     volumeSize: 10Gi
  #     className:

  ## Overridable bookkeeper options
  # options:
  #   useHostNameAsBookieID: "true"
  #   minorCompactionThreshold: "0.4"
  #   minorCompactionInterval: "1800"
  #   majorCompactionThreshold: "0.8"
  #   majorCompactionInterval: "43200"
  #   isForceGCAllowWhenNoSpace: "true"
  #   journalDirectories: "/bk/journal/j0,/bk/journal/j1,/bk/journal/j2,/bk/journal/j3"
  #   ledgerDirectories: "/bk/ledgers/l0,/bk/ledgers/l1,/bk/ledgers/l2,/bk/ledgers/l3"
  #   ledgerStorageClass: "org.apache.bookkeeper.bookie.InterleavedLedgerStorage"
  #   flushEntrylogBytes: "134217728"
  #   enableStatistics: "false"

  # jvmOptions:
  #   memoryOpts:
  #     - "-Xms2g"
  #     - "-XX:MaxDirectMemorySize=8g"

  # resources:
  #   limits:
  #     cpu: 8000m
  #     memory: 16Gi
  #   requests:
  #     cpu: 2000m
  #     memory: 4Gi

pravega-cluster:
  # pravega_debugLogging: false
  # pravega_version: 0.5.0-2269.6f8a820-0.9.0-019.007be9f
  # credentialsAndAcls: base64 encoded password file: https://github.com/pravega/pravega-tools/blob/2c2dcb327a289f1f861deb96e23c2bf29e6b7f6c/pravega-cli/src/main/java/io/pravega/tools/pravegacli/commands/admin/PasswordFileCreatorCommand.java
  # pravega_security pravega.client.auth.token & credentialsAndAcls are coupled; if one changes, the other must change as well

  ## Below are the overridable settings that are being passed
  ## to "pravega_options" block of pravega deployment
  ## (controllers & segment stores)
  pravega_container_count: 48 ## expect it to be 8 containers per segmentstore if not reducing memory

  segment_store:
    #cacheMaxSize: "11811160064"
    #cacheMaxTimeSeconds: "600"
    #storageLayout: "ROLLING_STORAGE" | "CHUNKED_STORAGE"

  bk_client:
    # bkEnsembleSize: "3"
    # bkWriteQuorumSize: "3"
    # bkAckQuorumSize: "3"
    # bkWriteTimeoutMillis: "60000"
    # maxOutstandingBytes: "33554432"

  write_size:
    # blockSize: "67108864"
    # flushThresholdBytes: "67108864"
    # maxFlushSizeBytes: "67108864"

  controller:
    # retention_bucketCount: "10"
    # service_asyncTaskPoolSize: "20"
    # retention_threadCount: "4"

  ## any extra pravega options that are not set via above overrides
  pravega_options:
    # log.level: "DEBUG"

  segment_store_jvm_options:
    # - "-XX:MaxDirectMemorySize=8g"

  pravega_storage:
    # tier2:
    #   size: 250Gi
    #   class_name: "nfs"
    # cache:
    #   size: 100Gi
    #   class_name:

  pravega_replicas:
    # controller: 1
    # segment_store: 3

  pravega_resources:
    controller:
      # limits:
      #   cpu: 500m
      #   memory: 1Gi
      # requests:
      #   cpu: 250m
      #   memory: 512Mi
    segment_store:
      # limits:
      #   cpu: "1"
      #   memory: 2Gi
      # requests:
      #   cpu: 500m
      #   memory: 1Gi

  pravega_security:
    # TOKEN_SIGNING_KEY: "..."
    # pravega.client.auth.method: "Basic"
    # pravega.client.auth.token: "..." # note, if this changes credentialsAndAcls needs to change
    # autoScale.security.auth.token.signingKey.basis: "..."
    # AUTHORIZATION_ENABLED: "true"
    # autoScale.controller.connect.security.auth.enable: "true"

  pravega_externalAccess:
    # enabled: true
    # type: LoadBalancer | NodePort

    # More details on service types
    # controllerExtServiceType: ClusterIP
    # controllerSvcAnnotations: {}
    # segmentStoreExtServiceType:
    # segmentStoreSvcAnnotations: {}
    # segmentStoreLoadBalancerIP:
    # segmentStoreExternalTrafficPolicy: Local

  pravega_image:
    # repository:

  grafana_image:
    # repository:
    # tag:

  influxdb_image:
    # repository:
    # tag:

  metrics_cluster_storage:
    # className: ""
    # influxdbSize: 10Gi
    # grafanaSize: 1Gi

  grafana_notifiers:
    # kahm:
    #   namespace: nautilus-system
    #   username:
    #   password:

external-dns-resources:
  # externalDNSSecrets:
  #   - name:
  #     value: |
  #       {
  #         ....
  #       }

external-dns:
  ## see https://github.com/helm/charts/blob/8ab12e10303710ea3ad9d771acdd69d7658b7f47/stable/external-dns/values.yaml

cert-manager-resources:
  # certManagerSecrets:
  #   - name:
  #     value: |
  #       {
  #         ....
  #       }

  # clusterIssuer:
  #   name:
  #   server:
  #   email:
  #   acmeSecretKey:
  #   solvers:
  #     ## you can specify multiple solvers using labels and selectors. see:
  #     ## https://docs.cert-manager.io/en/latest/tasks/issuers/setup-acme/index.html
  #     - dns01:

  #         clouddns:
  #           serviceAccountSecretRef:
  #             name:
  #             key:
  #           project:
  #     - dns01:
  #         route53:
  #           # hosted zone id taken from route53 Hosted Zone Details
  #           hostedZoneID:
  #           region:
  #           accessKeyID:
  #           secretAccessKeySecretRef:
  #             name: # TODO need to put this key above in certManagerSecrets
  #             key: # TODO need to put this key above in certManagerSecrets

cert-manager:
  ## see https://github.com/jetstack/cert-manager/blob/v0.8.0/deploy/charts/cert-manager/values.yaml

## Serviceability:

decks:
  # storageClassName: ""
  # see https://github.com/EMCECS/charts/tree/master/decks

srs-gateway:
  ## see https://github.com/EMCECS/charts/tree/master/srs-gateway

kahm:
  # storageClassName: ""
  # see https://github.com/EMCECS/charts/tree/master/kahm

dellemc-streamingdata-license:
  ## see https://github.com/EMCECS/charts/tree/master/dellemc-license

monitoring:
  # image:
  #   repository: devops-repo.isus.emc.com:8116/nautilus/monitoring
  #   tag: latest
  #   pullPolicy: Always

  # license:
  #   name: dellemc-streamingdata-license
  #   namespace: nautilus-system

  # schedule: "*/10 * * * *"

  # storage:
  #   storageClassName: ""

  # subjects:
  #   - name: Streaming Flink Cores
  #     code: STRM_FLINK_CORES
  #     uom_code: ZC
  #     uom_name: Individual CPU Cores
  #     niceName: Flink
  #     selectors:
  #       - component=taskmanager
  #       - component=jobmanager
  #   - name: Streaming Platform Cores
  #     code: STRM_CORES
  #     uom_code: ZC
  #     uom_name: Individual CPU Cores
  #     niceName: Platform
  #     namespaces:
  #       - nautilus-system
  #       - nautilus-pravega

## To be used if storageType is ecs_s3
ecs-service-broker:
  # namespace: abc
  # replicationGroup: RG
  #
  # api:
  #   endpoint: "http://192.2.3.4:9020"
  #
  # ecsConnection:
  #   endpoint: "https://192.2.3.4:4443"
  #   username: abc-sdp
  #   password: ChangeMe
  #   certificate: |-
  #     -----BEGIN CERTIFICATE-----
  #     MIIDCTCCAfGgAwIBAgIJAJ1g36y+tM0RMA0GCSqGSIb3DQEBCwUAMBQxEjAQBgNV
  #     BAMTCWxvY2FsaG9zdDAeFw0yMDAyMTkxOTMzMjVaFw0zMDAyMTYxOTMzMjVaMBQx
  #     ...
  #     -----END CERTIFICATE-----
  #
  ## Custom plans for Bucket creation. Plan named "default" overrides standard default SDP bucket plan
  # s3Plans:
  #   ## Unique UUID is required
  #   - id: 9e777d49-0a78-4cf4-810a-b5f5173b019d
  #     name: small
  #     settings:
  #       quota:
  #         limit: 5
  #         warn: 4


Summary of Scripts

Topics:

Summary of scripts

Summary of scripts

The following scripts are included with SDP.

The scripts are in the extracted contents of the decks-installer-<version>.zip file, under the /scripts folder. There are software requirements for your local machine for most scripts. See Prepare the working environment on page 68.

health-check.py

This script may be run at any time after SDP is installed. It checks the state of various components in the SDP cluster and generates a summary as output.

See Run health-check on page 116.

post-install.sh

Run this script after running the decks-install apply command. This script confirms that your latest run of the decks-install apply command left the cluster in a healthy state. This script invokes the health check script. You may run this script at any time.

See Run the post-install script on page 72. Also see Change applied configuration on page 83.

post-upgrade.sh

Run this script after upgrading the SDP with a new distribution of manifests and charts. It confirms that the cluster was upgraded properly and is healthy. It runs the health checks.

prereqs.sh

This script ensures that your environment is ready for installation by verifying the following:

Checks your local environment for the required tools and versions of those tools
Checks the SDP cluster for a default storage class definition

The decks-install apply command runs this script. Dell recommends that you run this script before running the decks-install apply command for the first time (or the first time on a new local machine). If you run this script in those conditions, you ensure that your environment is ready when you want to run the installation. You may run this script at any time.

See Run the prereqs.sh script on page 69.

pre-install.sh

This script must be run one time before installing SDP.

It generates non-default credentials for Pravega components. It also generates a file named pre-install.yaml that contains those credentials. You must include that file with every run of the decks-install apply command.
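For example, a later installation run that includes the generated credentials file might look like the following; the paths are placeholders, and the pre-install.yaml file name is the one produced by this script.

./decks-install apply --kustomize ./manifests/ --repo ./charts/ \
  --values <path/to/values.yaml>,<path/to/pre-install.yaml>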


See Run pre-install script on page 70.

pre-upgrade.sh

This script must be run before upgrading the SDP version with a new distribution of manifests and charts. The script ensures that the environment is healthy, including running the health checks. Do not update a cluster that is unhealthy.

Make sure to use the pre-upgrade.sh script from the new SDP distribution. The script is version-specific.

provisioner.py

This script recommends configurations for scaling SDP. Use this script after adding new ESXi hosts to expand the cluster.

See Get scaling recommendations on page 96.

scale.py

This script scales SDP using recommended values from the provisioner.py script.

See Scale SDP on page 98.

validate-values.py

This script is part of the installation and change configuration processes. It reads the configuration values files and checks the values over certain criteria. For example, it validates the values that are configured for external connectivity and serviceability.

The decks-install apply command runs this script automatically. You may run this script independently at any time to verify the configuration values files.

See Run the validate-values script on page 71. Also see Change applied configuration on page 83.


Installer command reference

Topics:

Prerequisites
Command summary
decks-install apply
decks-install config set
decks-install push
decks-install sync
decks-install unapply

Prerequisites

To use the installer, you must meet the following prerequisites.

The SDP Kubernetes cluster must exist.
You must have direct or network access to the Kubernetes cluster.
You must have authentication access rights to the Kubernetes cluster. The installer tool runs in the Kubernetes shell environment, outside of the Kubernetes cluster.
The user must have Kubernetes administrator privileges.
Your working environment must have kubectl installed.
A default registry must be configured. The installer applies the default registry pathname to any unqualified image names in the application manifest, producing a path name of registry-path/image-name.

The decks-install push command must be used first, to push images to the default registry. Then you can use the other decks-install commands.
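A minimal end-to-end ordering of those commands, with placeholder registry, bundle, and file names; substitute your own values:

# 1. Point the installer at your registry
./decks-install config set registry <registry-host>/<path>

# 2. Push the delivered image bundle to that registry
./decks-install push --input <path/to/decks-images.tar>

# 3. Install using the delivered manifests and charts
./decks-install apply --kustomize ./manifests/ --repo ./charts/ --values <path/to/values.yaml>,<path/to/pre-install.yaml>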

Command summary

The installer tool is a command-line executable that installs and uninstalls applications and resources in a Kubernetes cluster, and configures the cluster.

decks-install apply

Applies a given manifest bundle (with optional overrides) to a remote Kubernetes cluster.

decks-install unapply

Unapplies a manifest bundle from a Kubernetes cluster.

decks-install sync

Starts a reconciliation loop between applications and Helm releases.

decks-install config list

Lists the current configuration values.

decks-install config set

Sets a config value.

decks-install push

Pushes an image bundle to a registry.

decks-install version

Shows the version and build information of the installer tool.

decks-install check

Runs health checks on the installed components.

decks-install status

Lists installed components and their state.


decks-install apply

Applies the custom resource definitions (CRDs) and applications that are specified in a manifest bundle to the SDP Kubernetes cluster. By default, this command also starts the synchronization process that installs the Helm charts for each application.

Syntax

decks-install apply --kustomize <manifest-path> --repo <charts-path> [--values <file1>,<file2>,...] [--dry-run] [--skip-sync] [--simple-output] [--set <key>=<value>] [--set-file <key>=<file-path>]

Or alternate syntax:

decks-install apply --config <config-file>

Options

--kustomize <manifest-path>

Required. Specifies the location of the manifest bundle. Include the slash to indicate a directory. For example:

--kustomize ./manifests/

The manifest bundle is an artifact delivered in the root of the installer zip file, under manifests/. Manifest files must conform to Kubernetes Kustomize format, as described in the Kubernetes Kustomize documentation.

--repo <charts-path>

Required. Specifies the location of the Helm charts directory. Include the slash to indicate a directory. For example:

--repo ./charts/

The charts are artifacts delivered in the root of the installer zip file, under charts/.

--values <file>[,<file>,...]

Specifies the pathnames of configuration values files. Separate multiple file names with commas and no spaces.

NOTE: SDP requires a configuration file to define required attributes.

--dry-run Currently not used.

--skip-sync If specified, prevents the synchronization process from starting. You can start the synchronization process later, using the decks-install sync command.

The apply step adds CRDs and applications to the cluster in a pending state. The synchronization step reconciles the applications to the desired state.

--simple-output Displays logs to standard out and standard error, which are typically on the command line terminal.

If this flag is omitted, the command writes logs to decks-install.log and decks-install.stderr.


--set <key>=<value>

Provides the value for a configuration parameter, where:
<key> is the complete parameter name from the configuration values file, using periods to separate the components in a name.
<value> is the configuration value.
You can specify multiple key-value pairs in a comma-separated list.

The following example sets two configuration parameters on the command line:

--set global.storageType=ecs_s3,ecs-service-broker.namespace=mynamespace

The precedence for setting configuration values is:
1. Values that are provided on the command line with the --set or --set-file options take precedence.
2. Values that are provided in configuration values files are next. When multiple configuration values files are specified and the same parameter appears in more than one file, the value in the right-most file takes precedence.
3. The installer uses internal default values if no other value is provided.
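For instance, assuming a values file that also sets global.storageType, the command-line override below wins; the file path is a placeholder.

./decks-install apply --kustomize ./manifests/ --repo ./charts/ \
  --values <path/to/values.yaml> \
  --set global.storageType=nfs   # overrides whatever storageType the values file defines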

--set-file <key>=<file-path>

Provides the path of a file that contains a configuration parameter value, where:
<key> is the complete parameter name from the configuration values file, using periods to separate the components in a name.
<file-path> is the path name of the file that contains the value for the parameter.
You can specify multiple key-file-path pairs in a comma-separated list.

The following example provides the path of the license file. The required configuration value in this case is the contents of the license file. The installer extracts the file contents.

--set-file dellemc-streamingdata-license.licensefile=<path-to-license-file>

--config <config-file>

Provides installer options in a YAML file.

With many values, overrides, and manual flag settings, it may be difficult to keep track of all flags and settings. For convenience, you can store some common flags in an installer configuration file rather than providing them manually every time you run the decks-install apply command.

Options that are set on the command line take precedence over those in the config file.

In the referenced YAML file, the keys are the command-line option names, and the values must be in standard YAML format. For example:

option1: option-value
option2:
  - value 1
  - value 2

Examples

Options on the command line

$ ./decks-install apply --kustomize ./manifests/ --repo ./charts/ \
  --values <path/to/values.yaml>,<path/to/pre-install/values.yaml> \
  --set-file=dellemc-streamingdata-license.licensefile=<path/to/license.xml>

Options in a config file

The following command and YAML file combination is equivalent to the command line example above.

$ ./decks-install apply --config config.yaml


Where config.yaml contains the following:

kustomize: manifests/
repo: charts/
values:
  - <path/to/values.yaml>
  - <path/to/pre-install/values.yaml>
set-file:
  - dellemc-streamingdata-license.licensefile=<path/to/license.xml>

Output

The command shows progress as components are installed. The default output lists the component name, status in the install process (for example: Pending, Updating, Succeeded), and explanatory information when the status is Pending.

You can change the contents of the third column by using the keys that appear at the bottom of the screen:

d changes the third column to a description of the component being installed.
v changes the third column to show the versions being installed.
i changes the third column back to the default view, which shows explanatory information about the reconciliation stage for a component. With this view, you can see when a component is waiting for dependencies.

decks-install config set

Sets a configuration value.

Usage

This command uses a key value pair to set a configuration value for an installation setting.

Syntax

decks-install config set key value

Options

key

A configuration field name. See the configuration file template for the available fields.

value

The setting value.

Set the registry

The following example sets the container registry.

$ ./decks-install config set registry gcr.io/stream-platform/reg


decks-install push

Pushes an image bundle to a configured container registry.

Usage

The image bundle (a tar archive) for SDP contains several large images. The push operation may take hours to complete.

The registry is typically preconfigured, using the decks-install config command:

decks-install config set registry example-registry.com/some/path

You may override the configured registry URL with the --registry option.

Syntax

decks-install push --input <image-bundle.tar> [--registry <registry-URL>]

Options

--input

A .tar file of images. Do not extract files. The installer expects the .tar file as input.

--registry

Optional. Overrides the configured registry URL, to push the images to a different registry URL.
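A typical invocation, assuming the image bundle file delivered with your installer download (treat the file name and registry path as placeholders):

./decks-install push --input <path/to/decks-images.tar>

# Or push to a different registry than the configured one
./decks-install push --input <path/to/decks-images.tar> --registry <registry-host>/<path>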

decks-install sync

Synchronizes the Kubernetes cluster to the wanted terminal state.

Usage

Synchronization consists of installing, upgrading, reconfiguring, or uninstalling components in the cluster as needed to match the wanted state for each application. The synchronization procedure ends when all components are installed, configured, or removed in accordance with the wanted configuration as recorded from previously applied or unapplied configurations. Synchronization usually takes a few minutes.

A synchronization process begins automatically after you use the decks-install apply or decks-install unapply command. If the synchronization process fails for whatever reason, use the decks-install sync command to resume the process.

It is safe to restart the synchronization process at any time. Be sure to specify the correct manifest and repo chart locations.

Syntax

decks-install sync [--kustomize <manifest-path>] [--repo <charts-path>]


Options

--kustomize <manifest-path>

Specifies the path and directory name of the manifest bundle that describes the applications and resources to synchronize. Include the final slash indicating the directory. For example:

--kustomize ./manifests/

This option is optional on the sync command. If omitted, the installer only synchronizes the application resources based on the information found on the cluster. Namespace and CRD cleanup operations are not run since they use the information from the manifest file. It is recommended to pass --kustomize if you have the manifests.

--repo <charts-path>

Required to synchronize application installations. Specifies the path and directory name of the Helm charts directory. Include the final slash indicating the directory. For example:

--repo ./charts/

decks-install unapply

Marks applications for removal from the Kubernetes cluster and starts the synchronization process. Use this command to uninstall SDP from a cluster.

Syntax

decks-install unapply --kustomize <manifest-path> --repo <charts-path> [--dry-run] [--skip-sync] [--simple-output]

Usage

Use this command if you need to start over with a completely new SDP installation due to corruption or a major system failure.

If you run decks-install unapply against the same manifest bundle used for installation, it uninstalls all Streaming Data Platform components and it deletes all Streaming Data Platform data. When command execution completes, you have an empty Kubernetes cluster. You can then start over with a new installation into that cluster.

WARNING: In Streaming Data Platform V1.2, this process deletes all user data that was ingested into Pravega.
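For example, using the same manifest and chart directories that were delivered in the installer zip file and used for installation:

./decks-install unapply --kustomize ./manifests/ --repo ./charts/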

Options

--kustomize <manifest-path>

Required. Specifies the location of the manifest bundle that defines the applications to uninstall. Include the slash to indicate a directory. For example:

--kustomize ./manifests/

Manifest files must conform to Kubernetes Kustomize format, as described in the Kubernetes Kustomize documentation.

--repo <charts-path>


Specifies the location of the Helm charts directory to reconcile with. Include the slash to indicate a directory. For example:

--repo ./charts/

The charts are artifacts originally delivered in the root of the installer zip file, under charts/.

--dry-run Currently not used.

--skip-sync If specified, prevents the synchronization process from starting. You can start the synchronization process later, using the decks-install sync command.

The apply step adds CRDs and applications to the cluster in a pending state. The synchronization step reconciles the applications to the desired state.

--simple-output Displays logs to standard out and standard error, which are typically on the command line terminal.

If this flag is omitted, the command writes logs to decks-install.log and decks-install.stderr.


To be able to print Dell Streaming Data Platform 1.2 Software Installation Guide, simply download the document to your computer. Once downloaded, open the PDF file and print the Dell Streaming Data Platform 1.2 Software Installation Guide as you would any other document. This can usually be achieved by clicking on “File” and then “Print” from the menu bar.