Metadata-Version: 2.4
Name: prometheus-monitoring-scripts
Version: 0.0.0
Summary: prometheus-monitoring-scripts
Author-email: Jordan Tardif
License: Apache Software License 2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: prometheus-client
Requires-Dist: storable
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: coverage; extra == "test"

# Prometheus Monitoring Scripts

A collection of custom exporters for Prometheus monitoring, designed to collect metrics from various systems and services.

## Installation

Clone the repository and set up the development environment using the Makefile:

```bash
git clone https://git.dreamhost.com/dreamhost/infra/prometheus-monitoring-scripts.git
cd prometheus-monitoring-scripts
make setup
source env/bin/activate
```

This creates a virtual environment, installs the package in development mode, and installs all required dependencies.

> Note: The `make setup` command requires the `uv` tool, a modern Python package manager. If you don't have `uv` installed, you can install it following the instructions at [https://docs.astral.sh/uv/getting-started/installation/](https://docs.astral.sh/uv/getting-started/installation/).

## Development

The project includes several Makefile targets to help with development:

- `make setup` - Set up the development environment
- `make style` - Check code style using Ruff
- `make autopep` - Automatically fix code style issues using Ruff
- `make test` - Run functional tests
- `make test_smoke` - Run smoke tests

To run specific tests, use:

```bash
make test test=tests/path/to/test
```

## Usage

The monitoring scripts are organized as modules. Each module is invoked by passing its name to the `custom_exporter` command:

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter <module> [arguments]
```

In a dev environment, the paths are configured so that you can run `custom_exporter` without a full path.
### Available Modules

- `disk.xfs` - XFS quota metrics for user disk usage
- `disk.podman` - Disk usage metrics for Podman containers
- `mailq.generic` - Postfix mail queue metrics
- `mailq.podman` - Postfix mail queue metrics for Podman containers
- `mailq.mailman` - Mailman queue monitor
- `backups.users` - User backup metrics
- `backups.vms` - VM backup metrics
- `service.podman` - Podman service metrics
- `dphactl.core` - Core dp-ha-ctl metrics for monitoring Redis connection and service status
- `dphactl.systemctl` - Systemctl-based checks for Podman socket and HAManager service
- `dphactl.logging_checks` - Log-related checks for Redis auth failures and timeouts
- `dphactl.containers` - Container difference checks between hosts
- `dphactl.verify` - Missing file verification checks from dp-ha-ctl

## Module Documentation

### disk.xfs

Collects XFS quota metrics for disk usage by user and filesystem.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter disk.xfs
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `xfs_disk_avail_bytes` | Gauge | Available disk space in bytes | `filesystem`, `user` |
| `xfs_disk_used_bytes` | Gauge | Used disk space in bytes | `filesystem`, `user` |

The exporter runs `xfs_quota` on the specified mount point (default: `/home`) to gather usage data.

### mailq.generic

Collects Postfix mail queue metrics by counting files in queue directories.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter mailq.generic
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `postfix_queue_size` | Gauge | Number of emails in queue | `queue` |

Monitors all standard Postfix queues: active, bounce, deferred, incoming, and maildrop.

### mailq.podman

Extends mail queue monitoring to Postfix instances running in Podman containers.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter mailq.podman [container_filter]
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `postfix_queue_size` | Gauge | Number of emails in queue | `machine`, `queue` |
| `postfix_queue_errors` | Gauge | Incremented for each error accessing queues | `machine` |

The optional container filter argument restricts monitoring to specific containers.

### mailq.mailman

Monitors Mailman queue sizes by counting files in queue directories across multiple Mailman instances.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter mailq.mailman [base_path]
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `mailman_queue_size` | Gauge | Number of files in queue directories | `service`, `queue` |

This exporter scans Mailman instances under the specified base path (default: `/dh/mailman`) and counts files in both the "in" and "out" queue directories. Each Mailman instance directory is tracked separately, with the directory name used as the `service` label.

### backups.users

Collects metrics about user backup status and history.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter backups.users
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `backup_last_successful` | Gauge | Timestamp of last successful backup | `user`, `machine`, `vmhost` |
| `backup_last_rsync_exit_code` | Gauge | Exit code of last rsync operation | `user`, `machine`, `vmhost` |
| `backup_last_attempted_backup` | Gauge | Timestamp of last backup attempt | `user`, `machine`, `vmhost` |
| `backup_last_user_state` | Gauge | State of last backup (1=active state) | `user`, `machine`, `vmhost`, `state` |
| `backup_state_retrive_failed` | Gauge | Indicates backup state retrieval failed | `machine`, `vmhost` |
| `backup_status` | Gauge | Overall backup status | `machine`, `vmhost`, `status` |

Reads backup state from `/usr/local/dh/var/localdata/backup.state` and user information from `/usr/local/dh/etc/localdata/users.json`.

### backups.vms

Extends backup monitoring to virtual machines.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter backups.vms
```

Emits the same metrics as `backups.users`, applied to the VM guests defined in `/usr/local/dh/etc/localdata/guests.json`.

### service.podman

Monitors systemd services running within Podman containers.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter service.podman <services> [container_filter]
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `node_systemd_unit_state` | Gauge | State of systemd units (1=in this state) | `machine`, `name`, `state`, `type` |
| `podman_exec_errors` | Gauge | Incremented for each failed exec into a container | `machine` |

The first argument is required and must be a comma-separated list of service names to monitor. The optional second argument filters which containers to check.
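
The "1=in this state" convention for `node_systemd_unit_state` can be illustrated with a minimal sketch. It is not the exporter's actual code: the function names, the list of states, and the `type="service"` default are assumptions made for illustration, and the samples are rendered by hand rather than through `prometheus-client`.

```python
from typing import Dict, List

# Assumed set of systemd unit states; the real exporter may track a different list.
UNIT_STATES = ["active", "activating", "deactivating", "inactive", "failed"]


def parse_services(arg: str) -> List[str]:
    """Split the comma-separated service argument, dropping empty entries."""
    return [name.strip() for name in arg.split(",") if name.strip()]


def render_unit_states(
    machine: str, current: Dict[str, str], unit_type: str = "service"
) -> List[str]:
    """Render one sample per (unit, state); the value 1 marks the current state.

    `current` maps a unit name to its current state. `unit_type` is a
    hypothetical placeholder for the `type` label.
    """
    lines = []
    for name in sorted(current):
        for state in UNIT_STATES:
            value = 1 if current[name] == state else 0
            lines.append(
                'node_systemd_unit_state{machine="%s",name="%s",state="%s",type="%s"} %d'
                % (machine, name, state, unit_type, value)
            )
    return lines


if __name__ == "__main__":
    for line in render_unit_states("mail01", {"postfix.service": "active"}):
        print(line)
```

Emitting every state for each unit, with exactly one set to 1, lets an alert rule match on a specific state (for example `state="failed"`) without the series disappearing when the unit is healthy.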
### dphactl.core

Provides core metrics for the DP-HA-CTL system, focusing on basic functionality like Redis connection and service status.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter dphactl.core
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `dp_hactl_redis_connected` | Gauge | Status of Redis backend connection (1=connected) | |
| `dp_hactl_service_up` | Gauge | Status of dp-ha-ctl service (1=up) | |

The module implements timeouts for all commands and preserves subprocess exit codes for detailed error reporting.

### dphactl.systemctl

Monitors system services related to DP-HA-CTL functionality using systemctl.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter dphactl.systemctl
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `podman_socket_active` | Gauge | Status of podman.socket systemd unit (1=active) | |
| `hamanager_service_active` | Gauge | Status of hamanager.service systemd unit (1=active) | |

This module can be easily removed or replaced when node-exporter checks are enabled on the hosts.

### dphactl.logging_checks

Examines log files to track Redis-related issues.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter dphactl.logging_checks
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `hamanager_redis_auth_failures` | Gauge | Count of Redis authentication failures in logs | |
| `hamanager_redis_timeouts` | Gauge | Count of Redis timeouts in logs | |

Scans the log files in `/var/log/hamanager/` for specific patterns related to Redis errors.

### dphactl.containers

Monitors container differences between hosts.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter dphactl.containers
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `dp_hactl_missing_containers_total` | Gauge | Count of missing containers | `type`, `problem` |

Reports missing standby and primary containers, with specific error labels for timeout or connection issues.

### dphactl.verify

Runs verification checks on the DP-HA-CTL system to ensure proper configuration.

```bash
/opt/prometheus-monitoring-scripts/bin/custom_exporter dphactl.verify
```

#### Metrics

| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `dp_hactl_verify_problems_total` | Gauge | Total number of problems per container | `machine` |
| `dp_hactl_verify_problem_details` | Gauge | Detailed breakdown of problem types | `machine`, `problem` |

Runs the `dp-ha-ctl verify` command and categorizes detected problems for detailed monitoring.

All dphactl modules handle command failures explicitly and preserve subprocess exit codes, so alerts and debugging can distinguish between failure modes.
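
The timeout and exit-code behaviour described for the dphactl modules could look roughly like the sketch below. It rests on assumptions: `run_check`, its status strings, and the 10-second default are hypothetical and do not come from the package. Only the standard library `subprocess.run` call is a known API.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_check(cmd: List[str], timeout: float = 10.0) -> Tuple[str, Optional[int], str]:
    """Run a command and return (status, exit_code, stdout) without raising.

    status is one of "ok", "failed", "timeout", or "not_found" (hypothetical
    names). exit_code is None when the command never produced one.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return "timeout", None, ""
    except FileNotFoundError:
        return "not_found", None, ""
    status = "ok" if proc.returncode == 0 else "failed"
    # Keep the real exit code so callers can expose it as a label or value.
    return status, proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in command; a real module would run something like `dp-ha-ctl verify`.
    print(run_check([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Returning the status separately from the exit code avoids overloading a numeric sentinel for timeouts, which could otherwise collide with a genuine exit code from the command.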