Introduction

Apache DolphinScheduler is a powerful, distributed, and easy-to-use workflow scheduling system. It is widely used in data integration, big data processing, and other scenarios that require the scheduling of workflows and tasks. As with any complex system, logging is an essential aspect of DolphinScheduler. Logs help in monitoring, debugging, and analyzing the system’s performance and behavior. However, logs can grow rapidly and consume significant disk space if not managed properly. Therefore, it is crucial to regularly delete log instances to maintain system efficiency and prevent storage issues.

Importance of Regular Log Deletion

Before delving into the specifics of how to delete logs, it is important to understand why regular log deletion is necessary:

  1. Disk Space Management: Logs can consume a lot of disk space, which can be problematic, especially in environments with limited storage capacity.
  2. Performance Optimization: Excessive logs can slow down the system, making log searches and analysis cumbersome.
  3. Compliance and Security: Retaining logs longer than necessary can pose security risks and violate data retention policies.
  4. Operational Efficiency: Regular log maintenance ensures the system runs smoothly without interruptions due to storage issues.

Setting Up DolphinScheduler

Before we proceed to the log deletion process, let’s ensure DolphinScheduler is properly set up. This section assumes you have already installed DolphinScheduler. If not, you can follow the official installation guide.

Identifying Log Locations

DolphinScheduler stores its logs in configurable directories. The location of these logs can be configured in the common.properties file. By default, logs are typically written to a logs directory inside the DolphinScheduler installation directory.

To identify your log directory, check the dolphinscheduler.env.path and dolphinscheduler.log.basepath properties in the common.properties file:

properties

# common.properties
dolphinscheduler.env.path=/your/dolphinscheduler/env/path
dolphinscheduler.log.basepath=/your/dolphinscheduler/log/path
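
If you want a cleanup script to read the log directory from the configuration instead of hard-coding it, one option is to parse common.properties yourself. This minimal sketch assumes simple key=value lines and # or ! comments only (full Java .properties syntax also allows line continuations and escapes, which it ignores); `read_properties` is a hypothetical helper name.

```python
from typing import Dict


def read_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


# Example using the placeholder path from the configuration above
sample = "# common.properties\ndolphinscheduler.log.basepath=/your/dolphinscheduler/log/path\n"
print(read_properties(sample)["dolphinscheduler.log.basepath"])  # /your/dolphinscheduler/log/path
```

In practice you would pass the contents of your real common.properties file and look up the key your release uses.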

Automated Log Deletion Using Shell Script

One common method to automate log deletion is using a shell script combined with a cron job. This approach works on any Unix-like system and is simple to adapt.

Step 1: Create a Shell Script

Create a shell script named delete_old_logs.sh:

sh

#!/bin/bash

# Set the log directory
LOG_DIR="/your/dolphinscheduler/log/path"

# Set the number of days to retain logs
RETENTION_DAYS=30

# Find and delete regular files older than the retention period
find "$LOG_DIR" -type f -mtime +"$RETENTION_DAYS" -exec rm -f {} \;

echo "Logs older than $RETENTION_DAYS days have been deleted."
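
Note that find -mtime +N measures a file's age in whole 24-hour periods, discarding any fractional part, and +N means strictly more than N. With RETENTION_DAYS=30, a file therefore has to be at least 31 days old before it is removed. The sketch below models that documented GNU find rule in Python so you can reason about which files a run would delete; it is an illustration only, and `find_would_delete` is a hypothetical name.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def find_would_delete(mtime: float, retention_days: int, now: Optional[float] = None) -> bool:
    """Model `find -mtime +retention_days` for a file last modified at `mtime` (epoch seconds)."""
    if now is None:
        now = time.time()
    whole_days = int((now - mtime) // SECONDS_PER_DAY)  # fractional days are discarded
    return whole_days > retention_days                   # '+N' means strictly more than N


now = 1_700_000_000  # fixed timestamp so the results below are reproducible
print(find_would_delete(now - (30 * SECONDS_PER_DAY + 23 * 3600), 30, now))  # False
print(find_would_delete(now - 31 * SECONDS_PER_DAY, 30, now))                # True
```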

Step 2: Make the Script Executable

Make the script executable by running:

sh

chmod +x delete_old_logs.sh

Step 3: Schedule the Script with Cron

To run this script regularly, schedule it using cron. Open the cron table for editing:

sh

crontab -e

Add the following line to run the script daily at midnight:

sh

0 0 * * * /path/to/delete_old_logs.sh
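
The five fields of 0 0 * * * are minute, hour, day of month, month, and day of week, so the entry fires once a day at 00:00 in the system's local time. To sanity-check a simple expression before installing it, here is a deliberately tiny matcher. It only understands * and single numbers (no ranges, lists, steps, or names, day of week limited to 0-6, and no special handling when both day fields are restricted); `cron_matches` is a hypothetical helper, not a cron library.

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` matches a 5-field cron expression made of '*' and integers."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    # cron numbers weekdays 0-6 starting on Sunday; Python's weekday() uses Monday=0
    cron_weekday = (when.weekday() + 1) % 7
    actual = [when.minute, when.hour, when.day, when.month, cron_weekday]
    return all(f == "*" or int(f) == value for f, value in zip(fields, actual))


print(cron_matches("0 0 * * *", datetime(2024, 5, 1, 0, 0)))   # True
print(cron_matches("0 0 * * *", datetime(2024, 5, 1, 12, 0)))  # False
```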

Automated Log Deletion Using Python Script

For more complex log management, a Python script can be used. Python provides more flexibility and can be easily extended for additional features.

Step 1: Create a Python Script

Create a Python script named delete_old_logs.py:

python

import os
from datetime import datetime, timedelta

# Set the log directory
LOG_DIR = "/your/dolphinscheduler/log/path"

# Set the number of days to retain logs
RETENTION_DAYS = 30

# Calculate the cutoff time
cutoff_time = datetime.now() - timedelta(days=RETENTION_DAYS)

# Iterate through the log directory and delete old log files
for root, dirs, files in os.walk(LOG_DIR):
    for file in files:
        file_path = os.path.join(root, file)
        file_modified_time = datetime.fromtimestamp(os.path.getmtime(file_path))
        if file_modified_time < cutoff_time:
            os.remove(file_path)
            print(f"Deleted: {file_path}")

print(f"Logs older than {RETENTION_DAYS} days have been deleted.")
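
As one example of extending the script, the variant below adds a dry-run mode and a filename pattern, so you can preview what would be removed and restrict deletion to matching files. The `delete_old_logs` function and its `pattern` and `dry_run` parameters are hypothetical names, and the default *.log pattern assumes your log files use that extension.

```python
import fnmatch
import os
from datetime import datetime, timedelta
from typing import List


def delete_old_logs(log_dir: str, retention_days: int,
                    pattern: str = "*.log", dry_run: bool = True) -> List[str]:
    """Return files older than retention_days matching pattern; delete them unless dry_run."""
    cutoff = datetime.now() - timedelta(days=retention_days)
    matched: List[str] = []
    for root, _dirs, files in os.walk(log_dir):
        for name in fnmatch.filter(files, pattern):
            path = os.path.join(root, name)
            if datetime.fromtimestamp(os.path.getmtime(path)) < cutoff:
                matched.append(path)
                if not dry_run:
                    os.remove(path)
    return matched


if __name__ == "__main__":
    # Preview first; rerun with dry_run=False once the list looks right
    for path in delete_old_logs("/your/dolphinscheduler/log/path", 30, dry_run=True):
        print(f"Would delete: {path}")
```

Defaulting to dry_run=True means a misconfigured path or retention value only prints a list instead of deleting files.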

Step 2: Schedule the Python Script

To run this script regularly, use a cron job or any task scheduler. For cron:

sh

crontab -e

Add the following line to run the script daily at midnight:

sh

0 0 * * * /path/to/python /path/to/delete_old_logs.py

Using DolphinScheduler’s Built-in Features

DolphinScheduler also provides built-in features for log management, which can be configured in the common.properties file. These properties allow you to set up log retention policies directly within DolphinScheduler.

Configure Log Retention

Edit the common.properties file to include log retention settings:

properties

# Log retention period in days
dolphinscheduler.log.retention.days=30

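With this configuration, DolphinScheduler will automatically handle log deletion based on the specified retention period. Configuration keys differ between DolphinScheduler releases, so confirm that your version supports this property before relying on it.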

Monitoring Log Deletion

Regularly deleting logs is only effective if it is monitored and verified. Monitoring can be done through:

  1. System Logs: Check system logs to ensure the log deletion scripts are running as expected.
  2. Disk Usage Reports: Use tools like df and du to monitor disk usage and verify that log deletion is freeing up space.
  3. Alerts and Notifications: Set up alerts for disk space usage to get notified before running out of space.
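
For the disk-usage and alerting points above, a small check can run from cron alongside the cleanup job. This sketch uses the standard library's shutil.disk_usage for filesystem usage (similar to df) and walks a directory to total file sizes (roughly what du reports, although du counts allocated blocks rather than apparent size). The 80% threshold and the `check_disk` and `dir_size_bytes` names are assumptions; connect the warning to whatever notification channel you use.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of regular files under path (du counts blocks instead)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def usage_percent(path: str) -> float:
    """Percentage of the filesystem holding path that is in use (roughly df's Use%)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_disk(path: str, threshold: float = 80.0) -> bool:
    """Print a warning and return True when usage meets or exceeds threshold."""
    percent = usage_percent(path)
    if percent >= threshold:
        print(f"WARNING: filesystem holding {path} is {percent:.1f}% full")
        return True
    return False


if __name__ == "__main__":
    log_dir = "/your/dolphinscheduler/log/path"
    if os.path.isdir(log_dir):
        print(f"Log directory size: {dir_size_bytes(log_dir)} bytes")
        check_disk(log_dir)
```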

Troubleshooting Common Issues

  1. Permission Issues: Ensure the scripts have the necessary permissions to access and delete log files.
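  2. Script Failures: Regularly check the cron logs (/var/log/cron on RHEL-based systems; on Debian and Ubuntu, cron messages go to /var/log/syslog or the systemd journal) to ensure the scripts are running without errors.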
  3. Retention Policy Misconfiguration: Double-check the retention settings to avoid unintentional data loss.
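
On Unix-like systems, removing a file requires write and execute permission on the directory that contains it, not on the file itself, so permission problems usually show up on the log directories. A pre-flight check covering that and an obviously wrong retention value could look like the sketch below, assuming it runs as the same user that cron will use. `check_cleanup_config` is a hypothetical helper, and it does not account for the sticky bit, ACLs, or SELinux policies, which can still block deletion.

```python
import os
from typing import List


def check_cleanup_config(log_dir: str, retention_days: int) -> List[str]:
    """Return problems that would make the cleanup fail or delete too much."""
    problems: List[str] = []
    if retention_days < 1:
        problems.append(f"retention_days={retention_days} would delete recent logs")
    if not os.path.isdir(log_dir):
        problems.append(f"{log_dir} is not a directory")
        return problems
    for root, _dirs, _files in os.walk(log_dir):
        # Deleting entries needs write + search (execute) permission on the parent directory
        if not os.access(root, os.W_OK | os.X_OK):
            problems.append(f"cannot delete files in {root}")
    return problems


if __name__ == "__main__":
    for problem in check_cleanup_config("/your/dolphinscheduler/log/path", 30):
        print(problem)
```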

Conclusion

Efficient log management is crucial for maintaining the performance and stability of Apache DolphinScheduler. Regularly deleting log instances helps manage disk space, optimize system performance, and comply with data retention policies. By using shell scripts, Python scripts, or DolphinScheduler’s built-in features, you can automate this process and ensure your system remains efficient and responsive.

Implementing these log deletion strategies requires careful planning and monitoring, but the benefits far outweigh the effort. Proper log management not only helps in maintaining operational efficiency but also enhances the overall reliability of your workflow scheduling system.

With the methods outlined in this article, you can ensure that your DolphinScheduler setup remains clean, efficient, and ready to handle the demands of your data processing workflows. Regular log deletion is a simple yet powerful practice that every administrator should adopt to keep their systems running smoothly.