Is your server suddenly running slower, and applications no longer responding to queries? The culprit is most often a single process that greedily consumes CPU or RAM resources without restraint. Learn how to quickly identify the digital intruder and effectively rein it in using Linux’s built-in tools.
Introduction: When Linux Loses Its Breath
The Linux operating system is renowned for its stability, performance, and excellent management of hardware resources. Nevertheless, even the most optimized system can be brought to its knees by a single unruly process. Whether you administer a home server, manage a distributed cloud infrastructure, or simply use Linux on your laptop, sooner or later you will encounter a situation where the system begins to slow down dramatically. Fans will spin at maximum speed, interface response times will stretch to several seconds, and network services will start reporting timeout errors.
In such moments, a rapid and precise response is crucial. Before reflexively using the legendary kill -9 command, it’s worth understanding why a process behaves this way, how to accurately identify the culprit, and what subtler methods we can use to curb its appetite for CPU cycles and RAM space. In this article, we’ll walk through the full diagnostic and repair path—from basic real-time monitoring, through advanced priority control, to modern kernel mechanisms such as control groups (cgroups) and integration with systemd.
1. Most Common Causes of Excessive CPU and RAM Usage
Before diving into tools, we need to understand the nature of the problem. Excessive system load is rarely accidental. It is most often caused by one of the following:
- Software bugs: Poorly designed conditional loops can lead to a thread getting stuck in an infinite loop, consuming 100% of one or more CPU cores. Memory leaks occur when a program allocates RAM but fails to release it after operations complete. As a result, RAM usage steadily grows until resources are exhausted.
- Suboptimal service configuration: Popular network services like web servers (Apache, Nginx), code interpreters (PHP-FPM), or database management systems (MySQL, PostgreSQL) require precise tuning to match the machine’s resources. If you configure too many worker processes (e.g., the pm.max_children parameter in PHP-FPM) relative to available RAM, the server will quickly resort to swap space, drastically reducing performance.
- High-intensity computational tasks: Processes such as compiling software from source, generating backups (especially with strong compression), video transcoding (e.g., using FFmpeg), or training machine learning models naturally tend to saturate available resources. In these cases, we’re not dealing with a bug but with normal workload characteristics that we must control to avoid disrupting other services.
- High I/O wait: Sometimes the CPU appears overloaded, but the real bottleneck lies at the level of the hard drive or network. Processes spend most of their time waiting for data writes or reads (I/O wait state), blocking subsequent operations and paralyzing the system.
- Malware and cryptocurrency miners: Unauthorized software, such as hidden cryptocurrency miners, can instantly seize full computational power of a server, masking its presence under innocuous-sounding process names.
2. Diagnostic Tools: How to Track Down the Culprit?
To effectively combat the problem, we must first know what we’re dealing with. Linux offers a rich set of built-in tools that allow for rapid diagnosis of system status. If you’re preparing for professional system administration, check out 50 popular Linux interview questions, which often cover some of these tools in detail.
Interactive Classic: The top Command
The top tool is present in nearly every Linux distribution. Launch it by simply typing:
top
After startup, you’ll see a dynamically refreshed screen. The first five lines provide key information about the overall system health:
- Load average: Three numbers showing the average number of processes that were running or waiting to run (on Linux this also counts processes blocked in uninterruptible I/O) over the last 1, 5, and 15 minutes. If you have a 4-core processor and the load average is, say, 8.00, demand is roughly twice what the CPU can serve, so processes must wait in line for free CPU cycles.
- %Cpu(s): Shows the percentage breakdown of CPU time. The most important indicators are us (user: time spent on user applications), sy (system: time spent in kernel space), id (idle time), and wa (iowait: waiting for disk operations). A high wa value suggests the issue isn’t a weak processor but a slow disk.
- MiB Mem / Swap: Statistics for physical memory and swap space. Pay attention to used and buff/cache; memory counted under buff/cache is largely reclaimable, so a large value there is not by itself a sign of a shortage.
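As a rough illustration of the load-average rule of thumb, the sketch below divides a load figure by the number of cores. The helper name load_per_core is hypothetical, and the live reading assumes that /proc/loadavg and the nproc command are available on your system.

```shell
# Hypothetical helper: normalize a load average by the number of CPU cores.
# $1 = load average, $2 = number of cores
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The example from the text: load 8.00 on a 4-core machine.
load_per_core 8.00 4    # prints 2.00

# On a live Linux system, use the 1-minute figure from /proc/loadavg.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min _rest < /proc/loadavg
    load_per_core "$one_min" "$(nproc)"
fi
```

A result persistently above 1.00 suggests that more work is queued than the cores can serve; a brief spike is usually harmless.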
To make working with top easier, it’s worth memorizing these basic keyboard shortcuts that work in real time:
- P – sorts the process list by CPU usage.
- M – sorts the process list by physical memory (RAM) usage.
- k – allows quick termination of a process (you’ll be prompted for the PID and signal number, defaulting to 15, i.e., SIGTERM).
- r – changes the priority of an existing process (renice).
- q – exits the program.
Modern Alternative: htop
While top is universal, its interface can feel somewhat raw. Most administrators prefer the htop tool, which offers colorful progress bars, full mouse support, vertical and horizontal scrolling of the process list, and intuitive searching.
htop
In htop, at the bottom of the screen you’ll find a cheat sheet of function keys:
- F3 – search for a process by name.
- F4 – filter the list (shows only processes matching the entered phrase).
- F5 – tree view (great for analyzing which parent process spawned child processes, e.g., in the case of an Apache server).
- F6 – advanced sorting by any column.
- F9 – send a signal (kill) to the selected process.
Command-Line Diagnostics: ps, free, and vmstat
Sometimes you don’t want to launch an interactive interface but need a one-time, precise report. Then, the ps command becomes indispensable. To find the 10 processes consuming the most RAM, you can use the following construct:
ps aux --sort=-%mem | head -n 11
Similarly, to find processes most heavily loading the CPU:
ps aux --sort=-%cpu | head -n 11
Let’s briefly explain the flags used in the ps aux command:
- a – displays processes of all users.
- u – presents information in a user-friendly format (includes %CPU, %MEM, VSZ, RSS columns).
- x – includes processes not tied to any terminal (e.g., system daemons running in the background).
For a quick RAM usage assessment, the free -h command is useful (the -h flag ensures a readable format, e.g., in gigabytes or megabytes):
free -h
If you want to monitor system behavior over a longer period, the vmstat tool will be helpful. It displays statistics on processes, memory, paging, I/O operations, and CPU activity. For example, the vmstat 2 5 command will show 5 reports at 2-second intervals, allowing you to spot sudden load spikes.
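The figure that free reports as available comes from the MemAvailable estimate in /proc/meminfo. If you want that number in a script, a minimal sketch could look like the following; the helper name mem_available_pct is hypothetical, and it assumes a kernel that exposes MemAvailable.

```shell
# Hypothetical helper: percentage of RAM still available, read from a
# meminfo-style file (defaults to /proc/meminfo).
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Report the current value when running on Linux.
if [ -r /proc/meminfo ]; then
    echo "Available RAM: $(mem_available_pct)%"
fi
```

Passing a file name as the first argument lets you try the helper on a saved copy of /proc/meminfo instead of the live system.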
3. Methods for Limiting Process Resources
Once you’ve identified the process destabilizing your system, you have several options. Of course, the simplest solution is to shut it down, but in a production environment, this is rarely feasible. We need to ensure the application continues to run but in a controlled manner, without disrupting other system components.
Managing Priorities: nice and renice
The Linux kernel allocates CPU time among processes using its task scheduler. The default scheduler aims at fair sharing of processor time among runnable tasks. However, we can influence the scheduler’s decisions by assigning processes an appropriate "niceness" value, known as the nice value.
The nice value ranges from -20 (highest priority, the process is "least nice" and takes precedence over others) to 19 (lowest priority, the process is "very nice" and yields to others). By default, every new process starts with a value of 0.
If you want to run a heavy task (e.g., compressing a large archive) in the background so it doesn’t slow down your daily work, use the nice command:
nice -n 15 tar -czf backup.tar.gz /var/www/html/
But what if a process is already running and suddenly starts generating high load? Then the renice command comes to the rescue, allowing you to change the priority of a running process based on its PID (which you can easily find using top or pgrep):
sudo renice -n 19 -p 4321
The above command sets the lowest possible priority for the process with PID 4321. Remember that a regular user can only increase the nice value (i.e., lower the priority of their own processes). Only the administrator (root) can assign a negative value (increase priority) or modify processes belonging to other users.
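One detail worth seeing in practice: the value passed to nice -n is an adjustment added to the niceness of the calling shell, so nice -n 15 yields a nice value of 15 only when you start from the default of 0. The sketch below assumes the coreutils nice command, which prints the current niceness when run without a command.

```shell
# Print the niceness of the current shell (nice with no command).
base=$(nice)

# Run a child with the niceness raised by 5; the inner `nice` simply
# prints the niceness it was started with.
child=$(nice -n 5 nice)

echo "shell niceness: $base, child niceness: $child"
```

No root privileges are needed here, because the child only lowers its own priority.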
Important limitation: The nice/renice method does not impose a hard limit on CPU usage. If your system is idle and has no other tasks to execute, a process with a priority of 19 can still consume up to 100% of a CPU core. The scheduler will only restrict it when other higher-priority processes demand CPU time.
Active Throttling: cpulimit
If you need to impose a strict, unbreakable percentage limit on CPU usage for a process, nice tools won’t suffice. In such cases, the cpulimit tool is ideal. It’s not usually preinstalled but can be found in the official repositories of most distributions (e.g., sudo apt install cpulimit on Debian/Ubuntu systems).
The cpulimit tool works by sending the process SIGSTOP (suspend) and SIGCONT (resume) signals at very short intervals. This artificially doses the process’s execution time, keeping its average CPU usage at the desired level.
To limit a running process with PID 5678 to a maximum of 40% of a single CPU core, enter:
sudo cpulimit --pid 5678 --limit 40
You can also launch a new program with a predefined limit:
cpulimit --limit 25 -- firefox
This tool works great for single-threaded applications, but remember that on multi-core processors, limits can exceed 100%. If you have a 4-core processor, the maximum theoretical limit is 400%. Limiting a multi-threaded process to 100% means it can fully saturate one core or distribute its work across several cores, without exceeding the power of a single core.
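To make the SIGSTOP/SIGCONT idea concrete, here is a deliberately simplified sketch of a duty-cycle throttle. It is not how cpulimit is implemented in detail (the real tool measures actual CPU usage and adapts); the function names are hypothetical, and the fractional sleep values assume a sleep command that accepts them, as GNU coreutils does.

```shell
# Convert milliseconds to a seconds string usable by sleep, e.g. 40 -> 0.040.
ms_to_s() {
    awk -v ms="$1" 'BEGIN { printf "%.3f\n", ms / 1000 }'
}

# Hypothetical duty-cycle throttle: $1 = PID, $2 = CPU percent, $3 = cycles.
# Each 100 ms cycle lets the process run for $2 ms and pauses it for the rest.
throttle_pid() {
    t_pid=$1
    t_run_ms=$(( 100 * $2 / 100 ))
    t_pause_ms=$(( 100 - t_run_ms ))
    t_i=0
    while [ "$t_i" -lt "$3" ] && kill -0 "$t_pid" 2>/dev/null; do
        kill -s CONT "$t_pid" 2>/dev/null || break
        sleep "$(ms_to_s "$t_run_ms")"
        kill -s STOP "$t_pid" 2>/dev/null || break
        sleep "$(ms_to_s "$t_pause_ms")"
        t_i=$(( t_i + 1 ))
    done
    # Never leave the target suspended when the loop ends.
    kill -s CONT "$t_pid" 2>/dev/null || true
}
```

For example, throttle_pid 5678 40 50 would hold PID 5678 to roughly 40% of one core for about five seconds. The repeated stop and resume is also why this approach can hurt latency-sensitive programs.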
Advanced Control: Control Groups (cgroups v2)
Both nice and cpulimit have their drawbacks—the first is too soft, while the second operates at the user signal level and can cause anomalies in latency-sensitive applications. The most professional, low-level solution built into the Linux kernel is control groups (cgroups). Modern container technologies like Docker and LXC are built on this foundation.
In modern distributions, the cgroups v2 standard is used by default. It allows grouping processes into hierarchical structures and assigning them hard limits for CPU, RAM, I/O operations, and many other resources.
To manually create a new control group and apply limits, we need to work with the virtual filesystem mounted in the /sys/fs/cgroup directory. Let’s go through this process step by step:
- Create a subdirectory representing your new group (the kernel will automatically generate the necessary configuration files inside). If memory.max or cpu.max does not appear in it, the corresponding controller is not enabled for child groups; enabled controllers are listed in the parent’s cgroup.subtree_control file:
  sudo mkdir /sys/fs/cgroup/limitowana_grupa
- Set a RAM limit for this group (e.g., a maximum of 512 MB). The value is given in bytes (512 * 1024 * 1024 = 536870912):
  echo "536870912" | sudo tee /sys/fs/cgroup/limitowana_grupa/memory.max
- Set a CPU time limit. In cgroups v2, the CPU limit is defined using two values in the cpu.max file: the time limit (in microseconds) and the period (default 100000 microseconds, i.e., 100 ms). To limit the group to a maximum of 20% of a single core’s power, enter the value 20000 (20 ms) with the default period of 100000:
  echo "20000 100000" | sudo tee /sys/fs/cgroup/limitowana_grupa/cpu.max
- Now, simply write the PID of the process (e.g., 7890) you want to limit to the cgroup.procs file in your group:
  echo 7890 | sudo tee /sys/fs/cgroup/limitowana_grupa/cgroup.procs
From this point on, the process with PID 7890 and any child processes it starts afterwards will share the limits applied to the limitowana_grupa group. If the group’s memory usage approaches 512 MB, the kernel first tries to reclaim memory from it; if that is not enough, it triggers the defensive mechanism we’ll discuss later in the article.
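The numbers written to memory.max and cpu.max are easy to get wrong by a factor of 1024 or 10, so it can help to compute them rather than type them. The helpers below are hypothetical names for that arithmetic; the final check only reads from /sys/fs/cgroup and changes nothing.

```shell
# Hypothetical helpers for the values written to memory.max and cpu.max.

# MiB -> bytes, e.g. 512 -> 536870912
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# CPU percent of one core -> "quota period" line for cpu.max.
# $1 = percent, $2 = period in microseconds (defaults to 100000)
cpu_max_line() {
    c_period=${2:-100000}
    echo "$(( c_period * $1 / 100 )) $c_period"
}

mib_to_bytes 512      # prints 536870912
cpu_max_line 20       # prints 20000 100000
cpu_max_line 150      # prints 150000 100000 (one and a half cores)

# Read-only check: which controllers does the cgroup v2 root offer?
if [ -r /sys/fs/cgroup/cgroup.controllers ]; then
    cat /sys/fs/cgroup/cgroup.controllers
fi
```

The output of either helper can be piped to sudo tee in the same way as the literal values in the steps above.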
Production Standard: Limits in systemd
Manually working with the /sys/fs/cgroup filesystem is an excellent educational exercise, but in daily administrator work, limits are rarely configured this way. Modern Linux distributions use the systemd init system, which fully integrates with cgroups and allows defining limits directly in service configuration files (unit files).
If you have a system service (e.g., a database server, a Node.js bot, or a Python script running as a daemon), you can easily limit its resources. The best practice is to use the systemctl edit command, which creates a safe override file and prevents changes from being lost during system package updates:
sudo systemctl edit nazwa_uslugi.service
In the opened text editor, enter the following configuration:
[Service]
CPUQuota=50%
MemoryMax=1G
The above ensures the service never consumes more than 50% of CPU time (equivalent to half of one core) and does not exceed 1 GB of RAM. After you save the file and exit the editor, systemctl edit reloads the unit configuration automatically. If the new limits do not show up on the already running service, restart it. You can check the status and applied limits using the well-known command:
systemctl status nazwa_uslugi.service
Traditional Session Limits: ulimit and limits.conf
In multi-user systems where multiple users work on a single server (e.g., shell servers at universities or shared development environments), it’s crucial to prevent situations where one user paralyzes the entire machine. The traditional ulimit mechanism serves this purpose.
The ulimit command allows setting resource limits for the current shell session and processes launched from it. For example, you can limit the maximum virtual memory size to 1 GB using the command:
ulimit -v 1048576
However, to make these limits permanent and automatically apply to every user after login, you need to edit the /etc/security/limits.conf configuration file. This file allows defining hard limits (hard: impossible for the user to exceed) and soft limits (soft: the user can increase them up to the hard limit).
Here’s an example entry in the /etc/security/limits.conf file, which limits the maximum number of processes (nproc) and the maximum address space size (as) for a user named developer:
developer soft nproc 100
developer hard nproc 150
developer hard as 2097152
The value 2097152 is given in kilobytes, meaning a hard limit of 2 GB of address space for each of this user’s processes. This mechanism is implemented by the pam_limits PAM (Pluggable Authentication Modules) module and is applied when the user’s login session is opened.
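The soft/hard distinction is easy to observe from any shell. The sketch below uses the open-files limit (ulimit -n) rather than memory, purely because it is harmless to experiment with; the helper name is hypothetical, and it assumes the hard limit on your system is at least 256.

```shell
# Show the soft and hard limits on open files for the current shell.
echo "soft: $(ulimit -S -n)  hard: $(ulimit -H -n)"

# Hypothetical helper: set a soft open-files limit inside a subshell and
# print the value now in force. The parent shell's limits are untouched.
soft_nofile_in_subshell() {
    ( ulimit -S -n "$1" && ulimit -S -n )
}

soft_nofile_in_subshell 256

# The limits.conf value from the text, converted from KB to GB.
echo "$(( 2097152 / 1024 / 1024 )) GB"    # prints 2 GB
```

Because the change happens inside a subshell, it disappears as soon as the subshell exits, which mirrors how ulimit settings apply only to a session and the processes started from it.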
4. Automating Resource Control
In dynamic environments, manually configuring limits for each process can be cumbersome. Fortunately, Linux offers advanced automation methods that allow dynamic resource allocation based on execution context.
Ad-Hoc Execution with Limits: systemd-run
What if you want to run a one-off script that you know is poorly optimized but don’t want to create a dedicated systemd service for it? The systemd-run command is the perfect solution. It allows launching any command inside a temporary unit (scope) managed by systemd, with immediate cgroup limits applied.
Example usage:
sudo systemd-run --scope -p CPUQuota=30% -p MemoryMax=512M python3 ciezki_skrypt.py
With --scope, the script runs in the foreground of your terminal inside a transient scope unit, and systemd ensures it doesn’t consume more than 30% CPU and 512 MB of RAM. (Without --scope, systemd-run starts the command in the background as a transient service.) Once the script completes, the temporary control group will be automatically removed from the system.
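If you launch limited commands often, a small wrapper can keep the flags consistent. The sketch below only prints the command line so that it can be reviewed before it is run; limited_cmd is a hypothetical name, and arguments containing spaces would need extra quoting.

```shell
# Hypothetical helper: print (not run) a systemd-run command line.
# $1 = CPU percent, $2 = memory limit (e.g. 512M), rest = command to run
limited_cmd() {
    l_cpu=$1
    l_mem=$2
    shift 2
    echo "systemd-run --scope -p CPUQuota=${l_cpu}% -p MemoryMax=${l_mem} $*"
}

limited_cmd 30 512M python3 ciezki_skrypt.py
```

Once the printed line looks right, run it with sudo in front, exactly as in the example above.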
5. Potential Risks and the Ruthless OOM Killer
Imposing resource limits is a powerful tool in an administrator’s hands, but like any weapon, it carries serious risks. Overly aggressive throttling can backfire and lead to failures of critical system services.
Understanding the Out-of-Memory (OOM) Killer
The most dramatic consequence of running out of RAM in a Linux system is the intervention of the OOM Killer (Out-of-Memory Killer). When free physical memory and swap space are completely exhausted, the kernel faces a dire choice: either freeze the system entirely (kernel panic) or sacrifice some processes to save the stability of the entire machine.
The kernel scores each candidate process on a "badness" scale, derived mainly from how much memory it uses (resident memory, swap, and page tables) and then shifted by its oom_score_adj setting, discussed below. You can read the current score from /proc/[PID]/oom_score. When a critical situation arises, the OOM Killer ruthlessly kills the process with the highest score, sending it the SIGKILL signal.
Information about OOM Killer interventions is always logged in the kernel message buffer. You can check it using the dmesg command or by searching system logs:
sudo dmesg -T | grep -i -E 'oom|kill'
If you see an entry like Killed process 1234 (mysqld) total-vm:4194304kB, anon-rss:2097152kB, file-rss:0kB, shmem-rss:0kB in the logs, it means your database server fell victim to memory exhaustion.
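When the log contains many entries, pulling out just the victims can be handy. The sketch below assumes log lines shaped like the sample above; parse_oom_kills is a hypothetical helper, and the exact message format can differ between kernel versions.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "PID NAME ANON_RSS_KB" for every "Killed process" entry.
parse_oom_kills() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}

# The sample entry quoted in the text:
echo 'Killed process 1234 (mysqld) total-vm:4194304kB, anon-rss:2097152kB, file-rss:0kB, shmem-rss:0kB' \
    | parse_oom_kills    # prints 1234 mysqld 2097152
```

On a live system the same function can be fed from the kernel log, for example sudo dmesg -T | parse_oom_kills.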
How to Protect Critical Processes from Death?
Not all processes are equal. While losing a worker process of a web server is painless (it will be automatically restarted), suddenly killing the main database process or the SSH daemon (sshd) could cut you off from remote server management. Fortunately, we can manually adjust the OOM Killer’s tendency to kill specific processes.
This is done using the oom_score_adj parameter available in the /proc/[PID]/ directory. It accepts values from -1000 (complete immunity from killing—the OOM Killer will never touch this process) to 1000 (highest elimination priority).
To ensure your SSH server (assuming its PID is 1111) survives even the worst memory crisis, you can set its adjustment to the minimum value:
echo "-1000" | sudo tee /proc/1111/oom_score_adj
For services managed by systemd, a much cleaner solution is to add the OOMScoreAdjust directive directly in the unit file:
[Service]
OOMScoreAdjust=-1000
Other Risks Associated with Limits
In addition to the OOM Killer, administrators must be aware of other consequences of imposing restrictions:
- Application instability: Some programs aren’t prepared for sudden resource constraints. If a multi-threaded application expects quick responses from its secondary threads and they are drastically slowed by cpulimit or nice, internal deadlocks or timeout errors may occur within the application.
- Misdiagnosis of the problem: Limiting resources is symptomatic treatment. If an application has a memory leak, restricting it with MemoryMax will only cause the application to be killed by the OOM Killer and restarted by systemd more frequently, without solving the underlying faulty code. The real solution is to fix the code or update the software.
6. Best Practices in Process Management
To keep your system stable and free of surprises, it’s worth implementing a set of good process management practices:
- Proactive monitoring and alerts: Don’t wait until the server stops responding. Install continuous monitoring tools like Prometheus with Node Exporter, Netdata, or Zabbix. Configure alerts that notify you when average CPU load exceeds 80% for more than 15 minutes or when free RAM drops below 10%.
- Service configuration tuning (capacity planning): Always match software configuration to the physical parameters of the machine. If your server has 8 GB of RAM and the MySQL database is configured to buffer 12 GB of data in memory (the innodb_buffer_pool_size parameter), a crash is just a matter of time. Regularly review configurations and conduct load tests.
- Principle of least privilege: Never run user applications with root privileges. If a process is hijacked by an attacker or runs amok due to a bug, as root it has unrestricted access to all system resources. Running services in dedicated containers or as system users with limited privileges is an absolute foundation of security and stability.
- Regular software updates: Software developers continuously optimize their code and fix discovered memory leaks. Regularly updating the operating system and installed packages is the simplest way to avoid performance issues.
In daily administrative work, broad knowledge of operating system architecture is also extremely useful. To learn more about administration nuances and terminal commands, see another 50 popular Linux interview questions, where we detail the daily challenges of system engineers and advanced optimization techniques.
Conclusion
Managing processes in Linux is an art of balancing between providing adequate resources for critical applications and protecting the system from overload. Thanks to a rich set of tools, from simple diagnostic commands like top and htop, through traditional priority mechanisms like nice, to modern and powerful control groups (cgroups) and their integration with systemd, administrators have fine-grained control over every running process.
However, the key to success is always precise diagnosis. Before taking any corrective action, ensure you’ve correctly identified the bottleneck—whether it’s insufficient CPU computational power, exhausted RAM, or slow disk writes (I/O wait). Remember that imposing limits is a powerful tool: used wisely, it will ensure your server runs smoothly for months; used carelessly, it can paralyze services. Test the limits you impose in a safe test environment and continuously monitor the health of your systems.