Apache/PHP bug: why do child processes get stuck?

2016/12/01 02:25:16

Don't let your users destroy your server by hammering the F5 key!

Every sysadmin has seen Apache, PHP-FPM or FastCGI simply "go away", especially on shared hosting.

Why does it freeze? Why does hammering F5 kill your web server?

What can you do if you host a lot of WordPress sites that are so slow you could smoke a cigarette while a page loads?

Are your processes getting stuck? Does the mod_status page show W everywhere, or are 2300 PHP processes running in the background?

Here is the explanation, and a solution!

Background

"With a good site and a good server this problem never happens!" No, that is not true; it is just rare. As a sysadmin I inherited some legacy machines with countless mystery errors of this kind.
Load balancing did not help at all, and neither did anything else.

Is Apache the only one affected? Unfortunately not, but with Apache it is easier to catch.

The problem lies in PHP sessions. I found it by accident: I launched a long-running task via AJAX and then could not navigate the main site. At first I thought the browser was to blame, then I realised the issue was on the server.

Unfortunately I wasn't the first to run into this, but maybe we are the first to offer a reasonable solution for it.


The Session Problem

There is a system call named flock() with a single purpose: locking files. It comes from BSD (POSIX itself standardizes fcntl() locks instead) and is mostly called from C/C++. PHP has its own function with the same name, but don't confuse the two: they are different.

When a PHP script starts a session with session_start(), PHP's default file-based session handler calls flock() to put an exclusive lock on the session file and keeps it until the script finishes (or the session is closed). This is important, because session files have to be protected against concurrent writes. One fix is to ask web developers to call session_write_close() as soon as they no longer need to write to the session, but I don't think that is realistic in practice.
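To see why a second request from the same browser hangs, here is a minimal Python sketch that simulates two requests sharing one session file. Python's fcntl.flock() wraps the same flock(2) system call. The file name and the two-descriptor setup are illustrative assumptions, not PHP's own code; on Linux, two separate open() calls on the same file lock independently even inside one process, which is what lets this run without forking. The demo uses LOCK_NB so it reports "would block" instead of actually hanging the way a real second request would.

```python
import fcntl
import os
import tempfile


def try_lock(fd: int) -> bool:
    """Try to take an exclusive flock() without blocking; True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # EWOULDBLOCK: another open file description already holds the lock
        return False


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        # Illustrative session file (PHP's files handler uses sess_<id> names)
        path = os.path.join(d, "sess_demo")
        # Request A: the long AJAX task that called session_start()
        a = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # Request B: the same user navigating the main site, same session
        b = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(a, fcntl.LOCK_EX)   # A holds the session lock
            blocked = not try_lock(b)       # a blocking LOCK_EX in B would hang here
            fcntl.flock(a, fcntl.LOCK_UN)   # A releases early (the session_write_close() moment)
            after_release = try_lock(b)     # now B gets the lock immediately
            return blocked, after_release
        finally:
            os.close(a)
            os.close(b)


if __name__ == "__main__":
    print(demo())  # prints (True, True) on Linux
```

The second value is the whole argument for session_write_close(): once the long task drops the lock, the other request proceeds at once instead of waiting for the task to finish.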

Densely packed or heavily loaded servers are hit harder, because PHP's session garbage collector (GC) also locks these files. If a file is already locked, the process gets stuck. It looks like a zombie process, but it is not exactly one: it is not defunct, it is still alive, just blocked in flock(), doing nothing and never timing out.
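A blocking flock() call has no timeout parameter, so a process waiting on a lock that is never released waits forever. As an illustration only (this is not how PHP's session handler behaves), here is a Python sketch of what a bounded wait could look like: poll with LOCK_NB until a deadline. The function name lock_with_timeout and the 50 ms polling interval are hypothetical choices.

```python
import fcntl
import os
import tempfile
import time


def lock_with_timeout(fd: int, timeout: float, interval: float = 0.05) -> bool:
    """Poll for an exclusive flock() until `timeout` seconds have passed.

    A plain fcntl.flock(fd, fcntl.LOCK_EX) would wait forever; this gives up instead.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sess_demo")
        holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        waiter = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.flock(holder, fcntl.LOCK_EX)  # a "stuck" request never releases the lock
        start = time.monotonic()
        ok = lock_with_timeout(waiter, timeout=0.3)
        # prints False and an elapsed time of roughly 0.3 s
        print(ok, f"{time.monotonic() - start:.2f}s")
        os.close(holder)
        os.close(waiter)
```

Polling trades a little latency for the guarantee that a waiter eventually gives up and frees its worker slot.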

WWWW__WW__WWW_WWWW___W______WW_____W___W___WWWW___W_____W___K___
____RRRR____W____R_____W__KKK__KKK_____W_____________W__________
W___WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW___W____R______
W_W_____R________WWWWWWWWWWWWWWWWWWWWWWWW_________W______W__WWW_
W__WWWW____WWWWWWWWWWWWWWWWWWWWW_________W__W_________R_WWWWWWWW
W_WWWWWWWWWWWWW__WWWW___W______WWWWWWWWWWWWWWWWWWWW__W______W___
WWWWWWWW___WW_______W__WWWWW___WW__WWWWWWWWWWWWW___WWW__W_WW____
___.............................................................
................................................................
................................................................
................................................................
................................................................

 

Srv  PID   Acc      M  CPU   SS     Req  Conn  Child  Slot  Client
0-0  4339  0/97/97  _  0.03  22768  59   0.0   1.49   1.49  xxx.xxx.xxx.xxx
1-0  4340  0/82/82  _  0.03  13879  81   0.0   1.53   1.53  xxx.xxx.xxx.xxx

 

When you see a lot of W ("Sending Reply") workers on the mod_status page and their SS value (seconds since the beginning of the most recent request) is very high, you are in trouble!
Stuck workers keep accumulating, and once they fill up every slot on your server, your website simply hangs.
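Counting W workers by eye does not scale across many servers. A small Python sketch can summarize a scoreboard string such as the one shown above, or the "Scoreboard:" line that mod_status prints at server-status?auto. The state letters follow Apache's documented scoreboard key; the function names and any alert threshold you pick on top of busy_ratio() are hypothetical.

```python
from collections import Counter

# Apache mod_status scoreboard key
STATES = {
    "_": "waiting", "S": "starting", "R": "reading", "W": "sending reply",
    "K": "keepalive", "D": "dns lookup", "C": "closing", "L": "logging",
    "G": "graceful finish", "I": "idle cleanup", ".": "open slot",
}


def summarize(scoreboard: str) -> dict[str, int]:
    """Count workers per state; newlines and other characters are ignored."""
    counts = Counter(ch for ch in scoreboard if ch in STATES)
    return {STATES[ch]: n for ch, n in counts.items()}


def busy_ratio(scoreboard: str) -> float:
    """Share of existing worker slots (everything except '.') stuck in W."""
    slots = [ch for ch in scoreboard if ch in STATES and ch != "."]
    return slots.count("W") / len(slots) if slots else 0.0
```

Because characters outside the key are skipped, the wrapped multi-line scoreboard can be pasted in as-is. A ratio close to 1.0 that stays there across several checks is the "W everywhere" picture described in this post.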

Unfortunately, the same thing happens with PHP-FPM and FastCGI.

The Killer F5 Button

Your remote control doesn't work, so you press the buttons harder?
Your phone is too quiet, so you shout into it?
Sysadmins, pay attention: when your web server is down or slow, visitors will sit on the F5 key.


As long as your web server serves pages in under 200 ms, you are fine; but when a page takes 2-4 seconds to serve, pressing F5 for about 2 minutes can be enough to kill the web server.
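Why 200 ms versus 3 seconds makes such a difference can be estimated with Little's law: the average number of requests in flight equals the arrival rate times the time each request spends in the server. The Python sketch below assumes one visitor holding F5 produces about 5 requests per second (auto-repeat rates vary) and that every aborted refresh keeps its worker busy until the script finishes, which is typical for a PHP script that is blocked and not sending output. The worker limit of 150 is a hypothetical value, not an Apache default.

```python
import math


def busy_workers(rate_per_s: float, seconds_per_request: float) -> float:
    """Little's law: L = lambda * W, the average number of busy workers."""
    return rate_per_s * seconds_per_request


def visitors_to_saturate(max_workers: int, rate_per_s: float,
                         seconds_per_request: float) -> int:
    """How many F5-hammering visitors it takes to occupy every worker."""
    per_visitor = busy_workers(rate_per_s, seconds_per_request)
    return math.ceil(max_workers / per_visitor)


if __name__ == "__main__":
    print(busy_workers(5, 0.2))               # about 1 worker per visitor at 200 ms
    print(busy_workers(5, 3))                 # about 15 workers per visitor at 3 s
    print(visitors_to_saturate(150, 5, 3))    # 10 visitors fill 150 workers
```

Session locking makes this worse than the estimate: refreshes from the same session queue behind each other on the lock, so their time in the server keeps growing.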

Attackers, whether script kiddies or DDoS masters, always look for your slowest page, because there they can take your server down with less effort and more effectively.

The Solution

Our solution is an Apache module we developed for the reasons above: mod_fence. It is 2-in-1: it looks for stuck processes and closes them if necessary, and it protects the web server against F5 killers and DoS/DDoS attacks.
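To make the "looks for stuck processes" part concrete, here is a Python sketch of the selection logic only, not mod_fence's actual code: parse worker rows in the column order of the mod_status table above, pick busy workers whose SS exceeds a limit, and optionally signal them. The set of "busy" states, the threshold, and the names Worker, stuck_workers and reap are hypothetical. Killing defaults to a dry run, because whether a child exits cleanly on SIGTERM depends on the MPM and setup.

```python
import os
import signal
from dataclasses import dataclass


@dataclass
class Worker:
    pid: int
    mode: str  # scoreboard state letter, e.g. "W"
    ss: int    # seconds since the beginning of the most recent request


BUSY = {"R", "W"}  # hypothetical choice of states worth watching


def parse_row(line: str) -> Worker:
    # Columns as in the table above: Srv PID Acc M CPU SS Req ...
    f = line.split()
    return Worker(pid=int(f[1]), mode=f[3], ss=int(f[5]))


def stuck_workers(workers: list[Worker], max_seconds: int) -> list[int]:
    """PIDs of busy workers whose current request has run too long."""
    return [w.pid for w in workers if w.mode in BUSY and w.ss > max_seconds]


def reap(pids: list[int], dry_run: bool = True) -> list[int]:
    """Send SIGTERM to each PID unless dry_run; returns the PIDs selected."""
    for pid in pids:
        if not dry_run:
            os.kill(pid, signal.SIGTERM)
    return list(pids)
```

Filtering on the state letter matters: idle ("_") workers also show large SS values, as in the table above, but they are not stuck.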

It has one big advantage: it counts active (busy) processes per IP address instead of requests per second, which means it triggers when the number of busy processes for a given IP goes over the limit.
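The difference between counting busy processes and counting requests per second can be shown with a small Python sketch of a per-IP concurrency limit. This is an illustration of the idea, not mod_fence's implementation: the class name, the limit, and rejecting with something like HTTP 503 are assumptions. A request enters when it starts and leaves when it finishes, so an IP is only blocked while it holds too many slow requests at once.

```python
import threading
from collections import defaultdict


class PerIpConcurrencyLimit:
    """Counts in-flight requests per client IP, not requests per second."""

    def __init__(self, max_busy_per_ip: int):
        self.max_busy = max_busy_per_ip
        self._busy: defaultdict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def try_enter(self, ip: str) -> bool:
        """Call when a request starts; False means reject it (e.g. HTTP 503)."""
        with self._lock:
            if self._busy[ip] >= self.max_busy:
                return False
            self._busy[ip] += 1
            return True

    def leave(self, ip: str) -> None:
        """Call when the request finishes, whether it succeeded or failed."""
        with self._lock:
            self._busy[ip] -= 1
            if self._busy[ip] <= 0:
                del self._busy[ip]  # keep the table small
```

With a 200 ms page, a client rarely has more than one request in flight, so even a busy visitor stays under the limit; with a 3-second page, an F5 hammerer hits the limit within a second or two. One trade-off: many users behind a single NAT address share one counter.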

 

We have a test page where you can try your luck against a sleep(30): https://web-flood-test.npulse.net/
Our module is also available in our repository: https://devel.npulse.net/npulse-public/mod_fence