Channel: linuxadmin: Expanding Linux SysAdmin knowledge

CentOS 7 freezing after running for a week - help!


System: Supermicro server with 2 Xeon E5 v2 Core i7 processors

Memory: 128GB

LSI RAID: 928CV-8e with 48 bay Supermicro enclosure

Storage: 62TB

Apps: MySQL and JBoss 4.2.3.GA

This server runs solely as a JBoss server. It hosts a web application (DCM4CHEE Archive), a Java application that accepts DICOM images 24/7.

Java settings:

JAVA_OPTS="-Xms25g -Xmx25g -XX:PermSize=256m -XX:MaxPermSize=256m -XX:NewSize=12g -XX:MaxNewSize=12g -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -Xloggc:/opt/dcm4chee-2.18.0-mysql/A4-gc.log-`date +%Y-%m-%d-%H-%M` -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=8M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCCause -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"
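As a rough reading of the sizing flags above, the sketch below works out how the 25 GB heap would be divided. It assumes the usual HotSpot meaning of -XX:SurvivorRatio (eden size divided by the size of one survivor space); the variable names are illustrative only, and the JVM may round the real sizes to its own alignment.

```shell
# Flag values copied from JAVA_OPTS; results are approximate because the
# JVM may align generation sizes differently.
heap_mb=$((25 * 1024))        # -Xms25g / -Xmx25g
perm_mb=256                   # -XX:PermSize / -XX:MaxPermSize=256m
young_mb=$((12 * 1024))       # -XX:NewSize / -XX:MaxNewSize=12g
survivor_ratio=1              # -XX:SurvivorRatio=1 (eden / one survivor space)

# The young generation is eden plus two survivor spaces.
survivor_mb=$(( young_mb / (survivor_ratio + 2) ))
eden_mb=$(( survivor_mb * survivor_ratio ))
old_mb=$(( heap_mb - young_mb ))

echo "eden=${eden_mb}MB survivor=${survivor_mb}MB x2 old=${old_mb}MB"
echo "heap+perm=$(( heap_mb + perm_mb ))MB"
```

Under these assumptions the heap plus PermGen comes to roughly 25 GB of the server's 128 GB. That figure leaves out native JVM memory such as thread stacks and direct buffers.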

I have JBoss set up as a service, and after a week or so the application slows to a crawl and I can no longer send DICOM studies to it over port 3140. I keep an SSH session open for instances like this, but when it happens I can't get any response from the terminal. Running 'free -m', for example, may never come back, though sometimes it will. I then have to be physically in front of the server and press Ctrl-Alt-F2 to get to a terminal, where I can usually kill the PID, which I can't do through the SSH session.

This time 'free -m' came back with:

              total        used        free      shared  buff/cache   available
Mem:         128741      127563         632          81         546         775
Swap:          4095        4095           0
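To make figures like these easier to track over time, here is a minimal sketch that turns 'free -m' output into percentages. The function name mem_summary is hypothetical, and it assumes the "Mem:" and "Swap:" row layout shown above.

```shell
# Summarize `free -m` output read from stdin as integer percentages.
# Column 2 is "total" and column 3 is "used" on both the Mem: and Swap: rows.
mem_summary() {
    awk '
        $1 == "Mem:"  && $2 > 0 { printf "mem_used_pct=%d\n",  $3 * 100 / $2 }
        $1 == "Swap:" && $2 > 0 { printf "swap_used_pct=%d\n", $3 * 100 / $2 }
    '
}

# Usage (left as a comment so the sketch has no side effects):
#   free -m | mem_summary
```

Fed the numbers from this post, it reports RAM about 99% used and swap 100% used. The roughly 25 GB of configured Java heap alone does not account for that, so it may be worth checking what else is holding memory.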

Swappiness is set to 10.

At this point I usually have to run 'ps aux | grep java', find the Java PID, and run 'kill -9 <PID>'. After this, it's all hunky-dory, and I can wait for my cron job, which checks whether java is running, to stop/start my service again.

Here is about three days and 16 hours of my Java log opened with jClarity (I'll include this soon).
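The manual "find the PID" step could be scripted roughly as follows. This is only a sketch: java_pids is a hypothetical helper, and it assumes the standard 'ps aux' column layout (PID in column 2, command in column 11) and an executable whose name is exactly "java".

```shell
# Read `ps aux` output on stdin and print the PID of every process whose
# executable is "java". Matching the command column means the
# "grep java" process itself is never selected.
java_pids() {
    awk 'NR > 1 && $11 ~ /(^|\/)java$/ { print $2 }'
}

# Usage (left as comments so the sketch has no side effects):
#   ps aux | java_pids | xargs -r kill       # ask politely with SIGTERM first
#   ps aux | java_pids | xargs -r kill -9    # last resort if it does not exit
```

Trying a plain 'kill' before 'kill -9' gives the JVM a chance to shut down cleanly, although a machine that is deep in swap may not respond to either one quickly.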

Where should I start looking for the cause? Do I need more memory? I can't use a newer version of JBoss because 4.2.3 is the latest version this software currently supports, even though JBoss 7 is in development.

At first I thought I needed to do some more Java tuning, but I think I have it fairly well tuned, though I'm no expert. Pause times are low overall, with no full GC collections.

As you can imagine the server locks up at the most inopportune time, and then our ER dept starts calling me.

(I know this formatting looks very bad and it's hard to read. I'm sorry.)

submitted by docjay
