Setting up an ELK Logging Server
-
Update: Since I started this thread, DigitalOcean has released a new, fully up-to-date build of the ELK image, and you need it in order for things to work. If you are experiencing the issues that I listed above, stop and start over with the latest build. Things "just work" again. I already have CentOS running on CloudatCost sending logs over to ELK on DigitalOcean.
-
If you have a central jump server like we do, it is super easy to push out keys. Once you have the key in place on the jump server, you can do this to update it on client machines (very easy to script):
scp /etc/pki/tls/certs/logstash-forwarder.crt root@dny-lnx-pbx1:/etc/pki/tls/certs/
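Since this is so easy to script, here is a rough sketch of what that could look like from the jump server. The function name push_cert and the DRY_RUN switch are my own inventions for illustration, not part of any tool. The certificate path and the root login are the ones from the scp line above.

```shell
#!/bin/sh
# Certificate path taken from the scp command in this post.
CERT=/etc/pki/tls/certs/logstash-forwarder.crt

# push_cert HOST...
# Copy the certificate to each host given as an argument.
# If DRY_RUN is set to a non-empty value, print the scp commands
# instead of running them (hypothetical safety switch).
push_cert() {
    for host in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "scp $CERT root@$host:/etc/pki/tls/certs/"
        else
            # Keep going if one host fails, but report it on stderr.
            scp "$CERT" "root@$host:/etc/pki/tls/certs/" \
                || echo "failed: $host" >&2
        fi
    done
}
```

Something like "DRY_RUN=1 push_cert dny-lnx-pbx1" shows what would run without touching the client. Drop the DRY_RUN to do the actual copy.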
-
Just got a third server feeding into the ELK system. This is working perfectly after the latest update.
-
Here is my working /etc/logstash-forwarder configuration file (x.x.x.x = my IP address, of course)
{
  "network": {
    "servers": [ "x.x.x.x:5000" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/messages", "/var/log/secure" ],
      "fields": { "type": "syslog" }
    }
  ]
}
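If you are rolling this out to several clients, a small helper that prints the same configuration for a given server address might save some copy and paste. This is only a sketch: forwarder_config is a made-up name, and the settings are exactly the ones from my file above with nothing varying except the server address.

```shell
#!/bin/sh
# forwarder_config HOST:PORT
# Print a logstash-forwarder configuration on stdout.
# Same settings as the file in this post; only "servers" changes.
forwarder_config() {
    server=$1
    cat <<EOF
{
  "network": {
    "servers": [ "$server" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/messages", "/var/log/secure" ],
      "fields": { "type": "syslog" }
    }
  ]
}
EOF
}
```

Redirect the output to wherever your configuration file lives, for example "forwarder_config x.x.x.x:5000 > /etc/logstash-forwarder".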
-
Next step is to see if the Elasticsearch YUM repos work for this, because that will be far better than the one-off RPM install that DO has us doing in their docs. So let's see.
Here are the docs from Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
-
Here is what some heavy log ingest looks like on the CPU...
-
This is working great now. Several servers are feeding in and the reports look fantastic.
-
Here is a view of the log reading portion of the interface.
-
Here is digging into the details of a single log entry:
-
Here is the SAR report for the server. Remember that we are running with half the cores and half the memory that is recommended, mostly as an experiment to see how much is really needed for things to be responsive. So far, ingesting from five servers, it is working just fine. We will be adding more servers and watching performance, and will grow the server if we need to. We are trying to learn from this so that we will have better capacity information, but for a smaller company it looks like a very small server will work just fine. No question that the server is busy, but now that it is up and running and no longer handling the initial setup, it's nowhere near being fully loaded.
02:25:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
02:35:01 PM       all     12.91     19.61      4.53      0.37      0.00     62.59
02:45:01 PM       all      2.68      6.86      2.34      0.20      0.00     87.91
02:55:01 PM       all      2.73      6.42      2.25      0.21      0.00     88.40
03:05:01 PM       all      2.26      9.77      2.07      0.19      0.00     85.71
03:15:01 PM       all      3.56      6.49      2.57      0.30      0.00     87.07
03:25:01 PM       all      3.52     12.39      2.90      0.26      0.00     80.93
03:35:01 PM       all      2.97      6.45      2.37      0.27      0.00     87.95
03:45:01 PM       all      2.54     11.15      2.17      0.17      0.00     83.97
03:55:01 PM       all      1.44      5.42      1.69      0.10      0.00     91.35
04:05:02 PM       all      0.98      4.86      1.52      0.06      0.00     92.58
04:15:01 PM       all      1.54      5.07      1.75      0.09      0.00     91.54
04:25:01 PM       all      1.52     10.37      1.91      0.11      0.00     86.10
04:35:01 PM       all      3.74      6.99      2.65      0.23      0.00     86.38
04:45:01 PM       all      3.11     10.70      2.42      0.24      0.00     83.53
04:55:01 PM       all      1.02      5.07      1.59      0.05      0.00     92.26
05:05:01 PM       all      1.76      5.64      1.89      0.15      0.00     90.57
05:15:01 PM       all      0.93      9.27      1.64      0.05      0.00     88.11
05:25:01 PM       all      1.71      5.45      1.86      0.13      0.00     90.85
05:35:01 PM       all      2.58      5.40      2.24      0.14      0.00     89.64
05:45:01 PM       all      4.18     11.75      2.92      0.25      0.00     80.90
05:55:02 PM       all      3.16      5.85      2.13      0.26      0.00     88.60
06:05:01 PM       all      3.54      6.36      2.32      0.20      0.00     87.58
06:15:01 PM       all      3.14     10.63      2.14      0.16      0.00     83.92
06:25:01 PM       all      4.87     11.22      3.27      0.24      0.00     80.40
Average:          all      9.22     10.60      3.03      0.41      0.00     76.74
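If you want a quick number out of a report like this for capacity tracking, something along these lines should do it. Treat it as a sketch: avg_idle is just a name I picked, and it assumes the 12-hour time layout shown here, where the CPU column is the third field. On a system that prints 24-hour times the field position would be different.

```shell
#!/bin/sh
# avg_idle
# Read `sar -u` style output on stdin and print the mean of the
# %idle column (the last field) across the per-interval "all" rows.
# Assumes the 12-hour layout from this post: time, AM/PM, CPU, ...
# The header row (third field "CPU") and the "Average:" row
# (third field is a number) do not match and are skipped.
avg_idle() {
    awk '$3 == "all" { sum += $NF; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n }'
}
```

Run it as "sar -u | avg_idle", or paste a window of the report into it if you only care about the period after the initial setup load.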