Creating an Error Monitor

Last year I created a utility to help monitor our deploy quality, as well as spot new errors appearing in production.

We parse our Tomcat error logs and store them in our database. Each log entry is parsed into fields, including:

  • Runtime Class (that the error generated came from)
  • The full stacktrace
The final result: An Error Monitor

Technology Used

Logic

In Grails I created a controller to hash out my ideas for collecting and organizing error log data. Initially there were two main pieces of logic:

  • Error Counter
  • Graph

Graphs could show either an aggregate count of all errors in a time range or the counts for one specific error.

Controller: Error Counter

    import groovy.sql.Sql
    import groovy.time.TimeCategory

    def logMap = [:]

    // Enable TimeCategory syntax such as 1.day on Integer and Date
    Integer.metaClass.mixin TimeCategory
    Date.metaClass.mixin TimeCategory
    TimeZone.setDefault(TimeZone.getTimeZone('UTC'))

    def now = new Date()
    def yesterday = now - 1.day

    def sql = Sql.newInstance(... you create your connect string here ...)

    // Count how many times each error name appears in the last 24 hours
    sql.eachRow("SELECT error_name, stacktrace_detail FROM my_logs where date_created > ${yesterday} and error_level = 'error' ORDER BY date_created ") { row ->
        def errorName = row[0]
        if (logMap.size() == 0 || logMap.get(errorName) == null) {
            logMap.put(row[0], 1)
        } else {
            def count = logMap.get(errorName)
            logMap.put(row[0], count + 1)
        }
    }
    sql.close()

    // Most frequent errors first
    logMap = logMap.sort { -it.value }
Code language: Groovy

I am using Groovy's Sql class to connect to the data source itself and pull in the selected data.

Time ranges are set using Groovy's TimeCategory class. This allows simple syntax like: now - 1.day.
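TimeCategory can also be applied with a scoped use block instead of being mixed into Integer and Date globally. The sketch below is only an illustration, not the controller code: it pins "now" to an arbitrary fixed instant (an assumption, so the arithmetic is repeatable) and shows what now - 1.day and now - 1.hour evaluate to.

```groovy
import groovy.time.TimeCategory

// Work in UTC, as the controller does, so day arithmetic is not shifted by DST.
TimeZone.setDefault(TimeZone.getTimeZone('UTC'))

// Assumption: an arbitrary fixed instant stands in for "new Date()".
def now = new Date(1700000000000L)

def yesterday
def oneHourAgo
use(TimeCategory) {
    // Inside this block, Integer gains properties such as .day and .hour,
    // and Date supports + and - with the resulting durations.
    yesterday = now - 1.day
    oneHourAgo = now - 1.hour
}

println "now:        ${now.time}"
println "yesterday:  ${yesterday.time}"
println "oneHourAgo: ${oneHourAgo.time}"
```

The use block keeps the extra syntax local to one piece of code, which may be preferable when other code in the same application should not see the modified metaclasses.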

What's returned is a map from each error name to the number of times it occurred in the rows matched by the SQL query, sorted with the most frequent errors first.
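To make the shape of that map concrete, here is a minimal sketch that runs without a database. The error names are made-up sample data, and it uses Groovy's countBy as a shorter alternative to the explicit if/else counting in the controller; it is not the code the monitor itself runs.

```groovy
// Assumption: sample error names standing in for the error_name column of my_logs.
def errorNames = [
    'NullPointerException',
    'SocketTimeoutException',
    'NullPointerException',
    'IllegalStateException',
    'NullPointerException',
    'SocketTimeoutException'
]

// Group identical names and count them, then sort by count, highest first.
def logMap = errorNames.countBy { it }.sort { -it.value }

logMap.each { name, count -> println "${name}: ${count}" }
```

The result is an ordered map, so iterating it in the view lists the noisiest errors at the top.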

Controller Action: Graph

    def dataArray = []
    def errorsByDay = [:]
    def myRange = 24..0 // Looping in desc order

    myRange.each { n ->
        // Dynamically create date range to iterate through
        def moment = now - (n).hours
        def nextMoment = now - (n + 1).hours

        // Count the errors that fall inside this one-hour slot
        sql.eachRow("SELECT count(*) FROM my_logs where date_created < ${moment} and date_created > ${nextMoment} and log_level='error'") { row ->
            def count = row[0]
            errorsByDay.put(moment, count)
        }

        dataArray.add(["'${nextMoment.format('MM/dd HH:mm')}'", errorsByDay[moment]])
    }

    // Close the connection only after every slot has been queried
    sql.close()
Code language: Groovy

Similar concept – this select is just getting a count of all records with a log_level (field) with the value “error.”

These time stamps create an hour time slot, and I’m moving that hour time slot along the 24 hour range.

This process loops until the range is exhausted, at which point the map holds a full set of time slots and error counts.
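The sliding one-hour slot can be sketched without a database as follows. The timestamps are hypothetical sample data standing in for rows of my_logs, "now" is pinned to an arbitrary fixed instant, and the counting is done in memory rather than with SQL, so treat this as an illustration of the windowing only.

```groovy
import groovy.time.TimeCategory

TimeZone.setDefault(TimeZone.getTimeZone('UTC'))

// Assumption: an arbitrary fixed instant stands in for "new Date()".
def now = new Date(1700000000000L)

// Assumption: sample error timestamps instead of rows from my_logs.
def errorTimes = [
    new Date(now.time - 30 * 60 * 1000L),            // 30 minutes ago
    new Date(now.time - 45 * 60 * 1000L),            // 45 minutes ago
    new Date(now.time - 3 * 60 * 60 * 1000L - 1000L) // just over 3 hours ago
]

def slots = []
use(TimeCategory) {
    (24..0).each { n ->
        def slotEnd = now - n.hours         // "moment" in the controller
        def slotStart = now - (n + 1).hours // "nextMoment" in the controller
        // Same strict comparisons as the SQL: start < date_created < end
        def count = errorTimes.count { it > slotStart && it < slotEnd }
        slots << [slotStart, slotEnd, count]
    }
}

slots.each { println "${it[0]} .. ${it[1]}: ${it[2]}" }
```

Because the range 24..0 includes both ends, the loop produces 25 slots, the oldest first and the most recent hour last.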

This dataArray is required for the Google Visualization library. This library expects me to pass in data like so: [["6/7 10:22", 8], ["6/7 11:22", 22], ...]
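One possible way to produce that array-of-arrays shape without hand-quoting each label is to serialize the list with groovy.json.JsonOutput. This is a sketch with made-up sample values, not what the controller above does, and the variable names are my own.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Assumption: sample [label, count] pairs in the shape the chart expects.
def rows = [
    ['06/07 10:22', 8],
    ['06/07 11:22', 22]
]

// Serialize to a JSON array of arrays that JavaScript can read as a literal.
def json = JsonOutput.toJson(rows)
println json

// Parse it back to confirm nothing was lost in the conversion.
def roundTrip = new JsonSlurper().parseText(json)
```

Letting a JSON serializer handle the quoting avoids having to embed quote characters in the label strings by hand.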

View: Google Visualizations

Below is the JavaScript that pulls in the Google Visualization libraries and sets up the data to read.

    <script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
    <script type="text/javascript">
      google.charts.load('current', {'packages':['line']});
      google.charts.setOnLoadCallback(drawChart);

      function drawChart() {
        var data = new google.visualization.DataTable();
        data.addColumn('string', 'Hour (UTC)');
        data.addColumn('number', 'Error Count');
        <g:applyCodec encodeAs="none">
          data.addRows(${(timeSeries)});
        </g:applyCodec>

        var options = {
          chart: {
            title: 'Total Errors by Hour (UTC)',
            subtitle: '24HR Period',
            height: 300,
            backgroundColor: '#37598a',
            lineWidth: 10
          },
          chartArea: {
            backgroundColor: {
              fill: '#37598a',
              opacity: 80,
              lineWidth: 10
            }
          },
          backgroundColor: '#37598a',
          vAxis: {direction: -1, textStyle: {color: '#FFF'}},
          hAxis: {format: '##:##', textStyle: {color: '#FFF'}},
          colors: ['#609BF0'],
          series: {
            0: { lineWidth: 5 }
          },
          titleTextStyle: {color: '#FFF'},
          legend: {textStyle: {color: '#d2d8ff'}}
        };

        var chart = new google.charts.Line(document.getElementById('curve_chart'));
        chart.draw(data, google.charts.Line.convertOptions(options));
      }
    </script>
Code language: HTML, XML (xml)

I initially hit a hurdle where sending data to the view via a GString brought the data in encoded. The browser’s inspector showed the data just like Google expected, but a co-worker/friend of mine suggested I look at the raw source and I saw that the code appeared with encoding:

[[&#39;05/18&#39;, 8]]

The above won't work with that JavaScript library, so I dug into this a bit and found that GSP has a tag called applyCodec, which takes an encodeAs attribute. Setting this attribute to none resolves the encoding issue.

<g:applyCodec encodeAs="none">

Notice on the vAxis params, I specified a direction of -1. Without that the data renders in reverse order. Further along the page I set up a div for the graph like so:

    <div class="well-lg" align="center">
      <div id="curve_chart" class="well-lg"></div>
    </div>
Code language: HTML, XML (xml)

The data came in, loaded into the Google JavaScript Visualization library and rendered out the appropriate graph.

The second half of the page needs to have tabular data (as seen above in the completed project.)

The first column of this example data is the error name, which links to a detail page showing the stack trace. Next to the error name, I wanted the count to appear in red (how many times this error appears in the logs during the specified time frame).

The second column offers two buttons for two timeframes on this specific error. So unlike the aggregate graph above, clicking one of these buttons will load a graph ONLY for that specific error over a timeframe.

In the view I built out a table like so:

    <div class="pre-scrollable">
      <table >
        <th > Error counts by Error Name (24HR) </th>
        <th > Time Range Data </th>
        <g:each in="${results}" var="result">
          <tr >
            <td>
              <g:link type="link" action="errorDetails" params="[v:result.key]" class="btn btn-link tool">
                <font color="white"><b> ${result.key}</b></font>
                <span class="badge badge-light float-xs-right">${result.value}</span>
              </g:link>
            </td>
            <td>
              <g:link action="timeSeriesIntraDay" params="[v:result.key]" class="btn btn-info btn-xs" >24HR</g:link>
              <g:link action="timeSeries" params="[v:result.key]" class="btn btn-info btn-xs">15 Day</g:link>
            </td>
          </tr>
        </g:each>
      </table>
    </div>
Code language: HTML, XML (xml)

The Grails each tag iterates the data, and in this case, it creates a new row <tr> for each value.

The expression ${results} refers to the results value returned from the controller.

A link/button is set up to pass the key (the error name) as a parameter to an action called errorDetails.

I also constructed graphs on different time frames, like this “one hour” graph:

Conclusion

Building monitoring tools is a useful skill set for Quality Assurance and DevOps. These tools help find error deltas after each release (are we introducing more errors, or fewer?)
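As a rough illustration of what an error delta could look like, the sketch below compares two count maps of the kind the Error Counter produces. The error names and numbers are invented for the example, and the before/after split around a release is an assumption; the monitor described here does not compute this itself.

```groovy
// Assumption: sample counts from the 24 hours before and after a release.
def before = ['NullPointerException': 12, 'SocketTimeoutException': 4]
def after  = ['NullPointerException': 7, 'SocketTimeoutException': 4, 'IllegalStateException': 3]

// Per-error change in count; errors missing from one side count as zero there.
def deltas = (before.keySet() + after.keySet()).collectEntries { name ->
    [(name): (after[name] ?: 0) - (before[name] ?: 0)]
}

// Errors that only appear after the release.
def newErrors = after.keySet() - before.keySet()

deltas.each { name, delta -> println "${name}: ${delta}" }
println "New since release: ${newErrors}"
```

A positive delta means an error became more frequent after the release, and anything in newErrors did not occur at all in the earlier window.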

Certainly, products like Tableau make dashboards easier to build. I could do something similar in Tableau in about 20 minutes. But if you're going to create a dashboard like this, you're going to pay $800+ for Tableau (which isn't a bad price).

There are other languages and frameworks. I picked Grails as it was simple and easy to set up on a VM with a Tomcat instance.
