Wednesday, October 30, 2013

Looking for memory leaks



Several of our web applications installed on Tomcat were persisting threads even after reloading the app. This was causing the entire WebappClassLoader instance to be held in memory. The first week, we just bumped up the memory. In the second week when this was still not satisfying Tomcat, we started debugging.

The inbuilt capability of Tomcat to detect memory leaks is pretty awesome. All you have to do is click on that "Find Leaks" button on the manager gui, and the most obvious leaks will come up. Of course, we did not know about it, and we tried to analyze memory dumps to finally figure out that each app had two or three WebappClassLoader instances in memory.

The second most obvious place to look, which we had conveniently avoided, was the log. Most applications were logging interesting things about threads not being cleaned up, and we chose to ignore them until something hit the fan.

On to learnings.

1. Understand what your libraries do. 

The Hibernate SessionFactory needs to be closed when you reload the app (duh...). And if you are using connection pools (and you should, if you don't already), and if you are using c3p0 (which is pretty neat), you should know that sessionFactory.close() does not clean up the pool's threads. Some special plumbing is required here. Something along the lines of

    public static void closeSessionFactory() {
        SessionFactoryImpl sf = (SessionFactoryImpl) sessionFactory;
        ConnectionProvider conn = sf.getConnectionProvider();
        if (conn instanceof C3P0ConnectionProvider) {
            ((C3P0ConnectionProvider) conn).close();
        }
        sessionFactory.close();
    }

Put it wherever you get a hook to close your application, but definitely put it somewhere. We call this from a ServletContextListener. If you don't, you will find this in your logs:
SEVERE: The web application [/app] appears to have started a thread named [Timer-25] but has failed to stop it. This is very likely to create a memory leak.
SEVERE: The web application [/app] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0] but has failed to stop it. This is very likely to create a memory leak.
SEVERE: The web application [/app] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1] but has failed to stop it. This is very likely to create a memory leak.
SEVERE: The web application [/app] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.


2. If you start threads in your application, make sure they are stopped on app unload. 

If you are using Quartz schedulers in your app, make sure to shut them down when the app unloads. We found this through an error in catalina.out:
SEVERE: The web application [/app] appears to have started a thread named [App-Scheduler_Worker-1] but has failed to stop it. This is very likely to create a memory leak.
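The same idea applies to any thread the app starts on its own. Here is a minimal sketch using only the JDK, not our actual code: the class name BackgroundWork and its start/stop methods are made up for illustration. In a real webapp, start() and stop() would be called from the contextInitialized and contextDestroyed methods of a ServletContextListener. For Quartz, the equivalent is to call the scheduler's shutdown method at that point.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical holder for background work owned by the web application.
class BackgroundWork {
    private ScheduledExecutorService scheduler;

    // Call when the app starts (for example from contextInitialized).
    synchronized void start() {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> { /* periodic work goes here */ }, 0, 1, TimeUnit.SECONDS);
    }

    // Call when the app unloads (for example from contextDestroyed).
    // Returns true if the worker threads finished within the timeout.
    synchronized boolean stop(long timeoutMillis) {
        if (scheduler == null) {
            return true; // nothing was started, or it was already stopped
        }
        scheduler.shutdownNow(); // cancel scheduled tasks and interrupt running ones
        try {
            boolean terminated = scheduler.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            if (terminated) {
                scheduler = null;
            }
            return terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Waiting for termination matters here: it is the threads still alive after the unload that keep the old WebappClassLoader reachable.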


3. If you are going to use deprecated code, don't come back complaining. 

In our case, we were using the Spring RestTemplate to send some HTTP requests. To set some timeouts, we were using the following code.

    public void setTimeout(int replyTimeoutInMilliseconds) {
        if (!isTimeoutSet) {
            try {
                CommonsClientHttpRequestFactory requestFactoryWithTimeout = new CommonsClientHttpRequestFactory();
                requestFactoryWithTimeout.setReadTimeout(replyTimeoutInMilliseconds);
                restTemplate.setRequestFactory(requestFactoryWithTimeout);

                isTimeoutSet = true;
            } catch (Throwable e) {
                logger.error(e.getMessage(), e);
            }
        }
    }

The problem here was that CommonsClientHttpRequestFactory was deprecated, and was causing hanging threads. The logs helped here too.
SEVERE: The web application [/openerp-atomfeed-service] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.

And the straightforward fix.
    public void setTimeout(int replyTimeoutInMilliseconds) {
        if (!isTimeoutSet) {
            try {
                HttpComponentsClientHttpRequestFactory requestFactoryWithTimeout = new HttpComponentsClientHttpRequestFactory();
                requestFactoryWithTimeout.setReadTimeout(replyTimeoutInMilliseconds);
                restTemplate.setRequestFactory(requestFactoryWithTimeout);

                isTimeoutSet = true;
            } catch (Throwable e) {
                logger.error(e.getMessage(), e);
            }
        }
    }

4. Read the docs when you get time. Or at least when it's crunch time. Tomcat has a very good wiki page on why memory leaks typically happen, and it is a must-read.

Enough learning for one day. 

Thursday, November 22, 2012

Being cryptic

An interesting way of finding the square root of a number. This uses a recursive approach, starts with a seed of 1 and uses Newton's method to find the square root. It took 5 minutes to create (although I already cannot understand how this works). The test passes, so it must work.




public class SquareRootFinder {
    public double root(double x) {
        return root(1, x);
    }

    private double root(double guess, double x) {
        return (guess * guess - x > 0 ? guess * guess - x : -(guess * guess - x)) < 0.001 ? guess : root((guess + x / guess) / 2, x);
    }
}



Warning: Use judiciously. This is meant for pleasure, not business. 
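For comparison, here is a sketch of what the one-liner appears to do, unrolled into a loop. The class name ReadableSquareRootFinder and the guard against negative input are my additions for illustration. Each step replaces the guess with the average of guess and x / guess (the Newton update for guess * guess = x) and stops once guess * guess is within 0.001 of x.

```java
// Hypothetical readable counterpart of SquareRootFinder.
class ReadableSquareRootFinder {
    // Same stopping threshold as the cryptic version.
    private static final double TOLERANCE = 0.001;

    // Approximates the square root of a non-negative x.
    static double root(double x) {
        if (x < 0) {
            // Added guard: the iteration never converges for negative input.
            throw new IllegalArgumentException("x must be non-negative");
        }
        double guess = 1.0; // same seed as the recursive version
        while (Math.abs(guess * guess - x) >= TOLERANCE) {
            guess = (guess + x / guess) / 2; // Newton update
        }
        return guess;
    }
}
```

Note that the tolerance is on the square of the guess, not on the guess itself, so the error in the returned root depends on how large x is.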

Tuesday, January 3, 2012

Log4net does not drop appenders so easily

Yesterday, we had to make a log4net change in a running system. Basically, we had to add a new root logger at a FATAL level. Like thus.
    <root>
        <level value="FATAL"></level>
        <appender-ref ref="RollingFileAppender" />
    </root>
Don't ask me why there was no root logger in the first place. It is too long a discussion, and I don't want to bore you with the details. Moreover, that is not the point.

We added the logger to look at some specific logs that we wanted to inspect. Once it was done, we removed the root logger. We had configured log4net to watch the config file, so we expected the logging to go away. However, it didn't.

The logs were growing at around 1MB per minute, and we didn't have the luxury of bouncing the server. It was by trial and error that we realised that removing the logger was not the right way of stopping the logs, but removing the appender was. We put back the root logger without the appender, and things worked like a charm.
    <root>
        <level value="FATAL"></level>
    </root>

Wednesday, December 28, 2011

Expose SQLAgent job run times as perfmon counters

Came across this blog that provides a way to hook up perfmon counters from procedures inside SQL Server. We had a requirement to expose the run time of a SQLAgent job as a perfmon counter. Using information from the blog together with the sysjobs and sysjobhistory tables, I created a little procedure that takes in a SQLAgent job name and posts the run results (to a precision of seconds) into one of the 10 perfmon counters exposed by SQL Server.

Here is the procedure for those who are interested.

Sunday, September 25, 2011

Interview experience

Our company has a long and interesting process of recruitment. There are slightly different procedures for different roles. I come across several candidates, and make decisions based on their actions and behavior at the time of the interview. I won't elaborate on the recruitment process as it is easy to find over the web, but will discuss my views on the interview alone.

Most of my interview candidates were undergraduate students. I do not have too many free weekends, which is mostly when experienced candidates are interviewed. My criteria for undergraduate students are that the candidate has proven interest in the field, learns fast and is enthusiastic in general.

Proven interest in the field - A person who is interested in the software field would have written at least one decent program. He/she would have a passion for programming, and through it would have come across at least a few interesting concepts. A few questions directed at the project, or a problem that we give to solve, would give this away. I would also skim around different pieces of the curriculum trying to find areas that the candidate is strong in. If one area seems good, I ask a few questions that provoke more thought in that area and confirm interest.

Learns fast - This is a bit tricky to find. What I normally do is ask questions in a field that the candidate does not know about, and nudge him/her towards a solution. By my scale, the candidate needs to arrive at at least a few logical solutions.

Enthusiastic - This one is easy. One of the reasons this is important is that I want the environment I work in to be intellectually stimulating. I don't want to be a teacher or a learner all the time. I also want the people working with me to engage in conversations that I find interesting. I try to answer the question: "Would I be happy working with this person?"

Now, is it possible to develop a process by which an interviewer is able to come to an unambiguous decision? If so, would that result always be the right one for both parties? I am not sure. I have not worked with the candidates that I have interviewed. I do not get feedback from candidates who are not selected. I am not even sure if the guideline that I keep for myself is correct. We do try to get feedback about the process from candidates informally. However, for a company that thrives on a culture of open and continuous feedback, I feel we fall short in this area. Has anyone seen other places where this is done better?

Saturday, September 17, 2011

nant mkiisdir misbehaving on IIS7


A few days ago, I came across a strange issue in the standard nant mkiisdir task. Trying to create a virtual directory spat out the following stack trace.

Error creating virtual directory 'App' on 'localhost:80' (website: Default Web Site).:
NAnt.Core.BuildException:
Error creating virtual directory 'App' on 'localhost:80' (website: Default Web Site). ---> System.IO.DirectoryNotFoundException: The system cannot find the path specified.
   at System.DirectoryServices.Interop.UnsafeNativeMethods.IAds.SetInfo()
   at System.DirectoryServices.DirectoryEntry.CommitChanges()
   at NAnt.Contrib.Tasks.Web.CreateVirtualDirectory.GetOrMakeNode(String basePath, String relPath, String schemaClassName)
   at NAnt.Contrib.Tasks.Web.CreateVirtualDirectory.ExecuteTask()
   --- End of inner exception stack trace ---
   at NAnt.Contrib.Tasks.Web.CreateVirtualDirectory.ExecuteTask()
   at NAnt.Core.Task.Execute()
   at Macrodef.MacroDefInvocation.ExecuteInvocationTasks(XmlNode invocationTasks)
   at Macrodef.MacroDefInvocation.Execute()
   at Macrodef.MacroDefTask.Invoke(XmlNode xml, Task task)
   at Macrodef.MacroDefTask.ExecuteTask(String name, XmlNode xml, Task task)
   at nant984559eed3f243d4a16c2f92ed27e81c.ExecuteTask()
   at NAnt.Core.Task.Execute()
   at NAnt.Contrib.Tasks.NestedTaskContainer.ExecuteChildTasks()
   at NAnt.Contrib.Tasks.NestedTaskContainer.Execute()
   at NAnt.Contrib.Tasks.ChooseTask.ExecuteTask()
   at NAnt.Core.Task.Execute()
   at NAnt.Core.Target.Execute()
   at NAnt.Core.Project.Execute(String targetName, Boolean forceDependencies)
   at NAnt.Core.Project.Execute()
   at NAnt.Core.Project.Run()

Per the code in NAntContrib (and per the stack trace above), the task uses the ADSI interface of IIS to create virtual directories. I checked the prerequisites for running the task. I was running as an admin. UAC was disabled on the system. IIS 6 compatibility was installed (it was a Windows Server 2008 machine running IIS7). I was able to create the virtual directory manually from the inetmgr console. I looked on the web for other possible causes. None of the links proved useful.

After a lot of "Google" research and head banging, I happened to take a look at the IIS metabase. The metabase is a bunch of configuration files in the C:\windows\System32\inetsrv\config directory. The applicationHost.config file had some stray entries corresponding to "App", and a mysterious app pool named "@app-pool-name@". These were never removed from the config when I deleted the corresponding entries from the console. I deleted them by hand and ran the task again. This time, the task ran like a charm.

Tuesday, May 24, 2011

Mac a non-developer machine?


One of my friends just told me that a Mac is not a developer's machine. Of course, I hear a lot of people saying that a Mac is an artist's computer. That it is for dumb computer illiterates who just want stuff done. That it is all beauty and no brawn.

I am a Mac user, and a happy one at that. I am an application developer by profession, and love my Macbook that I use as my home laptop. I use a Dell Latitude for my office purposes where I do .Net programming. I have no qualms about the Dell, but I just love my Mac. And I fully disagree with the statement that a Mac is not a developer machine. Here's why. 

It is good enough to do the kind of number crunching that I expect out of a dev machine. Of course, it depends on what machine you've got. I have a 13" MacBook Pro with 4GB RAM running Leopard. And a lot of Macs definitely come with good (enough?) hardware.


Capabilities. The great UI does not prevent me from getting down and dirty. 

  • I can code in shell scripts, Java, C, Python, Ruby and a lot of other languages. 
  • Several servers/databases can be hosted on a Mac. 
  • There are several application development platforms on it, if you are an IDE guy. 
My argument is NOT that any of these are unavailable on a Windows system. Just that unless you are doing MS stuff, or OS internals, or other work that limits you to a specific operating system, I can assure you that you will feel at home on a Mac just like you would on a Windows machine. It comes installed with Apache, Ruby, JDK, Python and a lot of stuff that I care about. And you get Ant, cc etc. by downloading the developer package. It is not a huge plus point, but I feel good knowing that the guys building the system had developers in mind too.

The system runs pretty smooth. Not too many crashes, not too much of slowdown every day, and I have never come across malware issues. 

I am somehow attracted to the nice way they've built the *nix systems. I am, in fact, addicted to it. One of the beauties they've really got right is the simple and powerful command line. Maybe I am not well versed with the terminals that Windows provides, but I sometimes find them very limited in their capabilities and not as well organized. This, and the fact that they've built some great UI, makes it very appealing.


Ok. I get tired of reasoning very soon, so let me finish with this. I call a machine a developer machine if it is powerful enough for my needs, can run the software required for my development, is not a pain in the ass to maintain, and if I like using it in general. I know that the first and third points hold true for a Mac. Point 4 holds good for me. The second depends greatly on what you are working on, so that would probably be the only reason I would have to choose between operating systems. Both Windows and OS X are monsters. Both have the capability to start running slow. Both have their own quirks and gotchas. It is like a "choose your poison" thing, but Windows definitely does not trump OS X completely when it comes to choosing a developer machine.