May 22 2013

5 Steps to Improve E-Commerce Performance for Increased Sales: Backend Performance

External RSS Feed Content provided by Compuware

In the previous post we presented problems encountered by our client TescaraHats (name changed for commercial reasons), a European market leader in manufacturing customized hats. The company quickly realized that e-commerce performance is critical to the success of an e-commerce platform and that sales will not increase just because you have an application online.

We argued that while improving search engine ranking is important, you should never forget about the performance and usability of an e-commerce application. In this episode of our e-commerce performance series, we will analyze the impact that backend performance has on your online sales.

Improve Backend Performance

The effort put into building a visually appealing and easy-to-use e-commerce site might be lost if you do not manage performance problems at the backend.

The e-commerce site at TescaraHats consisted mainly of two tiers: TescaraHats Front (see Figure 1) and TescaraHats DB (see Figure 2). The APM tool used by the Operations team indicated that both tiers had performance problems. The front-end tier (see Figure 1) had most of the operation time spent on the backend; the DB tier (see Figure 2) indicated that most of the time was spent on the network.

Figure 1. A lot of time spent at the backend processing

Figure 2. Most of the backend time is spent on transferring data with database query results

To understand the cause of the performance problem, the Operations team first analyzed the DB tier and discovered that one of the stored procedures was always slow in delivering results (see Figure 3). By looking at the KPIs of all DB operations (see Figure 4), the team confirmed that the high network time reported at the DB tier was, in fact, caused by this single stored procedure: [Update_GalDisplayedhats].

Figure 3. One of the stored procedures is always slow in rendering results

Figure 4. Only one stored procedure introduces delays on the network

The Operations team wanted to understand why this stored procedure was so slow. They drilled down to the PurePaths representing the slow stored procedure to analyze the transaction flow in which it is executed. The report in Figure 5 shows that most of the time is spent on acquiring a connection handle, which usually means that connection pooling is improperly configured or that some transactions are blocking the connection.

Figure 5. The transaction flow representing the slow stored procedure indicates problems with acquiring connection handles
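
The connection-handle pattern can be reproduced in miniature. The following is only a sketch under stated assumptions: `TinyPool`, `wait_seconds` and `demo` are hypothetical names, not part of the TescaraHats application or of any real database driver. It shows how one caller holding the only handle turns into acquisition time (and eventually a timeout) for the next caller.

```python
import queue
import time
from contextlib import contextmanager


class TinyPool:
    """Hypothetical fixed-size pool that records how long each caller waits."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")
        self.wait_seconds = []  # one entry per acquisition attempt

    @contextmanager
    def connection(self, timeout=1.0):
        start = time.monotonic()
        try:
            conn = self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no free connection handle") from None
        finally:
            # Recorded for successful and failed attempts alike.
            self.wait_seconds.append(time.monotonic() - start)
        try:
            yield conn
        finally:
            self._free.put(conn)  # always hand the handle back


def demo():
    pool = TinyPool(size=1)
    with pool.connection() as conn:
        # A second caller arrives while the only handle is still in use.
        try:
            with pool.connection(timeout=0.05):
                pass
            blocked = False
        except TimeoutError:
            blocked = True
    return conn, blocked, pool.wait_seconds
```

Calling `demo()` returns the handle name, whether the second caller was blocked, and the recorded wait times. In a real system the same effect appears when the pool is too small for the load or when a long-running transaction never returns its handle.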

The team opened another report showing database calls (see Figure 6) and drilled down through the slow stored procedure to a report (see Figure 7) showing PurePaths where this stored procedure was called. With that report the Operations team could determine which transactions were affected, i.e., where this slow stored procedure was called from. This information helped the team to resolve the problem faster.

Figure 6. Report showing the most affected database calls at TescaraHats: the stored procedure [Update_GalDisplayedhats] ranks at the top

Figure 7. Report showing PurePaths affected by the slow stored procedure: by seeing where this query is called from we can speed up the problem resolution

What About the Frontend?

When it comes to backend performance problems caused by the database calls there are some problem patterns that you should check for:

  • When the query execution takes too long you notice higher database time (see Figure 5).
  • When the query produces a lot of data that needs to be pushed through the network your APM tool will show large network time contributing to the database operation time.
  • When there are connection problems you will see them on the database report (see Figure 6).
  • Finally, you will see in the transaction flow report (see Figure 8) when a single query is called multiple times unnecessarily.

Figure 8. For each transaction the same query is called too many times
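
The four patterns can also be expressed as a small check over the database calls of one transaction. This is a rough sketch, not how any APM tool works internally: the dictionary keys, the thresholds and the `SELECT` statement are assumptions made for illustration; only the stored procedure name comes from the case above.

```python
from collections import Counter


def diagnose(calls, repeat_limit=3, slow_ms=500.0):
    """Label the problem patterns found in one transaction's database calls.

    Each call is a dict with hypothetical keys: 'sql', 'exec_ms',
    'network_ms' and 'acquire_ms'.
    """
    findings = set()
    for call in calls:
        total = call["exec_ms"] + call["network_ms"] + call["acquire_ms"]
        if total < slow_ms:
            continue  # fast enough, nothing to classify
        # Attribute the slow call to its largest time component.
        dominant = max(("exec_ms", "network_ms", "acquire_ms"), key=call.get)
        findings.add({
            "exec_ms": "slow query execution",
            "network_ms": "large result set on the network",
            "acquire_ms": "connection acquisition",
        }[dominant])
    # The same statement issued many times within one transaction.
    repeats = Counter(call["sql"] for call in calls)
    if any(count > repeat_limit for count in repeats.values()):
        findings.add("same query repeated")
    return findings


# Hypothetical sample: one network-heavy procedure call plus a query
# that is repeated five times.
sample_calls = [
    {"sql": "EXEC Update_GalDisplayedhats",
     "exec_ms": 40, "network_ms": 900, "acquire_ms": 10},
] + [
    {"sql": "SELECT price FROM hats WHERE id = ?",
     "exec_ms": 2, "network_ms": 1, "acquire_ms": 0},
] * 5

print(diagnose(sample_calls))
```

Attributing a slow call to its single largest component is a simplification; a real call can suffer from more than one pattern at once.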

Once we weeded out all potential causes of problems from the backend, it was time to move closer to the customer and check a number of potential network-related performance problems; continue reading our mini-series on e-commerce performance to learn more.


(This series is based on materials contributed by Pieter Jan Switten, Pieter Van Heck, and Paweł Brzoska based on original customer data. Some screens presented are customized while delivering the same value as out of the box reports.)

Source Article from http://apmblog.compuware.com/2013/05/22/5-steps-to-improve-e-commerce-performance-for-increased-sales-backend-performance/

May 21 2013

How Internet Outages Can Affect Your Application: Outage Analyzer and the adBrite Closure

Complexity is the new reality of web and mobile applications, with almost no new release going out without the addition of services and applications spread across many different companies. But the reality of this new interrelationship is still the same: if a third-party internet outage or issue occurs, your brand is the one that is affected.

With up to 1,500 distinct third-party services available to choose from around the world, it is sometimes difficult to even identify what a service does when it appears in your applications. This forces your team to not only be fully aware of the components you control, but also to be able to follow the trail of services that extends far outside the code and systems your company manages when issues appear.

Using Compuware Outage Analyzer data, it is now easier to open a window to these services, seeing data collected across all companies and tests to extract patterns that are sometimes hard to find if data is examined on an individual customer basis.

But what is an outage? Well, it means different things to different people. When used in the media, it is sometimes easy to assume that “outage” means that all aspects of the service are unavailable from all places. These Full Service Outages are dramatic but are over-shadowed in number by Partial Service Outages, those that affected either a small geographic area or a few application transactions containing the third-party host call.

In the five months of data studied here (November 2012 through March 2013), Compuware Outage Analyzer tracked 1,413 Full Service Outages and 7,969 Partial Service Outages. So while Full Service Outages may get the most press, a Partial Service Outage is more likely to occur and more likely to affect individual web and mobile applications while leaving other customers of the service (including your competitors in some cases) completely untouched.

Chart of Service Outages – Full and Partial (1 point equals Service Outages for 1 Service for 1 Month)
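
To put the two counts in proportion, the arithmetic can be spelled out. The variable names below are illustrative; the two input numbers are the ones quoted above.

```python
full_outages = 1413      # Full Service Outages, Nov 2012 to Mar 2013
partial_outages = 7969   # Partial Service Outages, same period

total = full_outages + partial_outages
partial_share = partial_outages / total          # fraction of all outages
partial_per_full = partial_outages / full_outages

print(f"{total} outages, {partial_share:.1%} partial, "
      f"{partial_per_full:.1f} partial per full")
```

In other words, roughly 85 percent of the tracked outages were partial, or between five and six partial outages for every full one.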

In this day of ever-present monitoring, can a complete service outage go undetected for more than a few hours? The answer is yes. On February 1 2013, the ad-serving company adBrite completely halted operations, an event that was announced directly to customers and tracked in the tech media.

Announcement of the adBrite Shutdown

It should have come as no surprise that the service actually did cease operations on February 1. But during early February, Outage Analyzer tracked a sudden increase in ad-serving outages. When the data was examined more closely, especially by separating Partial from Full Service Outages, half of the Full Service Outages detected in February originated from two adBrite hostnames.

adBrite Outages compared to Other Services for February 2013

Drilling deeper into the data indicated that over 200 Compuware APMaaS tests, including a large number of sites that appear on the US Media Benchmarks, continued to contain calls to the adBrite network after it had ceased operations. This raises three possible explanations:

Outage Analyzer Map - February 8 2013

  1. Did companies not know that adBrite was halting operations and just left these calls in place?
  2. Was information of the adBrite shutdown known to some in the company, but not communicated to the people who could remove the hostname from the existing application code?
  3. Or did companies not even know that there were calls to adBrite hostnames in their applications?

The answer will likely vary from company to company. But not closely tracking all aspects of your online applications can lead to service-level blindness such as was seen during this period.
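
A first step against this kind of blindness is simply listing which external hostnames a page references. The sketch below uses only the Python standard library; the page, the hostnames (all on reserved example domains) and the `RETIRED` list are made up for illustration and are not adBrite's real hostnames.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostCollector(HTMLParser):
    """Collects the hostname of every src/href attribute in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host:
                    self.hosts.add(host)


def third_party_hosts(html, own_host):
    """Hostnames referenced by the page that are not our own (sub)domains."""
    collector = HostCollector()
    collector.feed(html)
    return {
        host for host in collector.hosts
        if host != own_host and not host.endswith("." + own_host)
    }


PAGE = """
<html><head>
  <script src="https://cdn.example.net/lib.js"></script>
  <link rel="stylesheet" href="/css/site.css">
</head><body>
  <img src="https://static.shop.example.com/logo.png">
  <script src="//ads.retired-network.example/serve.js"></script>
</body></html>
"""

RETIRED = {"ads.retired-network.example"}  # hypothetical shutdown list

found = third_party_hosts(PAGE, own_host="shop.example.com")
still_called = sorted(found & RETIRED)
print(still_called)
```

A static scan like this only sees hostnames present in the markup. Hosts that are pulled in dynamically by scripts need to be observed in a real browser session or a synthetic test.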

Thankfully, the adBrite situation was likely harmless for most sites, as a missing ad doesn’t usually affect the page presentation or customer experience. But if this had been a critical page component (such as a JavaScript library or a web font file) or a service that provides critical services (online security, site search, or web analytics), an unnoticed outage would have been far more dramatic.

It’s no longer enough to fully understand the third-party services that are included in your site. You now have to have a plan to respond to performance issues and outages that occur with these services so that your customer experience is not affected. Having a Plan B is just not enough – a Plan C, D, E, or even K may be needed to determine how to respond when one of these services has a problem.

Source Article from http://apmblog.compuware.com/2013/05/21/how-internet-outages-can-affect-your-application-outage-analyzer-and-the-adbrite-closure/

May 16 2013

5 Steps to Improve E-Commerce Performance for Increased Sales: Introduction

The saying “if it doesn’t exist on the Internet, it doesn’t exist” [1] is ringing truer every day. Nowadays, it is hard to imagine most businesses without an e-commerce platform, let alone without a web presence at all. Since e-commerce is becoming the new standard, e-commerce performance needs to be at its best.

In this blog series, I present several ways to ensure your company’s e-commerce performance success: avoiding unnecessary network load, reducing the number of (internal) HTTP errors, improving backend performance, understanding your clients, ensuring the scalability of the e-commerce site, and finally understanding sales results through conversion rate.

Our client TescaraHats (name changed for commercial reasons), a European market leader in manufacturing customized hats, decided to expand its market reach with an e-commerce site where its potential customers could choose, customize and order hats online. Since the company’s core competence is in delivering highly customized products, TescaraHats could not simply use an off-the-shelf e-commerce application. It needed a customization wizard so that customers could create a uniquely customized product.

When sales did not increase after the implementation of the e-commerce platform, TescaraHats learned quickly that there is much more to e-commerce than simply putting an e-commerce service online.

Is E-Commerce a Silver Bullet?

When you start a new business you usually need at least two things: 1) to let people know about it and 2) to make sure your employees know how to sell the products. In the case of e-commerce, many businesses initially focus on getting a high page ranking in Google search results instead of on actually selling the products. A common misconception about search engine optimization (SEO) is that it is some kind of voodoo for which an external “shaman” has to be brought in. Moreover, some agencies that specialize in SEO overestimate the importance of search results for B2B and do a poor job by focusing only on the short-term “hacking” of Google results: they create an enormous network of connected sites that route traffic to their clients’ web sites. Companies that focus only on this external web of links might eventually suffer the wrath of Google, which considers such practices cheating and takes action to downgrade the offending brand or remove it from search results; BMW and JCPenney are two examples of brands that learned this the hard way.

What search engines like Google stress is that, just as with a brick-and-mortar location, loud marketing is only part of the story. The business should offer desired products and hire skilled, customer-focused salespeople. According to Matt Cutts from Google, applying black hat SEO techniques such as link spamming will make your site less likely to show up in search results within a few months.

TescaraHats is a well-recognized brand and is known as THE customized hats manufacturer in Europe. The company believed that an e-commerce site would be an easy way to boost its sales. However, due to the poor e-commerce performance of the application, users were unhappy with their experiences and the application was not the silver bullet TescaraHats had hoped for.

E-commerce is your business online; therefore it is subject to “web rules.” As Jakob Nielsen writes in his report “Did Poor Usability Kill E-Commerce?”: “You forget the web’s realities? You die.” According to this report, if users cannot navigate easily through an e-commerce site, the site will lose almost half (44%) of its potential sales. On average, users currently succeed with their purchase only 56% of the time. The report concludes that with better usability the average site could increase its current sales by 79%.
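
The three percentages fit together arithmetically under one reading: assuming that the 79% figure describes the case where every purchase attempt that fails today would succeed. That assumption is ours; the short calculation below only checks that the numbers are consistent with it.

```python
success_rate = 0.56             # share of purchase attempts that succeed
lost_rate = 1 - success_rate    # the "almost half" that is lost

# If every failed attempt succeeded, sales would grow by lost / success.
max_uplift = lost_rate / success_rate
print(f"lost: {lost_rate:.0%}, potential uplift: {max_uplift:.0%}")
```

This prints a lost share of 44% and a potential uplift of 79%, matching the figures quoted from the report.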

Understanding E-Commerce Performance

The usability of an e-commerce site (or of any site, for that matter) is a combination of many factors, with application performance at the forefront.

The process of ordering a customized hat on the TescaraHats e-commerce site consists of three steps (see Figure 1):

  1. The customers choose a type of hat.
  2. They choose their size, color, add-ons, etc.
  3. Finally, they complete their purchase by submitting the order form, providing shipping and payment details.

Figure 1. TescaraHats e-commerce workflow

TescaraHats prepared for an influx of new orders when it launched its e-commerce site, but the influx did not materialize.

Even simple reports from tools like Google Analytics could show that the problem was not getting users to visit the site, but rather getting customers to complete transactions.
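
The distinction between "not enough visitors" and "visitors who do not finish" becomes visible once the workflow is treated as a funnel. The visit counts below are invented purely for illustration; only the step names follow the three-step workflow in Figure 1.

```python
# Hypothetical visit counts per step; the step names follow the workflow above.
funnel = [
    ("visit site", 10_000),
    ("choose hat type", 6_000),
    ("customize hat", 3_000),
    ("submit order", 150),
]


def step_conversion(funnel):
    """Return (step, share of the previous step's users who reached it)."""
    rates = []
    for (_, before), (step, after) in zip(funnel, funnel[1:]):
        rates.append((step, after / before))
    return rates


rates = step_conversion(funnel)
worst_step, worst_rate = min(rates, key=lambda pair: pair[1])
overall = funnel[-1][1] / funnel[0][1]
print(f"overall conversion {overall:.1%}, biggest drop at '{worst_step}'")
```

With these made-up numbers, plenty of users arrive and start customizing, but only a small fraction completes the order, which is the shape of problem described here.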

In order to see the real reason behind low online sales the Marketing team also consulted the Operations team. The team used an APM tool that could analyze the performance of the data center and application, and show how well the e-commerce site performed depending on geographical location and browser type.

Figure 2 shows an overview report of performance of all major components. The configuration services (Configurator, Configurator hats) experience the worst performance, followed by the components responsible for completing the order (Shopping Cart, Login).

Figure 2. With the health status check report, the Operations Team can quickly determine which infrastructure components experience performance problems and which e-commerce operations are affected.

And What Now?

The TescaraHats use case shows that e-commerce sales results rely not only on pagerank but also on application performance and usability. Over the next 4 posts of this series we will demystify poor sales by looking at some aspects of improving usability of an e-commerce solution through application performance management. We will start with improving backend performance and go all the way to understanding conversion rate.


(This series is based on materials contributed by Pieter Jan Switten, Pieter Van Heck, and Paweł Brzoska based on original customer data. Some screens presented are customized while delivering the same value as out of the box reports.)


  1. Kenneth Goldsmith, Presented at Elective Affinities Conference, University of Pennsylvania, September 27, 2005

Source Article from http://apmblog.compuware.com/2013/05/16/5-steps-to-improve-e-commerce-performance-for-increased-sales-introduction/

May 15 2013

APM as a Service: 4 Steps to Monitor Real User Experience in Production

With our new service platform and the convergence of dynaTrace PurePath Technology with the Gomez Performance Network, we are proud to offer an APMaaS solution that sets a higher bar for complete user experience management, with end-to-end monitoring technologies that include real-user, synthetic, third-party service monitoring, and business impact analysis.

To showcase the capabilities we used the free trial on our own about:performance blog as a demonstration platform. It is based on the popular WordPress technology, which uses PHP and MySQL as its implementation stack. With only 4 steps we get full availability monitoring as well as visibility into every one of our visitors, and can trace any problem on our blog to the browser (JavaScript, slow 3rd party, …), the network (slow network connectivity, bloated website, …) or the application itself (slow PHP code, inefficient MySQL access, …).

Before we get started, let’s have a look at the Compuware APMaaS architecture. To collect real user performance data, all you need to do is install a so-called Agent on the Web and/or Application Server. The data gets sent in an optimized and secure way to the APMaaS Platform. Performance data is then analyzed through the APMaaS Web Portal with drilldown capabilities into the dynaTrace Client.

Compuware APMaaS is a secure service to monitor every single end user on your application end-to-end (browser to database)

4 Steps to set up APMaaS for our Blog powered by WordPress on PHP

From a high-level perspective, joining Compuware APMaaS and setting up your environment consists of four basic steps:

  1. Sign up with Compuware for the Free Trial
  2. Install the Compuware Agent on your Server
  3. Restart your application
  4. Analyze Data through the APMaaS Dashboards

In this article, we assume that you’ve successfully signed up, and will walk you through the actual setup steps to show how easy it is to get started.

After signing up with Compuware, the first sign of your new Compuware APMaaS environment will be an email notifying you that a new environment instance has been created:

Following the steps as explained in the Welcome Email to get started

While you can immediately take a peek into your brand new APMaaS account at this point, there’s not much to see: Before we can collect any data for you, you will have to finish the setup in your application by downloading and installing the agents.

After installation is complete and the Web Server is restarted the agents will start sending data to the APMaaS Platform – and with dynaTrace 5.5, this also includes the PHP agent which gives insight into what’s really going on in the PHP application!

Agent Overview shows us that we have both the Web Server and PHP Agent successfully loaded

Now we are ready to go!

For Ops & Business: Availability, Conversions, User Satisfaction

Through the APMaaS Web Portal, we start with some high level web dashboards that are also very useful for our Operations and Business colleagues. These show Availability, Conversion Rates as well as User Satisfaction and Error Rates. To show the integrated capabilities of the complete Compuware APM platform, Availability is measured using Synthetic Monitors that constantly check our blog while all of the other values are taken from real end user monitoring.

Operations View: Automatic Availability and Response Time Monitoring of our Blog

Business View: Real Time Visits, Conversions, User Satisfaction and Errors
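
For readers less familiar with these dashboard metrics, the two simplest ones reduce to ratios. The sketch below is not how the APMaaS Portal computes them; the function names and all numbers are illustrative assumptions. It only shows that availability comes from synthetic check results while conversion rate comes from real-user visits.

```python
def availability(checks):
    """Share of synthetic checks that succeeded; `checks` is a list of bools."""
    if not checks:
        raise ValueError("no checks recorded")
    return sum(checks) / len(checks)


def conversion_rate(visits, conversions):
    """Share of real-user visits that reached the conversion goal."""
    return conversions / visits if visits else 0.0


# Hypothetical data: one check every 5 minutes for a day (288 checks),
# of which 3 failed.
checks = [True] * 285 + [False] * 3
print(f"availability {availability(checks):.2%}")
print(f"conversion   {conversion_rate(visits=1200, conversions=54):.1%}")
```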

For App Owners: Application and End User Performance Analysis

Through the dynaTrace client we get a richer view into the real end user data. The PHP agent we installed is a full equivalent of the dynaTrace Java and .NET agents, and features like the application overview together with our self-learning automatic baselining work the same way regardless of the server-side technology:

Application level details show us that we had a response time problem and that we currently have several unhappy end users

Before drilling down into the performance analytics, let’s have a quick look at the key user experience metrics such as where our blog users actually come from, the browsers they use, and whether their geographical location impacts user experience:

The UEM Key Metrics dashboards give us the key metrics of web analytics tools as well as tying it together with performance data. Visitors from remote locations are obviously impacted in their user experience

If you are responsible for User Experience and interested in some of our best practices I recommend checking our other UEM-related blog posts – for instance: What to do if A/B testing fails to improve conversions?

Going a bit deeper – What impacts End User Experience?

dynaTrace automatically detects important URLs as so-called “Business Transactions.” In our case we have different blog categories that visitors can click on. The following screenshot shows us that we automatically get dynamic baselines calculated for these identified business transactions:

Dynamic Baselining detects a significant violation of the baseline during a 4.5 hour period last night
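
The idea behind a baseline violation can be illustrated with a deliberately simple stand-in. This is not dynaTrace's self-learning algorithm, which is not described here; it is a minimal sketch that assumes a static baseline of mean plus three standard deviations, with invented response times.

```python
from statistics import mean, stdev


def baseline_violations(history, recent, sigmas=3.0):
    """Return the limit and the indexes in `recent` that exceed it.

    The baseline here is simply mean + sigmas * stdev of `history`,
    a deliberately simple stand-in for a self-learning baseline.
    """
    limit = mean(history) + sigmas * stdev(history)
    return limit, [i for i, value in enumerate(recent) if value > limit]


# Hypothetical response times in milliseconds.
history = [200, 210, 190, 205, 195, 200, 215, 185]
recent = [198, 207, 640, 710, 680, 204]

limit, violations = baseline_violations(history, recent)
print(round(limit), violations)
```

With this sample data the limit works out to 230 ms and the three slow measurements in the middle of `recent` are flagged. A production baseline would also have to adapt to time of day and traffic patterns, which this sketch ignores.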

Here we see that our overall response time for requests by category slowed down on May 12. Let’s investigate what happened here, and move to the transaction flow which visualizes PHP transactions from the browser to the database and maps infrastructure health data onto every tier that participated in these transactions:

The Transaction Flow shows us a lot of interesting points such as Errors that happen both in the browser and the WordPress instance. It also shows that we are heavy on 3rd party but good on server health

Since we are always striving to improve our users’ experience, the first troubling thing on this screen is that we see errors happening in browsers – maybe someone forgot to upload an image when posting a new blog entry? Let’s drill down to the Errors dashlet to see what’s happening here:

3rd Party Widgets throw JavaScript errors and with that impact end user experience.

Apparently, some of the third party widgets we have on the blog caused JavaScript errors for some users. Using the error message, we can investigate which widget causes the issue, and where it’s happening. We can also see which browsers, versions and devices this happens on to focus our optimization efforts. If you happen to rely on 3rd party plugins, you may want to check out the blog post You only control 1/3 of your Page Load Performance.

PHP Performance Deep Dive

We will analyze the performance problems on the PHP server side in a follow-up blog post, where we will show the steps to identify problematic PHP code. In our case it actually turned out to be a problematic plugin that helps us identify bad requests (requests from bots, …).

Conclusion and Next Steps

The intention of this blog was to show you how easy it is to set up your application with Compuware APMaaS and what it delivers – not only for Enterprise Applications that we typically write about but even for applications such as our WordPress blog. Stay tuned for more posts on this topic, or try Compuware APMaaS out yourself by signing up here for the free trial!

Source Article from http://apmblog.compuware.com/2013/05/15/apm-as-a-service-4-steps-to-monitor-real-user-experience-in-production/

May 07 2013

Fix Memory Leaks in Java Production Applications

Adding more memory to your JVMs (Java Virtual Machines) might look like a temporary fix for memory leaks in Java applications, but it certainly won’t fix the root cause of the issue. Instead of crashing once per day, the application may just crash every other day. “Preventive” restarts are just another desperate measure to minimize downtime, but let’s be frank: this is not how production issues should be solved.

One of our customers – a large online retail store – ran into such an issue. They run one of their online gift card self-service interfaces on two JVMs. Especially during peak holiday seasons – when users are activating their gift cards or checking the balance – crashes due to OOM (Out Of Memory) were more frequent which caused bad user experience. The first “measure” they took was to double the JVM Heap Size. This didn’t solve the problem as JVMs were still crashing, so they followed the memory diagnostics approach for production as explained in Java Memory Leaks to identify and fix the root cause of the problem.

Before we walk through the individual steps, let’s look at the memory graph that shows the problems they had in December during the peak of the holiday season. The problem persisted even after increasing the memory. They could fix the problem after identifying the real root cause and applying specific configuration changes to a 3rd party software component:

Only after identifying the actual root cause and applying the necessary configuration changes did the memory leak issue go away. Increasing memory did not work even as a temporary solution.

Step 1: Identify a Java Memory Leak

The first step is to monitor the JVM/CLR Memory Metrics such as Heap Space. This will tell us whether there is a potential memory leak. In this case we see memory usage constantly growing resulting in an eventual runtime crash when the memory limit is reached.

Java Heap Size of both JVMs showed significant growth starting Dec 2nd and Dec 4th resulting in a crash on Dec 6th for both JVMs when the 512MB Max Heap Size was exceeded.
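
Whether heap usage is "constantly growing" can be checked with a simple trend line. The sketch below is written in Python rather than Java and uses invented samples; only the 512 MB limit comes from the case above. It assumes the samples are taken after garbage collection, since raw heap usage naturally rises and falls between collections.

```python
def slope(samples):
    """Least-squares slope of (x, y) samples, in y-units per x-unit."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den


def hours_until_limit(samples, limit_mb):
    """Extrapolate when usage reaches `limit_mb`; None if it is not growing."""
    growth = slope(samples)
    if growth <= 0:
        return None
    _, latest_mb = samples[-1]
    return (limit_mb - latest_mb) / growth


# Hypothetical (hour, heap-after-GC in MB) samples; 512 MB is the max heap
# size mentioned in the text.
samples = [(0, 180), (6, 210), (12, 240), (18, 270), (24, 300)]
print(slope(samples), hours_until_limit(samples, limit_mb=512))
```

A steady positive slope over many hours is a hint, not proof, of a leak; a cache that is still warming up looks the same at first.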

Step 2: Identify problematic Java Objects

The out-of-memory exception automatically triggers a full memory dump that allows for analysis of which objects consumed the heap and are most likely to be the root cause of the out-of-memory crash. Looking at the objects that consumed most of the heap below indicates that they are related to a 3rd party logging API used by the application.

Sorting by GC (Garbage Collection) Size and focusing on custom classes (instead of system classes) shows that 80% of the heap is consumed by classes of a 3rd party logging framework
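
The "sort by size, ignore system classes" step amounts to a filter and a ratio over a class histogram. The histogram below is invented so that the vendor share comes out at 80%, and the package names are hypothetical; real heap analyzers also distinguish shallow from retained size, which this flat sketch does not.

```python
def heap_share(histogram, prefix):
    """Fraction of total bytes held by classes whose name starts with `prefix`."""
    total = sum(histogram.values())
    matched = sum(size for name, size in histogram.items()
                  if name.startswith(prefix))
    return matched / total


def top_custom_classes(histogram, system_prefixes=("java.", "javax.", "sun.")):
    """Custom (non-system) classes sorted by size, largest first."""
    custom = {name: size for name, size in histogram.items()
              if not name.startswith(system_prefixes)}
    return sorted(custom.items(), key=lambda item: item[1], reverse=True)


# Hypothetical class histogram in bytes; the package names are made up.
histogram = {
    "com.vendor.logging.VPReportEntry": 300_000_000,
    "com.vendor.logging.LogEvent": 100_000_000,
    "java.lang.String": 60_000_000,
    "com.shop.GiftCard": 40_000_000,
}
print(top_custom_classes(histogram)[0][0],
      f"{heap_share(histogram, 'com.vendor.logging.'):.0%}")
```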

A closer look at an instance of the VPReportEntry4 shows that it contains 5 Strings, one of which consumes 23KB (compared to the several bytes of the other String objects). This also explains the high GC Size of the String class in the overall Heap Dump.

Individual very large String objects as part of the ReportEntry object

Following the referrer chain further up reveals the complete picture. The EventQueue keeps LogEvents in an Array which itself keeps VPReportEntrys in an Array. All of these objects seem to be kept in memory as the objects are being added to these arrays but never removed and therefore not garbage collected:

Following the referrer tree reveals that global EventQueue objects hold on to the LogEvent and VPReportEntry objects in array lists, from which they are never removed
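
The underlying rule, that an object referenced from a long-lived container cannot be garbage collected, is the same in most managed runtimes. The sketch below demonstrates it in Python rather than Java; `ReportEntry` and `EVENT_QUEUE` are illustrative stand-ins, not the 3rd party framework's classes.

```python
import gc
import weakref


class ReportEntry:
    """Stand-in for a log entry; the name is illustrative only."""

    def __init__(self, text):
        self.text = text


EVENT_QUEUE = []  # long-lived, module-level container


def log(text, enqueue):
    entry = ReportEntry(text)
    if enqueue:
        EVENT_QUEUE.append(entry)   # the queue now keeps the entry reachable
    # A weak reference lets us observe collection without keeping it alive.
    return weakref.ref(entry)


held = log("kept in queue", enqueue=True)
dropped = log("not referenced anywhere", enqueue=False)
gc.collect()
still_alive = held() is not None
already_gone = dropped() is None

EVENT_QUEUE.clear()   # removing the reference is what makes collection possible
gc.collect()
gone_after_clear = held() is None
print(still_alive, already_gone, gone_after_clear)
```

As long as the queue holds the entry it stays on the heap; once the reference is removed, the collector can reclaim it.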

Step 3: Who allocates these objects?

Analyzing object allocation allows us to figure out which part of the code is creating these objects and adding them to the queue. Creating what is called a “Selective Memory Dump” when the application reached 75% Heap Utilization showed the customer that the ReportWriter.report method allocated these entries and that they have been “living” on the heap for quite a while.

It is the report method that allocates the VPReportEntry objects which stay on the heap for quite a while

Step 4: Why are these objects not removed from the Heap?

The premise of the 3rd party logging framework is that log entries will be created by the application and written in batches at certain times by sending these log entries to a remote logging service using JMS. The memory behavior indicates that, even though these log entries might be sent to the service, the objects are not always removed from the EventQueue, which leads to the out-of-memory exception.

Further analysis revealed that the background batch writer thread calls a logBatch method which loops through the event queue (calling EventQueue.next) to send the log events currently in the queue. The question is whether as many messages were taken out of the queue (using next) as were put into it (using add), and whether the batch job is really called frequently enough to keep up with the incoming event entries. The following chart shows the method executions of add as well as the calls to logBatch, highlighting that logBatch is not called frequently enough and therefore next is not called to remove messages from the queue:

The highlighted area shows that messages are put into the queue but not taken out because the background batch job is not executed. Once this leads to an OOM and the system restarts it goes back to normal operation but older log messages will be lost.
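
The add-versus-next comparison can be reduced to simple bookkeeping. The per-minute counts below are invented for illustration; the sketch only shows how a queue grows without bound once draining stops while adding continues.

```python
def queue_depth(adds, drains, start=0):
    """Queue depth after each interval, given per-interval add and drain counts."""
    depth = start
    depths = []
    for added, drained in zip(adds, drains):
        depth = max(0, depth + added - drained)
        depths.append(depth)
    return depths


# Hypothetical per-minute counts: the batch job stops running after minute 2.
adds = [100, 100, 100, 100, 100, 100]
drains = [100, 100, 0, 0, 0, 0]

depths = queue_depth(adds, drains)
print(depths)
```

Plotting the executions of add and logBatch side by side, as the chart described here does, is the same comparison made visually.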

Step 5: Fixing the Java Memory Leak problem

After providing this information to the 3rd party provider and discussing with them the number of log entries and the system environment, the conclusion was that our customer used a special logging mode that was not supposed to be used in high-load production environments. It is like running with DEBUG log level in a high-load production environment. This overwhelmed the remote logging service, which is why the batch logging thread was stopped and log events remained in the EventQueue until the out-of-memory error occurred.

After making the recommended changes the system could again run with the previous heap memory size without experiencing any out-of-memory exceptions.

The Memory Leak issue has been solved and the application now runs even with the initial 512MB Heap Space without any problem.

They still use the same dashboards they have built to troubleshoot this issue, to monitor for any future excessive logging problems.

These dashboards allow them to verify that the logging framework can keep up with log messages after they applied the changes.

Conclusion

Adding additional memory to crashing JVMs is most often not even a temporary fix. If you have a real Java memory leak it will just take longer until the Java runtime crashes. It will even incur more overhead due to garbage collection when using larger heaps. The real answer to this is to use the simple approach explained here. Look at the memory metrics to identify whether you have a leak or not. Then identify which objects are causing the issue and why they are not collected by the GC. Working with engineers or 3rd party providers (as in this case) will help you find a permanent solution that allows you to run the system without impacting end users and without additional resource requirements.

Next Steps

If you want to learn more about Java Memory Management or general Application Performance Best Practices check out our free online Java Enterprise Performance Book. Existing customers of our APM Solution may also want to check out additional best practices on our APM Community.

Source Article from http://apmblog.compuware.com/2013/05/07/fix-memory-leaks-in-java-production-applications/