  • adamlevy

    This is running core 3.3.1 ATM. I see that the latest core is 3.3.3 and I will upgrade tomorrow.

    posted in User help
  • adamlevy

    So while increasing the size of the EC2 instance and switching to the memory-medium option has allowed us to catch up on the historical points, I am still noticing a significant memory leak. Here are graphs of our system stats over the past 5 days, since I started running Mango on a t2.large instance.
    [Image: mango-stats-latest.png]

    We are now working with 8 GB and the memory-medium option tells Java it can use 5 GB. I have been watching the memory usage steadily climb, with periodic jumps once a day around the time when our persistent TCP data sync is scheduled and we get a surge of points.
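    To try to pin down where the growth is, I'm going to compare the JVM heap to overall system memory with something like the sketch below. This is only a rough sketch: it assumes the JDK tools are on the PATH and that Mango is the only Java process on the instance.

        # Find the Mango JVM (assumes it is the only Java process on the box).
        PID=$(pgrep -f java | head -n 1)

        # Sample heap/GC utilisation every 10 seconds. If the old generation (O)
        # keeps climbing toward 100% and total GC time (GCT) grows with it, the
        # growth is inside the JVM heap rather than elsewhere in the OS.
        jstat -gcutil "$PID" 10000

        # For comparison, overall system memory (run in another shell):
        free -m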

    Why does the memory consistently grow? This made the system unresponsive again for me at a critical time when I had to demo the system for a potential client. Are we doing something wrong here?

    Thank you

    Adam

    posted in User help
  • adamlevy

    Yeah, I was just coming to this myself. I wanted to let it run and see how it handled it, but I can see that I just need more memory. I'm bumping it up to a t2.large with 8 GB of memory. It was actually not crashing even though it was at 99% memory usage, but swap usage was climbing to 50% of the 2 GB of swap. We'll see how this performs now...

    Thanks for your continued help with this.

    posted in User help
  • adamlevy

    Good to know, thank you again. So far Java hasn't run out of memory with the memory-medium extension enabled, but I'm also hitting 98% system memory usage and starting to use swap. The response times are still OK, though.

    I increased my query interval from 10s to 90s. I don't think this is the cause of the issue, but it won't hurt to hit that endpoint less frequently. I will also need to reconfigure Telegraf to grab just the metrics I want; a rough sketch of what I mean is below.
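    This is illustrative only, not my actual config: the drop-in path, URL, port, and field names are placeholders, and authentication is left out. It just shows the shape of a Telegraf http input that polls less often and keeps only a few fields.

        # /etc/telegraf/telegraf.d/mango.conf (placeholder path)
        [[inputs.http]]
          # Placeholder URL and port; point this at the endpoint being scraped.
          urls = ["http://localhost:8080/rest/v2/server/system-info"]
          data_format = "json"
          name_override = "mango_system_info"
          interval = "90s"
          # Keep only the fields of interest (these names are illustrative).
          fieldpass = ["loadAverage", "writesPerSecond"]
          # Auth is omitted in this sketch; the endpoint needs a session or token.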

    The points waiting to be written are high but are still staying a bit lower than they were before. I think they'll stay high as long as Mango is catching up on historical point values. They are peaking around 10k, whereas before they were hitting upwards of 15k.

    I intend to disable Swagger for production, but I have been experimenting with it there while instrumenting Mango. Thanks for the reminder, though.

    posted in User help
  • adamlevy

    Awesome! I will check that out. Is there any way to view the Swagger interface for both v1 and v2 without restarting? Or can the Swagger interface only be enabled for one version at a time?

    posted in User help
  • adamlevy

    Thanks for the advice! I will reduce how frequently I hit that endpoint and make the query more specific.

    posted in User help
  • adamlevy

    I just realized that the memory-small.sh extension was enabled. I'm bumping it up to medium to see if that helps.
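    The change itself is just swapping which memory script is enabled. A sketch, assuming the stock layout where the scripts ship in MA_HOME/bin/ext-available and are enabled by copying them into MA_HOME/bin/ext-enabled (the path below is a placeholder for wherever MA_HOME is on your box):

        cd /opt/mango/bin                       # MA_HOME/bin (placeholder path)
        rm ext-enabled/memory-small.sh
        cp ext-available/memory-medium.sh ext-enabled/
        # Restart Mango however you normally do so the new heap settings apply.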

    posted in User help
  • adamlevy

    The issue persists. I am really not sure what to do here. I can't move forward with any other work while our server is failing every 40 minutes.

    I am willing to share some access to our instance of Mango or get on the phone to talk this through if that helps.

    This is Adam from iA3 BTW.

    posted in User help
  • adamlevy

    With the update the system crashes about every 40 minutes, which appears to be a little longer than it was lasting before, so something was improved by moving to 3.3.1 but not everything... Also, I don't start getting errors until the system hits about 69% memory usage.

    I just started collecting the points waiting to be written. Here is the graph for the last 30 minutes. There were around 15k points before the crash; now it is down below 100, which is hard to see on the graph in the picture. I just restarted it, though, so we'll see if it fails again.

    [Image: point-values-to-write.png]

    I lowered the persistent point value throttle to 5,000,000 as you suggested, increased the small batch wait time to 20 ms, and decreased the batch write behind spawn threshold by an order of magnitude to 10,000. I can lower that further if you think it would help, but it was previously set at 100,000 and I didn't want to drop it that drastically all at once. I also lowered the max batch write behind to 6.
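    To be concrete about where these changes live, they are env.properties overrides along the lines of the sketch below. The key names here are illustrative placeholders for the four settings described above, not the literal keys (those are in the docs); the values are the ones I mentioned.

        # Illustrative env.properties overrides (key names are placeholders;
        # check the documentation for the exact property names).
        persistent.pointValueThrottleThreshold=5000000
        db.nosql.smallBatchWaitTime=20
        db.nosql.batchWriteBehindSpawnThreshold=10000
        db.nosql.batchWriteBehindMaxInstances=6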

    What's strange to me is that I don't think I am seeing abnormal IO wait % time.

    [Image: mango-stats-full.png]

    You can see the memory usage hit a cap at 70% and then drop down to around 40% when I restarted Mango. The CPU usage % spiked as well; the blue line is the user usage %. The system and IO usage % are also graphed, but they are all well below 10%. Weighted IO time is maybe a little high, but it's not spiking with the increased load, so that seems strange. The load average, DB writes per second, point value database rate, number of open files, and MangoNoSQL open shards graphs are coming from Mango's /v2/server/system-info endpoint.
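    For reference, those Mango-side graphs come from polling that endpoint with Telegraf. A minimal stand-alone check looks roughly like this, with the host, port, and token as placeholders (the path prefix depends on how your install serves the REST API):

        # Placeholder host/port/JWT; adjust for your install.
        curl -s -H "Authorization: Bearer $MANGO_JWT" \
             "http://localhost:8080/rest/v2/server/system-info"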

    posted in User help
  • adamlevy

    Well, it is locking up. REST API response times increased above 2 seconds and now it is not responding at all. And now: Exception .... Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
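    One thing that might help pin this down is having the JVM leave evidence behind the next time this happens. A sketch of extra flags, assuming Java 8 and that the heap options are set through a JAVA_OPTS-style variable in the ext-enabled memory script (adjust the variable name and paths to match your script):

        # Dump the heap when an OutOfMemoryError is thrown (placeholder path).
        JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/mango/logs/oom.hprof"
        # Java 8 style GC logging, to see how much time is spent collecting.
        JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/opt/mango/logs/gc.log"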

    How exactly should I go about tuning the MangoNoSQL database?

    Should I start with the publishers? There are a number of them, and that will require me to log in to multiple Mango instances to adjust them, as I don't have JWT tokens set up for all of them yet.

    I'm still watching iotop to see what is going on with Mango when it starts to error out.

    posted in User help