• P
    Pedro

    @phildunlap said in Times series database storage format, point arrays, and downsampling:

    quite a lot going on there!

    Yes, I was considering whether to split my post into different topics. Thank you for all your answers.

    The POST /rest/v2/script/ run allows you to submit a script and get the results of its run.

    That sounds very interesting. I just completed a Python script using an XSRF token. Nice: I like how it eliminated the need to log in.

    One could easily downsample their data via script, which I can provide a simple or efficient example of if desired.

    If you already have an existing script, it would be nice if you could post it under its own forum topic, as I'm sure many of us would like to downsample old data. In addition to downsampling old data, I have numerous points that were logged much too often due to initially setting a log tolerance threshold that was too small. However, I won't have time to run this script right away due to my other project.

    there is no data type that is "array of data type: numeric" and handled in Mango as such.

    You understood me correctly. I'm looking to store an array of readings (as in multiple channels for each timestamp). Basically, a 2D numerical array where the rows are for different timestamps and the columns are the same type of data type but from different sources (channels). If it were stored in a CSV or spreadsheet, it would look like this:

    timestamp t+0, channel[1],channel[2],channel[3],channel[4],channel[5],...,channel[1000]
    timestamp t+1, channel[1],channel[2],channel[3],channel[4],channel[5],...,channel[1000]
    timestamp t+2, channel[1],channel[2],channel[3],channel[4],channel[5],...,channel[1000]
    timestamp t+3, channel[1],channel[2],channel[3],channel[4],channel[5],...,channel[1000]
    timestamp t+4, channel[1],channel[2],channel[3],channel[4],channel[5],...,channel[1000]
    ...
    

    It seems to me that in order to reduce data redundancy (by not storing the same timestamp multiple times) I could store the data in HDF5 format. HDF5 includes the metadata for the stored information, so the data can be retrieved into a meaningful format using generic tools, even without the source code that stored it. Additionally, it can efficiently compress and decompress binary data such as numerical arrays. My array elements could be any number of bytes. HDF5 is also extremely fast.

    Summary Points - Benefits of HDF5

    • Self-Describing: The datasets within an HDF5 file are self-describing. This allows us to efficiently extract metadata without needing an additional metadata document.
    • Supports Heterogeneous Data: Different types of datasets can be contained within one HDF5 file.
    • Supports Large, Complex Data: HDF5 is a compressed format that is designed to support large, heterogeneous, and complex datasets.
    • Supports Data Slicing: "Data slicing", or extracting portions of the dataset as needed for analysis, means large files don't need to be completely read into the computer's memory or RAM.
    • Open Format - wide support in many tools: Because the HDF5 format is open, it is supported by a host of programming languages and tools, including open source languages like R and Python and open GIS tools like QGIS.

    I also found TsTables, a PyTables wrapper that will enable storing time stamped arrays into HDF5 files in daily shards and seamlessly stitch them together during queries. Appends are also efficient. The HDF5 tools will also help for debugging whether any inconsistency is occurring during the read or write operation.

    posted in Mango Automation general Discussion
  • P
    Pedro

    I'm looking into reading over 2,000 new datapoints, each reporting once per minute, and storing them either in Mango or in another time-series database. Most values are 16 bits wide, but many may be only 8 bits wide, and all could be stored with the same timestamp each minute. This is a lot of data, so I'm looking to store it efficiently, and then probably analyze and visualize it with the Anaconda Python toolset.

    I currently have almost 1,300 enabled points in a Mango installation. I recommend the command line tool jfq to use JSONata queries to quickly answer such questions:

    $ jfq '$count(dataPoints[enabled=true].[name])' Mango-Configuration-Aug-01-2019_000500.json
    1289
    The .[name] is not needed, but I like to list the point names as a sanity check before invoking the count.

    This brings me to Mango's NoSQL time-series database:

    1. How does Mango's time-series database store binary or numerical values?
    2. Are all point values stored as 64 bits? Even binary values?
    3. Is each TSDB entry 128 bits? (64bit timestamp+64bit value)?

    Every four 16-bit readings would be transmitted packed into one 64-bit word, with the same timestamp from each data source, either over MQTT-SN into a broker and on to the Mango MQTT client subscriptions, or over CoAP and REST PUTs. (Incidentally, if you ever integrate an MQTT broker into Mango, please be sure it supports MQTT-SN like the Mosquitto broker.) Those four stored numbers can then be used to generate the two additional numbers I need for each timestamp. Because those two values can be calculated upon retrieval, storing them would be redundant and would bloat storage needs by another 50%.

    To summarize, I could pack four 16-bit values into a 64-bit pointvalue, and then generate the remaining two points when I read and unpack the pointvalue, thus generating my desired six 16-bit values. Unpacking could almost be done with metadata functions, but metadata functions are triggered when the data enters Mango rather than when the data is retrieved from Mango's TSDB.
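
The packing and unpacking described above can be sketched in a few lines of Python. This is only an illustration: the function names, the big-endian field order, and the placeholder formulas for the two derived values are assumptions, not anything Mango provides.

```python
import struct

MASK16 = 0xFFFF

def pack4(a, b, c, d):
    """Pack four unsigned 16-bit readings into one unsigned 64-bit integer."""
    for v in (a, b, c, d):
        if not 0 <= v <= MASK16:
            raise ValueError(f"{v} does not fit in 16 bits")
    return (a << 48) | (b << 32) | (c << 16) | d

def unpack4(word):
    """Recover the four 16-bit readings, most significant field first."""
    return tuple((word >> shift) & MASK16 for shift in (48, 32, 16, 0))

def expand(word, derive):
    """Unpack the four stored readings and append the values derived from them.

    `derive` is a caller-supplied function (hypothetical here) that maps the
    four stored readings to the two calculated ones.
    """
    stored = unpack4(word)
    return stored + tuple(derive(*stored))

word = pack4(100, 200, 300, 400)
wire = struct.pack(">Q", word)  # 8 bytes, big-endian (assumed wire order)
six = expand(word, lambda a, b, c, d: (a + b, c + d))  # placeholder formulas
```

One caveat worth checking before relying on this: if the database holds numeric values as 64-bit floating-point numbers rather than 64-bit integers, only integers up to 2^53 are represented exactly, so a fully packed 64-bit word would not survive a round trip. Whether that applies to Mango is exactly what questions 1 to 3 above ask.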

    • Have you considered implementing ad-hoc metadata functions, where the data is generated when it is retrieved rather than when it is stored? This would reduce the need to store data that can simply be calculated from other stored points. Depending on the ad-hoc metadata point's Logging properties, calculated points could either be stored or recalculated each time they're retrieved. Metadata point value calculations can be performed on stored data now by clicking the metadata point source's Generate history function, but they're not ad-hoc in that they are not automatically generated when the metadata point is queried for data that it does not have.

    The most efficient storage in my case would be where each 2,000-element row begins with a single timestamp, and each column is a different data value. Since the timestamp will be the same for all the points, that timestamp could be stored only once. Rare missing values could be stored with a special value like a NaN. This avoids the need to write 2,000 identical timestamps to disk. Calculating the two derived values (out of every six) upon retrieval from disk, rather than storing them, would further reduce the data storage demands by 1/3.
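
To put rough numbers on the row layout described above, here is a minimal Python sketch that encodes one timestamp followed by N unsigned 16-bit values and compares it with a per-point layout. The 16-byte per-point entry (64-bit timestamp plus 64-bit value) is only the hypothesis from question 3 above, not a confirmed Mango format, and the `MISSING` sentinel is a stand-in for NaN, which an unsigned 16-bit integer cannot represent.

```python
import struct

MISSING = 0xFFFF  # hypothetical sentinel for a missing reading

def encode_row(timestamp_ms, values):
    """Encode one 64-bit timestamp followed by N unsigned 16-bit values."""
    cells = [MISSING if v is None else v for v in values]
    return struct.pack(f">q{len(cells)}H", timestamp_ms, *cells)

def decode_row(blob):
    """Inverse of encode_row; sentinel cells come back as None."""
    n = (len(blob) - 8) // 2
    ts, *cells = struct.unpack(f">q{n}H", blob)
    return ts, [None if c == MISSING else c for c in cells]

def per_point_bytes(n_points, ts_bytes=8, value_bytes=8):
    """Size if every point stores its own timestamp and a 64-bit value."""
    return n_points * (ts_bytes + value_bytes)

row = encode_row(1564617600000, [0] * 2000)
row_size = len(row)                  # 8 + 2000 * 2 bytes
point_size = per_point_bytes(2000)   # 2000 * 16 bytes
```

Under these assumptions a 2,000-value row takes 4,008 bytes against 32,000 bytes for 2,000 separate entries, before any compression. Reserving 0xFFFF as the sentinel means a genuine reading of 65,535 could not be stored.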

    • Is there any way to handle point arrays in Mango, where each array index is a different point? e.g. voltage[0], voltage[1], ... voltage[1000]?
    • As the data ages, it could be down-sampled (decimated) from 1-minute intervals to 10-minute intervals. Most time-series databases support automatic downsampling. Are there any plans to add automated downsampling to Mango? Purging should not be the only option for old point data.
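
As an illustration of the 1-minute to 10-minute downsampling mentioned above, the following Python sketch groups samples into fixed buckets and keeps the mean of each. Averaging is an assumed choice; strict decimation would keep one sample per bucket instead. Nothing here uses a Mango API: `samples` is simply a list of (timestamp in ms, value) pairs.

```python
from collections import defaultdict
from statistics import mean

def downsample(samples, bucket_ms=600_000):
    """Average (timestamp_ms, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start_ms, mean_value) sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its bucket.
        buckets[ts - ts % bucket_ms].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]

# Twenty 1-minute samples collapse into two 10-minute averages.
one_minute = [(i * 60_000, float(i)) for i in range(20)]
ten_minute = downsample(one_minute)
```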

    I've considered using InfluxDB. Although it has downsampling policies, it seems that it does not support arrays; the closest data element it supports is a 64-bit number. Consequently, I would still have to pack and unpack my 16-bit values. One consequence of having to pack and unpack values is that the visualization tools are unlikely to render the stored data directly. This forces me to choose between direct visualization and storage efficiency.

    Thank you for your input.

    posted in Mango Automation general Discussion
  • P
    Pedro

    As predicted, the Point values to be written has continued to be fairly flat since yesterday's configuration change. It is fluctuating between 276,180 values and 276,230 values, except for a brief spike to 276,300 values. It is not trending up or down. I wonder what will happen next time I generate history on a Metadata point.

    posted in User help
  • P
    Pedro

    @phildunlap said in Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?:

    the best way to zero it out would be restarting Mango.

    Ironically, restarting Mango will result in a prolonged data outage while the values are being written. Therefore it becomes a matter of choosing which data I want to lose (today's, or the accumulated values). I also wonder if those values are from a particular point, or from random points.

    posted in User help
  • P
    Pedro

    Thanks very much: 3.5 hours after changing the TSDB settings, I see that the Point values to be written has leveled off, on average. Plotted on a one day scale, it will surely look like a flat horizontal line. Now it would be nice to figure out how to make it go down from 276,196 values to zero.

    My current configuration:

    $ jfq 'systemSettings' Mango-Configuration-Jul-17-2019_201152.json | grep -i nosql
      "mangoNoSql.writeBehind.statusProviderPeriodMs": 5000,
      "mangoNoSql.writeBehind.maxInstances": 10,
      "mangoNoSql.backupHour": 4,
      "mangoNoSql.backupEnabled": true,
      "mangoNoSql.backupMinute": 0,
      "mangoNoSql.writeBehind.maxInsertsPerPoint": 10000,
      "mangoNoSql.backupPeriods": 1,
      "mangoNoSql.backupFileCount": 3,
      "mangoNoSql.backupIncremental": false,
      "mangoNoSql.backupFileLocation": "/mnt/WesternDigitalUSB/mango-backup",
      "mangoNoSql.corruptionScanThreadCount": 100,
      "systemEventAlarmLevel.NOSQL_DATA_LOST": "CRITICAL",
      "mangoNoSql.writeBehind.minInsertsPerPoint": 1000,
      "mangoNoSql.writeBehind.stalePointDataPeriod": 60000,
      "mangoNoSql.writeBehind.batchProcessPeriodMs": 500,
      "mangoNoSql.writeBehind.maxRowsPerInstance": 100000,
      "mangoNoSql.backupLastSuccessfulRun": "Jul-13-2019_040000",
      "mangoNoSql.writeBehind.backdateDelay": 4985,
      "action.noSqlBackup": "",
      "mangoNoSql.writeBehind.stalePointCleanInterval": 60000,
      "action.noSqlRestore": "",
      "mangoNoSql.writeBehind.minDataFlushIntervalMs": 100,
      "mangoNoSql.backupPeriodType": "WEEKS",
      "mangoNoSql.writeBehind.spawnThreshold": 100000,
      "mangoNoSql.intraShardPurge": false
    

    posted in User help
  • P
    Pedro

    @terrypacker said in Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?:

    Are you seeing any Data Lost events?

    I have not seen Data Lost events. However, since I've had to kill Mango to complete the last several restarts, if there were a data loss it would not have generated an event at that time.

    If so these should enlighten us as to why this is happening, they get raised whenever a batch fails to write.

    Thanks, that's reassuring. I had not remembered that there was such an event.

    I would make sure that event level is set to something other than Do Not Log too.

    I had apparently set that event to critical. I did not see any such events, and in the last month I re-enabled forwarding of critical event emails through my mobile phone network's email-to-SMS gateway. My phone sounds an attention-getting submarine dive alarm when receiving Mango text messages.

    Do you have any Alphanumeric points that would be saving very large strings of text?

    Only one. I tried listing ALPHANUMERIC points in the table on the data_sources page, but for some reason it would not show a full query response (there were hourglasses at the bottom of the list). In any case, the JSONata command-line utility jfq enables me to specify more detailed reporting of the Mango configuration.

    Query:

    # Show name and loggingType and context update of all enabled alphanumeric points that are configured to log their values:
    $ jfq 'dataPoints[pointLocator.dataType="ALPHANUMERIC"][enabled=true][loggingType!="NONE"].[name,enabled,loggingType,pointLocator.context]' Mango-Configuration-Jul-17-2019_000500.json
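
For readers without jfq, roughly the same filter can be written in plain Python against the exported configuration. This is a sketch: it assumes the export has a top-level `dataPoints` list whose entries carry the `name`, `enabled`, `loggingType`, and `pointLocator.dataType` fields used in the query above, and the function names are made up for illustration.

```python
import json

def logged_alphanumeric_points(config):
    """Return (name, loggingType) for enabled, logged ALPHANUMERIC points."""
    return [
        (p.get("name"), p.get("loggingType"))
        for p in config.get("dataPoints", [])
        if p.get("enabled") is True
        and p.get("loggingType") != "NONE"
        and p.get("pointLocator", {}).get("dataType") == "ALPHANUMERIC"
    ]

def load_config(path):
    """Read an exported Mango configuration JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# Small inline stand-in for a real export, with made-up point names.
sample = {
    "dataPoints": [
        {"name": "Tide table", "enabled": True, "loggingType": "ON_CHANGE",
         "pointLocator": {"dataType": "ALPHANUMERIC"}},
        {"name": "Voltage", "enabled": True, "loggingType": "ON_CHANGE",
         "pointLocator": {"dataType": "NUMERIC"}},
        {"name": "Old note", "enabled": False, "loggingType": "ON_CHANGE",
         "pointLocator": {"dataType": "ALPHANUMERIC"}},
    ]
}
matches = logged_alphanumeric_points(sample)
```

With a real export, `logged_alphanumeric_points(load_config("Mango-Configuration-Jul-17-2019_000500.json"))` would apply the same filter to the file used above.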
    

    I generate a large JSON structure to predict the tides based on pressure sensor data. The table is displayed in a Mango 2.x graphical view via a server-side script graphical object that converts the JSON to an HTML table. The table is apparently using the Alphanumeric_Default template, which saves "When point value changes." However, it only updates context when the tide direction changes, which is only a handful of times per day. The only other alphanumeric point being logged has a context that triggers only twice a year.

    The NoSQL database has a limit to the size of each entry, which is large, but it's possible this could cause these symptoms.

    The Point values to be written rises by about 1,700 values per day, which far exceeds the number of times the tide changes direction in one day.

    I see there is now a Failed Login event. That's wonderful, thanks.

    I just changed

    the point clean interval (ms) and the stale data period (ms) to 60000

    I will report back whether or not I see a change in the trend of the Point values to be written. It should take an hour or two to see a change.

    Thanks for your help.

    posted in User help
  • P
    Pedro

    @phildunlap said in Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?:

    If it's just the count getting off and not values accruing in memory, it's not a big problem.

    Thanks. Hopefully that's the case, but I think there are accrued values hiding somewhere, as that would explain the prolonged shutdowns I experienced when I restarted Mango, whereby I eventually had to kill Mango. Accrued values aside, I need to be able to generate history after creating or changing metadata points, as they are of limited value without a history. I'm thinking that whatever is underlying this symptom is also affecting my ability to generate metadata value histories.

    Sometimes when I generate history I'm wondering if I'm going to push Mango over the edge, where excessive resource demands cause a positive feedback loop of error messages that demand more resources, eventually requiring a restart for Mango to catch up. This occurred a year or two ago, but I can't recall whether it was a history generation or an Internet outage (and inability to send email alarms) that kicked off the positive feedback loop.

    It would be nice to have automatic ad-hoc metadata points as a feature enhancement: if the metadata value is not found in the TSDB point log (e.g. because it was not logged), then it should be calculated at the time it is retrieved. This would result in calculating historical values only when the data is being retrieved, thus reducing computational load and the storage demand, yet still making it available for review or download on the occasion where the data is needed for analysis. Since the value would be calculated upon retrieval, values that are never retrieved would not have to be calculated or recorded. The values could optionally be logged when they are calculated, as they are now. Live values would still be displayed. Should I submit this feature request on Github?
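
The requested behaviour amounts to compute-on-read with optional write-back. A minimal Python sketch of the idea follows; the class and its in-memory dictionary are entirely hypothetical and do not correspond to any Mango API or to how the TSDB works internally.

```python
class AdHocPoint:
    """Hypothetical metadata point that computes missing values on retrieval."""

    def __init__(self, compute, log_computed=False):
        self._stored = {}            # stand-in for the point's logged history
        self._compute = compute      # function: timestamp -> value
        self._log_computed = log_computed
        self.computations = 0        # counts how often compute() actually ran

    def get(self, timestamp):
        # Serve a logged value when one exists.
        if timestamp in self._stored:
            return self._stored[timestamp]
        # Otherwise calculate it at retrieval time.
        self.computations += 1
        value = self._compute(timestamp)
        if self._log_computed:
            self._stored[timestamp] = value  # optional write-back
        return value

# Values that are never requested are never calculated or stored.
point = AdHocPoint(lambda ts: ts * 2, log_computed=True)
first = point.get(21)
second = point.get(21)  # served from the write-back copy
```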

    posted in User help
  • P
    Pedro

    @phildunlap

    Have you checked your log file and events for any errors in writing?

    I did not find any ERRORs, but when I searched for tsdb I found WARNings:

    07-14-2019-1.ma.log:WARN  2019-07-14T13:46:10,746 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T05:05:10,802 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T06:35:10,970 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T12:13:51,286 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T12:31:48,767 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T13:20:10,772 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T16:35:10,747 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-12-2019-1.ma.log:WARN  2019-07-12T17:35:11,002 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-13-2019-1.ma.log:WARN  2019-07-13T12:53:03,262 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-13-2019-1.ma.log:WARN  2019-07-13T16:04:19,245 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-14-2019-1.ma.log:WARN  2019-07-14T04:35:10,769 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-14-2019-1.ma.log:WARN  2019-07-14T06:20:10,757 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1 
    07-14-2019-1.ma.log:WARN  2019-07-14T13:46:10,746 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
    

    I had noticed that message a few weeks ago, after running a Generate History on a metadata point, but I had thought that it was related to a large meta point history generation and that it would resolve itself once the task queues caught up and CPU utilization fell back down. By the time I noticed the rising values to be written, I had forgotten about the backdate issue. I think the Generate History function may be triggering or exacerbating the problem. However, the values to be written count was rising daily long before I started the Generate History. Either way, I don't know what to do about it.

    About how fast is it rising?

    The rate is irregular when viewed on an hour timescale: the samples to be written may go down every few minutes, but the increases exceed the decreases and a linear trend is seen in the graph. It has been averaging out to an additional 1,700 values per day. 1,440 min / 1,700 values ≈ 0.847 min/value ≈ 50.8 sec/value, so not your typical cron job interval. Viewed on a 1-day timescale, the values to be written plot looks like a straight line with a steady rising slope.

    System uptime is 3,500 hours (almost 146 days). 3,500 h / 272,458 values ≈ 46.24 seconds/value over the last 146 days. 146 days ago the values to be written was 445,000, rising at 14,000 values/day. At restart that day the values to be written reset to zero, then rose at roughly 1,800 values/day. That must be the day I last upgraded Mango. If I remember correctly, I had to kill Mango many minutes after initiating shutdown because it was taking too long to complete and I was concerned about prolonging a Mango outage.
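
The rate arithmetic above can be rechecked with a few lines of Python. The inputs are the figures quoted in this post; the function names are just for illustration.

```python
SECONDS_PER_DAY = 86_400

def seconds_per_value(values, hours):
    """Average interval between queued values over an uptime window."""
    return hours * 3600 / values

def values_per_day(values, hours):
    """Average daily growth in queued values over an uptime window."""
    return values / (hours / 24)

recent_interval = SECONDS_PER_DAY / 1_700              # from 1,700 values/day
uptime_interval = seconds_per_value(272_458, 3_500)    # whole uptime window
uptime_rate = values_per_day(272_458, 3_500)           # values/day since restart
uptime_days = 3_500 / 24

print(f"{recent_interval:.1f} s/value at 1,700 values/day")
print(f"{uptime_interval:.2f} s/value over {uptime_days:.1f} days")
print(f"{uptime_rate:.0f} values/day averaged since the restart")
```

The whole-uptime average comes out slightly above the recent 1,700 values/day, which is consistent with the faster growth reported around the Generate History runs.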

    $ date -d 'now - 146 days'
    Tue Feb 19 11:44:35 EST 2019
    

    I poked around for large changes in internal data source readings before and after the update, and today, to see if they correlate with the rate of change in values to be written:

    • Backdates running: usually 0 then and now.
    • Backdates queued: Usually close to 0; 1 today. Spiked above 100 on June 20. The values to be written rose at a much faster rate around that time, but then the slope returned to 1,700 values/day. This was probably the day I ran the Generate History on a metadata point.
    • Consecutive run polls: currently 0; spiked from 0 to 205,000 on June 15, then suddenly down to 0 on June 27.
    • Currently scheduled high priority tasks is typically 12,000-18,000. Does not seem to correlate with other issues.
    • Point value write threads is currently 3. It was 5 before the Feb 19 restart, and appears to be going up in proportion to the number of values to be written.
    • Total backdates is currently 216,293. It had climbed slowly to 1,200 at which time it was reset to 0 at the Feb 19 restart. Since then it climbed slowly to around 3,000 on May 1, at which point it shot up suddenly to almost 60,000 that day, then resumed its slower rate of increase. On June 21 it shot up suddenly from about 60,000 to 215,000, then resumed its slower increase rate.

    I believe the total backdates shot up when I ran a Generate History, and the system has not caught up with the backlog. This is odd because the CPU is usually more than 80% idle and CPU % IO wait is currently 0. If I had not run the Generate History, it seems that these numbers would still be increasing indefinitely, but not at such a fast rate.

    posted in User help
  • P
    Pedro

    I graphed the Point values to be written three hours before and after the change in TSDB settings. It continues to rise at the same rate, and is now at 267,341 points. I saw no visible difference in the graph.

    The Database batch writes per second varies between 85 and 100, as it did before.

    posted in User help
  • P
    Pedro

    @phildunlap

    Thanks for your help. Yes, the values to be written are still climbing at a steady rate, as they have been for months at least, probably since the last time I restarted Mango. I presume there were a lot of accumulated points to be written during my upgrade to 3.5.6, as that would explain the long shutdown time.

    The TSDB setting values I posted yesterday were either the default when I converted to TSDB years ago, or they were changed to new values recommended by IA, probably around the day I converted to TSDB. I doubt I would have the courage to change them on my own without understanding their meaning. Since I have not seen documentation regarding the meaning of each of these settings, I have been afraid to change them. Before noticing that the values to be written was consistently increasing, I also did not see a need to change the settings, so I left them alone.

    Per your suggestion I just changed only the following settings (old ==> new):

    max write behind tasks: 100 ==> 10
    increase the minimum time before flushing a small batch: 1 ==> 100
    increase the maximum batch size: 1 ==> 10000 
    increase the minimum batch size: 1 ==> 1000
    

    The default settings are geared toward quality of service, not necessarily throughput. I would only expect persistent TCP syncs or extreme numbers of points / poll rates to encounter that

    There are 58 data sources (of which 15 are metadata sources). Most of the data sources are Modbus/TCP with a typical polling rate of either 1, 2, or 5 seconds.

    There are about 1300 enabled data points, though many are from metadata sources.

    (or slow disks, which are aided by batching writes, generally).

    I don't think the issue is slow disks because iowait is low.

    It's too soon to see if the trend of increasing values to be written changed. I'll check the values to be written later, and see if it started to decrease, then report back.

    Thanks again

    posted in User help