Do you manage your data…or does it manage you?
Graham Green, measurements product manager at National Instruments, discusses data management, the number one concern on engineers' minds, and explains how to take control of it.
I was surprised to learn recently that data management is rated as the toughest software challenge facing engineers today. I would have assumed that this dubious honour went to maintaining legacy code or architecting a new application. However, according to a global survey undertaken by National Instruments, data management is the number one concern on engineers' minds.
The survey results seem to ring true. At least once a week, I hear statements like ‘Where is the file saved from the test run last Monday?’ or ‘We’re adding 20 channels. That’s going to mess up my file I/O!’ or even ‘What on earth is a .r7z extension and how do I open it?’
These problems may seem like routine irritations of day-to-day work in the measurement and control business, but dig a little deeper and you will see that they are costing engineering companies a great deal of money.
DEUTZ, a German engine manufacturer, recently published a statement that standardising on a single data management approach across all areas of design, test and production had saved it $2.5 million in just two years. Most of this saving was driven through minimising test repetition, with improved access to past data and better data sharing between departments.
Although a standardised approach to data acquisition, analysis, storage and management should be at the forefront of test managers' minds, engineers at the sharp end of design or test can adopt additional practices to work more productively. These broadly fall under two headings: choosing the best file format and making your files searchable.
A file format for all occasions
I am something of a creature of habit and generally save my data as ASCII files because they are easy to read and share. However, I have realised there is a better way. While building a distributed data logger, I needed to stream a lot of data from dozens of channels onto a flash drive, and I very quickly discovered the limitations of ASCII and the benefits of more advanced, engineering-specific file formats. Now I choose my file type carefully, based on the needs of my application.
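One limitation of ASCII that shows up quickly when streaming many channels is file size. The Python sketch below uses hypothetical data (1,000 scans of 24 channels, not values from a real logger) and compares the bytes needed to store the same values as comma-delimited text, written with `repr` so they round-trip at full precision, and as packed 64-bit binary floats:

```python
import struct


def ascii_size(rows: list[list[float]], delimiter: str = ",") -> int:
    """Bytes needed to store the rows as delimited ASCII text, one scan per line."""
    text = "\n".join(delimiter.join(repr(v) for v in row) for row in rows)
    return len(text.encode("ascii"))


def binary_size(rows: list[list[float]]) -> int:
    """Bytes needed to store the same values as little-endian 64-bit floats."""
    flat = [v for row in rows for v in row]
    return len(struct.pack(f"<{len(flat)}d", *flat))


# Hypothetical acquisition: 1,000 scans of 24 channels.
scans = [[(scan * 24 + ch) / 7 for ch in range(24)] for scan in range(1000)]
text_bytes = ascii_size(scans)
packed_bytes = binary_size(scans)
print(text_bytes, packed_bytes)
```

Writing fewer digits would shrink the text file but lose precision, and every ASCII value must still be formatted on write and parsed on read, work that a binary format avoids.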
There are occasions when I need other file formats, but for most measurement applications TDMS (Technical Data Management Streaming) is best suited. This open file format combines structured header information with binary data storage. Its main advantage is that hierarchical header information, or metadata, can be saved at the file, channel-group and individual-channel levels, allowing real flexibility when documenting, organising and searching test data.
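To illustrate why a three-level hierarchy helps with searching, here is a minimal Python sketch of file, group and channel properties. It is not the TDMS API or the TDMS on-disk layout (third-party libraries such as npTDMS exist for working with real TDMS files). The class names, the property names and the rule that a channel inherits its group's and file's properties during a search are all assumptions made for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Group:
    name: str
    properties: dict = field(default_factory=dict)
    channels: list = field(default_factory=list)


@dataclass
class MeasurementFile:
    properties: dict = field(default_factory=dict)
    groups: list = field(default_factory=list)


def find_channels(mfile, **criteria):
    """Return 'group/channel' paths whose properties match every criterion.

    Assumption for this sketch: when searching, a channel inherits file-level
    properties, overridden by its group's, overridden by its own.
    """
    matches = []
    for group in mfile.groups:
        for channel in group.channels:
            effective = {**mfile.properties, **group.properties, **channel.properties}
            if all(effective.get(key) == value for key, value in criteria.items()):
                matches.append(f"{group.name}/{channel.name}")
    return matches


# Hypothetical engine test with two channel groups.
engine_test = MeasurementFile(
    properties={"operator": "J. Smith", "test_date": "2014-01-01"},
    groups=[
        Group("Engine", {"sensor_type": "thermocouple"},
              [Channel("Oil temp", {"unit": "degC"}), Channel("Coolant temp", {"unit": "degC"})]),
        Group("Vibration", {"sensor_type": "accelerometer"},
              [Channel("Block X", {"unit": "g"})]),
    ],
)
by_unit = find_channels(engine_test, unit="degC")
by_operator_and_sensor = find_channels(engine_test, operator="J. Smith", sensor_type="accelerometer")
```

Storing the sensor type once at group level, rather than repeating it on every channel, is what lets a single query combine test-wide and channel-specific attributes.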
In search of the perfect file
Acquiring data and saving it to disk is really only the first chapter in the story of measurement data. The purpose of data is to help us make decisions, and for this we often need to identify specific segments that show particular or abnormal behaviour in the system. These can be hard to find, especially when searching in data recorded by another person or department.
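A simple way to locate such segments, assuming that "abnormal" means exceeding a fixed limit, is to scan a channel for contiguous runs above a threshold. The function `find_segments` and its `limit` and `min_length` parameters below are hypothetical, and thresholding is only one of many possible criteria:

```python
def find_segments(samples, limit, min_length=1):
    """Return (start, end) index pairs, end exclusive, of runs where |sample| > limit.

    Runs shorter than min_length samples are ignored, which filters out
    single-sample spikes when min_length > 1.
    """
    segments = []
    start = None
    for i, value in enumerate(samples):
        if abs(value) > limit:
            if start is None:
                start = i  # a new out-of-limit run begins here
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that is still open at the end of the record.
    if start is not None and len(samples) - start >= min_length:
        segments.append((start, len(samples)))
    return segments


trace = [0.1, 0.2, 5.0, 6.0, 0.3, 0.1, 7.0, 0.0, 8.0, 9.0, 9.5]
all_runs = find_segments(trace, limit=1.0)
long_runs = find_segments(trace, limit=1.0, min_length=2)
```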
Results from the National Instruments survey showed that 75% of data is, in fact, used by someone other than the person who took the measurement. Therefore, to enable all involved colleagues to search effectively, it is essential to have well-organised, accessible metadata that identifies key attributes such as test time, channels, units, sensor type, and maximum and minimum results. Here are my top three tips for formatting metadata:
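Recording attributes such as these when a file is saved makes it possible to answer questions like "Where is the file from last Monday's test run?" without reopening the raw data. The sketch below assumes a hypothetical catalogue with one `FileRecord` per saved file; the field names, file names and values are illustrative only:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalogue entry describing one saved measurement file."""
    path: str
    test_time: datetime
    channel: str
    unit: str
    sensor_type: str
    minimum: float
    maximum: float


def make_record(path, test_time, channel, unit, sensor_type, data):
    # Compute min/max once, when the file is saved, so later searches
    # never have to reopen the raw data.
    return FileRecord(path, test_time, channel, unit, sensor_type, min(data), max(data))


def search(records, *, sensor_type=None, after=None, max_above=None):
    """Return paths of records matching every criterion that is not None."""
    hits = []
    for r in records:
        if sensor_type is not None and r.sensor_type != sensor_type:
            continue
        if after is not None and r.test_time < after:
            continue
        if max_above is not None and r.maximum <= max_above:
            continue
        hits.append(r.path)
    return hits


catalogue = [
    make_record("run_001.csv", datetime(2014, 1, 6, 9, 30), "Oil temp", "degC", "thermocouple", [82.1, 95.4, 101.2]),
    make_record("run_002.csv", datetime(2014, 1, 13, 14, 0), "Oil temp", "degC", "thermocouple", [80.0, 88.5, 90.3]),
    make_record("run_003.csv", datetime(2014, 1, 13, 15, 0), "Block X", "g", "accelerometer", [-2.5, 0.1, 3.2]),
]
over_limit = search(catalogue, sensor_type="thermocouple", max_above=100.0)
since_jan_10 = search(catalogue, after=datetime(2014, 1, 10))
```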
- Keep the file easy to read. It helps to have only one property per line and to use the same delimiter throughout the entire file. The most common delimiters are the comma, tab and semicolon. Pay attention to how dates are formatted: a date written as January 1, 2014 may not be read correctly if your delimiter is a comma, whereas 01/01/14 is much easier to read programmatically.
- The header should be the first section in your file. If you run multiple tests and append header information to the end of the file, colleagues who look at your data may not realise that there are multiple sets of data within a single file. This results in a great deal of unnecessary scrolling in Microsoft Excel or manipulation in data management tools such as NI DIAdem. It is best to create a new file whenever you need to write a new header.
- When you organise metadata for measurement files, you usually have different levels of information. You have properties that are applicable for a whole test (date, operator, overall status) and properties for specific channels (unit, channel name). The metadata should be organised to reflect this. The separation could be a blank line between the two sections or possibly a row of asterisks. Metadata that is for the entire file is usually organised in a format with a property name and a value, while metadata for channels has a property name followed by N values where N represents the number of channels of data you are collecting. An easy way to do this is to use a file format, like TDMS, that has this inherently built in. However, if you use another file format, it may be worth laying out your file in a text editor to make sure it is clear before you start adding real data.
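The three tips can be combined in a plain delimited text file. The following Python sketch, built on the standard `csv` module, writes each test run to a new file with the header first: test-level properties one per line, a blank separator line, channel-level properties as a name followed by N values, another blank line, and then the data. The property names, the required `channel name` row and the use of ISO 8601 dates (2014-01-01, which, like 01/01/14, contains no commas and also avoids the day-first versus month-first ambiguity) are assumptions of this sketch, not a standard:

```python
import csv
import os
import tempfile
from datetime import date

DELIMITER = ","  # one delimiter, used for every line of the file


def write_measurement(path, file_props, channel_props, rows):
    """Write one test run to a brand-new file: header first, then data.

    file_props:    {name: value} for the whole test (date, operator, status)
    channel_props: {name: [one value per channel]}, e.g. channel name, unit
    rows:          data rows, one value per channel
    """
    # Assumption: every file carries a "channel name" row that fixes N.
    n_channels = len(channel_props["channel name"])
    for name, values in channel_props.items():
        if len(values) != n_channels:
            raise ValueError(f"{name!r} needs {n_channels} values")
    # Mode "x" fails if the file already exists, so a second run's header
    # can never be appended below an earlier run.
    with open(path, "x", newline="") as f:
        writer = csv.writer(f, delimiter=DELIMITER)
        for name, value in file_props.items():  # one property per line
            writer.writerow([name, value])
        writer.writerow([])  # blank line between test-level and channel-level metadata
        for name, values in channel_props.items():
            writer.writerow([name, *values])  # property name followed by N values
        writer.writerow([])  # blank line before the data
        writer.writerows(rows)


def read_measurement(path):
    """Split a file written by write_measurement back into its three sections."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=DELIMITER))
    first = rows.index([])
    second = rows.index([], first + 1)
    file_props = {name: value for name, value in rows[:first]}
    channel_props = {row[0]: row[1:] for row in rows[first + 1:second]}
    data = [[float(v) for v in row] for row in rows[second + 1:]]
    return file_props, channel_props, data


# A long-form date breaks naive comma splitting into three fields...
naive_fields = "date,January 1, 2014".split(DELIMITER)
# ...so the header below stores the date in ISO 8601 form instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_001.csv")
    write_measurement(
        path,
        {"date": date(2014, 1, 1).isoformat(), "operator": "J. Smith", "status": "pass"},
        {"channel name": ["Oil temp", "Coolant temp"], "unit": ["degC", "degC"]},
        [[82.1, 75.0], [83.4, 75.2]],
    )
    header, channels, data = read_measurement(path)
    try:
        write_measurement(path, {}, {"channel name": []}, [])
        overwritten = True
    except FileExistsError:
        overwritten = False
```

Because `csv.writer` quotes any field that contains the delimiter, a date such as January 1, 2014 would in fact survive a round trip through this code; the problem arises when a colleague or another tool splits lines naively, as `naive_fields` shows.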
File formats such as TDMS, which are designed specifically for engineering purposes, and intelligent data mining and analysis tools such as NI DIAdem, can help. However, at the end of the day, it is down to each of us to think beyond the measurement we are taking today, towards the decision we must make tomorrow.
Source: Control Engineering Europe - All Articles