Big Data / Data Science
Gaining information and knowledge from large amounts of data
We are Big Data / Data Science experts from Switzerland
Words like 'big data' have unfortunately been devalued by the marketing departments of many companies. The field of data science, however, is by no means just a marketing buzzword: it is an established Master's programme at the largest university in Switzerland, the University of Zurich. For the following reasons, PolygonSoftware is the optimal partner for you.
We are trained data scientists and have the latest knowledge from research at the University of Zurich.
PolygonSoftware develops the entire data science stack, from data collection to the presentation of the generated knowledge.
PolygonSoftware specialises in frontend technologies for interactive presentation of data with maximum flexibility.
We also deliver partial projects, such as data visualisation alone, competently and reliably.
Data science pipelines developed by PolygonSoftware can be hosted by large cloud operators in a cost-effective and scalable way.
Our references in Big Data / Data Science
Many companies have become accustomed to collecting and storing data. Such data can be, for example, sensor data from machines, stock levels in the logistics hall or project information in management. This data can contain a vast amount of information, but only very few companies know how to extract valuable knowledge from these data sets.
Big Data - Collect, store and organise data
The term Big Data generally refers to the fact that companies collect a lot of data that could be relevant for information gathering. This data is often produced by machines, but can also be customer data, website data or financial data. Data generally comes in two types: structured or unstructured. While structured data has a form that is easy for computers to interpret (for example, a database or even an Excel file), unstructured data is not comprehensible to computers per se. For example, it could be a large text file with log entries from a production machine. The first step in data science or big data projects is therefore always to collect all relevant data and to convert as much of it as possible into a structured form.
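Converting unstructured data into a structured form can be as simple as parsing log lines into records. A minimal sketch, assuming a hypothetical log format for a production machine (the field names and format are invented for illustration):

```python
import re

# Hypothetical log format from a production machine (illustrative only):
# "2024-03-01 14:22:05 MACHINE-07 speed=1450 temp=61.3"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<machine>\S+) speed=(?P<speed>\d+) temp=(?P<temp>[\d.]+)"
)

def parse_log_line(line: str) -> dict:
    """Turn one unstructured log line into a structured record."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    record = match.groupdict()
    # Convert numeric fields from text so they can be queried and aggregated.
    record["speed"] = int(record["speed"])
    record["temp"] = float(record["temp"])
    return record

record = parse_log_line("2024-03-01 14:22:05 MACHINE-07 speed=1450 temp=61.3")
```

Once every line is a record with typed fields, the data can be loaded into a database and searched like any other structured source.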
When collecting the data, it is important to ensure that the data can be easily searched later. It is therefore a good idea to use a database to store it. Furthermore, one quickly runs the risk of storing too much data and thus an unmanageable amount of irrelevant data. To solve this problem, industry experts are needed who can assess the relevance of data and decide where data points can be aggregated. For example, one could decide not to store all data from a production machine every second, but to aggregate it over the course of a production cycle and only store certain statistics: average speed, maximum/minimum speed, and so on.
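The aggregation step described above can be sketched in a few lines. This is a minimal illustration, assuming per-second speed readings for one production cycle (the statistics kept are the ones named in the text):

```python
from statistics import mean

def aggregate_cycle(speeds: list[float]) -> dict:
    """Reduce per-second speed readings of one production cycle
    to a few summary statistics instead of storing every reading."""
    return {
        "avg_speed": mean(speeds),
        "max_speed": max(speeds),
        "min_speed": min(speeds),
        "samples": len(speeds),
    }

# Four seconds of readings collapse into one small record.
stats = aggregate_cycle([100.0, 120.0, 110.0, 90.0])
```

Instead of thousands of raw readings per cycle, only one compact record per cycle ends up in the database.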
Collecting the data is often not enough. Data sets often have gaps, are contaminated by measurement inaccuracies or do not contain the relevant information. Data is often in the wrong format, distributed in different places, has duplicates or is simply irrelevant or too confusing for algorithms.
It is therefore almost always advisable to involve a data scientist at the data collection stage to ensure that all the data relevant to solving a problem is available in the right form and in the right place. In addition, algorithms can already be developed that process the data, interpolate missing data points and transform incorrect formats.
Data visualisation: displaying data to gain information
The field of data visualisation deals with the representation of complex sets of data using graphs, colours and shapes. The visualisation of data is intended to help people get a quick overview of even large amounts of data and extract information from it. The simplest and probably most common example of data visualisation can be found in the financial market. To describe a share price at a certain point in time, five key figures are commonly used: open price, close price, high, low, and price gain/loss. If you want to analyse the price of a share over months, you could put together an Excel spreadsheet with these five columns and 365 rows representing the past trading year (assuming the exchange is open every day, as cryptocurrency exchanges are).
However, long-term investors are not very interested in the absolute number on a given day: for them, the trend in the data is relevant. They try to see regularities, patterns or correlations in the data. To achieve this, stock data is visualised in candlestick charts. A candle visualises the high/low with its wicks and the open/close prices with its body. The colour of the candle (red/green) indicates whether the day brought a price gain or loss. 365 days would therefore be visualised with 365 candles. It is easier to see patterns and regularities in this data than in the raw data.
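Building one candle from a day's price series is straightforward. A minimal sketch with invented prices, following the five key figures described above:

```python
def make_candle(prices: list[float]) -> dict:
    """Summarise one trading day's price series as a candlestick:
    open, close, high, low, plus the candle colour (green = gain)."""
    candle = {
        "open": prices[0],
        "close": prices[-1],
        "high": max(prices),
        "low": min(prices),
    }
    candle["colour"] = "green" if candle["close"] >= candle["open"] else "red"
    return candle

# One day of (invented) intraday prices becomes one candle.
candle = make_candle([101.0, 99.5, 104.2, 103.0])
```

Repeating this for every trading day yields the 365 candles mentioned above, ready to be drawn by any charting library.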
Data dashboards: Identifying relationships between different data
Data visualisations are often combined in dashboards. A dashboard contains multiple graphs and visualisations from related data pools. For example, a dashboard can be created for a production chain of machines, showing different statistics for all machines: failures over time, production speeds, machine temperatures and so on. By displaying all this data side by side, one can see correlations between the data sets. Such a correlation could look like this, for example: machines in the back of the factory run hotter and hotter towards the evening. The same machines then have breakdowns during the night. It could therefore be that the greater cooling of the machines during the night causes these failures.
Even this simple example illustrates the value of data visualisation. The correlation between the heat and the failures turns out to be indirect: at the time of the failures, the machines were running at normal temperature. The preceding heat phase is responsible for the failures that occur later. At the same time, the visualisations reveal the correlation between the heat and the position of the machines in the factory. So the problem in this example is not with the machine itself at all, but with the ventilation of the factory floor.
Data science: gaining information from data without looking at it
The supreme discipline of Big Data is ultimately the field of data science. This is the science of extracting information directly from data without visually processing it for a human. Smart algorithms detect anomalies, relationships and irregularities in the data and pass them on to the data scientist. In the context of the previous example, this would mean that an algorithm notices the relationship between location, temperature and failures and passes this information directly to the operator.
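A very simple instance of such automatic detection is z-score anomaly detection: flag any reading that lies unusually far from the mean. A minimal sketch with invented temperature readings (the threshold of 2 standard deviations is a common but arbitrary choice):

```python
from statistics import mean, stdev

def find_anomalies(values: list[float], threshold: float = 2.0) -> list[int]:
    """Return the indices of readings whose z-score exceeds the
    threshold, i.e. values far from the mean in units of standard
    deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [i for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

# Invented temperature readings: one reading (95) is clearly abnormal.
readings = [60, 61, 59, 62, 60, 95, 61, 60]
anomalous_indices = find_anomalies(readings)
```

Production systems use far more sophisticated methods (seasonal models, machine learning), but the principle is the same: the algorithm, not a human looking at a chart, raises the flag.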
Such algorithms are difficult to develop and often rely on machine learning or artificial intelligence techniques to find different correlations in the large amounts of data. However, before you can develop a new algorithm that extracts knowledge from data, you need to be clear about what you want the algorithm to do in the first place.