Johnson Matthey Technol. Rev., 2022, 66, (2), 198

doi:10.1595/205651322x16445719154043

Unlocking Scientific Knowledge with Statistical Tools in JMP®

Benefits and challenges of new statistical tools

  • Pilar Gómez Jiménez*
  • Johnson Matthey, Blounts Court, Sonning Common, Reading, RG4 9NH, UK
  • Andrew Fish
  • Johnson Matthey, PO Box 1, Belasis Avenue, Billingham, TS23 1LB, UK
  • Cristina Estruch Bosch
  • Johnson Matthey, Blounts Court, Sonning Common, Reading, RG4 9NH, UK
  • *Email: pilar.gomez.jimenez@matthey.com

PEER REVIEWED
Received 6th June 2021; Revised 26th November 2021; Accepted 7th December 2021; Online 5th April 2022

Article Synopsis

The value of using statistical tools in the scientific world is not new, although the application of statistics to disciplines such as chemistry creates multiple challenges that are identified and addressed in this article. The benefits, explained here with real examples, far outweigh any short-term barriers in the initial application, overall saving resources and obtaining better products and solutions for customers and the world. The accessibility of data in current times combined with user-friendly statistical packages, such as JMP®, makes statistics available for everyone. The aim of this article is to motivate and enable both scientists and engineers (referred to subsequently in this article as scientists) to apply these techniques within their projects.

1. The Benefits

Cost reduction is possibly the first benefit considered when talking about statistical tools, especially with respect to statistical design of experiments (DoE). However, cost is not the only advantage or even the most significant one. Here is a list of some of the benefits which are discussed in this article:

  • Cost and resource savings

  • Capacity for planning

  • Reliable conclusions, better decisions

  • Utilising historical data

  • Gaining control and adaptability

  • Recording and transferring knowledge

  • Visualisation – improving communication

  • Statistical significance for more objective decisions

  • Comparing to choose the right tool

  • Systematic and structured approach.

1.1 Cost and Resource Savings

DoE and multivariate statistical approaches have previously been identified as a clear way of saving time and resources (1). They are systematic and structured approaches to product development and process improvement. The methodology is based on introducing variability into the system by changing a limited number of variables at controlled levels simultaneously, but systematically, in order to study the parameter space. The aim of a DoE is to maximise the knowledge obtained while minimising experimentation. It can also help to ‘fail quickly’: if, for example, the outcome of the study is that the variability cannot be explained by changes in any of the variables, it is clear early on that further variables need to be considered. This again saves time and resources.

Statistical modelling of designed or undesigned data can provide a predictive model. This model can be used to predict the output from a combination of the inputs that has not been tried before experimentally, providing that the combination is within the experimental space. Therefore, the predictive aspect of the model can potentially save unnecessary experimentation in the future.
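
As a minimal sketch of this idea (not JMP® output), the Python below fits a model with an interaction term to a two-factor, two-level design and predicts the response at an untried combination. The response values are hypothetical, and the refusal to predict outside the coded range of –1 to +1 is an assumed way of enforcing the 'within the experimental space' condition.

```python
from itertools import product

# Coded 2x2 full factorial: each factor at -1 (low) and +1 (high).
runs = list(product([-1, 1], repeat=2))
# Hypothetical measured responses, one per run (illustrative values only).
y = [60.0, 70.0, 72.0, 90.0]

n = len(runs)
# For an orthogonal two-level design, each least-squares coefficient is
# the average of (model column value x response).
b0 = sum(y) / n
b1 = sum(x1 * yi for (x1, _), yi in zip(runs, y)) / n
b2 = sum(x2 * yi for (_, x2), yi in zip(runs, y)) / n
b12 = sum(x1 * x2 * yi for (x1, x2), yi in zip(runs, y)) / n


def predict(x1, x2):
    """Predict the response at coded settings inside the studied range."""
    if not (-1 <= x1 <= 1 and -1 <= x2 <= 1):
        raise ValueError("outside the experimental space: prediction not supported")
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


# An untried combination inside the experimental space.
untried = predict(0.5, -0.5)
```

With four runs and four coefficients the model reproduces the measured corners exactly, so in practice additional runs (such as centre points or replicates) would be needed to judge how far the predictions can be trusted.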

Although it is difficult to quantify and compare like-for-like, some studies in the pharmaceutical space stated that projects involving multivariate experimentation required 50–70% fewer batches than traditional experimentation, such as a ‘one factor at a time’ (OFAT) approach. As a result, the total number of product development weeks was reduced by at least 43% (1), illustrating that time can be saved using this approach.

Historically, multivariate experimentation has not been very accessible to non-statisticians, but it now is thanks to user-friendly statistical software packages like JMP® (from SAS Institute, USA). JMP® offers extensive capabilities to design and analyse all types of DoE, together with visualisation tools which allow the user to understand which experiments are being carried out and to visualise and communicate the results effectively. Despite the potentially remarkable savings, the implementation cost should be relatively low, since it only requires making a software package like JMP® available to scientists and having a good internal network of support and coaching to share good practices and new methods.

Although cost reduction is possibly the strongest advantage of using statistical tools, it is not the only one. The structured approach embedded in statistical experimentation allows a better project planning process with known schedules.

1.2 Capacity for Planning

When planning an experimental programme, the use of DoE methodology provides several advantages over a traditional approach. The scientists involved in the work must first identify all the variables, thinking about the entire system, which helps to ensure that the project scope is properly assessed and clearly defined at the outset. The variables should be split into those which can be changed (factors or inputs) and those which are affected by these changes and are measured (responses or outputs). Are the factors continuous or categorical in data type? Can the factors be controlled during the experiments? If they cannot be controlled, should they be measured? Which of the factors will be fixed as part of the scope of the experimental programme and which will be varied? What are the ranges of each factor? How many levels will be used for each factor? These questions are best answered by a team of scientists with pre-existing knowledge and skills. The planning stages of the DoE are crucial to its outcome and should not be overlooked.

The DoE methodology of variable identification facilitates project definition and considers the experimental design space in its entirety before focusing on the parts of interest. The experimental design space is the total space defined by the factor ranges. This must be carefully chosen by the scientist to ensure that the aims of the experiment can be achieved. An illustration of the design space for a three-factor experimental design is shown in Figure 1, where the design space is the three-dimensional volume within the cube, and experiments can take the form of any combination of the three factors within this design space. Considering the whole design space in this way is often neglected during the planning of non-DoE type experimental work.

Fig. 1.

Three-dimensional plot of a three-factor experimental design, with one factor each on the x, y and z axes. The points represent experimental runs (green is a centre point), and the volume enclosed by the points is defined as the experimental space. In this screening design, all points except the centre point are at the extreme (high or low) settings of the factor ranges
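
A design of the kind shown in Figure 1 can be sketched in a few lines of Python: eight corner runs at the extremes of three factors plus one centre point, with coded levels (–1, 0, +1) converted to real settings. The factor names and ranges below are hypothetical and chosen only for illustration; this is not how JMP® generates its designs.

```python
from itertools import product

# Hypothetical factors and ranges (low, high); names and units are illustrative.
ranges = {
    "temperature_C": (40.0, 80.0),
    "pH": (6.0, 9.0),
    "time_min": (10.0, 30.0),
}


def to_actual(coded, low, high):
    """Map a coded level in [-1, +1] onto the real factor range."""
    centre = (high + low) / 2
    half_range = (high - low) / 2
    return centre + coded * half_range


# Eight corner runs (every high/low combination) plus one centre point.
coded_runs = list(product([-1, 1], repeat=len(ranges))) + [(0, 0, 0)]

design = [
    {name: to_actual(level, *ranges[name]) for name, level in zip(ranges, run)}
    for run in coded_runs
]
```

Working in coded units keeps the design independent of the measurement units, so the same matrix can be reused when the factor ranges change.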

The experimental matrix generated by the DoE is also important in project scheduling. Access to the full set of planned experiments at the start of the project helps when assigning resources and gives management a good estimate of how long the programme will take. It also discourages interim interpretation of the data because the full set of results is necessary for analysis. This is in direct contrast to typical ‘reactive’ laboratory practice, whereby each experiment is analysed immediately afterwards and used to inform the next experiment. In this traditional way of working, the end of the programme is not clear because the total number of experiments has not been defined, and the programme is therefore likely to take longer. DoE is a more proactive approach with a clear timeline for project management purposes. It ensures that the full dataset is available before analysis, decreasing the chances of drawing incorrect conclusions or subjectively changing the parameters of the project based on the results of the latest experiment.

An example of this proactive approach, coupled with the demands of a tight project schedule, was demonstrated within Johnson Matthey. An online analyser was loaned from an instrument manufacturer to investigate whether it could be used to monitor a chemical reaction in real time on plant. The analyser was only available for two weeks, so it was important to study its effectiveness as efficiently as possible by collecting spectral data covering a range of reaction product mixtures. The aim was to provide enough variation to ensure that robust calibrations for each component in the product mixture could be established within the range of expected online process conditions. A screening DoE was generated to assess the influence of six factors on the spectral response for each reaction product mixture. Two centre points, set in the middle of the factor ranges, were included to determine whether non-linear relationships between the factors and the spectral response could be present, and to establish whether the spectral response was repeatable. The DoE generated 17 experiments, which were run in a randomised order (Table I). Apart from the two centre points, these experiments were combinations of high- and low-level settings for each of the six factors, chosen so that no two factors were correlated and the effect of each factor on the response could be quantified independently (Figure 2).

Table I

Experimental Matrix for 17-Run Screening Design with Six Factors (X1–X6)

Experiment   X1   X2   X3   X4   X5   X6
1            +1   –1   +1   –1   +1   +1
2a            0    0    0    0    0    0
3            –1   +1   –1   +1   +1   +1
4            +1   +1   +1   +1   +1   –1
5            –1   +1   +1   –1   –1   –1
6            –1   –1   –1   –1   +1   –1
7            +1   –1   –1   +1   –1   –1
8            +1   +1   +1   –1   –1   –1
9            –1   –1   +1   –1   +1   –1
10           –1   +1   –1   +1   –1   –1
11           +1   +1   –1   –1   –1   +1
12           +1   –1   –1   +1   +1   –1
13           –1   –1   +1   +1   –1   +1
14           +1   –1   +1   +1   –1   +1
15           +1   +1   –1   –1   +1   +1
16           –1   +1   +1   +1   +1   +1
17a           0    0    0    0    0    0

a Experiments 2 and 17 are repeat centre points. Note that the run order has been randomised
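
The properties described for this kind of screening design (high and low settings, uncorrelated factors, repeat centre points and a randomised run order) can be illustrated with a short Python sketch. This is neither the matrix in Table I nor JMP® output: it is a generic two-level fractional factorial for six factors, with the generators X5 = X1·X2·X3 and X6 = X2·X3·X4 assumed for illustration, giving 16 corner runs plus two centre points (18 runs in total).

```python
import random
from itertools import product

# 2^(6-2) fractional factorial: four base factors, two generated columns.
# The generators X5 = X1*X2*X3 and X6 = X2*X3*X4 are an assumed choice,
# not taken from the article.
corner_runs = [
    (a, b, c, d, a * b * c, b * c * d)
    for a, b, c, d in product([-1, 1], repeat=4)
]
centre_points = [(0,) * 6, (0,) * 6]
design = corner_runs + centre_points


def correlation(u, v):
    """Pearson correlation between two equal-length columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    var_u = sum((x - mean_u) ** 2 for x in u)
    var_v = sum((y - mean_v) ** 2 for y in v)
    return cov / (var_u * var_v) ** 0.5


# Largest absolute correlation between any two factor columns.
columns = list(zip(*design))
max_abs_corr = max(
    abs(correlation(columns[i], columns[j]))
    for i in range(6)
    for j in range(i + 1, 6)
)

# Randomise the run order; the seed is fixed only to make the sketch repeatable.
run_order = design[:]
random.Random(0).shuffle(run_order)
```

In this sketch every pair of factor columns has zero correlation, which is what allows each main effect to be estimated independently of the others.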

Fig. 2.

Scatter plots of six factors in 17-run experimental matrix. The centre point is coloured in green. One point may represent multiple runs in the matrix

Excellent calibrations for all components of the reaction product mixture were obtained, and the instrument manufacturer commented on how well the design space had been explored in the time available using the DoE. Following successful demonstration that this online analyser could be used to monitor the reaction in all expected conditions, proposals were submitted recommending its purchase and operation on a customer plant. The advantage offered by statistical tools to draw trustworthy conclusions is an aspect which deserves proper consideration.

1.3 Reliable Conclusions, Better Decisions

Trustworthy conclusions obtained from a study and its data are necessary to make the right decisions. The conclusions obtained from the data are only as good as the data itself. Therefore, the quality of the data is a key aspect. For example, if the data is biased or unbalanced, there is a possibility of obtaining inaccurate conclusions which could lead to unsuccessful or suboptimal decisions for the system or process. The use of statistical tools to plan the study should ensure good quality data and therefore increase the probability of drawing reliable conclusions.

‘Universal versus local optimum’ is an issue which can occur if the data to be analysed is not a good representation of the experimental space being studied. In that case, the data analysis can lead to a local optimum of conditions to maximise the output, while the universal optimum is still to be discovered (Figure 3). In the figure, traditional experimentation obtained data only along the red path, leading the scientists to a local optimum. However, within the experimental space defined, a better outcome is possible but has not been found. This is what is referred to here as the universal optimum for the experimental space.

Fig. 3.

Local vs. universal optimum issue which can be encountered when using traditional experimentation such as OFAT. The darker shaded areas represent a higher response
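
The effect can be reproduced with a small Python sketch. The response surface below is hypothetical, with a small peak near (1, 1) and a larger one near (4, 4). OFAT, starting from (0, 0), optimises one factor at a time, whereas the alternative evaluates settings spread across the whole space. A full grid is used here only for simplicity; a designed experiment would aim to cover the space with far fewer runs.

```python
from itertools import product
from math import exp


def response(x, y):
    """Hypothetical response surface with a small peak near (1, 1) and a
    larger peak near (4, 4); purely illustrative."""
    small_peak = 0.6 * exp(-((x - 1) ** 2 + (y - 1) ** 2))
    large_peak = exp(-((x - 4) ** 2 + (y - 4) ** 2))
    return small_peak + large_peak


levels = range(6)  # candidate settings 0..5 for both factors

# OFAT: start at (0, 0), optimise x with y fixed, then y with x fixed.
x_best = max(levels, key=lambda x: response(x, 0))
y_best = max(levels, key=lambda y: response(x_best, y))
ofat_optimum = (x_best, y_best)

# Space-covering alternative: evaluate every combination of the levels.
grid_optimum = max(product(levels, levels), key=lambda p: response(*p))
```

Under these assumptions OFAT settles on the smaller peak because the path it follows never passes near the larger one.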

DoE leads to obtaining the right data since the experiments are designed to study the effect of the selected variables and understand the system or process in the most efficient way. It ensures the data is balanced and distributed within the experimental space, allowing unbiased and relevant conclusions about the system to be extracted. JMP® software is a leader in statistical DoE, making multiple state-of-the-art designs available for scientists to choose from depending on the specific case.

When dealing with historical data, which could be biased (for example, rich in certain areas of the experimental space and sparse in others), the risk of finding a local optimum instead of the universal optimum is significant. Working with historical data can not only lead to suboptimal decisions but can also be time consuming, so the use of statistical tools within JMP® can significantly support this process.

1.4 Utilising Historical Data

Advanced data analytics may also be applied effectively to existing datasets. There are many instances in research and development (R&D) and manufacturing where large datasets have been generated from previous work programmes which could prove useful as a starting point for the current project of interest. Rather than starting completely from scratch, it may be possible to identify trends and relationships between variables from this existing data. This has the advantage of utilising historical data, much of which was probably expensive and resource-intensive to generate. The use of exploratory data analysis tools within JMP® facilitates this process.

An example of analysing historical data with an exploratory approach has been demonstrated within Johnson Matthey at a catalyst manufacturing site. Two separate plants were involved successively in the production of a single catalyst product, and the multivariate tools within JMP® were used to determine which of the process inputs most affected the properties of the intermediate material (output of Plant 1), and then which of these as inputs affected the properties of the finished catalyst product (output of Plant 2). The process data used in this analysis was taken from several years of production on both plants. An example of part of the exploratory data analysis used for this example is shown in Figure 4, where the distribution and graph builder platforms of JMP® were used to visualise relationships between variables. Based on the results of the analysis and the predictive models created, process settings were changed to optimise catalyst product properties and both plants now have higher rates of meeting target specifications.

Fig. 4.

Exploratory data analysis of a historical dataset showing ‘dynamic linking’ within JMP®, whereby data points highlighted in one visualisation also appear highlighted in another visualisation side-by-side. These plots show that a high value of the Y1 response is generally only achieved when X2 is at a low setting, when X1 is low or mid-range, and is not really dependent on the X3 setting. Assessing the data in this way helps to establish relationships between the variables which can inform modelling of the dataset

Limitations may exist in the historical data, and are likely to be present if the data was collected using a traditional OFAT approach rather than from a designed set of experiments. In this case, it is important to identify where multicollinearity exists and how this affects the analysis of the dataset and the conclusions drawn. The multivariate and exploratory tools within JMP® allow these limitations to be visualised and understood, enabling the scientist to make informed decisions about what the data is showing while being mindful of the underlying assumptions. It also provides an opportunity for sequential experimentation, whereby the existing data, although limited, can be used as a starting point for a subsequent DoE which can deconvolute the limitations in the historical data, resolving the correlated effects and suggesting the best combination of experiments in parts of the design space with fewer existing data points. Alternatively, the understanding gained from mining the historical dataset may be used to focus on fewer significant effects for a new experimental design with additional factors. Predictive models bring further advantages, such as gaining control and adaptability.
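
A first check for multicollinearity in a historical dataset can be sketched in Python by computing the correlation between every pair of inputs and flagging the strongly correlated ones. The data values and the 0.8 cut-off are assumptions made for illustration; they are not from the article, and this is not the JMP® multivariate platform.

```python
from itertools import combinations

# Hypothetical historical records; in undesigned data, inputs often move together.
history = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "X2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # tracks X1 closely
    "X3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}

THRESHOLD = 0.8  # assumed cut-off for "strongly correlated"


def correlation(u, v):
    """Pearson correlation between two equal-length columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    var_u = sum((x - mean_u) ** 2 for x in u)
    var_v = sum((y - mean_v) ** 2 for y in v)
    return cov / (var_u * var_v) ** 0.5


# Pairs of inputs whose effects cannot be separated reliably from this data.
flagged = [
    (a, b)
    for a, b in combinations(history, 2)
    if abs(correlation(history[a], history[b])) > THRESHOLD
]
```

Here only the X1 and X2 pair is flagged, so any effect attributed to one of them could equally belong to the other until new, designed runs break the correlation.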

1.5 Gaining Control and Adaptability

An important advantage of the predictive capacity of a model is the control over the system or process that it offers. It allows the scientist to respond to the outputs and modify the inputs in a system or process to adapt to a new situation, keeping the system or process on target. For example, if the value of one of the input variables changes for external reasons outside our control, the model will indicate what the values of the other, controllable inputs should be to keep the output on target, compensating for the change without any further experimentation. This brings control back to the users and offers tremendous flexibility and adaptability, which are very important qualities in a fast-moving world. This is often used within Johnson Matthey in different businesses, for example in formulations for certain products, to ensure the quality of the final or intermediate product by proactively adapting to changes in the raw materials.

This task is performed easily in JMP® using the interactive ‘prediction profiler’ (Figure 5). The profiler also allows the scientist to find a new optimum combination of input values if the output target changes (for example, a new customer specification), or when an input needs to be fixed at a certain value (for example, a new requirement or limitation). The profiler will find the optimal combination of the remaining input variables to stay on target.

Fig. 5.

Snapshot of interactive prediction profiler tool in JMP® showing: (a) the recommended values of Inputs 1 and 2 to obtain a target output of 90%; (b) how the output doesn’t get to the target when Input 1 is forced to 1000, keeping Input 2 at the previous level; (c) the recommended value of Input 2 when Input 1 has to be equal to 1000 in order to reach the target output (90%)
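
The calculation behind scenario (c) can be sketched in Python for a simple case. The model form and coefficients below are hypothetical (they are not the model behind Figure 5), and the sketch does not reproduce how the JMP® profiler works internally; it only shows that, once a model is available, the setting of one input needed to hit a target can be solved for when another input is fixed.

```python
# Hypothetical fitted model (coefficients are illustrative, not from the article):
#   output = B0 + B1*input1 + B2*input2 + B12*input1*input2
B0, B1, B2, B12 = 20.0, 0.03, 1.5, 0.0005


def predict(input1, input2):
    """Predicted output (%) for given settings of the two inputs."""
    return B0 + B1 * input1 + B2 * input2 + B12 * input1 * input2


def solve_input2(target, input1):
    """Return the input2 value that puts the predicted output on target
    when input1 is fixed (the model is linear in input2)."""
    slope = B2 + B12 * input1
    if slope == 0:
        raise ValueError("output does not depend on input2 at this input1")
    return (target - B0 - B1 * input1) / slope


# Input 1 is forced to 1000; find the Input 2 setting that restores the target.
input2_needed = solve_input2(target=90.0, input1=1000.0)
```

In practice the recommended setting should also be checked against the range of Input 2 that was actually studied, since the model is only supported inside the experimental space.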

Control over systems and processes is not the only advantage of data modelling. Another very important aspect is related to knowledge storage.

1.6 Recording and Transferring Knowledge

In a scientific process, data is generated to obtain answers to technical questions, prove or contradict hypotheses and corroborate assumptions in the process of discovery or optimisation. Therefore, the data itself is a vehicle for obtaining knowledge. Knowledge is the final aim, but that knowledge ideally needs to be recordable, communicable and transferable to maximise its use.

Statistical modelling allows knowledge to be extracted from a study or from data in the shape of a model that helps to communicate and visualise the effects of the different inputs on the output. The model itself contains this knowledge and allows the rest of the world to utilise that knowledge.

Within JMP®, statistical modelling is accessible to everyone with multiple modelling techniques available and the ability to compare them easily. In addition, the software offers the prediction profiler tool (Figure 5), which not only enables scientists to visualise and communicate their findings (contained in a model) dynamically and interactively, but also to transfer and share the learnings with colleagues in the same team and between different teams and functions. Utilising these tools can ensure that the knowledge obtained from experimentation stays in the company in a reusable format despite employees leaving or retiring.

Another aspect to facilitate knowledge sharing comes from the understanding of a chemical problem or question. Sometimes this can be very subjective and variable depending on the scientist’s background, expertise and interests. JMP® tools offer enhanced visualisations for different stages in the process to ensure good communication and visualisation of problems and results.

1.7 Visualisation – Improving Communication

Visualisation tools are used in different steps of data analysis and are key to helping understand and communicate the chemical problem studied. In the first instance, they are used to explore the dataset. This process is very important as it helps the scientist to get to know the data: to understand the experimental space and to identify possible gaps, outliers and errors. As mentioned before, this stage is particularly important when looking at historical data as this data tends to be limited. It can also help the scientist to identify correlations between the inputs and the outputs before embarking on model building. The ‘graph builder’ and ‘distribution’ platforms available in JMP® are excellent tools to use at this stage (Figure 4). They are also great tools to present a point or argument in a meeting since they are interactive and easy to understand. All these visualisations can also be shared using dashboards that can be produced in JMP® very easily, and the interactivity is retained (Figure 6). Dashboards, in the same way as other visualisations, can be converted into HTML so they can be explored without the need to have JMP®. Dashboards allow scientists to present key findings and can support stakeholders with decision making.

Fig. 6.

Snapshot of a dashboard generated in JMP®. Different visualisations and reports of the analysis carried out can be added to dashboards and the interactivity is retained

Once the model is built, the effect of the inputs on the outputs can be visualised using the prediction profiler (Figure 5), which is one of the most powerful tools available in JMP®. As already mentioned, this allows the scientist to explore the effect of the factors and better understand the chemical problem. It is also a great tool to communicate the process and the effect of the factors. JMP® allows these visualisations to be saved in an interactive format which can be shared across different functions. An example of utilising these tools to generate value has been demonstrated within Johnson Matthey. When the commercial team received enquiries regarding the use of a product under certain conditions, they had to contact the development team to access the information. The research team has now built a model as a result of a response surface DoE. The model has been shared with the commercial team using the interactive prediction profiler. With this, the commercial team can predict the performance depending on the conditions suggested by the customers. This tool has provided the commercial team with more autonomy and a quicker response to the customer and has saved time for the development team. Statistical tools can help us not only to visualise data but also to make objective decisions.

1.8 Statistical Significance for More Objective Decisions

The use of statistics in disciplines such as physics, biology, medicine and finance is common (2, 3). However, in our experience, its use in chemistry has been sparse despite it being a useful tool, and some would say, indispensable.

The aim of experimentation is typically exploratory, to gain understanding, or to optimise a process. Although the objective might be different, a tool that helps to differentiate between the experimental variability and the effect of a particular input is needed. This is where statistics can help to make more informed decisions. Statistical tests are carried out to understand if results are statistically significant or not. When talking about statistically significant results, we refer to those results obtained by testing or experimentation that are unlikely to have occurred by chance and are instead attributable to a specific cause. Often p-values are used to describe this. Although the inappropriate use of p-values in some cases has brought controversy (4–7), they can be very useful. It is important to remember that the conclusions drawn from statistical tests should be interpreted within the context of the study (sample size, reliability and validity of the instruments used to measure the outputs).

An example of this within Johnson Matthey has been a comparison study between several analysers (Figure 7). The statistical tool facilitated the visualisation and helped to establish the significance of the differences found between the measurements obtained in the analysers when dealing with the same samples. These types of studies are crucial to ensure the reproducibility of results.
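
To make the idea of a p-value concrete, the Python sketch below compares repeat measurements from two analysers using an exact permutation test. This is a swapped-in technique chosen because it needs only the standard library; it is not the oneway analysis used in JMP®, and the measurement values are hypothetical.

```python
from itertools import combinations

# Hypothetical repeat measurements of one sample on two analysers.
analyser_1 = [10.1, 10.3, 10.2, 10.4]
analyser_3 = [9.6, 9.5, 9.8, 9.7]


def mean(values):
    return sum(values) / len(values)


observed = abs(mean(analyser_1) - mean(analyser_3))
pooled = analyser_1 + analyser_3
n1 = len(analyser_1)

# Exact permutation test: relabel the pooled measurements in every possible
# way and count how often the difference is at least as large as observed.
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), n1):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    # Small tolerance guards against floating-point rounding in the means.
    if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
        extreme += 1

# Two-sided p-value: the share of relabellings at least as extreme as the data.
p_value = extreme / total
```

A small p-value indicates that a difference of this size would rarely arise from random relabelling alone. As the text notes, it still has to be read in the context of sample size and measurement reliability.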

As seen so far, the toolbox is quite extensive and sometimes that can be slightly overwhelming. For example, when generating a DoE, it is possible to be intimidated by the choice of design types available. However, JMP® has features to help when evaluating and comparing designs.

Fig. 7.

Example of oneway analysis in JMP® for measurements of the same sample on three different analysers showing significant difference between Analyser 3 and the other two analysers, especially Analyser 1. Analyser 3 provides on average significantly lower measurements than the other two analysers

1.9 Comparing to Choose the Right Tool

The choice of design depends upon the aims of the project (screening or optimisation) and the resolution required (main effects, higher order terms). Classical DoEs (full factorial and fractional factorial designs) are no longer used as often as increasingly popular modern designs (definitive screening designs and bespoke custom designs) (8, 9). The design choice must then be carefully balanced against the resources available (timeframe, cost of running experiments) to decide upon the experimental matrix to be used. More experiments will provide more information about the system, but often this is not possible because of practical or financial constraints. It therefore becomes extremely important to compare multiple designs and understand the relative advantages and disadvantages of each.

This is made possible with the ‘evaluate design’ and ‘compare designs’ tools in JMP®. Potential designs can be opened side-by-side and comparisons made. Power analysis helps to estimate the ability of the design to detect effects of importance by reporting the probability of detecting effects of a given size. Higher powers for model terms result in a greater chance of detecting their effect. Prediction variance profiling displays the uncertainty across the experimental space and can be altered depending on the focus of the design. For example, an optimisation design would try to minimise prediction variance at the centre of the experimental space. Colour maps of correlations show the absolute value of the correlation between any two effects that appear in the prediction model, represented visually with a sequential colour scheme (Figure 8). This helps to identify where factors and higher order terms in the models may be partially or fully confounded, and where one design might have the advantage over another.

Fig. 8.

Colour map of correlations for a three factor response surface design (X1 and X2 are continuous, X3 is 3-level categorical), showing partial correlation of higher order terms
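
The numbers behind such a colour map can be sketched in Python for a deliberately small case. The design below is a half-fraction of a three-factor, two-level design with the assumed generator C = A·B (it is not the response surface design of Figure 8). The sketch computes the absolute correlation between every pair of model terms and lists those that are fully confounded.

```python
from itertools import combinations, product

# Half-fraction of a three-factor, two-level design; generator C = A*B is assumed.
runs = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

# Model terms: three main effects plus the three two-factor interactions.
main = {"A": 0, "B": 1, "C": 2}
terms = {name: [run[i] for run in runs] for name, i in main.items()}
for p, q in combinations(main, 2):
    terms[p + q] = [run[main[p]] * run[main[q]] for run in runs]


def correlation(u, v):
    """Pearson correlation between two equal-length columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    var_u = sum((x - mean_u) ** 2 for x in u)
    var_v = sum((y - mean_v) ** 2 for y in v)
    return cov / (var_u * var_v) ** 0.5


# Absolute correlation for every pair of terms: the values a colour map displays.
corr_map = {
    (s, t): abs(correlation(terms[s], terms[t]))
    for s, t in combinations(terms, 2)
}
confounded = [pair for pair, r in corr_map.items() if r > 0.999]
```

In this four-run design each main effect is fully confounded with one two-factor interaction, which is the price paid for halving the number of runs. Comparing such maps for candidate designs shows where extra runs would buy cleaner estimates.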

The eventual design choice will be unique to the scenario, but evaluation and comparison of multiple designs allows the requirements of the project to be considered against the real-world implications. Running more experiments will provide additional understanding of the system, but resources may only be available for a predefined number of experiments. These tools allow the best choice to be made to carry out experimentation in the most efficient manner to maximise the information that can be gained while also identifying the limitations of the design. The efficiency of statistical design has already been mentioned several times, and this characteristic is due to the systematic and structured nature of this approach.

1.10 Systematic and Structured Approach

The traditional approach to experimentation, which is still taught in most universities, consists of changing one input while keeping the others constant. This provides the certainty, or so it is believed, that the variance observed in the output is due to this change. However, this approach has many pitfalls: there is no way of studying the interaction between two inputs, experimental error is not accounted for and the experimental space is not fully covered. DoE addresses all of these pitfalls: it allows the study of interactions between inputs, accounts for experimental error and covers the whole experimental space. All this provides more control than traditional experimentation.
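
The first of these pitfalls can be shown with a few lines of Python. The four response values are hypothetical. OFAT measures a baseline and then changes each input separately, so it can only predict the combined setting by assuming the two effects add; the factorial design also runs the combined setting and reveals any departure from that assumption.

```python
# Hypothetical responses at the four corners of a two-factor system
# (keys are coded settings of inputs A and B; values are illustrative).
measured = {(-1, -1): 50.0, (1, -1): 60.0, (-1, 1): 55.0, (1, 1): 80.0}

# OFAT uses three runs: baseline, change A only, change B only.
baseline = measured[(-1, -1)]
effect_a = measured[(1, -1)] - baseline
effect_b = measured[(-1, 1)] - baseline
ofat_prediction = baseline + effect_a + effect_b  # assumes the effects simply add

# The factorial design also runs the (+1, +1) corner, exposing the interaction.
factorial_result = measured[(1, 1)]
departure_from_additivity = factorial_result - ofat_prediction
```

With these values the combined setting performs well above the additive prediction, a gain that OFAT would have no way of detecting.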

The process of carrying out statistically designed experimentation follows a structured approach. Initially, the experimental space is decided by the scientist based on experience or prior knowledge. If working in a new area, a pilot trial can be used to help the scientist. Once the first set of experiments has been completed and analysed, further experiments can be planned based on the results obtained, the aim of the experimentation and the number of experiments that can be performed. Experiments to validate the model should also be carried out. The scientist has control over the experimental plan and the statistical tools are only there to facilitate the work. All this is made very easy by JMP®, as it provides different platforms to generate the different designs and augmentations. As already noted, tools to evaluate the designs can also be found in these platforms so the scientist can make an informed decision when selecting the design.

As emphasised extensively in this article, there are multiple benefits from utilising statistical tools for product development and process optimisation. However, their implementation has not been so widely applied, especially in the chemical industry. It is worth highlighting some of the challenges and how to overcome them.

2. The Challenges

These are some of the common challenges found when introducing new statistical tools and software into a well-established technical community:

  • The ‘Excel mind’

  • Learning new software

  • Learning or refreshing statistics

  • Cultural shift

  • Fear of being redundant

  • The timings

  • Black box

  • Irreproducibility.

2.1 The ‘Excel Mind’

Commonly, data logging from scientific equipment and data analysis from experiments are done within Microsoft Excel (Microsoft Corporation, USA). Scientists are familiar with this program, having probably used it daily throughout their careers. There is a reluctance to move away from something to which we are so accustomed, in some cases to the point where we can no longer see the limitations. Microsoft Excel is an excellent spreadsheet program with a simple user interface, which explains why it is used so universally. However, it was never designed with the intention of handling and interpreting large volumes of data. A recent example of its misuse resulted in Public Health England failing to report nearly 16,000 coronavirus cases in 2020 (10). Add-ins are available to perform simple statistical functions, but specialist software like JMP® is required to thoroughly interrogate data and deliver greater understanding.

As well as offering tools that provide greater insight, JMP® has been purposely designed to manipulate and visualise large datasets. This is exemplified by features such as ‘graph builder’ and ‘dynamic linking’, as previously shown in Figure 4. The click-and-drag interface for building graphs in JMP® offers a much simpler workflow for visualising data than creating graphs within Excel. There is also a JMP® add-in available for use in Excel which allows the user to transfer data between the two programs in a single step and quickly access some of the common analysis platforms of JMP®. From experience within Johnson Matthey, we have found that the key to persuading people away from Excel and into specialist software is to show a direct comparison of a typical workflow with real-life data used in that part of the company. The improved visualisation and data analysis are immediately obvious, as are the time savings, which free up more time for scientists to develop new technologies and products in the laboratory rather than handling and formatting data. However, there is also a barrier to overcome when learning to use new software.

2.2 Learning New Software

Johnson Matthey has recognised the benefits of promoting and instilling a culture of advanced data analytics. However, a common barrier to overcome when transitioning to new ways of working is the initial investment of time required to get to grips with new software. For research professionals whose time is a precious commodity, the investment of time needed upfront to learn new techniques and navigate around the software can be a deterrent. This is especially true as this part of the learning curve does not provide any immediate, tangible output. Furthermore, the wealth and variety of training resources available to new software users can make the learning process seem initially overwhelming. From experience within Johnson Matthey, we have found that setting aside time at regular intervals to progress through a predetermined training plan helps to make the process as simple as possible for new users. The training plan can be developed alongside a more experienced software user and will be bespoke to the requirements of the individual, concentrating on the functionality of the software with which the user will primarily be working. The training plan typically includes different resources, such as individual learning (online webinars, e-learning subscriptions, ‘Statistical Thinking for Industrial Problem Solving Modules’ – a free online statistics course provided by JMP®) and group learning (Johnson Matthey specific software introduction courses developed and run by experienced users). There is also an active JMP® user community within Johnson Matthey which was created to support new users, provide an informal environment for sharing knowledge and act as an open forum to ask questions about specific problems.

Demonstrations of Johnson Matthey projects to senior management where the software has been used to improve process understanding have been critical to increase awareness of the benefits the software can bring. This has resulted in management encouraging staff to dedicate time towards software training. The impact of coronavirus has also accelerated this process, as developing new software skills is a task that can be carried out while working from home, either during forced periods of self-isolation or by minimising regular operations on site. But obviously it is not all about learning to use new software, it is also about learning statistics.

2.3 Learning or Refreshing Statistics

As already mentioned, the use of statistics is more common in other fields such as biology or medicine than in chemistry. Traditionally statistics is not a featured component in chemistry undergraduate degrees. If some statistical content was taught in the first years, the learning was not normally reinforced with practical activities further on in the course. This can make chemists uncomfortable around statistics.

Within Johnson Matthey we believe that our scientists can achieve a practical understanding of statistics that complements their chemistry expertise. Practical statistics is now fully accessible through software packages like JMP®. From a practical point of view, learning statistics goes hand-in-hand with learning new software. It allows scientists to practise and learn with their own data, which has proved to be the best way to learn, always supported by the most experienced users within the company and learning from each other’s cases. This process is not easy because it requires a total change of culture.

2.4 Cultural Shift

While DoE methodology has been applied experimentally for decades, it is only relatively recently that its usage has gathered momentum across many scientific disciplines. This is due to a combination of advances in the algorithms used to tailor designs to the experiments and an increasing industrial need for rapid experimentation and decision making. Examples of these advances include addressing design space constraints (11), handling mixture-type factors (12), comparing the effectiveness of different designs (13), and introducing uncertainty in the factors and optimising using a variability simulator (14). However, for traditionally trained scientists who are used to changing one factor at a time (OFAT) in accordance with the scientific method, the transition to DoE methodology can be met with trepidation. There can be a concern that the scientist’s skills are not being fully utilised and that the recommended experiments in the matrix will not be enough to understand the system. Overcoming these anxieties is a significant challenge, particularly within established R&D departments. At Johnson Matthey, the way this has been approached is to demonstrate the power of DoE on small projects across a range of technology areas, and actively promote these results to the rest of the company, increasing the visibility and viability of the DoE methodology. This generates additional interest and establishes confidence in the methods so that scientists have more faith in using DoE for larger, more complicated projects. The functionality of software such as JMP® to create designs, analyse the results and present the conclusions is essential in facilitating this cultural shift.

At Johnson Matthey, the key principle when driving this transition to an advanced experimentation and statistical approach to data is to empower our scientists to do it themselves. The scientists are the technical experts in their respective areas, and by giving them the understanding, tools and training to create and analyse DoE it is believed that this will result in better outcomes for both the current project and future work programmes.

Indeed, recognising the technical expertise of scientists in their respective areas when deploying statistical software like JMP® is an important step in overcoming a major concern: the fear of being substituted by a computer or a machine (15).

2.5 Fear of Being Redundant

The media can be overwhelming in this respect: listening and reading continuously about artificial intelligence, robotics, automation and machine learning. However, technical expertise will always be necessary, and the human being has proved to be indispensable in many fields. Statistical techniques like DoE are not designed to substitute the chemical expertise of a human scientist but to work in conjunction with them to get the most out of experimentation, and to make them more efficient. Statistical tools are exactly that: tools to be used, not to substitute scientists.

Indeed, the first step in a statistical design is the planning. For this step, chemical expertise plays a crucial role. The DoE is not going to tell the scientist which factors or responses should be studied. It is the scientist who should feed all this valuable information into the design.

In the same way, as a result of a DoE it could be found that a variable has or does not have an effect on the output, but it will not say why. It is up to the scientist to interpret the result, try to understand why and continue designing more experiments to test and prove that hypothesis.

When teaching these techniques within Johnson Matthey we are very careful to emphasise these tools are to help scientists, not to replace them. It is crucial to motivate scientists to believe in the process to overcome other major challenges, such as the timings.

2.6 The Timings

Another challenge that scientists experience when using DoE is the lack of immediate visibility of the factors’ effects. In traditional experimentation, the scientists can see the effect that changing an input has on the output once the experiment has finished and then, based on this result, decide the next experiment. However, there is no visibility of progress while carrying out experiments from a DoE, as analysis of the results only makes sense once all the experiments of the design have been carried out. This requires some patience and trust from the scientists to see it through. The first time that someone uses such tools they may therefore struggle, but once they see the results, they understand that the wait was worthwhile. For this reason, it is recommended to start with smaller sets of experiments instead of embarking on a large, complex DoE. Starting with a relatively simple DoE, such as a full factorial design with only a few factors, also helps to overcome another important challenge: the fear of the ‘black box’.
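To illustrate how small such a first design can be, the following Python sketch builds a two-level full factorial for three factors. The factor names and levels are hypothetical, and the randomised run order is a common DoE convention added here as an assumption rather than something prescribed in this article; in practice a platform such as JMP® would generate the design table.

```python
import random
from itertools import product


def full_factorial(factors, seed=None):
    """Return every combination of factor levels as a list of runs, in random order."""
    names = list(factors)
    runs = [dict(zip(names, levels)) for levels in product(*factors.values())]
    # Shuffle the run order so that time-related drift is not confused with a factor.
    random.Random(seed).shuffle(runs)
    return runs


# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "temperature_C": [60, 80],
    "pH": [7, 9],
    "stir_rpm": [200, 400],
}

design = full_factorial(factors, seed=1)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Three factors at two levels give only eight runs, so the whole plan is visible before any experiment starts and the wait for the complete analysis is short.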

2.7 Black Box

Another big challenge that pushes scientists away from using DoE is that it is seen as a ‘black box’. A lack of understanding of the technique together with a limited understanding of statistics creates uncertainty and the scientist can feel a loss of control. It is an understandable response and can only be helped by providing the information needed to understand the technique and its benefits.

Work is being done at Johnson Matthey to make sure that scientists are provided with the tools and support necessary, so they can understand the techniques and use them with confidence. Different approaches are taken for this: one-to-one training, in-house and external group training. Also, the use of software such as JMP® has been critical to empower the scientists at Johnson Matthey to use such tools. The program is easy to use and there are lots of free learning materials available from JMP®. For new users who do not feel very adventurous, starting with a simple and more intuitive design such as a small full factorial is recommended. Despite starting with something simple, the results will sometimes be unsatisfactory for the experimenters, and might reveal some difficult truths.

2.8 Irreproducibility

The use of statistics and DoE during experimentation might uncover some weaknesses in the way the experiments are carried out. Sometimes, inconclusive results will be obtained from DoE due to irreproducibility issues in the experimentation. It might be tempting to point at the DoE as the problem; however, DoE has only helped to uncover an issue that already existed even when performing traditional experimentation. Instead of seeing this as an issue, it should be thought of as an opportunity to improve the way experimentation is carried out and to reduce the experimental error.

The variability observed could be due to many reasons, such as uncontrolled factors that affect the response or its measurement. These could be included in a subsequent DoE to be studied further and help to provide better understanding of the system (Figure 9). To be included in the study, the scientist needs to be able to measure and control the different inputs. Understanding the origin of the variability of an experiment can be used to improve the process. For example, if a more precise measurement of the output can be obtained, the scientist would be able to observe smaller effects of inputs which, combined, could drive larger improvements in the output.
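The link between measurement precision and the size of effect that can be observed can be sketched numerically. In a two-level design with N runs, each effect is the difference between two averages of N/2 runs, so its standard error is 2σ/√N, where σ is the standard deviation of the experimental error. The replicate values below are hypothetical, and estimating σ from a handful of repeats is a simplifying assumption.

```python
import math
import statistics


def effect_standard_error(sigma, n_runs):
    """Standard error of an effect in a two-level design with n_runs runs."""
    return 2 * sigma / math.sqrt(n_runs)


# Hypothetical repeat measurements of the same experimental condition.
replicates = [50.2, 49.1, 51.0, 49.7]
sigma = statistics.stdev(replicates)

se_current = effect_standard_error(sigma, n_runs=8)
# Assumed improvement: a measurement system with half the standard deviation.
se_improved = effect_standard_error(sigma / 2, n_runs=8)

print(f"Estimated sigma from replicates: {sigma:.3f}")
print(f"Effect standard error, current measurement: {se_current:.3f}")
print(f"Effect standard error, improved measurement: {se_improved:.3f}")
```

Halving the measurement error halves the standard error of every effect, which has the same influence on resolution as running four times as many experiments.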

Fig 9. Flow followed when carrying out DoE and subsequent model building

DoE and statistical tools allow experimenters to obtain reliable data in order to extract objective conclusions and take decisions, even if those conclusions are that the experimentation needs to be redesigned or the measurement system improved.

3. Conclusions

It is hoped that this article has been able to show the importance of statistical tools in the scientific space and how challenges can be overcome by using statistical software, like JMP®, which makes statistics accessible to everyone, and by offering employees good support and multiple learning opportunities depending on their background, learning style and needs. At Johnson Matthey we believe scientists and engineers are in the best position to use statistical tools themselves, in order to design, capture, analyse and obtain conclusions from their own data, and that the value of this approach can be of immense benefit to an organisation.

References

  1. R. Lievense, “Pharmaceutical Quality by Design Using JMP®: Solving Product Development and Manufacturing”, SAS Institute Inc, Cary, USA, 2018
  2. B. Durakovic, Period. Eng. Nat. Sci., 2017, 5, (3), 421 http://pen.ius.edu.ba/index.php/pen/article/view/145/175
  3. S. E. Fienberg, Ann. Rev. Stat. Appl., 2014, 1, 1 https://doi.org/10.1146/annurev-statistics-022513-115703
  4. R. Nuzzo, Nature, 2014, 506, (7487), 150 https://doi.org/10.1038/506150a
  5. L. G. Halsey, Biol. Lett., 2019, 15, (5), 20190174 https://doi.org/10.1098/rsbl.2019.0174
  6. V. Amrhein, S. Greenland and B. McShane, Nature, 2019, 567, (7748), 305 https://doi.org/10.1038/d41586-019-00857-9
  7. R. L. Wasserstein, A. L. Schirm and N. A. Lazar, Am. Stat., 2019, 73, 1 https://doi.org/10.1080/00031305.2019.1583913
  8. B. Jones and C. J. Nachtsheim, J. Qual. Technol., 2011, 43, (1), 1 https://doi.org/10.1080/00224065.2011.11917841
  9. B. Jones and D. C. Montgomery, “Design of Experiments: A Modern Approach”, John Wiley & Sons Inc, Hoboken, USA, 2019
  10. L. Kelion, ‘Excel: Why Using Microsoft’s Tool Caused Covid-19 Results to be Lost’, BBC, London, UK, 5th October, 2020 https://www.bbc.co.uk/news/technology-54423988
  11. D. C. Montgomery, E. N. Loredo, D. Jearkpaporn and M. C. Testik, Qual. Eng., 2002, 14, (4), 587 https://doi.org/10.1081/QEN-120003561
  12. P. Goos, B. Jones and U. Syafitri, J. Am. Stat. Assoc., 2016, 111, (514), 899 https://doi.org/10.1080/01621459.2015.1136632
  13. A. Jankovic, G. Chaudhary and F. Goia, Energy Build., 2021, 250, 111298 https://doi.org/10.1016/j.enbuild.2021.111298
  14. R. S. Kenett and S. Zacks, “Modern Industrial Statistics: with Applications in R, MINITAB, and JMP”, 3rd Edn., John Wiley & Sons Inc, Hoboken, USA, 2021, 880 pp
  15. E. Dahlin, Socius: Sociol. Res. Dynam. World, 2019, 5 https://doi.org/10.1177/2378023119846249

Acknowledgements

Thanks to Johnson Matthey management for its support on extending the use of statistical tools around Johnson Matthey, and to all the motivated colleagues who have joined us in this journey for their enthusiasm and patience.

Microsoft and Excel are trademarks of the Microsoft group of companies. All other trademarks are the property of their respective owners.

The Authors


Pilar Gómez Jiménez works as a principal scientist in Johnson Matthey, UK. She has a Master’s degree and a PhD in Chemical Engineering, and she has been working in R&D of catalysts and materials for 17 years. She is enthusiastic about the application of DoE and the DoE mindset. This led to her current role extending the use of DoE and advanced data analytics through training, support and method development within Johnson Matthey.


Andrew Fish is a principal researcher at Johnson Matthey. He holds a Bachelor’s degree in Applied Chemistry and has been working on the R&D of catalysts since 2005. His current role is focused on testing catalysts for the development and optimisation of syngas-based flowsheets. He also has a keen interest in data analytics and DoE methodology, providing advice and support on these topics to colleagues within Johnson Matthey.


Cristina Estruch Bosch has a Master’s degree in Catalysis and a PhD in Chemical Engineering. She is a senior scientist at Johnson Matthey where she has been working on R&D of catalysts since 2007. Her current role is to enable and support scientists and engineers in Johnson Matthey in the use of statistical tools.
