DeepPerf

Many software systems expose a set of configuration options, and different configurations can lead to different runtime performance. It is useful to know how a system will perform under a given configuration before it is actually configured and deployed: this helps users make rational configuration decisions and reduces the cost of performance testing. Because the number of possible configurations can grow exponentially with the number of options, it is impractical to deploy and measure the system under every one. Recently, several learning methods have been proposed that build a performance prediction model from performance data collected on a small sample of configurations and then use the model to predict system performance under a new configuration.

DeepPerf is an end-to-end, deep-learning-based solution that trains a software performance prediction model from a limited number of samples and predicts the performance value of a software system under a new configuration. DeepPerf consists of two main stages:

Stage 1: Tune the hyperparameters of the neural network.
Stage 2: Use the hyperparameters found in Stage 1 to train the neural network on the samples and predict the performance value of the software system under a new configuration.

Citing DeepPerf

If you find our code useful, please cite our paper:

@inproceedings{Ha2019DeepPerf,
  author    = {Huong Ha and
               Hongyu Zhang},
  title     = {DeepPerf: performance prediction for configurable software with deep
               sparse neural network},
  booktitle = {Proceedings of the 41st International Conference on Software Engineering,
               {ICSE} 2019, Montreal, QC, Canada, May 25-31, 2019},
  pages     = {1095--1106},
  publisher = {{IEEE} / {ACM}},
  year      = {2019}
}

Prerequisites

Python 3.6.x
TensorFlow (tested with TensorFlow 1.10.0 and 1.8.0)

Installation

DeepPerf can be run directly from its source code:

Download and install Python 3.6.x.
Install TensorFlow:

$ pip install tensorflow==1.10.0

Clone DeepPerf:

$ git clone https://github.com/DeepPerf/DeepPerf.git

Data

DeepPerf has been evaluated on 11 real-world configurable software systems:

Apache
LLVM
x264
BDBC
BDBJ
SQL
Dune
hipacc
hsmgp
javagc
sac

Six of these systems have only binary configuration options; the other five have both binary and numeric configuration options. The data is stored in the DeepPerf\Data directory. These software systems were measured and published online by the SPLConqueror team. More information about the systems and how they were measured is available with that published data.

Usage

To run DeepPerf, specify the name of the software system to evaluate and run the script AutoDeepPerf.py. Eleven software systems are available: Apache, LLVM, x264, BDBC, BDBJ, SQL, Dune, hipacc, hsmgp, javagc, and sac. The script evaluates DeepPerf on the chosen system with the same experimental setup presented in our paper. Specifically, for binary software systems, DeepPerf runs with five sample sizes (n, 2n, 3n, 4n, and 5n, where n is the number of options) and 30 experiments per sample size. For binary-numeric software systems, DeepPerf runs with the sample sizes specified in Table IV of our paper, again with 30 experiments per sample size. For example, to evaluate DeepPerf on LLVM, run:

$ python AutoDeepPerf.py LLVM

After finishing each sample size, the script writes a .csv file containing the mean prediction error and the margin (of the 95% confidence interval) for that sample size over the 30 experiments. These results should be the same as, or similar to, those reported in Tables III and IV of our paper.

Alternatively, users can customize the sample size and/or the number of experiments for each sample size by using the optional arguments -ss and -ne. For example, to set the sample size = 20 and the number of experiments = 10, the corresponding command line is:

$ python AutoDeepPerf.py LLVM -ss 20 -ne 10

If either option is omitted, it takes its default value. The default number of experiments is 30. The default sample sizes are: (a) n, 2n, 3n, 4n, and 5n, where n is the number of configuration options, when the evaluated system is a binary system, or (b) the four sample sizes specified in Table IV of our paper when the evaluated system is a binary-numeric system.
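As a rough illustration of default (a), the sketch below computes the five sample sizes for a binary system. The function name default_binary_sample_sizes is hypothetical and is not taken from AutoDeepPerf.py, and the option count used in the example is illustrative only.

```python
def default_binary_sample_sizes(n_options):
    """Return the sample sizes n, 2n, 3n, 4n, 5n for a binary system.

    Hypothetical helper for illustration; AutoDeepPerf.py may compute
    its defaults differently.
    """
    if n_options <= 0:
        raise ValueError("n_options must be a positive integer")
    return [k * n_options for k in range(1, 6)]


if __name__ == "__main__":
    # Illustrative option count; substitute the real count for your system.
    print(default_binary_sample_sizes(10))  # [10, 20, 30, 40, 50]
```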

NOTE: Tuning the hyperparameters and training the final neural network takes 2 to 20 minutes per experiment, depending on the software system, the sample size, and the user's CPU. Typically, the time cost is lower when the software system has a smaller number of configurations or when the sample size is small. Consequently, evaluating 30 experiments for a single sample size takes between 1 and 10 hours.
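The hour figures follow from multiplying the per-experiment cost by the number of experiments. The short sketch below reproduces that arithmetic so it can be redone for a custom -ne value. The function name total_hours is hypothetical, and the estimate assumes the experiments run one after another.

```python
def total_hours(minutes_per_experiment, n_experiments=30):
    """Estimate wall-clock hours for one sample size.

    Assumes experiments run sequentially with a constant cost each.
    """
    return minutes_per_experiment * n_experiments / 60


if __name__ == "__main__":
    print(total_hours(2))   # 1.0  (fastest case: 2 min x 30 experiments)
    print(total_hours(20))  # 10.0 (slowest case: 20 min x 30 experiments)
```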

Experimental Results

To evaluate the prediction accuracy, we use the mean relative error (MRE), computed as:

$MRE = \dfrac{1}{\vert V \vert} \sum_{c \in V} \dfrac{\vert predicted_c - actual_c \vert}{actual_c} \times 100,$

where V is the testing dataset, predicted_c is the performance value of configuration c predicted by the model, and actual_c is the actual performance value of configuration c. In the two tables below, Mean is the mean of the MREs over the 30 experiments and Margin is the margin of the 95% confidence interval of those MREs. The results were obtained by evaluating DeepPerf on a Windows 7 computer with an Intel Xeon E5-1650 CPU (3.2 GHz) and 16 GB of RAM.
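The MRE and the Mean/Margin columns can be sketched in a few lines of Python. This is a minimal illustration, not the code in AutoDeepPerf.py: the function names are hypothetical, and the margin uses the normal approximation z * s / sqrt(k) with z = 1.96, which is an assumption because the text above does not state how the 95% confidence interval is computed.

```python
import math
import statistics


def mean_relative_error(predicted, actual):
    """MRE in percent over a test set, following the formula above."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / a for p, a in zip(predicted, actual))
    return total / len(actual) * 100


def mean_and_margin(mres, z=1.96):
    """Mean of the MREs and the half-width of an approximate 95% CI.

    Assumption: normal approximation with the sample standard deviation.
    """
    mean = statistics.mean(mres)
    margin = z * statistics.stdev(mres) / math.sqrt(len(mres))
    return mean, margin


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(mean_relative_error([110.0, 90.0], [100.0, 100.0]))
    print(mean_and_margin([8.0, 10.0, 12.0]))
```

With 30 experiments, mean_and_margin would be called on the list of 30 MRE values obtained for one sample size.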

Prediction accuracy for software systems with binary options

Subject System	Sample Size	DECART Mean	DECART Margin	DeepPerf Mean	DeepPerf Margin
Apache	n	NA	NA	17.87	1.85
	2n	15.83	2.89	10.24	1.15
	3n	11.03	1.46	8.25	0.75
	4n	9.49	1.00	6.97	0.39
	5n	7.84	0.28	6.29	0.44
x264	n	17.71	3.87	10.43	2.28
	2n	9.31	1.30	3.61	0.54
	3n	6.37	0.83	2.13	0.31
	4n	4.26	0.47	1.49	0.38
	5n	2.94	0.52	0.87	0.11
BDBJ	n	10.04	4.67	7.25	4.21
	2n	2.23	0.16	2.07	0.32
	3n	2.03	0.16	1.73	0.12
	4n	1.72	0.09	1.67	0.12
	5n	1.67	0.09	1.61	0.09
LLVM	n	6.00	0.34	5.09	0.80
	2n	4.66	0.47	3.87	0.48
	3n	3.96	0.39	2.54	0.15
	4n	3.54	0.42	2.27	0.16
	5n	2.84	0.33	1.99	0.15
BDBC	n	151.0	90.70	133.6	54.33
	2n	43.8	26.72	16.77	2.25
	3n	31.9	22.73	13.1	3.39
	4n	6.93	1.39	6.95	1.11
	5n	5.02	1.69	5.82	1.33
SQL	n	4.87	0.22	5.04	0.32
	2n	4.67	0.17	4.63	0.13
	3n	4.36	0.09	4.48	0.08
	4n	4.21	0.1	4.40	0.14
	5n	4.11	0.08	4.27	0.13

Prediction accuracy for software systems with binary-numeric options

Subject System	Sample Size	SPLConqueror Sampling Heuristic	SPLConqueror Mean	DeepPerf Sampling Heuristic	DeepPerf Mean	DeepPerf Margin
Dune	49	OW RD	20.1	RD	15.73	0.90
	78	PW RD	22.1	RD	13.67	0.82
	240	OW PBD(49, 7)	10.6	RD	8.19	0.34
	375	OW PBD(125, 5)	18.8	RD	7.20	0.17
hipacc	261	OW RD	14.2	RD	9.39	0.37
	528	OW PBD(125, 5)	13.8	RD	6.38	0.44
	736	OW PBD(49, 7)	13.9	RD	5.06	0.35
	1281	PW RD	13.9	RD	3.75	0.26
hsmgp	77	OW RD	4.5	RD	6.76	0.87
	173	PW RD	2.8	RD	3.60	0.2
	384	OW PBD(49, 7)	2.2	RD	2.53	0.13
	480	OW PBD(125, 5)	1.7	RD	2.24	0.11
javagc	423	OW PBD(49, 7)	37.4	RD	24.76	2.42
	534	OW RD	31.3	RD	23.27	4.00
	855	OW PBD(125, 5)	21.9	RD	21.83	7.07
	2571	OW PBD(49, 7)	28.2	RD	17.32	7.89
sac	2060	OW RD	21.1	RD	15.83	1.25
	2295	OW PBD(125, 5)	20.3	RD	17.95	5.63
	2499	OW PBD(49, 7)	16	RD	17.13	2.22
	3261	PW RD	30.7	RD	15.40	2.05
