A genetic algorithm-based job scheduling model for big data analytics

EURASIP Journal on Wireless Communications and Networking

Table 1 Symbols and explanations of all parameters

Type	Symbol	Explanation
Cluster	\(DC_i\)	The \(i\)th data center, \(i \in [1, N_{\text{dcs}}]\)
	\(B_{ii}\)	Bandwidth between nodes in \(DC_i\)
	\(V_{\text{dw}}\)	Speed of writing data to the local disk
Hadoop & HDFS	\(P_i\)	Partition size
	\(N_{\text{sm}}\)	Number of simultaneous maps executed in one node
	\(N_{\text{cr}}\)	Number of simultaneous reduces executed in one node
	\(N_{\text{cp\_threads}}\)	Number of I/O threads copying to one reduce node
	\(V_{\text{cp\_thread}}\)	Theoretical maximum copy speed of one copy thread
	\(V_{\text{reduce\_rep}}\)	Theoretical maximum output replication speed of one copy thread
	\(N_{\text{Spaths}}\)	Number of sort paths for copy
	\(N_{\text{reps}}\)	Number of replicas in HDFS
	\(S_{\text{buff}}\)	Sort buffer size for copy
App	\(DS_i\)	Input data size in the \(i\)th data center
	\(N_{\text{p}}\)	Number of partitions
	\(N_{\text{reduces}}\)	Number of reduces
	\(M_{\text{thruput}}\)	Average map throughput of each node
	\(R_{\text{thruput}}\)	Average reduce throughput of each node
	\(RIO_{\text{map}}\)	Ratio of map output to input size
	\(RIO_{\text{reduce}}\)	Ratio of reduce output to input size
Module	\(T_{\text{total}}\)	Total execution time
	\(T_{\text{prepare}}\)	Total execution time for raw data input into HDFS
	\(T_{\text{job}}\)	Total execution time for a job
	\(T_{\text{map}}\)	Time for a map wave
	\(T_{\text{copy}}\)	Time for a copy wave
	\(T_{\text{sort}}\)	Time for a sort phase
	\(T_{\text{reduce}}\)	Time for a reduce phase
	\(T_{\text{rp}}\)	Time for reduce processing
	\(T_{\text{ro}}\)	Time for reduce output writing
	\(N_{\text{mw}}\)	Number of map waves