{"id":37601,"date":"2016-07-14T17:01:01","date_gmt":"2016-07-14T11:31:01","guid":{"rendered":"http:\/\/www.tothenew.com\/blog\/?p=37601"},"modified":"2016-07-14T17:01:01","modified_gmt":"2016-07-14T11:31:01","slug":"concurrency-and-parallelism-using-python","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/concurrency-and-parallelism-using-python\/","title":{"rendered":"Concurrency and Parallelism using Python"},"content":{"rendered":"<p>As we know Python supports multiple approaches for concurrent programming with threads, sub-processes and some other ways which could help achieving solutions built on multiple CPUs or multi-core CPU.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone  wp-image-37610\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/07\/Screenshot-from-2016-07-14-105339.png\" alt=\"Screenshot from 2016-07-14 10:53:39\" width=\"488\" height=\"177\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">I tried implementing something similar on my existing use case for <\/span><a href=\"http:\/\/www.tothenew.com\/blog\/aws-security-re-check\/\"><span style=\"font-weight: 400;\">AWS Security Re-Check<\/span><\/a><span style=\"font-weight: 400;\"> where I was running a check on AWS for my responsibilities over security. Executing that along with other similar scripts for other categories consumed a lot of time since they were running as a single process and in a sequential fashion. Also since it was a single sequential process, it was able to make use of a single core at a time. So to counter this, I tried to bring parallelism in my script. Let me tell you the two ways I tried with:<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Multiprocessing<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">It is the usage of more than one CPU within a single machine. In Python, for <\/span><a href=\"https:\/\/docs.python.org\/2\/library\/multiprocessing.html\"><span style=\"font-weight: 400;\">multiprocessing<\/span><\/a><span style=\"font-weight: 400;\">, there is a package named multiprocessing, By multiprocessing package, a user can fully leverage the multiple processors on a machine. Process are produced by creating objects of Process class of multiprocessing package. Each process has a separate memory space so it is difficult to share objects in multiprocessing.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In my <\/span><span style=\"font-weight: 400;\"><a href=\"https:\/\/gist.github.com\/sakshi1995\/6154698d14f67bcde534b163f6df3557\">script<\/a>,<\/span><span style=\"font-weight: 400;\"> I had multiple functions, and I divided each function into a process and started all the processes, by this I could now run many processes concurrently which earlier used to take place in series, hence reducing the execution time. By multiprocessing, all CPUs of my machine are parallelly processing different functions of my script, also, now the total time to run that script reduced by about ~50%.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These functions were being called serially initially, like:<\/span><\/p>\n<p>[sourcecode language=&#8221;python&#8221;]<br \/>\ndef ServerHosting():<br \/>\ndef Security_group_opened_for_all():<br \/>\ndef Cloudtrail_Status():<br \/>\ndef MFA_enabled():<br \/>\ndef Access_Permissions_on_S3():<br \/>\ndef iam_key_rotate():<br \/>\n[\/sourcecode]<\/p>\n<p>But, after multiprocessing, these were called parallelly as independent processes.<br \/>\nIt was achieved as follows:<\/p>\n<p>[sourcecode language=&#8221;python&#8221;]<br \/>\nfrom multiprocessing import Process<br \/>\ndef runInParallel(*fns):<br \/>\n\u00a0 \u00a0 \u00a0 \u00a0proc=[]<br \/>\n  \u00a0 \u00a0 \u00a0for fn in fns:<br \/>\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0process=Process(target=fn)<br \/>\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0process.start()<br \/>\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0proc.append(process)<br \/>\n\u00a0 \u00a0 \u00a0 \u00a0for process in proc:<br \/>\n \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0process.join()<br \/>\nrunInParallel(ServerHosting,Security_group_opened_for_all,cloudtrail_Status,MFA_enabled,Access_permissions_on_s3,iam_key_rotate)<br \/>\n[\/sourcecode]<\/p>\n<p>By creating an\u00a0object and calling function start() of Process class, the process is started. join() function indicates the program to wait for the process to complete before proceeding to the main process.<\/p>\n<p><span style=\"font-weight: 400;\">Hence, all functions that have to be executed as separate processes can be run by creating a function (here runInParallel), wherein function can run as an individual independent process. \u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Multithreading<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">In multithreading, multiple lightweight processes called threads are created which share the same memory pool so precautions have to be taken, or two threads will write the same memory at the\u00a0same time. In Python, for multithreading, there is an inbuilt package \u2018<\/span><a href=\"https:\/\/docs.python.org\/3\/library\/threading.html\"><span style=\"font-weight: 400;\">multithreading<\/span><\/a><span style=\"font-weight: 400;\">\u2019. By creating objects of Thread class of multithreading package, threads can be easily created.<\/span><strong><strong>\u00a0<\/strong><\/strong><\/p>\n<p><span style=\"font-weight: 400;\">In my <\/span><a href=\"https:\/\/gist.github.com\/sakshi1995\/6154698d14f67bcde534b163f6df3557\"><span style=\"font-weight: 400;\">script<\/span><\/a><span style=\"font-weight: 400;\">, I divided each function into a thread and started them, so now several threads were running concurrently using the same memory pool. Multithreading also utilizes all CPUs of the machine. It also reduced the execution time of the script by more than 50%.<\/span><strong><strong>\u00a0<\/strong><\/strong><\/p>\n<p><span style=\"font-weight: 400;\">These functions were being called serially initially,<\/span><strong><strong>\u00a0<\/strong><\/strong><\/p>\n<p>[sourcecode language=&#8221;python&#8221;]<br \/>\ndef ServerHosting():<br \/>\ndef Security_group_opened_for_all():<br \/>\ndef Cloudtrail_Status():<br \/>\ndef MFA_enabled():<br \/>\ndef Access_Permissions_on_S3():<br \/>\ndef iam_key_rotate():<br \/>\n[\/sourcecode]<\/p>\n<p><strong><strong>\u00a0<\/strong><\/strong>But, after multithreading, these are called parallelly as independent threads.<br \/>\nIt was achieved as follows:<strong><strong>\u00a0<\/strong><\/strong><\/p>\n<p>[sourcecode language=&#8221;python&#8221;]<br \/>\nfrom multithreading import Thread<br \/>\ndef runInParallel(*fns):<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0proc=[]<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0for fn in fns:<br \/>\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0    \u00a0p=Thread(target=fn)<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0p.start()<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0proc.append(p)<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0for p in proc:<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0p.join() runInParallel(ServerHosting,Security_group_opened_for_all,cloudtrail_Status,func_MFA_enabled,func_access_permissions_on_s3,iam_key_rotate)[\/sourcecode]<\/p>\n<p><strong><strong>\u00a0<\/strong><\/strong><span style=\"font-weight: 400;\">By creating an\u00a0object and calling function start() of Thread class, the thread is started. join() function indicates the program to wait for the process to complete before proceeding to the main thread.<\/span><strong style=\"line-height: 1.71429; font-size: 1rem;\"><strong>\u00a0<\/strong><\/strong><\/p>\n<p><span style=\"font-weight: 400;\">I tested executing two scripts on three different machines, and the time they consumed in executing scripts has been displayed graphically below. Taking a look at these observations, you would understand how concurrency and parallelism in Python decreases the execution time hence improving the efficiency of the script.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-37603\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/07\/img3.png\" alt=\"img3\" width=\"600\" height=\"371\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-37604\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/07\/img4.png\" alt=\"img4\" width=\"600\" height=\"371\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Thus, Parallelism and Concurrency not only increases the efficiency by utilizing all cores of a machine, but also by cutting off the execution time of the script. <\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As we know Python supports multiple approaches for concurrent programming with threads, sub-processes and some other ways which could help achieving solutions built on multiple CPUs or multi-core CPU. &nbsp; I tried implementing something similar on my existing use case for AWS Security Re-Check where I was running a check on AWS for my responsibilities [&hellip;]<\/p>\n","protected":false},"author":928,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1},"categories":[1174,2348],"tags":[1557,3750,3747,3748,3749,1358],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/37601"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/928"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=37601"}],"version-history":[{"count":0,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/37601\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=37601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=37601"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=37601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}