
Speaker "Kaiqing Fan" Details

 

Topic

1. SAS Techniques to Handle Big Data Execution Time

2. Plot Big Data Using SAS

Abstract

1. SAS Techniques to Handle Big Data Execution Time
As a Sr. Data Scientist and Sr. SAS Tech Lead, I constantly struggle with big data execution, and especially with the long execution times of our SAS engines. A run may take a couple of hours, sometimes more than 40 hours, or much longer, even when we use many servers and a lot of memory. Such long execution times are not acceptable. I have developed many innovative SAS techniques that, used properly, can greatly shorten execution time. I have successfully shortened the execution time of many SAS engines: from 136 hours to around 2 hours, from 9 hours to 20-30 minutes, and from 3.5 hours to 6 minutes. Here I want to summarize most of the techniques I used and share them with you.

2. Plot Big Data Using SAS
Many data scientists discuss big data visualization, but most focus only on theoretical research and discussion. So far I have not found anyone who provides a feasible method or software engine that can easily plot graphics from big data sources, because big data contains a huge amount of information, with huge numbers of variables and observations. So how can we plot big data?

Here I would like to provide a good and easy solution to this problem using SAS Enterprise Guide. Because it is a fairly complicated task, I split it into three steps, based on three big challenges:

Step 1: Provide one table with all the required parameters, such as color, graph type, thickness, height, width, and image type …, based on the graphics the users need. Through this table, any user can define the required parameters themselves. This makes the approach friendly for other users and software developers to follow.

Step 2: In Step 1, there may be hundreds of parameters for each single graphic. If we are required to plot composite graphics, such as putting 4 or 9 graphics on one page, the number of parameters would be 4 or 9 times that of a single graphic. Passing such a huge number of parameters into the graphics software engine would be very troublesome. Here I provide an easy and friendly way to do this tough work cleanly.

 

Step 3: The final challenge is how to calculate the y-axis tick values as well as possible, because the data values can be any number, from negative infinity to positive infinity. I have not seen anyone provide a calculator to solve this challenge. I worked out a very good calculator, based on a complicated mathematical calculation. It works so well that we have already executed this solution many thousands of times, and it produces good-looking graphics.
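To make the tick-value challenge in Step 3 concrete, here is a minimal sketch in Python of one widely used generic approach: choosing a step that is 1, 2, or 5 times a power of ten and snapping the axis ends to multiples of that step. This is not the calculator described in the abstract, whose details are not given here. The function names `nice_number` and `y_axis_ticks` and the `target_ticks` parameter are hypothetical and introduced only for illustration, and the sketch assumes finite inputs.

```python
import math


def nice_number(x, round_it):
    """Return 1, 2, 5, or 10 times a power of ten that is close to x (x > 0).

    With round_it=True the result is the nearest such value;
    otherwise it is the smallest such value that is >= x.
    """
    exponent = math.floor(math.log10(x))
    fraction = x / 10.0 ** exponent  # roughly in [1, 10)
    if round_it:
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10.0 ** exponent


def y_axis_ticks(data_min, data_max, target_ticks=5):
    """Return evenly spaced tick values that cover [data_min, data_max].

    target_ticks is only a hint; the actual count may differ slightly.
    Assumes finite inputs (an illustrative simplification).
    """
    if data_min > data_max:
        data_min, data_max = data_max, data_min
    if data_min == data_max:
        # Degenerate range: widen it so a step can be computed.
        pad = abs(data_min) * 0.5 or 1.0
        data_min, data_max = data_min - pad, data_max + pad

    span = nice_number(data_max - data_min, round_it=False)
    step = nice_number(span / (target_ticks - 1), round_it=True)

    # Snap both axis ends outward to multiples of the step.
    first = math.floor(data_min / step) * step
    last = math.ceil(data_max / step) * step
    count = int(round((last - first) / step)) + 1

    # Round away floating-point noise at the precision of the step.
    digits = max(0, -math.floor(math.log10(step)))
    return [round(first + i * step, digits) for i in range(count)]
```

For example, `y_axis_ticks(0, 100)` yields ticks at 0, 20, 40, 60, 80, and 100, and `y_axis_ticks(-3.2, 7.9)` yields ticks at -5, 0, 5, and 10. The same idea can be expressed in a SAS DATA step or macro; Python is used here only to keep the sketch short.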

Profile

Sr. SAS Tech Lead, Sr. Data Scientist, Sr. SAS and R Developer, and Statistician with 10 years of experience in software programming and 3 years in statistical analysis and developing SAS engines. Professional in handling big data and massive data files, optimizing software engines, automating the execution of SAS engines, and greatly cutting SAS engines' execution times (which can beat Python performance on Spark). I received a Master's degree in Statistics from the University of Wyoming, a Master's degree in Applied Mathematics from the University of New Orleans, a Master's degree in History from East China Normal University, and a Bachelor's degree in History from Liaochen University.