VSCSE Data Intensive Summer School

June 30-July 2, 2014
11am-5pm EST
McVey Hall Rm 327
Contact: Chandrima Dadi
For more information visit: VSCSE

See the agenda below for more information.


Monday, June 30
11:00-11:15am Robert Sinkovits Introduction and overview
11:15am-12:15pm Rachana Ananthakrishnan Globus Online for research data management [pptx, pdf, Video]
12:15-12:45pm Ilkay Altintas Workflows and data provenance [Video]
1:00-1:15pm Break
1:15-1:45pm Ilkay Altintas Introduction to Kepler and its features [Video]
1:45-2:15pm Shweta Purawat Demo session for getting started with Kepler
2:15-2:45pm Lunch
2:45-3:15pm Shweta Purawat Provenance framework in Kepler and reproducibility [Video]
3:15-5:00pm Ilkay Altintas Distributed Computing on XSEDE and commodity clusters in Kepler [Video]

Introduction to Hadoop, Spark, and Flink engines in Kepler
Scalable bioinformatics using bioKepler
Tuesday, July 1
11:00am-12:00pm Rick Wagner File systems, hardware and the nuts and bolts of storage [pptx, pdf, Video]
12:00-1:00pm Amarnath Gupta and Bill West Working with big data [pptx, pdf, Code and Data, https://www.dropbox.com/s/zsbxv7spmvahll0/map_reduce_data.tar.gz]
1:00-1:15pm Break
1:15-2:15pm Amarnath Gupta and Bill West Working with big data (continued) [Video]
2:15-2:45pm Lunch
2:45-5:00pm Amarnath Gupta and Bill West Working with big data (continued, with an optional break around 3:00pm) [Video]
Wednesday, July 2
11:00am-12:00pm Natasha Balac Introduction to predictive analytics and data mining [pdf, Video]
12:00-12:30pm Nicole Wolter Overview of data mining tools [pdf]
12:30-12:45pm Break
12:45-2:15pm Paul Rodriguez Unsupervised learning (PCA and clustering) [pdf, Video, https://www.dropbox.com/s/ei5bmfntm425pkd/AHW_1.csv, https://www.dropbox.com/s/4zdi2yvnsrb0f6x/QUEST_out_data.csv, https://www.dropbox.com/s/77kvf4l4vhvtpbi/core_interactions.txt]
2:15-2:45pm Lunch
2:45-3:45pm Nicole Wolter and Natasha Balac Supervised learning (decision trees) [pdf, Video]
3:45-4:00pm Break
4:00-5:00pm Paul Rodriguez Techniques and strategies for big data [Video, pdf]