# Advanced Data Administration: Making Informative Visualizations (Program-TU)

Follow the steps of the assignment in order. While working through the assignment, do the following:

1. Create a single Word document that contains all of your code statements (In[]), their outputs (Out[]), and plots, numbered according to the assignment step, along with your annotations and findings.

## Assignment 6: Hands-on Mini-Project, Plotting and Visualization

Making informative visualizations (sometimes called plots) is one of the most important tasks in data analysis. It may be part of the exploratory process, for example to help identify outliers or needed data transformations, or a way of generating ideas for models. For others, building an interactive visualization for the web may be the end goal. Python has many add-on libraries for making static or dynamic visualizations. In this assignment, we focus on matplotlib and the libraries that build on top of it.

To use matplotlib, we follow this import convention:

`In [1]: import matplotlib.pyplot as plt`

NumPy is imported as well:

`In [2]: import numpy as np`

The `numpy.random` module supplements Python's built-in `random` module with functions that efficiently generate whole arrays of sample values from many kinds of probability distributions.
For example, you can get a 4 × 4 array of samples from the standard normal distribution using `np.random.normal`:

`In []: arr = np.random.normal(size=(4, 4))`

`In []: arr`

`Out[]: array([[ 0.5732, 0.1933, 0.4429, 1.2796], [ 0.575 , 0.4339, -0.7658, -1.237 ], [-0.5367, 1.8545, -0.92 , -0.1082], [ 0.1525, 0.9435, -1.0953, -0.144 ]])`

Because the samples are random, your values will differ.

We can also generate an evenly spaced range of integers with the `arange()` function:

`In [3]: data = np.arange(100)`

`In [4]: data`

`Out[4]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])`

To plot this range of data:

`In []: plt.plot(data)`

`Out[]: [<matplotlib.lines.Line2D at ...>]`

To plot the cumulative sum (`cumsum`) of the data elements:

`In []: plt.plot(data.cumsum(), 'k--')`

`Out[]: [<matplotlib.lines.Line2D at ...>]`

The `'k--'` is a style option instructing matplotlib to draw a black (`k`) dashed (`--`) line.

Perform the following tasks, which aim to extract meaning from simulated data:

1. Create a new dataset, named for example `myData`, of 100 random elements drawn from the standard normal distribution with `np.random.randn`: `myData = np.random.randn(100)`
2. Draw both the simple `plot()` and the `cumsum()` plot of the data (see the examples above), and show the two generated plots (right-click each plot and copy or save it to include in your submitted work).
3. Using the generated data, `myData`, draw a histogram: a bar plot that gives a discretized display of value frequency, where the height of each bar is the number of data points that fall in that bin.
Split the data points into 10, 25, and 50 bins and compare the resulting histograms (save a copy of each plot via right-click > Copy or Save Image for your submitted work document):

   `In[]: Histplot1 = plt.hist(myData, bins=10, color='k', alpha=0.3)`

   `In[]: Histplot2 = plt.hist(myData, bins=25, color='k', alpha=0.3)`

   `In[]: Histplot3 = plt.hist(myData, bins=50, color='k', alpha=0.3)`

   Run each call in its own notebook cell (or call `plt.figure()` before each one) so the three histograms are drawn on separate figures instead of on top of each other.

4. To standardize the data so that it has a mean of 0 and a standard deviation of 1 (z-scores), use the following function. Note that standardization does not make the elements sum to 1; afterwards they sum to approximately 0.

   `In[]: def normalize(x): return (x - x.mean()) / x.std()`

   `In[]: norm = normalize(myData)`

5. Draw both the simple `plot()` and the `cumsum()` plot of the standardized data from step 4, and show the two generated plots (right-click each plot and copy or save it to include in your submitted work).
6. Extract statistics such as `min()`, `max()`, `mean()`, `std()`, and `sort()`, or perform any other exploratory data analysis, such as counting the values greater than a certain threshold, e.g., `(myData > .25).sum()`.
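Steps 1 to 3 can be sketched as a single script, assuming you prefer to save figures to files rather than copy them from a notebook. The seed (`42`), the file names `step2_plots.png` and `step3_histograms.png`, and the side-by-side subplot layout are illustrative assumptions, not part of the assignment. `ax.hist` returns the same `(counts, bin_edges, patches)` tuple as `plt.hist`, so the counts can be compared across bin numbers.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display; leave this out in Jupyter
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)  # illustrative seed so reruns give the same figures
myData = np.random.randn(100)  # step 1: 100 samples from the standard normal distribution

# Step 2: simple line plot and cumulative-sum plot, side by side
fig, (ax_line, ax_cum) = plt.subplots(1, 2, figsize=(10, 4))
ax_line.plot(myData)
ax_line.set_title("myData")
ax_cum.plot(myData.cumsum(), "k--")  # black dashed line
ax_cum.set_title("myData.cumsum()")
fig.savefig("step2_plots.png")  # hypothetical file name

# Step 3: one histogram per bin number, keeping the counts for comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
hist_results = {}
for ax, n_bins in zip(axes, (10, 25, 50)):
    counts, edges, _ = ax.hist(myData, bins=n_bins, color="k", alpha=0.3)
    hist_results[n_bins] = counts  # number of points in each bin
    ax.set_title(f"bins={n_bins}")
fig.savefig("step3_histograms.png")  # hypothetical file name
plt.close("all")
```

With more bins, each bar covers a narrower range and holds fewer points, so the histogram looks noisier; every version still accounts for all 100 values.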
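The function in step 4 computes z-scores. It does not rescale the data to sum to 1. A quick numerical check confirms what it actually does. This sketch assumes a seeded `myData` (the seed is an illustrative assumption) and reuses the `normalize` definition from step 4 unchanged.

```python
import numpy as np

def normalize(x):
    """Return the z-scores of x: (value - mean) / standard deviation."""
    return (x - x.mean()) / x.std()

np.random.seed(42)  # illustrative seed so the check is reproducible
myData = np.random.randn(100)
norm = normalize(myData)

# Standardization centres the data at 0 and scales it to unit spread...
print(f"mean: {norm.mean():.6f}")  # approximately 0, up to floating-point rounding
print(f"std:  {norm.std():.6f}")   # approximately 1
# ...so the elements sum to roughly 0, not 1.
print(f"sum:  {norm.sum():.6f}")

# Subtracting a constant and dividing by a positive constant keeps the order of
# the values, so the step 5 plots have the same shape as step 2, with a rescaled y-axis.
same_order = np.array_equal(np.argsort(norm), np.argsort(myData))
print("same ordering:", same_order)
```

NumPy's `std()` uses `ddof=0` by default, so the result has a population standard deviation of exactly 1. Pass `ddof=1` to both the function and the check if you want the sample standard deviation instead.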
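Step 6 can be sketched with NumPy array methods alone. The threshold `0.25` comes from the assignment. The seed and the extra `median` and percentage figures are illustrative assumptions added to show other quick exploratory checks.

```python
import numpy as np

np.random.seed(42)  # illustrative seed for reproducible numbers
myData = np.random.randn(100)

# Basic summary statistics as NumPy array methods
summary = {
    "min": myData.min(),
    "max": myData.max(),
    "mean": myData.mean(),
    "std": myData.std(),
    "median": np.median(myData),  # extra statistic, not listed in the assignment
}
for name, value in summary.items():
    print(f"{name:>6}: {value:8.4f}")

# np.sort returns a sorted copy; myData.sort() would sort myData in place
sorted_data = np.sort(myData)

# A comparison gives a boolean array; summing it counts the True values
mask = myData > 0.25
n_above = mask.sum()
share_above = mask.mean()  # fraction of the 100 values above 0.25
print(f"values > 0.25: {n_above} ({share_above:.0%})")
```

The first and last elements of the sorted copy match `min()` and `max()`, which makes a convenient sanity check on the summary.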