Author Archives: Data Monster and Insight Monster

R Animation – It cannot get easier than this… Contributions from Gurus…


The animations shown above were courtesy of the works in the fifth link below.

Animation is the process of building a series of images and combining them into a movie.

There are three slightly different ways to do animation in R.

(1) You may use the package ‘animation’.  It has built-in saving facilities for Flash, GIF, HTML pages, PDF, and videos, via saveSWF(), saveGIF(), saveHTML(), saveLatex(), and saveVideo().  While the HTML pages are created using R and JavaScript, we need a PDF creator for embedding animated graphics within a PDF.

(2) Or you can use R, statistical calculations, and the lapply() and plot() functions to write code that draws the sequence of image files, then combine them using the saveGIF() function, which in turn needs a piece of software called ImageMagick (Windows download available at the bottom of its site) to be installed on your system.  Here, block updates as screenshots seem to work easily.  You do these on your own; going through them helps you understand the foundations of animation.

This approach uses ggplot to capture the varying screenshots, without using gganimate.

(3) Or you can use gganimate to do the animation.
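As a rough illustration of the third approach, here is a minimal gganimate sketch. It uses the current gganimate API (transition_time(), animate()), which may differ from older versions of the package, and the data are made up:

```r
library(ggplot2)
library(gganimate)  # assumes gganimate (and a renderer such as gifski) is installed

# One frame per year: transition_time() interpolates the points between years
d <- data.frame(year = rep(2001:2005, each = 3),
                x = rnorm(15), y = rnorm(15))

p <- ggplot(d, aes(x, y)) +
  geom_point(size = 3) +
  transition_time(year) +
  labs(title = "Year: {frame_time}")

animate(p)  # renders the frames into an animation
```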

All the methods fundamentally use the following pseudo-code approach (this snippet is adapted from the first reference below):

ani.fun <- function(args.for.stat.method,
                    args.for.graphics, ...) {
  {stat.calculation.for.preparation.here}
  i <- 1
  while (i <= ani.options("nmax") && other.conditions.for.stat.method) {
    {stat.calculation.for.animation}
    {plot.results.in.ith.step}
    # pause for a while in this step
    Sys.sleep(ani.options("interval"))
    i <- i + 1
  }

  # (i - 1) frames produced in the loop
  ani.options(nmax = i - 1)
  {return.something}
}
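As a concrete instance of this pseudo-code, here is a sketch using the ‘animation’ package (assumes the package is installed and, for GIF output, ImageMagick is available; the random-walk data are illustrative):

```r
library(animation)  # assumes the 'animation' package is installed

# Animate a growing random walk, one frame per step
ani.options(nmax = 50, interval = 0.1)
walk <- cumsum(rnorm(50))               # the "statistical calculation"

saveGIF({
  for (i in 1:ani.options("nmax")) {
    plot(walk[1:i], type = "l",
         xlim = c(0, 50), ylim = range(walk),
         xlab = "step", ylab = "position")  # plot results in the i-th step
  }
}, movie.name = "walk.gif")             # combining frames requires ImageMagick
```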

Whatever method you follow, there is more power in animated graphics than in static graphics. Go animation!

  • key references:

GGANIMATE:



Google Maps – Fusion Tables – A Great Resource for Business Intelligence Developers

Google Fusion tables resources are phenomenal.

Every researcher or BI developer could benefit from this powerful tool.

What fascinated me is that Google has already perfected the mapping algorithms and has the most popular and richest repository of mapping assets.

One may wonder: why bother?  After all, the county map, the most often used form of mapping, has been worked to a pulp by R developers.

I love R and I recommend the R system for lots of its wonderful functionality.  However, I would say R has a long way to go to match or translate the Google Maps functionality.

Perhaps you may want to see just the counties with an unemployment rate below 3%, a number considered full employment, or just the counties whose unemployment rate exceeds 10%.  This is an example of counties with more than 10%.  Parts of Kentucky, Mississippi, and California seem to be the worst hit.

The beauty of Google Maps and Fusion Tables is that

  1. you are in the league of the best practices in the world, so there are no worries about missing counties, old shape files, …
  2. you are using the full functionality of Google Maps (zoom / real-time data possibilities)
  3. your map work can tap into deeper levels of TIGER files at all levels of census.gov data availability.

The key learning sites you need to use are:

To bring together the shape files in one bundle in the right way for fusion tables to work, use the link http://www.poynter.org/2011/how-to-map-data-onto-counties-districts-using-shpescape/141788/

http://www.smalldatajournalism.com/projects/one-offs/mapping-with-fusion-tables/#foreign-keys-and-unique-ids

Have fun. It is a liberating feeling.  Phew!  Maps have been on my radar for a long time.  Waiting to identify what is needed at the foundation level, so that foundational best practices permeate all levels of functionality, was well worth it.

To complete the notes, I also want to bring the following to your attention: a ggmap tutorial.  It is a great one, as long as you do not need the dynamic zooming facility.

https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf

Also, if you want to work with all types of census.gov shape files, it may be argued that you can do that with ggmap.  I find Google Maps and Fusion Tables more standardized and easier.

The quick-start (cheat sheet) ggmap guide is this two-page PDF: https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/ggmap/ggmapCheatsheet.pdf
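For a flavor of ggmap, here is a minimal sketch. The city coordinates are illustrative, and recent ggmap versions may require a Google API key registered via register_google() before get_map() will fetch tiles:

```r
library(ggplot2)
library(ggmap)  # assumes ggmap is installed; Google tiles may need register_google()

# Fetch a base map of the continental US and overlay a couple of points
basemap <- get_map(location = "United States", zoom = 4, maptype = "terrain")

cities <- data.frame(lon  = c(-84.39, -90.07),
                     lat  = c(33.75, 35.15),
                     name = c("Atlanta", "Memphis"))

ggmap(basemap) +
  geom_point(data = cities, aes(x = lon, y = lat), color = "red", size = 3)
```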

Here is another blogger who uses ggmap in a well-articulated application.

http://www.kevjohnson.org/making-maps-in-r-part-2/

More country level maps: https://www.students.ncl.ac.uk/keith.newman/r/maps-in-r#countries

Social Network Analysis – Visualization and Applications

Downloading Gephi

https://gephi.org/users/download/

Introduction to Gephi;  installing and exploring

Modularity and desired number of communities

Filtering Networks

Communication networks

Predicting epidemics using social networks

SNA and Fraud Detection

Using R in SNA (in this particular example, it is network analysis applied to documents), using the package igraph

http://www.r-bloggers.com/an-example-of-social-network-analysis-with-r-using-package-igraph/

The free ebook, with all the code, is available here: http://www.rdatamining.com/books/rdm/code
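A tiny igraph sketch of the basic SNA measures (the five-node network is made up for illustration):

```r
library(igraph)  # assumes the igraph package is installed

# Build a small undirected network and compute basic SNA measures
g <- graph_from_literal(A-B, A-C, B-C, C-D, D-E)

degree(g)                  # number of ties per node
betweenness(g)             # brokerage: C and D bridge the two parts of the network

cl <- cluster_louvain(g)   # community detection (modularity-based)
membership(cl)             # which community each node belongs to

plot(g, vertex.color = membership(cl))
```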

If you want weekly class notes, here are the Univ. of Maine lecture notes.

http://www.umasocialmedia.com/socialnetworks/course-lectures-fall-2015/

Stanford University 9 lectures on SNA

http://sna.stanford.edu/lab.php?l=1

http://sna.stanford.edu/lab.php?l=2

http://sna.stanford.edu/lab.php?l=3

http://sna.stanford.edu/lab.php?l=4

http://sna.stanford.edu/lab.php?l=5

http://sna.stanford.edu/lab.php?l=6

http://sna.stanford.edu/lab.php?l=7

http://sna.stanford.edu/lab.php?l=8

http://sna.stanford.edu/lab.php?l=9


Some Top Techniques and Tools for a Data Scientist

  1. How to find the optimum parameter values for a curve-fitting problem: the Stochastic Gradient Descent Method for Finding a Local Optimum.

This is a commonly used numerical optimization technique. When the gradient at each step is computed over the full data set rather than a single observation, it is called the batch gradient descent algorithm.
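A minimal sketch of the batch variant in R, fitting a straight line by gradient descent on the mean squared error (the learning rate and step count are illustrative and would need tuning in practice; the stochastic variant would update on one random observation per step):

```r
# Gradient descent for simple least-squares line fitting (illustrative sketch)
set.seed(1)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.1)   # true intercept 2, slope 3

b  <- c(0, 0)    # current estimates: intercept, slope
lr <- 0.1        # learning rate (assumed; needs tuning in practice)

for (step in 1:5000) {
  resid <- y - (b[1] + b[2] * x)
  grad  <- -2 * c(mean(resid), mean(resid * x))  # gradient of the mean squared error
  b <- b - lr * grad                             # step downhill
}

b   # approximately the true (2, 3); compare with lm(y ~ x)$coefficients
```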

2. How to avoid the over-fitting problem? Use Regularization.

3. Sparse-Matrix-Based Prediction – glmnet in R
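A minimal glmnet sketch, covering both points 2 and 3: the penalty regularizes against over-fitting, and glmnet also accepts sparse design matrices. The simulated data and the alpha/lambda choices are illustrative:

```r
library(glmnet)  # assumes the glmnet package is installed

# Regularized regression: lasso shrinks many coefficients to exactly zero
set.seed(2)
X <- matrix(rnorm(100 * 20), 100, 20)   # 20 predictors, most irrelevant
y <- X[, 1] - 2 * X[, 2] + rnorm(100)

fit <- cv.glmnet(X, y, alpha = 1)   # alpha = 1 -> lasso; alpha = 0 -> ridge
coef(fit, s = "lambda.min")         # sparse coefficient vector at the CV-chosen lambda
```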


4. Naive Bayes Modeling when there are many, many conditioning variables

5. Out of Necessity, a Real-Time Application of Naive Bayes – Spam Detection
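A minimal Naive Bayes sketch using the e1071 package on the built-in iris data (e1071 is one of several R implementations; the class-conditional independence assumption is what keeps many conditioning variables tractable):

```r
library(e1071)  # assumes the e1071 package is installed

# Naive Bayes: assumes predictors are independent given the class
fit  <- naiveBayes(Species ~ ., data = iris)
pred <- predict(fit, iris)

table(pred, iris$Species)   # confusion matrix; most flowers are classified correctly
```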

6. Fisher LDA and Bayesian Classification

7. Lagrange Multipliers – A Simple Intro

8. Density-Based Spatial Clustering of Applications with Noise – DBSCAN
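A minimal DBSCAN sketch with the ‘dbscan’ package (the two simulated blobs and the eps/minPts values are illustrative):

```r
library(dbscan)  # assumes the 'dbscan' package is installed

# DBSCAN groups dense regions; sparse points are labeled as noise (cluster 0)
set.seed(3)
pts <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

cl <- dbscan(pts, eps = 0.5, minPts = 5)
table(cl$cluster)   # two dense clusters, plus any noise points in cluster 0
```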

9. Meanshift Clustering with Scikit-learn and Python

Basics of Matrix Algebra (updated: 14 Nov 15) – It all starts with solving a well-defined system of linear equations – Please come back and see how this is growing… yes, indeed.

The linked PPT below gets modified as I improve it, sometimes as simply as correcting typos.

Today (11/14/15), I uploaded a version that includes the eigenvalue–eigenvector decomposition.  This is the foundation utilized in PCA, factor analysis, and many other multivariate methods that help reduce the variables/dimensions in data.  It is an important tool that will also help with big data, reducing it without losing its information content.

Today (9/4/15), I added the actual calculation of the inverse by row-echelon transformations.  Today (9/12/15), I added a slide on what is meant by a row-echelon matrix and its properties.  It is beautiful how we can dig out all the properties of linear algebra using the solving of systems of linear equations as the foundation.

Basics of Matrix Algebra

All the way up to the row-echelon sweeping method: solving linear equations vs. finding the inverse of a matrix.
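In base R, the two sides of that comparison look like this (the 2x2 system is illustrative):

```r
# Solving a linear system vs. computing an explicit inverse (base R)
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # coefficient matrix: 2x + y = 3, x + 3y = 5
b <- c(3, 5)

x    <- solve(A, b)   # solves A %*% x = b directly (preferred numerically); x = 0.8, y = 1.4
Ainv <- solve(A)      # explicit inverse, as obtained by row-echelon sweeping
Ainv %*% b            # same solution, reached via the inverse
```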

With that in mind, I found these valuable videos on elementary matrices: Part 1, where the author defines elementary matrices.

Part 2:   Elementary matrices are invertible

A square matrix is invertible if and only if it is a product of elementary matrices

MIT – a full-semester course on linear algebra, for those who are interested in advanced learning about matrices. Lectures 2 to 34 can be traced sequentially in the right column of the YouTube references.

Extensions of this lead one to an appreciation of why the row-echelon-type calculations in the linear programming method with slack variables work.

It will also lead to eigenvalues, eigenvectors, matrix decompositions, the Singular Value Decomposition, and generalized inverses of matrices.
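In base R, these decompositions are only a few calls away (the random symmetric matrix is illustrative):

```r
# Eigen decomposition and SVD in base R
set.seed(4)
S <- crossprod(matrix(rnorm(50 * 3), 50, 3))   # a symmetric 3x3 matrix

e <- eigen(S)
e$values                                        # eigenvalues, largest first
e$vectors %*% diag(e$values) %*% t(e$vectors)   # reconstructs S from its decomposition

s <- svd(S)
s$d   # singular values; for this positive semi-definite S they equal the eigenvalues
```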

Oh…all the beauty; there is beautiful order in life…

CNN on Big data – A great collection…


Internet of everything:

http://www.cnn.com/video/data/2.0/video/business/2014/10/03/cnn-orig-what-is-the-internet-of-things.cnn.html

Human Face of Big data:

http://www.cnn.com/video/data/2.0/video/bestoftv/2012/12/05/exp-point-smolan-big-data-book.cnn.html

Cyberazzi are watching:

http://www.cnn.com/video/data/2.0/video/bestoftv/2012/08/22/ebof-pkg-lavandera-big-brother-watching.cnn.html

Predicting ahead using social media:

http://www.cnn.com/video/data/2.0/video/health/2011/02/02/sot.wolfe.public.health.privacy.cnn.html

A way to achieve Excellence in Teaching – Make it Simple, Fun, Learning

1. Make it SIMPLE – FUN – LEARNING
1.1:   SIMPLE means do not use jargon. Even for the right terminology, wait until all the concepts are explained, and then finally say that this is what is called …
1.2:   SIMPLE means talk about the stories behind the concepts, rather than the mathematics and symbols, until finally one may end up saying why an equation of a certain type exists and what the structural assumptions are for the goodness of the process.  For example, do not use the jargon “linear model” until the full story and various extensions of Galton’s data on fathers’ heights vs. sons’ heights are well discussed. (Data and R codes here: http://blog.crmportals.com/2015/06/11/classic-fransis-galton-heights-of-fathers-and-sons-data-set-and-the-simple-regression/)
2. Make it FUN
2.1  FUN means they want to learn a little bit more and they excitedly share what they have learnt with their buddies.  Making the additional (second-example) problems team work is one way of making it fun; reducing the solution to simple steps, so that students can predict the next step in expanding their understanding, is another way of making it FUN.
2.2  FUN also means they are happily interacting with the teacher and with other students
2.3  FUN also means they are ready to spend some extra time to make it more elegant, artsy, and demonstrative.
3.  Make it a LEARNING EXPERIENCE
3.1 LEARNING EXPERIENCE means they can claim they know a way to solve a particular type of problem.  Students can see problems of similar types and expand their questions and/or confirm their understanding, including caveats, when expanding the scope of the solution.
3.2 LEARNING also means students can help other students
3.3  LEARNING also means they know how to articulate small creative extensions of the standard problem
With this in mind, we want to follow the following 1-2-3 method:
1 – For each problem type,
2 – two problems will be solved in class: the first as an introduction of the problem by the teacher, and the second using a class group (team) problem-solving approach.
3 – Additional problems (a third and more) are given as homework, with small variations only
(students may reach out to the faculty if they need help).
Exam problems are from the list, but possibly with small variations in data/parameters.
At the end of each class, please get feedback, 10 minutes before the class is over, on whether everyone understood the problem-solving methods.  The 10 minutes is, possibly, for spending one more round slowly explaining how to solve the problem.
Overall, talk slowly and make sure you are logically walking with the student through the topics.
Towards this, we want to collect the top __ (40?) problem types in R programming, and __ (40?) problem types in BOSA.  I put more problems in BOSA because there are so many different concepts in this foundations course.  These will become part of our “Question Bank” as well as our “core problem types”.
We want to submit these to the department at the end of the term as teaching aids and accomplishments, along with the Exam Papers/Quiz papers, their solutions, and grades.
I am looking forward to the “Problem types”/”Questions Bank” document.