Notes on Data Science Tools

Auto-sklearn: automatically searches for the right learning algorithm for a new machine learning dataset and optimizes its hyperparameters.

Parallel Computing/HPC/Cloud Computing

Speed up Python

Intel Distribution for Python

Optimizes Python for Intel architectures using low-level, high-performance libraries such as Intel MKL (Math Kernel Library). Can provide large speedups for linear algebra routines and ML algorithms.

*Numba

Just-in-time compiler (using LLVM) for Python. Replaces slow Python code with optimized machine code at runtime. Easy to adopt: in many cases, adding a decorator to a function is enough.
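
A minimal sketch of the decorator approach, assuming Numba is installed. The function name sum_of_squares is made up for illustration, and the fallback decorator is only there so that the snippet still runs (uncompiled) when Numba is missing.

```python
# Try to use Numba's njit decorator; fall back to a no-op decorator
# (an assumption for illustration) so the example runs without Numba.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop: the kind of code Numba compiles to machine code
    # the first time the function is called.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```

The first call includes compilation time, so any speedup only shows up on later calls or on large inputs.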

*swiftapply

Automatically vectorizes pandas apply calls where possible, or replaces them with the fastest available alternative.

*Dask

Provides parallelism for analytics by extending arrays, dataframes, and lists to “parallel” versions that are ready for distributed environments. Also provides a dynamic task scheduler.
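
A small sketch of the task-scheduler side, using dask.delayed and assuming Dask is installed. The helper names square and sum_of_squares are hypothetical, and the serial fallback is only an assumption so that the snippet runs without Dask.

```python
def square(x):
    return x * x


def sum_of_squares(values):
    try:
        import dask
    except ImportError:
        # Serial fallback (illustration only) when Dask is not available.
        return sum(square(v) for v in values)

    # Each delayed call records a task instead of running it immediately.
    tasks = [dask.delayed(square)(v) for v in values]
    # Combine the task results in one final delayed task.
    total = dask.delayed(sum)(tasks)
    # compute() hands the task graph to the scheduler, which may run
    # independent tasks in parallel.
    return total.compute()


print(sum_of_squares([1, 2, 3]))
```

For work this small the scheduling overhead outweighs any gain. The pattern pays off when each task is expensive.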

*Cython

Compiles Python into C extensions. A general-purpose tool that offers more flexibility and power than simpler alternatives, at the cost of being harder to use.

*PySpark

Runs Python code on distributed Spark clusters. Great for processing big data sets.

Kyle McKiou Blog

Agile Development

Visualization

Visual Vocabulary

Seaborn

GGplot

D3.js

Matplotlib

Tableau

QlikView

Git

Excel

Pivot Tables

Big Data Tools

5 Full Stack Data Science Technologies for 2020

AWS

Hadoop

MongoDB

Neo4j

Spark

Spark ML
Spark RDD

Flink Streaming

Hive

BigQuery

HBase

Cassandra

Business Intelligence Software

Power BI

KNIME

Alteryx

Qlik

OBIEE

Web analytics tools (Google Analytics, Adobe, etc.)

Web scraping

Beautiful Soup

urllib

Scrapy

Design

6 Google Slides image editing hacks

Canva: Online Design Tool

LaTeX

Mathematics in R Markdown

Markdown

Basic Syntax

Math

Li Liu

Interactive applications

Shiny App

Django

SQL

Functions

NoSQL databases

Questions

R

Missing values

Python Programming

Data Types

Function

Class

Data structure

Array Basic Sorting

LinkedList

Recursion

Heap

Queue and Stack

Binary Search

Binary Tree

Advanced Tree

DFS/BFS

HashTable

ML packages