These are links to things I’ve found on the web, covering math, programming, utilities, research, papers, and videos. This page is mainly for my own reference (to keep track of everything I use regularly), but maybe it can save you some time as well.
Online Tools
Search and Information
Wikipedia (random article): You should know this one already!
- WolframAlpha: computational knowledge engine (Google on steroids)
-
Understands math, natural language, web queries and more. Seriously, just ask it anything. It saves a lot of time and effort compared to programming or googling things yourself.
- NumCalc: numerical/scientific web calculator
-
Useful for quick calculations when you need arbitrary precision, symbolic computation, or special functions.
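To give a rough idea of what “arbitrary precision” buys you, here’s a minimal sketch using only Python’s standard library. This is not NumCalc’s API, just an illustration, and the variable names are my own:

```python
from decimal import Decimal, getcontext
from fractions import Fraction
import math

# Work with 50 significant decimal digits instead of a float's roughly 16.
getcontext().prec = 50
sqrt2 = Decimal(2).sqrt()

# Python integers are arbitrary precision, so this factorial is exact.
big_factorial = math.factorial(30)

# Binary floats cannot represent 0.1 exactly; exact rationals can.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
fraction_sum_is_exact = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(sqrt2)
print(big_factorial)
print(float_sum_is_exact, fraction_sum_is_exact)
```

The float comparison fails while the rational one succeeds, which is exactly the kind of surprise these tools help you avoid.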
Content Creation
- Runway ML studio
-
Web-based video editing and content creation studio that uses ML models (check out: Gen-2)
- pandoc: free document conversion online, no BS
-
Useful for quickly converting between markdown, HTML, LaTeX, and other formats in the browser. There’s also a command line version: install pandoc
Interesting Stuff
Math and Science Visualizations
Feature Visualization: How neural networks build up their understanding of images
My Favorite Software
ML/AI Models You Can Use Today
pharmapsychotic.com: Tools and Resources for AI Art
ebsynth.com: 3D-aware style transfer
Utility Libraries
zstd: Fast Compression Algorithm
Math and Science Libraries
- PyTorch: Machine Learning Framework
-
Flexible and easy to use, more generalizable than Tensorflow and better for research. Includes automatic differentiation, GPU support, and a large ecosystem of libraries.
-
Some extra PyTorch packages I use:
-
- PyTorch3D: 3D differentiable rendering and geometry for deep learning
-
- DEODR: Another differentiable renderer
-
- xFormers: PyTorch implementation of Transformer models
-
And, if you’re interested in “hacking” PyTorch or writing your own backend, check these out:
- OpenCV: Computer Vision Library
-
Bulky but comprehensive library for computer vision, with Python bindings.
- SOD: An Embedded Computer Vision & Machine Learning Library
-
Edge computing is awesome! For IoT/embedded needs this is way easier than OpenCV.
- MAGMA: Matrix Algebra on GPU and Multicore Architectures
-
GPU-accelerated library for linear algebra (BLAS & LAPACK) that I worked on, also includes some sparse linear algebra.
libbf: Arbitrary Precision Floating Point Library
- GMP: GNU Multiple Precision Arithmetic Library
-
Arbitrary precision arithmetic library for C, which is very popular but not the best in my opinion.
-
See also:
-
- Also, check out all the other math libraries at multiprecision.org
- FLINT: Fast Library for Number Theory
-
Symbolic computing library for solving/evaluating number theory problems.
Datasets and APIs
Text Datasets
- OpenWebText2, by EleutherAI
-
Large text dataset, generated from positively voted Reddit links
Common Crawl dataset - Common crawl of the entire web
Image Datasets
Video Datasets
- Vimeo-90k
-
High quality dataset from Vimeo videos
DeepVideo, with Sports-1M - Sports-1M dataset, scraped from YouTube
Audio Datasets
3D Datasets
Multi-Modal Datasets
- LAION-5B: A new era of open large-scale multi-modal datasets
-
A new large dataset, used to train Stable Diffusion, but also freely available as subsets for individuals who don’t have 240TB of storage for the full dataset
-
Also, has lots of good metadata on the considerations that went into it, and the challenges of creating a large dataset
Research Areas
Computer Science
- The Art of Computer Programming, by Donald Knuth
-
Possibly the best book on computer science ever written, deals primarily with algorithms and their implementations
- Modern Computer Arithmetic, by Richard Brent and Paul Zimmermann
-
A useful book for implementing bignum arithmetic, goes into many, many algorithms and special cases. Basically all you need to write your own MPFR/libbf/GMP library
- Advanced Programming in the Unix Environment, by Richard Stevens
-
Highly recommended for C programming, teaches the C standard library for UNIX OSes. I’ve got the physical book and it’s great for perusing
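For a taste of the kind of algorithm Modern Computer Arithmetic (listed above) covers, here’s a minimal sketch of Karatsuba multiplication in Python. The function name and the cutoff constant are my own choices for illustration, and real libraries like GMP use far more tuned versions:

```python
CUTOFF_BITS = 64  # hypothetical threshold below which we use plain multiplication


def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers using Karatsuba's three-product split."""
    if x < 0 or y < 0:
        raise ValueError("this sketch only handles non-negative integers")

    # Base case: small operands are multiplied directly.
    if x.bit_length() <= CUTOFF_BITS or y.bit_length() <= CUTOFF_BITS:
        return x * y

    # Split both operands at the same bit position: x = x_hi * 2**half + x_lo.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of the naive four.
    lo = karatsuba(x_lo, y_lo)
    hi = karatsuba(x_hi, y_hi)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - lo - hi

    # Recombine: hi * 2**(2*half) + mid * 2**half + lo.
    return (hi << (2 * half)) + (mid << half) + lo


a = 3 ** 1000 + 7
b = 5 ** 900 + 11
print(karatsuba(a, b) == a * b)
```

The trick is that the middle term comes from one extra multiplication plus a few additions, which is what brings the cost below the schoolbook quadratic method.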
The Cerberus C Semantics: An in-depth exploration of the C language. If you’re interested in writing compilers and designing new languages, C is a master class in both what can go right and what can go wrong for a language.
Structure and Interpretation of Computer Programs
- The Humble Programmer, by Edsger Dijkstra
-
Dijkstra has (almost) all the correct opinions about programming… A must read!
- Design Principles Behind Smalltalk, by Daniel H. H. Ingalls
-
Smalltalk is incredibly important to understand the languages that came after it, because it was designed with a great purpose and vision.
-
Even if you have heard about Smalltalk, READ THIS. Unfortunately, if you just believe the common view that “Smalltalk is an object oriented language” you have fallen victim to the propaganda. The best contributions of Smalltalk are the fact that the objects send messages to each other, and that atomistic communication between objects is actually the benefit of OOP. To quote the paper: “Purpose of Language: To provide a framework for communication”.
-
All OOP languages that came after but don’t implement message passing are failing to realize the true benefit of OOP.
- Blub Paradox, by Paul Graham
-
You can’t trust the opinions of others, because of the Blub paradox: they’re satisfied with whatever language they happen to use, because it dictates the way they think about programs.
- Codata in Action
-
My favorite explanation of how codata can actually be used. Essentially, it works as encoding control flow and order on top of normal data. I think this is something new programming languages need to use as it is lazier and easier to reason about in many cases.
-
I’m surprised this came from Microsoft… One of the few times they have positively affected programming.
- GOTO Considered Harmful, by Edsger Dijkstra
-
Why goto statements (i.e. unstructured control flow) are bad. It’s so important to understand this, because even allowing low-level unstructured control flow inhibits the ability of optimizers and static analyzers to do their job. Not to mention the ability of programmers to reason about the code.
-
On a meta-note, this paper established the “X considered harmful” title meme, which is still used today.
- Structured Programming with goto statements, by Donald Knuth
-
Great history about the goto debate, with a lot of interesting anecdotes and analogies to the world of mathematics. Although Knuth was probably in the wrong here (at least, in our modern view) in suggesting goto has valid uses, it’s refreshing to hear a different perspective on the matter.
- Notation as a Tool of Thought, by Kenneth Iverson
-
Important work by a Turing award winner that explores how notation and language can affect our thinking. So often overlooked is the fact that “ugly” or otherwise “bad” syntax is conducive to worse quality code, and conversely that “pretty” syntax can lead to better code.
-
Not all syntaxes are created equal! We should strive for syntaxes that are easy to read and write, and that are easy to reason about.
Theory of Sets, Types, and Categories
- Algebraic Subtyping (thesis), by Stephen Dolan
-
A long, grueling tour of algebraic subtyping, but there are a lot of good nuggets in there. Also great to get acquainted with the notation
- Cogent: uniqueness types and certifying compilation
-
Great end-to-end example about the theory of uniqueness types
- Types, Abstraction, and Parametric Polymorphism, by John Reynolds
-
A great theory paper explaining distinctions between “types” and “sets”, and some of the problems with common conceptions we have about programming and math. A must-read for anyone making a new programming language, so as not to repeat the mistakes of the past and to think with a mathematical mindset.
- An Experiment with Inline Substitution, Rice University
-
Results are dated, but a good example of a historical note where inlining did not aid performance. Of course, nowadays it is absolutely required due to the more abstract nature of programming
Programming Languages and Compilers
Polyhedral Compilation
Graphene: An IR for Optimized Tensor Computations on GPUs
- Diesel: DSL for Linear Algebra and Neural Net Computation on GPUs
-
Example of a language geared at numerics-heavy compilation (focusing on neural networks). I actually ended up working with the authors of this paper as part of my NVIDIA internship
- PolyJIT: Polyhedral Optimization Just in Time
-
Application of JIT techniques with polyhedral compilation
slides: Polyhedral Compilation as a Design Pattern for Compiler Construction
ML/AI Research
- GLM-130B: NLP Model for Text Generation
-
Better than GPT-3 at most things, available in different sizes, and free to download and use
- GPT 3: NLP Model for Text Generation
-
State of the art in text generation, at 175 billion parameters this model is just too large to run yourself. You can run it using OpenAI’s API to GPT-3
-
Here’s a walkthrough of GPT architecture. This is the best overall explanation I’ve seen
- Real-ESRGAN: Image Super Resolution
-
For image super resolution (just say: AI, enhance and zoom image!), this is the best deployed general solution I’ve seen
- RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation
-
For video frame interpolation (just say: AI, increase frame rate fluidly!), this is the best downloadable model out there. This is kind of controversial because different models do different things better but I like RIFE the best
-
I use hzwer/Practical-RIFE, which promises better aesthetics and is easier to use
-
I also use nihui/rife-ncnn-vulkan, which runs on Apple Silicon and has a nice interface (although you’ll have to use FFmpeg in addition)
- HIFIC: High-Fidelity Generative Image Compression
-
This model can be used to get insane results in image compression (much better than JPEG)
- AIVC: Artificial Intelligence-based Video Coding
-
Although not as much of an upgrade as HIFIC is for images (relative to existing codecs), this is still interesting research as we wait for a superior one to emerge…
- NNCP: Lossless Data Compression with Neural Networks
-
Neural networks typically aren’t easily made into lossless compressors, but this implementation gives state-of-the-art results (albeit with slow compression speed) for text compression
Consistent Video Depth Estimation
Magenta Colab Notebooks: ML music resources
- LAVIS: A one-stop library for language-vision intelligence
-
This library can do anything from image captioning to image classification to image generation. It’s great to quickly integrate into your own projects
- bRigNet: Automatic neural net 3D character rigging in Blender
-
Check out the code here: pKrime/brignet
- Deep Motion Editing: deep learning for 3D character motion
-
Motion style transfer, retargeting, and more 3D animation features
torch-ngp: Neural Graphics Primitives
Awesome Neural Rendering (curated)
- ACT-1: Transformer for Actions
-
A very ambitious research/product that aims to create a transformer that “can do anything a human can do in front of a computer”
-
This is the first of its kind, and I think it will end up being the primary way we interact with computers in the future. Seriously this thing is cool AF!
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing
-
Interesting article about WebGPT, a model meant to surf the web to answer questions
- Training Compute-Optimal Large Language Models (Chinchilla), by DeepMind/Google
-
Chinchilla, better and smaller than GPT-3. Also, this paper has a great introduction that explains broad ideas in ML/AI. Notable for also considering compute efficiency (LLMs are getting expensive, so this is becoming more important)
- Video Diffusion Models
-
Soon-to-be-outdated, but an interesting paper about using diffusion models for video generation
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
-
Basically an infinite 3D fractal zoom into landscapes, works fairly well
- Dream Fusion Paper: Text-to-3D using 2D Diffusion
-
Interesting proof of concept of using diffusion models to generate 3D scenes from text, with no 3D data or training required! Uses optimization like DeepDreaming, unlike typical ML models which train then use inference
Deep Positron: A Deep Neural Network Using the Posit Number System
Math
- The Elements
-
Written by Euclid around 300 BC, this book is a good introduction to mathematics, starting with the basics of geometry
-
In my opinion, this should be the first mathematics textbook for schools to use. It’s ridiculous that most schools teach students geometry without using Euclid’s work. Anything to overpay the private companies that produce US textbooks, I guess…
- Prime Numbers and the Riemann Hypothesis
-
Very useful book, for people of all backgrounds (not just mathematicians) that explains prime numbers, number theory, and the Riemann Hypothesis. Gives multiple formulations, diagrams, and explanations. My favorite book on my favorite problem in all of mathematics (so far)!
- On the Number of Primes Less Than a Given Magnitude
-
Possibly the most influential (and yet still underrated) paper in all of mathematics, I highly recommend this paper. Check out my blog post on the Gamma/Zeta function implementations
- Counterexample To Euler’s Conjecture on Sums of Like Powers
-
One of my favorite papers, although not particularly explanatory. A computer-assisted disproof of one of Euler’s conjectures
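The counterexample itself is easy to check by machine. Euler conjectured that you need at least k positive k-th powers to sum to another k-th power; Lander and Parkin found four fifth powers that do it. Here’s a minimal Python check (the helper name is mine, not from the paper):

```python
def sums_to_power(terms, target, power):
    """Return True if the sum of t**power over terms equals target**power."""
    return sum(t ** power for t in terms) == target ** power


# The counterexample reported by Lander and Parkin (1966):
# four fifth powers whose sum is itself a fifth power.
terms, target = (27, 84, 110, 133), 144

print(sums_to_power(terms, target, 5))
```

Since Python integers are exact, there is no rounding to worry about here; the equality is checked digit for digit.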
Fast constant-time GCD computation and modular inversion
- Machine Learning-Aided Numerical Linear Algebra: Convolutional Neural Networks for the Efficient Preconditioner Generation
-
And associated talk/slides
The FBHHRBNRSSSHK-Algorithm For Multiplication in Z_{2}^{5x5} is Still Not The End of the Story
Deep Programming Lore
- TempleOS: a truly impressive operating system written by a lone schizophrenic developer, according to his perceived “revelations from God”
-
The case of Terry Davis is tragic, but the story of his life and work is fascinating and important to understand. Due to his online presence, he may be one of the best-documented schizophrenia cases in history.
-
- Video by Fireship: TempleOS in 100 Seconds
-
- Video by Linus Tech Tips: I’ve never seen ANYTHING like this before… TempleOS
-
- Video by Fredrik Knudsen: Down the Rabbit Hole: TempleOS
Paul Le Roux: a South African programmer who became a cartel boss, arms dealer, and drug trafficker. Creator of the well-known Encryption for the Masses (E4M) program.