Reference Annotations: The Beatles (and others) - a substantial collection of annotations for The Beatles catalogue, including chords, beats, keys, and large-scale structure.

Free Music Archive (FMA) - an interactive library of high-quality, legal audio downloads.

Government & non-profit data

Data.gov - the US government's open data resources.

The World Bank Open Data Catalog.

The CDC's National Center for Health Statistics (NCHS) has a number of public datasets for researchers to explore.

NASA Data Portal.

United States Geological Survey data and tools.

Miscellaneous

Kaggle.com contains a list of open datasets on a variety of topics, many of which are part of machine-learning and statistics competitions for (aspiring) data scientists.

data.world bills itself as a "social network" for data people: a place to host, share, and collaborate on datasets, as well as to find datasets others have shared.

Datasets subreddit.

Open Data subreddit.

Corpora - "a collection of static corpora that are potentially useful in the creation of weird internet stuff." Contains lists of things like nouns, adjectives, animals, body parts, curse words, Spinal Tap drummers, ... you get the idea.

A big, long list of publicly available datasets from Quora.com users.

Awesome public datasets is one GitHub user's collection of publicly available datasets.