We have made available raw Facebook and Twitter data from the State of the Union:

http://discovertext.com/sotu.aspx

The tweets cover the two days leading up to the SOTU, and the Facebook content goes back about 6 months.

* Huffington Post (Twitter, de-duplicated) (14,476 documents, 11 MB)
* Redstate (Twitter, de-duplicated) (1,571 documents, 1 MB)
* Sean Hannity (Twitter) (1,170 documents, 930 KB)
* #SOTU (Twitter, de-duplicated) (53,369 documents, 40 MB)
* #SOTU (Twitter, full dataset) (109,601 documents, 82 MB)
* Sarah Palin (mentions on Twitter in #SOTU feeds) (1,271 documents, 927 KB)
* Whitehouse Official Facebook Page (Facebook) (423,358 documents, 315 MB)
* Obama (Twitter, de-duplicated) (116,776 documents, 87 MB)
* Obama (Twitter, full dataset) (222,441 documents, 166 MB)

To what end? The automated classifiers really don't tell us anything useful, though they do make pretty pictures:

http://blog.texifter.com/index.php/2011/02/03/text-analysis-during-the-2011-...

I guess the hope is that other users will dive into particular slices of the data and do content-analytic or interpretive qualitative work that lets us know if all the chatter really does matter. Meanwhile, we are working on some better classifiers in the lab.

~Stu

--
Stuart Shulman
President & CEO
Texifter, LLC <http://www.texifter.com/>

Have you tried DiscoverText? http://discovertext.com
*Featuring the Facebook Graph & Twitter APIs*