This study finds that poll data on consumer confidence and presidential job approval can be approximated with straightforward sentiment analysis of Twitter data.
The idea of aggregate sentiment is particularly interesting: per-message errors are treated as noise that is expected to cancel out in aggregate. They point to Hopkins and King (2010) to show that standard text analysis techniques are inappropriate for assessing aggregate populations. Further, they provide some evidence from their own experiment: they describe filtering out "will", which is counted as positive sentiment even in its verb sense because they don't do POS tagging. However, they mention one caution: errors could correlate with information of interest, for example if certain demographic groups tweet in ways that are harder to analyze.
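
To make the "will" issue concrete, here is a minimal sketch of lexicon-based aggregate sentiment. Everything in it is an assumption for illustration, not the study's actual pipeline: the tiny word lists, the example messages, the `sentiment_ratio` helper, and the choice to score a batch as the ratio of positive-matching to negative-matching messages.

```python
import re

# Hypothetical toy lexicon (an assumption); a real analysis would use a
# published subjectivity lexicon with thousands of entries.
POSITIVE = {"good", "great", "hope", "will"}
NEGATIVE = {"bad", "worse", "fear"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_ratio(messages, positive=POSITIVE, negative=NEGATIVE):
    """Ratio of messages containing a positive word to messages
    containing a negative word. A message can count toward both."""
    pos = neg = 0
    for message in messages:
        tokens = set(tokenize(message))
        if tokens & positive:
            pos += 1
        if tokens & negative:
            neg += 1
    return pos / neg if neg else float("nan")


# Made-up messages; "will" appears only as a verb here.
messages = [
    "jobs report looks good",
    "the economy will get worse",
    "i fear prices will rise",
    "great news on jobs",
    "bad day for the markets",
]

with_will = sentiment_ratio(messages)
without_will = sentiment_ratio(messages, positive=POSITIVE - {"will"})
print(with_will, without_will)
```

In this toy batch, the two messages that use "will" as a verb are both counted as positive, so dropping the word from the lexicon lowers the aggregate score. Without POS tagging, a lexicon lookup has no way to tell the verb from the noun, which is why filtering the word outright is the simple workaround.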