Search the Full Text of Nonprofit Tax Records for Free (www.propublica.org)
124 points by walterbell | karma 84571 | avg karma 5.55 | 2019-06-07 06:21:16 | 54 comments




The Linux Foundation results on here are quite interesting. They're showing pretty rapid "revenue" growth, from $39M in 2015 to $61M in 2016 to $81M in 2017.

I'm guessing this is, in no small part, down to increasing conference/event revenues.

It'll be interesting to see what they do with all the additional cash.


I don’t think so. Corporate sponsorships are a much better source of revenue.

Well, sponsorships are part of conference revenue in a lot of cases.

I've looked at the sponsor packs for the CNCF conferences, and those higher tiers do not come cheap.

Also, don't underestimate small individual payments multiplied by a very large number of people.

E.g. at KubeCon Barcelona, tickets were $900+ each and there were 7,700 people, so we're talking almost $7M from ticket sales alone.

Now the venue ain't cheap but that + all the sponsor cash == a fair profit, I'd expect.
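A quick back-of-envelope check of that ticket figure, as a Python sketch. It assumes every one of the 7,700 attendees paid the $900 list price, which overstates things, since speakers, sponsor staff and discounted tickets are ignored.

```python
# Back-of-envelope KubeCon Barcelona ticket revenue from the figures above.
# Assumption: every attendee paid the $900 list price (an overestimate).
TICKET_PRICE_USD = 900
ATTENDEES = 7_700


def ticket_revenue(price_usd: int, attendees: int) -> int:
    """Gross ticket revenue in dollars, before venue or any other costs."""
    return price_usd * attendees


gross = ticket_revenue(TICKET_PRICE_USD, ATTENDEES)
print(f"${gross:,}")  # $6,930,000
```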


look here on page 9:

https://pp-990-rendered.s3.amazonaws.com/201823199349304962_...

it breaks down the income into broad buckets


Our member lists are public and the price of membership at every level is public.



Yep, I was wrong about that: $63M in revenue from conferences, $12M from sponsorships and $5M from training.

Though I bet a lot of that 63M is conference sponsorships.

Source: https://projects.propublica.org/nonprofits/organizations/460...
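Those three buckets add up to $80M, close to the $81M 2017 figure quoted at the top of the thread (the gap is presumably rounding or smaller income lines). A small Python sketch of each bucket's share, using only the rounded numbers from this comment:

```python
# Revenue buckets quoted above, in millions of USD (rounded figures).
revenue_musd = {"conferences": 63, "sponsorships": 12, "training": 5}

total = sum(revenue_musd.values())
# Fraction of the combined total contributed by each bucket.
shares = {bucket: amount / total for bucket, amount in revenue_musd.items()}

print(f"total: ${total}M")  # total: $80M
for bucket, share in shares.items():
    print(f"{bucket:>12}: {share:.1%}")
```

On these numbers, conferences alone account for roughly 79% of the total.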


$37 million for salaries, $32 million in "other expenses", and $12 million that wasn't spent.

see https://projects.propublica.org/nonprofits/organizations/460...

(I planned to make a smart-ass comment about this story literally being about an interface for answering such questions. But "other expenses" is just slightly too meaningless to declare victory here.)
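Those three numbers sum to exactly $81M, the 2017 revenue figure quoted at the top of the thread, which suggests they are the full split of that year's income. The vague "other expenses" line is nearly half of what was actually spent, as this Python sketch of the arithmetic shows:

```python
# Figures from the comment above, in millions of USD.
salaries = 37
other_expenses = 32
unspent = 12

total_spent = salaries + other_expenses
accounted_for = total_spent + unspent

print(f"spent: ${total_spent}M, unspent: ${unspent}M, total: ${accounted_for}M")
# spent: $69M, unspent: $12M, total: $81M
print(f"'other expenses' share of spending: {other_expenses / total_spent:.0%}")
# 'other expenses' share of spending: 46%
```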


Yes, unfortunately all these forms tell us is that more money is coming in and more money is being spent, not really what it's being spent on. (I'd imagine the 2018 numbers will show another big jump in revenue, if the growth of CNCF projects has anything to do with it.)

From having had a look round the Linux Foundation site, I couldn't easily see a lot of information about what they spend their increasing revenues on, although I may well have missed it...


I also looked there first and was surprised that they apparently do not publish their financial reports themselves.

It's not a major issue, since the data is disclosed on the government side, but it's just bad service and leaves an impression of a lack of transparency.


CNCF's 2018 annual report has some useful info: https://www.cncf.io/cncf-annual-report-2018/

How was this created? It would be cool to see how I could download copies of this for personal analysis.

https://registry.opendata.aws/irs990/

A dataset of IRS 990 filings is available there. It is a big collection of XML files.

Here is an example of one chosen at random: https://pastebin.com/pzNYBZYQ

EDIT: here is the same filing through the ProPublica Nonprofit Explorer:

https://projects.propublica.org/nonprofits/organizations/437...

which links to this, the document I posted:

https://s3.amazonaws.com/irs-form-990/201643199349201044_pub...
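For personal analysis, Python's standard library is enough to pull fields out of one of these XML files. A minimal sketch: the sample document below is hypothetical and heavily abridged, and although e-filed returns use the http://www.irs.gov/efile namespace, element names such as EIN, ReturnTypeCd and BusinessNameLine1Txt are assumptions that should be checked against the schema version of each filing. Matching on the local tag name avoids hardcoding the namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, heavily abridged filing; real files are far larger.
SAMPLE_990 = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <ReturnTypeCd>990</ReturnTypeCd>
    <Filer>
      <EIN>123456789</EIN>
      <BusinessName>
        <BusinessNameLine1Txt>EXAMPLE FOUNDATION</BusinessNameLine1Txt>
      </BusinessName>
    </Filer>
  </ReturnHeader>
</Return>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def first_text(root: ET.Element, name: str) -> Optional[str]:
    """Text of the first element (in document order) with this local name."""
    for el in root.iter():
        if local_name(el.tag) == name:
            return (el.text or "").strip()
    return None


root = ET.fromstring(SAMPLE_990)
print(first_text(root, "EIN"))                   # 123456789
print(first_text(root, "ReturnTypeCd"))          # 990
print(first_text(root, "BusinessNameLine1Txt"))  # EXAMPLE FOUNDATION
```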


Yup. We’ve been using this data for a while to render e-filed 990s on our site and to extract highly paid employees. Now we just strip the markup out and toss it all into Elasticsearch for search. It’s really interesting to surface things like grants.
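The "strip the markup and toss it into Elasticsearch" step might look something like this Python sketch. It is not ProPublica's code: the index name, document ID and field name are hypothetical, and it only builds the newline-delimited body that Elasticsearch's _bulk endpoint expects (an action line followed by a source line) rather than sending it.

```python
import json
import xml.etree.ElementTree as ET


def filing_text(xml_string: str) -> str:
    """Flatten a filing to plain text: every text node, markup removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields text nodes in document order; skip whitespace-only ones.
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def bulk_entry(index: str, doc_id: str, xml_string: str) -> str:
    """One _bulk action/source pair as NDJSON (each line newline-terminated)."""
    action = {"index": {"_index": index, "_id": doc_id}}
    source = {"text": filing_text(xml_string)}
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


# Hypothetical, abridged filing and identifiers.
sample = (
    '<Return xmlns="http://www.irs.gov/efile">'
    "<EIN>123456789</EIN><Name>EXAMPLE FOUNDATION</Name></Return>"
)
print(bulk_entry("filings", "example-filing-1", sample), end="")
```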

I will say, for personal analysis, that the schema has a habit of changing, and things like grants can appear in multiple places depending on the context. What’s more, only two-thirds of nonprofits e-file now (and I’m sure fewer and fewer the further back you go). Just some things to look out for.
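Since grants can show up in more than one place, one defensive approach is to search the whole document for any of several container elements instead of following one fixed path. A Python sketch under loud assumptions: GrantsNewSchema and GrantsOldSchema are made-up placeholder names standing in for whatever the real forms and schema years use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Hypothetical filing in which grants sit under two differently named containers.
SAMPLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <ScheduleX>
      <GrantsNewSchema><Recipient>Org 1</Recipient><Amt>100</Amt></GrantsNewSchema>
    </ScheduleX>
    <FormY>
      <GrantsOldSchema><Recipient>Org 2</Recipient><Amt>250</Amt></GrantsOldSchema>
    </FormY>
  </ReturnData>
</Return>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_grants(root: ET.Element, containers: Iterable[str]) -> List[Dict[str, str]]:
    """Collect every matching container element, wherever it sits, as a flat dict."""
    wanted = set(containers)
    grants = []
    for el in root.iter():
        if local_name(el.tag) in wanted:
            grants.append({local_name(c.tag): (c.text or "").strip() for c in el})
    return grants


root = ET.fromstring(SAMPLE)
for grant in find_grants(root, ["GrantsNewSchema", "GrantsOldSchema"]):
    print(grant)
```

In real data the child field names are also likely to differ between variants, so a second step mapping them onto common field names would probably be needed.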

If you’re interested in processing the 990 XML data though, check out the truly excellent irsx: https://github.com/jsfenfen/990-xml-reader


If you don't e-file does that mean the IRS don't digitise your accounts and so you avoid appearing in these sorts of data sets?

In that case, it sounds like a lot of interesting data will be in that last third.


We’re building a graph of this same underlying data at https://alma.app (with a lot of enrichments) to help people discover and donate to nonprofits.

E.g. here’s Stanford: https://alma.app/charities/941156365-the-board-of-trustees-o...

Would love to know what types of analysis people would like to see about organizations and their relationships?


Could you clarify what you mean by “building a graph”? Are you combining the 990 data plus other enrichments in a graph database?

Yep basically - looking at the network of grants (who funds who), people (who works where over their careers), social media and news (who’s referenced or cited alongside whom) and so on. You start to see interesting clusters of nonprofits that are funded by the same groups and work together extensively.
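The "funded by the same groups" clustering can be sketched by treating grants as a bipartite funder-to-grantee graph and counting, for each pair of grantees, how many funders they share. This is a generic Python illustration with made-up edges, not how Alma actually computes it; pairs with high counts are candidates for the clusters described above.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical (funder, grantee) edges, e.g. extracted from grant sections of 990s.
GRANTS: List[Tuple[str, str]] = [
    ("Funder A", "Org 1"), ("Funder A", "Org 2"), ("Funder A", "Org 3"),
    ("Funder B", "Org 1"), ("Funder B", "Org 2"),
    ("Funder C", "Org 3"), ("Funder C", "Org 4"),
]


def co_funding(grants: List[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """For each pair of grantees, count the funders they have in common."""
    grantees_by_funder = defaultdict(set)
    for funder, grantee in grants:
        grantees_by_funder[funder].add(grantee)
    shared: Dict[Tuple[str, str], int] = defaultdict(int)
    for grantees in grantees_by_funder.values():
        # Sorting makes each pair key order-independent.
        for a, b in combinations(sorted(grantees), 2):
            shared[(a, b)] += 1
    return dict(shared)


for pair, count in sorted(co_funding(GRANTS).items(), key=lambda kv: -kv[1]):
    print(pair, count)
```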

Neat. What technologies are you using to store and analyze the data? Are you using any graph algorithms, like PageRank, community detection, etc?

Dgraph and Elasticsearch, with some Redis caching.

Yea, a version of pagerank is coming soon! Love any other ideas...
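For reference, a plain textbook PageRank by power iteration on a directed grant graph. Edges point from funder to grantee, so rank accumulates on organizations that receive from many, or well-ranked, funders. A minimal Python sketch with hypothetical edges; Alma's "version of pagerank" is not described in the thread and may differ. Nodes with no outgoing edges have their rank spread evenly over all nodes so the scores keep summing to 1.

```python
from typing import Dict, List, Sequence, Tuple


def pagerank(
    edges: Sequence[Tuple[str, str]],
    damping: float = 0.85,
    max_iter: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Power-iteration PageRank; returns a score per node summing to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical funder -> grantee edges.
GRANTS = [
    ("Funder A", "Org 1"), ("Funder A", "Org 2"), ("Funder A", "Org 3"),
    ("Funder B", "Org 1"), ("Funder B", "Org 2"),
    ("Funder C", "Org 3"), ("Funder C", "Org 4"),
]
for node, score in sorted(pagerank(GRANTS).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```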


> Would love to know what types of analysis people would like to see about organizations and their relationships?

Related public records, e.g. court filings, municipal or other co-investments alongside the nonprofits, adjacent (time or geo) legislation/policy changes, rotating doors of nonprofit, gov, commercial.


Interesting thanks!

This is neat! I thought about extracting the grants (still might), but full-text seemed like good bang for the buck. Your tools sound like they might be very useful for reporters. Have you given any thought to that? We love mapping these sorts of connections.

Hey, first up really amazing work you’re doing, hugely inspiring for us! Thank you.

Whilst our focus has been delivering a consumer layer on top of all this data, yea, very open to exposing our underlying graph to others. Want to drop me a note at dan at alma.app?

As you mention elsewhere, half the battle is cleaning the data and getting quality.


Oh hey I built this. Let me know if you have any questions about how it works.

Edit: wrote a little bit about that here - https://news.ycombinator.com/item?id=20141744


Is it possible to sort after searching? Or filter by annual revenue? I'd like to see the organizations with the highest annual revenue for my specific searches. Thanks!

Those are good suggestions! I'm planning to add sorting by year, but revenue makes sense as well.
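If each indexed document also carried revenue as a numeric field (stripping a filing down to plain text would not provide one on its own), Elasticsearch's query DSL can filter on a revenue range and sort on that field. A Python sketch that only builds the request body; the field names text and total_revenue are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def revenue_search_body(query: str, min_revenue: Optional[int] = None) -> Dict[str, Any]:
    """Full-text match, optionally filtered by and sorted on a revenue field."""
    bool_query: Dict[str, Any] = {"must": [{"match": {"text": query}}]}
    if min_revenue is not None:
        # Filter clauses restrict matches without affecting relevance scores.
        bool_query["filter"] = [{"range": {"total_revenue": {"gte": min_revenue}}}]
    return {
        "query": {"bool": bool_query},
        # Highest revenue first; relevance score breaks ties.
        "sort": [{"total_revenue": {"order": "desc"}}, "_score"],
    }


print(json.dumps(revenue_search_body("open source", min_revenue=1_000_000), indent=2))
```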

Any update on this?

Cool. I had one question, what's the usual lead time for non-profit data to show up in this dataset (e.g. when would you expect that 2018 forms/data would appear)?

There are already some 2018 forms in there, but it's based on fiscal year. So a nonprofit whose FY is Dec 2018 would have had to just file last month -- and sometimes they file late or get extensions. And again, this only covers e-filed forms -- they could file on paper, in which case you'd have to use the nonprofit name search and check for a filing.
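To put rough numbers on that lag: Form 990 is due on the 15th day of the 5th month after the organization's fiscal year ends, and an automatic six-month extension is available via Form 8868. A Python sketch of that rule, assuming the fiscal year ends on the last day of a month and ignoring the shift when a due date falls on a weekend or holiday. The due date is not the same as when a filing shows up in the dataset: many organizations file earlier, some file late, and the IRS needs extra time to process and release e-filed returns.

```python
from datetime import date
from typing import Tuple


def add_months(year: int, month: int, months: int) -> Tuple[int, int]:
    """Shift a (year, month) pair forward by a number of months."""
    index = year * 12 + (month - 1) + months
    return index // 12, index % 12 + 1


def form_990_due(fy_end: date, extended: bool = False) -> date:
    """15th day of the 5th month after fiscal year end (+6 months if extended)."""
    year, month = add_months(fy_end.year, fy_end.month, 5 + (6 if extended else 0))
    return date(year, month, 15)


print(form_990_due(date(2018, 12, 31)))                 # 2019-05-15
print(form_990_due(date(2018, 12, 31), extended=True))  # 2019-11-15
print(form_990_due(date(2018, 6, 30)))                  # 2018-11-15
```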

This is great. AC is an old friend of mine, you guys are doing amazing work at ProPublica. Nice to see the tech work that runs behind the stories.

Just thought this was curious: if I search for "HOCKANUM VALLEY COMMUNITY COUNCIL", nothing comes up for them specifically, but if I search for the town they are in, the organization appears in that list.

That's super weird! Comes up when I search other text in their form too. I'm gonna flag this and take a look tomorrow. Thanks!

Assuming Elasticsearch?

Yup! We use it for a bunch of things, and I thought: what if I just dumped all this into it?

How do you get the data? I wasn't able to find the forms for 2018 for some charities. Did the IRS make them available yet?

The IRS puts them in an S3 bucket: https://docs.opendata.aws/irs-990/readme.html

There are 2018 filings in there, but many charities have fiscal years that end in Dec. IIRC, they generally file within 6 months. Given things like human error, bureaucracy and filing extensions... more should start rolling in over time.


Deletable nit: could this be retitled to spell out “3 million?” I expected something about 3M, the industrial conglomerate.

Indeed. I would have been pretty interested in what LOB was non-profit under the umbrella of Minnesota Mining and Manufacturing.

I actually thought the same thing. Or maybe a $ sign prior to 3M. $3M could be easier to distinguish quickly.

[edit] It seems this is a link to 3 million records, not $3 million worth of anything. It seems I was confused by yet another way the headline could be read.


It's not $3 million; it's 3 million tax records from nonprofits, filed electronically by those nonprofits, from 2011 until now.

"The new feature contains every electronically filed Form 990, 990-PF and 990-EZ released by the IRS from 2011 to date. That’s nearly 3 million filings. The search does not include forms filed on paper."


Ah, then it should be something like 'Search the Full Text of 3 Million Nonprofit Tax Filings', or just add the word 'million'. Thanks for the correction.

I thought this was about 3M as well

I immediately thought the article was about some kind of Panama Papers but regarding 3M Corp :b

We don't really need a number up there, so I just took it out.

The community appreciates the change! Thanks for listening!

They also have an API.
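For anyone who wants to script against it, the Nonprofit Explorer API offers a keyword search endpoint and a per-organization endpoint keyed by EIN. A Python sketch that only builds the URLs and fetches JSON with the standard library; the endpoint paths follow ProPublica's published API documentation as best understood here and should be verified there before relying on them. The EIN in the usage line is Stanford's, taken from the Alma link earlier in the thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the Nonprofit Explorer API (v2); verify against ProPublica's docs.
API_BASE = "https://projects.propublica.org/nonprofits/api/v2"


def search_url(query: str) -> str:
    """URL for a keyword search across organizations."""
    return f"{API_BASE}/search.json?{urlencode({'q': query})}"


def organization_url(ein: str) -> str:
    """URL for one organization's data; EINs are nine digits, hyphen optional."""
    return f"{API_BASE}/organizations/{ein.replace('-', '')}.json"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (network access required)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


print(search_url("linux foundation"))
print(organization_url("94-1156365"))
# data = fetch_json(organization_url("94-1156365"))  # uncomment to hit the live API
```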

Many of these forms seem to be empty. I tried two random ones and could not find any data in them. I also could not find some nonprofits. Does anyone know how much coverage this database has?

I guess Moms Across America didn't file a tax return electronically?

Reference: https://news.ycombinator.com/item?id=20032686

Thanks for flagging my comment, jerks. What the fuck.


We've banned this account for repeatedly violating the site guidelines and ignoring our many requests to stop.

https://news.ycombinator.com/newsguidelines.html


Great. Now will you delete all my comments?


This is amazing work that enables transparency, assuming every transaction was indeed reported to the IRS. Typically we would have to manually search each organization's website to obtain this information, so kudos to them for publishing everything in one place, with a full-text search feature and an API to boot.
