Data And Bagels
Some digital journalism tips and tricks (including some learned at NICAR17) for a quick talk at the University of Montana Journalism School
First off: Download this zip file. If you want to stick around until the end for some very basic HTML / CSS, download the Sublime Text Editor.
A little about us
Claire Chandler, Kelsey Johnson, Kayla Robertson.
Data journalism glossary
Terms to know. Or, at least pretend to know and reference during job interviews.
-
Open Source: A philosophy and a means of developing and licensing software and other copyrighted works so that others are free to inspect, use and adapt the original source material. It promotes collaboration and iterative software development. Examples: R, Python and Ruby. Many open-source projects are hosted on GitHub, and companies such as Google and IBM publish open-source code.
-
iFrame: An HTML tag that allows one web page to be wholly included inside another; it is a popular way to create embeddable interactive features. Iframes are usually constructed via JavaScript as a way around web browsers’ security features, which try to prevent JavaScript on one page from talking to JavaScript on an external page. Many security breaches have been designed using iframes. For example: embedding an Instagram post or a Google Calendar in a news article on a website. To do this, you need the iframe code for whatever you are embedding.
-
Web Scraping: Extracting data from a website. Instead of copying and pasting info from a website by hand, you can use software to copy the data automatically and either put it in a database or spreadsheet or save it to your computer. You can also get a lot more info than meets the eye. For example: scraping Trump’s tweets off of Twitter. You could manually copy and paste each tweet, but a short script can grab all of them along with extra info, such as when each one was posted.
-
API (application programming interface): The way in which one piece of software (an application or a piece of an application) talks to another. So, let’s say you want to access Google Calendar. You open up your browser (a piece of software) and ask it to bring up Google. The way in which your browser asks Google Calendar to appear is an API. Google’s server in turn replies to your browser, sending data and functionality through an API.
-
Data story: A story that uses data, either a statistic or a graph/trend, as a source instead of a person. It means learning to treat data as a source in a story.
-
Back End: The backend usually consists of three parts: a server, an application, and a database. If you book a flight or buy concert tickets, you usually open a website and interact with the frontend. Once you’ve entered that information, the application stores it in a database that was created on a server. For sake of ease, just think about a database as a giant Excel spreadsheet on your computer, but your computer (server) is stored somewhere in Arizona.
-
R: A statistical, open-source programming language
-
Python: A sophisticated computer language that is commonly used for Internet applications. Designed to be a very readable language, it is named after Monty Python. It first appeared in 1991 and was created by Guido van Rossum, a Dutch computer programmer. Python files generally end in .py.
-
Ruby: An increasingly popular open-source programming language known for being powerful yet easy to write.
-
Front End: When we discuss the “frontend” of the web, what we’re really talking about is the part of the web that you can see and interact with. The frontend usually consists of two parts: the web design and front end web development.
-
HTML: The dominant formatting language used on the World Wide Web to publish text, images and other elements. HTML uses pairs of opening and closing tags (also known as elements), such as <p> and </p>; each pair assigns meaning to the text that appears between them. HTML can be considered code, but it is not a programming language; it’s a markup language, which is a separate beast. It creates the structure of how the text is laid out.
-
CSS: CSS files determine the style of a website, like the fonts, the colors, the arrangement of items or even animations. CSS modifies the design and display of HTML elements.
-
JavaScript: A programming language used to make web pages interactive. It runs on your visitor’s computer and doesn’t require constant downloads from your website. JavaScript is often used to create polls and quizzes.
-
D3: A JavaScript library that lets coders build data-driven visualizations in web browsers.
-
Infographic: A way to visualize statistics or information. It could take the form of a bar chart, a map, different-sized circles, etc.
- Data file types: Most tools and generators work with these file types.
- CSV (comma-separated values): A simple data format that stores data in text documents and uses commas to separate values.
-
Microsoft Excel: Spreadsheet software with an easy-to-use interface that lets people edit and organize data into meaningful patterns.
- JSON (JavaScript Object Notation): An easy way to store data. Commonly used to pass info between applications.
-
Data journalism: The ability to work with data is an increasingly important part of a journalist’s armoury. Skills needed to research and tell a good data-based story include finding relevant data, data cleaning, exploring or mining the data to understand what story it is telling, and creating good visualisations.
- GitHub: A central repository system that lets users store and download documents or code, revise them and re-upload the changes. GitHub keeps the online version current by integrating new revisions and showing the changes made. It also allows anyone to download public (open-source) projects and is super useful when collaborating on code. GitHub is predominantly used by developers, but it can store just about any kind of document.
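To make the CSV and JSON entries in the glossary a little more concrete, here is a minimal Python sketch that reads a tiny CSV and converts it to JSON. The town names and numbers are made up for illustration and do not come from the workshop files.

```python
import csv
import io
import json

# Hypothetical sample data held in a string so the example runs without files.
# With a real file you would pass open("yourfile.csv", newline="") instead.
csv_text = "town,residents\nSpringfield,1000\nShelbyville,2500\n"

# csv.DictReader turns each row into a dictionary keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every CSV value arrives as text, so numbers must be converted explicitly.
for row in rows:
    row["residents"] = int(row["residents"])

# json.dumps writes the same rows out as JSON text.
json_text = json.dumps(rows, indent=2)
print(json_text)
```

The same data ends up in two shapes: rows of comma-separated text in the CSV, and a list of labeled objects in the JSON.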
So where do I find this “data” stuff?
Here are a few solid resources. Bookmark these to use as sources for stories, or search them for story ideas (look for outliers).
Data.gov
All of the U.S. government’s open data. Nice, easy to use website. If you’re writing a story about climate change, you can easily search by that + Montana or U.S. to see what kind of data you need. From there, you download.
Census American Fact Finder
Basically all of the census data. The website is dense and not very intuitive, but it’s a goldmine of census info: historically the census is very accurate, it is collected on a recurring basis so it’s reliable, and it’s great for demographic-based stories (race, age, etc.).
National Data Repository
Wikipedia page of data hubs of different countries that are government funded / based.
Google Public Data Explorer
Can help you visualize a dataset. Kind of like Google search, it searches open datasets. Doesn’t have everything and the search can be tricky. But it can quickly show you what a dataset has in it without you having to analyze it and make your own infographic. You can easily see if the dataset is worth downloading, etc.
General searching tips
The file types PDF and Excel are often where the good data is. To find these:
- Google search: missoula housing data filetype:pdf
- Searches only PDFs with those keywords
- Google search: “montana meth project” filetype:xls
- Searches only Excel files with those exact terms
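If you run these searches often, the same queries can be built in code. The sketch below is only an illustration using Python's standard library; the helper name `google_search_url` is made up for this example.

```python
from urllib.parse import urlencode


def google_search_url(keywords, filetype):
    """Build a Google search URL limited to one file type (hypothetical helper)."""
    query = f"{keywords} filetype:{filetype}"
    # urlencode escapes spaces, quotes and colons so the URL is valid.
    return "https://www.google.com/search?" + urlencode({"q": query})


# The two searches from the tips above.
print(google_search_url("missoula housing data", "pdf"))
print(google_search_url('"montana meth project"', "xls"))
```

Paste either printed URL into your browser to run the search.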
Making your own data
- Scraping social media is a great way to compile your own datasets. There are tons of stories that can be pulled from social media that we might not find without scraping the web.
- https://blog.datastories.com/blog/reddit-front-page
- http://www.trumptwitterarchive.com/
-
So today I’ll show you a quick demo of how to scrape Twitter, up to 3,200 tweets. See the original documentation on GitHub to follow along on your own computer or to use this tool in the future. For the sake of time, I’m going to skip a few of the beginning steps and show you how to run the script once your computer is set up correctly.
-
Go to https://apps.twitter.com/. You need to make an “application” to be allowed to scrape someone’s tweets, so Twitter knows who is doing it.
- Enter the information in the secrets.py file in Sublime Text.
Ok, so I have this data. What can I do with it?
Infographics are cool and good. Why? Infographics make stories more believable, more shareable and more digestible for audiences. How your story is presented online is everything. Here are a few starter tools that reporters (or anyone) can use to make their stories better, with no coding or design knowledge involved:
Datawrapper
-
ABOUT Datawrapper is a simple chart- and map-making tool created by journalists in Germany. It’s very user-friendly, mobile responsive and customizable according to your publication’s style guide. It’s also important to visualize any data you have, even if a chart isn’t necessary for your story. Overall data literacy enhances a reporter’s bullshit detector. Additionally, if you have a designer create this (very simple) chart for you, you’ve outsourced part of your story and possibly missed something important. So let’s make a graph.
-
TUTORIAL
- First, find the file called POTUSTWEETS.csv in the folder you downloaded.
- Clean data: only use numbers, with no dollar signs, percent signs or commas. Otherwise Datawrapper will read the value as a string (text) instead of a number! You can easily append and prepend symbols to your data within Datawrapper after you’ve uploaded it.
- The Notes section is great for transparency, especially if you have a pie chart that adds up to more than 100%. Explain the study.
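If your spreadsheet is full of dollar signs and commas, you can clean it by hand or with a few lines of code. Here is a minimal Python sketch of the cleaning step described above; the function name `clean_number` and the sample values are made up for illustration.

```python
def clean_number(value):
    """Strip dollar signs, percent signs and commas, then convert to a number."""
    cleaned = value.strip()
    for symbol in ("$", "%", ","):
        cleaned = cleaned.replace(symbol, "")
    return float(cleaned)


# Hypothetical messy values like the ones you might find in a spreadsheet.
raw_values = ["$1,200", "45%", "3,000,000", " 12.5 "]
print([clean_number(v) for v in raw_values])
```

Once the column holds plain numbers, you can add the dollar or percent sign back as a prefix or suffix inside Datawrapper.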
Knightlab tools
Knightlab is a leading innovator in journalism technology and makes a lot of open-source (free) resources for journalists. There are three main tools, all of which are very usable, responsive and easy to learn. Let’s look at how to make an interactive timeline:
Timeline.js
- What it is: Open source tool for making interactive, responsive timelines.
- Cool because: Can import content from Twitter, Flickr, YouTube, Vimeo, Google Maps, Wikipedia, SoundCloud. Responsive on all devices, pretty easy to use.
- How to do it:
- Timeline.js uses Google Sheets. Click here to get the template and follow along.
- Click Make a Timeline. Click Make a Copy. Important: don’t change the column headers, don’t remove any columns, and don’t leave any blank rows in your spreadsheet.
-
It may look daunting, but all the hard Excel work has been done for you. Just fill in the columns with your information. I’ve taken the data Kelsey scraped (POTUS tweets from March). I’ll do the last few lines to show you what I did. (Feel free to make your timeline based on anything: fake dates, your life events, etc.)
- Input each day, month and time into separate columns. Add a headline for the slide and add your body text. Optional: links to media (for photos, you must copy the image address, and the image must be hosted somewhere online). This is also where you add SoundCloud tracks, embed codes, etc.
- Each downward row (1, 2, 3, 4) will be a different ‘slide’ with a different headline and visual and date.
- When you’re done: File, publish to the Web. Click OK. Yes, you are sure.
- Now, copy the URL it gives you.
- To generate the timeline, go back to the Timeline.js website (https://timeline.knightlab.com/#make). Paste the URL you copied into step 3.
- Here you can preview, change a few settings, and grab the iframe code to publish to your site!
So much more you can do with data — bookmark these:
-
DataViz Catalogue: A resource that lets you search both by type of chart and by what you want to show with your data. It then recommends websites and tools for making each chart, plus tips on what kind of data fits best. Nice.
-
RAW Graphics — another cool infographic-generator to check out.
Bots
-
Bots: Short for “web robot.” An automated application that carries out tasks (scripts, messages or texts) over the internet. May be used with “chat apps” such as Facebook Messenger or Slack.
-
Example: Botlist
-
The goal is to create a friendly robot to interact with customers or readers in a way that empathizes with the human condition. “We think that people are going to start projecting their expectations of how human interactions work onto bots,” said Poncho CEO Sam Mandel. “Just like they do with pets. They know the cat can’t understand what they’re saying, but it’s almost like you can’t help yourself because you have an emotional connection with it. We think it’s going to be the same with bots.”
-
Poncho example
-
How news orgs are using bots:
- Higher engagement: By making news a conversation tailored to each individual based on their questions, news orgs can generate higher engagement.
- Audience development: Chat bots allow news orgs to participate on a different, more personal platform.
- How you can use bots to keep up on news:
- NYT and WSJ have Messenger bots that send you morning updates or election updates. You can ask a bot direct news questions and it retrieves the info in Messenger. As you use a news bot more, it tailors news to fit your interests. It is a blend of AI and human interaction. Most pull from a database of answers constantly updated by humans.
-
Open Messenger, tap on the search bar, scroll down to see some bots you can install. Get some!
-
How you can use bots to tell stories: polls, questions, can get reader feedback and tips, customer service and can create personalized feeds.
- Building bots — Botlist.com
HTML/CSS: Or, how do these things get published?
-
Remember that .zip folder we downloaded at the beginning? Extract the folder called “FirstHTML” and put it somewhere easily accessible, like on your desktop.
-
Open Sublime Text Editor (Command + Space, then search for Sublime). Why? It’s easy to use, free to try (just click “no” if it asks you to pay for a license), and widely used by journo developers. Or, if you already have a text editor like Atom or Brackets, that’s fine. You need a plain-text editor like this. DON’T use Microsoft Word or TextEdit.
-
File, Open. Open the document called “index.html”. The index file is where all of your content lives. You can’t have a webpage without a home HTML file, typically named index.
-
What’s in this file?
- Heading stuff
- <head>
- Mostly metadata goes here, and links to other sources of code you may need to reference.
- Let’s add a title. This is what shows up as the title of the tab in your browser. In the space between <title> and </title>, write something like “My First HTML Page”.
- <body>
- This is where all of the content that you see on a webpage goes. Right now it’s blank. If you go back to the folder on your desktop and double click on index.html, it will open in your browser. Nothing is here. That’s OK. Let’s add some stuff.
- Back in Sublime:
-
-
Save your document (file, save). Go back to your browser and refresh your “My First HTML Page”.
- Let’s add more content and a little styling.
Put <strong> around some words </strong> and they become bold!
-
Adding images: The img tag adds images to the page. Alt text is important for accessibility (readers who use a screen reader) and for navigation if your page has trouble loading.
```
<img src="jschool.jpeg" alt="University of Montana School of Journalism" height="50" width="50">
```
-
Adding links
- Under the image, create a hyperlink that goes to the J-school website using the <a> tag. For example:
```
<a href="https://jour.umt.edu/">Click here to visit the UMT J-School Website.</a>
```
- More cool stuff:
- iFrames: For example, the timeline iframe code we generated earlier can be dropped into the body of an HTML document and will show up!
- CSS
-
Link to your stylesheet (called style.css) in the head of the document using <link rel="stylesheet" href="style.css">
- styling an H1
- Linking to another page
-
- Keep in mind: You don’t have to memorize any of this — you can always copy from W3Schools. It is a leading web resource with how-tos on just about everything web design.
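To see how the pieces from this section fit together, here is a rough sketch of what a finished index.html might look like, wrapped in a short Python script that lists the tags it contains. The page layout is an assumption assembled from the examples above, not the exact file from the workshop folder, and `TagCollector` is a made-up helper name.

```python
from html.parser import HTMLParser

# Hypothetical finished page, assembled from the snippets in this section.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>My First HTML Page</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <p>Put <strong>around some words</strong> and they become bold!</p>
  <img src="jschool.jpeg" alt="University of Montana School of Journalism" height="50" width="50">
  <a href="https://jour.umt.edu/">Click here to visit the UMT J-School Website.</a>
</body>
</html>
"""


class TagCollector(HTMLParser):
    """Record every opening tag, in order, as the page is parsed."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(PAGE)
print(collector.tags)
```

You can also paste the contents of PAGE into index.html in Sublime Text, save, and refresh your browser to see the result.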