ONA Student Newsroom

Hundreds of D.C. TripAdvisor listings, visualized

By Alan Hovorka, October 7, 2017

With thousands of journalists flooding into the nation’s capital, TripAdvisor data offers a snapshot of popular things to do in Washington, D.C.

The ONA Student Newsroom collected more than 600 attraction listings to see how their ratings and number of reviews correlated.

Most of the attractions we collected had ratings above 3 stars, with an average rating of 4.26 stars, but the number of reviews per attraction varied wildly. The median review count was 15, while the average was about 405 reviews. That gap signals a heavily skewed distribution, driven by a small number of widely reviewed attractions.
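
The gap between the median and the mean is easy to reproduce. In the sketch below the review counts are invented for illustration and are not the ONA data. A single heavily reviewed listing pulls the mean far above the median, while the median barely moves.

```python
from statistics import mean, median


def summarize(counts):
    """Return (median, mean) for a list of review counts."""
    return median(counts), mean(counts)


# Hypothetical review counts for ten attractions, invented for illustration
# (NOT the ONA data): nine modest listings and one heavily reviewed landmark.
review_counts = [4, 9, 12, 15, 15, 18, 22, 30, 41, 3800]

med, avg = summarize(review_counts)
print(f"median: {med}, mean: {avg:.1f}")  # median: 16.5, mean: 396.6

# Drop the single outlier and the two measures nearly agree.
med, avg = summarize(review_counts[:-1])
print(f"median: {med}, mean: {avg:.1f}")  # median: 15, mean: 18.4
```

This is why the median is the better "typical" value for a metric like review counts.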

[Interactive scatter plot: https://static.journalists.org/projects/dc-trip-advisor-matrix/index.html?c=10]

How we did it

First, we found the attraction listings for Washington, D.C. on TripAdvisor’s website and saved the HTML for those pages locally for faster scraping. Once we had compiled a list of some 600 places to visit in Washington, D.C., we ran a BeautifulSoup script that pulled each listing’s rating, number of reviews and establishment category from the HTML. The script compiled these individual records into a new JSON file that we used in the visualization.
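
The scraping step could look roughly like the sketch below. It is not the project’s script. TripAdvisor’s real markup isn’t described here, so the `listing`, `name`, `rating` and `reviews` class names and the `data-category` attribute are hypothetical. To stay dependency-free, the sketch also uses Python’s built-in `html.parser` in place of BeautifulSoup. It reads every saved `.html` file in a folder, extracts one record per listing and writes the combined list to a JSON file.

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path

# HYPOTHETICAL markup for one saved listing (real TripAdvisor pages use
# different, frequently changing class names):
#   <div class="listing" data-category="Landmarks">
#     <span class="name">...</span>
#     <span class="rating">4.5</span>
#     <span class="reviews">1,234 reviews</span>
#   </div>
FIELDS = {"name", "rating", "reviews"}


class ListingParser(HTMLParser):
    """Collect raw text fields for each listing <div> in a page."""

    def __init__(self):
        super().__init__()
        self.listings = []    # finished raw records
        self._current = None  # record being built, or None outside a listing
        self._field = None    # field whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if tag == "div" and "listing" in classes:
            self._current = {"category": attrs.get("data-category") or ""}
        elif self._current is not None:
            # Remember which field this tag holds, if it carries a known class.
            self._field = next(iter(classes & FIELDS), None)

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            # Assumes listing divs contain no nested divs.
            self.listings.append(self._current)
            self._current = None
        self._field = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            # Text can arrive in pieces, so append rather than overwrite.
            self._current[self._field] = self._current.get(self._field, "") + data


def clean(raw):
    """Turn scraped strings into typed values; a missing rating becomes None."""
    rating = raw.get("rating", "").strip()
    digits = re.sub(r"\D", "", raw.get("reviews", ""))  # "1,234 reviews" -> "1234"
    return {
        "name": raw.get("name", "").strip(),
        "category": raw["category"],
        "rating": float(rating) if rating else None,
        "reviews": int(digits) if digits else 0,
    }


def parse_html(text):
    """Parse one page's HTML and return its cleaned listings."""
    parser = ListingParser()
    parser.feed(text)
    parser.close()
    return [clean(raw) for raw in parser.listings]


def scrape_directory(html_dir, out_path):
    """Parse every saved .html page in html_dir and write one JSON array."""
    records = []
    for page in sorted(Path(html_dir).glob("*.html")):
        records.extend(parse_html(page.read_text(encoding="utf-8")))
    Path(out_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

Called as `scrape_directory("saved_pages", "attractions.json")` (both paths hypothetical), it would produce one flat list of objects that a D3 chart can load directly. Scraping saved files rather than live pages, as the article describes, also means the parser can be rerun while it is being debugged without hitting the site again.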

With the data in hand, we built the visualization using D3. The scatterplot shows ratings along the x-axis and number of reviews along the y-axis. One problem we faced was how to show attractions with a widely varying number of reviews: some had 10 reviews, while others had more than 1,000. We addressed this by using a logarithmic scale in D3, which spreads out the lower, denser range of the data and compresses the sparser upper range. Even so, the more than 600 data points cluster in places, so we added a filter that lets users focus on the establishments with the most reviews and the highest ratings.
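
A log scale places each value by its logarithm rather than its raw size, so every tenfold jump in review count gets the same vertical distance. The function below is a plain-Python sketch of that mapping, the same interpolation `d3.scaleLog` performs (output is linear in log(x)). The 1-to-10,000 domain and the 400-pixel range in the usage lines are hypothetical. Because log(0) is undefined, a listing with zero reviews cannot be placed on such an axis.

```python
import math


def log_scale(domain, range_):
    """Return a function mapping values in `domain` onto `range_` so that
    position is linear in log10(x), as in d3.scaleLog."""
    d0, d1 = domain
    r0, r1 = range_
    if d0 <= 0 or d1 <= 0:
        raise ValueError("log scale domain must be strictly positive")
    l0, l1 = math.log10(d0), math.log10(d1)

    def scale(x):
        # t is 0 at d0 and 1 at d1; math.log10 raises ValueError for x <= 0.
        t = (math.log10(x) - l0) / (l1 - l0)
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical axis: 1 to 10,000 reviews on a 400-pixel-tall chart, with
# SVG's y growing downward (so the range runs from 400 at the bottom to 0).
y = log_scale((1, 10_000), (400, 0))
for reviews in (10, 100, 1_000):
    print(reviews, round(y(reviews)))  # 10 300, then 100 200, then 1000 100
```

For comparison, a linear scale on the same axis would put a listing with 10 reviews just 0.4 pixels above the baseline, indistinguishable from one with none. On the log scale it sits a full quarter of the way up the chart.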

Using the category data, we color-coded the scatterplot and made it filterable by establishment type, such as “Landmarks” or “Nightlife.”
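
The color coding and filtering can be expressed in two steps. First, give each category its own color the way an ordinal scale such as D3’s `d3.scaleOrdinal` does. Second, keep only the listings that match the chosen categories and meet minimum rating and review thresholds. This is a hypothetical Python sketch. The field names, the `filter_listings` thresholds and the sample records are assumptions, not the project’s code.

```python
from itertools import cycle

# Ten-color categorical palette (these are the hex values of D3's schemeCategory10).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
           "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]


def color_map(listings):
    """Give each category a color in order of first appearance, like an ordinal
    scale; colors repeat once there are more categories than palette entries."""
    colors = {}
    palette = cycle(PALETTE)
    for item in listings:
        if item["category"] not in colors:
            colors[item["category"]] = next(palette)
    return colors


def filter_listings(listings, categories=None, min_rating=0.0, min_reviews=0):
    """Keep listings in the chosen categories (all if None) that meet both
    thresholds; listings without a rating are dropped."""
    return [
        item for item in listings
        if (categories is None or item["category"] in categories)
        and item["rating"] is not None
        and item["rating"] >= min_rating
        and item["reviews"] >= min_reviews
    ]


# Hypothetical records with name, category, rating and review-count fields.
listings = [
    {"name": "Example Memorial", "category": "Landmarks", "rating": 4.5, "reviews": 1234},
    {"name": "Example Bar", "category": "Nightlife", "rating": 4.0, "reviews": 15},
    {"name": "Example Museum", "category": "Museums", "rating": 3.5, "reviews": 80},
]

colors = color_map(listings)
print(colors["Nightlife"])  # #ff7f0e
top = filter_listings(listings, {"Landmarks", "Nightlife"}, min_rating=4.0, min_reviews=100)
print([item["name"] for item in top])  # ['Example Memorial']
```

Building the color map from the full data set, rather than from the filtered subset, keeps each category’s color stable as readers toggle filters on and off.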

How you can use it

Visualizing this TripAdvisor data offers a quick overview of places in Washington, D.C. that tourists might not know about, and it’s more accessible than scrolling through more than 600 location listings. You can choose an establishment type and narrow the data to just those listings. A tooltip lets you tap a point to see the name, rating and location of the place.

Caveats

This interactive pulls location, rating and review information from TripAdvisor, but other data sources may offer differing perspectives. It reflects only what TripAdvisor users think are great or terrible places for tourism in Washington, D.C. The scraper also did not pull every listing for every thing to do in town. In the case of bars, it grabbed only the top 30, according to user reviews.

Because several locations have only a handful of reviews, their ratings may be somewhat inflated; the clustering of ratings between four and five stars hints at this. It also highlights a broader problem with review sites: ratings are only as reliable as the number and variety of reviews behind them.

You can find the code for this project on GitHub.
