The Gun Violence Archive (GVA) is a database of gun violence and gun crime incidents aggregated from a diverse array of sources. The details of how it tags incidents are described on its Methodology page.

Getting the Data

After notifying them by e-mail of my intentions, I built a small crawler that indexes all of the incidents they have on file and dumps the relevant block of HTML into a text file for later processing.
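Below is a minimal sketch of such a crawler that uses only the standard library. It is not the original code. Several details are assumptions to check against the live site: the incident URL pattern (`INCIDENT_URL`), the CSS class of the report `<div>` (`target_class`), and the tab-separated, one-incident-per-line layout of results.txt. Requests are throttled with a fixed delay.

```python
import time
import urllib.request
from html.parser import HTMLParser

# Hypothetical URL pattern for one incident page; verify against the live site.
INCIDENT_URL = "https://www.gunviolencearchive.org/incident/{}"


class DivExtractor(HTMLParser):
    """Re-serialize the first <div> whose class list contains `target_class`."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the target <div>; 0 = outside
        self.done = False   # True once the first matching <div> has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif not self.done and tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1
                self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.depth:  # self-closing tags such as <br/> never change the depth
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.parts.append(f"</{tag}>")
            if tag == "div":
                self.depth -= 1
                self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_div(html, target_class):
    """Return the raw HTML of the first matching <div>, or '' if there is none."""
    parser = DivExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def crawl(incident_ids, target_class, out_path="results.txt", delay=1.0):
    """Fetch each incident page and append one 'id<TAB>html' line to out_path."""
    with open(out_path, "a", encoding="utf-8") as out:
        for incident_id in incident_ids:
            url = INCIDENT_URL.format(incident_id)
            with urllib.request.urlopen(url, timeout=30) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            block = extract_div(html, target_class)
            if block:
                flat = block.replace("\n", " ")  # keep one incident per line
                out.write(f"{incident_id}\t{flat}\n")
            time.sleep(delay)  # throttle requests to be polite to the server
```

Dumping the raw HTML first, rather than parsing during the crawl, means the slow network step only has to run once. The parser can then be rewritten and re-run on results.txt as often as needed.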

The results.txt file stores all of the GVA URL indices and their respective HTML <div> elements containing incident reports. The next step is to create a parser that will read through the HTML and populate a SQLite database of incidents.

Since each incident has exactly one location and time but can involve any number of people, I decided to make each database entry correspond to an individual. The parser therefore creates one entry for each person involved in a gun-related incident.
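The sketch below shows one way to implement this person-level design. The column names, the '|'-joined tags column, and the layout of the parsed `incident` dictionary are hypothetical. Only the idea of repeating the incident-level fields on every participant's row comes from the text.

```python
import sqlite3

# Hypothetical schema: one row per person, with the incident-level fields
# (id, date, location, tags) repeated on every row for that incident.
SCHEMA = """
CREATE TABLE IF NOT EXISTS GVA (
    incident_id INTEGER,
    date        TEXT,
    state       TEXT,
    latitude    REAL,
    longitude   REAL,
    tags        TEXT,     -- incident characteristics joined with '|'
    type        TEXT,     -- 'victim' or 'perpetrator'
    gender      TEXT,
    age         INTEGER,
    status      TEXT      -- e.g. 'Unharmed', 'Injured', 'Killed', 'Arrested'
)
"""


def insert_incident(conn, incident):
    """Expand one parsed incident into one GVA row per person; return the row count."""
    rows = [
        (incident["id"], incident["date"], incident["state"],
         incident.get("lat"), incident.get("lon"), "|".join(incident["tags"]),
         person["type"], person.get("gender"), person.get("age"), person.get("status"))
        for person in incident["participants"]
    ]
    conn.executemany("INSERT INTO GVA VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Repeating the location, date, and tags on every row wastes some space. In exchange, per-person queries by gender, age, or outcome stay on a single table with no joins.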

Of course, additional fields can be defined, but the relevant lines of code have to be changed to accommodate them. This step can take a long time to run because I’m using the html5lib parser, which parses each page the way a web browser would and is much slower than lxml or Python’s built-in html.parser. However, this is only a one-time cost: future updates to the database only need to process the events that have happened since we last indexed.
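Suppose GVA incident IDs only ever increase. This is an assumption, not something stated by the GVA. The incremental update then reduces to fetching every ID above the largest one already stored. The helper name is hypothetical, and so is the `newest_id_on_site` argument; how to find the newest ID is left open.

```python
import sqlite3


def ids_to_fetch(conn, newest_id_on_site):
    """Incident ids newer than anything already stored in GVA.

    Assumes (hypothetically) that incident ids only increase over time, so
    everything above the stored maximum is new since the last crawl.
    """
    (last_id,) = conn.execute("SELECT MAX(incident_id) FROM GVA").fetchone()
    first = 1 if last_id is None else last_id + 1
    return range(first, newest_id_on_site + 1)
```

If the GVA backfills older incidents with lower IDs, this approach would miss them, so an occasional full re-crawl may still be worthwhile.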


Now we have a SQLite database called gva.db with the table GVA in it. We can now ask it some basic questions:
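A few such questions are wrapped in the small helper below. It is a sketch that assumes the GVA table has `incident_id`, `type`, and `gender` columns; these column names are hypothetical.

```python
import sqlite3


def basic_stats(conn):
    """Answer a few basic questions about the per-person GVA table."""
    people, incidents = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT incident_id) FROM GVA").fetchone()
    by_group = {
        (person_type, gender): n
        for person_type, gender, n in conn.execute(
            "SELECT type, gender, COUNT(*) FROM GVA GROUP BY type, gender")
    }
    return {"people": people, "incidents": incidents, "by_type_gender": by_group}


# Usage against the real file:
# conn = sqlite3.connect("gva.db")
# print(basic_stats(conn))
```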

With a database in hand, the door is open to all sorts of investigation. In particular, let’s explore the relationship between the individuals in the database, the event (as defined and used by the GVA), and the “outcomes”. Here I define an outcome as the status of the individual (either perpetrator or victim) at the end of the event. The two contrasts to look at are victim/perpetrator and male/female. Note that a “perpetrator” in this setting is not always the person using the gun; for example, someone who is shot at while attempting a home invasion is labeled a “perpetrator”. Using the GVA’s notation, the possible outcomes are:

  1. Unharmed.
  2. Injured.
  3. Killed.
  4. Arrested.

A simple Python script can automate the access, storage, and plotting of the data:
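The helper below is a minimal sketch of such a script, not the original. It assumes a GVA table with hypothetical `type`, `gender`, `status`, and `tags` columns, where each row's incident tags are stored as one '|'-joined string. It reports each tag's share of all tag occurrences for a chosen group of people. A second function draws the ranked bar chart with matplotlib.

```python
import sqlite3
from collections import Counter


def top_tags(conn, person_type, gender=None, status=None, n=10):
    """Return [(tag, percent), ...] for the n most used tags among matching people.

    Every tag on an incident is counted once per matching person, so the
    percentages describe tag usage rather than a count of distinct events.
    """
    sql = "SELECT tags FROM GVA WHERE type = ?"
    params = [person_type]
    if gender is not None:
        sql += " AND gender = ?"
        params.append(gender)
    if status is not None:
        # A person's status can hold several outcomes, e.g. 'Injured, Arrested'.
        sql += " AND status LIKE ?"
        params.append(f"%{status}%")
    counts = Counter()
    for (tags,) in conn.execute(sql, params):
        counts.update(tag for tag in (tags or "").split("|") if tag)
    total = sum(counts.values())
    if not total:
        return []
    return [(tag, 100.0 * c / total) for tag, c in counts.most_common(n)]


def plot_top_tags(shares, title, path):
    """Save a horizontal bar chart of (tag, percent) pairs to `path`."""
    import matplotlib.pyplot as plt  # only needed when actually plotting

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh([tag for tag, _ in shares], [pct for _, pct in shares])
    ax.invert_yaxis()  # most used tag at the top
    ax.set_xlabel("Share of tag occurrences (%)")
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Example (not run here):
# conn = sqlite3.connect("gva.db")
# plot_top_tags(top_tags(conn, "perpetrator", gender="Female"),
#               "Top event characteristics for female perpetrators", "female_perps.png")
```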

In my treatment of the data, the charts count tags attached to gun-related incidents. They are not a direct count of events, since a single event in the GVA frequently carries many tags. Even so, the relative frequency of the different tags conveys what gun incidents and gun violence in America look like:

### Top event characteristics for all perpetrators in the GVA

### Top event characteristics for all victims in the GVA

DGU stands for “Defensive Gun Use”. Across all perpetrators and all victims, Armed robbery is the most frequent tag for both groups. On the victim side, Mass Shooting ranks just below Accidental Shooting in frequency in the GVA database.

Now let’s examine the gender differences, if any (for those records where gender is filled in):

### Top event characteristics for male perpetrators in the GVA

### Top event characteristics for female perpetrators in the GVA

Armed robbery appears at roughly the same frequency for male perpetrators as for the whole database. For female perpetrators, however, it falls from the 1st to the 4th most used tag, while Non-Shooting Incident rises from 3rd to 1st. Domestic Violence for male perpetrators sits between Defensive Use and Car-jacking at 2.16%, while for female perpetrators it sits between Brandishing and ATF/LE Confiscation at 5.00%. Now let’s flip the relationship and look at victims:

### Top event characteristics for male victims in the GVA

### Top event characteristics for female victims in the GVA

Flipping from perpetrator to victim, we see a dramatic change: Domestic Violence tops the list for female victims at 9.24% (compare that to 3.89% for all victims and 2.74% for male victims).

### Top event characteristics for killed perpetrators in the GVA

### Top event characteristics for killed victims in the GVA

Lastly, looking at how deaths from gun incidents are distributed, we see that Suicide and Murder/Suicide are both at the top of the list, along with law-enforcement-related tags (evoking, among other things, the notion of “suicide by cop”). That Suicide and Domestic Violence sit at the top of the list for killed victims is a disheartening but important fact to keep in mind about the causes of gun casualties.


The final thing to be done with the data is to make it accessible. I picked five types of event tags that figure prominently in discussions of guns and gun control to highlight:

  1. Accidental discharge.
  2. Use in defense.
  3. Use by a third party “good samaritan” in defense.
  4. Mass shootings.
  5. Suicide.

The result of this effort is a d3.js document combining a gun violence choropleth, an age/gender histogram, and a time series. The document aims to put national and state-level data and sources at one’s fingertips. From playing around with it, a few qualitative observations can be made:

  1. Accidental gun injuries and deaths skew young, with a bimodal age distribution: one spike at 3 years old and another at 15.
  2. On a per-capita basis, the geographic pattern of defensive firearm injuries and deaths is distinctly different from that of accidental firearm injuries and deaths.
  3. The time series contains many artifacts from how police reports are filed. These must be taken into account in any time series analysis of whether gun violence events are correlated over time.
  4. By the numbers, “good samaritan” events that lead to injury or death are very rare. This may be a blind spot in the GVA: since it only tracks events that are reported to the police or otherwise make the news, we have no way of knowing how many standoffs involving a gun were resolved peacefully without ever being noticed or reported.
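The per-capita view divides each state’s count by its population. Below is a minimal sketch that assumes state-level counts and population figures are already loaded into dictionaries, for instance from Census Bureau state population estimates. The function names are hypothetical.

```python
def per_capita(counts, population, per=100_000):
    """Rates per `per` residents for every state that has a known population.

    States missing from `population` (or listed with 0) are skipped, not guessed.
    """
    return {
        state: n * per / population[state]
        for state, n in counts.items()
        if population.get(state)
    }


def top_states(rates, k=10):
    """The k states with the highest rates, highest first."""
    return sorted(rates, key=rates.get, reverse=True)[:k]
```

One way to check observation 2 is to compute these rates separately for defensive and accidental tags and then compare the highest-ranked states in each.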


The chief flaw of the GVA dataset is that its tagging system works well for simple events but becomes hard to interpret for events involving many victims and perpetrators. For example, a naive search for people who are “Unharmed” still turns up entries with tags such as “Shot - Wounded/Injured” and “Shot - Dead (murder, accidental, suicide)”. The cause is that the GVA’s tags describe the whole event, so an unharmed person inherits the “Shot” tags of whoever else was shot in the same incident. This event-centric model makes it difficult to work in a person-centric model.
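The size of this mismatch can be measured directly. The sketch below assumes that each row stores the person’s status and the incident’s tags, as '|'-joined text, in hypothetical `status` and `tags` columns.

```python
import sqlite3


def naive_unharmed_contamination(conn):
    """Share (%) of 'Unharmed' people whose incident carries a 'Shot - ...' tag.

    Each row inherits every tag of its incident, so an unharmed person in an
    incident where someone else was shot still matches the 'Shot' tags.
    """
    unharmed, with_shot_tag = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(tags LIKE '%Shot - Wounded/Injured%'
                            OR tags LIKE '%Shot - Dead%'), 0)
        FROM GVA
        WHERE status LIKE '%Unharmed%'
        """
    ).fetchone()
    return 100.0 * with_shot_tag / unharmed if unharmed else 0.0
```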

Another flaw of the GVA database is the perennial issue of missing data fields:

  1. Age: 51.10% of the entries have an age listed.
  2. Relationships: 3.28% have a relationship (Friends, Family, etc.) listed.
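Percentages like these can be computed directly in SQLite. The minimal sketch below assumes hypothetical column names and treats both NULL and empty strings as missing.

```python
import sqlite3

# Column names cannot be bound as SQL parameters, so only known names are allowed.
KNOWN_COLUMNS = {"age", "gender", "relationship", "latitude", "longitude"}


def fill_rate(conn, column):
    """Percentage of GVA rows where `column` is neither NULL nor ''."""
    if column not in KNOWN_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    total, filled = conn.execute(
        f"SELECT COUNT(*), COUNT(NULLIF({column}, '')) FROM GVA").fetchone()
    return 100.0 * filled / total if total else 0.0
```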

That being said, there is still a lot of untapped potential in the richness of the GVA database, and I have only scratched the surface:

  1. Location tags: 95.19% of the entries (165,121 entries) have exact GPS coordinates of the event. Knowing the location this precisely opens the door to a huge amount of additional data, from political to demographic. One potential extension is to join the status of CCW permit laws (from most permissive to most restrictive) to the database. Another idea is to use regional economic data from the Bureau of Economic Analysis (BEA).
  2. Guns involved: the GVA tracks (where possible) all the guns reported in each incident. This could be used to test the hypothesis that different types of guns are used primarily for different purposes.
  3. Event tags: the tags the GVA applies to events can probably be grouped in sensible ways, providing extra signal about different kinds of gun violence events. Ultimately, this points toward developing a classifier for gun violence events. In policy discussions, it is important to measure a policy’s impact while setting aside the “noise” of gun violence events the policy isn’t targeting; for example, a law designed to reduce domestic violence gun incidents should not be judged for failing to reduce the number of drive-by shootings.
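As a first step toward such grouping, a simple rule-based mapping could assign coarse groups to each incident’s tags. This is a hand-written keyword sketch, not a trained classifier. The group names and keywords are hypothetical and follow the tag wording used in this post, which may not match the GVA’s exact strings.

```python
# Hypothetical coarse groups, matched by case-insensitive substring.
TAG_GROUPS = {
    "domestic": ("domestic violence",),
    "self-harm": ("suicide",),
    "defensive": ("defensive use",),
    "robbery/property": ("armed robbery", "car-jacking", "home invasion"),
    "accidental": ("accidental",),
    "mass shooting": ("mass shooting",),
}


def group_tags(tags):
    """Map an incident's raw tags to the set of coarse groups they fall into."""
    groups = set()
    for tag in tags:
        lowered = tag.lower()
        for group, keywords in TAG_GROUPS.items():
            if any(keyword in lowered for keyword in keywords):
                groups.add(group)
    return groups
```

Substring rules are crude. A catch-all tag such as “Shot - Dead (murder, accidental, suicide)” would match both the accidental and self-harm groups, so real rules would need exact matches or explicit exclusions.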

Appendix: Some Extra Graphs

### Top event characteristics for unharmed perpetrators in the GVA

### Top event characteristics for unharmed victims in the GVA

The notable characteristics that jump out here are Institution/Group/Business and Drug involvement on the perpetrator side, and Defensive Use on the victim side. The latter is evidence that in a substantial number of gun incidents the victim uses a gun defensively and walks away unharmed.

### Top event characteristics for injured perpetrators in the GVA

### Top event characteristics for injured victims in the GVA

### Top event characteristics for arrested perpetrators in the GVA

### Top event characteristics for arrested victims in the GVA