Freakonometrics


Tag - dataset


Thursday, May 24 2012

French dataset: population and GPS coordinates

A short post today, based on recent work by @3wen (Ewen Gallic, a graduate student in Rennes, spending a year in Montreal). Since we were working on a detailed French dataset (per commune), we needed a dataset containing a list of all communes, with population and location. GPS coordinates were extracted from Google, using the following php file, inspired by the Google geocoding API with PHP webpage at http://www.andrew-kirkpatrick.com/. Population was interpolated from INSEE's datasets, i.e. http://www.insee.fr/ (since the data cover a 35-year period, from 1975 to 2010, changes such as merges and splits of cities have been taken into account as carefully as possible, based on that description). A spline model has been used for all cities (with three degrees of freedom; null and negative interpolated values were set to one, since we'll be using log-linear models afterwards). Names are from that dataset, still on INSEE's website, http://www.insee.fr/.
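To make the interpolation step concrete, here is a minimal sketch for a single commune. It is not the code used to build the dataset: the census years and counts are made up, and the choice of a natural cubic spline (`splines::ns`) fitted by least squares is only one possible reading of "a spline model with three degrees of freedom".

```r
library(splines)

# Hypothetical census counts for one shrinking commune (illustrative values,
# not taken from INSEE; the set of observation years is also an assumption)
census <- data.frame(
  year = c(1975, 1982, 1990, 1999, 2006, 2010),
  pop  = c(120, 95, 60, 30, 12, 5)
)

# Natural cubic spline with three degrees of freedom, fitted by least squares
fit <- lm(pop ~ ns(year, df = 3), data = census)

# Yearly estimates over the whole period 1975-2010
years <- 1975:2010
est <- predict(fit, newdata = data.frame(year = years))

# Null and negative estimates are set to one, so that log(pop) is defined
pop_hat <- ifelse(est <= 0, 1, est)
names(pop_hat) <- paste0("pop_", years)
```

Flooring at one only matters for very small communes, where the fitted curve can dip to zero or below; for most cities the estimates are left untouched.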

A zipped file can be downloaded here: popfr19752010.zip, but it is also possible to use the code below (it is a 24 MB dataset). Since it was hard to find such a dataset online (different files can be found, but we found none with both population and location), we have decided to upload that dataset. Please let us know if there are problems with those data...

> base=read.csv(
+ "http://freakonometrics.free.fr/popfr19752010.csv",
+ header=TRUE)

Using that code, it is possible to locate all the communes in metropolitan France, for instance

> library(maps)
> map("france")
> points(base$long,base$lat,cex=.1,col="red",pch=19)
> points(base$long,base$lat,cex=2*base$pop_2010/
+ max(base$pop_2010),col="blue",pch=19)

Several additional lines of code on that dataset (and on others) will be uploaded soon.

This work is made available under the Attribution-ShareAlike 3.0 Unported license. To view a copy of this license, visit http://creativecommons.org/. Date: May 24, 2012, by Ewen GALLIC. Sources: INSEE, Google Maps API v3 and GeoHack (GPS coordinates), own calculations (population estimates based on INSEE data).
  • reg: INSEE region code (character)
  • dep: INSEE department code (character; Corsica is coded 201 and 202 instead of 2A and 2B)
  • com: INSEE commune code (character)
  • article: article of the commune name (character)
  • com_nom: name of the commune (character)
  • long: longitude (numeric)
  • lat: latitude (numeric)
  • pop_i: estimated population at date i (set to 1 if <= 0), i = 1975, ..., 2010 (numeric)
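Since the three INSEE codes are described as character variables, it may be worth forcing their type when reading the file: by default, `read.csv` converts all-digit columns to numbers, which would drop any leading zeros. The sketch below illustrates this on a two-line extract with made-up values (only the column names follow the description above); whether the actual file stores codes with leading zeros is an assumption.

```r
# Hypothetical extract using the documented column names
# (values are invented for illustration, not taken from the real file)
sample_csv <- "reg,dep,com,article,com_nom,long,lat,pop_2010
01,01,001,LE,EXEMPLE-A,2.10,48.50,1200
01,02,014,,EXEMPLE-B,2.35,48.85,35000"

# Default behaviour: codes are read as integers, leading zeros are lost
base_default <- read.csv(text = sample_csv, header = TRUE)

# Forcing the three code columns to character keeps them intact;
# the other columns are still converted automatically
base_chr <- read.csv(text = sample_csv, header = TRUE,
                     colClasses = c(reg = "character",
                                    dep = "character",
                                    com = "character"))

str(base_default$dep)
str(base_chr$dep)
```

The same `colClasses` argument can be passed when reading the full file from its URL.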

Monday, May 4 2009

Wanted: datasets with individual observations

In a recent post on Google datasets (here), only 4 hours after posting a note, I received a comment with a link to the website to get the datasets. And since I have noticed that a lot of practitioners visit my blog (i.e. neither academics nor students), I decided to post a note to request datasets. For my Econometrics course in the Master's program in Rennes, I am looking for datasets with individual observations, with continuous response variables, and a few continuous (but also, maybe, one or two discrete) explanatory variables. I have found it hard to find such datasets on the internet. If you have datasets you are willing to share, please send me an email (here).