Writing about my job: Data Scientist

by technicalities · 3 min read · 19th Jul 2021 · 5 comments


Data Scientist: Person who is worse at statistics than any statistician & worse at software engineering than any software engineer.

~ Will Cukierski

 

What: Data scientist in a multinational, in London. First hire in a new team. 

When: 2016-2019.
 

Background

When I arrived I had almost no ML experience; one Master's project. I did have 2 years of ordinary software dev experience, and given a new team with no infrastructure and vast amounts of engineering needed before the first model, this was enough.

The market was incredibly hot then, as it is now - about 30% annual turnover. This greatly lowers the bar. DM me if you want an introduction to some desperate managers.

(This is perhaps the best time there will ever be to work in data: after the data deluge, before AutoML really gets there.)

Even so, my overall record is 3 offers out of 8 applications, 6 of which I applied for after I had real experience.

 

Day in the life

  • 1 hour meetings. Really not many meetings. Morning "standup" (10 mins, strictly). Usually one or two 1-hour things, including running our constant hiring rounds.
  • 3 hours data munging. The received wisdom is that half of the job involves just getting the data into a fit state to model. (Kaggle is not a data science platform; it simulates the easy and fun fifth of the job, after all the data janitorship and before the exhausting deployment and internal sales.)
  • 3 hours modelling. Usually one big modelling task. Insurance pricing, an ANN floorplan reader, a metre-precise 3D model of Britain's rivers, medical risk scoring models, countermeasures against machine-learning model extraction, meta-heuristic solvers...
  • 1 hour tech support. I was in the actuarial wing, their existing modelling function. Because my peers were not engineers or data people, I would usually spend an hour or two helping them with dev stuff - getting their ssh to work, writing scrapers, tutoring Python, improving SQL queries, code review, etc.
  • Probably most "big data" / data science projects fail. Our rate was better: only about a third failed. The usual causes were inflated expectations and terrible data.

Skills developed

  • ML.
  • Interpretability. My industry is heavily regulated, so we were rarely able to just chuck a neural network at the problem. Shapley values are cool.
  • Cluster computing. We started with Hadoop, which is pretty outdated by now. We moved, painfully, to Spark on Databricks and Azure. This is great tech but extremely fiddly if you haven't encountered distributed or lazy systems before.
  • Dozens of kinds of data. Scrapers for house prices, council tax, fire accidents, house blueprints, the nasty Acxiom or Palantir lookups.
  • Vastly better sense of how the UK works. I know lots of weird things like the data reporting format and tempo of the UK's many fire stations, or the postcode coding system, or why adding safety features to a car increases its insurance premiums, or how to extract CAD coordinate data without paying for AutoCAD (...). Insurance professionals are doing social science, and moreover doing it with a relatively clear loss function and controlled experiments.
  • Hiring. I ran my first hiring round 2 months after joining. Turns out it is really stressful on the other side; CVs and interviews carry really very little information. About 1 in 4 candidates did OK on the simple classification task we set. No performance difference between undergrads and PhDs.
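
For anyone who hasn't met Shapley values (mentioned under interpretability above): they split a model's output between its inputs by averaging each input's marginal contribution over every order in which the inputs could be added. Below is a minimal sketch of the exact calculation. The pricing function `toy_premium` and its feature names are made up for illustration, not anything from my actual work. Real models have too many features for brute force, so in practice you would use an approximation.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def toy_premium(features):
    """Hypothetical pricing rule: two additive effects plus an interaction."""
    premium = 0.0
    if "age" in features:
        premium += 100.0
    if "postcode" in features:
        premium += 50.0
    if "age" in features and "postcode" in features:
        premium += 30.0  # interaction term, shared between the two features
    return premium


phi = shapley_values(["age", "postcode"], toy_premium)
print(phi)  # {'age': 115.0, 'postcode': 65.0}
```

The attributions sum to the full premium (180), and the 30 interaction is split evenly between the two features. That additivity is much of why the method is attractive when you have to explain a price to a regulator.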

Bottom line

Extremely flexible hours, challenging nonroutine tasks, unlimited remote work, very good pay per hour (and 15% annual wage growth), massive amounts of autonomy (relative to manual work), friendly smart colleagues. I think I stayed late 3 times in 3 years - on one occasion this earned me a dinner with the big boss(?!). In-house yoga classes. Beautiful buildings. They paid for my second degree and gave 10% time off to study.

  • Entry salary: £41k
  • Exit salary (after 2.5 years): £62k
  • Plus about £7k perks p.a. (3% annual bonus, training, tuition fees, travel, conference passes, bike scheme)


All that objective stuff said: there was something missing for someone odd like me.


Comments

Thanks for the writeup. Minor point about salary: is £41k entry-level typical for London? According to Glassdoor, average base pay in the US is $116k, equivalent to about £85k. Their page for Data Scientists in London puts the average at £52k.

I get that this is an average over all levels of seniority, but it's also just your base pay. My impression from Levels.fyi is that at large US companies, base pay is only around 67-75% of total compensation.

So I guess what I'm asking is, given your experience, which of the following statements would you agree with:

  • The aggregate data is wrong or misleading
  • You're being underpaid
  • There really is a huge pay difference between the UK and US
  • Something else?

Big old US >> UK pay gap imo. Partial explanation for that: 32 days holiday in the UK vs 10 days US. 

(My base pay was 85% of total; 100% seems pretty normal in UK tech.)

Other big factor: this was in a sorta sleepy industry that tacitly trades off money for working the contracted 37.5 h week, unlike say startups. Per hour it was decent, particularly given 10% study time. 

If we say hustling places have a 50 h week (which is what one fancy startup actually told me they expected), then £41k looks fine.
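
To make that per-hour comparison concrete, here is a rough sketch of the arithmetic. It assumes pay scales linearly with hours worked, and it treats the 10% study time as hours not worked; both are simplifications, and the function name is just for illustration.

```python
def scaled_salary(salary_k, contracted_hours, comparison_hours, study_fraction=0.0):
    """Salary (in £k) that pays the same hourly rate over a longer week.

    study_fraction is the share of contracted hours spent on paid study
    rather than work (an assumption about how to count that time).
    """
    worked_hours = contracted_hours * (1 - study_fraction)
    return salary_k * comparison_hours / worked_hours


# £41k for a 37.5 h week, rescaled to a 50 h week
plain = scaled_salary(41, 37.5, 50)
# Same, but counting the 10% study time as time off work
with_study = scaled_salary(41, 37.5, 50, study_fraction=0.10)

print(round(plain, 1), round(with_study, 1))  # 54.7 60.7
```

So at the same hourly rate, a 50 h week would pay roughly £55k, or about £61k if the study time is counted as time off.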

I also had some sticker shock at the number. Thanks for including the Glassdoor links; I was very surprised that base pay in the US overall is higher than in London (which is presumably the most expensive UK market).

I would guess the US market (at least those reporting on Glassdoor) skews heavily SF/NYC, maybe Seattle.

FWIW I made $187K/yr in total comp (£136K/yr) in Chicago as a data scientist after four years of experience. My starting salary was $83K/yr in total comp (£60K/yr) with no experience. In both jobs, I worked about 30 hrs/wk. My day-to-day experience was nearly identical to this post.