*DeepDive* Data Science vs. Data Engineering




Data Driven show

Summary: Frank and Andy talked about doing a Deep Dive show where they take a deep look into a particular data science technology, term, or methodology. And now, they deliver!

In this very first Deep Dive, Frank and Andy discuss the differences between Data Science and Data Engineering, where they overlap, where they differ, and why so many C-level execs can't seem to figure out the deltas.

Links
Sponsor: Audible.com - Get a free audiobook when you sign up for a free trial!
Sponsor: Enterprise Data & Analytics

Notable Quotes
Frank's new courses are up at WintellectNow (01:30)
David Goggins (03:00)
Dive! Dive! Dive! It's a deep dive on Data Science vs. Data Engineering (06:00)
"Clean data" means different things to different people. (09:30)
"Shaping the data." (11:00)
Our conversation with Buck Woody (12:30)
Andy's screed on managing NULLs (14:00)
Andy's screed on managing dupes (17:00)
Frank, on aggregation and schema changes... (21:21)
Attempted NoSQL definition (23:45)
On MySQL... (25:00)
Maybe "No" stands for "Not only" (26:45)
"What sorcery is this?!" (28:30)
Kevin Hazzard's article on Database Design if we started today (29:15)
Andy's opinion: We're not using the SSD-ness of SSDs (31:30)
"I don't know how much simpler you can get." - Andy (33:00)
Denny Cherry's company: Denny Cherry and Associates (34:45)
"... somewhere between useless and lying..." (35:45)
Frank on HDFS (38:00)
ClearDB wiped out 13 years of Frank's blog data, and we're still bothered by that. (40:30)
sklearn (42:50)
Correlation is not causation. (45:30)
How to Lie with Statistics (45:45)
Movie/TV Reference: Star Trek TNG (46:15)
CNTK (Microsoft Cognitive Toolkit) (48:00)
Frank, on selling ice cream... (49:25)
On over-fitting (55:30)
Training the model (56:30)
Request for feedback! (57:30)