Data Science Essentials in Python: Collect – Organize – Explore – Predict – Value


  • Author: Dmitry Zinoviev
  • Length: 200 pages
  • Edition: 1
  • Publisher: Pragmatic Bookshelf
  • Publication Date: 2016-08-20
  • ISBN-10: 1680501844
  • ISBN-13: 9781680501841
  • Sales Rank: #886264


    Book Description

    Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python.

    Data science is one of the fastest-growing disciplines in terms of academic research, student enrollment, and employment. Python, with its flexibility and scalability, is quickly overtaking the R language for data-scientific projects. Keep Python data-science concepts at your fingertips with this modular, quick reference to the tools used to acquire, clean, analyze, and store data.

    This one-stop solution covers essential Python, databases, network analysis, natural language processing, elements of machine learning, and visualization. Access structured and unstructured text and numeric data from local files, databases, and the Internet. Arrange, rearrange, and clean the data. Work with relational and non-relational databases, data visualization, and simple predictive analysis (regressions, clustering, and decision trees). See how typical data analysis problems are handled. And try your hand at your own solutions to a variety of medium-scale projects that are fun to work on and look good on your resume.

    Keep this handy quick guide at your side whether you’re a student, an entry-level data science professional converting from R to Python, or a seasoned Python developer who doesn’t want to memorize every function and option.

    What You Need:

    You need a decent distribution of Python 3.3 or above that includes at least NLTK, Pandas, NumPy, Matplotlib, Networkx, SciKit-Learn, and BeautifulSoup. A great distribution that meets the requirements is Anaconda, available for free from www.continuum.io. If you plan to set up your own database servers, you also need MySQL (www.mysql.com) and MongoDB (www.mongodb.com). Both packages are free and run on Windows, Linux, and Mac OS.
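
    As a rough way to check an existing installation against the requirements above, the following sketch tests the interpreter version and looks up each listed package. It is not taken from the book. The import names (for example sklearn for SciKit-Learn and bs4 for BeautifulSoup) and the helper check_environment are assumptions made for illustration, and the script itself relies on importlib.util.find_spec, which needs Python 3.4 or later.

```python
import importlib.util
import sys

# Assumed import names for the packages named in the description.
# These are the commonly used module names, not names quoted from the book.
REQUIRED_MODULES = {
    "NLTK": "nltk",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "Networkx": "networkx",
    "SciKit-Learn": "sklearn",
    "BeautifulSoup": "bs4",
}


def check_environment(minimum_python=(3, 3)):
    """Return (python_ok, {package label: True if importable})."""
    # Compare only (major, minor) against the stated minimum version.
    python_ok = sys.version_info[:2] >= tuple(minimum_python)
    # find_spec locates a top-level module without importing it.
    installed = {
        label: importlib.util.find_spec(module) is not None
        for label, module in REQUIRED_MODULES.items()
    }
    return python_ok, installed


if __name__ == "__main__":
    python_ok, installed = check_environment()
    print("Python version OK: {}".format(python_ok))
    for label, found in installed.items():
        print("{}: {}".format(label, "found" if found else "MISSING"))
```

    The check only reports whether each module can be located; it does not verify package versions or the optional MySQL and MongoDB servers.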

    Table of Contents

    Chapter 1. What Is Data Science?
    Unit 1. Data Analysis Sequence
    Unit 2. Data Acquisition Pipeline
    Unit 3. Report Structure

    Chapter 2. Core Python for Data Science
    Unit 4. Understanding Basic String Functions
    Unit 5. Choosing the Right Data Structure
    Unit 6. Comprehending Lists Through List Comprehension
    Unit 7. Counting with Counters
    Unit 8. Working with Files
    Unit 9. Reaching the Web
    Unit 10. Pattern Matching with Regular Expressions
    Unit 11. Globbing File Names and Other Strings
    Unit 12. Pickling and Unpickling Data

    Chapter 3. Working with Text Data
    Unit 13. Processing HTML Files
    Unit 14. Handling CSV Files
    Unit 15. Reading JSON Files
    Unit 16. Processing Texts in Natural Languages

    Chapter 4. Working with Databases
    Unit 17. Setting Up a MySQL Database
    Unit 18. Using a MySQL Database: Command Line
    Unit 19. Using a MySQL Database: pymysql
    Unit 20. Taming Document Stores: MongoDB

    Chapter 5. Working with Tabular Numeric Data
    Unit 21. Creating Arrays
    Unit 22. Transposing and Reshaping
    Unit 23. Indexing and Slicing
    Unit 24. Broadcasting
    Unit 25. Demystifying Universal Functions
    Unit 26. Understanding Conditional Functions
    Unit 27. Aggregating and Ordering Arrays
    Unit 28. Treating Arrays as Sets
    Unit 29. Saving and Reading Arrays
    Unit 30. Generating a Synthetic Sine Wave

    Chapter 6. Working with Data Series and Frames
    Unit 31. Getting Used to Pandas Data Structures
    Unit 32. Reshaping Data
    Unit 33. Handling Missing Data
    Unit 34. Combining Data
    Unit 35. Ordering and Describing Data
    Unit 36. Transforming Data
    Unit 37. Taming Pandas File I/O

    Chapter 7. Working with Network Data
    Unit 38. Dissecting Graphs
    Unit 39. Network Analysis Sequence
    Unit 40. Harnessing Networkx

    Chapter 8. Plotting
    Unit 41. Basic Plotting with PyPlot
    Unit 42. Getting to Know Other Plot Types
    Unit 43. Mastering Embellishments
    Unit 44. Plotting with Pandas

    Chapter 9. Probability and Statistics
    Unit 45. Reviewing Probability Distributions
    Unit 46. Recollecting Statistical Measures
    Unit 47. Doing Stats the Python Way

    Chapter 10. Machine Learning
    Unit 48. Designing a Predictive Experiment
    Unit 49. Fitting a Linear Regression
    Unit 50. Grouping Data with K-Means Clustering
    Unit 51. Surviving in Random Decision Forests

    Appendix A1. Further Reading
    Appendix A2. Solutions to Single-Star Projects

